Types of pipes
There are two types of pipes that you can work with on your pipe builder canvas: a Standard pipe and a Big pipe.
Important
If you are interested in using the Big pipe, please contact your Varicent Customer Success Manager.
A Standard pipe is a fast, efficient pipe suitable for most scenarios. This is the recommended option.
A Big pipe is a robust pipe suitable for complex or high-volume data. Use this option when you have 10 million or more rows of data.
Note
You may require a Big pipe depending on your data volume and pipe performance. The Big pipe is built for performance; however, some tools are incompatible with it. The following tools are unavailable in the Big pipe:
Assignment optimization | Oversample |
Combination matcher | Pipe |
Confusion matrix | Repeat |
Extract | Select word |
Forecast | Smart matcher |
Hierarchy validator | Text between |
Lookup | Text classifier |
Merge columns | Text grade level |
Monte Carlo | Text sentiment |
Most common | Trim tags |
Outlier | Undersample |
Hints and tips
Standard pipes may eventually fail at high volumes, but performance can degrade before a hard failure occurs. High volumes can affect memory usage.
Memory is not shared across a pipe. Each tool operation incrementally increases memory usage as data flows through the pipe. The following factors can affect memory usage:
Column data type: Floats use more memory than integers.
String content: In-memory string size depends on each individual string's length.
Number of columns: The more columns in a data set, the more memory is used.
Column fill rate: The number of populated (non-null) values in each column affects the memory used.
Tools: Some tools and operations are more memory-intensive.
Note
The Cross join option in the Join tool may create memory performance issues if the data source is too large. We limit the Cross join to a maximum of 100 columns and 100 million rows to avoid performance issues.
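To see why a cross join grows so quickly: it pairs every row of one input with every row of the other, so the output row count is the product of the two input row counts, and the output column count is the sum of the two input column counts. The following is a minimal Python sketch of that arithmetic, not product code. The function names are hypothetical, and it assumes the stated limits apply to the joined result, which the note does not specify.

```python
# Hypothetical helpers for illustration; not part of the product.
MAX_ROWS = 100_000_000  # row limit stated in the note
MAX_COLUMNS = 100       # column limit stated in the note


def cross_join_shape(left_rows, left_cols, right_rows, right_cols):
    """Return (rows, columns) that a cross join of two inputs produces."""
    return left_rows * right_rows, left_cols + right_cols


def within_limits(rows, cols):
    # Assumption: the limits are checked against the joined result.
    return rows <= MAX_ROWS and cols <= MAX_COLUMNS


rows, cols = cross_join_shape(20_000, 12, 10_000, 8)
print(rows, cols)                 # 200000000 20
print(within_limits(rows, cols))  # False
```

Two modest inputs of 20,000 and 10,000 rows already produce 200 million rows, which is why reducing each input before a cross join matters.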
Reduce memory usage by using the Filter and Select tools in your pipe to remove any unused columns from your dataset. For example, if you are using the Join tool, use the Filter tool before the Join to restrict data volume. Then use the Select tool to remove unused columns.
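The effect of this ordering can be illustrated outside the product. The sketch below is plain Python, not pipe builder syntax. The datasets, column names, and helper functions are all hypothetical stand-ins for the Filter, Join, and Select tools. It shows that filtering first reduces the number of rows the join has to produce and hold.

```python
# Illustrative only: plain Python standing in for the Filter, Join, and Select tools.
orders = [
    {"order_id": 1, "rep_id": "A", "region": "East", "amount": 100},
    {"order_id": 2, "rep_id": "B", "region": "West", "amount": 250},
    {"order_id": 3, "rep_id": "A", "region": "East", "amount": 75},
    {"order_id": 4, "rep_id": "C", "region": "West", "amount": 300},
]
reps = [
    {"rep_id": "A", "name": "Ana", "hire_year": 2019},
    {"rep_id": "B", "name": "Ben", "hire_year": 2021},
    {"rep_id": "C", "name": "Cy", "hire_year": 2020},
]


def filter_rows(rows, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


def inner_join(left, right, key):
    """Join two lists of dicts on a shared key column."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def select_columns(rows, columns):
    """Keep only the named columns in each row."""
    return [{c: row[c] for c in columns} for row in rows]


# Join first: every order row flows through the join.
joined_all = inner_join(orders, reps, "rep_id")

# Filter first: only the rows of interest reach the join.
east_orders = filter_rows(orders, lambda row: row["region"] == "East")
joined_east = inner_join(east_orders, reps, "rep_id")

# Select afterwards to drop columns that are no longer needed.
result = select_columns(joined_east, ["order_id", "name", "amount"])

print(len(joined_all), len(joined_east))  # 4 2
print(result)
```

In this toy data, the join handles half as many rows when the filter runs first, and the final selection keeps three of the six joined columns.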