Varicent ELT Assistant

Types of pipes

There are two types of pipes that you can work with on your pipe builder canvas: the Standard pipe and the Big pipe.

Important

If you are interested in using the Big pipe, please contact your Varicent Customer Success Manager.

A Standard pipe is a fast, efficient pipe suitable for most scenarios. This is the recommended option.

A Big pipe is a robust pipe suitable for complex or high-volume data. Use this option when you have 10 million or more rows of data.

Note

You may require a Big pipe depending on your data volume and pipe performance. The Big pipe is built for performance; however, some tools are incompatible with it. The following tools are unavailable in the Big pipe:

  • Assignment optimization

  • Combination matcher

  • Confusion matrix

  • Extract

  • Forecast

  • Hierarchy validator

  • Lookup

  • Merge columns

  • Monte Carlo

  • Most common

  • Outlier

  • Oversample

  • Pipe

  • Repeat

  • Select word

  • Smart matcher

  • Text between

  • Text classifier

  • Text grade level

  • Text sentiment

  • Trim tags

  • Undersample

Hints and tips

Standard pipes may eventually fail at high volumes, but performance degradation can occur before a hard failure. High volumes can affect memory usage.

Memory is not shared across a pipe. Each tool operation incrementally increases memory usage as data flows through the pipe. The following factors can affect memory usage:

  • Column data type: Floats use more memory than integers.

  • String content: In-memory string size depends on each individual string's length.

  • Number of columns: The more columns in a dataset, the more memory is used.

  • Column fill rate: The number of populated (non-null) values in each column affects the memory used.

  • Tools: Some tools and operations are more memory-intensive.

    Note

    The Cross join option in the Join tool can cause memory and performance issues if the data sources are too large. To avoid these issues, the Cross join is limited to a maximum of 100 columns and 100 million rows.
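To see why a Cross join grows so quickly, it can help to estimate the result size before building the pipe. The following is a minimal Python sketch, not Varicent ELT functionality: the function names and the sample row and column counts are hypothetical, and it assumes the limits quoted in the note above apply to the joined result.

```python
# Limits quoted in the note above. Treating them as caps on the
# joined result is an assumption made for this sketch.
MAX_COLUMNS = 100
MAX_ROWS = 100_000_000


def cross_join_shape(left_rows, left_cols, right_rows, right_cols):
    """Return (rows, columns) of a cross join: rows multiply, columns add."""
    return left_rows * right_rows, left_cols + right_cols


def within_limits(rows, cols):
    """Check an estimated result shape against the quoted limits."""
    return rows <= MAX_ROWS and cols <= MAX_COLUMNS


# Hypothetical inputs: 50,000 rows x 12 columns joined with 5,000 rows x 8 columns.
rows, cols = cross_join_shape(50_000, 12, 5_000, 8)
print(rows, cols, within_limits(rows, cols))  # 250000000 20 False
```

Two modestly sized inputs already produce 250 million rows here, which is why filtering each input before a Cross join matters.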

Reduce memory usage by using the Filter and Select tools in your pipe to remove unneeded rows and unused columns from your dataset. For example, if you are using the Join tool, use the Filter tool before the Join to restrict data volume. Then use the Select tool to trim unused columns.
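The effect of filtering and selecting before a join can be illustrated outside the pipe builder. The sketch below uses plain Python lists of dictionaries as a stand-in for pipe data. The sample data, the `inner_join` and `select` helpers, and the choice to apply the select step before the join are all assumptions for illustration, not Varicent ELT APIs.

```python
# Hypothetical sample data: each dict stands in for one row in a pipe.
orders = [
    {"order_id": 1, "rep_id": "A", "region": "East", "amount": 100, "notes": "x"},
    {"order_id": 2, "rep_id": "B", "region": "West", "amount": 250, "notes": "y"},
    {"order_id": 3, "rep_id": "A", "region": "East", "amount": 75, "notes": "z"},
    {"order_id": 4, "rep_id": "C", "region": "West", "amount": 300, "notes": "w"},
]
reps = [
    {"rep_id": "A", "rep_name": "Avery"},
    {"rep_id": "B", "rep_name": "Blake"},
    {"rep_id": "C", "rep_name": "Casey"},
]


def inner_join(left, right, key):
    """Join two row lists on a shared key (stand-in for a Join step)."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def select(rows, columns):
    """Keep only the named columns (stand-in for a Select step)."""
    return [{c: row[c] for c in columns} for row in rows]


# Filter first, then select: only East rows and needed columns reach the join.
east_orders = [row for row in orders if row["region"] == "East"]
trimmed = select(east_orders, ["order_id", "rep_id", "amount"])
joined = inner_join(trimmed, reps, "rep_id")

# Join everything first: a larger intermediate result to hold in memory.
joined_all = inner_join(orders, reps, "rep_id")

# Compare intermediate sizes as rows x columns (number of cells).
print(len(joined) * len(joined[0]), len(joined_all) * len(joined_all[0]))  # 8 24
```

In this toy case the filtered and trimmed join holds 8 cells instead of 24. The same ordering principle applies at much larger volumes, where the difference determines how much data each downstream tool must carry.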