Varicent ELT Help Center

Category encoder

Abstract

Map unique text column values to sequential numbers.

When to use this tool

Use when you want to categorize values.

Configuration

Use the following steps to configure the Category encoder.

Configuring Category encoder
  1. In your pipe, add your data source.

  2. Click + Add tool.

  3. Click See all tools.

  4. In the search bar, search for Category encoder. Click Add tool.

    Tip

    You can also find the Category encoder tool in the Clean section.

  5. Connect the tool to your data set.

  6. In the configuration pane, enter the following information:

    Table 44. Category encoder tool configuration

    Field              Description
    Category column    Select the column to map the values from.
    Alias              Enter the name of the column.



  7. (Optional) Click + Selection to add a new selection.

Encoding 2 values to 1 number

In one case, the Category encoder encodes two different values to the same number. When you build the pipe, the encoder learns the unique values in the column and assigns a numeric value to each unique value. When you export the pipe, the encoder assigns the same numeric values to the values it already knows. If the export contains a value that the Category encoder did not learn in the build, the encoder assigns it a particular value, such as max value + 1.

For example, if the Category encoder learns A, B, and C in the build and then sees A, D, and E in the export, it assigns A to 0, D to 3, and E to 3.
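
The behavior in this example can be sketched in a few lines of Python. This is only an illustration, not the tool's actual implementation: the function names build_encoder and encode are hypothetical, and the sketch assumes that values are numbered from 0 in the order they are first seen and that the particular value for unlearned values is max value + 1.

```python
def build_encoder(values):
    """Build step (assumed): number each unique value from 0 in first-seen order."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)
    return mapping


def encode(mapping, values):
    """Export step (assumed): reuse learned numbers; unlearned values share one number."""
    unknown = len(mapping)  # max value + 1, because numbering starts at 0
    return [mapping.get(value, unknown) for value in values]


mapping = build_encoder(["A", "B", "C"])
print(mapping)                           # {'A': 0, 'B': 1, 'C': 2}
print(encode(mapping, ["A", "D", "E"]))  # [0, 3, 3]
```

Because D and E were both absent from the build, they fall back to the same number, which is how two values end up encoded as one number.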

A blank value (an empty string) is treated the same way as a value that was not learned, in both the build and the export: it is assigned that same particular value. For example, if the build contains A, B, and a blank, the encoder assigns A to 0, B to 1, and the blank to 2. If the export then contains A, a blank, D, and E, the encoder assigns A to 0 and assigns the blank, D, and E all to 2.
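
The blank-value case can be sketched the same way in Python. This is a hedged illustration rather than the tool's real code: the names build_encoder_skipping_blanks and encode_with_blanks are hypothetical, and the sketch assumes that a blank is an empty string, that blanks are never learned as a category of their own, and that the shared number is the count of learned non-blank values.

```python
def build_encoder_skipping_blanks(values):
    """Build step (assumed): learn non-blank values only, numbered from 0."""
    mapping = {}
    for value in values:
        if value != "" and value not in mapping:
            mapping[value] = len(mapping)
    return mapping


def encode_with_blanks(mapping, values):
    """Blanks and unlearned values both receive the same fallback number."""
    fallback = len(mapping)  # one more than the largest learned number
    return [mapping.get(value, fallback) for value in values]


mapping = build_encoder_skipping_blanks(["A", "B", ""])
print(encode_with_blanks(mapping, ["A", "B", ""]))       # [0, 1, 2]
print(encode_with_blanks(mapping, ["A", "", "D", "E"]))  # [0, 2, 2, 2]
```

If blanks and new values need to stay distinguishable downstream, one option is to replace blanks with an explicit placeholder text value before the encoder so that the placeholder is learned in the build.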