Category encoder
Map unique text column values to sequential numbers.
When to use this tool
Use this tool when you want to convert the text values in a column into numeric categories.
Configuration
Follow these steps to configure the Category encoder.
In your pipe, add your data source.
Click + Add tool.
Click See all tools.
In the search bar, search for Category encoder. Click Add tool.
Tip
You can also find the Category encoder tool in the Clean section.
Connect the tool to your data set.
In the configuration pane, enter the following information:
Table 44. Category encoder tool configuration
Field | Description
Category column | Select the column to map the values from.
Alias | Enter the name of the column.
(Optional) Click + Selection to add a new selection.
Encoding 2 values to 1 number
In one case, the Category encoder encodes two or more values to the same number. When you build the pipe, the encoder learns the unique values in the column and assigns a numeric value to each unique value. When you export the pipe, the encoder assigns the same numeric values to the values it already knows about. If the encoder sees a value that it did not learn in the build, it assigns that value a particular number, such as max value + 1.
For example, if the Category encoder learns about A, B, and C in the build and then sees A, D, and E in the export, the encoder assigns A to 0, D to 3, and E to 3.
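The example above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not the tool's actual implementation: the function names build_mapping and encode are hypothetical, and the sketch assumes that numbers are assigned in order of first appearance and that unlearned values receive max value + 1.

```python
def build_mapping(values):
    """Learn one number per unique value (assumed: order of first appearance)."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)
    return mapping


def encode(values, mapping):
    """Encode values; anything not learned in the build shares one number."""
    # Assumption: the number for unlearned values is the max learned number + 1,
    # which equals the count of learned values when numbering starts at 0.
    unknown_code = len(mapping)
    return [mapping.get(value, unknown_code) for value in values]


# Build: the encoder learns A, B, and C.
mapping = build_mapping(["A", "B", "C"])

# Export: A is known; D and E are not, so both map to the same number.
encoded = encode(["A", "D", "E"], mapping)
print(encoded)  # [0, 3, 3]
```

Because D and E share the number 3, the original text values cannot be recovered from the encoded column alone.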
A blank value (an empty string) is assigned the same number as values that the encoder did not learn, in both the build and the export. If the build contains A, B, and blank, the encoder assigns A to 0, B to 1, and blank to 2. Then, if the export contains A, blank, D, and E, the encoder assigns A to 0, blank to 2, D to 2, and E to 2.
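The blank-value case can be sketched the same way. Again, this is a hedged illustration rather than the tool's real code: learn_codes and apply_codes are hypothetical names, and the sketch assumes that a blank is never learned as its own category, so it always falls through to the number shared by unlearned values.

```python
def learn_codes(values):
    """Learn a number for each unique non-blank value (assumed: first-seen order)."""
    codes = {}
    for value in values:
        if value == "":
            continue  # assumption: blanks are not learned as a category
        if value not in codes:
            codes[value] = len(codes)
    return codes


def apply_codes(values, codes):
    """Encode values; blanks and unlearned values share max learned number + 1."""
    fallback = len(codes)
    return [codes.get(value, fallback) for value in values]


# Build: A, B, and a blank. Only A and B are learned.
codes = learn_codes(["A", "B", ""])
build_encoded = apply_codes(["A", "B", ""], codes)
print(build_encoded)  # [0, 1, 2]

# Export: blank, D, and E all receive the same number as each other.
export_encoded = apply_codes(["A", "", "D", "E"], codes)
print(export_encoded)  # [0, 2, 2, 2]
```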