Undersample
Use the Undersample tool to address class imbalance in the labels of your classification data. The tool reduces the majority classes to the size of the minority class.
Tip
Use undersampling only if you have a sufficient amount of data. Undersampling a small data set can discard useful data.
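The idea behind undersampling can be illustrated with a minimal sketch of random undersampling. It groups rows by a target column and randomly keeps as many rows from each class as the smallest class has. This page does not say which sampling method the Undersample tool uses internally, so treat this only as an illustration. The function `random_undersample` and the `rows`, `label`, and `id` names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_undersample(rows, target, seed=None):
    """Hypothetical helper: randomly drop rows so that every class in the
    `target` column has as many rows as the smallest (minority) class.
    `rows` is a list of dicts; `seed` makes the sampling repeatable."""
    if not rows:
        return []
    rng = random.Random(seed)
    # Group rows by their class label in the target column.
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[target]].append(row)
    # The minority class size becomes the size of every class.
    minority_size = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        # Sample without replacement; the minority class is kept whole.
        balanced.extend(rng.sample(group, minority_size))
    return balanced


# Example: 8 "no" rows (majority) and 2 "yes" rows (minority).
rows = [{"id": i, "label": "no"} for i in range(8)]
rows += [{"id": i, "label": "yes"} for i in range(8, 10)]
balanced = random_undersample(rows, target="label", seed=0)
counts = Counter(row["label"] for row in balanced)
print(dict(counts))  # {'no': 2, 'yes': 2}
```

In this example, 6 of the 8 majority-class rows are discarded, which shows why undersampling a small data set can lose useful data.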
Configuration
Use the following steps to configure the Undersample tool.
Go to the Pipes module from the side navigation bar.
From the Pipes tab, click an existing pipe to open it, or create a new pipe. To create a new pipe, see the Creating a pipe documentation.
In the Pipe builder, add a data source to your pipe. For more information on adding a data source, see the Data Input tool.
Click + Tool. The Tools modal opens, where you can add tools, such as the Aggregate tool, to your pipe.
In the Tools modal search bar, type Undersample, and then click + Add Tool.
Tip
You can also find the Undersample tool in the Data section.
Click the tool node and drag the line to the next tool to connect the tools. If you need to undo the action, click the line and then click Unlink.
In the configuration pane, under Target column, select the column that contains the class labels you want to balance.
Click the tool name to give the tool node a meaningful name. Name each tool after the function it performs, not the object or the data action. For example, use “Look up rate” instead of “Join to rate table”.