Oversample
Use the Oversample tool to generate more data or to address class imbalance in classification data. To address the imbalance, the tool supplements the data with multiple copies of records from some of the minority classes.
A new column is added to indicate that the Oversample tool generated the record.
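The idea can be illustrated with a minimal Python sketch of random oversampling. This is not the tool's implementation: the function name `oversample`, the default column name `oversampled`, the sample records, and the choice to grow every class to the size of the largest class are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def oversample(records, target_column, oversample_column="oversampled", seed=0):
    """Duplicate minority-class records until every class matches the largest one.

    Hypothetical helper for illustration only; the balancing rule is an assumption.
    """
    if not records:
        return []

    rng = random.Random(seed)

    # Group records by the value in the target column (the class label).
    by_class = defaultdict(list)
    for record in records:
        by_class[record[target_column]].append(record)
    largest = max(len(group) for group in by_class.values())

    # Keep every original record and flag it as not generated.
    result = [{**record, oversample_column: False} for record in records]

    # Append randomly chosen copies of minority-class records, flagged as generated.
    for group in by_class.values():
        for _ in range(largest - len(group)):
            result.append({**rng.choice(group), oversample_column: True})
    return result


# Hypothetical sample data: three "no" records and one "yes" record.
records = [
    {"id": 1, "churned": "no"},
    {"id": 2, "churned": "no"},
    {"id": 3, "churned": "no"},
    {"id": 4, "churned": "yes"},
]
balanced = oversample(records, target_column="churned")
```

In this sketch, `target_column` plays the role of the target column and `oversample_column` plays the role of the oversample column name that you set in the configuration pane.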
Configuration
Follow these steps to add and configure the Oversample tool.
Go to the Pipes module from the side navigation bar.
From the Pipes tab, click an existing pipe to open it, or create a new pipe. To create a new pipe, see the Creating a pipe documentation.
In the Pipe builder, add a data source to your pipe. For more information on adding a data source, see the Data Input tool.
Click + Tool. The Tools modal opens, where you can add tools, such as the Aggregate tool, to your pipe.
In the Tools modal search bar, type Oversample, and then click + Add tool.
Tip
You can also find the Oversample tool in the Data section.
Click the tool node and drag the line to the next tool to connect the tools. If you need to undo the action, click the line and then click Unlink.
In the configuration pane, select the column to use as the target column.
Under Oversample column, enter the name to use for the oversample column.
Click on the tool name to rename your tool node to a meaningful name. Name your tools in a way that describes the function, not the object or the data action. For example, use “Look up rate” instead of “Join to rate table”.