Smart Matcher

Varicent ELT Help Center


Abstract

The Smart Matcher is a fuzzy-matching tool that you can use to train a model to recognize matching data between two data sets.

Input and output

This tool takes two data sets. The top input contains the imperfect data: rows that you want to match, often several at a time, to a single row in the bottom input. You can replace the top data set in later exports. The bottom input is the reference data set, in which each row has a unique ID. The tool matches the rows of the top input against it and joins the two data sets.

In practical terms, the top input is "messy" data with incorrect or missing information in some columns, and the bottom input serves as the "answer key." Before you build, you must pre-label the top input with the IDs that match the bottom reference table. Then add an Export node and build it. The build results are a good indicator of how well the model has learned what counts as a match. To get predictions on new data, change the top input to the new messy data.
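Before building, it can help to confirm that the labeled top input and the answer key fit together: every bottom ID should be unique, and every label in the top input should point at an existing bottom ID. The following Python sketch is not part of Varicent ELT; it assumes rows are plain dictionaries, and the function name `check_training_inputs` and the column name `customer_id` are hypothetical.

```python
from collections import Counter


def check_training_inputs(top_rows, bottom_rows, top_label_col, bottom_id_col):
    """Return a list of problems found in the labeled top input and the bottom answer key.

    Hypothetical helper for illustration only; not a Varicent ELT API.
    """
    problems = []

    # The bottom input is the "answer key": each ID must appear exactly once.
    id_counts = Counter(row[bottom_id_col] for row in bottom_rows)
    duplicates = sorted(str(i) for i, n in id_counts.items() if n > 1)
    if duplicates:
        problems.append(f"duplicate bottom IDs: {', '.join(duplicates)}")

    # Every pre-labeled top row must reference an ID that exists in the bottom input.
    for index, row in enumerate(top_rows):
        label = row.get(top_label_col)
        if label is None:
            problems.append(f"top row {index} has no label")
        elif label not in id_counts:
            problems.append(f"top row {index} has unknown label {label!r}")

    return problems


bottom = [
    {"customer_id": "C1", "name": "John Smith"},
    {"customer_id": "C2", "name": "Jane Doe"},
]
top = [
    {"name": "Jon Smith", "customer_id": "C1"},
    {"name": "Jane Do", "customer_id": "C3"},  # mislabeled: C3 is not in the answer key
]
problems = check_training_inputs(top, bottom, "customer_id", "customer_id")
```

Here `problems` flags only the second top row, whose label does not exist in the answer key.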

When you configure the tool, the Target column links the "messy" data to the "answer key" data. The Match columns provide the information that the tool uses to learn how to match rows correctly.
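This documentation does not describe how the Smart Matcher compares Match columns internally. As an assumption for illustration only, one common approach is to turn each pair of Match columns into a similarity score, so that a model receives one piece of evidence per pair. The sketch below uses Python's standard `difflib` for the scores; the function name `match_features` and the column names are hypothetical.

```python
from difflib import SequenceMatcher


def _normalize(value):
    """Lower-case and trim a cell value; treat a missing value as an empty string."""
    return "" if value is None else str(value).strip().lower()


def match_features(top_row, bottom_row, match_pairs):
    """Return one similarity score in [0, 1] per (top column, bottom column) pair.

    Hypothetical feature builder; not the Smart Matcher's actual method.
    """
    features = []
    for top_col, bottom_col in match_pairs:
        a = _normalize(top_row.get(top_col))
        b = _normalize(bottom_row.get(bottom_col))
        # A missing value on either side contributes no evidence of a match.
        features.append(SequenceMatcher(None, a, b).ratio() if a and b else 0.0)
    return features


pairs = [("name", "customer_name"), ("city", "city")]
features = match_features(
    {"name": "Jon Smith", "city": "Boston"},
    {"customer_name": "John Smith", "city": "Boston"},
    pairs,
)
```

The misspelled name still scores close to 1, and the identical city scores exactly 1. Adding more Match columns adds more scores, which is one way extra columns can help a model separate true matches from near misses.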

This tool joins the two data sets using a many-to-one method: each row from the top input is mapped to at most one row from the bottom input, while several top rows can map to the same bottom row. The output also includes a Probability column that shows the likelihood that the match is correct.
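The shape of a many-to-one join with a score column can be sketched in Python. This is a minimal sketch under stated assumptions, not the Smart Matcher's algorithm: it scores a single column with `difflib` string similarity instead of a trained model, it treats that similarity as the value of a `Probability` column even though it is not a calibrated probability, and the 0.8 threshold and the function name `many_to_one_join` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, str(a).lower(), str(b).lower()).ratio()


def many_to_one_join(top_rows, bottom_rows, top_col, bottom_col, id_col, threshold=0.8):
    """Attach at most one bottom ID to each top row, plus a score column.

    Several top rows may receive the same bottom ID (many-to-one).
    Rows whose best score is below the threshold get no ID.
    """
    output = []
    for top in top_rows:
        best_id, best_score = None, 0.0
        for bottom in bottom_rows:
            score = similarity(top[top_col], bottom[bottom_col])
            if score > best_score:
                best_id, best_score = bottom[id_col], score
        matched = best_score >= threshold
        output.append({
            **top,
            id_col: best_id if matched else None,
            "Probability": round(best_score, 3) if matched else None,
        })
    return output


bottom = [
    {"customer_id": "C1", "name": "John Smith"},
    {"customer_id": "C2", "name": "Jane Doe"},
]
top = [{"name": "Jon Smith"}, {"name": "John Smyth"}, {"name": "JANE DOE"}, {"name": "Acme Corp"}]
result = many_to_one_join(top, bottom, "name", "name", "customer_id")
```

Both spellings of John Smith map to C1, the upper-case Jane Doe maps to C2 with a score of 1.0, and "Acme Corp" is left unmatched rather than forced onto a customer.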

When to use this tool

Use this tool when you want to train a model to fuzzy-match data sets. The tool gets better at matching as you add more data and build the pipe.

Usage example

Let's say you have two data sets. The first contains transaction data for customer purchases. This information is entered manually, so some rows contain misspellings or missing information. The second data set contains customer or account information. This data set is the "answer key": each row represents a unique customer ID with no missing or incorrect information.

Without cleaning up the transaction data set, Varicent ELT would treat a misspelled name and a correctly spelled name as two different customers. The Smart Matcher fuzzy-matches those rows to the same customer ID: when you run the tool, it matches each row in the top data set to at most one row in the bottom data set.
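To see why an exact join falls short in this example, the following Python sketch compares exact lookups with a simple fuzzy lookup from the standard library's `difflib.get_close_matches`. It is only an illustration: the customer names, IDs, and the 0.8 cutoff are made up, and `difflib` string similarity is a stand-in, not the trained model that the Smart Matcher builds.

```python
from difflib import get_close_matches

# Illustrative "answer key": one unique ID per customer name.
customers = {"John Smith": "C1", "Jane Doe": "C2"}

# Manually entered transaction names, some misspelled.
transactions = ["John Smith", "Jon Smith", "Jane Doe", "Jane Do", "John Smyth"]

# Exact matching: a misspelled name finds no customer ID.
exact = [customers.get(name) for name in transactions]


def fuzzy_id(name, cutoff=0.8):
    """Return the ID of the closest known customer name, or None if none is close enough."""
    close = get_close_matches(name, list(customers), n=1, cutoff=cutoff)
    return customers[close[0]] if close else None


# Fuzzy matching: each transaction maps to at most one customer ID.
fuzzy = [fuzzy_id(name) for name in transactions]
```

The exact lookup finds no ID for "Jon Smith", "Jane Do", or "John Smyth". The fuzzy lookup assigns them to C1, C2, and C1, so the five transactions resolve to two customers.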

Tip

Create a Smart Matcher blueprint from the Blueprints module for a more detailed explanation using a practical example.

Configuration

Use the following configuration options to configure the Smart Matcher tool.

Configuring the Smart Matcher tool

1. Go to the Pipes module from the side navigation bar.
2. From the Pipes tab, click an existing pipe to open it, or create a new pipe. To create a new pipe, see the Creating a pipe documentation.
3. In your Pipe builder, add your data sources.
4. Click + Tool.
5. In the Tools modal search bar, type Smart Matcher, and then click + Add Tool.
   Note
   You can also find the Smart Matcher tool in the Combine section.
6. Connect the tool to your data sets: the "messy" data set to the top input and the "answer key" data set to the bottom input.
7. In the configuration pane, enter the following information:

Note

The Target columns and the Match columns must be different.

Table 70. Smart Matcher tool configuration

Field	Description
Target column: Top column	Select the target column from your top data set.
Target column: Bottom column	Select the corresponding target column from your bottom data set.
Match 1: Top column to match	Select a column from your top data set to use for matching.
Match 1: Bottom column to match	Select the corresponding column from your bottom data set.
+ New match column	Click to add another pair of match columns.
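The configuration above can be modeled as a small data structure to make the rule from the note explicit: the Target columns and the Match columns must be different. The following Python sketch is a hypothetical mirror of the configuration pane, not a Varicent ELT API; the class name `SmartMatcherConfig`, its fields, and the column names are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SmartMatcherConfig:
    """Hypothetical mirror of the Smart Matcher configuration pane (not a Varicent ELT API)."""

    target_top: str                 # Target column: Top column
    target_bottom: str              # Target column: Bottom column
    # One (top column to match, bottom column to match) pair per Match entry.
    match_pairs: list[tuple[str, str]] = field(default_factory=list)

    def validate(self):
        """Return a list of errors; an empty list means the configuration passes this check."""
        errors = []
        for top_col, bottom_col in self.match_pairs:
            # The Target columns and the Match columns must be different.
            if top_col == self.target_top or bottom_col == self.target_bottom:
                errors.append(f"match pair ({top_col}, {bottom_col}) reuses a target column")
        return errors


good = SmartMatcherConfig("customer_id", "customer_id", [("name", "customer_name"), ("city", "city")])
bad = SmartMatcherConfig("customer_id", "customer_id", [("customer_id", "customer_name")])
```

Here `good.validate()` returns an empty list, while `bad.validate()` reports that the first match pair reuses the target column.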
