Varicent ELT Help Center

Smart Matcher

Abstract

The Smart Matcher is a fuzzy-matching tool that you can train to recognize matching data between two data sets.

Input and output

This tool takes two data sets. The top input is the imperfect data, in which multiple rows can match a single row in the bottom input. You can replace the top data set in later exports. The bottom input is the reference data set that contains unique IDs, which are used to match and join the two data sets.

In practical terms, the top input is "messy" data with incorrect or missing information in some columns, and the bottom input is the data set that serves as an "answer key." To train the model, you must pre-label the top input with IDs that match the reference bottom table. The build results are a good indicator of how well the model has learned what matches. Then, add an Export node and build it, changing the top input to the new messy data to get predictions on that data.

When configuring the tool, the Target column is used to match the "messy" data to the "answer key" data. The Match columns help train the tool to correctly match the rows.
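
One way to picture the roles of the Target column and the Match columns is as labeled training pairs. The sketch below is an assumption about how a matcher of this kind could organize its training data, not a description of how Varicent ELT works internally. The sample rows, the column names, and the `similarity` and `build_training_pairs` helpers are hypothetical, and Python's `difflib` stands in for whatever similarity measure the tool actually uses.

```python
from difflib import SequenceMatcher

# Hypothetical sample data; column names are illustrative only.
top_rows = [  # "messy" input, pre-labeled with the reference ID
    {"customer_id": "C1", "name": "Acme Corp", "city": "Toronto"},
    {"customer_id": "C2", "name": "Globex Inc", "city": "Bostn"},
]
bottom_rows = [  # "answer key" input, one row per unique ID
    {"customer_id": "C1", "name": "Acme Corporation", "city": "Toronto"},
    {"customer_id": "C2", "name": "Globex Inc.", "city": "Boston"},
]

TARGET = "customer_id"            # plays the role of the Target column
MATCH_COLUMNS = ["name", "city"]  # play the role of the Match columns


def similarity(a, b):
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def build_training_pairs(top, bottom):
    """Pair every top row with every bottom row.

    Features: one similarity value per match column.
    Label: 1 if the pre-labeled target IDs agree, otherwise 0.
    """
    pairs = []
    for t in top:
        for b in bottom:
            features = [similarity(t[c], b[c]) for c in MATCH_COLUMNS]
            label = 1 if t[TARGET] == b[TARGET] else 0
            pairs.append((features, label))
    return pairs


pairs = build_training_pairs(top_rows, bottom_rows)
for features, label in pairs:
    print([round(f, 2) for f in features], label)
```

In this framing, the Target column only supplies the labels, while the Match columns supply the evidence a model learns from. That is consistent with the requirement that the two sets of columns be different.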

This tool joins the two data sets using a many-to-one method. Each row from the top input is mapped to, at most, one row from the bottom input. The output also adds a Probability column that shows the likelihood of the match being correct.
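
The many-to-one behavior can be illustrated with a minimal sketch. It assumes a plain string-similarity score from Python's `difflib` in place of the trained model, so the `similarity` value it produces is only a stand-in for the tool's Probability column. The `many_to_one_join` function, the `min_score` threshold, and the sample rows are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def many_to_one_join(top, bottom, match_columns, min_score=0.6):
    """Map each top row to at most one bottom row.

    The score is the average similarity across the match columns.
    Rows whose best score falls below min_score stay unmatched.
    """
    joined = []
    for t in top:
        best_row, best_score = None, 0.0
        for b in bottom:
            score = sum(similarity(t[c], b[c]) for c in match_columns)
            score /= len(match_columns)
            if score > best_score:
                best_row, best_score = b, score
        if best_score < min_score:
            best_row = None  # no sufficiently similar reference row
        joined.append({
            **t,
            "matched_id": best_row["id"] if best_row else None,
            "similarity": round(best_score, 2),
        })
    return joined


# Hypothetical sample data.
top = [
    {"name": "Jon Smith", "city": "Denver"},
    {"name": "John Smith", "city": "Denvr"},
    {"name": "Zzz Qqq", "city": "Xxxx"},
]
bottom = [
    {"id": "A-1", "name": "John Smith", "city": "Denver"},
    {"id": "A-2", "name": "Mary Jones", "city": "Austin"},
]

result = many_to_one_join(top, bottom, ["name", "city"])
for row in result:
    print(row)
```

Here the first two top rows both map to the same bottom row, and the third maps to none. The output always has one row per top row, which is what distinguishes a many-to-one join from a many-to-many join.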

When to use this tool

Use this tool when you want to train a model to fuzzy-match data sets. The tool gets better at matching as you add more data and build the pipe.

Usage example

Let's say you have two data sets. The first contains transaction data of customer purchases. This information is entered manually, so there are some misspellings or missing information in some rows. The second data set contains customer or account information. This data set is the "answer key," where each row represents a unique customer ID with no missing or incorrect information.

Without cleaning up the transaction data set, Varicent ELT would treat a misspelled name and a correctly spelled name as two different customers. We want to use the Smart Matcher to fuzzy-match those rows to the same customer ID. When you run the tool, it fuzzy-matches each row in the top data set to, at most, one row in the bottom data set.
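
The difference between exact and fuzzy matching in this scenario can be shown with a small sketch. It uses `difflib.get_close_matches` from the Python standard library as a simple substitute for the trained model. The customer names, the IDs, the `resolve` helper, and the `cutoff` value are made up for illustration.

```python
from difflib import get_close_matches

# Hypothetical "answer key": one clean name per customer ID.
customers = {"Jane Doe": "CUST-001", "Robert Brown": "CUST-002"}

# Hypothetical manually entered transaction names, some misspelled.
transactions = ["Jane Doe", "Jane Do", "Jnae Doe", "Robert Brown", "Unknown Buyer"]

# Exact comparison treats every spelling as a separate customer.
exact_distinct = set(transactions)


def resolve(name, cutoff=0.8):
    """Return the ID of the closest customer name, or None if none is close enough."""
    hits = get_close_matches(name, list(customers), n=1, cutoff=cutoff)
    return customers[hits[0]] if hits else None


resolved = [resolve(name) for name in transactions]
fuzzy_distinct = {cid for cid in resolved if cid is not None}

print(len(exact_distinct), "distinct names with exact matching")
print(len(fuzzy_distinct), "distinct customer IDs after fuzzy matching")
```

A name that is not close to any reference name resolves to `None`, which mirrors the "at most one row" behavior described above.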

Tip

Create a Smart Matcher blueprint from the Blueprints module for a more detailed explanation using a practical example.

Configuration

Use the following options to configure the Smart Matcher tool.

Configuring the Smart Matcher tool
  1. In your pipe, add your data sources.

  2. Click + Tool.

  3. In the Tools modal search bar, type Smart Matcher. Click + Add Tool.

    Note

    You can also find the Smart Matcher tool in the Combine section.

  4. Connect the tool to your data sets.

  5. In the configuration pane, enter the following information:

    Note

    The Target columns and the Match columns must be different.

    Table 58. Smart Matcher tool configuration

    Field                       Description

    Target column
      Top column                Select a column from your top data set.
      Bottom column             Select a column from your bottom data set.

    Match 1
      Top column to match       Select a column from your top data set.
      Bottom column to match    Select a column from your bottom data set.

    + New match column          Click to add another match column.