
Varicent ELT Assistant

Fuzzy Matcher

Use the Fuzzy Matcher tool to detect matching data between two data sets, even when the matching values are not exactly the same.

The Match Level slider sets the degree of fuzziness you want. When Match Level is set to 100%, matching is case-sensitive; at any lower setting, it is case-insensitive.

Note

A lower number means higher fuzziness and therefore a broader scope for matching data. A higher number means lower fuzziness and therefore a more exact match. Set the Match Level slider to 100 for an exact, case-sensitive match.
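The following Python sketch illustrates how a match-level threshold can behave. It is not Varicent ELT's algorithm, which this page does not describe: it assumes a similarity score from 0 to 100 computed with the standard library's `difflib.SequenceMatcher`, and it applies the rule stated above that 100 means an exact, case-sensitive comparison. The function name `is_match` is hypothetical.

```python
from difflib import SequenceMatcher


def is_match(source: str, candidate: str, match_level: int) -> bool:
    """Return True if candidate matches source at the given match level (0 to 100).

    Assumption: similarity is difflib's ratio scaled to 0-100, not Varicent's metric.
    """
    if not 0 <= match_level <= 100:
        raise ValueError("match_level must be between 0 and 100")
    if match_level == 100:
        # 100% means an exact, case-sensitive comparison.
        return source == candidate
    # Below 100%, compare case-insensitively and accept scores at or above the level.
    score = SequenceMatcher(None, source.lower(), candidate.lower()).ratio() * 100
    return score >= match_level


print(is_match("Acme Corp", "acme corp", 100))  # False: case differs
print(is_match("Acme Corp", "acme corp", 90))   # True: case is ignored below 100
```

Lowering `match_level` accepts lower scores, which is why a lower Match Level broadens the scope of matching.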

Input and output

This tool takes two data sets. You select the data source to use for matching, and Varicent ELT uses this data to match and join the two data sets. One input is the "messy," imperfect data, which contains multiple rows that you want to match to a single row in the other data source. The other input is the data set that contains unique IDs and serves as an "answer key."

When configuring the tool, the Source column is used to match the "messy" data to the "answer key" data. The Match column helps train the tool to correctly match the rows.

This tool joins the two data sets using a many-to-one method. Each row from the first input is mapped to, at most, one row from the second input. The output also adds a Similarity Measure column that shows the likelihood of the match being correct.
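The following sketch shows a many-to-one fuzzy join in plain Python. It is an illustration under stated assumptions, not the tool's implementation: similarity is a 0 to 100 score from `difflib.SequenceMatcher`, unmatched rows are kept with an empty similarity value, and the helper names `similarity` and `fuzzy_join` are hypothetical. The parameters `messy_col` and `key_col` stand in for the compared columns without claiming which one the tool calls Source or Match.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str, case_sensitive: bool) -> float:
    """Score from 0 to 100. Stand-in metric: difflib's ratio (an assumption)."""
    if not case_sensitive:
        a, b = a.lower(), b.lower()
    return SequenceMatcher(None, a, b).ratio() * 100


def fuzzy_join(messy_rows, answer_key, messy_col, key_col, match_level,
               similarity_col="Similarity measure"):
    """Join each messy row to at most one answer-key row (many-to-one).

    Unmatched rows are kept with an empty (None) similarity value (an assumption).
    """
    case_sensitive = match_level == 100  # 100% means exact and case-sensitive
    joined = []
    for row in messy_rows:
        best_row, best_score = None, None
        for key_row in answer_key:
            score = similarity(row[messy_col], key_row[key_col], case_sensitive)
            # Keep only the single best candidate that meets the match level.
            if score >= match_level and (best_score is None or score > best_score):
                best_row, best_score = key_row, score
        out = dict(row)
        if best_row is not None:
            out.update(best_row)  # on column-name clashes, the answer key's value wins
        out[similarity_col] = best_score
        joined.append(out)
    return joined


# Hypothetical sample data.
messy = [{"name": "Acme Corp"}, {"name": "ACME corp"}, {"name": "Globex"}]
key = [{"customer_id": 1, "customer": "Acme Corp"},
       {"customer_id": 2, "customer": "Initech"}]

for r in fuzzy_join(messy, key, "name", "customer", match_level=80):
    print(r)
```

Both spellings of Acme map to customer 1, which is the many-to-one behavior: several messy rows can share one answer-key row, but no messy row gets more than one.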

Note

The order of the conditions does not affect the end result of your match.
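One way to see why order does not matter: if every condition must pass for a pair of rows to match, the conditions combine with a logical AND, which gives the same result in any order. The following sketch assumes that AND behavior (this page does not state how conditions combine), uses `difflib.SequenceMatcher` as a stand-in similarity metric, and uses hypothetical column names and helpers.

```python
from difflib import SequenceMatcher
from itertools import permutations


def condition_passes(messy_row, key_row, condition):
    """One condition: (messy column, answer-key column, match level 0 to 100)."""
    messy_col, key_col, match_level = condition
    a, b = messy_row[messy_col], key_row[key_col]
    if match_level == 100:
        return a == b  # exact, case-sensitive
    score = SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100
    return score >= match_level


def rows_match(messy_row, key_row, conditions):
    # Assumption: a pair matches only if every condition passes (logical AND).
    # Each check is independent of the others, so list order cannot change the result.
    return all(condition_passes(messy_row, key_row, c) for c in conditions)


messy = {"name": "Jon Smith", "email": "jon.smith@example.com"}
key = {"full_name": "John Smith", "email": "jon.smith@example.com"}
conditions = [("name", "full_name", 80), ("email", "email", 100)]

# Evaluate every ordering of the conditions and collect the distinct outcomes.
results = {rows_match(messy, key, list(order)) for order in permutations(conditions)}
print(results)  # {True}
```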

Configuration

Use the following configuration options to configure the Fuzzy Matcher tool.

  1. Go to the Pipes module from the side navigation bar.

  2. From the Pipes tab, click an existing pipe to open, or create a new pipe. To create a new pipe, read the Creating a pipe documentation.

  3. In the Pipe builder, add a data source to your pipe. For more information on adding a data source, see the Data Input tool.

  4. Click + Tool.

    The Tools modal opens, where you can add tools, such as the Aggregate tool, to your pipe.

  5. In the Tools modal, search for Fuzzy Matcher and then click + Add tool.

    Tip

    You can also find Fuzzy Matcher in the Combine section.

  6. Click the tool node and drag the line to the next tool to connect the tools. If you need to undo the action, click the line and then click Unlink.

  7. In the configuration pane, in the Target section, under Fuzzy matcher source, select the data source to compare against when determining matches.

  8. Under Fuzzy match type, select the type to match.

  9. Under Similarity measure column name, enter the name for the column, such as Similarity measure.

  10. In the Condition 1 section, under Source column, select the column to compare against.

  11. Under Match column, select the column to match.

  12. Under Match level, enter the percentage to match, or use the arrows to increase or decrease the percentage.

  13. Optionally, click + New Condition to add another condition to match against.

  14. Click the tool name to give your tool node a meaningful name. Name your tools to describe their function, not the object or the data action. For example, use “Look up rate” instead of “Join to rate table”.
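To summarize which settings the configuration steps collect, the following sketch models them as Python data classes. It is hypothetical: the class and field names are not part of any Varicent ELT API or export format, the Fuzzy match type value is a placeholder because this page does not list the options, and the rule that at least one condition is required is an assumption based on the Condition 1 section.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MatchCondition:
    source_column: str  # "Source column" (step 10)
    match_column: str   # "Match column" (step 11)
    match_level: int    # "Match level" percentage, 0 to 100 (step 12)

    def __post_init__(self):
        if not 0 <= self.match_level <= 100:
            raise ValueError("match_level must be between 0 and 100")


@dataclass
class FuzzyMatcherConfig:
    fuzzy_matcher_source: str    # "Fuzzy matcher source" under Target (step 7)
    fuzzy_match_type: str        # value chosen under "Fuzzy match type" (step 8)
    similarity_column_name: str  # "Similarity measure column name" (step 9)
    conditions: List[MatchCondition] = field(default_factory=list)

    def add_condition(self, condition: MatchCondition) -> None:
        """Counterpart of clicking + New Condition (step 13)."""
        self.conditions.append(condition)

    def validate(self) -> None:
        if not self.similarity_column_name.strip():
            raise ValueError("Similarity measure column name is required")
        if not self.conditions:
            raise ValueError("At least one condition is required (assumption)")


config = FuzzyMatcherConfig(
    fuzzy_matcher_source="Customers",    # hypothetical data source name
    fuzzy_match_type="<selected type>",  # placeholder; options are not listed here
    similarity_column_name="Similarity measure",
)
config.add_condition(MatchCondition("customer_name", "name", 85))  # hypothetical columns
config.validate()
```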

Usage example

Let's say you have two data sets. The first contains transaction data for customer purchases. This information is entered manually, so some rows contain misspellings or missing information. The second data set contains customer or account information. This data set is the "answer key": each row represents a unique customer ID with no missing or incorrect information.

Without cleaning up the transaction data set, Varicent ELT would treat a misspelled name and a correctly spelled name as two different customers. We want to use the Fuzzy Matcher to match those rows to the same customer ID. When you run the tool, it fuzzy-matches each row in one data set to at most one row in the other data set.
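The following sketch models this scenario with hypothetical customer names and a hypothetical match level of 85. Counting distinct spellings treats each misspelling as a separate customer, while mapping each transaction to its best-matching customer ID groups them correctly. `difflib.SequenceMatcher` stands in for the tool's similarity measure, which this page does not describe, and `best_customer_id` is a hypothetical helper.

```python
from collections import Counter
from difflib import SequenceMatcher

customers = [  # hypothetical "answer key": one row per unique customer ID
    {"customer_id": "C001", "name": "Northwind Traders"},
    {"customer_id": "C002", "name": "Contoso Ltd"},
]
transactions = [  # hypothetical manually entered rows with inconsistent spellings
    {"txn": 1, "customer": "Northwind Traders", "amount": 120.0},
    {"txn": 2, "customer": "Nortwind Traders", "amount": 80.0},
    {"txn": 3, "customer": "northwind traders", "amount": 45.0},
    {"txn": 4, "customer": "Contoso Ltd.", "amount": 200.0},
]


def best_customer_id(name, match_level=85):
    """Return the ID of the most similar customer at or above match_level, else None."""
    best_id, best_score = None, 0.0
    for c in customers:
        score = SequenceMatcher(None, name.lower(), c["name"].lower()).ratio() * 100
        if score >= match_level and score > best_score:
            best_id, best_score = c["customer_id"], score
    return best_id


naive = Counter(t["customer"] for t in transactions)
matched = Counter(best_customer_id(t["customer"]) for t in transactions)
print(len(naive))     # 4: every spelling counts as a different customer
print(dict(matched))  # {'C001': 3, 'C002': 1}
```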

Tip

For a more detailed explanation using a practical example, create a Fuzzy Matcher Example blueprint from the Apps tab.