Symon.AI help center

Clean Character

Abstract

Clean and convert any non-ASCII characters in your data set to ASCII compliant characters.

The Clean Character tool replaces or removes the non-ASCII characters in your data set. Use it to prepare your data for downstream systems that don't support these special characters.

For example, the Clean Character tool transforms áëîõü to aeiou and æ to ae.

Note

Symon.AI supports the ASCII character set only, not the Extended ASCII character set.

Any character that Symon.AI cannot map is transformed to [?].

To remove these [?] placeholders afterward, use the Replace tool.
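To make the conversion behavior concrete, here is a minimal Python sketch of a comparable transformation. It is not Symon.AI's implementation. It assumes that accented letters can be reduced with Unicode NFKD decomposition, that ligatures such as æ need an explicit lookup table (the hypothetical EXTRA_MAP), and that anything else becomes [?]. The function name convert_to_ascii is also an assumption made for illustration.

```python
import unicodedata

# Hypothetical lookup for characters that NFKD does not decompose.
EXTRA_MAP = {"æ": "ae", "Æ": "AE", "ø": "o", "Ø": "O", "ß": "ss"}

# Placeholder for characters with no known ASCII equivalent.
PLACEHOLDER = "[?]"


def convert_to_ascii(text: str) -> str:
    """Approximate a Convert-style cleanup of non-ASCII characters."""
    out = []
    for ch in text:
        if ch.isascii():
            # ASCII characters pass through unchanged.
            out.append(ch)
            continue
        if ch in EXTRA_MAP:
            out.append(EXTRA_MAP[ch])
            continue
        # Split the character into base letter + combining marks,
        # then drop the combining marks (accents, diaereses, tildes).
        decomposed = unicodedata.normalize("NFKD", ch)
        base = "".join(c for c in decomposed if not unicodedata.combining(c))
        if base and base.isascii():
            out.append(base)
        else:
            # No ASCII equivalent found: emit the placeholder.
            out.append(PLACEHOLDER)
    return "".join(out)


print(convert_to_ascii("áëîõü"))  # aeiou
print(convert_to_ascii("æ"))      # ae

# Stripping the placeholders afterward, similar in spirit to a Replace step.
print(convert_to_ascii("café €5").replace(PLACEHOLDER, ""))  # cafe 5
```

In this sketch, removing the placeholders is an ordinary string replacement of the literal text [?] with an empty string.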

When to use this tool

Use the Clean Character tool to clean non-ASCII characters from text columns. Only text columns can be selected; all non-text columns are disabled in the column drop-down.

By default, the tool replaces the existing column. However, if you choose to add a new column, the output column is named after the original column with the suffix _cleaned appended. For example, the new column could be named column_cleaned.
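The difference between the two output options can be sketched in Python as follows. This is an illustration only: rows are modeled as plain dictionaries, the function name clean_columns and its parameters are hypothetical, and the cleaner passed in is a simple stand-in that drops non-ASCII characters.

```python
def clean_columns(rows, columns, clean, add_new_column=False):
    """Apply `clean` to the chosen columns of each row.

    With add_new_column=False the original column is overwritten.
    With add_new_column=True the result goes to "<column>_cleaned"
    and the original column is left untouched.
    """
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        for col in columns:
            target = f"{col}_cleaned" if add_new_column else col
            new_row[target] = clean(row[col])
        result.append(new_row)
    return result


def drop_non_ascii(text):
    # Stand-in cleaner (an assumption): discard every non-ASCII character.
    return text.encode("ascii", "ignore").decode("ascii")


rows = [{"city": "Zürich"}]

print(clean_columns(rows, ["city"], drop_non_ascii))
# [{'city': 'Zrich'}]

print(clean_columns(rows, ["city"], drop_non_ascii, add_new_column=True))
# [{'city': 'Zürich', 'city_cleaned': 'Zrich'}]
```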

Configuration

After you add your data source to your pipe, you can add and configure the Clean Character tool to clean your data set.

Configuring the Clean Character tool
  1. Add the Clean Character tool to your pipe.

  2. Connect the tool to your data set.

  3. Complete the required fields to configure the tool:

    • Columns: Select the desired columns to clean. Choose either all columns or individual columns.

    • Clean method: Select Convert to transform each non-ASCII character to an ASCII equivalent, or Remove to delete the character.

    • Output column(s): In the Advanced section, select either Replace selected column(s) or Add new column(s) in your data set.

  4. Add as many conditions to clean characters as desired.
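The fields above can be pictured as a small configuration object. The following Python sketch is a hypothetical model, not Symon.AI's API: the class CleanCharacterConfig, its field names, and the functions clean_value and run are all assumptions. It mainly contrasts the Convert and Remove clean methods; the Convert branch handles accented letters only and falls back to [?] for everything else.

```python
import unicodedata
from dataclasses import dataclass


@dataclass
class CleanCharacterConfig:
    columns: list                   # names of the text columns to clean
    clean_method: str = "convert"   # "convert" or "remove"
    add_new_columns: bool = False   # False: replace the selected columns


def clean_value(text, method):
    if method == "remove":
        # Remove: delete every non-ASCII character.
        return "".join(ch for ch in text if ch.isascii())
    if method == "convert":
        # Convert: strip accents where possible, otherwise emit [?].
        out = []
        for ch in text:
            if ch.isascii():
                out.append(ch)
                continue
            decomposed = unicodedata.normalize("NFKD", ch)
            base = "".join(c for c in decomposed if not unicodedata.combining(c))
            out.append(base if base and base.isascii() else "[?]")
        return "".join(out)
    raise ValueError(f"unknown clean method: {method!r}")


def run(config, rows):
    cleaned = []
    for row in rows:
        new_row = dict(row)
        for col in config.columns:
            target = f"{col}_cleaned" if config.add_new_columns else col
            new_row[target] = clean_value(row[col], config.clean_method)
        cleaned.append(new_row)
    return cleaned


rows = [{"name": "José", "note": "naïve"}]

print(run(CleanCharacterConfig(columns=["name"]), rows))
# [{'name': 'Jose', 'note': 'naïve'}]

print(run(CleanCharacterConfig(columns=["name"], clean_method="remove"), rows))
# [{'name': 'Jos', 'note': 'naïve'}]
```

Only the selected column changes in each case; the note column is left as it was because it is not listed in columns.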