Clean Character
Clean and convert any non-ASCII characters in your data set to ASCII-compliant characters.
The Clean Character tool transforms non-ASCII characters in your data set by either replacing them with ASCII equivalents or removing them. This prepares your data for downstream systems that don't support these special characters.
For example, the Clean Character tool would take áëîõü and transform it to aeiou, or æ to ae.
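The kind of conversion described above can be approximated outside Symon.AI with Python's standard library. This is only a rough sketch, not the tool's actual implementation: the function name to_ascii and the LIGATURES table are hypothetical, and the table covers just a handful of letters that have no Unicode decomposition. Characters the sketch cannot map are left unchanged.

```python
import unicodedata

# Hypothetical lookup for letters that Unicode does not decompose
# into a base letter plus accent. Illustrative, not exhaustive.
LIGATURES = {"æ": "ae", "Æ": "AE", "œ": "oe", "Œ": "OE", "ß": "ss"}

def to_ascii(text: str) -> str:
    """Approximate an ASCII conversion of accented letters and ligatures."""
    out = []
    for ch in text:
        if ch in LIGATURES:
            out.append(LIGATURES[ch])
            continue
        # NFKD splits an accented letter into its base letter
        # followed by one or more combining marks.
        decomposed = unicodedata.normalize("NFKD", ch)
        # Keep the base letter, drop the combining marks.
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)

print(to_ascii("áëîõü"))  # aeiou
print(to_ascii("æ"))      # ae
```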
Note
Symon.AI supports the ASCII character set only, not the Extended ASCII character set.
Any characters that Symon.AI cannot map will be transformed to [?].
If you want to remove these transformed characters, use the Replace tool.
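To illustrate the placeholder behavior in this note, the sketch below marks unmappable characters with [?] and then strips the placeholder, which is roughly what a follow-up Replace step would do. The KNOWN mapping and the function name convert_or_mark are assumptions made for the example; the set of characters Symon.AI can actually map is not documented here. The check ord(ch) < 128 reflects the 7-bit ASCII range, which excludes Extended ASCII.

```python
PLACEHOLDER = "[?]"

# Hypothetical, deliberately tiny mapping for illustration only.
KNOWN = {"\u00e9": "e", "\u00e6": "ae"}  # é -> e, æ -> ae

def convert_or_mark(text: str) -> str:
    """Keep ASCII as-is, map known characters, mark the rest with [?]."""
    out = []
    for ch in text:
        if ord(ch) < 128:          # plain 7-bit ASCII
            out.append(ch)
        else:                      # non-ASCII: map it or mark it
            out.append(KNOWN.get(ch, PLACEHOLDER))
    return "".join(out)

marked = convert_or_mark("caf\u00e9 5\u20ac")   # "café 5€"
print(marked)                                   # cafe 5[?]

# Equivalent of a follow-up Replace step that removes the placeholder.
print(marked.replace(PLACEHOLDER, ""))          # cafe 5
```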
When to use this tool
Use the Clean Character tool when text columns in your data set contain non-ASCII characters that need to be cleaned. Only text columns can be selected; all non-text columns are disabled in the column drop-down.
The default is to replace the existing column. However, if you choose Add new column(s), each new output column takes the original column name with the suffix _cleaned. For example, the new column could be named column_cleaned.
Configuration
After you add your data source to your pipe, you can add and configure the Clean Character tool to clean your data set.
Add the Clean Character tool to your Pipe builder.
Connect the tool to your data set.
Complete the required fields to configure the tool:
Columns: Select the desired columns to clean. Choose either all columns or individual columns.
Clean method: Select Convert to transform each non-ASCII character to an ASCII equivalent, or Remove to delete the character.
Output column(s): In the Advanced section, choose whether to Replace selected column(s) or Add new column(s) in your data set.
Add as many conditions to clean characters as desired.
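The effect of the three configuration fields can be sketched in plain Python. This is an illustration under stated assumptions, not how Symon.AI works internally: rows are modeled as a list of dictionaries, the names clean_value and clean_columns are hypothetical, and the Convert method is approximated with Unicode decomposition, falling back to [?] for anything that remains non-ASCII.

```python
import unicodedata

def clean_value(value: str, method: str) -> str:
    """Clean one text value using the 'convert' or 'remove' method."""
    if method == "remove":
        # Drop every character outside the 7-bit ASCII range.
        return "".join(ch for ch in value if ord(ch) < 128)
    if method != "convert":
        raise ValueError(f"unknown clean method: {method}")
    out = []
    for ch in unicodedata.normalize("NFKD", value):
        if unicodedata.combining(ch):
            continue  # discard the accent, keep the base letter
        out.append(ch if ord(ch) < 128 else "[?]")
    return "".join(out)

def clean_columns(rows, columns, method="convert", add_new_column=False):
    """Return new rows with the selected text columns cleaned."""
    result = []
    for row in rows:
        new_row = dict(row)
        for col in columns:
            if not isinstance(row[col], str):
                continue  # non-text columns are not cleaned
            # Default replaces the column; otherwise add <name>_cleaned.
            target = f"{col}_cleaned" if add_new_column else col
            new_row[target] = clean_value(row[col], method)
        result.append(new_row)
    return result

rows = [{"name": "Zo\u00eb", "qty": 3}]  # "Zoë"

print(clean_columns(rows, ["name"], method="convert", add_new_column=True))
print(clean_columns(rows, ["name"], method="remove"))
```

With add_new_column=True the original name column is kept and a name_cleaned column holding "Zoe" is added; with the Remove method and the default output mode, name is overwritten with "Zo".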