Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Applies to: 
 SQL Server 
 SSIS Integration Runtime in Azure Data Factory
To add and configure a Fuzzy Grouping transformation, the package must already include at least one Data Flow task and a source.
To implement Fuzzy Grouping transformation in a data flow
- In SQL Server Data Tools (SSDT), open the Integration Services project that contains the package you want. 
- In Solution Explorer, double-click the package to open it. 
- Click the Data Flow tab, and then, from the Toolbox, drag the Fuzzy Grouping transformation to the design surface. 
- Connect the Fuzzy Grouping transformation to the data flow by dragging the connector from the data source or a previous transformation to the Fuzzy Grouping transformation. 
- Double-click the Fuzzy Grouping transformation. 
- In the Fuzzy Grouping Transformation Editor dialog box, on the Connection Manager tab, select an OLE DB connection manager that connects to a SQL Server database. - Note - The transformation requires a connection to a SQL Server database to create temporary tables and indexes. 
- Click the Columns tab and, in the Available Input Columns list, select the check box of the input columns to use to identify similar rows in the dataset. 
- Select the check box in the Pass Through column to identify the input columns to pass through to the transformation output. Pass-through columns are not included in the process of identification of duplicate rows. - Note - Input columns that are used for grouping are automatically selected as pass-through columns, and they cannot be unselected while used for grouping. 
- Optionally, update the names of output columns in the Output Alias column. 
- Optionally, update the names of cleaned columns in the Group OutputAlias column. - Note - The default names of columns are the names of the input columns with a "_clean" suffix. 
- Optionally, update the type of match to use in the Match Type column. - Note - At least one column must use fuzzy matching. 
- Specify the minimum similarity level columns in the Minimum Similarity column. The value must be between 0 and 1. The closer the value is to 1, the more similar the values in the input columns must be to form a group. A minimum similarity of 1 indicates an exact match. 
- Optionally, update the names of similarity columns in the Similarity Output Alias column. 
- To specify the handling of numbers in data values, update the values in the Numerals column. 
- To specify how the transformation compares the string data in a column, modify the default selection of comparison options in the Comparison Flags column. 
- Click the Advanced tab to modify the names of the columns that the transformation adds to the output for the unique row identifier (_key_in), the duplicate row identifier (_key_out), and the similarity value (_score). 
- Optionally, adjust the similarity threshold by moving the slider bar. 
- Optionally, clear the token delimiter check boxes to ignore delimiters in the data. 
- Click OK. 
- To save the updated package, click Save Selected Items on the File menu. 
See Also
Fuzzy Grouping Transformation
Integration Services Transformations
Integration Services Paths
Data Flow Task