适用于:SQL Server 2025(17.x) 预览版
Azure SQL 数据库
Azure SQL 托管实例
Microsoft Fabric 预览版中的 SQL 数据库
使用模糊或近似字符串匹配来检查两个字符串是否相似,并计算两个字符串之间的差异。 使用此功能可识别因字符损坏而可能不同的字符串。 损坏原因可能包括拼写错误、转置字符、缺少字符或缩写。 模糊字符串匹配使用算法来检测声音相似的字符串。
Note
- 模糊字符串匹配目前处于试用阶段。
- SQL Server 在 SQL Server 2025(17.x)预览版中引入了对模糊字符串匹配的支持。
- 模糊字符串匹配在具有 SQL Server 2025 或 Always-up-to-date更新策略的 Azure SQL 托管实例中可用。
模糊函数
| Function | Description |
|---|---|
| EDIT_DISTANCE | 计算将一个字符串转换为另一个字符串所需的插入、删除、替换和调换数。 |
| EDIT_DISTANCE_SIMILARITY | 计算从 0(表示不匹配)到 100(表示完全匹配)的相似性值。 |
| JARO_WINKLER_DISTANCE | 计算两个字符串之间的编辑距离,优先考虑在设定的前缀长度内从头开始匹配的字符串。 |
| JARO_WINKLER_SIMILARITY | 计算从 0(表示不匹配)到 100(表示完全匹配)的相似性值。 |
Note
目前,这些函数不遵循排序规则设置定义的比较语义,例如不区分大小写和其他特定于排序规则的规则。 实现对排序规则的支持后,函数的输出将反映这些语义,并可能会相应地更改。
Examples
以下示例演示模糊字符串匹配函数。
示例表
在运行示例查询之前,请创建并填充示例表。
若要创建并填充示例表,请连接到非生产用户数据库,并运行以下脚本:
-- Step 1: Create the table
CREATE TABLE WordPairs (
WordID INT IDENTITY(1,1) PRIMARY KEY, -- Auto-incrementing ID
WordUK NVARCHAR(50), -- UK English word
WordUS NVARCHAR(50) -- US English word
);
-- Step 2: Insert the data
INSERT INTO WordPairs (WordUK, WordUS) VALUES
('Colour', 'Color'),
('Flavour', 'Flavor'),
('Centre', 'Center'),
('Theatre', 'Theater'),
('Organise', 'Organize'),
('Analyse', 'Analyze'),
('Catalogue', 'Catalog'),
('Programme', 'Program'),
('Metre', 'Meter'),
('Honour', 'Honor'),
('Neighbour', 'Neighbor'),
('Travelling', 'Traveling'),
('Grey', 'Gray'),
('Defence', 'Defense'),
('Practise', 'Practice'), -- Verb form in UK
('Practice', 'Practice'), -- Noun form in both
('Aluminium', 'Aluminum'),
('Cheque', 'Check'); -- Bank cheque vs. check
示例 EDIT_DISTANCE
SELECT WordUK, WordUS, EDIT_DISTANCE(WordUK, WordUS) AS Distance
FROM WordPairs
WHERE EDIT_DISTANCE(WordUK, WordUS) <= 2
ORDER BY Distance ASC;
Returns:
WordUK WordUS Distance
------------------------------ ------------------------------ -----------
Practice Practice 0
Aluminium Aluminum 1
Honour Honor 1
Neighbour Neighbor 1
Travelling Traveling 1
Grey Gray 1
Defence Defense 1
Practise Practice 1
Colour Color 1
Flavour Flavor 1
Organise Organize 1
Analyse Analyze 1
Catalogue Catalog 2
Programme Program 2
Metre Meter 2
Centre Center 2
Theatre Theater 2
示例 EDIT_DISTANCE_SIMILARITY
SELECT WordUK, WordUS, EDIT_DISTANCE_SIMILARITY(WordUK, WordUS) AS Similarity
FROM WordPairs
WHERE EDIT_DISTANCE_SIMILARITY(WordUK, WordUS) >=75
ORDER BY Similarity DESC;
Returns:
WordUK WordUS Similarity
------------------------------ ------------------------------ -----------
Practice Practice 100
Travelling Traveling 90
Aluminium Aluminum 89
Neighbour Neighbor 89
Organise Organize 88
Practise Practice 88
Defence Defense 86
Analyse Analyze 86
Flavour Flavor 86
Colour Color 83
Honour Honor 83
Catalogue Catalog 78
Programme Program 78
Grey Gray 75
示例 JARO_WINKLER_DISTANCE
SELECT WordUK, WordUS, JARO_WINKLER_DISTANCE(WordUK, WordUS) AS Distance
FROM WordPairs
WHERE JARO_WINKLER_DISTANCE(WordUK, WordUS) <= .05
ORDER BY Distance ASC;
Returns:
WordUK WordUS Distance
------------------------------ ------------------------------ -----------
Practice Practice 0
Travelling Traveling 0.02
Neighbour Neighbor 0.0222222222222223
Aluminium Aluminum 0.0222222222222223
Theatre Theater 0.0285714285714286
Flavour Flavor 0.0285714285714286
Centre Center 0.0333333333333333
Colour Color 0.0333333333333333
Honour Honor 0.0333333333333333
Catalogue Catalog 0.0444444444444444
Programme Program 0.0444444444444444
Metre Meter 0.0466666666666667
示例 JARO_WINKLER_SIMILARITY
SELECT WordUK, WordUS, JARO_WINKLER_SIMILARITY(WordUK, WordUS) AS Similarity
FROM WordPairs
WHERE JARO_WINKLER_SIMILARITY(WordUK, WordUS) > 90
ORDER BY Similarity DESC;
Returns:
WordUK WordUS Similarity
------------------------------ ------------------------------ -----------
Practice Practice 100
Aluminium Aluminum 98
Neighbour Neighbor 98
Travelling Traveling 98
Colour Color 97
Flavour Flavor 97
Centre Center 97
Theatre Theater 97
Honour Honor 97
Catalogue Catalog 96
Programme Program 96
Metre Meter 95
Organise Organize 95
Practise Practice 95
Analyse Analyze 94
Defence Defense 94
包含所有函数的示例查询
以下查询演示了当前可用的所有正则表达式函数。
SELECT T.source_string,
T.target_string,
EDIT_DISTANCE(T.source_string, T.target_string) as ED_Distance,
JARO_WINKLER_DISTANCE(T.source_string, T.target_string) as JW_Distance,
EDIT_DISTANCE_SIMILARITY(T.source_string, T.target_string) as ED_Similarity,
JARO_WINKLER_SIMILARITY(T.source_string, T.target_string) as JW_Similarity
FROM (VALUES('Black', 'Red'),
('Colour', 'Yellow'),
('Colour', 'Color'),
('Microsoft', 'Msft'),
('Regex', 'Regex')) as T(source_string, target_string);
Returns:
source_string target_string ED_Distance JW_Distance ED_Similarity JW_Similarity
-------------- -------------- -------------- --------------------- -------------- --------------
Black Red 5 1 0 0
Colour Yellow 5 0.444444444444445 17 55
Colour Color 1 0.0333333333333333 83 96
Microsoft Msft 5 0.491666666666667 44 50
Regex Regex 0 0 100 100
清理
使用示例数据后,请删除示例表。
IF OBJECT_ID('dbo.WordPairs', 'U') IS NOT NULL
BEGIN
DROP TABLE dbo.WordPairs;
END