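Azure AI Search provides relevance tuning strategies for improving the relevance of search results in classic RAG solutions. Relevance tuning is an important factor in delivering a RAG solution that meets user expectations.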
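Note
Agentic retrieval is now the recommended approach for RAG workflows, but classic RAG is simpler. If classic RAG meets your application requirements, it's still a good choice.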
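In Azure AI Search, relevance tuning includes L2 semantic ranking and scoring profiles. To implement these capabilities, you revisit the index schema to add configurations for semantic ranking and scoring profiles. You then rerun the queries using the new constructs.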
In this tutorial, you modify the existing search index and queries to use:
- L2 semantic ranking
- A scoring profile for document boosting
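This tutorial updates the search index created by the indexing pipeline. The updates don't affect existing content, so you don't need to rebuild the index or rerun the indexer.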
Prerequisites
- Azure AI Search, Basic tier or higher, for managed identity and semantic ranking.
- Azure OpenAI, with deployments of text-embedding-3-small and gpt-4o.
Download the sample
The sample notebook includes an updated index and query request.
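Run a baseline query for comparison
Let's start with a new query: "Are there any cloud formations specific to oceans and large bodies of water?"
To compare outcomes after adding relevance features, run the query against the existing index schema before you add semantic ranking or a scoring profile.
For the Azure Government cloud, change the API endpoint on the token provider to "https://cognitiveservices.azure.us/.default".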
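One way to avoid editing the scope string by hand is to look it up from a small table. The following is a minimal sketch, not part of the sample notebook: the `TOKEN_SCOPES` dictionary, its `public` and `usgov` labels, and the `token_scope` helper are hypothetical names introduced for illustration. Only the two scope URLs come from this tutorial.

```python
# Scope URLs are the ones named in this tutorial.
# The dictionary keys ("public", "usgov") are hypothetical labels.
TOKEN_SCOPES = {
    "public": "https://cognitiveservices.azure.com/.default",
    "usgov": "https://cognitiveservices.azure.us/.default",
}


def token_scope(cloud: str = "public") -> str:
    """Return the token scope for the given cloud label."""
    try:
        return TOKEN_SCOPES[cloud]
    except KeyError:
        raise ValueError(f"Unknown cloud label: {cloud!r}") from None


print(token_scope("usgov"))
```

The returned string could then be passed as the second argument to `get_bearer_token_provider` in the request that follows.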
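from azure.identity import get_bearer_token_provider
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizableTextQuery
from openai import AzureOpenAI
# credential, AZURE_OPENAI_ACCOUNT, AZURE_SEARCH_SERVICE, and index_name
# are assumed to be defined earlier in the notebook
token_provider = get_bearer_token_provider(credential, "https://cognitiveservices.azure.com/.default")
openai_client = AzureOpenAI(
    api_version="2024-06-01",
    azure_endpoint=AZURE_OPENAI_ACCOUNT,
    azure_ad_token_provider=token_provider
)
deployment_name = "gpt-4o"
search_client = SearchClient(
    endpoint=AZURE_SEARCH_SERVICE,
    index_name=index_name,
    credential=credential
)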
GROUNDED_PROMPT="""
You are an AI assistant that helps users learn from the information found in the source material.
Answer the query using only the sources provided below.
Use bullets if the answer has multiple points.
If the answer is longer than 3 sentences, provide a summary.
Answer ONLY with the facts listed in the list of sources below. Cite your source when you answer the question.
If there isn't enough information below, say you don't know.
Do not generate answers that don't use the sources below.
Query: {query}
Sources:\n{sources}
"""
# Focused query on cloud formations and bodies of water
query="Are there any cloud formations specific to oceans and large bodies of water?"
vector_query = VectorizableTextQuery(text=query, k_nearest_neighbors=50, fields="text_vector")
search_results = search_client.search(
    search_text=query,
    vector_queries=[vector_query],
    select=["title", "chunk", "locations"],
    top=5,
)
sources_formatted = "=================\n".join([f'TITLE: {document["title"]}, CONTENT: {document["chunk"]}, LOCATIONS: {document["locations"]}' for document in search_results])
response = openai_client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": GROUNDED_PROMPT.format(query=query, sources=sources_formatted)
        }
    ],
    model=deployment_name
)
print(response.choices[0].message.content)
Output from this request might look like the following example.
Yes, there are cloud formations specific to oceans and large bodies of water. 
A notable example is "cloud streets," which are parallel rows of clouds that form over 
the Bering Strait in the Arctic Ocean. These cloud streets occur when wind blows from 
a cold surface like sea ice over warmer, moister air near the open ocean, leading to 
the formation of spinning air cylinders. Clouds form along the upward cycle of these cylinders, 
while skies remain clear along the downward cycle (Source: page-21.pdf).
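Update the index for semantic ranking and scoring profiles
In a previous tutorial, you designed an index schema for RAG workloads. We purposely omitted relevance enhancements from that schema so that you could focus on the fundamentals. Deferring relevance to a separate exercise gives you a before-and-after comparison of search result quality once the updates are made.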
- Update the import statements to include classes for semantic ranking and scoring profiles.
from azure.identity import DefaultAzureCredential
from azure.identity import get_bearer_token_provider
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchField,
    SearchFieldDataType,
    VectorSearch,
    HnswAlgorithmConfiguration,
    VectorSearchProfile,
    AzureOpenAIVectorizer,
    AzureOpenAIVectorizerParameters,
    SearchIndex,
    SemanticConfiguration,
    SemanticPrioritizedFields,
    SemanticField,
    SemanticSearch,
    ScoringProfile,
    TagScoringFunction,
    TagScoringParameters
)
- Add the following semantic configuration to the search index. This example is in the update schema step of the notebook.
# New semantic configuration
semantic_config = SemanticConfiguration(
    name="my-semantic-config",
    prioritized_fields=SemanticPrioritizedFields(
        title_field=SemanticField(field_name="title"),
        keywords_fields=[SemanticField(field_name="locations")],
        content_fields=[SemanticField(field_name="chunk")]
    )
)
# Create the semantic settings with the configuration
semantic_search = SemanticSearch(configurations=[semantic_config])
A semantic configuration has a name and a prioritized list of fields that help optimize the inputs to the semantic ranker. For more information, see Configure semantic ranking.
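- Next, add a scoring profile definition. As with a semantic configuration, a scoring profile can be added to an index schema at any time. This example is also in the update schema step of the notebook, immediately after the semantic configuration.
# New scoring profile
scoring_profiles = [
    ScoringProfile(
        name="my-scoring-profile",
        functions=[
            TagScoringFunction(
                field_name="locations",
                boost=5.0,
                parameters=TagScoringParameters(
                    tags_parameter="tags",
                ),
            )
        ]
    )
]
This profile uses the tag function, which boosts the scores of documents where a match is found in the locations field. Recall that the search index has a vector field and multiple nonvector fields for title, chunks, and locations. The locations field is a string collection, and string collections can be boosted using the tag function in a scoring profile. For more information, see Add a scoring profile and Enhancing search relevance with document boosting (blog post).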
- Update the index definition on the search service.
# Update the search index with the semantic configuration and scoring profile
index = SearchIndex(
    name=index_name,
    fields=fields,
    vector_search=vector_search,
    semantic_search=semantic_search,
    scoring_profiles=scoring_profiles
)
result = index_client.create_or_update_index(index)
print(f"{result.name} updated")
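To build intuition for what the tag function in the scoring profile does, the following toy sketch re-scores a few in-memory documents. It is an illustration only and is not the formula that Azure AI Search uses. The `tag_boost` helper, the sample documents, and their base scores are all hypothetical. The sketch assumes a simple rule: if any value in a document's `locations` collection matches one of the supplied tags, the base score is multiplied by the boost.

```python
def tag_boost(documents, tags, field="locations", boost=5.0):
    """Toy re-scoring: multiply the base score when the field shares a value with the tags."""
    wanted = {tag.lower() for tag in tags}
    rescored = []
    for doc in documents:
        matched = any(value.lower() in wanted for value in doc[field])
        score = doc["score"] * (boost if matched else 1.0)
        rescored.append({**doc, "score": score})
    # Highest score first, as in a ranked result list
    return sorted(rescored, key=lambda d: d["score"], reverse=True)


# Hypothetical documents with invented base scores
docs = [
    {"title": "page-10.pdf", "locations": ["Sahara"], "score": 0.9},
    {"title": "page-31.pdf", "locations": ["Pacific Ocean", "ocean"], "score": 0.5},
]

ranked = tag_boost(docs, tags=["ocean", "sea surface", "seas", "surface"])
print([(d["title"], d["score"]) for d in ranked])
```

In this toy data, the document tagged with an ocean location overtakes a document that started with a higher base score, which is the kind of reordering a boost is meant to produce.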
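Update queries for semantic ranking and scoring profiles
In a previous tutorial, you ran queries on the search engine and passed the response and other information to an LLM for chat completion.
This example modifies the query request to include the semantic configuration and the scoring profile.
For the Azure Government cloud, change the API endpoint on the token provider to "https://cognitiveservices.azure.us/.default".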
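# Import libraries
from azure.identity import get_bearer_token_provider
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizableTextQuery
from openai import AzureOpenAI
# credential, AZURE_OPENAI_ACCOUNT, AZURE_SEARCH_SERVICE, and index_name
# are assumed to be defined earlier in the notebook
token_provider = get_bearer_token_provider(credential, "https://cognitiveservices.azure.com/.default")
openai_client = AzureOpenAI(
    api_version="2024-06-01",
    azure_endpoint=AZURE_OPENAI_ACCOUNT,
    azure_ad_token_provider=token_provider
)
deployment_name = "gpt-4o"
search_client = SearchClient(
    endpoint=AZURE_SEARCH_SERVICE,
    index_name=index_name,
    credential=credential
)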
# Prompt is unchanged in this update
GROUNDED_PROMPT="""
You are an AI assistant that helps users learn from the information found in the source material.
Answer the query using only the sources provided below.
Use bullets if the answer has multiple points.
If the answer is longer than 3 sentences, provide a summary.
Answer ONLY with the facts listed in the list of sources below. Cite your source when you answer the question.
If there isn't enough information below, say you don't know.
Do not generate answers that don't use the sources below.
Query: {query}
Sources:\n{sources}
"""
# Queries are unchanged in this update
query="Are there any cloud formations specific to oceans and large bodies of water?"
vector_query = VectorizableTextQuery(text=query, k_nearest_neighbors=50, fields="text_vector")
# Add query_type semantic and semantic_configuration_name
# Add scoring_profile and scoring_parameters
search_results = search_client.search(
    query_type="semantic",
    semantic_configuration_name="my-semantic-config",
    scoring_profile="my-scoring-profile",
    scoring_parameters=["tags-ocean, 'sea surface', seas, surface"],
    search_text=query,
    vector_queries=[vector_query],
    select=["title", "chunk", "locations"],
    top=5,
)
sources_formatted = "=================\n".join([f'TITLE: {document["title"]}, CONTENT: {document["chunk"]}, LOCATIONS: {document["locations"]}' for document in search_results])
response = openai_client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": GROUNDED_PROMPT.format(query=query, sources=sources_formatted)
        }
    ],
    model=deployment_name
)
print(response.choices[0].message.content)
Output from a query that uses semantic ranking and boosting might look like the following example.
Yes, there are specific cloud formations influenced by oceans and large bodies of water:
- **Stratus Clouds Over Icebergs**: Low stratus clouds can frame holes over icebergs, 
such as Iceberg A-56 in the South Atlantic Ocean, likely due to thermal instability caused 
by the iceberg (source: page-39.pdf).
- **Undular Bores**: These are wave structures in the atmosphere created by the collision 
of cool, dry air from a continent with warm, moist air over the ocean, as seen off the 
coast of Mauritania (source: page-23.pdf).
- **Ship Tracks**: These are narrow clouds formed by water vapor condensing around tiny 
particles from ship exhaust. They are observed over the oceans, such as in the Pacific Ocean 
off the coast of California (source: page-31.pdf).
These specific formations are influenced by unique interactions between atmospheric conditions 
and the presence of large water bodies or objects within them.
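The `scoring_parameters` value in the query above reads as a parameter name (`tags`, matching `tags_parameter` in the scoring profile), a hyphen, and then a comma-separated list of values. The following sketch builds that string from a Python list. The `build_scoring_parameter` helper is hypothetical, and its quoting rule (wrap values that contain a space or comma in single quotes) is an assumption inferred from the `'sea surface'` value in the tutorial's string.

```python
def build_scoring_parameter(name, values):
    """Build a 'name-value1, value2' string in the shape used by this tutorial's query."""

    def quote(value):
        # Assumption: values with spaces or commas are wrapped in single quotes
        if " " in value or "," in value:
            return f"'{value}'"
        return value

    return f"{name}-{', '.join(quote(v) for v in values)}"


param = build_scoring_parameter("tags", ["ocean", "sea surface", "seas", "surface"])
print(param)
```

The result could be passed to the query as `scoring_parameters=[param]`, which keeps the tag list editable as ordinary Python data.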
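Adding semantic ranking and a scoring profile positively affects the response from the LLM by promoting results that meet the scoring criteria and are semantically relevant.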
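A quick way to make the before-and-after comparison concrete is to list the sources that each answer cites. The following sketch extracts the cited file names from abbreviated excerpts of the two sample outputs in this tutorial. The `cited_sources` helper and the regular expression are hypothetical, and the excerpts are shortened with "..." rather than quoted in full.

```python
import re

# Abbreviated excerpts of the two sample outputs shown in this tutorial
baseline_answer = "... skies remain clear along the downward cycle (Source: page-21.pdf)."
tuned_answer = (
    "... by the iceberg (source: page-39.pdf). "
    "... coast of Mauritania (source: page-23.pdf). "
    "... off the coast of California (source: page-31.pdf)."
)


def cited_sources(answer):
    """Return the sorted, de-duplicated PDF names cited as '(source: name.pdf)'."""
    matches = re.findall(r"source:\s*([\w.-]+\.pdf)", answer, flags=re.IGNORECASE)
    return sorted(set(matches))


baseline = cited_sources(baseline_answer)
tuned = cited_sources(tuned_answer)
print("Baseline:", baseline)
print("Tuned:", tuned)
print("Only in tuned:", sorted(set(tuned) - set(baseline)))
```

For these two sample outputs, the tuned answer draws on three pages that the baseline answer didn't cite. Actual results vary from run to run because LLM output isn't deterministic.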
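Now that you have a better understanding of index and query design, let's move on to optimizing for speed and concision. We revisit the schema definition to implement quantization and storage reduction, while the rest of the pipeline and the models remain unchanged.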