ToolCallAccuracyEvaluator Class
Note
This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
The Tool Call Accuracy evaluator assesses how accurately an AI uses tools by examining:
Relevance to the conversation.
Parameter correctness according to tool definitions.
Parameter value extraction from the conversation.
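As an illustration of the "parameter correctness" check, the sketch below uses a hypothetical helper (not part of the azure-ai-evaluation SDK) to verify that the argument names in a tool call appear in the tool definition's parameter schema:

```python
# Hypothetical helper, for illustration only: checks tool-call arguments
# against the "properties" of a tool definition's JSON-schema parameters.
def arguments_match_definition(call_args: dict, definition: dict) -> bool:
    allowed = definition["parameters"]["properties"].keys()
    return all(name in allowed for name in call_args)

# Tool definition shaped like the fetch_weather example later on this page.
definition = {
    "name": "fetch_weather",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
    },
}

print(arguments_match_definition({"location": "New York"}, definition))  # True
print(arguments_match_definition({"city": "New York"}, definition))      # False
```

The actual evaluator performs this assessment with an LLM judge rather than a schema check; the snippet only makes the criterion concrete.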
The evaluator uses a scoring rubric of 1 to 5:
Score 1: The tool calls are irrelevant.
Score 2: The tool calls are partially relevant: not enough tools were called, or the parameters were not correctly passed.
Score 3: The tool calls are relevant, but unnecessary, excessive tool calls were made.
Score 4: The tool calls are relevant, but some tools returned errors; the agent retried and succeeded.
Score 5: The tool calls are relevant, and all parameters were correctly passed.
This evaluation focuses on measuring whether tool calls meaningfully contribute to addressing user needs while properly following tool definitions and using information present in the conversation history.
Note
To align with our support of a diverse set of models, an output key without the gpt_ prefix has been added.
To maintain backwards compatibility, the old key with the gpt_ prefix is still present in the output;
however, it is recommended to use the new key moving forward, as the old key will be deprecated in the future.
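As a sketch of reading the dual keys, assuming a result dictionary with both the new and the legacy gpt_-prefixed key (the exact result shape and values here are illustrative, not SDK output):

```python
# Assumed result shape for illustration only: per the note above, both the
# new key and the legacy gpt_-prefixed key carry the same score.
result = {
    "tool_call_accuracy": 5.0,
    "gpt_tool_call_accuracy": 5.0,  # legacy key, slated for deprecation
}

# Prefer the new key; fall back to the legacy key for older outputs.
score = result.get("tool_call_accuracy", result.get("gpt_tool_call_accuracy"))
print(score)  # 5.0
```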
Constructor
ToolCallAccuracyEvaluator(model_config, *, threshold=3, credential=None, **kwargs)
Parameters
| Name | Description |
|---|---|
| model_config (Required) | Configuration for the Azure OpenAI model. |
Keyword-Only Parameters
| Name | Description |
|---|---|
| threshold | Default value: 3 |
| credential | Default value: None |
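The threshold keyword gates the 1-to-5 rubric score into a pass/fail outcome. As a minimal sketch of the likely semantics, assuming a score at or above the threshold counts as a pass (the exact comparison used by the SDK is not documented here):

```python
# Illustrative only: how a threshold of 3 could gate a 1-5 rubric score.
def passes(score: float, threshold: float = 3) -> bool:
    return score >= threshold

print(passes(4))  # True: 4 >= 3
print(passes(2))  # False: 2 < 3
```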
Examples
Initialize and call ToolCallAccuracyEvaluator using an Azure AI project URL in the following format: https://{resource_name}.services.ai.azure.com/api/projects/{project_name}
```python
import os
from azure.ai.evaluation import ToolCallAccuracyEvaluator

model_config = {
    "azure_endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT"),  # https://<account_name>.services.ai.azure.com
    "api_key": os.environ.get("AZURE_OPENAI_KEY"),
    "azure_deployment": os.environ.get("AZURE_OPENAI_DEPLOYMENT"),
}

tool_call_accuracy_evaluator = ToolCallAccuracyEvaluator(model_config=model_config)

tool_call_accuracy_evaluator(
    query="How is the weather in New York?",
    response="The weather in New York is sunny.",
    tool_calls={
        "type": "tool_call",
        "tool_call": {
            "id": "call_eYtq7fMyHxDWIgeG2s26h0lJ",
            "type": "function",
            "function": {"name": "fetch_weather", "arguments": {"location": "New York"}},
        },
    },
    tool_definitions={
        "id": "fetch_weather",
        "name": "fetch_weather",
        "description": "Fetches the weather information for the specified location.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string", "description": "The location to fetch weather for."}},
        },
    },
)
```
Attributes
id
Evaluator identifier; experimental and to be used only with evaluation in the cloud.
id = 'azureai://built-in/evaluators/tool_call_accuracy'