ToolCallAccuracyEvaluator Class
Note
This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
The Tool Call Accuracy evaluator assesses how accurately an AI uses tools by examining:
Relevance to the conversation.
Parameter correctness according to tool definitions.
Parameter value extraction from the conversation.
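As an illustration of the "parameter correctness" check, the sketch below uses a hypothetical helper (not part of the azure-ai-evaluation SDK) to verify that the argument names in a tool call appear in the tool definition's parameter schema:

```python
# Hypothetical helper, for illustration only: checks tool-call arguments
# against the "properties" of a tool definition's JSON-schema parameters.
def arguments_match_definition(call_args: dict, definition: dict) -> bool:
    allowed = definition["parameters"]["properties"].keys()
    return all(name in allowed for name in call_args)

# Tool definition shaped like the fetch_weather example later on this page.
definition = {
    "name": "fetch_weather",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
    },
}

print(arguments_match_definition({"location": "New York"}, definition))  # True
print(arguments_match_definition({"city": "New York"}, definition))      # False
```

The actual evaluator performs this assessment with an LLM judge rather than a schema check; the snippet only makes the criterion concrete.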
The evaluator uses a scoring rubric of 1 to 5:
Score 1: The tool calls are irrelevant.
Score 2: The tool calls are partially relevant: not enough tools were called, or the parameters were not correctly passed.
Score 3: The tool calls are relevant, but unnecessary, excessive tool calls were made.
Score 4: The tool calls are relevant, but some tools returned errors; the agent retried and succeeded.
Score 5: The tool calls are relevant, and all parameters were correctly passed.
This evaluation focuses on measuring whether tool calls meaningfully contribute to addressing user needs while properly following tool definitions and using information present in the conversation history.
Note
To align with our support of a diverse set of models, an output key without the gpt_ prefix has been added.
To maintain backwards compatibility, the old key with the gpt_ prefix is still present in the output;
however, it is recommended to use the new key moving forward, as the old key will be deprecated in the future.
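As a sketch of reading the dual keys, assuming a result dictionary with both the new and the legacy gpt_-prefixed key (the exact result shape and values here are illustrative, not SDK output):

```python
# Assumed result shape for illustration only: per the note above, both the
# new key and the legacy gpt_-prefixed key carry the same score.
result = {
    "tool_call_accuracy": 5.0,
    "gpt_tool_call_accuracy": 5.0,  # legacy key, slated for deprecation
}

# Prefer the new key; fall back to the legacy key for older outputs.
score = result.get("tool_call_accuracy", result.get("gpt_tool_call_accuracy"))
print(score)  # 5.0
```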
Constructor
ToolCallAccuracyEvaluator(model_config, *, threshold=3, credential=None, **kwargs)
Parameters
| Name | Description |
|---|---|
| model_config (Required) | Configuration for the Azure OpenAI model. |
Keyword-Only Parameters
| Name | Description |
|---|---|
| threshold | Default value: 3 |
| credential | Default value: None |
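The threshold keyword gates the 1-to-5 rubric score into a pass/fail outcome. As a minimal sketch of the likely semantics, assuming a score at or above the threshold counts as a pass (the exact comparison used by the SDK is not documented here):

```python
# Illustrative only: how a threshold of 3 could gate a 1-5 rubric score.
def passes(score: float, threshold: float = 3) -> bool:
    return score >= threshold

print(passes(4))  # True: 4 >= 3
print(passes(2))  # False: 2 < 3
```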
Examples
Initialize and call ToolCallAccuracyEvaluator using an Azure AI project URL in the following format: https://{resource_name}.services.ai.azure.com/api/projects/{project_name}
```python
import os
from azure.ai.evaluation import ToolCallAccuracyEvaluator

model_config = {
    "azure_endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT"),  # https://<account_name>.services.ai.azure.com
    "api_key": os.environ.get("AZURE_OPENAI_KEY"),
    "azure_deployment": os.environ.get("AZURE_OPENAI_DEPLOYMENT"),
}

tool_call_accuracy_evaluator = ToolCallAccuracyEvaluator(model_config=model_config)

tool_call_accuracy_evaluator(
    query="How is the weather in New York?",
    response="The weather in New York is sunny.",
    tool_calls={
        "type": "tool_call",
        "tool_call": {
            "id": "call_eYtq7fMyHxDWIgeG2s26h0lJ",
            "type": "function",
            "function": {"name": "fetch_weather", "arguments": {"location": "New York"}},
        },
    },
    tool_definitions={
        "id": "fetch_weather",
        "name": "fetch_weather",
        "description": "Fetches the weather information for the specified location.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string", "description": "The location to fetch weather for."}},
        },
    },
)
```
Attributes
id
Evaluator identifier; experimental and to be used only with evaluation in the cloud.
id = 'azureai://built-in/evaluators/tool_call_accuracy'