AzureOpenAIPythonGrader Class

Note

This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.

Wrapper class for OpenAI's Python code graders.

Enables custom Python-based evaluation logic with flexible scoring and pass/fail thresholds. The grader executes user-provided Python code to evaluate outputs against custom criteria.

Supplying an AzureOpenAIPythonGrader to the evaluate method causes an asynchronous request to the OpenAI API to evaluate the grader. The evaluation results are then merged into the standard evaluation results.

Constructor

AzureOpenAIPythonGrader(*, model_config: AzureOpenAIModelConfiguration | OpenAIModelConfiguration, name: str, image_tag: str, pass_threshold: float, source: str, **kwargs: Any)

Parameters

Name Description
model_config
Required
AzureOpenAIModelConfiguration | OpenAIModelConfiguration

The model configuration to use for the grader.

name
Required
str

The name of the grader.

image_tag
Required
str

The image tag for the Python execution environment.

pass_threshold
Required
float

Score threshold for pass/fail classification. Scores greater than or equal to the threshold are considered passing.

source
Required
str

Python source code containing the grading logic. Must define: def grade(sample: dict, item: dict) -> float

kwargs
Required
Any

Additional keyword arguments to pass to the grader.
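The required grade signature can be illustrated with a minimal, self-contained sketch. The field names used here (response, ground_truth) are assumptions for illustration; the actual keys depend on the columns in your evaluation data:

```python
# Minimal illustration of the required grade() signature.
# In Azure AI Evaluation, the row's data arrives in `item`;
# `sample` is typically an empty dictionary.
def grade(sample: dict, item: dict) -> float:
    # Field names are illustrative; match them to your data columns.
    response = str(item.get("response", "")).strip().lower()
    ground_truth = str(item.get("ground_truth", "")).strip().lower()
    if not response or not ground_truth:
        return 0.0
    return 1.0 if response == ground_truth else 0.0
```

The function must return a float; the grader compares that score against pass_threshold to derive the pass/fail result.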

Keyword-Only Parameters

Name Description
model_config
Required
name
Required
image_tag
Required
pass_threshold
Required
source
Required
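The pass/fail semantics of pass_threshold can be sketched as follows. This is an illustration of the documented behavior, not the library's internal code:

```python
def is_passing(score: float, pass_threshold: float) -> bool:
    # Scores greater than or equal to the threshold are considered passing.
    return score >= pass_threshold
```

For example, with pass_threshold=0.8, a score of exactly 0.8 passes while 0.5 does not.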

Examples

Using AzureOpenAIPythonGrader for custom evaluation logic.


   from azure.ai.evaluation import AzureOpenAIPythonGrader, evaluate
   from azure.ai.evaluation._model_configurations import AzureOpenAIModelConfiguration
   import os

   # Configure your Azure OpenAI connection
   model_config = AzureOpenAIModelConfiguration(
       azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
       api_key=os.environ["AZURE_OPENAI_API_KEY"],
       api_version=os.environ["AZURE_OPENAI_API_VERSION"],
       azure_deployment=os.environ["MODEL_DEPLOYMENT_NAME"],
   )

   # Create a Python grader with custom evaluation logic
   python_grader = AzureOpenAIPythonGrader(
       model_config=model_config,
       name="custom_accuracy",
       image_tag="2025-05-08",
       pass_threshold=0.8,  # 80% threshold for passing
       source="""
   def grade(sample: dict, item: dict) -> float:
       \"\"\"
       Custom grading logic that compares model output to expected label.
       
       Args:
           sample: Dictionary that is typically empty in Azure AI Evaluation
           item: Dictionary containing ALL the data including model output and ground truth
       
       Returns:
           Float score between 0.0 and 1.0
       \"\"\"
       # Important: In Azure AI Evaluation, all data is in 'item', not 'sample'
       # The 'sample' parameter is typically an empty dictionary
       
       # Get the model's response/output from item
       output = item.get("response", "") or item.get("output", "") or item.get("output_text", "")
       output = output.lower()
       
       # Get the expected label/ground truth from item
       label = item.get("ground_truth", "") or item.get("label", "") or item.get("expected", "")
       label = label.lower()
       
       # Handle empty cases
       if not output or not label:
           return 0.0
       
       # Exact match gets full score
       if output == label:
           return 1.0
       
       # Partial match logic (customize as needed)
       if output in label or label in output:
           return 0.5
       
       return 0.0
   """,
   )

   # Run evaluation
   evaluation_result = evaluate(
       data="evaluation_data.jsonl",  # JSONL file with columns: query, response, ground_truth, etc.
       evaluators={"custom_accuracy": python_grader},
   )

   # Access results
   print(f"Pass rate: {evaluation_result['metrics']['custom_accuracy.pass_rate']}")

Methods

get_client

Construct an appropriate OpenAI client using this grader's model configuration. The type of client returned depends on whether the grader's model configuration targets Azure OpenAI or OpenAI.

get_client() -> Any

Returns

Type Description
openai.OpenAI or openai.AzureOpenAI

The OpenAI client.
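The dispatch between the two client types can be pictured with a simplified sketch. This models only the decision on configuration shape; the actual get_client implementation constructs real clients from the openai package:

```python
# Simplified sketch (not the library's implementation) of how a grader
# might pick a client type from its model configuration. Azure OpenAI
# configurations carry Azure-specific fields such as azure_endpoint or
# azure_deployment; plain OpenAI configurations do not.
def choose_client_kind(model_config: dict) -> str:
    if "azure_endpoint" in model_config or "azure_deployment" in model_config:
        return "AzureOpenAI"
    return "OpenAI"
```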

Attributes

id

id = 'azureai://built-in/evaluators/azure-openai/python_grader'