F1ScoreEvaluator Class

Calculates the F1 score for a given response and ground truth or a multi-turn conversation.

F1 scores range from 0 to 1, with 1 being the best possible score.

The F1 score measures the word overlap between the model's generated response and the ground truth. Both texts are compared word by word, and the number of shared words is the basis of the score: precision is the ratio of the number of shared words to the total number of words in the generation, and recall is the ratio of the number of shared words to the total number of words in the ground truth. The F1 score is the harmonic mean of precision and recall.
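The word-overlap calculation described above can be sketched in a few lines. This is a minimal illustration, not the library's implementation: `token_f1` is a hypothetical helper, and it assumes lowercasing plus whitespace splitting, whereas the evaluator's own text normalization may differ and produce different scores.

```python
from collections import Counter


def token_f1(response: str, ground_truth: str) -> float:
    """Word-overlap F1 between a response and a ground truth (illustrative only)."""
    # Assumption: lowercase and split on whitespace; the evaluator's
    # own tokenization and normalization may differ.
    response_tokens = response.lower().split()
    truth_tokens = ground_truth.lower().split()

    # Count the words shared by both texts, respecting repeated words.
    shared = sum((Counter(response_tokens) & Counter(truth_tokens)).values())
    if shared == 0:
        return 0.0

    precision = shared / len(response_tokens)  # shared / words in the generation
    recall = shared / len(truth_tokens)        # shared / words in the ground truth
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


score = token_f1("Lyon is the capital of France.", "Paris is the capital of France.")
print(score)
```

Under these assumptions the two sentences share five of six words, so precision and recall are both 5/6 and the sketch yields the same value for F1.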

Use the F1 score when you want a single comprehensive metric that combines both recall and precision in your model's responses. It provides a balanced evaluation of your model's performance in terms of capturing accurate information in the response.

Constructor

F1ScoreEvaluator(*, threshold=0.5)

Parameters

Name Description
threshold
Optional

The threshold for the F1 score evaluator. Default is 0.5.

Keyword-Only Parameters

Name Description
threshold
Default value: 0.5
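This page does not spell out how the threshold is applied to the score. As a rough sketch, assuming that higher scores are better and that a score equal to the threshold counts as passing (both are assumptions, not confirmed here), the comparison could look like the following. `passes_threshold` is a hypothetical helper and is not part of azure.ai.evaluation.

```python
def passes_threshold(f1_score: float, threshold: float = 0.5) -> bool:
    """Return True when the score meets the threshold (illustrative only)."""
    # Assumption: higher is better, and the boundary value passes.
    return f1_score >= threshold


print(passes_threshold(0.83, threshold=0.6))
print(passes_threshold(0.40))
```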

Examples

Initialize an F1ScoreEvaluator with a threshold and call it.


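   from azure.ai.evaluation import F1ScoreEvaluator

   # Create the evaluator with a custom threshold (the default is 0.5).
   f1_evaluator = F1ScoreEvaluator(threshold=0.6)

   # Score a response against its ground truth and keep the returned result.
   result = f1_evaluator(response="Lyon is the capital of France.", ground_truth="Paris is the capital of France.")
   print(result)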

Attributes

id

Evaluator identifier. Experimental; to be used only with evaluation in the cloud.

id = 'azureai://built-in/evaluators/f1_score'