F1ScoreEvaluator Class
Calculates the F1 score for a given response and ground truth or a multi-turn conversation.
F1 scores range from 0 to 1, with 1 being the best possible score.
The F1 score measures the word overlap between the generated response and the ground truth answer. The number of shared words between the generation and the ground truth is the basis of the score: precision is the ratio of shared words to the total number of words in the generation, and recall is the ratio of shared words to the total number of words in the ground truth. The F1 score is the harmonic mean of precision and recall.
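The following sketch illustrates the token-overlap computation described above. It assumes simple whitespace tokenization; the library's own implementation may apply additional text normalization (for example lowercasing or punctuation stripping) before comparing tokens.

```python
from collections import Counter

def token_f1(response: str, ground_truth: str) -> float:
    # Whitespace tokenization; this is an assumption for illustration only.
    response_tokens = response.split()
    truth_tokens = ground_truth.split()
    # Shared words, counted with multiplicity.
    common = Counter(response_tokens) & Counter(truth_tokens)
    num_shared = sum(common.values())
    if num_shared == 0:
        return 0.0
    precision = num_shared / len(response_tokens)
    recall = num_shared / len(truth_tokens)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

# With this simple tokenization, 5 of the 6 words overlap, so F1 ≈ 0.83.
print(token_f1("Lyon is the capital of France.", "Paris is the capital of France."))
```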
Use the F1 score when you want a single comprehensive metric that combines both recall and precision in your model's responses. It provides a balanced evaluation of your model's performance in terms of capturing accurate information in the response.
Constructor
F1ScoreEvaluator(*, threshold=0.5)
Parameters
| Name | Description |
|---|---|
| threshold | The threshold for the F1 score evaluator. Default is 0.5. |
Keyword-Only Parameters
| Name | Description |
|---|---|
| threshold | Default value: 0.5 |
Examples
Initialize with a threshold and call an F1ScoreEvaluator.
```python
from azure.ai.evaluation import F1ScoreEvaluator

f1_evaluator = F1ScoreEvaluator(threshold=0.6)
f1_evaluator(response="Lyon is the capital of France.", ground_truth="Paris is the capital of France.")
```
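The call returns a dictionary of metric values. The sketch below shows one way to inspect it; the exact key names (for example `f1_score`) are assumptions based on the metric name and may differ across azure-ai-evaluation versions.

```python
# Hypothetical inspection of the evaluator output; key names are
# assumptions and may vary by SDK version.
result = f1_evaluator(
    response="Lyon is the capital of France.",
    ground_truth="Paris is the capital of France.",
)
print(result)  # dict of metric values, e.g. {"f1_score": ...}
```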
Attributes
id
Evaluator identifier. Experimental; to be used only with evaluation in the cloud.
id = 'azureai://built-in/evaluators/f1_score'