RougeScoreEvaluator Class

Calculates the ROUGE score for a given response and ground truth.

The ROUGE score (Recall-Oriented Understudy for Gisting Evaluation) evaluates the similarity between generated text and reference text based on n-gram overlap, including ROUGE-N (unigram, bigram, etc.) and ROUGE-L (longest common subsequence). It calculates precision, recall, and F1 scores to capture how well the generated text matches the reference text. The available ROUGE types are "rouge1" (unigram overlap), "rouge2" (bigram overlap), "rouge3" (trigram overlap), "rouge4" (4-gram overlap), "rouge5" (5-gram overlap), and "rougeL" (longest common subsequence overlap).
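
For intuition: ROUGE-N precision is the fraction of the response's n-grams that also appear in the ground truth, recall is the fraction of the ground truth's n-grams recovered by the response, and F1 is their harmonic mean. The following is a minimal sketch of that arithmetic only; it is not the SDK's implementation, which delegates to a full ROUGE library with its own tokenization and stemming.

   from collections import Counter

   def rouge_n_sketch(response: str, ground_truth: str, n: int = 1) -> dict:
       """Toy ROUGE-N: clipped n-gram overlap precision, recall, and F1."""
       def ngrams(text: str) -> Counter:
           tokens = text.lower().split()
           return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

       cand, ref = ngrams(response), ngrams(ground_truth)
       # Each matching n-gram is clipped to its count in the reference.
       overlap = sum((cand & ref).values())
       precision = overlap / sum(cand.values()) if cand else 0.0
       recall = overlap / sum(ref.values()) if ref else 0.0
       f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
       return {"precision": precision, "recall": recall, "f1": f1}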

Use the ROUGE score when you need a robust evaluation metric for text summarization, machine translation, and other natural language processing tasks, especially when focusing on recall and the ability to capture relevant information from the reference text.

ROUGE scores range from 0 to 1, with higher scores indicating better quality. Each of the precision, recall, and F1 scores is compared against its corresponding threshold (precision_threshold, recall_threshold, and f1_score_threshold, each defaulting to 0.5) to determine whether that evaluation passes or fails.
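
The decision rule is applied per metric. As a one-line sketch (assuming an inclusive comparison; the helper name is hypothetical, not part of the SDK):

   def passes(score: float, threshold: float = 0.5) -> bool:
       # Assumption: a score that meets or exceeds its threshold passes.
       return score >= threshold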

Constructor

RougeScoreEvaluator(rouge_type: RougeType, *, precision_threshold: float = 0.5, recall_threshold: float = 0.5, f1_score_threshold: float = 0.5)

Parameters

Name Description
rouge_type
Required. The type of ROUGE score to calculate (a RougeType value).

Keyword-Only Parameters

Name Description
precision_threshold
The threshold used to determine whether the precision evaluation passes or fails. Default value: 0.5
recall_threshold
The threshold used to determine whether the recall evaluation passes or fails. Default value: 0.5
f1_score_threshold
The threshold used to determine whether the F1 score evaluation passes or fails. Default value: 0.5

Examples

Initialize a RougeScoreEvaluator with specified thresholds and call it with a four-gram ROUGE type.


   from azure.ai.evaluation import RougeScoreEvaluator, RougeType

   # Create a ROUGE-4 evaluator with explicit pass/fail thresholds for
   # precision, recall, and F1 (each defaults to 0.5).
   rouge_evaluator = RougeScoreEvaluator(
       rouge_type=RougeType.ROUGE_4, precision_threshold=0.5, recall_threshold=0.5, f1_score_threshold=0.5
   )
   # Score a generated response against the ground-truth reference text.
   rouge_evaluator(response="Paris is the capital of France.", ground_truth="France's capital is Paris.")

Attributes

id

Evaluator identifier. Experimental; to be used only with evaluation in the cloud.

id = 'azureai://built-in/evaluators/rouge_score'