Creating custom LLM Judges

MLflow's built-in LLM Judges are a good starting point for common quality dimensions in simpler applications. As your application becomes more complex, you'll need custom LLM judges so that you can tune your evaluation criteria to the specific, nuanced business requirements of your use case and align them with your domain experts' judgment. MLflow provides flexible ways to create custom LLM judges tailored to these requirements.

Custom prompt judges

  • Best for: Complex, nuanced evaluations where you need full control over the judge's prompt or need to have the judge specify multiple output values, for example, "great", "ok", "bad".
  • How it works: You provide a prompt template that defines your evaluation criteria and contains placeholders for specific fields in your app's trace. You also define the output choices the judge can select. An LLM then selects the appropriate output choice and provides a rationale for its selection.
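
To make the flow above concrete, here is a minimal, library-independent sketch in plain Python. It is not MLflow's API: the `{{field}}` placeholder syntax, the helper names `render_prompt` and `run_judge`, the `Verdict` type, and the `fake_llm` stand-in are all assumptions made for illustration. It only shows the three moving parts described above: a template with placeholders, a fixed set of output choices, and a model reply that is parsed into a choice plus a rationale.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    choice: str      # one of the allowed output choices
    rationale: str   # the judge's explanation for that choice


def render_prompt(template: str, fields: dict, choices: list) -> str:
    """Fill {{field}} placeholders (hypothetical syntax) and list the allowed choices."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in fields:
            raise KeyError(f"template references unknown field: {key}")
        return str(fields[key])

    body = re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)
    return (
        f"{body}\n\n"
        f"Answer on the first line with exactly one of: {', '.join(choices)}.\n"
        "Give a short rationale on the following line."
    )


def run_judge(template: str, fields: dict, choices: list,
              llm: Callable[[str], str]) -> Verdict:
    """Send the rendered prompt to the model and parse 'choice\\nrationale'."""
    reply = llm(render_prompt(template, fields, choices))
    first_line, _, rest = reply.strip().partition("\n")
    choice = first_line.strip().lower()
    if choice not in choices:
        raise ValueError(f"model returned a choice outside the allowed set: {choice!r}")
    return Verdict(choice=choice, rationale=rest.strip())


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call, so the sketch runs offline.
    return "ok\nThe answer is correct but omits the refund window."


TEMPLATE = (
    "Rate how well the response answers the request.\n"
    "Request: {{request}}\n"
    "Response: {{response}}"
)
CHOICES = ["great", "ok", "bad"]

verdict = run_judge(
    TEMPLATE,
    {"request": "How do I get a refund?", "response": "Contact support."},
    CHOICES,
    fake_llm,
)
print(verdict.choice, "-", verdict.rationale)
```

In a real setup, `fake_llm` would be replaced by a call to an actual model, and the field values would come from your app's trace rather than from a hand-written dictionary. Rejecting replies outside the allowed set keeps the judge's output restricted to the choices you defined.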

Get started with custom prompt judges

Next steps

Continue your journey with these recommended actions and tutorials.

Reference guides

Explore detailed documentation for concepts and features mentioned in this guide.