MLflow's built-in LLM judges are excellent starting points for common quality dimensions in simpler applications. As your application becomes more complex, however, you'll need custom LLM judges whose evaluation criteria are tuned to the specific, nuanced business requirements of your use case and aligned with your domain experts' judgment. MLflow provides robust, flexible ways to create custom LLM judges tailored to these requirements.
Custom prompt judges
- Best for: Complex, nuanced evaluations where you need full control over the judge's prompt or need the judge to choose among multiple output values, for example, "great", "ok", "bad".
- How it works: You provide a prompt template that defines your evaluation criteria and includes placeholders for specific fields in your app's trace, and you define the output choices the judge can select from. An LLM then selects the most appropriate output choice and provides a rationale for its selection.
Get started with custom prompt judges
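The following is a minimal sketch of this pattern. It assumes MLflow 3's `mlflow.genai.judges.custom_prompt_judge` API, a `{{request}}`/`{{response}}` placeholder convention in the template, and `[[choice]]` markers for the output values; verify the exact import path, signature, and return type against the custom prompt judges reference linked below.

```python
# Sketch only: names and signatures are assumptions; confirm against the
# custom prompt judges reference documentation for your MLflow version.
from mlflow.genai.judges import custom_prompt_judge

# Prompt template with {{placeholders}} for fields from your app's trace
# and [[choice]] markers for the output values the judge may select.
RESPONSE_QUALITY_PROMPT = """
You are evaluating whether an assistant's response answers the user's request.

<request>{{request}}</request>
<response>{{response}}</response>

Select exactly one of the following choices:
[[great]]: The response fully and accurately answers the request.
[[ok]]: The response partially answers the request or omits key details.
[[bad]]: The response does not answer the request or is incorrect.
"""

# Map each output choice to a numeric score so results can be aggregated
# and compared across evaluation runs.
response_quality = custom_prompt_judge(
    name="response_quality",
    prompt_template=RESPONSE_QUALITY_PROMPT,
    numeric_values={"great": 1.0, "ok": 0.5, "bad": 0.0},
)

# Invoke the judge by filling in the template placeholders; it returns
# feedback containing the selected choice and the LLM's rationale.
feedback = response_quality(
    request="How do I reset my password?",
    response="Go to Settings > Security and select Reset password.",
)
print(feedback.value, feedback.rationale)
```

The judge can then be used like any other scorer during evaluation or production monitoring; see the guides below for how to wire it into `mlflow.genai.evaluate()` and continuous monitoring.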
Next steps
Continue your journey with these recommended actions and tutorials.
- Create guidelines judges - Define evaluation criteria using natural language rules (recommended)
- Create custom prompt judges - Build complex judges with custom prompts and output choices
- Run judges in production - Deploy your custom judges for continuous monitoring
Reference guides
Explore detailed documentation for concepts and features mentioned in this guide.
- LLM judges - Understand how LLM judges work and their architecture
- Guidelines judges - Deep dive into guidelines-based evaluation
- Custom prompt judges - Technical details on judges with custom prompts