Share via


Databricks-hosted foundation models available in Foundation Model APIs

This article describes the state-of-the-art open models that are supported by Databricks Foundation Model APIs.

Note

See Supported foundation models on Mosaic AI Model Serving for region availability of these models and the supported feature areas.

You can send query requests to these models using the pay-per-token endpoints available in your Databricks workspace. See Use foundation models and pay-per-token supported models table for the names of the model endpoints to use.

In addition to supporting models in pay-per-token mode, Foundation Model APIs also offers provisioned throughput mode. Databricks recommends provisioned throughput for production workloads. This mode supports all models of a model architecture family (for example, DBRX models), including the fine-tuned and custom pre-trained models supported in pay-per-token mode. See Provisioned throughput Foundation Model APIs for the list of supported architectures.

You can interact with these supported models using the AI Playground.

OpenAI GPT-5

Important

Customers are responsible for ensuring their compliance with the terms of OpenAI's Acceptable Use Policy. See also the Databricks Master Cloud Services Agreement.

GPT-5 is a state-of-the-art, general purpose large language model and reasoning model built and trained by OpenAI. It supports multimodal inputs and features a 128K token context window. The model is built for coding, chat, reasoning and agent-driven tasks.

As with other large language models, GPT-5 output might omit some facts and occasionally produce false information. Databricks recommends using retrieval augmented generation (RAG) in scenarios where accuracy is especially important.

This endpoint is hosted by Databricks Inc. within the Databricks security perimeter.

OpenAI GPT-5 mini

Important

Customers are responsible for ensuring their compliance with the terms of OpenAI's Acceptable Use Policy. See also the Databricks Master Cloud Services Agreement.

GPT-5 mini is a state-of-the-art, general purpose large language model and reasoning model built and trained by OpenAI. It supports multimodal inputs and features a 128K token context window. The model is cost-optimized for reasoning and chat workloads and excels at well-defined tasks that require reliable reasoning, precise language, and rapid output for text and images.

As with other large language models, GPT-5 output might omit some facts and occasionally produce false information. Databricks recommends using retrieval augmented generation (RAG) in scenarios where accuracy is especially important.

This endpoint is hosted by Databricks Inc. within the Databricks security perimeter.

OpenAI GPT-5 nano

Important

Customers are responsible for ensuring their compliance with the terms of OpenAI's Acceptable Use Policy. See also the Databricks Master Cloud Services Agreement.

GPT-5 nano is a state-of-the-art, general purpose large language model and reasoning model built and trained by OpenAI. It supports multimodal inputs and features a 128K token context window. The model excels at high-throughput tasks like simple instruction-following or classification for routine business processes or mobile applications.

As with other large language models, GPT-5 output may omit some facts and occasionally produce false information. Databricks recommends using retrieval augmented generation (RAG) in scenarios where accuracy is especially important.

This endpoint is hosted by Databricks Inc. within the Databricks security perimeter.

OpenAI GPT OSS 120B

Important

OpenAI GPT OSS 120B is provided under and subject to the Apache 2.0 License, Copyright (c) The Apache Software Foundation, All rights reserved. Customers are responsible for ensuring compliance with applicable model licenses.

GPT OSS 120B is a state-of-the-art, reasoning model with chain-of-thought and adjustable reasoning effort levels built and trained by OpenAI. It is OpenAI's flagship open-weight model and features a 128K token context window. The model is built for high-quality reasoning tasks.

As with other large language models, GPT OSS 120B output may omit some facts and occasionally produce false information. Databricks recommends using retrieval augmented generation (RAG) in scenarios where accuracy is especially important.

OpenAI GPT OSS 20B

Important

OpenAI GPT OSS 20B is provided under and subject to the Apache 2.0 License, Copyright (c) The Apache Software Foundation, All rights reserved. Customers are responsible for ensuring compliance with applicable model licenses.

GPT OSS 20B is a state-of-the-art, lightweight reasoning model built and trained by OpenAI. This model has a 128K token context window and excels at real-time copilots and batch inference tasks.

As with other large language models, GPT OSS 20B output may omit some facts and occasionally produce false information. Databricks recommends using retrieval augmented generation (RAG) in scenarios where accuracy is especially important.

Google Gemma 3 12B

Important

See Applicable model developer licenses and terms for the Gemma 3 Community License and Acceptable Use Policy.

Gemma 3 12B is a 12-billion parameter multimodal and vision language model developed by Google as part of the Gemma 3 family. Gemma 3 has up to a 128K token context and provides multilingual support for over 140 languages. This model is designed to handle both text and image inputs and generate text outputs, and is optimized for dialogue use cases, text generation and image understanding tasks, including question answering.

As with other large language models, Gemma 3 output may omit some facts and occasionally produce false information. Databricks recommends using retrieval augmented generation (RAG) in scenarios where accuracy is especially important.

Meta Llama 4 Maverick

Important

See Applicable model developer licenses and terms for the Llama 4 Community License and Acceptable Use Policy.

Llama 4 Maverick is a state-of-the-art large language model built and trained by Meta. It is the first of the Llama model family to use a mixture of experts architecture for compute efficiency. Llama 4 Maverick supports multiple languages and is optimized for precise image and text understanding use cases. Currently, Databricks support of Llama 4 Maverick is limited to text understanding use cases. Learn more about Llama 4 Maverick.

As with other large language models, Llama 4 output may omit some facts and occasionally produce false information. Databricks recommends using retrieval augmented generation (RAG) in scenarios where accuracy is especially important.

Meta Llama 3.3 70B Instruct

Important

Starting December 11, 2024, Meta-Llama-3.3-70B-Instruct replaces support for Meta-Llama-3.1-70B-Instruct in Foundation Model APIs pay-per-token endpoints.

See Applicable model developer licenses and terms for the LLama 3.3 Community License and Acceptable Use Policy.

Meta-Llama-3.3-70B-Instruct is a state-of-the-art large language model with a context of 128,000 tokens that was built and trained by Meta. The model supports multiple languages and is optimized for dialogue use cases. Learn more about the Meta Llama 3.3.

Similar to other large language models, Llama-3's output may omit some facts and occasionally produce false information. Databricks recommends using retrieval augmented generation (RAG) in scenarios where accuracy is especially important.

Meta Llama 3.1 405B Instruct

Important

The use of this model with Foundation Model APIs is in Public Preview. Reach out to your Databricks account team if you encounter endpoint failures or stabilization errors when using this model.

See Applicable model developer licenses and terms for the Llama 3.1 Community License and Acceptable Use Policy.

Meta-Llama-3.1-405B-Instruct is the largest openly available state-of-the-art large language model, built and trained by Meta, and is distributed by Azure Machine Learning using the AzureML Model Catalog. The use of this model enables customers to unlock new capabilities, such as advanced, multi-step reasoning and high-quality synthetic data generation. This model is competitive with GPT-4-Turbo in terms of quality.

Like Meta-Llama-3.1-70B-Instruct, this model has a context of 128,000 tokens and support across ten languages. It aligns with human preferences for helpfulness and safety, and is optimized for dialogue use cases. Learn more about the Meta Llama 3.1 models.

Similar to other large language models, Llama-3.1's output may omit some facts and occasionally produce false information. Databricks recommends using retrieval augmented generation (RAG) in scenarios where accuracy is especially important.

Meta Llama 3.1 8B Instruct

Important

See Applicable model developer licenses and terms for the LLama 3.1 Community License and Acceptable Use Policy.

Meta-Llama-3.1-8B-Instruct is a state-of-the-art large language model with a context of 128,000 tokens that was built and trained by Meta. The model supports multiple languages and is optimized for dialogue use cases. Learn more about the Meta Llama 3.1.

Similar to other large language models, Llama-3's output may omit some facts and occasionally produce false information. Databricks recommends using retrieval augmented generation (RAG) in scenarios where accuracy is especially important.

Anthropic Claude Sonnet 4.5

Important

Customers are responsible for ensuring their compliance with the terms of Anthropic's Acceptable Use Policy. See also the Databricks Master Cloud Services Agreement.

Claude Sonnet 4.5 is Anthropic’s most advanced hybrid reasoning model. It offers two modes: near-instant responses and extended thinking for deeper reasoning based on the complexity of the task. Claude Sonnet 4.5 specializes in application that require a balance of practical throughput and advanced thinking such as such as customer-facing agents, production coding workflows, and content generation at scale.

As with other large language models, Claude Sonnet 4.5 output may omit some facts and occasionally produce false information. Databricks recommends using retrieval augmented generation (RAG) in scenarios where accuracy is especially important.

This endpoint is hosted by Databricks Inc. in AWS within the Databricks security perimeter.

Anthropic Claude Sonnet 4

Important

Customers are responsible for ensuring their compliance with the terms of Anthropic's Acceptable Use Policy. See also the Databricks Master Cloud Services Agreement.

Claude Sonnet 4 is a state-of-the-art, hybrid reasoning model built and trained by Anthropic. This model offers two modes: near-instant responses and extended thinking for deeper reasoning based on the complexity of the task. Claude Sonnet 4 is optimized for various tasks such as code development, large-scale content analysis, and agent application development.

As with other large language models, Claude Sonnet 4 output may omit some facts and occasionally produce false information. Databricks recommends using retrieval augmented generation (RAG) in scenarios where accuracy is especially important.

This endpoint is hosted by Databricks Inc. in AWS within the Databricks security perimeter.

Anthropic Claude Opus 4.1

Important

Customers are responsible for ensuring their compliance with the terms of Anthropic's Acceptable Use Policy. See also the Databricks Master Cloud Services Agreement.

Claude Opus 4.1 is a state-of-the-art, hybrid reasoning model built and trained by Anthropic. This general purpose large language model is designed for both complex reasoning and real-world applications at enterprise scale. It supports text and image input, with a 200K token context window and 32K output token capabilities. This model excels at tasks like code generation, research and content creation, and multi-step agents workflows without constant human intervention.

As with other large language models, Claude Opus 4.1 output may omit some facts and occasionally produce false information. Databricks recommends using retrieval augmented generation (RAG) in scenarios where accuracy is especially important.

This endpoint is hosted by Databricks Inc. in AWS within the Databricks security perimeter.

Anthropic Claude 3.7 Sonnet

Important

Customers are responsible for ensuring their compliance with the terms of Anthropic's Acceptable Use Policy. See also the Databricks Master Cloud Services Agreement.

Claude 3.7 Sonnet is a state-of-the-art, hybrid reasoning model built and trained by Anthropic. It is a large language model and reasoning model that is able to rapidly respond or extend its reasoning based on the complexity of the task. When in extended thinking mode, Claude 3.7 Sonnet's reasoning steps are visible to the user. Claude 3.7 Sonnet is optimized for various tasks such as code generation, mathematical reasoning and instruction following.

As with other large language models, Claude 3.7 output may omit some facts and occasionally produce false information. Databricks recommends using retrieval augmented generation (RAG) in scenarios where accuracy is especially important.

This endpoint is hosted by Databricks Inc. in AWS within the Databricks security perimeter.

GTE Large (En)

Important

GTE Large (En) is provided under and subject to the Apache 2.0 License, Copyright (c) The Apache Software Foundation, All rights reserved. Customers are responsible for ensuring compliance with applicable model licenses.

General Text Embedding (GTE) is a text embedding model that can map any text to a 1024-dimension embedding vector and an embedding window of 8192 tokens. These vectors can be used in vector indexes for LLMs, and for tasks like retrieval, classification, question-answering, clustering, or semantic search. This endpoint serves the English version of the model and does not generate normalized embeddings.

Embedding models are especially effective when used in tandem with LLMs for retrieval augmented generation (RAG) use cases. GTE can be used to find relevant text snippets in large chunks of documents that can be used in the context of an LLM.

BGE Large (En)

BAAI General Embedding (BGE) is a text embedding model that can map any text to a 1024-dimension embedding vector and an embedding window of 512 tokens. These vectors can be used in vector indexes for LLMs, and for tasks like retrieval, classification, question-answering, clustering, or semantic search. This endpoint serves the English version of the model and generates normalized embeddings.

Embedding models are especially effective when used in tandem with LLMs for retrieval augmented generation (RAG) use cases. BGE can be used to find relevant text snippets in large chunks of documents that can be used in the context of an LLM.

In RAG applications, you may be able to improve the performance of your retrieval system by including an instruction parameter. The BGE authors recommend trying the instruction "Represent this sentence for searching relevant passages:" for query embeddings, though its performance impact is domain dependent.

Additional resources