Develop applications with LangChain and Azure AI Foundry

LangChain is a developer ecosystem that makes it easier to build reasoning applications. It includes multiple components, and most of them can be used independently, allowing you to pick and choose the pieces you need.

Models deployed to Azure AI Foundry can be used with LangChain in two ways:

  • Use the Azure AI Model Inference API: All models deployed in Azure AI Foundry support the Model Inference API, which offers a common set of capabilities across most models in the catalog. Because the API is consistent, switching models is as simple as changing the deployment; no code changes are required. With LangChain, install the langchain-azure-ai integration.

  • Use the model provider’s API: Some models, such as OpenAI, Cohere, or Mistral, offer their own APIs and LangChain extensions. These extensions can expose model-specific capabilities and are the right choice if you plan to use them. Install the extension for your chosen model, such as langchain-openai or langchain-cohere; a short sketch follows this list.
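
For example, a minimal sketch of the provider-specific route using the langchain-openai extension against an Azure OpenAI deployment. The deployment name, API version, and environment variable names shown here are placeholders; adjust them to your resource:

import os
from langchain_openai import AzureChatOpenAI

# Placeholder values: replace the deployment name, API version, and
# environment variables with the ones from your Azure OpenAI resource.
llm = AzureChatOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
    azure_deployment="gpt-4o",
)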

This tutorial shows how to use the langchain-azure-ai package with LangChain.

Prerequisites

To run this tutorial, you need:

  • An Azure subscription and an Azure AI Foundry project.

  • A chat completions model deployed in Azure AI Foundry. The examples in this article use a Mistral-Large-2411 deployment.

  • Python installed, together with the langchain-azure-ai package (for example, pip install -U langchain-azure-ai).

Configure the environment

To use LLMs deployed in Azure AI Foundry portal, you need the endpoint and credentials to connect to it. Follow these steps to get the information you need from the model you want to use:

Tip

Because you can customize the left pane in the Azure AI Foundry portal, you might see different items than shown in these steps. If you don't see what you're looking for, select ... More at the bottom of the left pane.

  1. Go to the Azure AI Foundry portal.

  2. Open the project where the model is deployed, if it isn't already open.

  3. Go to Models + endpoints and select the model you deployed as indicated in the prerequisites.

  4. Copy the endpoint URL and the key.

    Screenshot of the option to copy endpoint URI and keys from an endpoint.

    Tip

    If your model was deployed with Microsoft Entra ID support, you don't need a key.

In this scenario, set the endpoint URL and key as environment variables. (If the endpoint you copied includes additional text after /models, remove it so the URL ends at /models as shown below.)

export AZURE_INFERENCE_ENDPOINT="https://<resource>.services.ai.azure.com/models"
export AZURE_INFERENCE_CREDENTIAL="<your-key-goes-here>"

After configuration, create a client to connect to the chat model using init_chat_model. For Azure OpenAI models, see Use Azure OpenAI models.

from langchain.chat_models import init_chat_model

llm = init_chat_model(model="Mistral-Large-2411", model_provider="azure_ai")
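
The returned object is a standard LangChain chat model, so you can call it right away. A quick check, assuming the deployment configured above:

response = llm.invoke("Say hello in Italian.")
print(response.content)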

You can also use the class AzureAIChatCompletionsModel directly.

import os
from langchain_azure_ai.chat_models import AzureAIChatCompletionsModel

model = AzureAIChatCompletionsModel(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=os.environ["AZURE_INFERENCE_CREDENTIAL"],
    model="Mistral-Large-2411",
)

Caution

Breaking change: The parameter model_name was renamed to model in version 0.1.3.

You can use the following code to create the client if your endpoint supports Microsoft Entra ID:

import os
from azure.identity import DefaultAzureCredential
from langchain_azure_ai.chat_models import AzureAIChatCompletionsModel

model = AzureAIChatCompletionsModel(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=DefaultAzureCredential(),
    model="Mistral-Large-2411",
)

Note

When using Microsoft Entra ID, make sure that the endpoint was deployed with that authentication method and that you have the required permissions to invoke it.

If you plan to use asynchronous calls, use the asynchronous version of the credentials:

import os
from azure.identity.aio import (
    DefaultAzureCredential as DefaultAzureCredentialAsync,
)
from langchain_azure_ai.chat_models import AzureAIChatCompletionsModel

model = AzureAIChatCompletionsModel(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=DefaultAzureCredentialAsync(),
    model="Mistral-Large-2411",
)
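
With the asynchronous credential in place, call the model through the asynchronous Runnable methods, such as ainvoke. A minimal sketch:

import asyncio

async def main():
    # ainvoke is the asynchronous counterpart of invoke
    response = await model.ainvoke("Write a haiku about the sea.")
    print(response.content)

asyncio.run(main())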

If your endpoint serves a single model (for example, serverless API deployments), omit the model parameter:

import os
from langchain_azure_ai.chat_models import AzureAIChatCompletionsModel

model = AzureAIChatCompletionsModel(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=os.environ["AZURE_INFERENCE_CREDENTIAL"],
)

Use chat completion models

Use the model directly. ChatModels are instances of the LangChain Runnable interface, which provides a standard way to interact with them. To call the model, pass a list of messages to the invoke method.

from langchain_core.messages import HumanMessage, SystemMessage

messages = [
    SystemMessage(content="Translate the following from English into Italian"),
    HumanMessage(content="hi!"),
]

model.invoke(messages)
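
Because the client is a Runnable, you can also stream tokens as they're generated instead of waiting for the full response. For example:

# Print the translation token by token as it arrives
for chunk in model.stream(messages):
    print(chunk.content, end="")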

Compose operations as needed in chains. Use a prompt template to translate sentences:

from langchain_core.prompts import ChatPromptTemplate

system_template = "Translate the following into {language}:"
prompt_template = ChatPromptTemplate.from_messages(
    [("system", system_template), ("user", "{text}")]
)

This prompt template takes language and text inputs. Next, create an output parser:

from langchain_core.output_parsers import StrOutputParser

parser = StrOutputParser()

Combine the template, model, and output parser using the pipe (|) operator:

chain = prompt_template | model | parser

Invoke the chain by providing language and text values using the invoke method:

chain.invoke({"language": "italian", "text": "hi"})

Chain multiple LLMs together

Because models in Azure AI Foundry expose a common Model Inference API, you can chain multiple LLM operations and choose the model best suited to each step.

In the following example, we create two model clients: one producer and one verifier. To make the distinction clear, we use a multi-model endpoint such as the Model Inference API and pass the model parameter to use Mistral-Large for generation and Mistral-Small for verification. Producing content generally requires a larger model, while verification can use a smaller one.

import os
from langchain_azure_ai.chat_models import AzureAIChatCompletionsModel

producer = AzureAIChatCompletionsModel(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=os.environ["AZURE_INFERENCE_CREDENTIAL"],
    model="Mistral-Large-2411",
)

verifier = AzureAIChatCompletionsModel(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=os.environ["AZURE_INFERENCE_CREDENTIAL"],
    model="mistral-small",
)

Tip

Review the model card for each model to understand the best use cases.

The following example generates a poem written by an urban poet:

from langchain_core.prompts import PromptTemplate

producer_template = PromptTemplate(
    template=(
        "You are an urban poet. Your job is to come up with verses "
        "based on a given topic.\n"
        "Here is the topic you have been asked to generate a verse on:\n"
        "{topic}"
    ),
    input_variables=["topic"],
)

verifier_template = PromptTemplate(
    template=(
        "You are a verifier of poems. You are tasked with inspecting the "
        "verses of a poem. If they contain violence or abusive language, "
        "report it. Your response should be only one word, either True or False.\n"
        "Here are the verses submitted to you:\n"
        "{input}"
    ),
    input_variables=["input"],
)

Chain the pieces:

chain = producer_template | producer | parser | verifier_template | verifier | parser

The previous chain returns only the output of the verifier step. To access the intermediate result generated by the producer, use a RunnablePassthrough to output that intermediate step.

from langchain_core.runnables import RunnablePassthrough, RunnableParallel

generate_poem = producer_template | producer | parser
verify_poem = verifier_template | verifier | parser

chain = generate_poem | RunnableParallel(poem=RunnablePassthrough(), verification=RunnablePassthrough() | verify_poem)

Invoke the chain using the invoke method:

chain.invoke({"topic": "living in a foreign country"})

Use embedding models

Create an embeddings client similarly. Set the environment variables to point to an embeddings model:

export AZURE_INFERENCE_ENDPOINT="<your-model-endpoint-goes-here>"
export AZURE_INFERENCE_CREDENTIAL="<your-key-goes-here>"

Create the client:

import os
from langchain_azure_ai.embeddings import AzureAIEmbeddingsModel

embed_model = AzureAIEmbeddingsModel(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=os.environ["AZURE_INFERENCE_CREDENTIAL"],
    model="text-embedding-3-large",
)
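
The client exposes the standard LangChain embeddings interface. For example, embed a single query to inspect the resulting vector:

vector = embed_model.embed_query("What is the meaning of life?")
print(len(vector))  # dimensionality of the returned embedding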

Use an in-memory vector store:

from langchain_core.vectorstores import InMemoryVectorStore

vector_store = InMemoryVectorStore(embed_model)

Add documents:

from langchain_core.documents import Document

document_1 = Document(id="1", page_content="foo", metadata={"baz": "bar"})
document_2 = Document(id="2", page_content="thud", metadata={"bar": "baz"})

documents = [document_1, document_2]
vector_store.add_documents(documents=documents)

Search by similarity:

results = vector_store.similarity_search(query="thud", k=1)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")

Use Azure OpenAI models

When using Azure OpenAI models with the langchain-azure-ai package, use the following endpoint format:

import os
from langchain_azure_ai.chat_models import AzureAIChatCompletionsModel

llm = AzureAIChatCompletionsModel(
    endpoint="https://<resource>.openai.azure.com/openai/v1",
    credential=os.environ["AZURE_INFERENCE_CREDENTIAL"],
    model="gpt-4o",
)
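
The client then behaves like any other LangChain chat model. For example:

llm.invoke("Summarize the benefits of unit testing in one sentence.")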

Debugging and troubleshooting

If you need to debug your application and understand the requests sent to models in Azure AI Foundry, use the integration’s debug capabilities:

First, configure logging to the desired level:

import sys
import logging

# Acquire the logger for this client library. Use 'azure' to affect both
# the 'azure.core' and 'azure.ai.inference' libraries.
logger = logging.getLogger("azure")

# Set the desired logging level. logging.INFO or logging.DEBUG are good options.
logger.setLevel(logging.DEBUG)

# Direct logging output to stdout:
handler = logging.StreamHandler(stream=sys.stdout)
# Or direct logging output to a file:
# handler = logging.FileHandler(filename="sample.log")
logger.addHandler(handler)

# Optional: change the default logging format. Here we add a timestamp.
formatter = logging.Formatter("%(asctime)s:%(levelname)s:%(name)s:%(message)s")
handler.setFormatter(formatter)

To see request payloads, pass logging_enable=True in client_kwargs when instantiating the client:

import os
from langchain_azure_ai.chat_models import AzureAIChatCompletionsModel

model = AzureAIChatCompletionsModel(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=os.environ["AZURE_INFERENCE_CREDENTIAL"],
    model="Mistral-Large-2411",
    client_kwargs={"logging_enable": True},
)

Use the client as usual in your code.
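
For example, a single call is enough to see the request and response payloads logged to stdout:

from langchain_core.messages import HumanMessage

model.invoke([HumanMessage(content="What is the capital of France?")])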

Tracing

Use tracing in Azure AI Foundry by creating a tracer. Logs are stored in Azure Application Insights and can be queried at any time using Azure Monitor or the Azure AI Foundry portal. Each AI hub has an associated Azure Application Insights instance.

Get your instrumentation connection string

Tip

Because you can customize the left pane in the Azure AI Foundry portal, you might see different items than shown in these steps. If you don't see what you're looking for, select ... More at the bottom of the left pane.

You can configure your application to send telemetry to Azure Application Insights either by:

  1. Using the connection string to Azure Application Insights directly:

    1. Go to Azure AI Foundry portal and select Tracing.

    2. Select Manage data source. On this screen, you can see the Azure Application Insights instance associated with the project.

    3. Copy the value at Connection string and set it to the following variable:

      import os
      
      application_insights_connection_string = "instrumentation...."
      
  2. Using the Azure AI Foundry SDK and the Foundry Project endpoint:

    1. Ensure you have the package azure-ai-projects installed in your environment.

    2. Go to Azure AI Foundry portal.

    3. Copy your Azure AI Foundry project endpoint URL and set it in the following code:

      from azure.ai.projects import AIProjectClient
      from azure.identity import DefaultAzureCredential
      
      project_client = AIProjectClient(
          credential=DefaultAzureCredential(),
          endpoint="<your-foundry-project-endpoint-url>",
      )
      
      application_insights_connection_string = project_client.telemetry.get_application_insights_connection_string()
      

Configure tracing for Azure AI Foundry

The following code creates a tracer connected to the Azure Application Insights behind an Azure AI Foundry project. The enable_content_recording parameter is set to True, which captures inputs and outputs across the application, including intermediate steps. This is helpful when debugging and building applications, but you might want to disable it in production environments. You can also control this via the AZURE_TRACING_GEN_AI_CONTENT_RECORDING_ENABLED environment variable:

from langchain_azure_ai.callbacks.tracers import AzureAIOpenTelemetryTracer

azure_tracer = AzureAIOpenTelemetryTracer(
    connection_string=application_insights_connection_string,
    enable_content_recording=True,
)

Pass the tracer via config in the invoke operation:

chain.invoke({"topic": "living in a foreign country"}, config={"callbacks": [azure_tracer]})

To configure the chain itself for tracing, use the .with_config() method:

chain = chain.with_config({"callbacks": [azure_tracer]})

Then use the invoke() method as usual:

chain.invoke({"topic": "living in a foreign country"})

View traces

To see traces:

  1. Go to Azure AI Foundry portal.

  2. Go to the Tracing section.

  3. Identify the trace you created. It may take a few seconds to appear.

    A screenshot showing the trace of a chain.

Learn more about how to visualize and manage traces.

Next steps