.NET developers need to integrate and interact with a growing variety of artificial intelligence (AI) services in their apps. The Microsoft.Extensions.AI libraries provide a unified approach for representing generative AI components, and enable seamless integration and interoperability with various AI services. This article introduces the libraries and provides in-depth usage examples to help you get started.
The packages
The 📦 Microsoft.Extensions.AI.Abstractions package provides the core exchange types, including IChatClient and IEmbeddingGenerator<TInput,TEmbedding>. Any .NET library that provides an LLM client can implement the IChatClient interface to enable seamless integration with consuming code.
The 📦 Microsoft.Extensions.AI package has an implicit dependency on the Microsoft.Extensions.AI.Abstractions package. This package enables you to easily integrate components such as automatic function tool invocation, telemetry, and caching into your applications using familiar dependency injection and middleware patterns. For example, it provides the UseOpenTelemetry(ChatClientBuilder, ILoggerFactory, String, Action<OpenTelemetryChatClient>) extension method, which adds OpenTelemetry support to the chat client pipeline.
Which package to reference
Libraries that provide implementations of the abstractions typically reference only Microsoft.Extensions.AI.Abstractions.
To also have access to higher-level utilities for working with generative AI components, reference the Microsoft.Extensions.AI package instead (which itself references Microsoft.Extensions.AI.Abstractions). Most consuming applications and services should reference the Microsoft.Extensions.AI package along with one or more libraries that provide concrete implementations of the abstractions.
Install the packages
For information about how to install NuGet packages, see dotnet package add or Manage package dependencies in .NET applications.
API usage examples
The following subsections show specific IChatClient usage examples:
- Request a chat response
- Request a streaming chat response
- Tool calling
- Cache responses
- Use telemetry
- Provide options
- Functionality pipelines
- Custom IChatClient middleware
- Dependency injection
- Stateless vs. stateful clients
The following subsections show specific IEmbeddingGenerator usage examples:
- Create embeddings
- Pipelines of functionality
The IChatClient interface
The IChatClient interface defines a client abstraction responsible for interacting with AI services that provide chat capabilities. It includes methods for sending and receiving messages with multi-modal content (such as text, images, and audio), either as a complete set or streamed incrementally. Additionally, it allows for retrieving strongly typed services provided by the client or its underlying services.
.NET libraries that provide clients for language models and services can provide an implementation of the IChatClient interface. Any consumers of the interface are then able to interoperate seamlessly with these models and services via the abstractions. You can see a simple implementation at Sample implementations of IChatClient and IEmbeddingGenerator.
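As a rough illustration of what implementing the interface involves, the following sketch defines a hypothetical EchoChatClient that answers with the text of the last message instead of calling a service. The class name and the echo behavior are assumptions made for illustration; they aren't part of the libraries.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;

// Hypothetical in-memory client for illustration: echoes the last message.
public sealed class EchoChatClient : IChatClient
{
    public Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        string lastText = messages.LastOrDefault()?.Text ?? string.Empty;
        ChatMessage reply = new(ChatRole.Assistant, $"Echo: {lastText}");
        return Task.FromResult(new ChatResponse(reply));
    }

    public async IAsyncEnumerable<ChatResponseUpdate> GetStreamingResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        // Produce the full reply, then hand it out one word at a time.
        ChatResponse response = await GetResponseAsync(messages, options, cancellationToken);
        foreach (string word in response.Text.Split(' '))
        {
            yield return new ChatResponseUpdate(ChatRole.Assistant, word + " ");
        }
    }

    // Return this instance when the caller asks for a type it satisfies.
    public object? GetService(Type serviceType, object? serviceKey = null) =>
        serviceKey is null && serviceType.IsInstanceOfType(this) ? this : null;

    // Nothing to release in this sketch.
    public void Dispose() { }
}
```

Because the class has no external dependencies, a client like this can stand in for a real service in unit tests.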
Request a chat response
With an instance of IChatClient, you can call the IChatClient.GetResponseAsync method to send a request and get a response. The request is composed of one or more messages, each of which is composed of one or more pieces of content. Accelerator methods exist to simplify common cases, such as constructing a request for a single piece of text content.
using Microsoft.Extensions.AI;
using OllamaSharp;
IChatClient client = new OllamaApiClient(
    new Uri("http://localhost:11434/"), "phi3:mini");
Console.WriteLine(await client.GetResponseAsync("What is AI?"));
The core IChatClient.GetResponseAsync method accepts a list of messages. This list represents the history of all messages that are part of the conversation.
Console.WriteLine(await client.GetResponseAsync(
[
    new(ChatRole.System, "You are a helpful AI assistant"),
    new(ChatRole.User, "What is AI?"),
]));
The ChatResponse that's returned from GetResponseAsync exposes a list of ChatMessage instances that represent one or more messages generated as part of the operation. In common cases, there is only one response message, but in some situations, there can be multiple messages. The message list is ordered, such that the last message in the list represents the final response to the request. To provide all of those response messages back to the service in a subsequent request, you can add the messages from the response back into the messages list.
List<ChatMessage> history = [];
while (true)
{
    Console.Write("Q: ");
    history.Add(new(ChatRole.User, Console.ReadLine()));
    ChatResponse response = await client.GetResponseAsync(history);
    Console.WriteLine(response);
    history.AddMessages(response);
}
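To make the shape of a multi-message response concrete, the following sketch builds a ChatResponse by hand rather than calling a service. The message texts are made up for illustration; a real response comes from GetResponseAsync.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Extensions.AI;

// Hand-built response standing in for what a service might return.
ChatResponse response = new(new List<ChatMessage>
{
    new(ChatRole.Assistant, "Let me think about that."),
    new(ChatRole.Assistant, "AI is the field of building systems that perform tasks associated with human intelligence."),
});

// Messages are ordered; the last one is the final response.
foreach (ChatMessage message in response.Messages)
{
    Console.WriteLine($"{message.Role}: {message.Text}");
}
ChatMessage finalMessage = response.Messages[response.Messages.Count - 1];

// AddMessages appends every response message to the history.
List<ChatMessage> history = [new(ChatRole.User, "What is AI?")];
history.AddMessages(response);
Console.WriteLine(history.Count); // 3: one user message plus two response messages
```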
Request a streaming chat response
The inputs to IChatClient.GetStreamingResponseAsync are identical to those of GetResponseAsync. However, rather than returning the complete response as part of a ChatResponse object, the method returns an IAsyncEnumerable<T> where T is ChatResponseUpdate, providing a stream of updates that collectively form the single response.
await foreach (ChatResponseUpdate update in client.GetStreamingResponseAsync("What is AI?"))
{
    Console.Write(update);
}
Tip
Streaming APIs are nearly synonymous with AI user experiences. C# enables compelling scenarios with its IAsyncEnumerable<T> support, allowing for a natural and efficient way to stream data.
As with GetResponseAsync, you can add the updates from IChatClient.GetStreamingResponseAsync back into the messages list. Because the updates are individual pieces of a response, you can use helpers like ToChatResponse(IEnumerable<ChatResponseUpdate>) to compose one or more updates back into a single ChatResponse instance.
Helpers like AddMessages compose a ChatResponse and then extract the composed messages from the response and add them to a list.
List<ChatMessage> chatHistory = [];
while (true)
{
    Console.Write("Q: ");
    chatHistory.Add(new(ChatRole.User, Console.ReadLine()));
    List<ChatResponseUpdate> updates = [];
    await foreach (ChatResponseUpdate update in
        client.GetStreamingResponseAsync(chatHistory))
    {
        Console.Write(update);
        updates.Add(update);
    }
    Console.WriteLine();
    chatHistory.AddMessages(updates);
}
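The composition step can also be seen in isolation. The following sketch uses hand-built updates with made-up text in place of a live stream, then combines them with ToChatResponse and AddMessages.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Extensions.AI;

// Hand-built updates standing in for a streamed response (illustrative values).
List<ChatResponseUpdate> updates =
[
    new(ChatRole.Assistant, "AI is "),
    new(ChatRole.Assistant, "a broad "),
    new(ChatRole.Assistant, "field."),
];

// Compose the individual updates into a single ChatResponse.
ChatResponse composed = updates.ToChatResponse();
Console.WriteLine(composed.Text);

// AddMessages composes the updates and appends the resulting messages to the history.
List<ChatMessage> chatHistory = [new(ChatRole.User, "What is AI?")];
chatHistory.AddMessages(updates);
Console.WriteLine(chatHistory.Count);
```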
Tool calling
Some models and services support tool calling. To gather additional information, you can configure the ChatOptions with information about tools (usually .NET methods) that the model can request the client to invoke. Instead of sending a final response, the model requests a function invocation with specific arguments. The client then invokes the function and sends the results back to the model with the conversation history. The Microsoft.Extensions.AI.Abstractions library includes abstractions for various message content types, including function call requests and results. While IChatClient consumers can interact with this content directly, Microsoft.Extensions.AI provides helpers that can enable automatically invoking the tools in response to corresponding requests. The Microsoft.Extensions.AI.Abstractions and Microsoft.Extensions.AI libraries provide the following types:
- AIFunction: Represents a function that can be described to an AI model and invoked.
- AIFunctionFactory: Provides factory methods for creating AIFunction instances that represent .NET methods.
- FunctionInvokingChatClient: Wraps an IChatClient as another IChatClient that adds automatic function-invocation capabilities.
The following example demonstrates a random function invocation (this example depends on the 📦 OllamaSharp NuGet package):
using Microsoft.Extensions.AI;
using OllamaSharp;
string GetCurrentWeather() => Random.Shared.NextDouble() > 0.5 ? "It's sunny" : "It's raining";
IChatClient client = new OllamaApiClient(new Uri("http://localhost:11434"), "llama3.1");
client = ChatClientBuilderChatClientExtensions
    .AsBuilder(client)
    .UseFunctionInvocation()
    .Build();
ChatOptions options = new() { Tools = [AIFunctionFactory.Create(GetCurrentWeather)] };
var response = client.GetStreamingResponseAsync("Should I wear a rain coat?", options);
await foreach (var update in response)
{
    Console.Write(update);
}
The preceding code:
- Defines a function named GetCurrentWeather that returns a random weather forecast.
- Instantiates a ChatClientBuilder with an OllamaSharp.OllamaApiClient and configures it to use function invocation.
- Calls GetStreamingResponseAsync on the client, passing a prompt and a list of tools that includes a function created with AIFunctionFactory.Create.
- Iterates over the response, printing each update to the console.
- Iterates over the response, printing each update to the console.
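To see what an AIFunction carries before any model is involved, the following sketch creates one from a method that takes a parameter and invokes it directly. The GetWeather method, the get_weather name, and the argument value are hypothetical, and the sketch assumes a recent library version in which InvokeAsync accepts an AIFunctionArguments instance.

```csharp
using System;
using System.ComponentModel;
using Microsoft.Extensions.AI;

// Hypothetical tool for illustration: returns a canned forecast for a city.
[Description("Gets the current weather for a city.")]
static string GetWeather([Description("The city name")] string city) =>
    $"It's sunny in {city}";

// Supply an explicit name so the model sees a stable identifier.
AIFunction weatherTool = AIFunctionFactory.Create(GetWeather, name: "get_weather");

// The name, description, and JSON schema are what describe the tool to the model.
Console.WriteLine(weatherTool.Name);
Console.WriteLine(weatherTool.Description);
Console.WriteLine(weatherTool.JsonSchema);

// Invoke the function directly, much as a function-invoking client would
// after the model requests the call with these arguments.
object? result = await weatherTool.InvokeAsync(
    new AIFunctionArguments { ["city"] = "Seattle" });
Console.WriteLine(result);
```

The Description attributes matter here because a model typically relies on these descriptions when deciding whether to call a tool and how to fill in its arguments.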
You can also use Model Context Protocol (MCP) tools with your IChatClient. For more information, see Build a minimal MCP client.
Cache responses
If you're familiar with Caching in .NET, it's good to know that Microsoft.Extensions.AI provides delegating IChatClient implementations that build on it. The DistributedCachingChatClient is an IChatClient that layers caching around another arbitrary IChatClient instance. When a novel chat history is submitted to the DistributedCachingChatClient, it forwards the request to the underlying client and then caches the response before sending it back to the consumer. The next time the same history is submitted, the DistributedCachingChatClient finds the cached response and returns it rather than forwarding the request along the pipeline.
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.Caching.Memory;
using Microsoft.Extensions.Options;
using OllamaSharp;
var sampleChatClient = new OllamaApiClient(new Uri("http://localhost:11434"), "llama3.1");
IChatClient client = new ChatClientBuilder(sampleChatClient)
    .UseDistributedCache(new MemoryDistributedCache(
        Options.Create(new MemoryDistributedCacheOptions())))
    .Build();
string[] prompts = ["What is AI?", "What is .NET?", "What is AI?"];
foreach (var prompt in prompts)
{
    await foreach (var update in client.GetStreamingResponseAsync(prompt))
    {
        Console.Write(update);
    }
    Console.WriteLine();
}
This example depends on the 📦 Microsoft.Extensions.Caching.Memory NuGet package. For more information, see Caching in .NET.
Use telemetry
Another example of a delegating chat client is the OpenTelemetryChatClient. This implementation adheres to the OpenTelemetry Semantic Conventions for Generative AI systems. Similar to other IChatClient delegators, it layers metrics and spans around other arbitrary IChatClient implementations.
using Microsoft.Extensions.AI;
using OllamaSharp;
using OpenTelemetry.Trace;
// Configure OpenTelemetry exporter.
string sourceName = Guid.NewGuid().ToString();
TracerProvider tracerProvider = OpenTelemetry.Sdk.CreateTracerProviderBuilder()
    .AddSource(sourceName)
    .AddConsoleExporter()
    .Build();
IChatClient ollamaClient = new OllamaApiClient(
    new Uri("http://localhost:11434/"), "phi3:mini");
IChatClient client = new ChatClientBuilder(ollamaClient)
    .UseOpenTelemetry(
        sourceName: sourceName,
        configure: c => c.EnableSensitiveData = true)
    .Build();
Console.WriteLine((await client.GetResponseAsync("What is AI?")).Text);
(The preceding example depends on the 📦 OpenTelemetry.Exporter.Console NuGet package.)
Alternatively, the LoggingChatClient and corresponding UseLogging(ChatClientBuilder, ILoggerFactory, Action<LoggingChatClient>) method provide a simple way to write log entries to an ILogger for every request and response.
Provide options
Every call to GetResponseAsync or GetStreamingResponseAsync can optionally supply a ChatOptions instance containing additional parameters for the operation. The most common parameters among AI models and services show up as strongly typed properties on the type, such as ChatOptions.Temperature. Other parameters can be supplied by name in a weakly typed manner, via the ChatOptions.AdditionalProperties dictionary, or via an options instance that the underlying provider understands, via the ChatOptions.RawRepresentationFactory property.
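As a small sketch of the first two mechanisms, the following code sets strongly typed properties and one weakly typed entry. The my_provider_setting key and all values are placeholders, not settings that any particular provider is known to accept.

```csharp
using System;
using Microsoft.Extensions.AI;

ChatOptions options = new()
{
    // Strongly typed properties cover the most common parameters.
    Temperature = 0.2f,
    MaxOutputTokens = 256,
    // Provider-specific settings can be passed by name. This key is
    // hypothetical; use whatever your provider documents.
    AdditionalProperties = new AdditionalPropertiesDictionary
    {
        ["my_provider_setting"] = "value",
    },
};

// Clone the options to vary one setting without mutating the original.
ChatOptions creative = options.Clone();
creative.Temperature = 0.9f;

Console.WriteLine($"{options.Temperature} vs. {creative.Temperature}");
```

Either instance can then be passed as the options argument to GetResponseAsync or GetStreamingResponseAsync.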
You can also specify options when building an IChatClient with the fluent ChatClientBuilder API by chaining a call to the ConfigureOptions(ChatClientBuilder, Action<ChatOptions>) extension method. This delegating client wraps another client and invokes the supplied delegate to populate a ChatOptions instance for every call. For example, to ensure that the ChatOptions.ModelId property defaults to a particular model name, you can use code like the following:
using Microsoft.Extensions.AI;
using OllamaSharp;
IChatClient client = new OllamaApiClient(new Uri("http://localhost:11434"));
client = ChatClientBuilderChatClientExtensions.AsBuilder(client)
    .ConfigureOptions(options => options.ModelId ??= "phi3")
    .Build();
// Will request "phi3".
Console.WriteLine(await client.GetResponseAsync("What is AI?"));
// Will request "llama3.1".
Console.WriteLine(await client.GetResponseAsync("What is AI?", new() { ModelId = "llama3.1" }));
Functionality pipelines
IChatClient instances can be layered to create a pipeline of components that each add additional functionality. These components can come from Microsoft.Extensions.AI, other NuGet packages, or custom implementations. This approach allows you to augment the behavior of the IChatClient in various ways to meet your specific needs. Consider the following code snippet that layers a distributed cache, function invocation, and OpenTelemetry tracing around a sample chat client:
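// Assumes the using directives and the sourceName variable from the
// preceding caching and telemetry examples.
// Explore changing the order of the intermediate "Use" calls.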
IChatClient client = new ChatClientBuilder(new OllamaApiClient(new Uri("http://localhost:11434"), "llama3.1"))
    .UseDistributedCache(new MemoryDistributedCache(Options.Create(new MemoryDistributedCacheOptions())))
    .UseFunctionInvocation()
    .UseOpenTelemetry(sourceName: sourceName, configure: c => c.EnableSensitiveData = true)
    .Build();
Custom IChatClient middleware
To add additional functionality, you can implement IChatClient directly or use the DelegatingChatClient class. This class serves as a base for creating chat clients that delegate operations to another IChatClient instance. It simplifies chaining multiple clients, allowing calls to pass through to an underlying client.
The DelegatingChatClient class provides default implementations for methods like GetResponseAsync, GetStreamingResponseAsync, and Dispose, which forward calls to the inner client. A derived class can then override only the methods it needs to augment the behavior, while delegating other calls to the base implementation. This approach is useful for creating flexible and modular chat clients that are easy to extend and compose.
The following is an example class derived from DelegatingChatClient that uses the System.Threading.RateLimiting library to provide rate-limiting functionality.
using Microsoft.Extensions.AI;
using System.Runtime.CompilerServices;
using System.Threading.RateLimiting;
public sealed class RateLimitingChatClient(
    IChatClient innerClient, RateLimiter rateLimiter)
        : DelegatingChatClient(innerClient)
{
    public override async Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        using var lease = await rateLimiter.AcquireAsync(permitCount: 1, cancellationToken)
            .ConfigureAwait(false);
        if (!lease.IsAcquired)
            throw new InvalidOperationException("Unable to acquire lease.");
        return await base.GetResponseAsync(messages, options, cancellationToken)
            .ConfigureAwait(false);
    }
    public override async IAsyncEnumerable<ChatResponseUpdate> GetStreamingResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        using var lease = await rateLimiter.AcquireAsync(permitCount: 1, cancellationToken)
            .ConfigureAwait(false);
        if (!lease.IsAcquired)
            throw new InvalidOperationException("Unable to acquire lease.");
        await foreach (var update in base.GetStreamingResponseAsync(messages, options, cancellationToken)
            .ConfigureAwait(false))
        {
            yield return update;
        }
    }
    protected override void Dispose(bool disposing)
    {
        if (disposing)
            rateLimiter.Dispose();
        base.Dispose(disposing);
    }
}
As with other IChatClient implementations, the RateLimitingChatClient can be composed:
using Microsoft.Extensions.AI;
using OllamaSharp;
using System.Threading.RateLimiting;
var client = new RateLimitingChatClient(
    new OllamaApiClient(new Uri("http://localhost:11434"), "llama3.1"),
    new ConcurrencyLimiter(new() { PermitLimit = 1, QueueLimit = int.MaxValue }));
Console.WriteLine(await client.GetResponseAsync("What color is the sky?"));
To simplify the composition of such components with others, component authors should create a Use* extension method for registering the component into a pipeline. For example, consider the following UseRateLimiting extension method:
using Microsoft.Extensions.AI;
using System.Threading.RateLimiting;
public static class RateLimitingChatClientExtensions
{
    public static ChatClientBuilder UseRateLimiting(
        this ChatClientBuilder builder,
        RateLimiter rateLimiter) =>
        builder.Use(innerClient =>
            new RateLimitingChatClient(innerClient, rateLimiter)
        );
}
Such extensions can also query for relevant services from the DI container; the IServiceProvider used by the pipeline is passed to the Use delegate as its second parameter:
using Microsoft.Extensions.AI;
using Microsoft.Extensions.DependencyInjection;
using System.Threading.RateLimiting;
public static class RateLimitingChatClientExtensions
{
    public static ChatClientBuilder UseRateLimiting(
        this ChatClientBuilder builder,
        RateLimiter? rateLimiter = null) =>
        builder.Use((innerClient, services) =>
            new RateLimitingChatClient(
                innerClient,
                // Prefer an explicitly supplied limiter; otherwise resolve one from DI.
                rateLimiter ?? services.GetRequiredService<RateLimiter>())
        );
}
Now it's easy for the consumer to use this in their pipeline, for example:
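HostApplicationBuilder builder = Host.CreateApplicationBuilder(args);
// UseDistributedCache() and UseRateLimiting() resolve these services from DI.
builder.Services.AddDistributedMemoryCache();
builder.Services.AddSingleton<RateLimiter>(
    new ConcurrencyLimiter(new() { PermitLimit = 1, QueueLimit = int.MaxValue }));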
IChatClient client = new OllamaApiClient(
    new Uri("http://localhost:11434/"),
    "phi3:mini");
builder.Services.AddChatClient(services =>
        client
        .AsBuilder()
        .UseDistributedCache()
        .UseRateLimiting()
        .UseOpenTelemetry()
        .Build(services));
The previous extension methods demonstrate using a Use method on ChatClientBuilder. ChatClientBuilder also provides Use overloads that make it easier to write such delegating handlers. For example, in the earlier RateLimitingChatClient example, the overrides of GetResponseAsync and GetStreamingResponseAsync only need to do work before and after delegating to the next client in the pipeline. To achieve the same thing without writing a custom class, you can use an overload of Use that accepts a delegate that's used for both GetResponseAsync and GetStreamingResponseAsync, reducing the boilerplate required:
using Microsoft.Extensions.AI;
using OllamaSharp;
using System.Threading.RateLimiting;
RateLimiter rateLimiter = new ConcurrencyLimiter(new()
{
    PermitLimit = 1,
    QueueLimit = int.MaxValue
});
IChatClient client = new OllamaApiClient(new Uri("http://localhost:11434"), "llama3.1");
client = ChatClientBuilderChatClientExtensions
    .AsBuilder(client)
    .UseDistributedCache()
    .Use(async (messages, options, nextAsync, cancellationToken) =>
    {
        using var lease = await rateLimiter.AcquireAsync(permitCount: 1, cancellationToken).ConfigureAwait(false);
        if (!lease.IsAcquired)
            throw new InvalidOperationException("Unable to acquire lease.");
        await nextAsync(messages, options, cancellationToken);
    })
    .UseOpenTelemetry()
    .Build();
For scenarios where you need a different implementation for GetResponseAsync and GetStreamingResponseAsync in order to handle their unique return types, you can use the Use(Func<IEnumerable<ChatMessage>,ChatOptions,IChatClient,CancellationToken,Task<ChatResponse>>, Func<IEnumerable<ChatMessage>,ChatOptions,IChatClient,CancellationToken,IAsyncEnumerable<ChatResponseUpdate>>) overload that accepts a delegate for each.
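The following is a minimal sketch of that two-delegate overload, applying the same rate-limiting idea to each path separately. The StreamWithLeaseAsync local function is a hypothetical helper, and the Ollama endpoint and model name are the same assumptions used in the earlier examples.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.RateLimiting;
using Microsoft.Extensions.AI;
using OllamaSharp;

RateLimiter rateLimiter = new ConcurrencyLimiter(new()
{
    PermitLimit = 1,
    QueueLimit = int.MaxValue
});

IChatClient client = new ChatClientBuilder(
        new OllamaApiClient(new Uri("http://localhost:11434"), "llama3.1"))
    .Use(
        // Non-streaming path: hold the lease until the full response is available.
        async (messages, options, innerClient, cancellationToken) =>
        {
            using var lease = await rateLimiter.AcquireAsync(permitCount: 1, cancellationToken);
            if (!lease.IsAcquired)
                throw new InvalidOperationException("Unable to acquire lease.");
            return await innerClient.GetResponseAsync(messages, options, cancellationToken);
        },
        // Streaming path: lambdas can't be iterators, so delegate to a local function.
        (messages, options, innerClient, cancellationToken) =>
            StreamWithLeaseAsync(messages, options, innerClient, cancellationToken))
    .Build();

// Hypothetical helper: holds the lease for as long as the stream is enumerated.
async IAsyncEnumerable<ChatResponseUpdate> StreamWithLeaseAsync(
    IEnumerable<ChatMessage> messages,
    ChatOptions? options,
    IChatClient innerClient,
    [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
    using var lease = await rateLimiter.AcquireAsync(permitCount: 1, cancellationToken);
    if (!lease.IsAcquired)
        throw new InvalidOperationException("Unable to acquire lease.");
    await foreach (var update in innerClient.GetStreamingResponseAsync(
        messages, options, cancellationToken))
    {
        yield return update;
    }
}
```

Splitting the delegates lets the streaming path keep its lease until enumeration finishes, which a single shared delegate can't express as directly.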
Dependency injection
IChatClient implementations are often provided to an application via dependency injection (DI). In this example, an IDistributedCache is added into the DI container, as is an IChatClient. The registration for the IChatClient uses a builder that creates a pipeline containing a caching client (which then uses an IDistributedCache retrieved from DI) and the sample client. The injected IChatClient can be retrieved and used elsewhere in the app.
using Microsoft.Extensions.AI;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using OllamaSharp;
// App setup.
var builder = Host.CreateApplicationBuilder();
builder.Services.AddDistributedMemoryCache();
builder.Services.AddChatClient(new OllamaApiClient(new Uri("http://localhost:11434"), "llama3.1"))
    .UseDistributedCache();
var host = builder.Build();
// Elsewhere in the app.
var chatClient = host.Services.GetRequiredService<IChatClient>();
Console.WriteLine(await chatClient.GetResponseAsync("What is AI?"));
Which instance and configuration are injected can differ based on the current needs of the application, and multiple pipelines can be injected with different keys.
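A sketch of keyed registration might look like the following. The keys "fast" and "smart" and the model names are arbitrary choices for illustration, and the code assumes the AddKeyedChatClient extension available in recent versions of Microsoft.Extensions.AI along with the packages used in the previous example.

```csharp
using System;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using OllamaSharp;

var builder = Host.CreateApplicationBuilder();
builder.Services.AddDistributedMemoryCache();

// Register two pipelines under different keys (key names are arbitrary).
builder.Services.AddKeyedChatClient("fast",
        new OllamaApiClient(new Uri("http://localhost:11434"), "phi3:mini"))
    .UseDistributedCache();
builder.Services.AddKeyedChatClient("smart",
    new OllamaApiClient(new Uri("http://localhost:11434"), "llama3.1"));

var host = builder.Build();

// Elsewhere in the app, resolve the pipeline that fits the task.
IChatClient fast = host.Services.GetRequiredKeyedService<IChatClient>("fast");
IChatClient smart = host.Services.GetRequiredKeyedService<IChatClient>("smart");
```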
Stateless vs. stateful clients
Stateless services require all relevant conversation history to be sent back on every request. In contrast, stateful services keep track of the history and require only additional messages to be sent with a request. The IChatClient interface is designed to handle both stateless and stateful AI services.
When working with a stateless service, callers maintain a list of all messages. They add in all received response messages and provide the list back on subsequent interactions.
List<ChatMessage> history = [];
while (true)
{
    Console.Write("Q: ");
    history.Add(new(ChatRole.User, Console.ReadLine()));
    var response = await client.GetResponseAsync(history);
    Console.WriteLine(response);
    history.AddMessages(response);
}
For stateful services, you might already know the identifier used for the relevant conversation. You can put that identifier into ChatOptions.ConversationId. Usage then follows the same pattern, except there's no need to maintain a history manually.
ChatOptions statefulOptions = new() { ConversationId = "my-conversation-id" };
while (true)
{
    Console.Write("Q: ");
    ChatMessage message = new(ChatRole.User, Console.ReadLine());
    Console.WriteLine(await client.GetResponseAsync(message, statefulOptions));
}
Some services might support automatically creating a conversation ID for a request that doesn't have one, or creating a new conversation ID that represents the current state of the conversation after incorporating the last round of messages. In such cases, you can transfer the ChatResponse.ConversationId over to the ChatOptions.ConversationId for subsequent requests. For example:
ChatOptions options = new();
while (true)
{
    Console.Write("Q: ");
    ChatMessage message = new(ChatRole.User, Console.ReadLine());
    ChatResponse response = await client.GetResponseAsync(message, options);
    Console.WriteLine(response);
    options.ConversationId = response.ConversationId;
}
If you don't know ahead of time whether the service is stateless or stateful, you can check the response ConversationId and act based on its value. If it's set, then that value is propagated to the options and the history is cleared so as to not resend the same history again. If the response ConversationId isn't set, then the response message is added to the history so that it's sent back to the service on the next turn.
List<ChatMessage> chatHistory = [];
ChatOptions chatOptions = new();
while (true)
{
    Console.Write("Q: ");
    chatHistory.Add(new(ChatRole.User, Console.ReadLine()));
    // Pass the options so that a propagated ConversationId reaches the service.
    ChatResponse response = await client.GetResponseAsync(chatHistory, chatOptions);
    Console.WriteLine(response);
    chatOptions.ConversationId = response.ConversationId;
    if (response.ConversationId is not null)
    {
        chatHistory.Clear();
    }
    else
    {
        chatHistory.AddMessages(response);
    }
}
The IEmbeddingGenerator interface
The IEmbeddingGenerator<TInput,TEmbedding> interface represents a generic generator of embeddings. For the generic type parameters, TInput is the type of input values being embedded, and TEmbedding is the type of generated embedding, which inherits from the Embedding class.
The Embedding class serves as a base class for embeddings generated by an IEmbeddingGenerator. It's designed to store and manage the metadata and data associated with embeddings. Derived types, like Embedding<T>, provide the concrete embedding vector data. For example, an Embedding<float> exposes a ReadOnlyMemory<float> Vector { get; } property for access to its embedding data.
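As a brief sketch of working with the Vector property, the following code builds two Embedding<float> instances by hand and compares them with cosine similarity, a common way to compare embeddings. The vector values and the CosineSimilarity helper are made up for illustration; real vectors come from a generator.

```csharp
using System;
using Microsoft.Extensions.AI;

// Hand-built embeddings with made-up values; real vectors come from a generator.
Embedding<float> first = new(new float[] { 1f, 2f, 3f });
Embedding<float> second = new(new float[] { 2f, 4f, 6f });

Console.WriteLine(first.Vector.Length);

// Vectors that point in the same direction have a similarity close to 1.
double similarity = CosineSimilarity(first.Vector.Span, second.Vector.Span);
Console.WriteLine(similarity);

// Hypothetical helper: cosine similarity is dot(a, b) / (|a| * |b|).
static double CosineSimilarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same length.");

    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}
```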
The IEmbeddingGenerator interface defines a method to asynchronously generate embeddings for a collection of input values, with optional configuration and cancellation support. It also provides metadata describing the generator and allows for the retrieval of strongly typed services that can be provided by the generator or its underlying services.
Most users don't need to implement the IEmbeddingGenerator interface. However, if you're a library author, you can see a simple implementation at Sample implementations of IChatClient and IEmbeddingGenerator.
Create embeddings
The primary operation performed with an IEmbeddingGenerator<TInput,TEmbedding> is embedding generation, which is accomplished with its GenerateAsync method.
using Microsoft.Extensions.AI;
using OllamaSharp;
IEmbeddingGenerator<string, Embedding<float>> generator =
    new OllamaApiClient(new Uri("http://localhost:11434/"), "phi3:mini");
foreach (Embedding<float> embedding in
    await generator.GenerateAsync(["What is AI?", "What is .NET?"]))
{
    Console.WriteLine(string.Join(", ", embedding.Vector.ToArray()));
}
Accelerator extension methods also exist to simplify common cases, such as generating an embedding vector from a single input.
ReadOnlyMemory<float> vector = await generator.GenerateVectorAsync("What is AI?");
Pipelines of functionality
As with IChatClient, IEmbeddingGenerator implementations can be layered. Microsoft.Extensions.AI provides delegating implementations of IEmbeddingGenerator for caching and telemetry.
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.Caching.Memory;
using Microsoft.Extensions.Options;
using OllamaSharp;
using OpenTelemetry.Trace;
// Configure OpenTelemetry exporter
string sourceName = Guid.NewGuid().ToString();
TracerProvider tracerProvider = OpenTelemetry.Sdk.CreateTracerProviderBuilder()
    .AddSource(sourceName)
    .AddConsoleExporter()
    .Build();
// Explore changing the order of the intermediate "Use" calls to see
// what impact that has on what gets cached and traced.
IEmbeddingGenerator<string, Embedding<float>> generator = new EmbeddingGeneratorBuilder<string, Embedding<float>>(
        new OllamaApiClient(new Uri("http://localhost:11434/"), "phi3:mini"))
    .UseDistributedCache(
        new MemoryDistributedCache(
            Options.Create(new MemoryDistributedCacheOptions())))
    .UseOpenTelemetry(sourceName: sourceName)
    .Build();
GeneratedEmbeddings<Embedding<float>> embeddings = await generator.GenerateAsync(
[
    "What is AI?",
    "What is .NET?",
    "What is AI?"
]);
foreach (Embedding<float> embedding in embeddings)
{
    Console.WriteLine(string.Join(", ", embedding.Vector.ToArray()));
}
The IEmbeddingGenerator enables building custom middleware that extends the functionality of an IEmbeddingGenerator. The DelegatingEmbeddingGenerator<TInput,TEmbedding> class is an implementation of the IEmbeddingGenerator<TInput, TEmbedding> interface that serves as a base class for creating embedding generators that delegate their operations to another IEmbeddingGenerator<TInput, TEmbedding> instance. It allows for chaining multiple generators in any order, passing calls through to an underlying generator. The class provides default implementations for methods such as GenerateAsync and Dispose, which forward the calls to the inner generator instance, enabling flexible and modular embedding generation.
The following is an example implementation of such a delegating embedding generator that rate-limits embedding generation requests:
using Microsoft.Extensions.AI;
using System.Threading.RateLimiting;
public class RateLimitingEmbeddingGenerator(
    IEmbeddingGenerator<string, Embedding<float>> innerGenerator, RateLimiter rateLimiter)
        : DelegatingEmbeddingGenerator<string, Embedding<float>>(innerGenerator)
{
    public override async Task<GeneratedEmbeddings<Embedding<float>>> GenerateAsync(
        IEnumerable<string> values,
        EmbeddingGenerationOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        using var lease = await rateLimiter.AcquireAsync(permitCount: 1, cancellationToken)
            .ConfigureAwait(false);
        if (!lease.IsAcquired)
        {
            throw new InvalidOperationException("Unable to acquire lease.");
        }
        return await base.GenerateAsync(values, options, cancellationToken)
            .ConfigureAwait(false);
    }
    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            rateLimiter.Dispose();
        }
        base.Dispose(disposing);
    }
}
This can then be layered around an arbitrary IEmbeddingGenerator<string, Embedding<float>> to rate limit all embedding generation operations.
using Microsoft.Extensions.AI;
using OllamaSharp;
using System.Threading.RateLimiting;
IEmbeddingGenerator<string, Embedding<float>> generator =
    new RateLimitingEmbeddingGenerator(
        new OllamaApiClient(new Uri("http://localhost:11434/"), "phi3:mini"),
        new ConcurrencyLimiter(new()
        {
            PermitLimit = 1,
            QueueLimit = int.MaxValue
        }));
foreach (Embedding<float> embedding in
    await generator.GenerateAsync(["What is AI?", "What is .NET?"]))
{
    Console.WriteLine(string.Join(", ", embedding.Vector.ToArray()));
}
In this way, the RateLimitingEmbeddingGenerator can be composed with other IEmbeddingGenerator<string, Embedding<float>> instances to provide rate-limiting functionality.
Build with Microsoft.Extensions.AI
You can start building with Microsoft.Extensions.AI in the following ways:
- Library developers: If you own libraries that provide clients for AI services, consider implementing the interfaces in your libraries. This allows users to easily integrate your NuGet package via the abstractions. For example implementations, see Sample implementations of IChatClient and IEmbeddingGenerator.
- Service consumers: If you're developing libraries that consume AI services, use the abstractions instead of hardcoding to a specific AI service. This approach gives your consumers the flexibility to choose their preferred provider.
- Application developers: Use the abstractions to simplify integration into your apps. This enables portability across models and services, facilitates testing and mocking, leverages middleware provided by the ecosystem, and maintains a consistent API throughout your app, even if you use different services in different parts of your application.
- Ecosystem contributors: If you're interested in contributing to the ecosystem, consider writing custom middleware components.
For more samples, see the dotnet/ai-samples GitHub repository. For an end-to-end sample, see eShopSupport.