Thanks for reaching out. This appears to be a multi-faceted issue related to using Retrieval-Augmented Generation (RAG) with Azure OpenAI's GPT-5 via the REST API.
1. Solution for the "unsupported parameter: max_tokens" error
This is the most explicit error message and points to a change in API parameter naming for newer models like GPT-5, o1, and gpt-4o-mini in Azure OpenAI.
The fix:
You should remove the unsupported parameter and replace it with the correct one in your JSON request body.
· Remove: "max_tokens": [value]
· Use instead: "max_completion_tokens": [value]
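As a sketch, the corrected request body might look like the following; the message content and token limit here are placeholders, not values from your request:

```python
import json

# Sketch of a corrected Chat Completions request body.
# The key change: "max_completion_tokens" replaces "max_tokens".
payload = {
    "messages": [
        {"role": "user", "content": "Summarize the onboarding policy."}
    ],
    # "max_tokens": 800,           # removed: rejected by newer models
    "max_completion_tokens": 800,  # supported replacement
}

print(json.dumps(payload, indent=2))
```

The rest of the body (messages, data_sources, etc.) stays the same; only the parameter name changes.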
Note on the other unsupported parameters: you also mentioned issues with reasoning_effort and verbosity.
· reasoning_effort: This is a parameter specific to certain advanced reasoning models (such as the full GPT-5) and may not be supported or available in the standard Chat Completions API and RAG integration, or it may be a model-specific parameter. If you are getting an error, it's best to remove it unless the documentation for your specific model confirms its support in this context.
· verbosity: This parameter is also model-specific and is generally used to control the verbosity of certain advanced model outputs (such as tool-calling or reasoning steps). Remove it if you receive an error.
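One defensive pattern is to strip any parameter the target model rejects before sending the request. This is a hypothetical helper, not part of any Azure SDK; adjust the set of keys to match what your model's documentation actually rejects:

```python
# Keys that some newer Azure OpenAI models reject (assumption based on the
# errors above; verify against your model's documentation).
UNSUPPORTED_KEYS = {"max_tokens", "reasoning_effort", "verbosity"}

def strip_unsupported(body: dict) -> dict:
    """Return a copy of the request body without known-unsupported keys."""
    return {k: v for k, v in body.items() if k not in UNSUPPORTED_KEYS}

body = {
    "messages": [{"role": "user", "content": "Hello"}],
    "max_completion_tokens": 800,
    "verbosity": "low",  # will be stripped before sending
}
clean_body = strip_unsupported(body)
print(sorted(clean_body))
```

This keeps one place to update as parameter support changes across API versions, instead of editing every call site.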
Reference link: Azure OpenAI in Azure AI Foundry Models v1 REST API reference - Azure OpenAI | Microsoft Learn
2. Solution for Status 400 error (Bad request)
A status 400 with a generic error message, after the parameter name is fixed, often indicates a fundamental issue with the request structure or the RAG configuration itself.
Common causes for a 400 error in Azure OpenAI RAG:
· Missing or incorrect authentication in data_sources: Your configuration shows authentication set to api_key, but the key is masked as …. Double-check that the API key provided for your Azure AI Search resource is correct and has the necessary permissions.
· Required parameter missing for the search type: Your configuration uses query_type: "semantic", which requires a semantic configuration name (semantic_configuration: "semantic-config"), which you have. However, for certain models or search configurations, other parameters such as an embedding endpoint are required, especially if your data source has vector fields and the model does not handle the embedding internally.
o Check for an embedding endpoint requirement: If your index relies on a vector store, ensure the REST API call correctly specifies how the embeddings are handled, or whether a separate embeddingEndpoint is now required for your model/API-version combination.
Action for status 400:
1. Verify the API key: Ensure the key under authentication in data_sources is the correct admin key for your Azure AI Search service.
2. Verify the index name: Double-check that the index name exists and is spelled correctly in your Azure AI Search instance.
3. Check for an embedding endpoint (if using vector search): If the RAG service requires an embedding model deployment URI for vector search/reranking, you may need to add an embedding endpoint parameter within the data_sources section, or ensure you are using a model that handles this implicitly.
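Putting the checks above together, a data_sources block for "On Your Data" with Azure AI Search might be structured as follows. The endpoint, index name, key, and deployment name are placeholders you must replace, and the commented-out embedding section is an assumption to verify against the On Your Data reference for your API version:

```python
# Sketch of a data_sources block for Azure OpenAI "On Your Data" with
# Azure AI Search. All <...> values are placeholders, not real resources.
data_sources = [
    {
        "type": "azure_search",
        "parameters": {
            "endpoint": "https://<your-search-resource>.search.windows.net",
            "index_name": "<your-index-name>",  # must exist in the service
            "query_type": "semantic",
            # Required when query_type is "semantic":
            "semantic_configuration": "semantic-config",
            "authentication": {
                "type": "api_key",
                "key": "<your-search-admin-key>",  # the real key, not masked
            },
            # If the index has vector fields and the model does not embed
            # internally, an embedding dependency may also be required
            # (field name depends on your API version; verify in the docs):
            # "embedding_dependency": {
            #     "type": "deployment_name",
            #     "deployment_name": "<your-embedding-deployment>",
            # },
        },
    }
]
print(data_sources[0]["parameters"]["query_type"])
```

If the 400 persists with this structure, compare each field name against the On Your Data REST reference for the exact api-version you are calling, since field names have changed between versions.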
Reference links:
· Azure OpenAI On Your Data Python & REST API reference - Azure OpenAI | Microsoft Learn
· Quickstart: Generative Search (RAG) - Azure AI Search | Microsoft Learn
· Connect using API keys - Azure AI Search | Microsoft Learn
Let me know if you need any further help with this. We'll be happy to assist.
If you find this helpful, please mark this as answered.