Hi there,
In Azure OpenAI, context retention is limited by the model's maximum token window, so after several exchanges, older messages get truncated. There isn't a built-in "infinite memory," so for longer conversations you need to manage context yourself:

- Manage conversation history manually: store prior messages and selectively include the relevant parts in each prompt rather than the full chat.
- Use summarization to compress earlier context into a shorter form before sending it with new turns.
- For multi-turn bots, embed key conversation points in a vector store and retrieve them as context for the model. This helps maintain continuity without hitting token limits.

Careful prompt engineering and context management are essential here.
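A minimal sketch of the manual history management idea: keep the system prompt plus only the most recent messages that fit a token budget. The `trim_history` helper below is hypothetical (not part of any SDK), and it approximates token counts by word count for simplicity; in a real bot you would use a tokenizer such as tiktoken for the deployed model.

```python
def estimate_tokens(message: dict) -> int:
    # Rough proxy: one "token" per whitespace-separated word.
    # Replace with a real tokenizer (e.g. tiktoken) in production.
    return len(message["content"].split())

def trim_history(messages: list[dict], budget: int) -> list[dict]:
    """Keep the system message (if any) plus the newest messages that fit."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    kept, used = [], sum(estimate_tokens(m) for m in system)
    for m in reversed(rest):  # walk newest -> oldest
        cost = estimate_tokens(m)
        if used + cost > budget:
            break  # oldest messages beyond the budget are dropped
        kept.append(m)
        used += cost
    return system + list(reversed(kept))  # restore chronological order

history = [
    {"role": "system", "content": "You are a helpful bot"},
    {"role": "user", "content": "first question about billing details"},
    {"role": "assistant", "content": "first answer with some explanation"},
    {"role": "user", "content": "follow up question"},
]
trimmed = trim_history(history, budget=15)
# trimmed now holds the system message plus the newest turns that fit,
# and is what you would pass as `messages` to the chat completions call.
```

Summarization follows the same pattern: instead of dropping the oldest messages, send them to the model once with a "summarize this conversation" prompt and keep the summary as a single message in their place.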
If this helps, kindly accept the answer.