Thanks for sharing the detailed context and code snippet!
You’re absolutely right: by default, Azure AI Foundry agents currently resend the entire thread context (user messages, model outputs, and intermediate tool responses) with each new run. This is by design: Foundry’s thread-based memory maintains full conversational continuity so the model behaves consistently across turns.
That said, if you’d like to optimize or control memory usage, here are a few approaches you can consider:
1. **Manage conversation memory manually.** You can implement a custom memory handler that stores only selected parts of the conversation (for example, just the last N turns) and replays them before calling messages.create() (see the first sketch after this list). At the moment, Foundry doesn’t expose a public API to replace the internal memory manager directly; memory threads are handled automatically.
2. **Use a new thread for short or stateless interactions.** If you don’t need the full history every time, create a fresh thread_id per query (second sketch below). This avoids sending long histories to the model, which helps reduce token consumption.
3. **Use lightweight summarization.** Some developers maintain a condensed context by periodically summarizing previous interactions and sending that summary back with each new request (for example, folded into the agent’s instructions or the next user message); this helps maintain continuity while keeping the payload smaller (third sketch below).
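To make option 1 concrete, here is a minimal sketch of a manual memory handler. It assumes the azure-ai-agents 1.x client surface (AgentsClient with threads.create, messages.create, runs.create_and_process, and ListSortOrder/MessageRole from azure.ai.agents.models); the endpoint, agent_id, and MAX_TURNS values are placeholders, and method names may differ slightly across SDK versions:

```python
from collections import deque

from azure.ai.agents import AgentsClient
from azure.ai.agents.models import ListSortOrder, MessageRole
from azure.identity import DefaultAzureCredential

client = AgentsClient(
    endpoint="https://<your-foundry-project-endpoint>",  # placeholder
    credential=DefaultAzureCredential(),
)

MAX_TURNS = 5                          # keep only the last N exchanges (tunable)
history = deque(maxlen=MAX_TURNS * 2)  # one user + one assistant entry per turn


def ask(agent_id: str, user_text: str) -> str:
    # Seed a short-lived thread with only the retained turns, instead of
    # letting one long-lived thread accumulate the whole conversation.
    thread = client.threads.create()
    for role, text in history:
        client.messages.create(thread_id=thread.id, role=role, content=text)
    client.messages.create(thread_id=thread.id, role="user", content=user_text)

    run = client.runs.create_and_process(thread_id=thread.id, agent_id=agent_id)
    if run.status == "failed":
        raise RuntimeError(f"Run failed: {run.last_error}")

    # Newest-first listing: the first agent message is the fresh reply.
    reply = next(
        m.text_messages[-1].text.value
        for m in client.messages.list(thread_id=thread.id, order=ListSortOrder.DESCENDING)
        if m.role == MessageRole.AGENT and m.text_messages
    )
    history.append(("user", user_text))
    history.append(("assistant", reply))
    return reply
```

Because the internal memory manager can’t be swapped out, the sketch sidesteps it: every call builds a fresh thread from the locally bounded history, so the payload never grows past N turns.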
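Option 2 is the degenerate case of the same pattern: a one-shot helper (reusing the client and imports from the sketch above) that never replays any history at all:

```python
def ask_stateless(agent_id: str, user_text: str) -> str:
    thread = client.threads.create()  # fresh thread_id: no history to resend
    client.messages.create(thread_id=thread.id, role="user", content=user_text)
    run = client.runs.create_and_process(thread_id=thread.id, agent_id=agent_id)
    if run.status == "failed":
        raise RuntimeError(f"Run failed: {run.last_error}")
    return next(
        m.text_messages[-1].text.value
        for m in client.messages.list(thread_id=thread.id, order=ListSortOrder.DESCENDING)
        if m.role == MessageRole.AGENT and m.text_messages
    )
```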
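And a sketch of option 3, building on the two helpers above. The threshold and summarization prompt are illustrative, not an official Foundry mechanism; and since thread messages in this API are user/assistant (there is no system role on threads), the summary is prepended to the next user message, though folding it into the agent’s instructions would also work:

```python
SUMMARY_THRESHOLD = 10  # local messages to retain before compressing (tunable)
summary = ""


def compress_history(agent_id: str) -> None:
    global summary
    if len(history) < SUMMARY_THRESHOLD:
        return
    transcript = "\n".join(f"{role}: {text}" for role, text in history)
    prompt = (
        "Summarize the conversation below in under 150 words, preserving "
        "names, decisions, and open questions:\n\n" + transcript
    )
    summary = ask_stateless(agent_id, prompt)  # one-off run, nothing stored
    history.clear()  # the summary now stands in for the dropped turns


def ask_with_summary(agent_id: str, user_text: str) -> str:
    compress_history(agent_id)
    if summary:
        # Carry continuity forward as a compact preamble rather than
        # replaying the full transcript.
        user_text = f"Summary of the conversation so far: {summary}\n\n{user_text}"
    return ask(agent_id, user_text)
```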
Foundry doesn’t yet provide an official configuration option to switch out or limit the internal memory manager, but this capability is under consideration as part of future SDK enhancements.
To help us understand your setup better and guide you further, could you please share a bit more information?

- What does your current agent configuration look like (especially how messages are created and handled)?
- Roughly how many previous turns are currently retained in the conversation history?
- Which versions of azure-ai-agents and azure-ai-projects are you using (a quick check of your pip list is enough)?
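If it saves you scanning pip list by hand, this small standard-library snippet prints both versions:

```python
# Print the installed versions of the two SDK packages.
from importlib import metadata

for pkg in ("azure-ai-agents", "azure-ai-projects"):
    try:
        print(pkg, metadata.version(pkg))
    except metadata.PackageNotFoundError:
        print(pkg, "not installed")
```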
Once we have that, we can suggest a more tailored approach for managing memory efficiently.
Hope this helps!