Service Bus Azure Function repeatedly fails with timeout when trying to start a Durable Functions Orchestration
We have a Service Bus-triggered function that listens for a Service Bus message and starts a Durable Functions orchestration. A couple of times a day, it starts repeatedly failing with timeout RpcExceptions, presumably when calling the underlying Functions runtime. The Azure troubleshooting diagnostics show "Port 4001 in use" errors, which would also point to the Azure Functions runtime failing. The only fix we have found is to restart the entire Functions App.
Time period: 2025-10-23, 8:15 AM to 3:00 PM US Central Time (America/Chicago)
In the logging below, we restarted the Functions App at ~8:50 AM and ~3:00 PM. The spikes in autoscale instance count are, I assume, the Service Bus functions starting a batch of Durable orchestrations.
It seems that when we scale down to a single autoscale instance, that instance's Functions runtime dies and can't restart, and our functions stay stuck until we restart the app. With multiple instances running, the setup seems more robust: as long as at least one instance has a healthy Functions runtime, orchestration starts can still get through.
NOTE: I can provide error log screenshots and similar, but it isn't letting me include them.