Long latency and dropped messages to the front end

Mikhail Soloviev 0 Reputation points
2025-10-22T15:28:27.9133333+00:00

We're experiencing high latency and dropped messages on the front end, which is a major disruption to our app's functionality. The behaviour looks very similar to what we saw when we hit Cosmos DB quotas. This time, however, there are no 429 throttling errors, and increasing the request units does not help either.
MQTT messages from the IoT devices come in normally; using Azure Explorer we can see that they arrive as they should.

Azure IoT Hub
An Azure service that enables bidirectional communication between internet of things (IoT) devices and applications.

1 answer

  1. Vinodh247 39,291 Reputation points MVP Volunteer Moderator
    2025-10-22T16:32:06.4133333+00:00

    Hi,

    Thanks for reaching out to Microsoft Q&A.

    The application is experiencing high latency and message drops on the front end, severely affecting functionality. MQTT messages from IoT devices are confirmed to arrive normally, as verified through Azure Explorer, indicating that the issue is not with device connectivity or message ingestion.

    Previously, similar symptoms occurred when Cosmos DB throughput limits were reached, resulting in 429 throttling errors. This time, however, no throttling events are observed, and increasing the request units (RUs) does not improve performance, ruling out direct quota saturation.

    The issue likely lies between the ingestion and front-end delivery layers, potentially in the message processing, event routing (Event Hubs, Stream Analytics, or Service Bus), or API response path. Bottlenecks in query latency, partition key design, or hot partitions within Cosmos DB could also contribute without triggering throttling.
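    As a quick check on the Cosmos DB side, a minimal sketch like the following (assuming the azure-cosmos Python SDK; the account, key, database, container, and the query itself are placeholders for your own resources) prints the client-side latency and RU charge of a representative front-end query. Unusually high charges or latencies for particular partition key values can point to a hot partition or an expensive cross-partition query even when no 429s are returned.

    ```python
    import time

    from azure.cosmos import CosmosClient

    # Placeholders: replace with your own account endpoint, key, and resource names.
    client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
    container = client.get_database_client("<database>").get_container_client("<container>")

    # Hypothetical query shape; substitute a query your front end actually runs.
    query = "SELECT * FROM c WHERE c.deviceId = @deviceId"

    start = time.perf_counter()
    items = list(container.query_items(
        query=query,
        parameters=[{"name": "@deviceId", "value": "<device-id>"}],
        enable_cross_partition_query=True,  # cross-partition fan-out is a common hidden cost
    ))
    elapsed_ms = (time.perf_counter() - start) * 1000

    # The RU charge of the most recent response (last page, if the query paginates)
    # is exposed via the response headers.
    charge = container.client_connection.last_response_headers.get("x-ms-request-charge")
    print(f"items={len(items)}  client latency={elapsed_ms:.1f} ms  request charge={charge} RU")
    ```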

    Next steps to check:

    Check end-to-end telemetry (Application Insights, Azure Monitor) for latency spikes; a workspace query sketch follows this list.

    Review Cosmos DB diagnostics for request latency, partition load, and RU consumption patterns.

    Verify whether the API or streaming layer has a backlog or scaling issues; see the consumer-backlog sketch after this list.

    Enable detailed logging to identify where message loss or delay begins.
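    For the telemetry check, a minimal sketch along these lines (assuming a workspace-based Application Insights resource plus the azure-monitor-query and azure-identity packages; the workspace ID is a placeholder) pulls request-latency percentiles in 5-minute buckets so you can see when the slowdown begins. If Cosmos DB diagnostic settings route logs to the same Log Analytics workspace, the same approach works against its data-plane request table for per-operation latency and RU consumption.

    ```python
    from datetime import timedelta

    from azure.identity import DefaultAzureCredential
    from azure.monitor.query import LogsQueryClient

    client = LogsQueryClient(DefaultAzureCredential())

    # AppRequests is the workspace-based Application Insights request table;
    # adjust the table and columns if your telemetry lands elsewhere.
    query = """
    AppRequests
    | summarize p50 = percentile(DurationMs, 50),
                p95 = percentile(DurationMs, 95),
                failures = countif(Success == false)
      by bin(TimeGenerated, 5m)
    | order by TimeGenerated asc
    """

    response = client.query_workspace(
        workspace_id="<log-analytics-workspace-id>",  # placeholder
        query=query,
        timespan=timedelta(hours=6),
    )

    for table in response.tables:
        for row in table.rows:
            print(row)
    ```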
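    For the streaming-layer check, a sketch like the one below (assuming an Event Hubs-compatible ingestion path, the azure-eventhub package, and a blob-based checkpoint store; all names and connection strings are placeholders) compares each partition's last enqueued sequence number with the consumer group's checkpoint. A steadily growing gap means the downstream processor is falling behind, which would explain front-end delays and drops even though device messages arrive on time.

    ```python
    from azure.eventhub import EventHubConsumerClient
    from azure.eventhub.extensions.checkpointstoreblob import BlobCheckpointStore

    # Placeholders: substitute your own namespace, hub, consumer group, and storage account.
    NAMESPACE = "<namespace>.servicebus.windows.net"
    EVENTHUB = "<event-hub-name>"
    CONSUMER_GROUP = "$Default"

    checkpoint_store = BlobCheckpointStore.from_connection_string(
        "<storage-connection-string>", "<checkpoint-container>"
    )

    # Checkpoints written by the processor: partition id -> last processed sequence number.
    checkpoints = {
        cp["partition_id"]: cp.get("sequence_number")
        for cp in checkpoint_store.list_checkpoints(NAMESPACE, EVENTHUB, CONSUMER_GROUP)
    }

    consumer = EventHubConsumerClient.from_connection_string(
        "<event-hub-connection-string>",
        consumer_group=CONSUMER_GROUP,
        eventhub_name=EVENTHUB,
        logging_enable=False,  # flip to True for detailed SDK logging when tracing delays
    )

    with consumer:
        for pid in consumer.get_partition_ids():
            props = consumer.get_partition_properties(pid)
            latest = props["last_enqueued_sequence_number"]
            processed = checkpoints.get(pid)
            backlog = "unknown (no checkpoint)" if processed is None else latest - processed
            print(f"partition {pid}: last enqueued={latest}, checkpointed={processed}, backlog={backlog}")
    ```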

    This is likely a downstream processing or read-path performance issue rather than an ingestion failure.

    Please 'Upvote' (thumbs-up) and 'Accept' the answer if the reply was helpful. This will benefit other community members who face the same issue.

    No comments
