IoT Hub Connectivity Disruption

Tim A. Smith 46 Reputation points
2025-10-16T18:52:34.59+00:00

I do not understand this message:

What happened
The IoT Hub resource 'sc364' is experiencing performance and scalability issues, with metrics showing declines in cloud-to-device message deliveries, device-to-cloud telemetry, connected devices, and data usage. These issues are affecting throughput and reliability, potentially disrupting device communication and data transfer.

Possible explanation
A decline in 'C2D message deliveries completed', 'Telemetry messages sent', 'Connected devices', and related metrics on 'sc364' may suggest device-side network connectivity issues or authentication problems, reducing communication with the IoT hub. This could also affect metrics like 'Total device data usage', as disconnected or unauthenticated devices fail in data transfer or message delivery, worsening the observed drops. Limited information prevents a definitive conclusion.

What can be done next
Investigate Alerts and Anomalies: Use Azure Monitor to check 'scc3644' for alerts or anomalies related to device connectivity or authentication failures.
Review Logs and Metrics: Analyze logs and metrics in 'scc3644' for patterns in failed authentications or network errors affecting device communication.
Validate Network Configurations: Ensure device-side network configurations and connectivity are correctly set up for reliable connections to 'scc3644'.
Monitor Device Metrics: Track device-specific metrics in 'scc3644' to identify issues affecting particular devices or groups.

This was generated by the Investigate link in my email alert. Nothing changed. I did not change any Azure Function, publish any code, or change any settings. I got two alerts: 1:34-1:49 AM and 1:59-2:23 AM. My IoT devices are very geographically diverse. I have about 300 devices across 7 US states. Approximately half use AT&T modems and another almost half use Verizon. A small percentage are wired connections to a local ISP. For this to be a network level event on my end would be unlikely. All the devices use a SAS Token and a cert for connections to my hub. The tokens were all created at different times with 10 year expiration dates. It'll be at least 3 years before one expires. And since everything is running normally now, I highly doubt it is an authentication issue on my end. The big "common connection" between all these devices is the IoT hub.

I have 2 questions:

  1. What happened?
  2. What can I do about it?
Azure IoT Hub
An Azure service that enables bidirectional communication between internet of things (IoT) devices and applications.

Answer accepted by question author
  1. SRILAKSHMI C 8,545 Reputation points Microsoft External Staff Moderator
    2025-10-23T04:50:57.7166667+00:00

    Hi Tim A. Smith,

    I completely understand your frustration. You're absolutely right that "looking at logs after the fact" isn't a satisfying or proactive solution, especially when uptime directly impacts your customers. Let's make sure you have the right tools and setup so that next time you are notified immediately and have more visibility into what is happening in real time.

    Here are some actionable steps you can take to improve detection, alerting, and resilience:

    1. Set Up Real-Time Alerts
    • Use Azure Monitor Alerts on key IoT Hub metrics such as:
      • Connected devices
      • C2D messages completed
      • Telemetry messages sent
      • Throttled requests
    • Configure Action Groups to send SMS, voice call, or push notifications (via the Azure mobile app) so that you are alerted right away, not just by email. Please refer to "Create and manage action groups in Azure Monitor".
    2. Enable IoT Hub Diagnostic Settings

    Enable diagnostic logs and send them to Log Analytics or Event Hub for near real-time tracking of connection state changes, authentication failures, or throttling.

    • You can then build custom alerts on specific log patterns, for example when a large number of devices disconnect within a short window.
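
To make the "many devices disconnect within a short window" pattern concrete, here is a minimal Python sketch of the detection logic. It is an illustration only: the event format (timestamp, device ID), the 5-minute window, and the threshold of 50 devices are all assumptions, and in practice this logic would more likely live in a log query alert than in a script.

```python
from collections import deque
from datetime import datetime, timedelta


def find_disconnect_bursts(events, window=timedelta(minutes=5), threshold=50):
    """Return timestamps at which at least `threshold` distinct devices
    disconnected within the preceding `window`.

    `events` is an iterable of (timestamp, device_id) pairs for disconnect
    events, e.g. parsed from exported connection logs (format assumed here).
    """
    recent = deque()  # (timestamp, device_id) pairs still inside the window
    bursts = []
    for ts, device_id in sorted(events):
        recent.append((ts, device_id))
        # Drop events that have fallen out of the sliding window.
        while recent and ts - recent[0][0] > window:
            recent.popleft()
        distinct_devices = {d for _, d in recent}
        if len(distinct_devices) >= threshold:
            bursts.append(ts)
    return bursts
```

Counting distinct devices rather than raw events keeps a single flapping device from triggering the alert on its own.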
    3. Use Azure Service Health for Regional Outages

    Subscribe to Azure Service Health alerts for IoT Hub and its dependent services in your region.

    4. Add Application-Level Resilience

    Even though this specific incident was transient on the Azure side, adding the following at the device/application level can help:

    • Implement automatic retry with exponential backoff in device SDKs.
    • Cache telemetry locally for short outages and resend once the connection recovers.
    • Optionally, use multiple IoT Hubs (primary + secondary) for high availability in critical scenarios.
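
The retry and local caching ideas above can be sketched as follows. This is a minimal, SDK-independent illustration: `backoff_delays`, `TelemetryBuffer`, and the `send` callback are hypothetical names, and the base delay, cap, and buffer size are assumed values. Device SDKs often include their own retry behavior, so check what yours already does before layering another policy on top.

```python
import random
from collections import deque


def backoff_delays(base=1.0, cap=60.0, attempts=6, rng=random.random):
    """Yield retry delays in seconds: exponential growth, capped, with full jitter."""
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng() * ceiling


class TelemetryBuffer:
    """Bounded FIFO that holds messages while the hub is unreachable."""

    def __init__(self, max_messages=1000):
        # deque(maxlen=...) silently drops the oldest entry when full.
        self._queue = deque(maxlen=max_messages)

    def add(self, message):
        self._queue.append(message)

    def flush(self, send):
        """Send buffered messages oldest-first and stop at the first failure.

        `send` is a caller-supplied function (hypothetical) that raises on
        failure. Returns the number of messages sent.
        """
        sent = 0
        while self._queue:
            message = self._queue[0]
            try:
                send(message)
            except Exception:
                break  # keep the message for the next attempt
            self._queue.popleft()
            sent += 1
        return sent

    def __len__(self):
        return len(self._queue)
```

The jitter matters for a fleet of around 300 devices: without it, every device that dropped at the same moment would also retry at the same moment.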

    Thank you!


1 additional answer

  1. SRILAKSHMI C 8,545 Reputation points Microsoft External Staff Moderator
    2025-10-17T06:50:12.2366667+00:00

    Hello Tim A. Smith,

    Welcome to Microsoft Q&A, and thank you for reaching out to us.

    It sounds like you’re dealing with some confusing connectivity issues with your IoT Hub, and I understand how frustrating that must be. I’ve reviewed the alert you received and the details you provided about your devices, and I’ve put together a breakdown of what likely happened and recommended next steps.

    What Happened

    The message you received indicates that your IoT Hub ‘sc364’ experienced performance and scalability issues during the early morning hours (1:34–1:49 AM and 1:59–2:23 AM). Metrics during this period showed declines in cloud-to-device (C2D) message deliveries, device-to-cloud telemetry, connected devices, and total device data usage. These drops suggest that device communication to and from the hub was temporarily affected, reducing throughput and reliability.

    Based on your setup, with ~300 devices spread across multiple states and different networks (AT&T, Verizon, and wired connections), it's unlikely that the issue originated solely on your end. While network or authentication issues could contribute in some cases, the alert points toward a transient performance or capacity issue on the Azure IoT Hub itself. This can occur due to internal throttling, maintenance, or temporary scaling limits within the Azure region.

    What You Can Do

    Even though things are running normally now, here are steps to investigate and prevent similar incidents in the future:

    Use Azure Monitor to review any alerts or anomalies for ‘sc364’ during the periods you observed. Look specifically for device connectivity drops, throttling, or authentication failures.

    Review IoT Hub metrics and logs for patterns of failed authentications, throttled requests, or network errors. This can confirm whether the hub was experiencing temporary performance issues or if specific devices were affected.

    Check that your devices’ network settings are correct and stable for consistent connections to ‘sc364’. While this seems unlikely given your distributed setup, verifying configurations ensures that no device-side issues contribute to connectivity drops.

    Track device-specific metrics to identify any unusual behavior in particular devices or groups. This can help isolate if a subset of devices is experiencing intermittent failures.

    Enable and review IoT Hub diagnostic logs. Look for error codes such as 404104 (device connection closed remotely) that may provide guidance on required troubleshooting steps.

    Since your devices are geographically diverse, external factors such as regional network events or Azure infrastructure updates could temporarily impact connectivity.

    If possible, retrieve logs directly from your devices to see if they report any errors or unusual behavior during the affected periods. This can provide further context and help distinguish between hub-side versus device-side issues.
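
One way to use device-side logs for the hub-side versus device-side question is to check whether the disconnects were spread across every carrier or concentrated in one. The sketch below is a rough heuristic, not a diagnosis: the function names, the device-to-group mapping, and the 50% cutoff are all assumptions made for illustration.

```python
from collections import Counter


def affected_share_by_group(fleet, disconnected_ids):
    """Return {group: fraction of that group's devices that disconnected}.

    `fleet` maps device_id -> group label (carrier, state, ...).
    `disconnected_ids` is the set of devices that dropped during the incident.
    """
    totals = Counter(fleet.values())
    hits = Counter(fleet[d] for d in disconnected_ids if d in fleet)
    return {group: hits[group] / total for group, total in totals.items()}


def looks_fleet_wide(shares, min_share=0.5):
    """True when every group lost at least `min_share` of its devices."""
    return bool(shares) and all(s >= min_share for s in shares.values())
```

If AT&T, Verizon, and wired devices all dropped in similar proportions, a shared dependency such as the hub is the more plausible cause; a drop confined to one carrier points back to that network.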

    The alert points to a transient IoT Hub performance event rather than a configuration or authentication issue on your end. While everything is operating normally now, monitoring metrics, enabling alerts, validating device networks, and reviewing diagnostic logs are recommended to ensure ongoing reliability. If similar issues occur repeatedly, contacting Microsoft Support is advisable so they can review regional IoT Hub service health logs for your instance.

    Please refer to "Unexpectedly disconnected from IoT Hub".

    I hope this helps. Do let me know if you have any further queries.


    If this answers your query, please click "Accept Answer" and select "Yes" for "Was this answer helpful".

    Thank you!

