Health Probe Failing After Azure Load Balancer SKU Upgrade – One SQL Server Unhealthy

GOPINATH M 60 Reputation points
2025-10-03T12:10:31.6133333+00:00

We have an Azure setup with a SQL pool containing two SQL servers (SQLServer-0 and SQLServer-1) behind a Load Balancer. The Load Balancer is configured with a custom TCP health probe on port 59999.

Everything worked as expected before we upgraded the Load Balancer from Basic SKU to Standard SKU. After the SKU upgrade and the IP change, one of the backend SQL servers (SQLServer-0) consistently reports as unhealthy, while the other (SQLServer-1) remains healthy. The servers are in the same subnet and have identical NSG, firewall, and health probe configurations.

We validated that no internal application or process is listening on port 59999 on the affected server.

We have noticed the same behavior in other customer environments after similar SKU upgrades: one server in the pool always appears unhealthy.

Troubleshooting steps taken:

  1. Verified NSG and OS firewall rules; both servers are configured identically.
  2. On the unhealthy SQL VM (SQLServer-0), port 59999 is not in a listening state.
  3. Checked for port conflicts or usage; nothing is bound to port 59999.
  4. Telnet tests between servers:
    • From SQLServer-1 → SQLServer-0 (port 59999): fails
    • From SQLServer-0 → SQLServer-1 (port 59999): succeeds
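
For anyone who wants to repeat the telnet checks above in a scriptable way, here is a minimal Python sketch of the same TCP connect test. The helper name `tcp_port_open` and the 3-second timeout are illustrative assumptions, not part of our setup; it only reports whether a TCP handshake completes, which is roughly what a TCP health probe needs.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake to host:port completes within `timeout`.

    A refused connection, a timeout, or a name-resolution failure all
    count as "not open"; this does not distinguish between them.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example usage (host names are the ones from this question):
#   tcp_port_open("SQLServer-0", 59999)
#   tcp_port_open("SQLServer-1", 59999)
```

A `False` result alone does not say whether the port is closed (nothing listening) or filtered (blocked by an NSG or firewall); it has to be read together with the listening-state check in step 2.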

Could this be a known issue or bug related to Standard SKU Load Balancer behavior, especially regarding probe source IP or firewall behavior?

Are there any known issues with probe traffic routing or port binding after SKU migration?

We would appreciate any insights or recommended steps to resolve this issue, as it's now impacting multiple SQL load-balanced deployments.


SQL Server on Azure Virtual Machines

Answer accepted by question author
  1. Pratyush Vashistha 4,255 Reputation points Microsoft External Staff Moderator
    2025-10-09T06:13:36.6366667+00:00

    Hello GOPINATH M

    Thank you for asking your question on the Microsoft Q&A Portal.

    You’re seeing one SQL Server (SQLServer-0) consistently marked unhealthy after upgrading your Load Balancer from Basic to Standard SKU, even though both servers have identical NSG, firewall, and probe settings and port 59999 is not in use on the unhealthy server.

    With Azure Load Balancer, health probes originate from the platform address 168.63.129.16 (service tag AzureLoadBalancer), not from the load balancer's frontend IP, and this is the same for Basic and Standard SKUs. Probes can fail if backend VMs have restrictive NSGs or OS firewalls that don’t allow traffic from that address.

    Could you please share more details that will help me narrow this down:

    1. Can you confirm the exact source IP address the health probe is using? You can capture it by enabling NSG flow logs or running tcpdump on the unhealthy server during a probe.
    2. Are there any “Deny” rules in the NSG applied to SQLServer-0 that might block traffic from the Load Balancer’s frontend IP or backend pool subnet?
    3. Have you tested opening port 59999 to “Any” source temporarily just to verify if the issue resolves? (Don’t leave it open in production.)
    4. Is the Load Balancer’s frontend IP in the same VNet/subnet as the backend VMs? If not, ensure route tables and NSGs allow cross-subnet traffic.
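
To reason about question 2 above, it can help to remember that NSG rules are evaluated in priority order (lowest number first) and that evaluation stops at the first matching rule. The following is a rough sketch of that first-match logic only. The `Rule` class, `first_match` function, and all addresses are hypothetical, and real NSGs also match on direction, protocol, port ranges, destinations, and service tags, which this deliberately ignores.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    """Simplified, hypothetical stand-in for an inbound NSG rule."""
    priority: int          # lower number = evaluated first
    source: str            # CIDR prefix, or "*" for any source
    port: Optional[int]    # destination port, or None for any port
    allow: bool            # True = Allow, False = Deny


def first_match(rules: List[Rule], src_ip: str, port: int) -> Optional[Rule]:
    """Return the first rule (by priority) matching src_ip and port, or None."""
    ip = ipaddress.ip_address(src_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        src_ok = rule.source == "*" or ip in ipaddress.ip_network(rule.source)
        port_ok = rule.port is None or rule.port == port
        if src_ok and port_ok:
            return rule
    return None


# Hypothetical rule set: allow the probe port from one subnet, deny the rest.
rules = [
    Rule(priority=100, source="10.0.1.0/24", port=59999, allow=True),
    Rule(priority=200, source="*", port=None, allow=False),
]
```

With this rule set, `first_match(rules, "10.0.2.4", 59999)` returns the priority 200 deny rule, which illustrates how a broad Deny can shadow traffic you expected to be allowed when its source falls outside the allowed prefix.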


    Please share the details for the questions above, and I will try to provide a more tailored response based on your input.

    Thanks

    Pratyush


1 additional answer

  1. JimmySalian-2011 44,696 Reputation points
    2025-10-04T08:33:38.7033333+00:00

    Hi Gopinath,

    The configuration you have is supported, and it is not a unique setup: it is a simple LB with two backend servers. Did you check the log file after the upgrade process? How did you carry out the upgrade?

    On the unhealthy SQL VM (SQLServer-0), port 59999 is not in a listening state.

    I think the issue is related to the SQL availability group (AG). If the probe port is not up and listening, the probe will fail and the backend will be reported as down. Please check the troubleshooting steps for this: https://free.blessedness.top/en-us/troubleshoot/sql/database-engine/availability-groups/troubleshooting-availability-group-failover
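
One way to separate "nothing is listening on the port" from "traffic to the port is blocked" is to run a temporary listener on the affected server and retry the connection test. The sketch below is only an illustration under assumptions: the function name `accept_one`, the 30-second timeout, and using Python on the SQL VM are not from this thread, and it should only be run while nothing else is bound to the port.

```python
import socket
from typing import Optional, Tuple


def accept_one(port: int, host: str = "0.0.0.0",
               timeout: float = 30.0) -> Optional[Tuple[str, int]]:
    """Listen on host:port, accept a single TCP connection, then stop.

    Returns the peer's (ip, port) so the source address of whoever
    connected is visible, or None if nobody connected within `timeout`.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((host, port))
        srv.listen(1)
        srv.settimeout(timeout)
        try:
            conn, peer = srv.accept()
        except socket.timeout:
            return None
        conn.close()
        return peer


# Example usage on the affected server:
#   print(accept_one(59999))
```

If a connection from the other server arrives while this runs, the network path is open and the missing listener is the likely cause; if it still times out, a firewall or NSG rule is a more likely suspect.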

    Hope this helps.

    JS

    ==

    Please Accept the answer if the information helped you. This will help us and others in the community as well.

