Health Probe Failing After Azure Load Balancer SKU Upgrade – One SQL Server Unhealthy

Question

GOPINATH M 60

We have an Azure setup with a SQL pool containing two SQL servers (SQLServer-0 and SQLServer-1) behind a Load Balancer. The Load Balancer is configured with a custom TCP health probe on port 59999.

Everything was working as expected before we upgraded the Load Balancer from Basic SKU to Standard SKU. After the SKU upgrade and IP change, one of the backend SQL servers (SQLServer-0) consistently reports as unhealthy, while the other (SQLServer-1) remains healthy. The servers are in the same subnet and have identical NSG, firewall, and health probe configurations.

We validated that no application or process is listening on port 59999 on the affected server.

We have noticed the same behavior in other customer environments after similar SKU upgrades: one server in the pool always appears unhealthy.

Troubleshooting steps taken:

Verified NSG and OS firewall rules; both servers are configured identically.
On the unhealthy SQL VM (SQLServer-0), port 59999 is not in a listening state.
Checked for port conflicts or usage; nothing is bound to port 59999.
Telnet tests between servers:
- From SQLServer-1 → SQLServer-0 (port 59999): fails
- From SQLServer-0 → SQLServer-1 (port 59999): succeeds
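
The telnet checks above can be scripted so that the same test runs identically from each server. The following is a minimal Python sketch, not part of the original troubleshooting: it attempts a plain TCP connection, which is also all a TCP health probe needs to succeed. The function name `tcp_probe` and the 3-second default timeout are illustrative assumptions.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        # create_connection performs the full TCP handshake, like telnet does.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example call (assumes the server name resolves from where this runs):
#   tcp_probe("SQLServer-0", 59999)
```

A `False` result does not say why the connection failed: it looks the same whether nothing is listening on the port or a firewall is dropping the traffic, so the listener state still has to be checked locally on the target server.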

Could this be a known issue or bug related to Standard SKU Load Balancer behavior, especially regarding probe source IP or firewall behavior?

Are there any known issues with probe traffic routing or port binding after SKU migration?

We would appreciate any insights or recommended steps to resolve this issue, as it's now impacting multiple SQL load-balanced deployments.

Answer accepted by question author

Answer 1

Pratyush Vashistha 4,255 Microsoft External Staff Moderator

Hello GOPINATH M

Thank you for asking your question on the Microsoft Q&A Portal.

You’re seeing one SQL Server (SQLServer-0) consistently marked unhealthy after upgrading your Load Balancer from Basic to Standard SKU, even though both servers have identical NSG, firewall, and probe settings and port 59999 is not in use on the unhealthy server.

A common cause of this symptom is blocked probe traffic. Azure Load Balancer health probes, for both Basic and Standard SKU, originate from the Azure platform address 168.63.129.16 (the AzureLoadBalancer service tag), not from the load balancer's frontend IP. If the backend VMs have restrictive NSGs or OS firewalls that don't allow traffic from that source, the probe fails.

Could you please share more details that will help me narrow this down:

Can you confirm the exact source IP address the health probe is using? You can capture it by enabling NSG flow logs or running tcpdump on the unhealthy server during a probe.
Are there any “Deny” rules in the NSG applied to SQLServer-0 that might block traffic from the Load Balancer’s frontend IP or backend pool subnet?
Have you tested opening port 59999 to “Any” source temporarily just to verify if the issue resolves? (Don’t leave it open in production.)
Is the Load Balancer’s frontend IP in the same VNet/subnet as the backend VMs? If not, ensure route tables and NSGs allow cross-subnet traffic.

Reference links (all validated and working):

Health probe behavior in Standard Load Balancer: https://free.blessedness.top/en-us/azure/load-balancer/load-balancer-custom-probe-overview
Troubleshoot health probe failures: https://free.blessedness.top/en-us/azure/load-balancer/load-balancer-troubleshoot-health-probe-status
Source IP for health probes in Standard SKU: https://free.blessedness.top/en-us/azure/load-balancer/load-balancer-standard-diagnostics#health-probe-source-ip

Please share details for the questions above, and I will try to provide a more tailored response based on your input.

Thanks

Pratyush

Pratyush Vashistha 4,255 Reputation points Microsoft External Staff Moderator

2025-10-10T03:05:17.06+00:00

Hello GOPINATH M,

Just checking in to see whether the answer above helped. If it answers your question, please click Accept Answer and select Yes for "Was this answer helpful". If you have any further questions, do let us know.

Thanks

Pratyush
GOPINATH M 60 Reputation points

2025-10-10T06:02:57.6966667+00:00
Hi @Pratyush Vashistha ,

Thanks for your valuable inputs.

As I mentioned earlier, both servers are in the same subnet (10.0.12.4 and 10.0.12.5). No additional NSG or route table is applied there.

Port 59999 is open and allowed on both servers; I verified that the configuration is identical. However, port 59999 is listening and accepting connections only on the working SQL server (10.0.12.4), not on the unhealthy server (10.0.12.5).

The same pattern appears in all our other customer environments as well.

Here are the answers to your questions:

Can you confirm the exact source IP address the health probe is using? You can capture it by enabling NSG flow logs or running tcpdump on the unhealthy server during a probe. -- Yes, we are using the exact IP addresses and port 59999.

Are there any “Deny” rules in the NSG applied to SQLServer-0 that might block traffic from the Load Balancer’s frontend IP or backend pool subnet? -- There is no per-server segregation; the rules are applied at the subnet level.

Have you tested opening port 59999 to “Any” source temporarily just to verify if the issue resolves? (Don’t leave it open in production.) -- I tested port connectivity; the results are below:

From SQLServer-1 → SQLServer-0 (port 59999): fails

From SQLServer-0 → SQLServer-1 (port 59999): succeeds

Is the Load Balancer’s frontend IP in the same VNet/subnet as the backend VMs? If not, ensure route tables and NSGs allow cross-subnet traffic. -- The frontend IP (10.0.12.9) is in the same subnet as the backend pool.
Pratyush Vashistha 4,255 Reputation points Microsoft External Staff Moderator

2025-10-10T10:13:58.59+00:00
Hello GOPINATH M,

Thanks for sharing the detailed context and troubleshooting steps—really helpful in narrowing down the issue.

Given the behavior you're observing post-upgrade to the Standard SKU Load Balancer, I wanted to check a few things that might help identify any unsupported configuration scenarios:

Are both SQLServer-0 and SQLServer-1 part of the same availability set? If yes, are there any other VMs (SQL or otherwise) in that availability set that are not behind the same load balancer?

On the affected VM’s NIC (SQLServer-0), does the secondary IP configuration have Floating IP enabled? This setting can impact probe behavior, especially after SKU upgrades.

These questions stem from known unsupported scenarios during Basic to Standard Load Balancer upgrades. You can refer to this Microsoft documentation for more details:

https://free.blessedness.top/en-us/azure/load-balancer/upgrade-basic-standard-with-powershell#unsupported-scenarios

It’s possible that the health probe issue is related to one of these configurations. Let me know what you find, and I’ll be happy to help further based on that.

Quick Reference:

https://free.blessedness.top/en-us/azure/azure-sql/virtual-machines/windows/availability-group-load-balancer-portal-configure?view=azuresql
GOPINATH M 60 Reputation points

2025-10-10T13:35:28.3566667+00:00
Hi @Pratyush Vashistha

Thanks for your input.

Are both SQLServer-0 and SQLServer-1 part of the same availability set? If yes, are there any other VMs (SQL or otherwise) in that availability set that are not behind the same load balancer? -- Yes. There are 3 VMs in the same availability set; only 2 of them are in the backend pool.

On the affected VM’s NIC (SQLServer-0), does the secondary IP configuration have Floating IP enabled? This setting can impact probe behavior, especially after SKU upgrades. -- There is only a primary IP configuration, with Floating IP enabled.

I don't think the availability set is the problem, because I checked another client that has the same issue, and their availability set contains only 2 SQL VMs. They also have one SQL VM that is always unhealthy.
Pratyush Vashistha 4,255 Reputation points Microsoft External Staff Moderator

2025-10-10T15:49:08.21+00:00

The issue appears to be related to the following:

Are both SQLServer-0 and SQLServer-1 part of the same availability set? If yes, are there any other VMs (SQL or otherwise) in that availability set that are not behind the same load balancer? --Yes, There are 3 VMs are in same availability, out of all, we used only 2 VM in the backend pool.

The issue likely originated from the unsupported scenario outlined in the documentation. To fix it, we may need to review the configuration of the SQL Server availability group and the SQL Server firewall.

The root cause of the unhealthy probe is that on the unhealthy SQL VM (SQLServer-0), port 59999 is not in a listening state.

If the port is not in a listening state on the affected server, the load balancer's health probe will fail. This is by design.

Could you share the load balancer upgrade logs? Check the log file Start-AzBasicLoadBalancerUpgrade.log for details.

The documentation I am referring to is:

https://free.blessedness.top/en-us/azure/azure-sql/virtual-machines/windows/availability-group-load-balancer-portal-configure?view=azuresql

Thanks

Pratyush
Pratyush Vashistha 4,255 Reputation points Microsoft External Staff Moderator

2025-10-13T03:10:06.6333333+00:00

Hello Gopinath,

Just checking in to see whether the answer above helped. If it answers your question, please click Accept Answer and select Yes for "Was this answer helpful". If you have any further questions, do let us know.

Thanks

Pratyush
Pratyush Vashistha 4,255 Reputation points Microsoft External Staff Moderator

2025-10-15T05:13:04.4+00:00

Hi Gopinath,

We haven't heard back from you on the last response, so I am checking whether you have a resolution yet. If you do, please share it with the community, as it can be helpful to others. Otherwise, please respond with more details and we will try to help.

Please "Accept as Answer" if the answer provided is useful, so that you can help others in the community looking for remediation for similar issues.

Thanks

Pratyush
GOPINATH M 60 Reputation points

2025-10-15T07:32:46.78+00:00

Hi @Pratyush Vashistha

Thanks for your input.

We are still working through your script, so please give us some time while we work with the customer.
GOPINATH M 60 Reputation points

2025-10-16T11:48:53.89+00:00

Hi @Pratyush Vashistha

As requested, we ran the script and verified that both SQL servers are configured to allow port 59999.

I believe the issue is not related to the availability group.

Please let us know if you have any suggestions or ideas to resolve the issue.
Pratyush Vashistha 4,255 Reputation points Microsoft External Staff Moderator

2025-10-16T17:22:42.42+00:00
Hi GOPINATH M,

Thank you for the update and for running the verification.

Since you’ve confirmed this is not an Availability Group, we treat both VMs as standalone instances, which means each must independently respond to health probes. There is no “shared” probe logic.

Please run this exact command on both SQL servers (SQLServer-0 and SQLServer-1):

Get-NetTCPConnection -LocalPort 59999 -ErrorAction SilentlyContinue | Select-Object LocalAddress,LocalPort,State

Or, from an elevated Command Prompt:

netstat -ano | findstr :59999

What to look for:

On the healthy server (SQLServer-1): you should see a line with LISTENING.

On the unhealthy server (SQLServer-0): if you see nothing, then no service is listening—and that is why the probe fails.
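
The check described above can also be automated when many servers need to be compared. The following is a minimal Python sketch that scans saved `netstat -ano` output for a LISTENING row on a given port. The function name `listening_endpoints` and the sample text are illustrative assumptions; the sample is not output captured from the servers in this thread.

```python
from typing import List


def listening_endpoints(netstat_output: str, port: int) -> List[str]:
    """Return the local endpoints that are in LISTENING state on the given port."""
    endpoints = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Windows `netstat -ano` TCP rows have the columns:
        # Proto, Local Address, Foreign Address, State, PID
        if len(fields) >= 4 and fields[0] == "TCP" and fields[3] == "LISTENING":
            # The port is whatever follows the last colon of the local address.
            if fields[1].rsplit(":", 1)[-1] == str(port):
                endpoints.append(fields[1])
    return endpoints


# Illustrative sample only (hypothetical PIDs and connections).
SAMPLE = """\
  TCP    0.0.0.0:1433           0.0.0.0:0              LISTENING       4321
  TCP    0.0.0.0:59999          0.0.0.0:0              LISTENING       1234
  TCP    10.0.12.4:1433         10.0.12.5:50522        ESTABLISHED     4321
"""

print(listening_endpoints(SAMPLE, 59999))
```

An empty list for port 59999 corresponds to the "you see nothing" case: no process is listening, so the probe has nothing to connect to.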

This is the most common root cause in Standard SKU migrations, and it matches your symptom exactly:

“One server always unhealthy, even though configs appear identical.”

If the output shows no listener on SQLServer-0:

You must deploy a lightweight listener on port 59999 on that VM. Example (run as a background service):

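# Open a TCP listener on all IPv4 addresses so the health probe can connect
$port = 59999
$listener = New-Object System.Net.Sockets.TcpListener([System.Net.IPAddress]::Any, $port)
$listener.Start()
Write-Host "Health probe listener active on port $port"
# Keep the process alive; the port stays open only while this script runs
while ($true) { Start-Sleep -Seconds 60 }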

Note: If this is a production system, wrap this in a Windows service or use a supported probe responder (e.g., via a custom app or cluster resource).
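
The PowerShell listener above opens the port but never accepts the incoming connections. Whether that matters for probing in this environment is not established in the thread. As a hedged alternative, the following Python sketch accepts each connection and closes it immediately, since a TCP probe only needs the handshake to complete. The function names `open_probe_listener` and `serve_probe` and the `stop` event are illustrative assumptions, and this is a sketch rather than a supported probe responder.

```python
import socket
import threading


def open_probe_listener(port: int, host: str = "0.0.0.0") -> socket.socket:
    """Bind and listen on host:port; 0.0.0.0 accepts connections on all IPv4 addresses."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, port))
    srv.listen()
    return srv


def serve_probe(srv: socket.socket, stop: threading.Event) -> None:
    """Accept connections and close them at once until stop is set."""
    srv.settimeout(1.0)  # wake up every second to check the stop flag
    try:
        while not stop.is_set():
            try:
                conn, _addr = srv.accept()
            except socket.timeout:
                continue
            conn.close()  # the completed handshake is all a TCP probe needs
    finally:
        srv.close()


# Example (runs until interrupted):
#   serve_probe(open_probe_listener(59999), threading.Event())
```

Splitting the bind step from the accept loop keeps the sketch easy to stop cleanly, for example from a service wrapper that sets the event on shutdown.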

If both servers do show LISTENING:

Then we need to check:

Is the listener bound to all IPs (0.0.0.0 or [::]) or only 127.0.0.1? → It must bind to 0.0.0.0 to accept external probes.

Is there a host-based firewall (Windows Firewall or third-party) blocking inbound traffic despite NSG rules?

Are there multiple NICs or IP configurations causing binding issues?
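
The first check in this list, whether the listener is bound to a loopback address, can be sketched in Python as below. It takes a local endpoint as printed by netstat and reports whether it is loopback-only. The function name `accepts_remote_connections` is an illustrative assumption, and the sketch deliberately ignores the separate case of a listener bound to one specific non-loopback address.

```python
import ipaddress


def accepts_remote_connections(local_endpoint: str) -> bool:
    """Return False if the endpoint (e.g. '127.0.0.1:59999') is bound to loopback only."""
    # Split off the port at the last colon; IPv6 hosts are wrapped in brackets.
    host, _, _port = local_endpoint.rpartition(":")
    address = ipaddress.ip_address(host.strip("[]"))
    return not address.is_loopback


for endpoint in ("0.0.0.0:59999", "[::]:59999", "127.0.0.1:59999", "[::1]:59999"):
    print(endpoint, accepts_remote_connections(endpoint))
```

A listener that reports `False` here only answers connections made from the server itself, so an external health probe cannot reach it even though netstat shows LISTENING.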

Please excuse any typos or syntax errors in the commands. Please share the output of the netstat or Get-NetTCPConnection command from both servers, and I will dig further into this.

Reference:

Azure Load Balancer health probe overview

Thanks

Pratyush
GOPINATH M 60 Reputation points

2025-10-17T10:15:27.9+00:00

Tons of thanks to you @Pratyush Vashistha .

I ran the listener script and it fixed the issue: the unhealthy server started listening on port 59999 and became healthy.

Thanks for your support throughout. Have a great day!
Pratyush Vashistha 4,255 Reputation points Microsoft External Staff Moderator

2025-10-17T10:17:50.39+00:00

Glad to help, GOPINATH M. Please accept this as an answer so that it helps others in the community too.

Answer 2

JimmySalian-2011 44,696

Hi Gopinath,

The configuration you have is supported, and this is not a unique setup; it is a simple LB with 2 backend servers. Did you check the log file after the upgrade process? How did you carry out the upgrade?

On the unhealthy SQL VM (SQLServer-0), port 59999 is not in a listening state.

I think the issue is related to the SQL availability group (AG): if the port is not up and listening, the probe fails and the state shows as down. Please check the troubleshooting steps for this: https://free.blessedness.top/en-us/troubleshoot/sql/database-engine/availability-groups/troubleshooting-availability-group-failover

Hope this helps.

JS

==

Please Accept the answer if the information helped you. This will help us and others in the community as well.

GOPINATH M 60 Reputation points

2025-10-06T07:38:29.0366667+00:00

I recently upgraded the Load Balancer using a script, after ensuring all the necessary pre-requisite steps were completed.

I don’t believe this issue is related to the SQL Server side, as we have over 30 Load Balancer setups across different customers using SQL-based load balancing, and all of them are experiencing the same issue, one backend server consistently shows as unhealthy in the pool.

Additionally, I’ve confirmed that port 59999 is not being used by any other service.
