Hello @Martin Kutlák,
Welcome to Microsoft Q&A Platform. Thank you for reaching out; I hope you are doing well.
I understand your question about the crashes and performance issues with your Application Gateway and the WAF running in detection mode. The throughput on the gateway is around 100 requests per second and remains fairly consistent throughout the day, as these are IoT device requests from across the EU.
To help enhance resilience and availability, here are some recommended next steps:
Activate and assess diagnostics:
Enable Application Gateway diagnostics, including Access, Performance, and Firewall logs, and direct them to Log Analytics.
Monitor metrics such as Compute Unit utilization, Unhealthy host count, and Failed requests to analyze WAF activity in relation to traffic spikes.
Check backend probe logs to ensure that backend servers are not causing the degraded health status.
Adjust the WAF Policy Settings: In detection mode, use the logs to identify false positives or unnecessary inspections.
1. Exclude non-critical headers or body fields for IoT traffic.
2. Disable or customize rule groups that generate high false positives (for example, bot detection or SQLi checks if they don’t apply).
3. Ensure you’re using the latest managed ruleset available for Application Gateway WAF (e.g., OWASP CRS 3.2 or later) for improved performance and fewer false positives.
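To make the log review above more concrete, here is a minimal Python sketch that tallies which rules fire most often per URI in exported firewall log entries. The sample records and the field names (`ruleId`, `action`, `requestUri`) are assumptions for illustration; please adjust them to match the shape of your own export.

```python
from collections import Counter

# Hypothetical sample entries shaped like exported firewall log records.
# Field names are assumptions; rename them to match your data.
SAMPLE_LOGS = [
    {"ruleId": "942130", "action": "Detected", "requestUri": "/telemetry"},
    {"ruleId": "942130", "action": "Detected", "requestUri": "/telemetry"},
    {"ruleId": "920300", "action": "Detected", "requestUri": "/telemetry"},
    {"ruleId": "942130", "action": "Detected", "requestUri": "/register"},
]


def top_rules(entries, limit=5):
    """Return the most frequently triggered (ruleId, requestUri) pairs."""
    counts = Counter(
        (e["ruleId"], e["requestUri"])
        for e in entries
        if e.get("action") == "Detected"
    )
    return counts.most_common(limit)


if __name__ == "__main__":
    for (rule_id, uri), hits in top_rules(SAMPLE_LOGS):
        print(f"rule {rule_id} on {uri}: {hits} hits")
```

Rules that dominate this list for known-good IoT endpoints are the first candidates for exclusions or per-rule tuning.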
Revise the scaling strategy as needed:
Make sure you are using the Application Gateway v2 SKU, as it supports autoscaling and zone redundancy.
Set a minimum instance count (for example, 2–4) so the gateway stays stable during scale-in, and allow a higher maximum (for example, 10–20) if your traffic requires it.
Enabling zone redundancy is advised to improve resiliency across availability zones within the region.
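As a rough illustration of the scaling settings above, the following Python sketch builds the autoscale and zone portion of a gateway definition as a plain dictionary. The helper name `autoscale_fragment` is hypothetical, and the property names and the 0 to 125 instance range reflect my understanding of the v2 resource schema, so please verify them against the current template reference before using them.

```python
import json


def autoscale_fragment(min_capacity, max_capacity, zones=("1", "2", "3")):
    """Build a dict mirroring the autoscale and zone parts of a v2 gateway.

    Property names are assumed to follow the ARM resource schema.
    """
    # Assumed v2 autoscale range: 0 to 125 instances.
    if not 0 <= min_capacity <= max_capacity <= 125:
        raise ValueError("expected 0 <= min_capacity <= max_capacity <= 125")
    return {
        "zones": list(zones),
        "properties": {
            "sku": {"name": "WAF_v2", "tier": "WAF_v2"},
            "autoscaleConfiguration": {
                "minCapacity": min_capacity,
                "maxCapacity": max_capacity,
            },
        },
    }


if __name__ == "__main__":
    # Example: keep at least 2 instances warm and allow growth up to 10.
    print(json.dumps(autoscale_fragment(2, 10), indent=2))
```

Keeping the minimum above zero means some capacity is always provisioned, which suits steady traffic like the roughly 100 requests per second described here.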
Additional Resilience Steps:
You may want to implement rate limiting or set up custom WAF rules to manage high-traffic devices.
If stability problems continue, consider whether placing Azure Front Door in front of AGW could help with traffic inspection.
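Before choosing a rate-limit threshold, it can help to see which devices actually exceed a given request rate. The Python sketch below counts requests per client per minute from exported access log entries; the sample records and the field names (`timeStamp`, `clientIP`) are assumptions for illustration and may differ in your export.

```python
from collections import Counter
from datetime import datetime

# Hypothetical access log entries; field names are assumptions.
SAMPLE_ACCESS = [
    {"timeStamp": "2024-01-01T10:00:05+00:00", "clientIP": "203.0.113.10"},
    {"timeStamp": "2024-01-01T10:00:20+00:00", "clientIP": "203.0.113.10"},
    {"timeStamp": "2024-01-01T10:00:45+00:00", "clientIP": "203.0.113.10"},
    {"timeStamp": "2024-01-01T10:01:10+00:00", "clientIP": "203.0.113.10"},
    {"timeStamp": "2024-01-01T10:00:30+00:00", "clientIP": "203.0.113.20"},
]


def noisy_clients(entries, per_minute_limit):
    """Return sorted client IPs that exceed the limit in any single minute."""
    counts = Counter()
    for e in entries:
        # Truncate each timestamp to its minute to form the counting bucket.
        minute = datetime.fromisoformat(e["timeStamp"]).replace(
            second=0, microsecond=0
        )
        counts[(e["clientIP"], minute)] += 1
    return sorted({ip for (ip, _), n in counts.items() if n > per_minute_limit})


if __name__ == "__main__":
    print(noisy_clients(SAMPLE_ACCESS, per_minute_limit=2))
```

Running this against a representative day of logs with a few candidate limits shows how many legitimate devices a rate-limit rule would affect.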
Next step: Could you please confirm the following:
Which SKU (v1 or v2) and region is your AGW deployed in?
What WAF ruleset version are you currently using?
With this information, I can offer more accurate guidance. If you still encounter issues after these steps, I suggest submitting a Microsoft support case with your diagnostic data for further analysis.
Kindly let us know if the above helps or you need further assistance on this issue.
Please do not forget to "Accept the answer" and "up-vote it" wherever the information provided helps you. This can be beneficial to other community members and would be greatly appreciated.