502 Bad Gateway Error: Complete Troubleshooting Guide for Nginx, Azure, AWS & Cloudflare
Fix 502 Bad Gateway errors across Nginx, Azure Application Gateway, AWS ALB, and Cloudflare. Step-by-step solutions for developers and DevOps engineers.
- 502 Bad Gateway occurs when a proxy server receives an invalid response from an upstream server
- Common causes include backend server failures, timeout configurations, network connectivity issues, and SSL/TLS mismatches
- Check backend server health first, then verify proxy configurations and network connectivity
- Platform-specific solutions exist for Nginx (upstream config), Azure Application Gateway (health probes), AWS ALB (target groups), and Cloudflare (origin server settings)
| Method | When to Use | Time | Risk |
|---|---|---|---|
| Check Backend Server Health | First troubleshooting step | 1-5 minutes | Low |
| Restart Proxy Service | Quick fix for temporary issues | 2-3 minutes | Medium |
| Adjust Timeout Settings | Persistent timeout errors | 5-10 minutes | Low |
| Review Proxy Configuration | Configuration-related issues | 10-30 minutes | Medium |
| Update SSL/TLS Settings | Certificate or protocol issues | 15-45 minutes | High |
| Scale Backend Resources | High load scenarios | 30-60 minutes | Low |
Understanding the 502 Bad Gateway Error
A 502 Bad Gateway error occurs when a server acting as a gateway or proxy receives an invalid response from an upstream server. This is an HTTP status code that indicates a server-to-server communication problem, not a client-side issue.
The error manifests differently across platforms:
- Nginx: "502 Bad Gateway nginx/1.x.x"
- Azure Application Gateway: "502 - Web server received an invalid response while acting as a gateway or proxy server"
- AWS Application Load Balancer: "502 Bad Gateway" with CloudWatch metrics showing target failures
- Cloudflare: "502 Bad gateway" with Cloudflare branding
Common Root Causes
Backend Server Issues
- Application crashes or hangs
- Out of memory conditions
- Database connection failures
- Process limits exceeded
Network Connectivity Problems
- Firewall blocking requests
- DNS resolution failures
- Network partitions
- Routing issues
Configuration Mismatches
- Incorrect upstream server addresses
- Port misconfigurations
- SSL/TLS protocol mismatches
- Load balancer health check failures
Resource Exhaustion
- Connection pool exhaustion
- File descriptor limits
- Memory pressure
- CPU saturation
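The resource-exhaustion causes above can be checked in seconds from a shell on the proxy or backend host. A minimal sketch, assuming a Linux box with the usual /proc interfaces:

```shell
# Per-process file descriptor limit for the current shell
echo "Open-file limit: $(ulimit -n)"

# Socket totals: a rapidly growing "inuse" count hints at connection-pool exhaustion
cat /proc/net/sockstat

# Memory pressure: compare MemAvailable against MemTotal
grep -E 'MemTotal|MemAvailable' /proc/meminfo

# 1/5/15-minute load averages: sustained values above the CPU count mean saturation
cut -d' ' -f1-3 /proc/loadavg
```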
Step-by-Step Troubleshooting Process
Step 1: Identify the Proxy Layer
First, determine which component is returning the 502 error:
# Check HTTP headers for server identification
curl -I https://your-domain.com
# Look for Server header or other identifying information
# Examples:
# Server: nginx/1.23.1
# Server: Microsoft-Azure-Application-Gateway/v2
# Server: cloudflare
Step 2: Verify Backend Server Health
The most common cause is backend server failure:
# Check if backend servers are responding
# Replace with your actual backend IPs/ports
curl -v http://backend-server:8080/health
# For multiple backends, test each one
for server in server1 server2 server3; do
  echo "Testing $server:"
  curl -f http://$server:8080/ || echo "Failed"
done
Step 3: Platform-Specific Diagnostics
Nginx Troubleshooting
Check Nginx error logs for specific details:
# View recent error logs
sudo tail -f /var/log/nginx/error.log
# Common 502 error patterns to look for:
# "connect() failed (111: Connection refused)"
# "upstream prematurely closed connection"
# "upstream timed out"
Verify upstream configuration:
# Check /etc/nginx/sites-available/your-site
upstream backend {
    server backend1.example.com:8080 max_fails=3 fail_timeout=30s;
    server backend2.example.com:8080 max_fails=3 fail_timeout=30s;
}

server {
    location / {
        proxy_pass http://backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_connect_timeout 60s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;
    }
}
Azure Application Gateway V2
Check backend health status:
# Using Azure CLI
az network application-gateway show-backend-health \
--name myAppGateway \
--resource-group myResourceGroup
# Check health probe configuration
az network application-gateway probe show \
--gateway-name myAppGateway \
--resource-group myResourceGroup \
--name myHealthProbe
Common Azure Application Gateway issues:
- Health probes failing due to incorrect paths
- NSG rules blocking traffic
- Backend pool configuration errors
- SSL certificate mismatches
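If a failing probe path is the culprit, it can be corrected in place. A hedged sketch with the Azure CLI; the gateway, resource group, and probe names are the same placeholders used above:

```shell
# Point the existing health probe at a path the backend actually serves
az network application-gateway probe update \
  --gateway-name myAppGateway \
  --resource-group myResourceGroup \
  --name myHealthProbe \
  --path /health
```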
AWS Application Load Balancer
Check target group health:
# Using AWS CLI
aws elbv2 describe-target-health \
--target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/my-targets/1234567890123456
# ALB access logs are delivered to S3 when access logging is enabled
# (the load balancer does not write them to CloudWatch Logs)
aws s3 sync s3://my-alb-access-logs/ ./alb-logs/
find ./alb-logs -name '*.gz' -exec zgrep ' 502 ' {} + | head
# The ALB's own 502 count appears as the CloudWatch metric
# HTTPCode_ELB_502_Count in the AWS/ApplicationELB namespace
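When targets are failing their health checks, the check itself is often misconfigured. A hedged sketch with the AWS CLI; the ARN mirrors the placeholder above and /health is an assumed endpoint:

```shell
# Align the health check with an endpoint the target actually serves
aws elbv2 modify-target-group \
  --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/my-targets/1234567890123456 \
  --health-check-path /health \
  --health-check-interval-seconds 15 \
  --healthy-threshold-count 2
```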
Cloudflare
Check origin server connectivity:
# Test direct connection to origin
curl -H "Host: yourdomain.com" http://origin-ip-address/
# Check SSL configuration
openssl s_client -connect origin-ip:443 -servername yourdomain.com
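curl's --resolve flag exercises the full HTTPS path against the origin while bypassing Cloudflare entirely, which isolates whether the origin or the edge is at fault (the domain and IP are placeholders):

```shell
# Force yourdomain.com to resolve to the origin IP, skipping Cloudflare's proxy
curl -v --resolve yourdomain.com:443:203.0.113.10 https://yourdomain.com/
```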
Step 4: Configuration Fixes
Nginx Configuration Optimizations
# /etc/nginx/nginx.conf
# The events block sits at the top level of nginx.conf, alongside http (never inside it)
events {
    worker_connections 2048;
}

http {
    # Increase timeout values
    proxy_connect_timeout 300s;
    proxy_send_timeout 300s;
    proxy_read_timeout 300s;
    proxy_buffers 8 32k;
    proxy_buffer_size 64k;

    # Add upstream keepalive
    upstream backend {
        server backend1:8080;
        server backend2:8080;
        keepalive 32;
    }
}
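Note that upstream keepalive only takes effect when the connection to the upstream uses HTTP/1.1 with the Connection header cleared; per the nginx proxy module documentation, the proxying location needs:

```nginx
location / {
    proxy_pass http://backend;
    proxy_http_version 1.1;          # keepalive to the upstream requires HTTP/1.1
    proxy_set_header Connection "";  # clear the default "Connection: close"
}
```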
Docker-specific Nginx Issues
For Docker deployments, ensure proper networking:
# Check container connectivity
docker network ls
docker network inspect bridge
# Test inter-container communication
docker exec nginx-container ping backend-container
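If the ping fails, the containers are often attached to different Docker networks. A hedged sketch that joins both to a shared user-defined network (container and network names are hypothetical):

```shell
# Create a shared network and attach both containers to it
docker network create app-net
docker network connect app-net nginx-container
docker network connect app-net backend-container

# On a user-defined network, containers can also reach each other by name
docker exec nginx-container ping -c 1 backend-container
```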
Step 5: Application-Level Fixes
Python/Django Applications
Common issues with Gunicorn/uWSGI:
# Check Gunicorn processes
ps aux | grep gunicorn
# Restart with proper configuration
gunicorn --workers 4 --timeout 120 --bind 0.0.0.0:8000 myapp.wsgi:application
# For high traffic, increase worker count
gunicorn --workers $((2 * $(nproc) + 1)) --timeout 300 myapp.wsgi:application
Node.js Applications
# Check Node.js process health
pm2 status
# Restart with cluster mode
pm2 start app.js -i max
# Check for memory leaks
node --inspect app.js
Step 6: Monitoring and Prevention
Set up proper monitoring to prevent future occurrences:
# Nginx status monitoring
location /nginx_status {
    stub_status on;
    access_log off;
    allow 127.0.0.1;
    deny all;
}
# Set up log rotation
sudo logrotate -f /etc/logrotate.d/nginx
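The stub_status output is plain text and easy to scrape. A minimal sketch that extracts the active-connection count; the sample payload below mirrors the format the module emits:

```shell
# parse_active: pull the active-connection count out of stub_status output
parse_active() {
  awk '/Active connections/ {print $3}'
}

# In production you would pipe curl into the parser:
#   curl -s http://127.0.0.1/nginx_status | parse_active
sample='Active connections: 3
server accepts handled requests
 100 100 250
Reading: 0 Writing: 1 Waiting: 2'

echo "$sample" | parse_active   # prints 3
```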
Advanced Troubleshooting Techniques
Network-Level Debugging
# Use tcpdump to capture traffic
sudo tcpdump -i any -s 0 -w capture.pcap port 80 or port 443
# Analyze with wireshark or tshark
tshark -r capture.pcap -Y "http.response.code == 502"
# Check network connectivity
mtr --report your-backend-server.com
SSL/TLS Troubleshooting
# Test SSL handshake
openssl s_client -connect your-server:443 -servername your-domain.com
# Check certificate chain
curl -vI https://your-domain.com
# Verify cipher compatibility
nmap --script ssl-enum-ciphers -p 443 your-server.com
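An expired or mismatched origin certificate will also surface as a 502 at the proxy layer. A quick expiry check (the hostname is a placeholder):

```shell
# Print the notBefore/notAfter dates of the certificate served on port 443
echo | openssl s_client -connect your-domain.com:443 -servername your-domain.com 2>/dev/null \
  | openssl x509 -noout -dates
```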
Performance Optimization
For high-traffic scenarios:
# Increase system limits
echo "* soft nofile 65536" >> /etc/security/limits.conf
echo "* hard nofile 65536" >> /etc/security/limits.conf
# Tune kernel parameters
echo "net.core.somaxconn = 65536" >> /etc/sysctl.conf
echo "net.ipv4.tcp_max_syn_backlog = 65536" >> /etc/sysctl.conf
sysctl -p
Recovery Procedures
When dealing with production outages:
- Immediate Response: Remove failing backends from load balancer
- Temporary Fix: Implement circuit breaker patterns
- Long-term Solution: Address root cause and improve monitoring
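For the immediate-response step, pulling a failing target out of an AWS ALB is one concrete form. A hedged sketch; the target group ARN and instance ID are placeholders:

```shell
# Take the failing instance out of rotation
aws elbv2 deregister-targets \
  --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/my-targets/1234567890123456 \
  --targets Id=i-0123456789abcdef0
```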
Automated Recovery Scripts
#!/bin/bash
# health-check-and-restart.sh
HEALTH_URL="http://localhost:8080/health"
MAX_RETRIES=3

for i in $(seq 1 "$MAX_RETRIES"); do
  if curl -f "$HEALTH_URL"; then
    echo "Health check passed"
    exit 0
  else
    echo "Health check failed, attempt $i/$MAX_RETRIES"
    if [ "$i" -eq "$MAX_RETRIES" ]; then
      echo "Restarting service..."
      systemctl restart your-app-service
      sleep 10
    fi
  fi
done
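A script like this is usually run on a schedule rather than by hand. One way with cron, assuming the script is installed at /usr/local/bin (the path and log file are assumptions):

```shell
# Append a once-a-minute health check to the current user's crontab
( crontab -l 2>/dev/null; echo "* * * * * /usr/local/bin/health-check-and-restart.sh >> /var/log/health-check.log 2>&1" ) | crontab -
```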
Automated Diagnostic Script
#!/bin/bash
# 502 Bad Gateway Diagnostic Script
# Run this script to automatically diagnose common 502 issues
set -e
echo "=== 502 Bad Gateway Diagnostic Tool ==="
echo "Starting comprehensive health check..."
# Function to check service status
check_service() {
  local service=$1
  echo "Checking $service status..."
  if systemctl is-active --quiet "$service"; then
    echo "✓ $service is running"
  else
    echo "✗ $service is not running"
    echo "  Try: sudo systemctl restart $service"
  fi
}

# Function to check port connectivity
check_port() {
  local host=$1
  local port=$2
  echo "Testing connection to $host:$port..."
  if timeout 5 bash -c "</dev/tcp/$host/$port"; then
    echo "✓ Port $port is open on $host"
  else
    echo "✗ Cannot connect to $host:$port"
  fi
}

# Function to analyze logs
analyze_logs() {
  local logfile=$1
  if [ -f "$logfile" ]; then
    echo "Recent 502 errors in $logfile:"
    tail -100 "$logfile" | grep -i "502\|bad gateway\|upstream" | tail -5
  fi
}

# Main diagnostic checks
echo -e "\n1. Checking web server services..."
for service in nginx apache2 httpd; do
  if systemctl list-units --full -all | grep -Fq "$service.service"; then
    check_service "$service"
  fi
done

# Check backend application services
echo -e "\n2. Checking application services..."
for service in gunicorn uwsgi pm2; do
  if command -v "$service" &> /dev/null || systemctl list-units | grep -q "$service"; then
    check_service "$service" 2>/dev/null || echo "$service: Check manually"
  fi
done

# Test common backend ports
echo -e "\n3. Testing backend connectivity..."
check_port localhost 8000  # Common Django/Flask port
check_port localhost 3000  # Common Node.js port
check_port localhost 8080  # Common Java/Spring port
check_port localhost 9000  # Common PHP-FPM port

# Analyze log files
echo -e "\n4. Analyzing log files..."
analyze_logs "/var/log/nginx/error.log"
analyze_logs "/var/log/apache2/error.log"
analyze_logs "/var/log/httpd/error_log"

# Check system resources
echo -e "\n5. System resource check..."
echo "Memory usage:"
free -h | grep Mem
echo -e "\nDisk usage:"
df -h / | tail -1
echo -e "\nLoad average:"
uptime

# Check for common configuration issues
echo -e "\n6. Configuration validation..."
if command -v nginx &> /dev/null; then
  echo "Nginx configuration test:"
  sudo nginx -t 2>&1 || echo "Nginx config has errors"
fi
if command -v apache2ctl &> /dev/null; then
  echo "Apache configuration test:"
  sudo apache2ctl configtest 2>&1 || echo "Apache config has errors"
fi

echo -e "\n=== Diagnostic Complete ==="
echo "If issues persist, check:"
echo "- Application logs for crashes"
echo "- Database connectivity"
echo "- SSL certificate validity"
echo "- Firewall rules"
echo "- DNS resolution"

# Quick fix suggestions
echo -e "\n=== Quick Fix Commands ==="
echo "# Restart web server:"
echo "sudo systemctl restart nginx  # or apache2"
echo -e "\n# Restart application:"
echo "sudo systemctl restart gunicorn  # or your app service"
echo -e "\n# Check detailed logs:"
echo "sudo tail -f /var/log/nginx/error.log"
echo -e "\n# Test backend directly:"
echo "curl -v http://localhost:8000/"

Error Medic Editorial
Our team of senior DevOps and SRE engineers has collectively resolved thousands of production incidents across diverse technology stacks. We specialize in creating comprehensive troubleshooting guides that provide actionable solutions for complex system errors, drawing from real-world experience managing high-scale distributed systems.
Sources
- https://nginx.org/en/docs/http/ngx_http_proxy_module.html
- https://docs.microsoft.com/en-us/azure/application-gateway/application-gateway-troubleshooting-502
- https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-troubleshooting.html
- https://developers.cloudflare.com/support/troubleshooting/cloudflare-errors/troubleshooting-cloudflare-5xx-errors/
- https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/502