Error Medic

502 Bad Gateway Error: Complete Troubleshooting Guide for Nginx, Azure, AWS & Cloudflare

Fix 502 Bad Gateway errors across Nginx, Azure Application Gateway, AWS ALB, and Cloudflare. Step-by-step solutions for developers and DevOps engineers.

Key Takeaways
  • 502 Bad Gateway occurs when a proxy server receives an invalid response from an upstream server
  • Common causes include backend server failures, timeout configurations, network connectivity issues, and SSL/TLS mismatches
  • Check backend server health first, then verify proxy configurations and network connectivity
  • Platform-specific solutions exist for Nginx (upstream config), Azure Application Gateway (health probes), AWS ALB (target groups), and Cloudflare (origin server settings)
502 Bad Gateway Fix Approaches Compared
| Method | When to Use | Time | Risk |
|---|---|---|---|
| Check Backend Server Health | First troubleshooting step | 1-5 minutes | Low |
| Restart Proxy Service | Quick fix for temporary issues | 2-3 minutes | Medium |
| Adjust Timeout Settings | Persistent timeout errors | 5-10 minutes | Low |
| Review Proxy Configuration | Configuration-related issues | 10-30 minutes | Medium |
| Update SSL/TLS Settings | Certificate or protocol issues | 15-45 minutes | High |
| Scale Backend Resources | High load scenarios | 30-60 minutes | Low |

Understanding the 502 Bad Gateway Error

A 502 Bad Gateway error occurs when a server acting as a gateway or proxy receives an invalid response from an upstream server. This is an HTTP status code that indicates a server-to-server communication problem, not a client-side issue.

The error manifests differently across platforms:

  • Nginx: "502 Bad Gateway nginx/1.x.x"
  • Azure Application Gateway: "502 - Web server received an invalid response while acting as a gateway or proxy server"
  • AWS Application Load Balancer: "502 Bad Gateway" with CloudWatch metrics showing target failures
  • Cloudflare: "502 Bad gateway" with Cloudflare branding

Common Root Causes

  1. Backend Server Issues

    • Application crashes or hangs
    • Out of memory conditions
    • Database connection failures
    • Process limits exceeded
  2. Network Connectivity Problems

    • Firewall blocking requests
    • DNS resolution failures
    • Network partitions
    • Routing issues
  3. Configuration Mismatches

    • Incorrect upstream server addresses
    • Port misconfigurations
    • SSL/TLS protocol mismatches
    • Load balancer health check failures
  4. Resource Exhaustion

    • Connection pool exhaustion
    • File descriptor limits
    • Memory pressure
    • CPU saturation
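Several of these causes can be spot-checked in seconds from the affected host. A minimal sketch for the resource-exhaustion class (Linux only; it reads standard /proc files, so the exact numbers and thresholds that matter depend on your workload):

```shell
#!/bin/bash
# Spot-check resource-exhaustion signals on a Linux host.
echo "Per-process fd soft limit: $(ulimit -n)"
# /proc/sys/fs/file-nr fields: allocated handles, unused, system-wide max
awk '{printf "File handles: %s allocated of %s max\n", $1, $3}' /proc/sys/fs/file-nr
# Low MemAvailable often precedes backend OOM kills
awk '/^MemAvailable/ {printf "Available memory: %d MiB\n", $2/1024}' /proc/meminfo
# Established TCP connections (state 01) as a rough connection-pool signal
echo "Established TCP connections: $(awk 'NR>1 && $4 == "01"' /proc/net/tcp | wc -l)"
```

If the fd limit is near the allocated-handle count, or MemAvailable is close to zero, resource exhaustion is a likely culprit before any proxy configuration is.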

Step-by-Step Troubleshooting Process

Step 1: Identify the Proxy Layer

First, determine which component is returning the 502 error:

# Check HTTP headers for server identification
curl -I https://your-domain.com

# Look for Server header or other identifying information
# Examples:
# Server: nginx/1.23.1
# Server: Microsoft-Azure-Application-Gateway/v2
# Server: cloudflare

Step 2: Verify Backend Server Health

The most common cause is backend server failure:

# Check if backend servers are responding
# Replace with your actual backend IPs/ports
curl -v http://backend-server:8080/health

# For multiple backends, test each one
for server in server1 server2 server3; do
  echo "Testing $server:"
  curl -f http://$server:8080/ || echo "Failed"
done

Step 3: Platform-Specific Diagnostics

Nginx Troubleshooting

Check Nginx error logs for specific details:

# View recent error logs
sudo tail -f /var/log/nginx/error.log

# Common 502 error patterns to look for:
# "connect() failed (111: Connection refused)"
# "upstream prematurely closed connection"
# "upstream timed out"

Verify upstream configuration:

# Check /etc/nginx/sites-available/your-site
upstream backend {
    server backend1.example.com:8080 max_fails=3 fail_timeout=30s;
    server backend2.example.com:8080 max_fails=3 fail_timeout=30s;
}

server {
    location / {
        proxy_pass http://backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_connect_timeout 60s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;
    }
}
Azure Application Gateway V2

Check backend health status:

# Using Azure CLI
az network application-gateway show-backend-health \
  --name myAppGateway \
  --resource-group myResourceGroup

# Check health probe configuration
az network application-gateway probe show \
  --gateway-name myAppGateway \
  --resource-group myResourceGroup \
  --name myHealthProbe

Common Azure Application Gateway issues:

  • Health probes failing due to incorrect paths
  • NSG rules blocking traffic
  • Backend pool configuration errors
  • SSL certificate mismatches
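Of the issues above, a blocking NSG rule is quick to rule out from the CLI. A hedged sketch, where the resource-group and NSG names are placeholders and the JMESPath filter simply surfaces inbound Deny rules worth reviewing:

```shell
#!/bin/bash
# Sketch: list inbound Deny rules on the backend subnet's NSG.
# Resource names are placeholders -- substitute your own.
check_nsg_denies() {
    local rg="$1" nsg="$2"
    if command -v az >/dev/null 2>&1; then
        az network nsg rule list \
            --resource-group "$rg" --nsg-name "$nsg" \
            --query "[?direction=='Inbound' && access=='Deny'].{name:name, port:destinationPortRange}" \
            -o table \
            || echo "az query failed (not logged in, or NSG not found?)"
    else
        echo "az CLI not found; run this from Azure Cloud Shell instead"
    fi
    echo "NSG check complete for $nsg"
}

check_nsg_denies "myResourceGroup" "myBackendNSG"
```

Any Deny rule covering the health-probe port is a candidate cause, since failed probes take the backend out of rotation and surface as 502s at the gateway.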
AWS Application Load Balancer

Check target group health:

# Using AWS CLI
aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/my-targets/1234567890123456

# ALB access logs are delivered to S3 (not CloudWatch Logs); enable
# access logging on the ALB, then grep the gzipped logs for 502s
# (502 counts also appear as the HTTPCode_ELB_502_Count CloudWatch metric)
aws s3 sync s3://my-alb-logs-bucket/AWSLogs/ ./alb-logs/
find alb-logs -name '*.gz' -exec zgrep -h ' 502 ' {} +
Cloudflare

Check origin server connectivity:

# Test direct connection to origin
curl -H "Host: yourdomain.com" http://origin-ip-address/

# Check SSL configuration
openssl s_client -connect origin-ip:443 -servername yourdomain.com

Step 4: Configuration Fixes

Nginx Configuration Optimizations
# In /etc/nginx/nginx.conf -- note that events {} is a top-level
# block and must not be nested inside http {}
events {
    worker_connections 2048;
}

http {
    proxy_connect_timeout 300s;
    proxy_send_timeout 300s;
    proxy_read_timeout 300s;
    proxy_buffers 8 32k;
    proxy_buffer_size 64k;

    # Upstream keepalive reduces connection churn; it also requires
    # proxy_http_version 1.1 and an empty Connection header in the
    # location block that proxies to this upstream
    upstream backend {
        server backend1:8080;
        server backend2:8080;
        keepalive 32;
    }
}
Docker-specific Nginx Issues

For Docker deployments, ensure proper networking:

# Check container connectivity; note that containers on the default
# bridge network cannot resolve each other by name -- use a
# user-defined network for name resolution
docker network ls
docker network inspect bridge

# Test inter-container communication
docker exec nginx-container ping -c 3 backend-container

# Slim images often ship without ping; curl the backend port instead
docker exec nginx-container curl -sf http://backend-container:8080/

Step 5: Application-Level Fixes

Python/Django Applications

Common issues with Gunicorn/uWSGI:

# Check Gunicorn processes
ps aux | grep gunicorn

# Restart with proper configuration
gunicorn --workers 4 --timeout 120 --bind 0.0.0.0:8000 myapp.wsgi:application

# For high traffic, increase worker count
gunicorn --workers $((2 * $(nproc) + 1)) --timeout 300 myapp.wsgi:application
Node.js Applications
# Check Node.js process health
pm2 status

# Restart with cluster mode
pm2 start app.js -i max

# Check for memory leaks
node --inspect app.js

Step 6: Monitoring and Prevention

Set up proper monitoring to prevent future occurrences:

# Nginx status monitoring
location /nginx_status {
    stub_status on;
    access_log off;
    allow 127.0.0.1;
    deny all;
}

# Set up log rotation
sudo logrotate -f /etc/logrotate.d/nginx

Advanced Troubleshooting Techniques

Network-Level Debugging

# Use tcpdump to capture traffic
sudo tcpdump -i any -s 0 -w capture.pcap port 80 or port 443

# Analyze with wireshark or tshark
tshark -r capture.pcap -Y "http.response.code == 502"

# Check network connectivity
mtr --report your-backend-server.com

SSL/TLS Troubleshooting

# Test SSL handshake
openssl s_client -connect your-server:443 -servername your-domain.com

# Check certificate chain
curl -vI https://your-domain.com

# Verify cipher compatibility
nmap --script ssl-enum-ciphers -p 443 your-server.com

Performance Optimization

For high-traffic scenarios:

# Increase system limits (plain >> redirection fails under sudo,
# so pipe through tee -a instead)
echo "* soft nofile 65536" | sudo tee -a /etc/security/limits.conf
echo "* hard nofile 65536" | sudo tee -a /etc/security/limits.conf

# Tune kernel parameters
echo "net.core.somaxconn = 65536" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.tcp_max_syn_backlog = 65536" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

Recovery Procedures

When dealing with production outages:

  1. Immediate Response: Remove failing backends from load balancer
  2. Temporary Fix: Implement circuit breaker patterns
  3. Long-term Solution: Address root cause and improve monitoring
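The circuit-breaker idea in step 2 can be sketched in a few lines of shell: after a run of consecutive failures, stop calling the backend and fail fast. `FAIL_LIMIT` and the `false` stand-in for a dead probe are illustrative only; in practice the probe would be something like `curl -fsS --max-time 2 http://backend/health`.

```shell
#!/bin/bash
# Minimal circuit-breaker sketch: after FAIL_LIMIT consecutive
# failures the breaker "opens" and later calls fail fast instead
# of waiting on a dead backend.
FAIL_LIMIT=3
failures=0
state="closed"

call_backend() {
    if [ "$state" = "open" ]; then
        echo "circuit open: failing fast"
        return 1
    fi
    if "$@"; then
        failures=0
        return 0
    fi
    failures=$((failures + 1))
    if [ "$failures" -ge "$FAIL_LIMIT" ]; then
        state="open"
    fi
    return 1
}

# Demo: `false` stands in for a backend that is down
for i in 1 2 3 4; do
    call_backend false || echo "call $i failed (state=$state)"
done
```

With `false` standing in for a dead backend, calls 1-2 fail normally, call 3 trips the breaker, and call 4 fails fast without touching the backend at all.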

Automated Recovery Scripts

#!/bin/bash
# health-check-and-restart.sh
HEALTH_URL="http://localhost:8080/health"
MAX_RETRIES=3

for i in $(seq 1 $MAX_RETRIES); do
    if curl -fsS -o /dev/null "$HEALTH_URL"; then
        echo "Health check passed"
        exit 0
    else
        echo "Health check failed, attempt $i/$MAX_RETRIES"
        if [ $i -eq $MAX_RETRIES ]; then
            echo "Restarting service..."
            systemctl restart your-app-service
            sleep 10
        fi
    fi
done
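To run a script like the one above on a schedule, a cron entry is the simplest option (the install path below is hypothetical; adjust it to wherever you place the script):

```shell
# Added via `crontab -e`: run the health check every minute and
# keep a log for later review. Path is a placeholder.
* * * * * /usr/local/bin/health-check-and-restart.sh >> /var/log/health-check.log 2>&1
```

A systemd timer works equally well and gives you journal integration, at the cost of two extra unit files.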

Complete Diagnostic Script

The checks above can be combined into a single script:

#!/bin/bash
# 502 Bad Gateway Diagnostic Script
# Run this script to automatically diagnose common 502 issues

set -e

echo "=== 502 Bad Gateway Diagnostic Tool ==="
echo "Starting comprehensive health check..."

# Function to check service status
check_service() {
    local service=$1
    echo "Checking $service status..."
    if systemctl is-active --quiet $service; then
        echo "✓ $service is running"
    else
        echo "✗ $service is not running"
        echo "  Try: sudo systemctl restart $service"
    fi
}

# Function to check port connectivity
check_port() {
    local host=$1
    local port=$2
    echo "Testing connection to $host:$port..."
    if timeout 5 bash -c "</dev/tcp/$host/$port"; then
        echo "✓ Port $port is open on $host"
    else
        echo "✗ Cannot connect to $host:$port"
    fi
}

# Function to analyze logs
analyze_logs() {
    local logfile=$1
    if [ -f "$logfile" ]; then
        echo "Recent 502 errors in $logfile:"
        tail -100 "$logfile" | grep -i "502\|bad gateway\|upstream" | tail -5
    fi
}

# Main diagnostic checks
echo -e "\n1. Checking web server services..."
for service in nginx apache2 httpd; do
    if systemctl list-units --full -all | grep -Fq "$service.service"; then
        check_service $service
    fi
done

# Check backend application services
echo -e "\n2. Checking application services..."
for service in gunicorn uwsgi pm2; do
    if command -v $service &> /dev/null || systemctl list-units | grep -q $service; then
        check_service $service 2>/dev/null || echo "$service: Check manually"
    fi
done

# Test common backend ports
echo -e "\n3. Testing backend connectivity..."
check_port localhost 8000  # Common Django/Flask port
check_port localhost 3000  # Common Node.js port
check_port localhost 8080  # Common Java/Spring port
check_port localhost 9000  # Common PHP-FPM port

# Analyze log files
echo -e "\n4. Analyzing log files..."
analyze_logs "/var/log/nginx/error.log"
analyze_logs "/var/log/apache2/error.log"
analyze_logs "/var/log/httpd/error_log"

# Check system resources
echo -e "\n5. System resource check..."
echo "Memory usage:"
free -h | grep Mem
echo -e "\nDisk usage:"
df -h / | tail -1
echo -e "\nLoad average:"
uptime

# Check for common configuration issues
echo -e "\n6. Configuration validation..."
if command -v nginx &> /dev/null; then
    echo "Nginx configuration test:"
    sudo nginx -t 2>&1 || echo "Nginx config has errors"
fi

if command -v apache2ctl &> /dev/null; then
    echo "Apache configuration test:"
    sudo apache2ctl configtest 2>&1 || echo "Apache config has errors"
fi

echo -e "\n=== Diagnostic Complete ==="
echo "If issues persist, check:"
echo "- Application logs for crashes"
echo "- Database connectivity"
echo "- SSL certificate validity"
echo "- Firewall rules"
echo "- DNS resolution"

# Quick fix suggestions
echo -e "\n=== Quick Fix Commands ==="
echo "# Restart web server:"
echo "sudo systemctl restart nginx  # or apache2"
echo -e "\n# Restart application:"
echo "sudo systemctl restart gunicorn  # or your app service"
echo -e "\n# Check detailed logs:"
echo "sudo tail -f /var/log/nginx/error.log"
echo -e "\n# Test backend directly:"
echo "curl -v http://localhost:8000/"

Error Medic Editorial

Our team of senior DevOps and SRE engineers has collectively resolved thousands of production incidents across diverse technology stacks. We specialize in creating comprehensive troubleshooting guides that provide actionable solutions for complex system errors, drawing from real-world experience managing high-scale distributed systems.
