Error Medic

Comprehensive Guide to Fixing Nginx 502 Bad Gateway, 504 Timeouts, and Core Crashes

Diagnose and resolve Nginx 502 Bad Gateway, 504 Timeouts, connection refused errors, out-of-memory crashes, and permission denied issues with this SRE guide.

Key Takeaways
  • Nginx 502 Bad Gateway errors indicate a broken connection to the upstream service (e.g., PHP-FPM, Node.js), often caused by the service being down, misconfigured ports, or socket permission issues.
  • Nginx 504 Gateway Timeouts happen when the upstream application takes too long to process a request; fixing this requires application profiling and tuning proxy/fastcgi timeout directives.
  • Resource exhaustion, such as 'too many connections' or 'out of memory' crashes, demands kernel-level tuning (ulimit, file descriptors) and careful configuration of worker processes and buffers.
Common Nginx Errors and Fix Approaches Compared
| Symptom / Error | Common Root Cause | Primary Diagnostic Tool | Typical Resolution Time |
| --- | --- | --- | --- |
| Nginx 502 / Connection refused | Upstream service down or wrong port | systemctl status, netstat | 5-10 mins |
| Nginx 504 / Nginx slow | Heavy backend processing | Application APM, slow logs | 30+ mins |
| Permission denied (socket) | Incorrect socket owner or SELinux | ls -l, getenforce, audit2allow | 10 mins |
| Too many connections | Traffic spike exceeding worker limits | nginx error.log, ulimit -n | 15 mins |
| Nginx out of memory / crash | OOM killer, memory leak in module | dmesg, gdb (core dump) | Hours/days |

Understanding Nginx Proxy Architecture & The 5xx Error Family

Nginx acts as the highly efficient, event-driven gateway for modern web infrastructure. It rarely serves dynamic content itself; instead, it proxies requests to upstream backend application servers such as PHP-FPM, Node.js, Python Gunicorn, or Java Tomcat. When you encounter errors like nginx 502, nginx 504, or experience an nginx crash, the root cause almost always lies in the communication layer between Nginx and the upstream service, or in resource exhaustion at the operating system level.

As a Site Reliability Engineer (SRE), debugging these issues requires a systematic approach: confirming the Nginx process health, verifying system resources, analyzing the error logs, and validating upstream connectivity.

Diagnosing "502 Bad Gateway" and "Connection Refused"

A 502 Bad Gateway error means Nginx successfully accepted the client's request but received an invalid response—or no response at all—from the upstream server.

When you check /var/log/nginx/error.log, you will typically see:

[error] 1234#0: *5678 connect() failed (111: Connection refused) while connecting to upstream

Step 1: Verify Upstream Health

The nginx connection refused error literally means the operating system rejected the TCP or Unix domain socket connection. Your first step is to verify that the backend is actually running.

systemctl status php8.1-fpm
# or
systemctl status my-node-app

If the service is running, ensure it is listening on the expected port or socket. Use netstat -tulpn or ss -tulpn to verify the bindings.
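To pick the relevant binding out of the listener table, you can filter the `ss -tulpn` output for the port Nginx expects the upstream on. A minimal sketch; canned `ss`-style output stands in for the live command so it runs anywhere, and the port 9000 / php-fpm names are illustrative:

```shell
#!/bin/sh
# Filter the listener table for the port Nginx expects the upstream on.
# Canned `ss -tulpn`-style output stands in for the live command:
#   ss_output=$(ss -tulpn)
ss_output='tcp LISTEN 0 511 0.0.0.0:80 0.0.0.0:* users:(("nginx",pid=1200,fd=6))
tcp LISTEN 0 128 127.0.0.1:9000 0.0.0.0:* users:(("php-fpm",pid=1300,fd=9))'

expected_port=9000
# Column 5 is the local address; column 7 names the owning process.
echo "$ss_output" | awk -v p=":$expected_port" '$5 ~ p"$" {print "upstream listening:", $5, $7}'
```

If the line prints nothing, nothing is bound to that port and the 502 is explained; if the process name is not the backend you expect, fix the upstream configuration rather than Nginx.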

Step 2: Addressing "Nginx Permission Denied"

If your upstream relies on Unix sockets (common for PHP-FPM or Gunicorn) instead of TCP ports, you might see:

[error] 1234#0: *5678 connect() to unix:/var/run/php-fpm.sock failed (13: Permission denied) while connecting to upstream

This is a strict file permission issue. Nginx runs under a specific user (usually nginx or www-data); if that user lacks read and write access to the socket file, it cannot proxy traffic.

Fix: Check which user runs the upstream service. You may need to change the socket owner configuration in your PHP-FPM pool (listen.owner = www-data, listen.group = www-data). On RHEL/CentOS systems, SELinux is often the hidden culprit blocking proxy connections. For TCP upstreams, run setsebool -P httpd_can_network_connect 1 to allow Nginx to open outbound proxy connections; for Unix-socket denials, inspect /var/log/audit/audit.log and generate a policy module with audit2allow.
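The ownership check itself can be scripted. A minimal sketch, assuming a GNU `stat` system: a temp file stands in for the real socket path so the sketch is runnable anywhere, and in production you would point `sock` at the FPM socket and derive `nginx_user` from the running worker processes:

```shell
#!/bin/sh
# Compare the upstream socket's owner and mode against the user the
# Nginx workers run as. A temp file stands in for the real socket:
sock=$(mktemp)            # in production: sock=/var/run/php-fpm.sock
nginx_user=$(id -un)      # in production: ps -o user= -C nginx | tail -n 1

owner=$(stat -c '%U' "$sock")
mode=$(stat -c '%a' "$sock")
echo "socket owner=$owner mode=$mode; nginx runs as $nginx_user"
if [ "$owner" = "$nginx_user" ]; then
    echo "owner matches; confirm the mode grants rw (e.g. 660)"
else
    echo "owner mismatch: set listen.owner/listen.group in the FPM pool"
fi
rm -f "$sock"
```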

Solving "504 Gateway Timeout" and "Nginx Slow" Issues

Unlike a 502, an nginx 504 Gateway Timeout implies that Nginx established the connection to the upstream, sent the request, but the upstream failed to return a response before the proxy timeout limit was reached.

Log example:

[error] 1234#0: *5678 upstream timed out (110: Connection timed out) while reading response header from upstream

If users complain that your site is nginx slow and eventually throws a 504, the problem is your backend application code, a slow database query, or an external API call hanging.

Mitigation and Tuning: While fixing the application is the true solution, you can temporarily increase Nginx's patience by tuning the timeout directives in nginx.conf or your server block:

location / {
    proxy_pass http://backend;
    proxy_read_timeout 300s;
    proxy_connect_timeout 75s;
    proxy_send_timeout 300s;
}

If you are using FastCGI (PHP), adjust the fastcgi_read_timeout directive instead. Keep in mind that raising timeouts indefinitely will eventually tie up all your Nginx worker connections, leading to complete service degradation.
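For a PHP-FPM backend, the equivalent tuning lives in the FastCGI location block. A sketch with illustrative values and socket path (adjust both to your pool configuration):

```nginx
location ~ \.php$ {
    include fastcgi_params;
    fastcgi_pass unix:/var/run/php-fpm.sock;   # illustrative socket path
    fastcgi_connect_timeout 10s;
    fastcgi_send_timeout 300s;
    fastcgi_read_timeout 300s;   # wait up to 5 min for PHP to respond
}
```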

Tackling "Nginx Too Many Connections"

During traffic spikes or DDoS attacks, your server might run out of available connection slots. The error log will clearly state:

[alert] 1234#0: *5678 1024 worker_connections are not enough

How to Fix:

  1. Open /etc/nginx/nginx.conf.
  2. In the events block, increase the limit: worker_connections 4096; or higher.
  3. Ensure worker_processes auto; is set so Nginx spawns one worker per CPU core.
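The capacity math behind these two directives is worth making explicit: the theoretical ceiling is worker_processes × worker_connections, and when Nginx is proxying, each client consumes two slots (one client-side, one upstream-side). A back-of-envelope sketch with illustrative values:

```shell
#!/bin/sh
# Back-of-envelope Nginx capacity: total slots = workers * connections;
# each proxied request uses two slots (client side + upstream side).
worker_processes=4        # illustrative; `auto` resolves to the CPU core count
worker_connections=4096

max_conns=$((worker_processes * worker_connections))
max_proxied_clients=$((max_conns / 2))
echo "total connection slots: $max_conns"
echo "max concurrent proxied clients: $max_proxied_clients"
```

With 4 workers at 4096 connections each, that is 16384 slots but only about 8192 concurrent proxied clients, which is why the "not enough" alert can fire well below the raw worker_connections figure.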

Kernel Limits: Nginx cannot open more connections than the Linux kernel allows file descriptors. If you increase worker_connections to 10000, but your OS limit is 1024, Nginx will still fail. Check the limit for the Nginx user by running su - nginx -s /bin/bash -c 'ulimit -n' (the -s flag is needed because the nginx account typically has a nologin shell). To increase this permanently for login sessions, edit /etc/security/limits.conf:

nginx       soft    nofile   65535
nginx       hard    nofile   65535

Note that /etc/security/limits.conf is applied by PAM at login and does not affect services started by systemd. For a systemd-managed Nginx, set LimitNOFILE=65535 in the unit (systemctl edit nginx) or use the worker_rlimit_nofile directive in nginx.conf. Restart Nginx after making these changes.

Investigating Nginx Out of Memory, High CPU, and Core Dumps

When a server suffers from nginx high cpu or an nginx out of memory event, the symptoms are severe. The service may abruptly terminate, leaving users with generic browser connection errors.

OOM Killer: If Nginx consumes all available system RAM—perhaps due to a massive influx of traffic with large payloads, unoptimized proxy_buffers, or a memory leak in a third-party dynamic module—the Linux kernel will terminate it to protect the OS. Check the kernel logs for OOM terminations:

dmesg -T | grep -i oom-killer

If you see nginx listed here, you need to either add more physical RAM/swap, or restrict Nginx's memory footprint by tuning client_max_body_size and optimizing buffer sizes.
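The kernel's OOM report names the victim process, its PID, and its resident memory, which is exactly what you need for triage. A parsing sketch over a canned dmesg-style line (the sample values are illustrative); in production you would pipe the real `dmesg -T` output through the same sed:

```shell
#!/bin/sh
# Extract the OOM victim, PID, and resident set size from a dmesg-style
# line. A canned sample stands in for: dmesg -T | grep -i 'Killed process'
dmesg_sample='[Mon Jan  1 03:14:15 2024] Out of memory: Killed process 1234 (nginx) total-vm:2097152kB, anon-rss:1048576kB, file-rss:0kB'

echo "$dmesg_sample" | sed -n \
    's/.*Killed process \([0-9]*\) (\([^)]*\)).*anon-rss:\([0-9]*\)kB.*/victim=\2 pid=\1 rss_kb=\3/p'
```

A large anon-rss relative to system RAM points at buffer tuning or a leaking module rather than a one-off spike.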

Nginx Crash and Core Dumps: If Nginx abruptly exits with an nginx failed status and signal 11 (SIGSEGV, a segmentation fault), you have a deep bug, often related to OpenSSL or compiled third-party modules; an exit on signal 9 (SIGKILL), by contrast, usually points back to the OOM killer. To trace an nginx crash, you must enable core dumps.

Add this to the top of your nginx.conf (main context):

worker_rlimit_core 500M;
working_directory /tmp/nginx-cores;

Ensure the /tmp/nginx-cores directory exists and is writable by the nginx user. When the next crash happens, a core file (e.g., core.1234) will be written. You can then analyze the core dump with the GNU Debugger:

gdb /usr/sbin/nginx /tmp/nginx-cores/core.1234

Typing bt (backtrace) in GDB will reveal the exact C function where Nginx crashed, which is invaluable for submitting bug reports or removing the offending module.
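One thing that silently defeats this setup: if kernel.core_pattern pipes dumps to a handler (systemd-coredump, apport), the working_directory setting is ignored and no file appears in /tmp/nginx-cores. A quick check, assuming a Linux host:

```shell
#!/bin/sh
# Where will core dumps actually land? A pattern starting with '|' means
# the kernel pipes dumps to a handler and working_directory is ignored.
pattern=$(cat /proc/sys/kernel/core_pattern)
echo "kernel.core_pattern = $pattern"
case "$pattern" in
    \|*) echo "dumps are piped to a handler; retrieve them with its tooling (e.g. coredumpctl)" ;;
    *)   echo "dumps are written as files matching: $pattern" ;;
esac
# Once you have a core file, a non-interactive backtrace can be taken with:
#   gdb -batch -ex bt /usr/sbin/nginx /tmp/nginx-cores/core.1234
```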

Resolving "Nginx Service Not Starting"

Often during deployments, you may find Nginx completely dead with a systemd status of nginx service not starting or nginx not working.

  1. Configuration Syntax: Never restart Nginx without testing the config. Run nginx -t. A simple missing semicolon can prevent the entire master process from booting.
  2. Port Binding Conflicts: If the error log shows bind() to 0.0.0.0:80 failed (98: Address already in use), another process is hoarding the port. This could be Apache, an orphaned Nginx master process, or another reverse proxy. Find the culprit using netstat -tulpn | grep :80 or lsof -i :80, then stop it cleanly (e.g., systemctl stop apache2) or kill <PID>, reserving kill -9 for processes that ignore SIGTERM.
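Extracting the offending PID from that output can be scripted. A sketch over canned `lsof -i :80`-style output (the apache2/2456 values are illustrative); in production you would substitute the live command:

```shell
#!/bin/sh
# Pull the PID holding port 80 out of lsof-style output. A canned sample
# stands in for: lsof -i :80 -sTCP:LISTEN
lsof_sample='COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
apache2 2456 root    4u  IPv4  31337      0t0  TCP *:http (LISTEN)'

# Row 2 is the first listener; column 2 is the PID.
pid=$(echo "$lsof_sample" | awk 'NR==2 {print $2}')
echo "port 80 held by PID $pid"
# Prefer a clean stop before resorting to SIGKILL:
#   systemctl stop apache2    # or: kill "$pid", then kill -9 only if it lingers
```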

By systematically verifying upstream health, tuning timeout and connection limits, and deeply analyzing system logs and core dumps, you can ensure your Nginx infrastructure remains resilient under extreme loads.

Quick Diagnostic Script

The following script bundles the checks from this guide into a single first-response pass:
#!/bin/bash
# Nginx Diagnostic Script: Checks syntax, ports, upstream health, and logs

echo "--- 1. Testing Nginx Configuration Syntax ---"
nginx -t

echo -e "\n--- 2. Checking Nginx Process Health ---"
systemctl status nginx --no-pager | grep -i active

echo -e "\n--- 3. Identifying Processes Listening on Port 80/443 ---"
netstat -tulpn | grep -E ':80|:443'

echo -e "\n--- 4. Extracting Recent 502 and 504 Errors from Nginx Logs ---"
if [ -f /var/log/nginx/error.log ]; then
    tail -n 500 /var/log/nginx/error.log | grep -E 'Connection refused|timed out|Permission denied|worker_connections'
else
    echo "Log file /var/log/nginx/error.log not found."
fi

echo -e "\n--- 5. Checking for Kernel OOM (Out of Memory) Kills ---"
dmesg -T | grep -i 'oom-killer' | tail -n 5

echo -e "\n--- 6. Checking Current File Descriptor Limits (ulimit) ---"
su - nginx -s /bin/bash -c 'ulimit -n'

Error Medic Editorial

Error Medic Editorial is composed of senior Site Reliability Engineers and DevOps architects dedicated to publishing actionable, deeply technical troubleshooting guides for enterprise infrastructure.
