Error Medic

Troubleshooting NGINX Ingress: "504 Gateway Timeout", "Connection Refused", and CrashLoopBackOff

Fix NGINX Ingress timeouts, Connection Refused, and CrashLoopBackOff errors. Learn how to debug proxy settings, adjust timeouts, and check pod health.

Key Takeaways
  • 504 Gateway Timeouts usually stem from upstream application pods taking too long to respond; fix by adjusting proxy-read-timeout annotations.
  • Connection Refused often indicates the NGINX controller service lacks endpoints, or an external cloud LoadBalancer/Security Group is misconfigured.
  • CrashLoopBackOff typically points to OOMKilled events from high memory usage or bad NGINX configuration syntax caused by a malformed Ingress resource.
  • Quick fix: Always verify controller logs using `kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx` to pinpoint whether the issue is routing, upstream health, or internal controller crashes.
NGINX Ingress Failure Modes Compared
| Symptom / Error | Primary Root Cause | Diagnostic Command | Resolution Strategy |
|---|---|---|---|
| 504 Gateway Timeout | Upstream pod processing delay > 60s | `kubectl logs <app-pod>` | Add `nginx.ingress.kubernetes.io/proxy-read-timeout` annotation |
| Connection Refused | Service selector mismatch / firewall | `kubectl get endpoints -n ingress-nginx` | Fix service label selectors or cloud security groups |
| CrashLoopBackOff | OOMKilled or invalid nginx.conf syntax | `kubectl logs <controller-pod> --previous` | Increase memory limits or fix malformed Ingress snippet |
| 404 Default Backend | Host or path routing rule mismatch | `kubectl describe ingress <name>` | Correct the host header or pathType in the Ingress YAML |

Understanding NGINX Ingress Failures

When managing Kubernetes clusters, the NGINX Ingress Controller is often the critical gateway for all incoming traffic. When it fails, your entire application stack can appear to be offline. The most common symptoms reported by developers and operators include 504 Gateway Timeout, Connection Refused, the dreaded CrashLoopBackOff state, or a general "NGINX Ingress not working" complaint.

This guide breaks down each of these failure modes, providing actionable diagnostic steps and concrete solutions to get your traffic routing restored.

1. NGINX Ingress Timeout (504 Gateway Timeout)

A 504 Gateway Timeout occurs when NGINX is acting as a proxy and does not receive a timely response from the upstream server (your application pod). By default, NGINX waits 60 seconds for a response. If your backend takes longer than this to generate a response, NGINX severs the connection and returns a 504 to the client.

Diagnostic Steps
  1. Check Upstream Pods: Are your application pods actually processing requests, or are they deadlocked? Use kubectl logs <your-app-pod>. Look for slow database queries, thread exhaustion, or application-level timeouts.
  2. Check Ingress Logs: Run kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx. You will see entries like upstream timed out (110: Connection timed out) while reading response header from upstream.
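To confirm whether the delay is in your application rather than in NGINX, it can help to time a request against the backend directly, bypassing the Ingress entirely. A minimal sketch, assuming a Service named `my-app` on port 80 in the `default` namespace (adjust the names to your environment):

```shell
# Port-forward straight to the application Service, bypassing the Ingress.
# ("my-app" and the ports are placeholders for your own Service.)
kubectl port-forward svc/my-app 8080:80 &
PF_PID=$!
sleep 2

# Time a request against the backend directly. If this also takes longer
# than 60 seconds, the problem is the application, not the Ingress timeout.
curl -s -o /dev/null -w "total: %{time_total}s\n" http://127.0.0.1:8080/

kill $PF_PID
```

If the direct request is fast but requests through the Ingress still time out, look at the proxy timeout settings described below.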
The Fix

If your application legitimately needs more than 60 seconds to process a request (e.g., file uploads, complex report generation, or long-polling web sockets), you need to increase the timeout annotations on your specific Ingress resource:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-long-running-app
  annotations:
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "120"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "120"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "120"
```
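After applying the annotations, you can verify that the values actually reached the rendered NGINX configuration. A sketch, assuming a standard ingress-nginx install (default namespace and labels):

```shell
# Grab one controller pod (assumes the standard ingress-nginx namespace/labels)
POD=$(kubectl get pods -n ingress-nginx \
  -l app.kubernetes.io/name=ingress-nginx \
  -o jsonpath='{.items[0].metadata.name}')

# The annotations are rendered into nginx.conf as proxy_*_timeout directives;
# grep for them to confirm the 120s values are live for your server block.
kubectl exec -n ingress-nginx "$POD" -- \
  grep -E "proxy_(read|send|connect)_timeout" /etc/nginx/nginx.conf | sort -u
```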

2. NGINX Ingress Connection Refused

Getting a Connection Refused error usually means the client reached the server IP, but no process was listening on the target port (typically 80 or 443). This happens at the TCP layer before HTTP negotiation even begins.

Diagnostic Steps
  1. Verify the Service: Ensure the Ingress controller service is actually exposed and running. kubectl get svc -n ingress-nginx.
  2. Check Endpoints: Does the service have endpoints? kubectl get endpoints -n ingress-nginx ingress-nginx-controller. If endpoints show as <none>, the service selector isn't matching the running controller pods.
  3. External Load Balancer: If using AWS ELB/NLB, Azure ALB, or GCP Load Balancers, verify the target groups. A misconfigured security group might block traffic between the load balancer and the Kubernetes worker nodes.
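A quick way to spot a selector mismatch is to print the Service's selector next to the labels on the running pods. A sketch, assuming the default controller Service name `ingress-nginx-controller`:

```shell
# Print the controller Service's selector...
kubectl get svc -n ingress-nginx ingress-nginx-controller \
  -o jsonpath='{.spec.selector}'; echo

# ...and the labels on the pods it is supposed to match.
# Every key/value in the selector must appear in a pod's label set,
# or the endpoints list stays empty and connections are refused.
kubectl get pods -n ingress-nginx --show-labels
```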
The Fix

If endpoints are missing, verify that the pod labels match the service selector. If you are testing locally (Minikube/Kind/Docker Desktop), make sure you are using minikube tunnel or port-forwarding correctly: kubectl port-forward --namespace=ingress-nginx service/ingress-nginx-controller 8080:80. In a cloud environment, ensure the NodePort assigned to the NGINX service is allowed through your nodes' firewall/security groups.

3. NGINX Ingress CrashLoopBackOff

A CrashLoopBackOff state means the Ingress controller pod starts, crashes almost immediately, and Kubernetes keeps backing off and trying to restart it. The controller cannot route traffic while in this state.

Diagnostic Steps
  1. Describe the Pod: kubectl describe pod -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx. Look at the Events section at the bottom and the State reason for the container.
  2. Check for OOMKilled: If the reason is OOMKilled, the controller ran out of memory. This is highly common in large clusters with thousands of ingress rules or frequent reload events.
  3. Inspect Logs for Syntax Errors: NGINX will crash if it generates an invalid nginx.conf. Run kubectl logs <ingress-pod> --previous to see the log right before the crash.
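The steps above can be condensed into two commands that surface the crash reason and restart count directly (labels assume a standard ingress-nginx install):

```shell
# Show why the controller container last terminated (e.g. "OOMKilled" or "Error")
kubectl get pods -n ingress-nginx \
  -l app.kubernetes.io/name=ingress-nginx \
  -o jsonpath='{.items[*].status.containerStatuses[*].lastState.terminated.reason}'; echo

# Restart counts help confirm a crash loop rather than a one-off failure
kubectl get pods -n ingress-nginx \
  -l app.kubernetes.io/name=ingress-nginx \
  -o custom-columns=NAME:.metadata.name,RESTARTS:.status.containerStatuses[0].restartCount
```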
The Fix
  • For OOMKilled: Increase the memory limits in the controller's Deployment spec. A typical production deployment might need 512Mi or even 1Gi of memory depending on cluster size.
  • For Configuration Errors: A bad configuration snippet (via nginx.ingress.kubernetes.io/configuration-snippet) in any single Ingress resource can break the global nginx.conf template, taking down the whole controller. Look for the specific Ingress resource causing the syntax error in the logs, and fix or delete it.
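For the OOMKilled case, one way to raise the limit without editing YAML files is a JSON patch against the Deployment. A sketch, assuming the default deployment name `ingress-nginx-controller` and that a memory limit is already set (JSON patch `replace` fails if the field is absent; edit the Deployment spec directly in that case):

```shell
# Bump the controller's memory limit to 1Gi in place.
kubectl patch deployment ingress-nginx-controller -n ingress-nginx \
  --type=json \
  -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/memory", "value": "1Gi"}]'
```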

4. NGINX Ingress Not Working (Generic 404 or 503)

If you receive a `default backend - 404` error, the request successfully reached the NGINX Ingress controller, but NGINX couldn't find a matching routing rule for the Host header or URL path.

Diagnostic Steps
  1. Check Host Headers: Ensure the Host header in your HTTP request (e.g., in your browser or curl) exactly matches the host field defined in your Ingress rule.
  2. Check Path Matching: NGINX path matching can be strict. Check your pathType (Prefix vs Exact).
  3. Verify Upstream Health (503 Error): A 503 Service Temporarily Unavailable usually means the endpoints list is populated, but the upstream application pods are failing their readiness probes, so NGINX has marked them as down.
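You can test host-header matching from the command line without touching DNS or /etc/hosts. A sketch, where `myapp.example.com` and `203.0.113.10` are placeholders for your Ingress host rule and load balancer address:

```shell
# Send a request with an explicit Host header to exercise the routing rule
curl -v -H "Host: myapp.example.com" http://203.0.113.10/

# Or let curl resolve the hostname to the LB address itself, which also
# works for HTTPS (SNI) where a plain Host header is not enough
curl -v --resolve myapp.example.com:80:203.0.113.10 http://myapp.example.com/
```

If the explicit-Host request routes correctly but your browser gets the default backend, the mismatch is in DNS or in the Host value the client is sending.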
The Fix

Double-check your Ingress YAML definition. If you are using a regex in the path, ensure you include the `nginx.ingress.kubernetes.io/rewrite-target` annotation if necessary, and use `pathType: ImplementationSpecific` or `Prefix` as appropriate for your ingress-nginx controller version.

All-in-One Diagnostic Script

```bash
#!/bin/bash
# NGINX Ingress Diagnostic Script

NAMESPACE="ingress-nginx"
SELECTOR="app.kubernetes.io/name=ingress-nginx"

echo "=== 1. Checking NGINX Ingress Pod Status ==="
kubectl get pods -n "$NAMESPACE" -l "$SELECTOR" -o wide

echo -e "\n=== 2. Checking NGINX Ingress Service and Endpoints ==="
kubectl get svc,endpoints -n "$NAMESPACE" -l "$SELECTOR"

echo -e "\n=== 3. Fetching Recent Error Logs (Looking for Syntax Errors or Timeouts) ==="
POD_NAME=$(kubectl get pods -n "$NAMESPACE" -l "$SELECTOR" -o jsonpath='{.items[0].metadata.name}')
kubectl logs -n "$NAMESPACE" "$POD_NAME" | grep -iE "error|fatal|timeout|invalid"

echo -e "\n=== 4. Checking Previous Crashed Container Logs (For CrashLoopBackOff) ==="
kubectl logs -n "$NAMESPACE" "$POD_NAME" --previous | tail -n 20 || echo "No previous crashed container found."

echo -e "\n=== 5. Testing Local Port Forwarding ==="
echo "Run this command manually to test bypassing the external Load Balancer:"
echo "kubectl port-forward -n $NAMESPACE svc/ingress-nginx-controller 8080:80"
```

Error Medic Editorial

Our SRE and DevOps editorial team specializes in Kubernetes troubleshooting, cloud-native architecture, and site reliability engineering at scale.

