Error Medic

Fixing NGINX Ingress Timeout (504 Gateway Time-out) and Connection Refused in Kubernetes

Resolve NGINX Ingress 504 Gateway Time-out, connection refused, and CrashLoopBackOff errors. Learn root causes, diagnostic kubectl commands, and timeout annotation fixes.

Key Takeaways
  • Upstream application slowness or unresponsiveness is the primary cause of 504 Gateway Time-out errors, often requiring tuning of proxy-read-timeout annotations.
  • Connection Refused (111: Connection refused) usually points to misconfigured Kubernetes Services, mismatched container ports, or missing Endpoints.
  • CrashLoopBackOff indicates the NGINX Ingress Controller Pod itself is failing to start, frequently caused by hostPort conflicts (e.g., port 80/443 already in use) or invalid ConfigMaps.
  • Quick Fix for timeouts: Apply 'nginx.ingress.kubernetes.io/proxy-read-timeout' and 'nginx.ingress.kubernetes.io/proxy-send-timeout' annotations to your Ingress resource to immediately mitigate 504s while investigating upstream performance.
Troubleshooting Approaches Compared
Method | When to Use | Time | Risk
Adjust Ingress Annotations | For 504 Gateway Time-out when the upstream is legitimately slow (e.g., heavy DB queries) | 5 mins | Low
Verify Service & Endpoints | For 502 Bad Gateway or 111: Connection refused in NGINX logs | 10 mins | Low
Analyze Controller Logs | For CrashLoopBackOff, or when NGINX Ingress stops working across all routes | 15 mins | Medium
Scale Upstream Pods | When the upstream application is overwhelmed, causing CPU/memory throttling and timeouts | 5 mins | Low

Understanding the Error

The NGINX Ingress Controller is a core component in many Kubernetes clusters, acting as the primary entry point for external traffic routing to internal services. When it fails to properly route or receive responses from your backend applications, you will encounter a series of highly specific errors. The most common of these are 504 Gateway Time-out, 111: Connection refused, and the controller itself entering a CrashLoopBackOff state.

Understanding the architecture is crucial: The NGINX Ingress Controller watches the Kubernetes API for new Ingress resources, updates a dynamically generated nginx.conf file, and reloads the NGINX process. Traffic flows from the Client -> External Load Balancer -> NGINX Ingress Pod -> Kubernetes Service -> Upstream Application Pod.

Failures can occur at any hop in this chain.

Signature 1: NGINX Ingress Timeout (504 Gateway Time-out)

A 504 Gateway Time-out means that NGINX successfully established a connection with your upstream Pod, sent the HTTP request, but did not receive a complete response within the configured timeout window (default is typically 60 seconds).

Exact Error Message in Browser/cURL:

<html>
<head><title>504 Gateway Time-out</title></head>
<body>
<center><h1>504 Gateway Time-out</h1></center>
<hr><center>nginx</center>
</body>
</html>

Exact Error Message in NGINX Ingress Logs:

2023/10/24 14:32:01 [error] 42#42: *123456 upstream timed out (110: Operation timed out) while reading response header from upstream, client: 192.168.1.100, server: example.com, request: "GET /api/heavy-export HTTP/1.1", upstream: "http://10.244.1.50:8080/api/heavy-export", host: "example.com"

Signature 2: NGINX Ingress Connection Refused (502 Bad Gateway)

If NGINX tries to forward traffic to an IP address and port where no application is listening, the OS networking stack immediately rejects the connection.

Exact Error Message in NGINX Ingress Logs:

2023/10/24 14:35:12 [error] 42#42: *123457 connect() failed (111: Connection refused) while connecting to upstream, client: 192.168.1.100, server: example.com, request: "GET / HTTP/1.1", upstream: "http://10.244.2.11:80/", host: "example.com"

Signature 3: NGINX Ingress CrashLoopBackOff

This occurs when the NGINX Ingress Controller Pod itself crashes repeatedly upon startup. This means no traffic is being routed into your cluster.

Exact Error Message in Kubernetes:

NAME                                        READY   STATUS             RESTARTS   AGE
nginx-ingress-controller-5b4d8c6b9f-x9jkl   0/1     CrashLoopBackOff   6          8m

Step 1: Diagnosing and Fixing 504 Gateway Time-out

When dealing with timeouts, you must determine if the upstream application is genuinely slow, deadlocked, or overwhelmed. If the application legitimately takes longer than 60 seconds to process a request (e.g., a massive database export or AI model inference), you need to tune NGINX.

1. Identify the Culprit

Check the Ingress logs to confirm the timeout. Use kubectl to tail the logs of the ingress controller:

kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx --tail=100 | grep "upstream timed out"

Identify the upstream: IP address in the log. Cross-reference this IP with your Pods:

kubectl get pods -A -o wide | grep <UPSTREAM_IP>
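The upstream IP can be pulled out of the error line automatically. A minimal sketch using the sample log line from the signature above (pipe real controller logs through the same filter):

```shell
# Extract the upstream Pod IP from an "upstream timed out" log line.
LOG='2023/10/24 14:32:01 [error] 42#42: *123456 upstream timed out (110: Operation timed out) while reading response header from upstream, client: 192.168.1.100, server: example.com, request: "GET /api/heavy-export HTTP/1.1", upstream: "http://10.244.1.50:8080/api/heavy-export", host: "example.com"'
UPSTREAM_IP=$(echo "$LOG" | grep -oE 'upstream: "http://[0-9.]+' | grep -oE '[0-9.]+$')
echo "$UPSTREAM_IP"
# Then cross-reference: kubectl get pods -A -o wide | grep "$UPSTREAM_IP"
```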

Check the resource utilization of that specific Pod. Is it hitting CPU limits? Is memory maxed out causing garbage collection pauses?

kubectl top pod <POD_NAME> -n <NAMESPACE>

2. The Fix: Adjusting Timeout Annotations

If the application is healthy but inherently slow, you must increase the NGINX timeout annotations on the specific Ingress resource. Do not change this globally unless absolutely necessary, as it can tie up NGINX worker connections.

Edit your Ingress resource and add the following annotations:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: heavy-api-ingress
  namespace: production
  annotations:
    # Time to wait for a connection to the upstream to be established
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "60"
    # Time to wait to receive data from the upstream (most common cause of 504)
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
    # Time to wait to transmit data to the upstream
    nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
spec:
  rules:
  # ... routing rules ...

Apply the changes: kubectl apply -f ingress.yaml

The NGINX controller will automatically detect the annotation change, regenerate nginx.conf, and reload the workers without dropping active connections.
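You can sanity-check that the annotations actually landed in the generated config. A sketch (the deployment name is an example; substitute your controller's) showing the directives the annotation values render to:

```shell
# On a live cluster, inspect the rendered config directly:
#   kubectl exec -n ingress-nginx deploy/ingress-nginx-controller -- \
#     grep -E "proxy_(read|send)_timeout" /etc/nginx/nginx.conf
# The annotation values render as plain NGINX directives like these:
RENDERED='proxy_connect_timeout 60s;
proxy_read_timeout 300s;
proxy_send_timeout 300s;'
MATCHES=$(printf '%s\n' "$RENDERED" | grep -cE 'proxy_(read|send)_timeout 300s;')
echo "timeout directives at 300s: $MATCHES"
```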


Step 2: Diagnosing and Fixing Connection Refused

A 111: Connection refused error usually manifests to the client as a 502 Bad Gateway. It means NGINX has an IP for your Pod, but the Pod isn't accepting TCP connections on the specified port.

1. Verify the Kubernetes Endpoints

NGINX routes directly to Pod IPs (Endpoints), bypassing kube-proxy. If the Endpoints list is empty or pointing to the wrong port, NGINX will fail.

Check the Endpoints for your Service: kubectl get endpoints <SERVICE_NAME> -n <NAMESPACE>

If the list is empty (<none>), your Service selector does not match any Pod labels. Check your Service definition:

apiVersion: v1
kind: Service
metadata:
  name: my-app-svc
spec:
  selector:
    app: my-app # MUST match the pod labels exactly
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080 # MUST match the port your app binds to inside the container
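A quick way to compare the two sides is a selector-vs-labels check. The values below are samples; on a live cluster you would fetch them with `kubectl get svc my-app-svc -o jsonpath='{.spec.selector}'` and `kubectl get pod <POD_NAME> --show-labels`:

```shell
# Does the Service selector appear in the Pod's label set?
SELECTOR='app=my-app'
POD_LABELS='app=my-app,pod-template-hash=7c6f5d4b9'
if printf ',%s,' "$POD_LABELS" | grep -q ",$SELECTOR,"; then
  RESULT="selector matches; Service will get Endpoints"
else
  RESULT="MISMATCH: Endpoints stay empty -> connection refused"
fi
echo "$RESULT"
```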

2. Verify Container Binding

A very common mistake is an application binding to localhost or 127.0.0.1 inside the container. If a Node.js, Python, or Go app binds to 127.0.0.1:8080, it will refuse connections from the NGINX Ingress controller (which comes from outside the pod's local loopback network).

Ensure your application binds to 0.0.0.0:

  • Node.js/Express: app.listen(8080, '0.0.0.0')
  • Python/Flask: app.run(host='0.0.0.0', port=8080)
  • Go: http.ListenAndServe("0.0.0.0:8080", nil)

Use kubectl exec to verify what is listening inside the Pod:

kubectl exec -it <POD_NAME> -- netstat -tuln

You should see 0.0.0.0:8080 (or your target port), not 127.0.0.1:8080.
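A sketch of spotting a loopback-only bind in that output (sample line shown; in practice, feed in the real netstat output from the Pod):

```shell
# Flag loopback-only listeners: these refuse connections from the Ingress pod.
NETSTAT_LINE='tcp        0      0 127.0.0.1:8080          0.0.0.0:*               LISTEN'
if echo "$NETSTAT_LINE" | grep -q '127\.0\.0\.1:'; then
  BIND_CHECK="loopback-only bind; NGINX will get 111: Connection refused"
else
  BIND_CHECK="bind reachable from outside the pod"
fi
echo "$BIND_CHECK"
```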


Step 3: Diagnosing and Fixing NGINX Ingress CrashLoopBackOff

If the NGINX Ingress Controller is in CrashLoopBackOff, your cluster is completely disconnected from external HTTP traffic.

1. Inspect the Fatal Logs

Grab the logs from the crashing container. Because it is restarting, you may need the logs from the previous instance:

kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx --previous

2. Common Cause: Port Conflicts (HostPort)

If you deployed NGINX using hostNetwork: true or hostPort, and another process on the Kubernetes worker node is already bound to port 80 or 443 (like an existing Apache server, or another Ingress controller), NGINX will crash immediately.

Log Signature:

F0524 10:15:32.123456       1 main.go:100] port 80 is already in use. Please check if there is any other process listening on this port.

Fix: Move the Ingress controller to a node where the port is free, or switch from hostPort to a LoadBalancer or NodePort service type depending on your environment (AWS/GCP vs Bare Metal).
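To identify the conflicting process, inspect the node's listeners (requires node access, e.g. SSH or `kubectl debug node/<NODE_NAME> -it --image=busybox`). A sketch parsing a sample `ss -tlnp` line for the process holding port 80:

```shell
# Sample ss output line; on the node, run: ss -tlnp | grep -E ':(80|443)\s'
SS_LINE='LISTEN 0 511 0.0.0.0:80 0.0.0.0:* users:(("apache2",pid=1234,fd=4))'
if echo "$SS_LINE" | grep -q ':80 '; then
  PORT80_OWNER=$(echo "$SS_LINE" | grep -oE '\("[a-z0-9_-]+"' | tr -d '("')
  echo "port 80 held by: $PORT80_OWNER"
fi
```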

3. Common Cause: Webhook Validation Failures or Invalid ConfigMaps

If you recently edited the ingress-nginx-controller ConfigMap to add global settings (like proxy-body-size), a syntax error can crash NGINX during reload.

Log Signature:

nginx: [emerg] unknown directive "proxy_body_size" in /etc/nginx/nginx.conf:123

Fix: Revert the ConfigMap changes immediately:

kubectl edit configmap ingress-nginx-controller -n ingress-nginx

Correct the key: the ConfigMap setting is proxy-body-size (which renders as NGINX's client_max_body_size directive); proxy_body_size is neither a valid ConfigMap key nor a valid NGINX directive.
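For reference, a minimal sketch of a valid controller ConfigMap using the documented keys (the values are examples, not recommendations):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  # Global defaults; per-Ingress annotations still override these.
  proxy-body-size: "50m"      # renders as client_max_body_size
  proxy-read-timeout: "120"   # seconds; renders as proxy_read_timeout 120s;
```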


Advanced Architecture: Avoiding Timeouts Proactively

Relying on long HTTP timeouts is an anti-pattern in distributed systems. If your API routes frequently exceed 60 seconds, consider re-architecting the system:

  1. Asynchronous Processing: Instead of making the client wait for a heavy PDF generation or database aggregation, accept the request, immediately return a 202 Accepted status with a Job ID, and process the task in the background using a message queue (RabbitMQ/Kafka) and worker pods.
  2. WebSockets: For long-running interactive sessions, upgrade the connection to a WebSocket. NGINX handles WebSockets well, provided you add the correct annotations (nginx.ingress.kubernetes.io/proxy-read-timeout still applies to idle WebSockets, so configure ping/pong keepalives in your app).
  3. Circuit Breakers: Implement circuit breakers (like Istio or client-side logic) to fail fast when backend services are degraded, rather than tying up NGINX worker connections waiting for inevitable timeouts.

Complete Diagnostic Script
#!/bin/bash
# Diagnostic script for NGINX Ingress Timeouts and Connection Refused

NAMESPACE="ingress-nginx"
APP_NAMESPACE="default"
INGRESS_POD=$(kubectl get pods -n $NAMESPACE -l app.kubernetes.io/name=ingress-nginx -o jsonpath='{.items[0].metadata.name}')

echo "=== 1. Checking NGINX Ingress Controller Status ==="
kubectl get pods -n $NAMESPACE

echo -e "\n=== 2. Tailing NGINX Logs for 504 Timeouts or 111 Connection Refused ==="
kubectl logs -n $NAMESPACE $INGRESS_POD --tail=50 | grep -E "(504|111: Connection refused|upstream timed out)"

echo -e "\n=== 3. Checking Ingress Annotations ==="
kubectl get ingress -n $APP_NAMESPACE -o yaml | grep -A 5 "annotations:"

echo -e "\n=== 4. Validating Endpoints (Should not be empty) ==="
kubectl get endpoints -n $APP_NAMESPACE

Error Medic Editorial

The Error Medic Editorial team consists of senior Site Reliability Engineers and DevOps architects with over a decade of experience managing massive-scale Kubernetes clusters in production environments.
