Resolving NGINX Ingress '504 Gateway Timeout', 'Connection Refused', and CrashLoopBackOff Errors in Kubernetes
Fix NGINX Ingress 504 Gateway Timeouts and Connection Refused errors. Learn to adjust proxy-read-timeout, resolve CrashLoopBackOff, and debug K8s routing.
- 504 Gateway Timeouts are usually caused by backend applications taking longer to respond than the configured NGINX proxy-read-timeout (default 60s).
- Connection Refused (502 Bad Gateway) typically indicates a disconnect between the Kubernetes Service and the backend Pods (e.g., app bound to localhost, missing endpoints, or wrong targetPort).
- CrashLoopBackOff in the NGINX controller is often due to OOMKilled events (insufficient memory limits) or invalid NGINX configuration syntax injected via ConfigMaps/Ingress rules.
- Always verify the data path: Ingress Controller Logs -> Kubernetes Endpoints -> Backend Pod Logs.
| Symptom / Error | Primary Fix Method | Time to Resolve | Risk Level |
|---|---|---|---|
| 504 Gateway Time-out | Add nginx.ingress.kubernetes.io/proxy-read-timeout annotation | 5 mins | Low |
| 111: Connection refused | Fix Service selectors and container binding (0.0.0.0) | 15 mins | Low |
| CrashLoopBackOff (OOM) | Increase memory limits in Controller Deployment | 10 mins | Medium |
| CrashLoopBackOff (Config) | Identify and remove invalid Ingress resource using admission webhooks | 20 mins | High |
Understanding NGINX Ingress Errors in Kubernetes
When managing Kubernetes clusters, the NGINX Ingress Controller is often the critical entry point for external traffic reaching your microservices. Because it sits at the edge of your cluster, any misconfiguration, resource exhaustion, or network policy issue will manifest as an ingress error. The four most common complaints developers raise are 'nginx ingress timeout' (504), 'nginx ingress connection refused' (502), 'nginx ingress crashloopbackoff', and a general 'nginx ingress not working'.
This guide provides a senior-level, systematic approach to diagnosing and resolving these specific errors, tracing the request lifecycle from the external load balancer down to the application container.
Scenario 1: The 'nginx ingress timeout' (504 Gateway Time-out)
The Symptom
Users or API clients report receiving a 504 Gateway Time-out response. In your NGINX ingress controller logs, you will see entries similar to this:
```
[error] 1234#1234: *5678 upstream timed out (110: Operation timed out) while reading response header from upstream, client: 192.168.1.5, server: api.example.com, request: "POST /v1/reports/generate HTTP/1.1", upstream: "http://10.244.1.15:8080/v1/reports/generate", host: "api.example.com"
```
Root Cause Analysis
A 504 Gateway Timeout means that NGINX successfully routed the request to the upstream backend pod, but the backend failed to return an HTTP response within the configured timeout window. By default, the NGINX Ingress Controller sets `proxy-read-timeout` and `proxy-send-timeout` to 60 seconds (`proxy-connect-timeout` defaults to just 5 seconds).
If you have an endpoint that generates large reports, handles massive file uploads, or runs complex AI model inferences, it will likely take longer than 60 seconds. When the clock hits 60s, NGINX ruthlessly cuts the connection and returns a 504 to the client, even if the backend pod is still happily processing the job.
Step 1: Diagnose
- Check the ingress controller logs to confirm the `upstream timed out` error.
- Cross-reference the timestamp with the backend pod logs. You will often see the backend pod successfully complete the task after the 504 was issued, completely unaware that NGINX dropped the client.
Step 2: The Fix
You need to instruct NGINX to wait longer for this specific route. This is done using Ingress annotations. Do not change this globally unless absolutely necessary, as it ties up NGINX worker connections.
Apply the following annotations to your specific Ingress resource:
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: reporting-api-ingress
  namespace: production
  annotations:
    # Increase timeouts to 300 seconds (5 minutes)
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
    # NGINX caps proxy_connect_timeout at 75s, so values above that are moot
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "300"
```
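If many routes legitimately need the same limit, the identical keys can instead be set cluster-wide in the controller's ConfigMap. A sketch, assuming the default name and namespace of a Helm install (`ingress-nginx-controller` in `ingress-nginx`); per-Ingress annotations still override these values:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller   # Helm default; adjust to your install
  namespace: ingress-nginx
data:
  # Same keys as the annotations, but applied to every Ingress
  proxy-read-timeout: "300"
  proxy-send-timeout: "300"
```

Prefer the per-route annotation where you can: a global 300-second timeout lets one slow endpoint hold worker connections open on every route.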
Note: If your application sits behind an AWS Classic Load Balancer (ELB) or Application Load Balancer (ALB), ensure the load balancer's idle timeout is also increased to match or exceed your NGINX timeout. Otherwise, the cloud LB will drop the connection before NGINX does, resulting in a 504 from the cloud provider, not NGINX.
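For the Classic ELB case specifically, the idle timeout can be raised with a Service annotation on the ingress-nginx controller Service. A sketch assuming the in-tree AWS cloud-provider annotation; ALB idle timeouts are configured on the ALB's own attributes instead:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
  annotations:
    # Raise the ELB idle timeout to match the 300s NGINX timeout
    service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "300"
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: ingress-nginx
  ports:
    - name: http
      port: 80
      targetPort: http
```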
Scenario 2: The 'nginx ingress connection refused' (502 Bad Gateway)
The Symptom
Clients receive a 502 Bad Gateway error. The NGINX logs reveal the following critical error:
```
[error] 456#456: *890 connect() failed (111: Connection refused) while connecting to upstream, client: 10.0.0.5, server: app.example.com, request: "GET / HTTP/1.1", upstream: "http://10.244.2.33:3000/", host: "app.example.com"
```
Root Cause Analysis
Unlike a timeout, Connection refused (errno 111, `ECONNREFUSED`) means NGINX reached the Pod IP (10.244.2.33 on port 3000), but nothing was accepting connections there: the TCP SYN was answered with a RST because no process had that port open in the pod's network namespace.
This happens for three main reasons:
- Application Binding to Localhost: The application inside the container is listening on `127.0.0.1` (localhost) instead of `0.0.0.0` (all interfaces). NGINX connects via the Pod's eth0 IP, which is rejected.
- Port Mismatch: The Kubernetes Service `targetPort` does not match the port the application is actually listening on.
- Application Crashed: The pod is running, but the specific application process inside it has crashed, and the container hasn't restarted yet.
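The third case, a dead process inside a Running pod, is best caught by a `readinessProbe`: a failing probe removes the pod from the Endpoints list, and therefore from NGINX's upstream pool, as soon as the port stops answering. A sketch, assuming the app from this scenario listens on 3000 and exposes a `/healthz` route (both are assumptions about your app):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:latest   # placeholder image
          ports:
            - containerPort: 3000
          readinessProbe:
            httpGet:
              path: /healthz     # assumed health endpoint
              port: 3000
            periodSeconds: 5
```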
Step 1: Diagnose
First, verify that the Kubernetes Endpoints object is correctly populated. NGINX bypasses kube-proxy and routes directly to the Endpoints.
Run:

```shell
kubectl get endpoints <service-name> -n <namespace>
```

If the ENDPOINTS column is empty, the Service selector matches no Ready pods; fix the selector or the pod labels before anything else. If endpoints exist, port-forward directly to the pod to test local connectivity:

```shell
kubectl port-forward pod/<pod-name> 3000:3000
```

If port-forwarding works but NGINX fails, check the application's bind address.
Step 2: The Fix
Fix A: Change the Application Bind Address
Ensure your Node.js, Python, or Go application binds to 0.0.0.0.
- Node/Express: `app.listen(3000, '0.0.0.0')`
- Python/Flask: `app.run(host='0.0.0.0', port=3000)`
- Go: `http.ListenAndServe(":3000", nil)`
Fix B: Correct the Service targetPort
Ensure your Service configuration aligns with the container port:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app-service
spec:
  selector:
    app: my-app
  ports:
    - protocol: TCP
      port: 80         # Port exposed internally in the cluster
      targetPort: 3000 # EXACT port the app listens on inside the container
```
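One way to keep `port` and `targetPort` from drifting apart is to use a named port: the Service references the name, and the actual number lives in exactly one place, the pod spec. A sketch using the same hypothetical app:

```yaml
# In the Deployment's pod template:
#   ports:
#     - name: http
#       containerPort: 3000
apiVersion: v1
kind: Service
metadata:
  name: my-app-service
spec:
  selector:
    app: my-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: http  # resolves to whichever containerPort is named "http"
```

If the container port later changes, only the Deployment needs editing; the Service keeps working.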
Scenario 3: NGINX Ingress 'CrashLoopBackOff'
The Symptom
When you run `kubectl get pods -n ingress-nginx`, you see the controller pod constantly restarting:

```
ingress-nginx-controller-5c8d66c76d-xyz12   0/1   CrashLoopBackOff   15 (3m ago)   45m
```
Root Cause Analysis
A CrashLoopBackOff on the ingress controller is a severe cluster-level issue. It means the NGINX process is terminating abruptly. The two primary culprits are:
- OOMKilled (Out of Memory): Under high traffic, NGINX consumes memory for active connections, SSL buffers, and caching. If the limits are set too low, the Linux kernel's OOM killer will terminate the NGINX process.
- Invalid Configuration (Poison Pill): The NGINX controller dynamically generates `nginx.conf` based on your Ingress resources. If a developer deploys an Ingress resource with a malformed snippet annotation (e.g., `nginx.ingress.kubernetes.io/configuration-snippet`), it can result in invalid NGINX syntax. NGINX will fail to reload or start, causing the pod to crash.
Step 1: Diagnose
First, determine if it's an OOM issue by describing the pod:

```shell
kubectl describe pod -l app.kubernetes.io/name=ingress-nginx -n ingress-nginx
```

Look for `Reason: OOMKilled` under the `Last State` section.
If it is NOT OOMKilled, check the logs of the previous crashed container:

```shell
kubectl logs -l app.kubernetes.io/name=ingress-nginx -n ingress-nginx --previous
```

You are looking for a fatal NGINX syntax error, such as:

```
nginx: [emerg] "server" directive is not allowed here in /etc/nginx/nginx.conf:145
```
Step 2: The Fix
Fix A: Resolving OOMKilled
Update your Helm chart values or deployment manifest to increase memory limits.

```yaml
controller:
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: 1000m
      memory: 1Gi # Increase this significantly
```
Fix B: Resolving Poison Pill Configurations
If the controller is crashing due to bad syntax, you must find and delete the offending Ingress object. Because the controller is in CrashLoopBackOff, it cannot process deletions gracefully. You may need to manually inspect all recently modified ingress resources:

```shell
kubectl get ingress --all-namespaces -o yaml | grep -C 5 snippet
```
Once identified, delete the malformed ingress. To prevent this permanently, enable the NGINX Ingress Validating Webhook. The webhook intercepts kubectl apply commands and tests the resulting nginx.conf before accepting the Ingress object into the cluster.
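With the official ingress-nginx Helm chart, the validating webhook is controlled by a values flag (enabled by default in recent chart versions; shown explicitly here as a sketch):

```yaml
# values.yaml for the ingress-nginx Helm chart
controller:
  admissionWebhooks:
    enabled: true   # reject Ingress objects whose generated config fails validation
```

With this enabled, a poison-pill Ingress is rejected at `kubectl apply` time with the NGINX syntax error in the API response, instead of crashing the controller later.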
Scenario 4: NGINX Ingress Not Working (Default Backend 404)
The Symptom
You deploy your app, create an Ingress, but when you curl the endpoint, you get:
```
default backend - 404
```
Root Cause Analysis
This generic "not working" state means the request reached the NGINX controller, but NGINX has no matching rule in its routing table for the provided Host header and Path.
- Missing IngressClass: In newer Kubernetes versions, Ingress resources require an `ingressClassName`. Without it, the controller ignores the object.
- Host Header Mismatch: The client is requesting an IP or a domain that does not perfectly match the `host` field in the Ingress rule.
- Path Mismatch: The `pathType` (Exact vs Prefix) or the path regex is incorrect.
Step 1: Diagnose & Fix
Ensure your Ingress resource correctly specifies the ingressClassName and matches the requested Host.
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  namespace: default
spec:
  ingressClassName: nginx # CRITICAL: Must match your controller's class
  rules:
    - host: myapp.example.com # Must match the HTTP Host header exactly
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app-service
                port:
                  number: 80
```
Test via curl by explicitly passing the Host header:
```shell
curl -H "Host: myapp.example.com" http://<ingress-controller-ip>/
```
By systematically verifying timeouts, backend connectivity, resource limits, and routing rules, you can resolve the vast majority of NGINX Ingress Controller issues in production Kubernetes environments.
Comprehensive Diagnostic Script
The following script consolidates the checks from all four scenarios into a single diagnostic pass:
```shell
#!/bin/bash
# Comprehensive NGINX Ingress Diagnostic Script
NAMESPACE="production"
INGRESS_NAME="my-app-ingress"
SERVICE_NAME="my-app-service"
INGRESS_NS="ingress-nginx"

echo "--- 1. Checking Ingress Controller Logs for Errors (Timeouts/Refused) ---"
kubectl logs -n $INGRESS_NS -l app.kubernetes.io/name=ingress-nginx --tail=50 | grep -E 'error|warn|504|111'

echo -e "\n--- 2. Checking Ingress Controller Pod Status (CrashLoopBackOff check) ---"
kubectl get pods -n $INGRESS_NS -l app.kubernetes.io/name=ingress-nginx

echo -e "\n--- 3. Verifying Ingress Resource Annotations and Class ---"
kubectl get ingress $INGRESS_NAME -n $NAMESPACE -o yaml | grep -E 'proxy-|ingressClassName|host'

echo -e "\n--- 4. Checking Service and Endpoints Mapping ---"
kubectl get svc $SERVICE_NAME -n $NAMESPACE
kubectl get endpoints $SERVICE_NAME -n $NAMESPACE

echo -e "\n--- 5. Checking Backend Pod Status ---"
kubectl get pods -n $NAMESPACE -l app=$(kubectl get svc $SERVICE_NAME -n $NAMESPACE -o jsonpath='{.spec.selector.app}')
```

Error Medic Editorial
Error Medic Editorial comprises senior Site Reliability Engineers and DevOps architects dedicated to breaking down complex distributed systems failures into actionable, production-ready solutions.
Sources
- https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/#custom-timeouts
- https://kubernetes.github.io/ingress-nginx/troubleshooting/
- https://stackoverflow.com/questions/52175510/nginx-ingress-connect-failed-111-connection-refused-while-connecting-to-u
- https://github.com/kubernetes/ingress-nginx/issues/6451