Error Medic

Fixing NGINX Ingress Timeout (504 Gateway Time-out) and Connection Refused in Kubernetes

Resolve NGINX Ingress 504 Gateway Time-out, connection refused, and CrashLoopBackOff errors. Learn root causes, diagnostic kubectl commands, and timeout annotation fixes.

Key Takeaways
  • Upstream application slowness or unresponsiveness is the primary cause of 504 Gateway Time-out errors, often requiring tuning of proxy-read-timeout annotations.
  • Connection Refused (111: Connection refused) usually points to misconfigured Kubernetes Services, mismatched container ports, or missing Endpoints.
  • CrashLoopBackOff indicates the NGINX Ingress Controller Pod itself is failing to start, frequently caused by hostPort conflicts (e.g., port 80/443 already in use) or invalid ConfigMaps.
  • Quick Fix for timeouts: Apply 'nginx.ingress.kubernetes.io/proxy-read-timeout' and 'nginx.ingress.kubernetes.io/proxy-send-timeout' annotations to your Ingress resource to immediately mitigate 504s while investigating upstream performance.
Troubleshooting Approaches Compared
Method | When to Use | Time | Risk
Adjust Ingress Annotations | For 504 Gateway Time-out when the upstream is legitimately slow (e.g., heavy DB queries) | 5 mins | Low
Verify Service & Endpoints | For 502 Bad Gateway or 111: Connection refused in NGINX logs | 10 mins | Low
Analyze Controller Logs | For CrashLoopBackOff, or when NGINX Ingress stops working across all routes | 15 mins | Medium
Scale Upstream Pods | When the upstream application is overwhelmed, causing CPU/memory throttling and timeouts | 5 mins | Low

Understanding the Error

The NGINX Ingress Controller is a core component in many Kubernetes clusters, acting as the primary entry point for external traffic routing to internal services. When it fails to properly route or receive responses from your backend applications, you will encounter a series of highly specific errors. The most common of these are 504 Gateway Time-out, 111: Connection refused, and the controller itself entering a CrashLoopBackOff state.

Understanding the architecture is crucial: The NGINX Ingress Controller watches the Kubernetes API for new Ingress resources, updates a dynamically generated nginx.conf file, and reloads the NGINX process. Traffic flows from the Client -> External Load Balancer -> NGINX Ingress Pod -> Kubernetes Service -> Upstream Application Pod.

Failures can occur at any hop in this chain.

Signature 1: NGINX Ingress Timeout (504 Gateway Time-out)

A 504 Gateway Time-out means that NGINX successfully established a connection with your upstream Pod, sent the HTTP request, but did not receive a complete response within the configured timeout window (default is typically 60 seconds).

Exact Error Message in Browser/cURL:

<html>
<head><title>504 Gateway Time-out</title></head>
<body>
<center><h1>504 Gateway Time-out</h1></center>
<hr><center>nginx</center>
</body>
</html>

Exact Error Message in NGINX Ingress Logs:

2023/10/24 14:32:01 [error] 42#42: *123456 upstream timed out (110: Operation timed out) while reading response header from upstream, client: 192.168.1.100, server: example.com, request: "GET /api/heavy-export HTTP/1.1", upstream: "http://10.244.1.50:8080/api/heavy-export", host: "example.com"

Signature 2: NGINX Ingress Connection Refused (502 Bad Gateway)

If NGINX tries to forward traffic to an IP address and port where no application is listening, the OS networking stack immediately rejects the connection.

Exact Error Message in NGINX Ingress Logs:

2023/10/24 14:35:12 [error] 42#42: *123457 connect() failed (111: Connection refused) while connecting to upstream, client: 192.168.1.100, server: example.com, request: "GET / HTTP/1.1", upstream: "http://10.244.2.11:80/", host: "example.com"

Signature 3: NGINX Ingress CrashLoopBackOff

This occurs when the NGINX Ingress Controller Pod itself crashes repeatedly upon startup. This means no traffic is being routed into your cluster.

Exact Error Message in Kubernetes:

NAME                                        READY   STATUS             RESTARTS   AGE
nginx-ingress-controller-5b4d8c6b9f-x9jkl   0/1     CrashLoopBackOff   6          8m

Step 1: Diagnosing and Fixing 504 Gateway Time-out

When dealing with timeouts, you must determine if the upstream application is genuinely slow, deadlocked, or overwhelmed. If the application legitimately takes longer than 60 seconds to process a request (e.g., a massive database export or AI model inference), you need to tune NGINX.

1. Identify the Culprit

Check the Ingress logs to confirm the timeout. Use kubectl to tail the logs of the ingress controller:

kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx --tail=100 | grep "upstream timed out"

Identify the upstream: IP address in the log. Cross-reference this IP with your Pods:

kubectl get pods -A -o wide | grep <UPSTREAM_IP>
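The upstream IP can be pulled out of the error line automatically. A minimal sketch using the sample log line from the signature above (pipe real controller logs through the same filter):

```shell
# Extract the upstream Pod IP from an "upstream timed out" log line.
LOG='2023/10/24 14:32:01 [error] 42#42: *123456 upstream timed out (110: Operation timed out) while reading response header from upstream, client: 192.168.1.100, server: example.com, request: "GET /api/heavy-export HTTP/1.1", upstream: "http://10.244.1.50:8080/api/heavy-export", host: "example.com"'
UPSTREAM_IP=$(echo "$LOG" | grep -oE 'upstream: "http://[0-9.]+' | grep -oE '[0-9.]+$')
echo "$UPSTREAM_IP"
# Then cross-reference: kubectl get pods -A -o wide | grep "$UPSTREAM_IP"
```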

Check the resource utilization of that specific Pod. Is it hitting CPU limits? Is memory maxed out causing garbage collection pauses?

kubectl top pod <POD_NAME> -n <NAMESPACE>

2. The Fix: Adjusting Timeout Annotations

If the application is healthy but inherently slow, you must increase the NGINX timeout annotations on the specific Ingress resource. Do not change this globally unless absolutely necessary, as it can tie up NGINX worker connections.

Edit your Ingress resource and add the following annotations:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: heavy-api-ingress
  namespace: production
  annotations:
    # Time to wait for a connection to the upstream to be established
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "60"
    # Time to wait to receive data from the upstream (most common cause of 504)
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
    # Time to wait to transmit data to the upstream
    nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
spec:
  rules:
  # ... routing rules ...

Apply the changes: kubectl apply -f ingress.yaml

The NGINX controller will automatically detect the annotation change, regenerate nginx.conf, and reload the workers without dropping active connections.
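You can sanity-check that the annotations actually landed in the generated config. A sketch (the deployment name is an example; substitute your controller's) showing the directives the annotation values render to:

```shell
# On a live cluster, inspect the rendered config directly:
#   kubectl exec -n ingress-nginx deploy/ingress-nginx-controller -- \
#     grep -E "proxy_(read|send)_timeout" /etc/nginx/nginx.conf
# The annotation values render as plain NGINX directives like these:
RENDERED='proxy_connect_timeout 60s;
proxy_read_timeout 300s;
proxy_send_timeout 300s;'
MATCHES=$(printf '%s\n' "$RENDERED" | grep -cE 'proxy_(read|send)_timeout 300s;')
echo "timeout directives at 300s: $MATCHES"
```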


Step 2: Diagnosing and Fixing Connection Refused

A 111: Connection refused error usually manifests to the client as a 502 Bad Gateway. It means NGINX has an IP for your Pod, but the Pod isn't accepting TCP connections on the specified port.

1. Verify the Kubernetes Endpoints

NGINX routes directly to Pod IPs (Endpoints), bypassing kube-proxy. If the Endpoints list is empty or pointing to the wrong port, NGINX will fail.

Check the Endpoints for your Service: kubectl get endpoints <SERVICE_NAME> -n <NAMESPACE>

If the list is empty (<none>), your Service selector does not match any Pod labels. Check your Service definition:

apiVersion: v1
kind: Service
metadata:
  name: my-app-svc
spec:
  selector:
    app: my-app # MUST match the pod labels exactly
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080 # MUST match the port your app binds to inside the container
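A quick way to compare the two sides is a selector-vs-labels check. The values below are samples; on a live cluster you would fetch them with `kubectl get svc my-app-svc -o jsonpath='{.spec.selector}'` and `kubectl get pod <POD_NAME> --show-labels`:

```shell
# Does the Service selector appear in the Pod's label set?
SELECTOR='app=my-app'
POD_LABELS='app=my-app,pod-template-hash=7c6f5d4b9'
if printf ',%s,' "$POD_LABELS" | grep -q ",$SELECTOR,"; then
  RESULT="selector matches; Service will get Endpoints"
else
  RESULT="MISMATCH: Endpoints stay empty -> connection refused"
fi
echo "$RESULT"
```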

2. Verify Container Binding

A very common mistake is an application binding to localhost or 127.0.0.1 inside the container. If a Node.js, Python, or Go app binds to 127.0.0.1:8080, it will refuse connections from the NGINX Ingress controller (which comes from outside the pod's local loopback network).

Ensure your application binds to 0.0.0.0:

  • Node.js/Express: app.listen(8080, '0.0.0.0')
  • Python/Flask: app.run(host='0.0.0.0', port=8080)
  • Go: http.ListenAndServe("0.0.0.0:8080", nil)

Use kubectl exec to verify what is listening inside the Pod:

kubectl exec -it <POD_NAME> -- netstat -tuln

You should see 0.0.0.0:8080 (or your target port), not 127.0.0.1:8080.
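A sketch of spotting a loopback-only bind in that output (sample line shown; in practice, feed in the real netstat output from the Pod):

```shell
# Flag loopback-only listeners: these refuse connections from the Ingress pod.
NETSTAT_LINE='tcp        0      0 127.0.0.1:8080          0.0.0.0:*               LISTEN'
if echo "$NETSTAT_LINE" | grep -q '127\.0\.0\.1:'; then
  BIND_CHECK="loopback-only bind; NGINX will get 111: Connection refused"
else
  BIND_CHECK="bind reachable from outside the pod"
fi
echo "$BIND_CHECK"
```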


Step 3: Diagnosing and Fixing NGINX Ingress CrashLoopBackOff

If the NGINX Ingress Controller is in CrashLoopBackOff, your cluster is completely disconnected from external HTTP traffic.

1. Inspect the Fatal Logs

Grab the logs from the crashing container. Because it is restarting, you may need the logs from the previous instance:

kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx --previous

2. Common Cause: Port Conflicts (HostPort)

If you deployed NGINX using hostNetwork: true or hostPort, and another process on the Kubernetes worker node is already bound to port 80 or 443 (like an existing Apache server, or another Ingress controller), NGINX will crash immediately.

Log Signature:

F0524 10:15:32.123456       1 main.go:100] port 80 is already in use. Please check if there is any other process listening on this port.

Fix: Move the Ingress controller to a node where the port is free, or switch from hostPort to a LoadBalancer or NodePort service type depending on your environment (AWS/GCP vs Bare Metal).
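To identify the conflicting process, inspect the node's listeners (requires node access, e.g. SSH or `kubectl debug node/<NODE_NAME> -it --image=busybox`). A sketch parsing a sample `ss -tlnp` line for the process holding port 80:

```shell
# Sample ss output line; on the node, run: ss -tlnp | grep -E ':(80|443)\s'
SS_LINE='LISTEN 0 511 0.0.0.0:80 0.0.0.0:* users:(("apache2",pid=1234,fd=4))'
if echo "$SS_LINE" | grep -q ':80 '; then
  PORT80_OWNER=$(echo "$SS_LINE" | grep -oE '\("[a-z0-9_-]+"' | tr -d '("')
  echo "port 80 held by: $PORT80_OWNER"
fi
```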

3. Common Cause: Webhook Validation Failures or Invalid ConfigMaps

If you recently edited the ingress-nginx-controller ConfigMap to add global settings (like proxy-body-size), a syntax error can crash NGINX during reload.

Log Signature:

nginx: [emerg] unknown directive "proxy_body_size" in /etc/nginx/nginx.conf:123

Fix: Revert the ConfigMap changes immediately:

kubectl edit configmap ingress-nginx-controller -n ingress-nginx

Correct the key: the ConfigMap setting is proxy-body-size (which renders as NGINX's client_max_body_size directive); proxy_body_size is neither a valid ConfigMap key nor a valid NGINX directive.
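For reference, a minimal sketch of a valid controller ConfigMap using the documented keys (the values are examples, not recommendations):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  # Global defaults; per-Ingress annotations still override these.
  proxy-body-size: "50m"      # renders as client_max_body_size
  proxy-read-timeout: "120"   # seconds; renders as proxy_read_timeout 120s;
```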


Advanced Architecture: Avoiding Timeouts Proactively

Relying on long HTTP timeouts is an anti-pattern in distributed systems. If your API routes frequently exceed 60 seconds, consider re-architecting the system:

  1. Asynchronous Processing: Instead of making the client wait for a heavy PDF generation or database aggregation, accept the request, immediately return a 202 Accepted status with a Job ID, and process the task in the background using a message queue (RabbitMQ/Kafka) and worker pods.
  2. WebSockets: For long-running interactive sessions, upgrade the connection to a WebSocket. NGINX handles WebSockets well, provided you add the correct annotations (nginx.ingress.kubernetes.io/proxy-read-timeout still applies to idle WebSockets, so configure ping/pong keepalives in your app).
  3. Circuit Breakers: Implement circuit breakers (like Istio or client-side logic) to fail fast when backend services are degraded, rather than tying up NGINX worker connections waiting for inevitable timeouts.

Complete Diagnostic Script
#!/bin/bash
# Diagnostic script for NGINX Ingress Timeouts and Connection Refused

NAMESPACE="ingress-nginx"
APP_NAMESPACE="default"
INGRESS_POD=$(kubectl get pods -n $NAMESPACE -l app.kubernetes.io/name=ingress-nginx -o jsonpath='{.items[0].metadata.name}')

echo "=== 1. Checking NGINX Ingress Controller Status ==="
kubectl get pods -n $NAMESPACE

echo -e "\n=== 2. Tailing NGINX Logs for 504 Timeouts or 111 Connection Refused ==="
kubectl logs -n $NAMESPACE $INGRESS_POD --tail=50 | grep -E "(504|111: Connection refused|upstream timed out)"

echo -e "\n=== 3. Checking Ingress Annotations ==="
kubectl get ingress -n $APP_NAMESPACE -o yaml | grep -A 5 "annotations:"

echo -e "\n=== 4. Validating Endpoints (Should not be empty) ==="
kubectl get endpoints -n $APP_NAMESPACE

Error Medic Editorial

The Error Medic Editorial team consists of senior Site Reliability Engineers and DevOps architects with over a decade of experience managing massive-scale Kubernetes clusters in production environments.
