Fixing Traefik 502 Bad Gateway and 504 Gateway Timeout Errors
Comprehensive troubleshooting for Traefik 502 Bad Gateway, 504 Timeouts, and Connection Refused errors. Learn to diagnose Docker networks, ports, and timeouts.
- Root Cause 1: Traefik and the backend container are not sharing a common Docker network, resulting in 'Connection Refused' and a 502 error.
- Root Cause 2: Traefik is automatically routing traffic to the wrong internal port of the backend service (e.g., targeting port 80 instead of 3000).
- Root Cause 3: The backend application is legitimately taking too long to respond, exceeding Traefik's default or configured forwarding timeouts, causing a 504.
- Quick Fix Summary: Explicitly define `traefik.docker.network`, specify target ports via `loadbalancer.server.port` labels, verify backend health, and adjust `forwardingTimeouts` for long-running endpoints.
| Method | When to Use | Time | Risk |
|---|---|---|---|
| Verify Docker Networks | When seeing 'Connection Refused' logs or Traefik cannot resolve the backend IP address. | 5 mins | Low |
| Specify Target Port | When the backend exposes multiple ports or a non-standard port and Traefik guesses incorrectly. | 2 mins | Low |
| Increase Forwarding Timeouts | When facing '504 Gateway Timeout' on heavy API requests, file uploads, or long-running DB queries. | 5 mins | Low |
| Configure TLS/Scheme | When the backend application enforces HTTPS internally and rejects Traefik's default HTTP probe. | 10 mins | Medium |
Understanding Traefik Gateway Errors
When operating Traefik as a reverse proxy, ingress controller, or API gateway, encountering HTTP 502 Bad Gateway, HTTP 504 Gateway Timeout, or raw 'Connection Refused' errors is a frequent occurrence. Because Traefik dynamically discovers services via providers like Docker, Kubernetes, or HashiCorp Consul, the root cause often lies in the configuration bridging Traefik to your backend applications rather than in Traefik itself.
The 502 Bad Gateway
A 502 Bad Gateway error occurs when Traefik successfully receives a request from an external client, identifies the matching routing rule (Router), and attempts to forward the request to the backend server (Service), but receives an invalid response or no response at all. In the Traefik ecosystem, this almost always means Traefik cannot establish a TCP connection to the backend IP address and port it discovered.
The 504 Gateway Timeout
A 504 Gateway Timeout indicates that Traefik successfully resolved the backend and established a TCP connection, but the backend failed to return a complete HTTP response within the allowed timeframe. This typically happens with slow database queries, delays in upstream external APIs, or timeouts set too low in Traefik's transport configuration.
Connection Refused
When viewing Traefik debug logs, you might see the underlying network error: dial tcp <IP>:<PORT>: connect: connection refused. This is the direct network error triggering the HTTP 502. It means the target IP is reachable at the network layer, but no application process is listening on the specified port, or a host-level firewall or network isolation policy actively rejected the TCP SYN packet.
Step 1: Diagnosing 502 Bad Gateway and Connection Refused
The most frequent cause of a 502 in Traefik (especially when utilizing the Docker provider) is a network isolation issue, a port mismatch, or a TLS handshake failure.
1.1 Docker Network Isolation
By default, Docker Compose provisions a default bridge network for each distinct docker-compose.yml stack. If Traefik runs in its own infrastructure stack and your application runs in a separate project stack, they are placed on completely isolated bridge networks. Traefik will discover the container via the Docker socket, retrieve its internal IP, attempt to route traffic to it, and fail with a 502 because it cannot route packets across isolated Docker bridges.
Diagnosis:
Inspect the networks attached to both the Traefik container and your target backend container:
docker inspect <traefik_container_name> -f '{{json .NetworkSettings.Networks}}'
docker inspect <backend_container_name> -f '{{json .NetworkSettings.Networks}}'
The Fix:
Ensure both containers share at least one common network. Create an external network (e.g., traefik-public) and attach both services to it.
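If the shared network does not exist yet, it can be created once on the host. A short sketch (the network name traefik-public and container name traefik are illustrative):

```shell
# Create a shared external bridge network (run once on the Docker host)
docker network create traefik-public

# Optionally attach an already-running Traefik container without restarting it
# (container name "traefik" is an assumption; adjust to your setup)
docker network connect traefik-public traefik
```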
In your application's docker-compose.yml:
networks:
  traefik-public:
    external: true

services:
  myapp:
    networks:
      - traefik-public
Crucially, if your application container is attached to multiple networks (e.g., an internal DB network and the Traefik network), you must explicitly tell Traefik which network to use to route external traffic. Add this label to your backend service:
traefik.docker.network=traefik-public
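In docker-compose label syntax, the network hint sits alongside the other Traefik labels. A sketch for a service attached to both a private and the shared network (the service and network names are illustrative):

```yaml
services:
  myapp:
    networks:
      - traefik-public
      - internal-db   # private network Traefik must NOT route through
    labels:
      - "traefik.enable=true"
      - "traefik.docker.network=traefik-public"
```

Without the label, Traefik may pick the internal-db IP and fail with a 502 even though the containers share a network.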
1.2 Incorrect Port Discovery
Traefik attempts to intelligently auto-detect the internal port your container is listening on. If a container exposes multiple ports (e.g., a web server exposing 80 for HTTP and 8080 for Prometheus metrics) or doesn't explicitly expose any using the Dockerfile EXPOSE instruction, Traefik might arbitrarily guess the wrong one.
Diagnosis:
Enable debug mode in Traefik logs. You will see Traefik attempting to forward requests to a specific IP and port (e.g., Forwarding to 172.18.0.4:80). If your Node.js application listens on port 3000 but Traefik is trying to hit port 80, the OS will refuse the connection, yielding a 502.
The Fix:
Override the auto-discovery and explicitly define the internal load balancer port using Docker labels:
labels:
  - "traefik.http.services.my-app-service.loadbalancer.server.port=3000"
1.3 Backend Application Crash or Boot Delay
Sometimes the network plumbing is flawless, but the application simply isn't actively running. If the backend container is trapped in a crash-loop or is still executing lengthy initialization tasks (like running synchronous database migrations on startup), the internal web server won't be ready to accept TCP connections.
Diagnosis:
Check the backend container logs: docker logs -f <backend_container_name>. Look for unhandled exceptions, stack traces, or initialization progress bars.
The Fix:
Implement rigorous healthchecks. Traefik natively integrates with Docker and Kubernetes healthchecks. With proper health probes defined, Traefik excludes the unready container from the load balancer pool until it reports as 'healthy', preventing 502s during rollouts or restarts.
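A minimal Compose-level healthcheck sketch, assuming the app serves a /health endpoint on port 3000 (both the path and port are illustrative, and wget must exist in the image):

```yaml
services:
  myapp:
    healthcheck:
      # Probe the app's own health endpoint (path and port are assumptions)
      test: ["CMD", "wget", "-qO-", "http://localhost:3000/health"]
      interval: 10s
      timeout: 3s
      retries: 3
      start_period: 30s   # grace period for slow startup tasks like migrations
```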
1.4 TLS/HTTPS Backend Communication Failures
Increasingly, zero-trust architectures mandate that even internal backend services enforce HTTPS. If Traefik attempts to connect via plain-text HTTP (its default behavior) to a backend port that strictly expects a TLS handshake, the connection will be dropped immediately or the backend will reject the malformed request, leading directly to a 502 Bad Gateway.
Diagnosis:
Run curl from inside the Traefik container against the backend's internal IP. If curl http://<ip>:<port> returns an error like curl: (52) Empty reply from server or mentions an SSL handshake failure, but curl -k https://<ip>:<port> successfully returns data, you have a protocol scheme mismatch.
The Fix:
You must explicitly instruct Traefik to negotiate an HTTPS scheme when communicating with this specific backend. Apply the following label to your service:
traefik.http.services.<service-name>.loadbalancer.server.scheme=https
Additionally, if the backend uses a self-signed or internal CA certificate (very common in microservices), Traefik will reject the connection because it cannot cryptographically verify the certificate authority. You must configure a specific serversTransport in a dynamic configuration file that skips TLS verification for that specific internal service, and reference it via labels:
traefik.http.services.<name>.loadbalancer.serversTransport=<transport-name>@file
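A minimal dynamic-file sketch of such a transport (the name internal-tls is illustrative; skipping verification removes authenticity checks, so scope it narrowly):

```yaml
# Dynamic configuration file loaded by the File Provider
http:
  serversTransports:
    internal-tls:
      # Accept the backend's self-signed / internal-CA certificate
      insecureSkipVerify: true
```

A safer alternative is to set rootCAs on the transport to your internal CA bundle so certificates are still verified against it, rather than disabling verification outright.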
Step 2: Diagnosing 504 Gateway Timeout Errors
When you receive a 504 Gateway Timeout, the TCP connection from Traefik to the backend was successfully established, but the response lifecycle failed to complete in time.
2.1 Backend Processing Delays
Determine if the application endpoint is genuinely designed to take a long time. For example, a heavy PDF report generation endpoint, a bulk data export, or a complex machine learning inference request might legitimately take 45 to 120 seconds to process.
The Fix:
If the delay is expected and architecturally sound, you must increase Traefik's forwarding timeouts. Out of the box, Traefik does not limit how long a backend may take to start responding (responseHeaderTimeout defaults to unlimited), but OS-level or infrastructure timeouts can still interfere. You can configure precise timeouts on a serversTransport in a dynamic configuration file loaded by the File Provider (a separate file from the static traefik.yml):
http:
  serversTransports:
    long_running_transport:
      forwardingTimeouts:
        dialTimeout: 30s
        responseHeaderTimeout: 120s
        idleConnTimeout: 90s
And attach this specialized transport to your specific service via Docker labels:
- "traefik.http.services.my-heavy-app.loadbalancer.serversTransport=long_running_transport@file"
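Note that the transport above only governs the Traefik-to-backend leg. The client-to-Traefik leg has its own limits, configurable per entrypoint in the static configuration. A sketch with illustrative values:

```yaml
# Static configuration (traefik.yml)
entryPoints:
  websecure:
    address: ":443"
    transport:
      respondingTimeouts:
        readTimeout: 60s    # max time to read the full client request
        writeTimeout: 180s  # max time to write the response back to the client
        idleTimeout: 180s   # keep-alive idle limit for client connections
```

If writeTimeout is shorter than the backend's processing time, clients can see errors even though the backend transport is configured correctly.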
2.2 Unresponsive Upstream Dependencies
If your backend application is synchronously waiting on an external API (like a payment gateway) or a database query that hangs indefinitely due to lock contention, the backend thread will block. Traefik will patiently wait until its internal timeout is reached, eventually severing the connection and returning a 504 to the end user.
Diagnosis:
Instrument your application with distributed tracing (e.g., OpenTelemetry, Jaeger) or add detailed duration logging to see exactly where the request is stalling within your backend code pipeline.
2.3 TCP Idle Connection Drop (Cloud Load Balancers)
In cloud environments like AWS (using Elastic Load Balancers in front of Traefik), Azure, or GCP, stateful firewalls or external load balancers will automatically and silently drop TCP connections if no packets traverse the wire for a certain idle period (typically 60 seconds). If an API request processes for 65 seconds without sending data, the cloud provider drops the connection. Traefik never receives a TCP FIN/RST packet, hangs waiting, and eventually throws a 504 or a 502.
Diagnosis:
Check whether requests that run for exactly a specific duration (e.g., exactly 60 seconds) consistently fail. Review your cloud provider's default idle timeout settings for your external Load Balancers or NAT Gateways.
The Fix:
Configure TCP keep-alive settings in your backend and on the Traefik host so that probe packets are sent periodically, keeping the connection state active on intermediary cloud firewalls. Also ensure Traefik's responseHeaderTimeout is strictly less than the cloud provider's hard idle timeout, so clients receive a clean error rather than a hanging socket.
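On a Linux host, the kernel's keep-alive cadence can be tightened below the cloud idle timeout. A sketch with illustrative values (persist them under /etc/sysctl.d to survive reboots):

```shell
# Send the first keep-alive probe after 50s of idle (below a typical 60s LB timeout)
sysctl -w net.ipv4.tcp_keepalive_time=50
# Re-probe every 15s and give up after 3 failed probes
sysctl -w net.ipv4.tcp_keepalive_intvl=15
sysctl -w net.ipv4.tcp_keepalive_probes=3
```

These settings only apply to sockets that enable SO_KEEPALIVE; Go's standard dialer, which Traefik builds on, enables keep-alive by default.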
Step 3: Kubernetes Specific 502/504 Diagnostics
When operating Traefik as a Kubernetes Ingress Controller, the complexity of networking increases dramatically due to CNI (Container Network Interface) plugins, kube-proxy iptables rules, and internal DNS.
3.1 Kubernetes Endpoint Missing
A Kubernetes Service acts as a network abstraction over ephemeral Pods. Traefik routes traffic directly to the IP endpoints associated with the Service. If your Pods fail their readiness probes, Kubernetes removes their IPs from the Service's endpoint list.
Diagnosis:
Verify if the Kubernetes Service actually has registered active endpoints:
kubectl get endpoints <service-name> -n <namespace>
If the output shows <none> under the ENDPOINTS column, Traefik has absolutely nowhere to send the traffic and will immediately return a 502.
The Fix:
Investigate why the backend Pods are failing their readiness probes using kubectl describe pod <pod-name> and kubectl logs <pod-name>. Fix the underlying application initialization issue.
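A minimal readiness probe sketch for the backend Deployment, assuming an HTTP /health endpoint on port 3000 (both values are illustrative):

```yaml
# Container spec fragment inside the backend Deployment
readinessProbe:
  httpGet:
    path: /health   # illustrative health endpoint
    port: 3000
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 3
```

Until the probe passes, the Pod IP stays out of the Service endpoints, so Traefik never routes traffic to an unready replica.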
3.2 CoreDNS Resolution Latency
Unlike the standalone Docker provider, which reads container IPs directly from the local Docker socket, Traefik in Kubernetes normally discovers Pod IPs through the Kubernetes API. However, for Services of type ExternalName, or whenever the backend itself depends on cluster DNS, name resolution goes through the cluster's internal DNS (CoreDNS). If CoreDNS is experiencing high latency, CPU throttling, or dropped UDP packets, those lookups fail or stall, surfacing as 502 errors.
Diagnosis:
Check the CoreDNS pod logs in the kube-system namespace for errors or high latency warnings. Exec into the Traefik pod and attempt to resolve your service manually to test DNS latency:
kubectl exec -it <traefik-pod> -n traefik -- nslookup my-service.my-namespace.svc.cluster.local
The Fix:
Scale out the CoreDNS deployment to handle high DNS query volumes. Ensure each node's resolv.conf is properly configured, and consider enabling the NodeLocal DNSCache feature in your Kubernetes cluster to reduce cross-node DNS lookup latency and mitigate UDP packet drops.
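Checking CoreDNS health and scaling it out can be done directly with kubectl (the replica count is illustrative; size it for your query volume):

```shell
# Check CoreDNS pod status and restarts in kube-system
kubectl -n kube-system get pods -l k8s-app=kube-dns

# Tail CoreDNS logs for SERVFAIL or timeout entries
kubectl -n kube-system logs -l k8s-app=kube-dns --tail=100

# Scale the deployment out (replica count is illustrative)
kubectl -n kube-system scale deployment coredns --replicas=3
```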
Quick Diagnostic Command Reference
# 1. Enable DEBUG logging in Traefik via Docker Compose
# command:
# - "--log.level=DEBUG"
# - "--api.insecure=true"
# 2. View live Traefik logs and filter for gateway errors
docker logs -f traefik | grep -i -E "502|504|error|connection refused"
# 3. Verify Docker network overlap for Traefik and the backend service
docker inspect traefik -f '{{json .NetworkSettings.Networks}}' | jq .
docker inspect my-backend-app -f '{{json .NetworkSettings.Networks}}' | jq .
# 4. Test pure TCP connectivity manually from inside the Traefik container
# Replace <backend-ip> and <port> with the exact values failing in the Traefik logs
docker exec -it traefik /bin/sh -c "wget -qO- http://<backend-ip>:<port> || echo 'TCP Connection Failed'"
# 5. Check Kubernetes endpoints if using Traefik as an Ingress Controller
kubectl get endpoints my-backend-service -n my-app-namespace
kubectl describe pod -l app=my-backend-service -n my-app-namespace | grep -i readiness

Error Medic Editorial
Our SRE and DevOps engineering team breaks down complex infrastructure issues into clear, actionable guides. We specialize in Kubernetes networking, Docker orchestration, and advanced reverse proxy configurations for Traefik, Nginx, and Envoy.