Error Medic

Resolving Istio 504 Gateway Timeout and 503 Connection Refused Errors

Fix Istio 504 Gateway Timeout and 503 Connection Refused errors by adjusting VirtualService timeout limits, DestinationRule settings, and diagnosing Envoy logs.

Key Takeaways
  • 504 Gateway Timeout errors (response flag 'UT') usually occur because an upstream service takes longer to respond than Envoy's default 15-second timeout limit.
  • 503 Service Unavailable / Connection Refused errors (response flags 'UF' or 'URX') frequently indicate a missing DestinationRule, mismatched subset labels, or mTLS strict mode misconfigurations.
  • Quick Fix: Increase the timeout threshold in your VirtualService configuration to accommodate slow-responding endpoints, and verify PeerAuthentication mTLS settings with 'istioctl analyze'.
  • Always inspect the Envoy sidecar access logs ('kubectl logs <pod-name> -c istio-proxy') to identify the exact HTTP response flags causing the drop.
Troubleshooting Methods for Istio Timeouts & Connection Drops
Method | When to Use | Time | Risk
Increase VirtualService Timeout | When the upstream application legitimately requires more than 15 seconds to process heavy requests. | 5 mins | Low (but can mask underlying application performance degradation)
Configure Envoy Proxy Retries | For mitigating transient network blips, temporary unavailability, or intermittent 503s. | 10 mins | Medium (high retry counts can cause "retry storms" and cascading failures)
Fix mTLS PeerAuthentication | When seeing persistent 503 Connection Refused between injected and uninjected services. | 15 mins | High (misconfiguration can cause security gaps or wider outages)
Scale Up Upstream Pods (HPA) | When the upstream service is overwhelmed, causing processing delays and subsequent Envoy timeouts. | 5 mins | Low (increases cloud resource costs but stabilizes traffic safely)

Understanding the Error

When operating a service mesh like Istio, all ingress and inter-service traffic is intercepted and managed by Envoy sidecar proxies. While this architecture provides unparalleled observability, security, and routing capabilities, it also introduces a strict traffic management layer. Two of the most common and disruptive issues DevOps engineers and SREs face in this environment are Istio 504 Gateway Timeout errors and 503 Service Unavailable / Connection Refused errors.

Developers will typically see exact error messages such as:

  • HTTP/1.1 504 Gateway Timeout
  • upstream request timeout
  • HTTP/1.1 503 Service Unavailable
  • upstream connect error or disconnect/reset before headers. reset reason: connection failure
  • connection refused

The Anatomy of an Istio Timeout (504)

By default, Envoy enforces a 15-second timeout on HTTP routes (note that recent Istio releases leave the route timeout disabled unless one is set in a VirtualService, so check your version's defaults). If an upstream service (your application container) takes 15.1 seconds to process a request, Envoy forcefully terminates the connection and returns a 504 Gateway Timeout to the client. Crucially, your application is completely unaware of this termination: it continues processing the request to completion and logs a success (such as HTTP 200 OK), but the client only ever sees the 504. This discrepancy between application logs and proxy logs is a classic hallmark of an Istio timeout issue.
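
You can reproduce this log discrepancy locally with a scaled-down sketch: the coreutils `timeout` command stands in for Envoy's route timeout (2 seconds here instead of 15), and `sleep` stands in for the upstream application's processing time. All numbers are illustrative:

```shell
#!/bin/sh
# Mimic Envoy's 504 behavior: the "proxy" gives up before the "app" finishes.
simulate_request() {
  proxy_timeout=$1   # stand-in for Envoy's route timeout
  app_latency=$2     # how long the upstream actually takes
  if timeout "$proxy_timeout" sh -c "sleep $app_latency && echo '200 OK'"; then
    echo "client sees: 200 OK"
  else
    # In real Istio the application keeps running to completion and logs a
    # success; here `timeout` kills it, but the client-side outcome is the same.
    echo "client sees: 504 Gateway Timeout (response flag UT)"
  fi
}

simulate_request 2 3   # upstream slower than the proxy budget -> 504
simulate_request 2 1   # upstream fast enough -> 200
```

The key observation is that the client-visible status is decided entirely by the proxy budget, never by whether the upstream eventually succeeds.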

The Anatomy of Connection Refused (503)

A 503 Service Unavailable or connection refused error generally indicates that the Envoy sidecar cannot establish a TCP connection with the destination pod. This rarely means the target pod is down (an empty load-balancing pool typically surfaces as a 503 with the UH flag instead). More often it points to a configuration mismatch. Common culprits include:

  1. mTLS Misconfigurations: The client proxy attempts a plaintext connection, but the server sidecar enforces strict mTLS (PeerAuthentication set to STRICT).
  2. Missing DestinationRules: Istio needs a DestinationRule to know how to route traffic to specific subsets or apply TLS settings.
  3. Port Mismatches: The port defined in the Kubernetes Service does not match the targetPort of the deployment, or the port name doesn't follow Istio's <protocol>-<suffix> naming convention (e.g., http-web).
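
A Kubernetes Service manifest that satisfies the port checks above might look like the following sketch (names and port numbers are placeholders; the port-name convention matters most on older Istio versions that predate `appProtocol`-based protocol detection):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service        # hypothetical service name
  namespace: production
spec:
  selector:
    app: my-service       # must match the Deployment's pod labels
  ports:
  - name: http-web        # <protocol>-<suffix>; a bare "web" is treated as opaque TCP
    port: 8080
    targetPort: 8080      # must match the containerPort in the Deployment
```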

Step 1: Diagnose the Exact Failure Reason

The most critical step in troubleshooting Istio routing issues is examining the Envoy proxy access logs. Do not rely solely on your application logs.

Run the following command to tail the proxy logs of the failing client pod:

kubectl logs <client-pod-name> -n <namespace> -c istio-proxy --tail 100

Look for the Envoy Response Flags in the log output. These are typically two- or three-letter codes appended to the response status:

  • UT (Upstream Request Timeout): Confirms a 504 error caused by the upstream application taking longer than the configured VirtualService timeout.
  • UF (Upstream Connection Failure): Envoy failed to connect to the upstream service. This often pairs with connection refused errors.
  • URX (Upstream Retry Limit Exceeded): The proxy retried the request but exhausted its retry budget.
  • UH (No Healthy Upstream): The destination service has no healthy endpoints in its load balancing pool (often a Kubernetes readiness probe failure, not an Istio routing issue).
  • NR (No Route Configured): Istio doesn't know where to send this traffic. Check your VirtualService routes.
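
To see at a glance which of these flags dominates, you can tally them. The sketch below runs against embedded sample lines; in practice, pipe the output of `kubectl logs <pod> -c istio-proxy` into the same function. It assumes the flag appears as a standalone token, which holds for Istio's default text log format and for quoted values in JSON-formatted logs:

```shell
#!/bin/sh
# Count occurrences of the failure-related Envoy response flags.
count_flags() {
  grep -oE '\b(UT|UF|URX|UH|NR)\b' | sort | uniq -c | sort -rn
}

# Sample lines standing in for real `kubectl logs <pod> -c istio-proxy` output:
count_flags <<'EOF'
[2024-01-01T00:00:00Z] "GET /api HTTP/1.1" 504 UT response_timeout
[2024-01-01T00:00:01Z] "GET /api HTTP/1.1" 504 UT response_timeout
[2024-01-01T00:00:02Z] "GET /db HTTP/1.1" 503 UF connection_failure
EOF
```

The highest-count flag tells you which section of this guide to jump to (UT for Step 2, UF/URX for Step 3).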

You can also use the istioctl CLI to inspect the proxy configuration state and ensure the Envoy sidecars are perfectly synced with the Istio control plane (istiod):

istioctl proxy-status
istioctl analyze -n <namespace>

Step 2: Fix Istio 504 Timeout Errors

If you identified a UT flag, the resolution requires extending the timeout limit in the relevant VirtualService resource. Remember that increasing timeouts should be a deliberate decision; if an endpoint is slow, consider optimizing the application performance first.

To increase the timeout to 60 seconds, locate the VirtualService routing traffic to your application and add/modify the timeout directive under the HTTP route:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: my-slow-service-vs
  namespace: production
spec:
  hosts:
  - my-slow-service
  http:
  - route:
    - destination:
        host: my-slow-service
    # Increase default 15s timeout to 60 seconds
    timeout: 60s
    retries:
      attempts: 3
      perTryTimeout: 20s
      retryOn: gateway-error,connect-failure,refused-stream

Apply the updated configuration:

kubectl apply -f virtualservice.yaml

Note on Retries: In the example above, we also added a retries block. If the timeout is caused by intermittent network latency rather than a consistently slow process, configuring retries can significantly improve reliability without requiring a massive global timeout extension. Keep the budget arithmetic in mind: the route-level timeout caps the total time across all attempts, so attempts: 3 with perTryTimeout: 20s needs up to 3 × 20s = 60s, which fits exactly within the 60s timeout above.


Step 3: Fix 503 Connection Refused Errors

If your proxy logs reveal a UF (Upstream Connection Failure) or you see upstream connect error or disconnect/reset before headers, the issue is likely rooted in mTLS or DestinationRule misconfiguration.

Scenario A: mTLS Strict Mode Mismatch

If you have incrementally adopted Istio, some namespaces might enforce strict mTLS while others do not. If a client without an Envoy sidecar (or in PERMISSIVE mode) tries to communicate with a server enforcing STRICT mTLS, the connection will be refused.

Verify the PeerAuthentication policy for the destination namespace:

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: STRICT

The Fix: Ensure the calling client is also part of the mesh (injected with an Envoy sidecar) so it can negotiate the TLS handshake. If the client is external or cannot be injected, you must downgrade the destination's mTLS mode to PERMISSIVE or create a specific port-level exception.
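
For the port-level exception, PeerAuthentication supports overrides via portLevelMtls, which only take effect when the policy has a workload selector. A sketch with hypothetical names:

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: my-service-mtls   # hypothetical name
  namespace: production
spec:
  selector:               # port-level overrides require a workload selector
    matchLabels:
      app: my-service
  mtls:
    mode: STRICT          # keep mTLS strict for this workload overall...
  portLevelMtls:
    8080:
      mode: PERMISSIVE    # ...but accept plaintext from legacy clients on 8080
```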

Scenario B: Missing or Misconfigured DestinationRule

If a VirtualService routes traffic to specific subsets (e.g., v1 and v2 for canary deployments), you must have a corresponding DestinationRule defining those subsets. If the DestinationRule is missing, Envoy won't know the pod IP addresses associated with the subset, resulting in a 503.

Ensure your DestinationRule exists and accurately maps to Kubernetes pod labels:

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: my-service-dr
  namespace: production
spec:
  host: my-service.production.svc.cluster.local
  subsets:
  - name: v1
    labels:
      version: v1
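
For completeness, here is the VirtualService side of that pairing: the subset named in the route must match a subset defined in the DestinationRule above (same hypothetical names), or Envoy has no cluster to send the traffic to:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: my-service-vs
  namespace: production
spec:
  hosts:
  - my-service
  http:
  - route:
    - destination:
        host: my-service.production.svc.cluster.local
        subset: v1        # must match a subset name in the DestinationRule
```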

Scenario C: Headless Services and Traffic Policies

If you are routing traffic to external databases (like RDS) or headless services via a ServiceEntry, ensure the resolution is configured correctly (usually resolution: DNS). If Envoy tries to route to a headless service without a proper DestinationRule defining the load balancing algorithm, connections will drop.
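
A ServiceEntry for an external database might look like the following sketch; the hostname and port are placeholders:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: external-postgres   # hypothetical name
  namespace: production
spec:
  hosts:
  - mydb.example.com        # placeholder external hostname
  location: MESH_EXTERNAL
  ports:
  - number: 5432
    name: tcp-postgres      # the <protocol>-<suffix> convention applies here too
    protocol: TCP
  resolution: DNS           # resolve the hostname via DNS, not static endpoints
```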


Step 4: Advanced Validation and Tracing

After applying your fixes, validate the traffic flow to ensure the timeouts or connection refusals are resolved.

1. Use istioctl proxy-config: Dump the Envoy configuration to verify your changes have propagated to the sidecar. VirtualService request timeouts appear in the route configuration (the cluster dump only shows connection-level settings such as connectTimeout), while the cluster dump remains useful for endpoints and TLS settings:

istioctl proxy-config route <pod-name> -n <namespace> -o json
istioctl proxy-config cluster <pod-name> -n <namespace> --fqdn my-service.production.svc.cluster.local -o json

Search the route JSON for the "timeout" field to confirm it reflects your new 60s limit.

2. Trigger test requests: Exec into a container within the mesh and use curl -v to observe the headers and response timing.

kubectl exec -it <test-pod> -n <namespace> -c application -- \
  curl -v -o /dev/null -w 'status=%{http_code} total=%{time_total}s\n' http://my-service:8080/api/data

By systematically verifying proxy logs, adjusting VirtualService limits, and auditing mTLS DestinationRules, you can stabilize your Istio data plane and eliminate disruptive 504 and 503 routing errors.

Appendix: Full Diagnostic Script

#!/bin/bash
# Diagnostic script for Istio Timeouts and Connection Refused errors

NAMESPACE="production"
POD_NAME=$1

if [ -z "$POD_NAME" ]; then
  echo "Usage: ./istio_diagnose.sh <pod-name>"
  exit 1
fi

echo "=== 1. Checking Istio Proxy Logs for UT, UF, or URX flags ==="
kubectl logs "$POD_NAME" -n "$NAMESPACE" -c istio-proxy | grep -E '\b(UT|UF|URX)\b' | tail -n 10

printf '\n=== 2. Running Istio Analyzer in namespace ===\n'
istioctl analyze -n "$NAMESPACE"

printf '\n=== 3. Checking VirtualService Configurations (Timeouts) ===\n'
kubectl get virtualservice -n "$NAMESPACE" -o yaml | grep -B 2 -A 2 timeout

printf '\n=== 4. Checking PeerAuthentication (mTLS Strict Mode) ===\n'
kubectl get peerauthentication --all-namespaces

printf '\n=== 5. Dumping Envoy route config for %s ===\n' "$POD_NAME"
# Helpful for verifying whether the new timeout was propagated to this specific Envoy sidecar
istioctl proxy-config route "$POD_NAME" -n "$NAMESPACE" -o json | grep -i '"timeout"'

Error Medic Editorial

Error Medic Editorial is a team of Senior DevOps and Site Reliability Engineers dedicated to demystifying cloud-native architectures. We specialize in Kubernetes, Istio service mesh, and large-scale incident resolution.
