Error Medic

Troubleshooting 'The request to your container failed. Error: 504 Gateway Timeout' in GCP Cloud Run

Fix GCP Cloud Run 504 Gateway Timeout errors by adjusting request limits, optimizing cold starts, and offloading long-running jobs to Cloud Tasks.

Key Takeaways
  • Cloud Run has a default timeout of 300 seconds (5 minutes) per request. If your code takes longer, GCP terminates it with a 504 error.
  • Cold starts, especially in JVM or large Node.js/Python applications, can easily eat into the timeout window, leading to intermittent failures during scale-ups.
  • Downstream bottlenecks, such as slow database queries or hanging external API calls without proper client timeouts, will exhaust the Cloud Run request timer.
  • Quick fix: Increase the timeout limit up to 3600 seconds (60 minutes). Long-term fix: Decouple long-running operations using Cloud Pub/Sub or Cloud Tasks.
Fix Approaches Compared
  • Increase Timeout Limit. When to use: quick mitigation for workloads taking 5 to 60 minutes. Implementation time: under 5 minutes. Risk / cost: low, but masks underlying performance issues.
  • Enable CPU Allocation (Always On). When to use: background threads keep processing after the response is sent, or connections must stay warm. Implementation time: 5 minutes. Risk / cost: medium cost increase (billed for idle container time).
  • Optimize Cold Starts (Min Instances / CPU Boost). When to use: timeouts only happen on the first request to a new instance. Implementation time: 10 minutes. Risk / cost: medium cost increase, high effectiveness for intermittent 504s.
  • Decouple via Cloud Tasks / Pub/Sub. When to use: heavy batch jobs, video processing, or report generation. Implementation time: days to weeks. Risk / cost: high effort, but the most scalable and resilient architecture.

Understanding the Error

Google Cloud Run is designed for stateless, request-driven web services. By default, it imposes a strict maximum request timeout of 300 seconds (5 minutes). If your container does not return an HTTP response within this window, the Cloud Run infrastructure aggressively terminates the request, returning HTTP 504 Gateway Timeout to the client. In your logs, you will typically see the exact message: The request to your container failed. Error: 504 Gateway Timeout.

Timeouts in Cloud Run generally fall into three categories: Hard limit breaches, Cold start penalties, and Downstream dependency starvation. Understanding which bucket your application falls into is critical for resolving the issue without artificially inflating your infrastructure costs.

Root Cause 1: Hard Limit Breaches (Long-Running Processing)

The most straightforward reason for a 504 timeout is that your application is simply performing a task that takes longer than the configured timeout limit. Common examples include:

  • Generating massive PDF reports or exporting gigabytes of data to CSV.
  • Processing large image or video files synchronously.
  • Running complex machine learning inferences on CPU.
  • Batch processing thousands of database records in a single HTTP request.

Root Cause 2: The Cold Start Penalty

Cloud Run scales to zero. When a new request arrives and no containers are running, GCP must provision a new environment, pull your container image, start the container, and wait for your web server to begin listening on the designated port (usually 8080). This is known as a cold start.

If your application uses a heavy framework (like Spring Boot in Java), has a massive bundle size (Node.js/Next.js), or initializes heavy database connection pools and loads large ML models into memory at startup, the cold start alone might take 30, 60, or even 120 seconds. If a downstream operation then takes an additional 4 minutes, the total duration breaches the 5-minute default limit, resulting in a 504.

Root Cause 3: Downstream Dependency Starvation and Zombie Connections

Often, the application code itself is fast, but it relies on external services. If your container makes an HTTP request to a third-party API that hangs, and you haven't configured a proper client-side timeout in your code (e.g., using requests.get(url, timeout=5) in Python), your Cloud Run instance will sit idle waiting for a response until Cloud Run kills the overarching request. The same applies to unoptimized database queries holding table locks.
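One way to guard against hanging dependencies is to give every outbound call a deadline derived from a fixed per-request budget. Below is a minimal stdlib sketch of that idea; the budget value, function names, and use of urllib are illustrative assumptions (the doc's requests.get(url, timeout=...) is the equivalent with the requests library):

```python
import time
import urllib.error
import urllib.request
from typing import Optional

# Total outbound budget for one request; keep it well under the
# Cloud Run request timeout so the handler can still return a clean error.
REQUEST_BUDGET_S = 30.0

def remaining_budget(start: float, budget_s: float = REQUEST_BUDGET_S) -> float:
    """Seconds left before we must give up; never negative."""
    return max(0.0, budget_s - (time.monotonic() - start))

def fetch_with_deadline(url: str, start: float) -> Optional[bytes]:
    """Call a downstream service, but never wait past the remaining budget."""
    timeout = remaining_budget(start)
    if timeout == 0.0:
        return None  # budget already spent: fail fast instead of hanging
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except (urllib.error.URLError, TimeoutError):
        return None  # downstream hung or failed; surface an error, not a 504
```

Failing fast like this lets the service return a meaningful 502 or 503 to the caller instead of silently running out the Cloud Run clock.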


Diagnostic Steps

Before changing infrastructure settings, verify exactly why the timeout is happening using Google Cloud's observability tools.

Step 1: Query Cloud Logging

Navigate to the Logs Explorer in the Google Cloud Console and run the following Logging query language filter to isolate timeouts:

resource.type = "cloud_run_revision"
httpRequest.status = 504
severity >= ERROR

Look at the httpRequest.latency field in the JSON payload of the log entry. If the latency is exactly 300s (or whatever your configured limit is), your code is hitting the hard timeout. If the latency is extremely short (e.g., a few milliseconds) but you still get a 504 or 503, the container likely crashed before it could serve the request (often an OOM - Out of Memory error).
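This triage can be scripted against exported log entries. The sketch below parses the httpRequest.latency string and buckets each entry; the thresholds and function names are heuristic assumptions, not official guidance:

```python
def parse_latency_s(latency: str) -> float:
    """Cloud Logging encodes httpRequest.latency as a string like '300.015s'."""
    return float(latency.rstrip("s"))

def classify_504(entry: dict, configured_timeout_s: float = 300.0) -> str:
    """Rough triage of a failing request's log entry."""
    latency = parse_latency_s(entry["httpRequest"]["latency"])
    if latency >= configured_timeout_s - 1.0:
        return "hard-timeout"   # request ran out the Cloud Run clock
    if latency < 0.1:
        return "early-crash"    # container likely died first (often OOM)
    return "other"

entry = {"httpRequest": {"status": 504, "latency": "300.015s"}}
verdict = classify_504(entry)  # "hard-timeout" for this entry
```

Running this over the JSON output of gcloud logging read quickly shows whether you are fighting one failure mode or several.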

Step 2: Utilize Cloud Trace

If you have Google Cloud Trace enabled (which is highly recommended for microservices), investigate the trace waterfall for the failing requests.

  1. Go to Cloud Trace > Trace list.
  2. Filter by HTTP 504.
  3. Examine the spans. If the span for SQL Query takes 299 seconds, your database is the bottleneck, not the compute layer.

Step 3: Check Container CPU and Concurrency Metrics

Navigate to Cloud Run > Your Service > Metrics. Look at the Container CPU utilization and Concurrent requests. If CPU utilization is consistently hitting 100%, your processing is CPU-bound, which slows down response times. If concurrent requests hit your configured maximum (default 80), new requests are queued. Queued time counts toward the total request timeout!


Step-by-Step Fixes

Fix 1: Increase the Cloud Run Request Timeout

If your task legitimately takes longer than 5 minutes and you cannot refactor the application immediately, the fastest fix is to increase the timeout limit. Cloud Run supports timeouts up to 3600 seconds (60 minutes).

To update the timeout via the gcloud CLI:

gcloud run services update [SERVICE_NAME] \
  --timeout=3600 \
  --region=[REGION]

Warning: While this fixes the immediate 504, keeping HTTP connections open for 60 minutes is risky. Intermediate firewalls, load balancers, or client browsers (like Chrome) typically drop idle HTTP connections long before 60 minutes expire. The server might finish the work, but the client will see a broken pipe.

Fix 2: Optimize Cold Starts with CPU Boost and Min Instances

If your 504s only happen during spikes in traffic (when new instances are spinning up), mitigate cold starts.

Enable Startup CPU Boost: This feature temporarily allocates more CPU to your container during instance startup, cutting initialization times significantly.

gcloud run services update [SERVICE_NAME] \
  --cpu-boost \
  --region=[REGION]

Set Minimum Instances: If you want to eliminate cold starts entirely (at the cost of paying for idle instances), configure a minimum number of instances.

gcloud run services update [SERVICE_NAME] \
  --min-instances=1 \
  --region=[REGION]

Fix 3: CPU Allocation ('Always On' vs 'During Request')

By default, Cloud Run only allocates CPU to your container while it is actively processing a request. If your code spins up a background thread (e.g., in Go or Node.js) to finish processing after returning an HTTP 200 OK, that background thread is instantly throttled to near-zero CPU. If a subsequent request hits that same container, the background thread wakes up, consumes resources, and causes the new request to time out.

To allow background processing, change the CPU allocation model:

gcloud run services update [SERVICE_NAME] \
  --no-cpu-throttling \
  --region=[REGION]
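With CPU throttling disabled, a handler can safely hand work to a background thread after responding. A minimal stdlib sketch of the pattern (the handler, job function, and payload shape are hypothetical; a real service would sit behind a web framework):

```python
from concurrent.futures import ThreadPoolExecutor

# One shared pool per container instance. These threads only make progress
# between requests when CPU is always allocated (--no-cpu-throttling).
executor = ThreadPoolExecutor(max_workers=4)
results = []

def process_job(payload: dict) -> None:
    """Slow work that continues after the HTTP response has been sent."""
    results.append({"id": payload["id"], "status": "done"})

def handle_request(payload: dict) -> tuple[int, dict]:
    """Return immediately; finish the heavy lifting in the background."""
    executor.submit(process_job, payload)
    return 202, {"job_id": payload["id"], "status": "accepted"}
```

Note that Cloud Run may still scale the instance down once it is idle, so this pattern suits short background tails, not durable job queues.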

Fix 4: The Architectural Fix (Cloud Tasks / Pub/Sub)

The most robust solution for long-running processes is to move away from synchronous HTTP request-response models. If a user uploads a video to be encoded, do not make them wait for the HTTP response. Instead:

  1. The user makes an HTTP request to Cloud Run.
  2. Cloud Run writes the job payload to Google Cloud Tasks or publishes a message to Google Cloud Pub/Sub.
  3. Cloud Run immediately returns an HTTP 202 Accepted to the user with a Job ID.
  4. A separate Cloud Run service (or the same one handling a different route) consumes the Pub/Sub message or Task asynchronously.
  5. The client polls an endpoint (e.g., /status/{job_id}) or listens via WebSockets for completion.

Cloud Tasks is specifically designed for this, as it allows you to configure specific retry logic, rate limits, and dispatch intervals that integrate perfectly with Cloud Run's autoscaling.
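The enqueue step (step 2) can be sketched by building the task body as a plain dict in the shape the Cloud Tasks HTTP task expects; the worker URL, queue names, and job fields below are placeholders, and the actual client call is shown only as a comment:

```python
import json

def build_http_task(worker_url: str, job: dict) -> dict:
    """Build an HTTP task body for Cloud Tasks (fields per tasks_v2 HttpRequest)."""
    return {
        "http_request": {
            "http_method": "POST",
            "url": worker_url,
            "headers": {"Content-Type": "application/json"},
            # The Python client library expects the body as raw bytes.
            "body": json.dumps(job).encode("utf-8"),
        }
    }

task = build_http_task("https://worker-xyz.a.run.app/process", {"job_id": "123"})

# A real enqueue would look roughly like this (requires google-cloud-tasks):
#   from google.cloud import tasks_v2
#   client = tasks_v2.CloudTasksClient()
#   parent = client.queue_path("my-project", "us-central1", "my-queue")
#   client.create_task(parent=parent, task=task)
```

Keeping the payload construction separate from the API call makes the enqueue logic unit-testable without network access.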

Best Practices for Resilience

  • Always set timeouts on external clients: Never rely on an HTTP client's default (often unbounded) timeout. In Python's requests, always do requests.get(url, timeout=10). In Go, configure http.Client{Timeout: 10 * time.Second}.
  • Use Connection Pooling for Databases: Use the Cloud SQL Auth Proxy or frameworks like PgBouncer. Establishing a new TCP/TLS connection to PostgreSQL on every Cloud Run invocation adds massive latency overhead.
  • Keep Container Images Lean: Use Alpine or Distroless base images. A smaller image pulls faster from the Artifact Registry, slightly reducing cold start latency.
  • Implement Pagination for Data Reads: Never attempt to SELECT * on large tables and serialize them to JSON in a single request. Always use cursors and limit your queries.
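The pagination advice above can be sketched with keyset (cursor) pagination, which stays fast regardless of table size because each page seeks directly past the last id seen. The example uses an in-memory SQLite table with illustrative names:

```python
import sqlite3

PAGE_SIZE = 2

def iter_rows(conn: sqlite3.Connection, page_size: int = PAGE_SIZE):
    """Stream a large table in bounded pages instead of one giant SELECT *."""
    last_id = 0
    while True:
        page = conn.execute(
            "SELECT id, name FROM items WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, page_size),
        ).fetchall()
        if not page:
            return
        yield from page
        last_id = page[-1][0]  # cursor: resume after the last id seen

# Demo with an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items (name) VALUES (?)", [("a",), ("b",), ("c",)])
rows = list(iter_rows(conn))
```

Unlike OFFSET-based paging, the id > last_id predicate uses the primary-key index on every page, so memory stays flat even for millions of rows.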

Quick Reference Commands

# 1. Check current timeout configuration for a service
gcloud run services describe my-service --region us-central1 --format="value(spec.template.spec.timeoutSeconds)"

# 2. Update timeout to 15 minutes (900 seconds)
gcloud run services update my-service --timeout=900 --region us-central1

# 3. Enable CPU boost to speed up cold starts
gcloud run services update my-service --cpu-boost --region us-central1

# 4. Tail logs specifically for 504 Gateway Timeouts
gcloud logging read 'resource.type="cloud_run_revision" AND httpRequest.status=504' --limit 10 --format json

Error Medic Editorial

A collective of senior Site Reliability Engineers and Cloud Architects dedicated to solving complex infrastructure puzzles and creating scalable, resilient cloud environments.
