Troubleshooting 'The request to your container failed. Error: 504 Gateway Timeout' in GCP Cloud Run
Fix GCP Cloud Run 504 Gateway Timeout errors by adjusting request limits, optimizing cold starts, and offloading long-running jobs to Cloud Tasks.
- Cloud Run has a default timeout of 300 seconds (5 minutes) per request. If your code takes longer, GCP terminates it with a 504 error.
- Cold starts, especially in JVM or large Node.js/Python applications, can easily eat into the timeout window, leading to intermittent failures during scale-ups.
- Downstream bottlenecks, such as slow database queries or hanging external API calls without proper client timeouts, will exhaust the Cloud Run request timer.
- Quick fix: Increase the timeout limit up to 3600 seconds (60 minutes). Long-term fix: Decouple long-running operations using Cloud Pub/Sub or Cloud Tasks.
| Method | When to Use | Implementation Time | Risk / Cost |
|---|---|---|---|
| Increase Timeout Limit | Quick mitigation for workloads taking 5 to 60 minutes. | < 5 mins | Low risk, but masks underlying performance issues. |
| Enable CPU Allocation (Always On) | Background threads are processing after the response is sent or to keep connections warm. | 5 mins | Medium cost increase (billed for idle container time). |
| Optimize Cold Starts (Min Instances/CPU Boost) | Timeouts only happen on the first request to a new instance. | 10 mins | Medium cost increase, high effectiveness for intermittent 504s. |
| Decouple via Cloud Tasks / Pub/Sub | Heavy batch jobs, video processing, or report generation. | Days to Weeks | High effort, but the most scalable and resilient architecture. |
Understanding the Error
Google Cloud Run is designed for stateless, request-driven web services. By default, it imposes a strict maximum request timeout of 300 seconds (5 minutes). If your container does not return an HTTP response within this window, the Cloud Run infrastructure aggressively terminates the request, returning HTTP 504 Gateway Timeout to the client. In your logs, you will typically see the exact message: The request to your container failed. Error: 504 Gateway Timeout.
Timeouts in Cloud Run generally fall into three categories: Hard limit breaches, Cold start penalties, and Downstream dependency starvation. Understanding which bucket your application falls into is critical for resolving the issue without artificially inflating your infrastructure costs.
Root Cause 1: Hard Limit Breaches (Long-Running Processing)
The most straightforward reason for a 504 timeout is that your application is simply performing a task that takes longer than the configured timeout limit. Common examples include:
- Generating massive PDF reports or exporting gigabytes of data to CSV.
- Processing large image or video files synchronously.
- Running complex machine learning inferences on CPU.
- Batch processing thousands of database records in a single HTTP request.
Root Cause 2: The Cold Start Penalty
Cloud Run scales to zero. When a new request arrives and no containers are running, GCP must provision a new environment, pull your container image, start the container, and wait for your web server to begin listening on the designated port (usually 8080). This is known as a cold start.
If your application uses a heavy framework (like Spring Boot in Java), has a massive bundle size (Node.js/Next.js), or initializes heavy database connection pools and loads large ML models into memory at startup, the cold start alone might take 30, 60, or even 120 seconds. If a downstream operation then takes an additional 4 minutes, the total duration breaches the 5-minute default limit, resulting in a 504.
Root Cause 3: Downstream Dependency Starvation and Zombie Connections
Often, the application code itself is fast, but it relies on external services. If your container makes an HTTP request to a third-party API that hangs, and you haven't configured a proper client-side timeout in your code (e.g., using requests.get(url, timeout=5) in Python), your Cloud Run instance will sit idle waiting for a response until Cloud Run kills the overarching request. The same applies to unoptimized database queries holding table locks.
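The fail-fast pattern can be sketched with only the standard library (with `requests`, the equivalent is `requests.get(url, timeout=5)` as noted above). The local socket here is a deliberately unresponsive stand-in for a hanging third-party API:

```python
import socket
import threading
import urllib.request

# A deliberately silent local "upstream": accepts the connection but
# never sends a response, simulating a hung third-party API.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]
accepted = []  # keep the accepted connection alive so it just hangs
threading.Thread(target=lambda: accepted.append(server.accept()),
                 daemon=True).start()

def call_upstream(url, timeout_s=0.5):
    # Always pass an explicit timeout; without one, this call can hang
    # until Cloud Run kills the whole request with a 504.
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status
    except OSError:
        # Covers URLError and socket timeouts: fail fast with a clear
        # error instead of exhausting the Cloud Run request timer.
        return "upstream-timeout"

result = call_upstream(f"http://127.0.0.1:{port}/slow")
```

Here the caller gets an actionable error after half a second instead of silently consuming minutes of the request budget.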
Diagnostic Steps
Before changing infrastructure settings, verify exactly why the timeout is happening using Google Cloud's observability tools.
Step 1: Query Cloud Logging
Navigate to the Logs Explorer in the Google Cloud Console and run the following Logging query language filter to isolate timeouts:
resource.type = "cloud_run_revision"
httpRequest.status = 504
severity >= ERROR
Look at the httpRequest.latency field in the JSON payload of the log entry. If the latency is exactly 300s (or whatever your configured limit is), your code is hitting the hard timeout. If the latency is extremely short (e.g., a few milliseconds) but you still get a 504 or 503, the container likely crashed before it could serve the request (often an OOM - Out of Memory error).
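That triage logic can be expressed as a small, hypothetical helper. Field names follow the Cloud Logging httpRequest structure, where latency is a duration string such as "300.001s":

```python
def classify_504(entry, configured_timeout_s=300):
    # entry is a parsed Cloud Logging JSON payload; httpRequest.latency
    # is a duration string like "300.001s" or "0.004s".
    latency_s = float(entry["httpRequest"]["latency"].rstrip("s"))
    if latency_s >= configured_timeout_s:
        return "hard-timeout"    # code ran until Cloud Run cut it off
    if latency_s < 1.0:
        return "early-failure"   # container likely crashed (often OOM)
    return "other"

# Example entries matching the two patterns described above.
hard = {"httpRequest": {"latency": "300.001s", "status": 504}}
crash = {"httpRequest": {"latency": "0.004s", "status": 504}}
```

Running this over the JSON exported from Logs Explorer quickly shows whether you are fighting a timeout problem or a crash problem, which lead to very different fixes.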
Step 2: Utilize Cloud Trace
If you have Google Cloud Trace enabled (which is highly recommended for microservices), investigate the trace waterfall for the failing requests.
- Go to Cloud Trace > Trace list.
- Filter by HTTP 504.
- Examine the spans. If the span for `SQL Query` takes 299 seconds, your database is the bottleneck, not the compute layer.
Step 3: Check Container CPU and Concurrency Metrics
Navigate to Cloud Run > Your Service > Metrics. Look at the Container CPU utilization and Concurrent requests.
If CPU utilization is consistently hitting 100%, your processing is CPU-bound, which slows down response times. If concurrent requests hit your configured maximum (default 80), new requests are queued. Queued time counts toward the total request timeout!
Step-by-Step Fixes
Fix 1: Increase the Cloud Run Request Timeout
If your task legitimately takes longer than 5 minutes and you cannot refactor the application immediately, the fastest fix is to increase the timeout limit. Cloud Run supports timeouts up to 3600 seconds (60 minutes).
To update the timeout via the gcloud CLI:
gcloud run services update [SERVICE_NAME] \
--timeout=3600 \
--region=[REGION]
Warning: While this fixes the immediate 504, keeping HTTP connections open for 60 minutes is risky. Intermediate firewalls, load balancers, or client browsers (like Chrome) typically drop idle HTTP connections long before 60 minutes expire. The server might finish the work, but the client will see a broken pipe.
Fix 2: Optimize Cold Starts with CPU Boost and Min Instances
If your 504s only happen during spikes in traffic (when new instances are spinning up), mitigate cold starts.
Enable Startup CPU Boost: This feature temporarily allocates more CPU to your container during instance startup, cutting initialization times significantly.
gcloud run services update [SERVICE_NAME] \
--cpu-boost \
--region=[REGION]
Set Minimum Instances: If you want to eliminate cold starts entirely (at the cost of paying for idle instances), configure a minimum number of instances.
gcloud run services update [SERVICE_NAME] \
--min-instances=1 \
--region=[REGION]
Fix 3: CPU Allocation ('Always On' vs 'During Request')
By default, Cloud Run only allocates CPU to your container while it is actively processing a request. If your code spins up a background thread (e.g., in Go or Node.js) to finish processing after returning an HTTP 200 OK, that background thread is instantly throttled to near-zero CPU. If a subsequent request hits that same container, the background thread wakes up, consumes resources, and causes the new request to time out.
To allow background processing, change the CPU allocation model:
gcloud run services update [SERVICE_NAME] \
--no-cpu-throttling \
--region=[REGION]
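The respond-then-continue pattern looks like this in a minimal sketch (plain Python threads, no web framework, so the shape is visible without Flask or FastAPI boilerplate):

```python
import threading
import time

results = {}

def process_in_background(job_id):
    # Simulated slow work that continues after the response is sent.
    # Under the default (request-based) CPU allocation, this thread is
    # throttled to near-zero CPU once the handler returns;
    # --no-cpu-throttling (or decoupling via Cloud Tasks) avoids that.
    time.sleep(0.1)
    results[job_id] = "done"

def handle_request(job_id):
    # Start the work, then return immediately so the client gets a
    # fast response instead of waiting out the request timeout.
    worker = threading.Thread(target=process_in_background, args=(job_id,))
    worker.start()
    return {"status": "accepted", "job_id": job_id}, worker

response, worker = handle_request("job-1")
worker.join()
```

Note that even with `--no-cpu-throttling`, the instance can still be scaled down once it is idle, so Cloud Tasks remains the safer home for work that must not be lost.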
Fix 4: The Architectural Fix (Cloud Tasks / Pub/Sub)
The most robust solution for long-running processes is to move away from synchronous HTTP request-response models. If a user uploads a video to be encoded, do not make them wait for the HTTP response. Instead:
- The user makes an HTTP request to Cloud Run.
- Cloud Run writes the job payload to Google Cloud Tasks or publishes a message to Google Cloud Pub/Sub.
- Cloud Run immediately returns an `HTTP 202 Accepted` to the user with a Job ID.
- A separate Cloud Run service (or the same one handling a different route) consumes the Pub/Sub message or Task asynchronously.
- The client polls an endpoint (e.g., `/status/{job_id}`) or listens via WebSockets for completion.
Cloud Tasks is specifically designed for this, as it allows you to configure specific retry logic, rate limits, and dispatch intervals that integrate perfectly with Cloud Run's autoscaling.
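A sketch of the enqueue step using the `google-cloud-tasks` client library. The worker URL and payload are placeholders, and the payload builder is separated from the GCP call so the shape is clear without credentials:

```python
import json

def build_task(worker_url, payload):
    # Build the HTTP task body: Cloud Tasks will POST this payload to
    # the worker service, which can then take as long as its own
    # (separately configured) timeout allows.
    return {
        "http_request": {
            "http_method": "POST",
            "url": worker_url,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps(payload).encode(),
        }
    }

def enqueue_job(project, location, queue, worker_url, payload):
    # Assumes google-cloud-tasks is installed and ADC credentials are
    # available; deferred import keeps the sketch loadable without it.
    from google.cloud import tasks_v2
    client = tasks_v2.CloudTasksClient()
    parent = client.queue_path(project, location, queue)
    return client.create_task(parent=parent, task=build_task(worker_url, payload))

task = build_task("https://worker-abc123-uc.a.run.app/process", {"video_id": "v42"})
```

The frontend handler calls `enqueue_job`, returns 202 with the job ID, and the queue's retry policy (rather than the user's browser) owns delivery of the work.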
Best Practices for Resilience
- Always set timeouts on external clients: Never use a default HTTP client in your code. In Python's `requests`, always do `requests.get(url, timeout=10)`. In Go, configure `http.Client{Timeout: 10 * time.Second}`.
- Use Connection Pooling for Databases: Use the Cloud SQL Auth Proxy or a connection pooler like PgBouncer. Establishing a new TCP/TLS connection to PostgreSQL on every Cloud Run invocation adds massive latency overhead.
- Keep Container Images Lean: Use Alpine or Distroless base images. A smaller image pulls faster from the Artifact Registry, slightly reducing cold start latency.
- Implement Pagination for Data Reads: Never attempt to `SELECT *` on large tables and serialize the result to JSON in a single request. Always use cursors and limit your queries.
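That last point can be sketched as keyset pagination. Here `fetch_page` is a hypothetical callback standing in for a query like `SELECT ... WHERE id > :after_id ORDER BY id LIMIT :limit`:

```python
def iter_all_rows(fetch_page, page_size=100):
    # Stream rows page by page instead of materializing the whole table
    # in one request. Keyset pagination ("id > last seen id") stays
    # fast on large tables, unlike ever-growing OFFSETs.
    last_id = 0
    while True:
        rows = fetch_page(after_id=last_id, limit=page_size)
        if not rows:
            return
        yield from rows
        last_id = rows[-1]["id"]

# Fake in-memory table standing in for a real database.
TABLE = [{"id": i} for i in range(1, 251)]

def fake_fetch_page(after_id, limit):
    matching = [r for r in TABLE if r["id"] > after_id]
    return matching[:limit]

rows = list(iter_all_rows(fake_fetch_page, page_size=100))
```

In a real service you would stream each page to the client (or to Cloud Storage) as it arrives, keeping both memory usage and per-request latency bounded.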
Quick Reference: gcloud Commands
# 1. Check current timeout configuration for a service
gcloud run services describe my-service --region us-central1 --format="value(spec.template.spec.timeoutSeconds)"
# 2. Update timeout to 15 minutes (900 seconds)
gcloud run services update my-service --timeout=900 --region us-central1
# 3. Enable CPU boost to speed up cold starts
gcloud run services update my-service --cpu-boost --region us-central1
# 4. Tail logs specifically for 504 Gateway Timeouts
gcloud logging read 'resource.type="cloud_run_revision" AND httpRequest.status=504' --limit 10 --format json

Error Medic Editorial
A collective of senior Site Reliability Engineers and Cloud Architects dedicated to solving complex infrastructure puzzles and creating scalable, resilient cloud environments.