GCP Cloud Run Timeout: Fix HTTP 504, Deadline Exceeded & Startup Timeout Errors
Fix GCP Cloud Run timeout errors including HTTP 504 and deadline exceeded. Increase the timeout to 3600s, set min-instances, and offload long tasks to Cloud Run Jobs.
- Cloud Run's default request timeout is 300 seconds; any HTTP request exceeding this limit returns HTTP 504 Gateway Timeout—raise it up to 3600 seconds for processing-heavy endpoints
- Container startup timeouts fire when your process fails to listen on the PORT environment variable within approximately 4 minutes, commonly caused by heavy model loading or slow dependency initialization
- CPU is throttled to near-zero between requests by default, causing cold-start latency spikes that cascade into visible timeout errors for the first user after an idle period
- Workloads exceeding 3600 seconds must move to Cloud Run Jobs (168-hour max task timeout) or Cloud Tasks—standard HTTP services cannot accommodate them
- Quick fix: gcloud run services update SERVICE --timeout=3600 --region=REGION raises the request limit to the maximum (1 hour)
| Method | When to Use | Time to Implement | Risk |
|---|---|---|---|
| Increase --timeout to 3600s | Request genuinely needs more processing time | < 5 min | Low |
| Set --min-instances=1 | Cold start latency is triggering first-request timeouts | < 5 min | Low (minor cost increase) |
| Optimize container startup | Startup probe fails; container takes >30s to bind PORT | Hours–days | Low |
| Offload to Cloud Tasks | Background or async work where the caller should not wait | Hours | Medium (architecture change) |
| Use Cloud Run Jobs | Batch processing, ETL, or jobs exceeding 1 hour | Hours | Medium (new job resource) |
| Disable CPU throttling (--no-cpu-throttling) | Compute-heavy workloads stall mid-request under load | < 5 min | Low (cost increase) |
Understanding GCP Cloud Run Timeouts
Cloud Run timeouts are not a single error—they appear at multiple layers of the request lifecycle. Identifying the correct layer is what determines which fix to apply.
The Three Timeout Layers
1. Request Timeout (HTTP 504 Gateway Timeout)
This is the most common timeout. Cloud Run enforces a hard limit on how long any single HTTP request can take. The default is 300 seconds (5 minutes). When a request exceeds this limit, Cloud Run terminates it and the client receives:
HTTP/1.1 504 Gateway Timeout
In Cloud Logging, you will see entries like:
httpRequest.status: 504
textPayload: The request was aborted because there was no available instance.
The maximum configurable timeout is 3600 seconds (1 hour).
2. Container Startup Timeout
When Cloud Run spins up a new instance, it expects the process to listen on the PORT environment variable within the startup window (approximately 4 minutes). If initialization takes too long—loading ML models, warming large caches, or building slow connection pools—startup fails and the triggering request receives:
Container failed to start and listen on the port defined by the PORT environment variable within the allotted startup time.
This typically surfaces as HTTP 500 or 503 on the first request after a scale-out event.
3. Downstream Dependency Timeout
Your container starts and responds quickly in isolation, but makes blocking calls to Cloud SQL, Firestore, an external REST API, or another Cloud Run service that hangs. The Cloud Run request timeout eventually fires. This manifests as:
- gRPC: DEADLINE_EXCEEDED status
- Python: requests.exceptions.ReadTimeout
- Node.js: Error: ETIMEDOUT or AbortError
- Java: java.net.SocketTimeoutException
- Go: context deadline exceeded
Step 1: Diagnose Which Timeout You Have
Check Cloud Logging for the 504 signature:
gcloud logging read \
'resource.type="cloud_run_revision" AND httpRequest.status=504' \
--project=PROJECT_ID \
--limit=20 \
--format=json
Check for container startup failures:
gcloud logging read \
'resource.type="cloud_run_revision" AND textPayload:"failed to start"' \
--project=PROJECT_ID \
--limit=10
Verify your current timeout setting:
gcloud run services describe SERVICE_NAME \
--region=REGION \
--format='value(spec.template.spec.timeoutSeconds)'
Identify slow requests in Cloud Monitoring:
Navigate to Cloud Run > Metrics > Request Latency and examine the p99 percentile. If p99 approaches your configured timeout, requests are hitting the ceiling under normal or peak load.
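If you export latency samples from your logs or metrics, the same ceiling check can be reproduced programmatically. A minimal sketch, assuming latencies are in seconds; the function name and the 0.9 headroom factor are illustrative choices, not part of any GCP API:

```python
import statistics

def p99_near_timeout(latencies_s, timeout_s, headroom=0.9):
    """Return True when p99 request latency is within `headroom` of the
    configured Cloud Run timeout, i.e. requests are hitting the ceiling."""
    # statistics.quantiles with n=100 yields 99 cut points; the last one
    # is the 99th percentile.
    p99 = statistics.quantiles(latencies_s, n=100)[-1]
    return p99 >= headroom * timeout_s

# 99 fast requests plus one tail request near the 300 s default timeout
samples = [10.0] * 99 + [295.0]
print(p99_near_timeout(samples, timeout_s=300))  # True: the tail hits the ceiling
```

If this returns True under normal load, raising the timeout only hides the problem; profile the slow path first.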
Step 2: Fix Request Timeout Issues
Option A — Increase the request timeout
For requests that legitimately need more processing time (file exports, ML inference, data transformation):
gcloud run services update SERVICE_NAME \
--timeout=3600 \
--region=REGION
Declare it in service.yaml for repeatable deployments:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
name: my-service
spec:
template:
spec:
timeoutSeconds: 3600
containers:
- image: gcr.io/PROJECT_ID/IMAGE
Apply with:
gcloud run services replace service.yaml --region=REGION
Option B — Offload to Cloud Tasks for async processing
For work where the client should not wait:
1. Receive the request, validate input, and enqueue a Cloud Task pointing to a dedicated handler URL.
2. Return 202 Accepted immediately with a task or job ID.
3. Cloud Tasks invokes your handler with its own retry and timeout configuration (up to 30 minutes per attempt).
4. The caller polls a status endpoint or receives a webhook callback when done.
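The steps above can be sketched as a handler that acknowledges before the work happens. This is a stdlib-only illustration of the pattern: the in-memory queue stands in for Cloud Tasks (in production you would call google-cloud-tasks' create_task instead), and the status endpoint mentioned in the comment is hypothetical:

```python
import json
import uuid
from queue import Queue

# Stand-in for the Cloud Tasks queue so the pattern is runnable here.
task_queue: Queue = Queue()

def handle_export_request(payload: dict) -> tuple[int, dict]:
    """Validate, enqueue, and acknowledge immediately with 202.
    The slow work runs later in a dedicated handler, outside the
    caller's request/response cycle."""
    if "dataset" not in payload:
        return 400, {"error": "missing 'dataset'"}
    task_id = str(uuid.uuid4())
    task_queue.put({"task_id": task_id, "body": json.dumps(payload)})
    # Caller polls GET /status/<task_id> (hypothetical endpoint) for progress.
    return 202, {"task_id": task_id}

status, body = handle_export_request({"dataset": "sales_2024"})
print(status)  # 202
```

The key property: the HTTP response time is now bounded by validation plus enqueue latency, regardless of how long the export itself takes.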
Option C — Use Cloud Run Jobs for batch work
For workloads with no HTTP client waiting (ETL pipelines, reports, ML training):
gcloud run jobs create my-batch-job \
--image=gcr.io/PROJECT_ID/IMAGE \
--task-timeout=86400 \
--region=REGION
Cloud Run Jobs support task timeouts up to 168 hours (7 days).
Step 3: Fix Container Startup Timeouts
Minimize container startup time:
- Use smaller base images such as Alpine or distroless instead of full Debian or Ubuntu.
- Move heavy initialization (model loading, cache warmup, DB pool creation) out of the startup path and into the first request handler or a background thread or goroutine.
- Reduce Dockerfile layer count to speed up image pulls from Artifact Registry.
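Moving heavy initialization out of the startup path usually means lazy, once-only loading. A minimal sketch of the double-checked pattern in Python; load_model is a placeholder for whatever expensive work your container does:

```python
import threading

_model = None
_model_lock = threading.Lock()

def load_model():
    """Placeholder for an expensive load (e.g. reading model weights from disk)."""
    return {"weights": [0.1, 0.2, 0.3]}

def get_model():
    """Double-checked lazy initialization: the container binds PORT
    immediately at startup, and the first request pays the load cost
    exactly once, even under concurrent traffic."""
    global _model
    if _model is None:
        with _model_lock:
            if _model is None:
                _model = load_model()
    return _model
```

Call get_model() inside the request handler instead of at import time; startup then completes in milliseconds and the probe passes before any heavy work begins.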
Configure a startup probe to signal readiness precisely:
containers:
- image: gcr.io/PROJECT_ID/IMAGE
startupProbe:
httpGet:
path: /healthz
port: 8080
initialDelaySeconds: 0
periodSeconds: 5
failureThreshold: 12
timeoutSeconds: 3
With failureThreshold: 12 and periodSeconds: 5, Cloud Run waits up to 60 seconds for your health endpoint before declaring startup failed.
Eliminate cold starts with minimum instances:
gcloud run services update SERVICE_NAME \
--min-instances=1 \
--region=REGION
This keeps one instance always warm, preventing the first request after an idle period from paying the full cold start penalty.
Step 4: Fix Downstream Dependency Timeouts
Always set explicit timeouts on outbound calls. Never rely on Cloud Run's request timeout as an implicit downstream timeout.
Python (requests):
response = requests.get(
'https://api.example.com/data',
timeout=(5, 30) # (connect_timeout_secs, read_timeout_secs)
)
Node.js (fetch API):
const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), 30000);
try {
const response = await fetch(url, { signal: controller.signal });
} finally {
clearTimeout(timeoutId);
}
Cloud SQL via Unix socket: Include connect_timeout in the connection string and configure the pool with explicit limits:
postgresql+pg8000://user:pass@/dbname?unix_sock=/cloudsql/PROJECT:REGION:INSTANCE&connect_timeout=10
Set pool_timeout and statement_timeout to values well below your Cloud Run request timeout so pool exhaustion surfaces as an explicit error rather than a silent hang that reaches the outer timeout.
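One way to keep every downstream timeout below the outer limit is to derive it from the time remaining in the request. A sketch of deadline budgeting; the constants and function name are illustrative, not a GCP or requests API:

```python
import time

REQUEST_TIMEOUT_S = 3600  # the service's configured Cloud Run --timeout
SAFETY_MARGIN_S = 5       # reserve time to build and return a response

def remaining_budget(request_start: float, cap: float = 30.0) -> float:
    """Derive a per-call downstream timeout from the time left in the
    outer Cloud Run request, so no inner call can outlive it."""
    elapsed = time.monotonic() - request_start
    left = REQUEST_TIMEOUT_S - elapsed - SAFETY_MARGIN_S
    return max(0.0, min(cap, left))

start = time.monotonic()
# Pass the shrinking budget to each outbound call, e.g.:
# requests.get(url, timeout=remaining_budget(start))
print(remaining_budget(start))  # 30.0 (capped, right after the request starts)
```

When the budget reaches zero you can fail fast with an explicit error instead of letting the request hang until Cloud Run's 504 fires.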
Step 5: CPU Throttling and Concurrency Tuning
By default, Cloud Run throttles CPU to near-zero when no requests are being processed. This causes:
- Background threads and goroutines to stall between requests
- Database keep-alives to expire, forcing expensive reconnections on the next request
- In-container scheduled tasks to fire unreliably
Allocate CPU always-on for compute-heavy services:
gcloud run services update SERVICE_NAME \
--no-cpu-throttling \
--region=REGION
Tune concurrency to your workload type:
- CPU-bound requests: --concurrency=1 so each instance handles one request at a time without starvation
- I/O-bound requests: --concurrency=80 or higher to maximize throughput per instance
Mismatched concurrency settings cause CPU starvation under load, manifesting as latency spikes that cascade into timeout failures.
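Little's law (in-flight requests = arrival rate × latency) gives a quick back-of-envelope check for whether a concurrency setting is viable at your traffic level. A small sketch; the function name is illustrative:

```python
import math

def instances_needed(rps: float, avg_latency_s: float, concurrency: int) -> int:
    """Little's law: in-flight requests = arrival rate x latency.
    Divide by per-instance concurrency to estimate instance count."""
    in_flight = rps * avg_latency_s
    return math.ceil(in_flight / concurrency)

# 200 req/s at 0.5 s each = 100 requests in flight at any moment.
print(instances_needed(200, 0.5, concurrency=80))  # 2 instances (I/O-bound)
print(instances_needed(200, 0.5, concurrency=1))   # 100 instances (CPU-bound)
```

Compare the result against your max-instances limit; if the estimate exceeds it, requests will queue and eventually time out under peak load.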
Monitoring and Alerting
Track these key Cloud Monitoring metrics to catch timeout regressions proactively:
- run.googleapis.com/request_latencies — watch p99 approaching your configured timeout value
- run.googleapis.com/request_count filtered by response_code_class=5xx or response_code=504
- run.googleapis.com/container/instance_count — sudden drops can indicate crash-loops from startup timeouts
Create a Cloud Monitoring alerting policy that fires when the 504 error rate exceeds 1% of total requests over a 5-minute window to catch regressions before they impact significant user traffic.
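The alert condition itself is simple enough to express directly, which is useful for testing the threshold logic before wiring it into a Cloud Monitoring policy. A sketch, assuming you have per-window counts available; the function name is illustrative:

```python
def should_alert(count_5xx: int, total: int, threshold: float = 0.01) -> bool:
    """Fire when 5xx (including 504) errors exceed 1% of requests
    in the evaluation window. Empty windows never alert."""
    if total == 0:
        return False
    return count_5xx / total > threshold

print(should_alert(12, 1000))  # True: 1.2% > 1%
print(should_alert(5, 1000))   # False: 0.5%
```

Guarding the zero-total case matters: low-traffic windows otherwise produce divide-by-zero errors or noisy alerts.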
Diagnostic and Fix Script
#!/usr/bin/env bash
# GCP Cloud Run Timeout — Diagnostic & Fix Script
# Prerequisites: gcloud CLI authenticated with sufficient IAM permissions
# Usage: PROJECT_ID=my-project SERVICE=my-service REGION=us-central1 bash fix-cloudrun-timeout.sh
set -euo pipefail
PROJECT_ID="${PROJECT_ID:?Error: export PROJECT_ID first}"
SERVICE="${SERVICE:?Error: export SERVICE first}"
REGION="${REGION:-us-central1}"
echo "=== [1] Current timeout and scaling configuration ==="
gcloud run services describe "${SERVICE}" \
--region="${REGION}" \
--project="${PROJECT_ID}" \
--format='table(spec.template.spec.timeoutSeconds,spec.template.spec.containerConcurrency,spec.template.metadata.annotations)'
echo ""
echo "=== [2] Recent HTTP 504 Gateway Timeout errors (last 2 hours) ==="
gcloud logging read \
'resource.type="cloud_run_revision" AND httpRequest.status=504' \
--project="${PROJECT_ID}" \
--freshness=2h \
--limit=25 \
--format='table(timestamp,httpRequest.latency,httpRequest.requestUrl)'
echo ""
echo "=== [3] Container startup failures ==="
gcloud logging read \
'resource.type="cloud_run_revision" AND textPayload:"failed to start"' \
--project="${PROJECT_ID}" \
--freshness=2h \
--limit=10 \
--format='table(timestamp,textPayload)'
echo ""
echo "=== [4] Current minimum instances setting ==="
gcloud run services describe "${SERVICE}" \
--region="${REGION}" \
--project="${PROJECT_ID}" \
--format='value(spec.template.metadata.annotations)'
echo ""
echo "=== [5] Recommended fixes — uncomment and run the relevant line ==="
echo ""
echo "# Fix 1: Raise request timeout to the maximum (3600 seconds = 1 hour)"
echo "# gcloud run services update ${SERVICE} --timeout=3600 --region=${REGION} --project=${PROJECT_ID}"
echo ""
echo "# Fix 2: Keep one warm instance to eliminate cold starts"
echo "# gcloud run services update ${SERVICE} --min-instances=1 --region=${REGION} --project=${PROJECT_ID}"
echo ""
echo "# Fix 3: Disable CPU throttling for compute-intensive workloads"
echo "# gcloud run services update ${SERVICE} --no-cpu-throttling --region=${REGION} --project=${PROJECT_ID}"
echo ""
echo "# Fix 4: Create a Cloud Run Job for long-running batch work (up to 168-hour tasks)"
echo "# gcloud run jobs create ${SERVICE}-job --image=IMAGE --task-timeout=86400 --region=${REGION} --project=${PROJECT_ID}"
echo ""
echo "Diagnostic complete. Review output above before applying any fix."
Error Medic Editorial
Error Medic Editorial is a team of senior DevOps, SRE, and cloud infrastructure engineers with production experience operating workloads on GCP, AWS, and Azure. Our troubleshooting guides are grounded in real incident post-mortems and cross-referenced with official platform documentation.
Sources
- https://cloud.google.com/run/docs/configuring/request-timeout
- https://cloud.google.com/run/docs/troubleshooting
- https://cloud.google.com/run/docs/container-contract
- https://cloud.google.com/run/docs/configuring/min-instances
- https://cloud.google.com/run/docs/configuring/cpu-allocation
- https://stackoverflow.com/questions/tagged/google-cloud-run