Error Medic

GCP Cloud Run Timeout: Fix HTTP 504, Deadline Exceeded & Startup Timeout Errors

Fix GCP Cloud Run timeout errors including HTTP 504 and deadline exceeded. Increase the timeout to 3600s, set min-instances, and offload long tasks to Cloud Run Jobs.

Key Takeaways
  • Cloud Run's default request timeout is 300 seconds; any HTTP request exceeding this limit returns HTTP 504 Gateway Timeout—raise it up to 3600 seconds for processing-heavy endpoints
  • Container startup timeouts fire when your process fails to listen on the PORT environment variable within approximately 4 minutes, commonly caused by heavy model loading or slow dependency initialization
  • CPU is throttled to near-zero between requests by default, causing cold-start latency spikes that cascade into visible timeout errors for the first user after an idle period
  • Workloads exceeding 3600 seconds must move to Cloud Run Jobs (168-hour max task timeout) or Cloud Tasks—standard HTTP services cannot accommodate them
  • Quick fix: gcloud run services update SERVICE --timeout=3600 --region=REGION raises the request limit to the maximum (1 hour)
Fix Approaches Compared
Method | When to Use | Time to Implement | Risk
Increase --timeout to 3600s | Request genuinely needs more processing time | < 5 min | Low
Set --min-instances=1 | Cold start latency is triggering first-request timeouts | < 5 min | Low (minor cost increase)
Optimize container startup | Startup probe fails; container takes >30s to bind PORT | Hours–days | Low
Offload to Cloud Tasks | Background or async work where the caller should not wait | Hours | Medium (architecture change)
Use Cloud Run Jobs | Batch processing, ETL, or jobs exceeding 1 hour | Hours | Medium (new job resource)
Disable CPU throttling (--no-cpu-throttling) | Compute-heavy workloads stall mid-request under load | < 5 min | Low (cost increase)

Understanding GCP Cloud Run Timeouts

Cloud Run timeouts are not a single error—they appear at multiple layers of the request lifecycle. Identifying the correct layer is what determines which fix to apply.

The Three Timeout Layers

1. Request Timeout (HTTP 504 Gateway Timeout)

This is the most common timeout. Cloud Run enforces a hard limit on how long any single HTTP request can take. The default is 300 seconds (5 minutes). When a request exceeds this limit, Cloud Run terminates it and the client receives:

HTTP/1.1 504 Gateway Timeout

In Cloud Logging, you will see entries like:

httpRequest.status: 504
textPayload: The request has been terminated because it has reached the maximum request timeout.

(If you instead see "The request was aborted because there was no available instance," that is a scaling or startup failure, not a request timeout.)

The maximum configurable timeout is 3600 seconds (1 hour).

2. Container Startup Timeout

When Cloud Run spins up a new instance, it expects the process to listen on the PORT environment variable within the startup window (approximately 4 minutes). If initialization takes too long—loading ML models, warming large caches, or building slow connection pools—startup fails and the triggering request receives:

Container failed to start and listen on the port defined by the PORT environment variable within the allotted startup time.

This typically surfaces as HTTP 500 or 503 on the first request after a scale-out event.

3. Downstream Dependency Timeout

Your container starts and responds quickly in isolation, but makes blocking calls to Cloud SQL, Firestore, an external REST API, or another Cloud Run service that hangs. The Cloud Run request timeout eventually fires. This manifests as:

  • gRPC status DEADLINE_EXCEEDED
  • Python: requests.exceptions.ReadTimeout
  • Node.js: Error: ETIMEDOUT or AbortError
  • Java: java.net.SocketTimeoutException
  • Go: context deadline exceeded
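Whatever the language, the fix is the same pattern: run the blocking call under a deadline you control, so the failure surfaces as an explicit error instead of riding up to Cloud Run's outer request timeout. A minimal, illustrative Python sketch (the function passed in is a stand-in for any blocking client call):

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

_executor = ThreadPoolExecutor(max_workers=8)

def call_with_deadline(fn, deadline_secs, *args, **kwargs):
    """Run a blocking call, raising TimeoutError if it exceeds deadline_secs."""
    future = _executor.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=deadline_secs)
    except FutureTimeout:
        future.cancel()  # best effort; an already-running worker keeps running
        raise TimeoutError(f"downstream call exceeded {deadline_secs}s deadline")
```

A hung dependency now produces a clear TimeoutError you can log and convert into a fast 503, rather than a silent stall that becomes a 504.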

Step 1: Diagnose Which Timeout You Have

Check Cloud Logging for the 504 signature:

gcloud logging read \
  'resource.type="cloud_run_revision" AND httpRequest.status=504' \
  --project=PROJECT_ID \
  --limit=20 \
  --format=json

Check for container startup failures:

gcloud logging read \
  'resource.type="cloud_run_revision" AND textPayload:"failed to start"' \
  --project=PROJECT_ID \
  --limit=10

Verify your current timeout setting:

gcloud run services describe SERVICE_NAME \
  --region=REGION \
  --format='value(spec.template.spec.timeoutSeconds)'

Identify slow requests in Cloud Monitoring:

Navigate to Cloud Run > Metrics > Request Latency and examine the 99th percentile (p99). If p99 approaches your configured timeout, requests are hitting the ceiling under normal or peak load.


Step 2: Fix Request Timeout Issues

Option A — Increase the request timeout

For requests that legitimately need more processing time (file exports, ML inference, data transformation):

gcloud run services update SERVICE_NAME \
  --timeout=3600 \
  --region=REGION

Declare it in service.yaml for repeatable deployments:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-service
spec:
  template:
    spec:
      timeoutSeconds: 3600
      containers:
        - image: gcr.io/PROJECT_ID/IMAGE

Apply with:

gcloud run services replace service.yaml --region=REGION

Option B — Offload to Cloud Tasks for async processing

For work where the client should not wait:

  1. Receive the request, validate input, and enqueue a Cloud Task pointing to a dedicated handler URL.
  2. Return 202 Accepted immediately with a task or job ID.
  3. Cloud Tasks invokes your handler with its own retry and timeout configuration (up to 30 minutes per attempt).
  4. The caller polls a status endpoint or receives a webhook callback when done.
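The handler side of steps 1–2 can be sketched in a few lines. This is an illustrative pattern, not a complete implementation: `enqueue` is a stand-in for the real queue client (with Cloud Tasks, the `create_task` call), injected here so the flow is testable in isolation, and the field names are hypothetical.

```python
import uuid

def handle_export_request(payload, enqueue):
    """Validate, hand the slow work to a queue, and acknowledge immediately."""
    if not payload.get("dataset"):
        return 400, {"error": "dataset is required"}
    task_id = str(uuid.uuid4())
    enqueue({"task_id": task_id, "payload": payload})  # slow work happens later
    # 202 Accepted: the client polls a status endpoint instead of waiting
    return 202, {"task_id": task_id, "status": "queued"}
```

The request itself now finishes in milliseconds, so the service timeout never comes into play.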

Option C — Use Cloud Run Jobs for batch work

For workloads with no HTTP client waiting (ETL pipelines, reports, ML training):

gcloud run jobs create my-batch-job \
  --image=gcr.io/PROJECT_ID/IMAGE \
  --task-timeout=86400 \
  --region=REGION

Cloud Run Jobs support task timeouts up to 168 hours (7 days).


Step 3: Fix Container Startup Timeouts

Minimize container startup time:

  • Use smaller base images such as Alpine or distroless instead of full Debian or Ubuntu.
  • Move heavy initialization (model loading, cache warmup, DB pool creation) out of the startup path and into the first request handler or a background goroutine.
  • Reduce Dockerfile layer count to speed up image pulls from Artifact Registry.
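The second bullet, deferring heavy initialization, can be as simple as a lazily-loaded singleton: the container binds PORT immediately and pays the loading cost on the first request instead of at startup. Illustrative sketch; `load_model` is a placeholder for your real initializer:

```python
import threading

_model = None
_model_lock = threading.Lock()

def load_model():
    """Placeholder for an expensive initializer (ML model, cache warmup)."""
    return {"weights": "..."}

def get_model():
    """Load on first use; every later call returns the cached object."""
    global _model
    if _model is None:
        with _model_lock:  # double-checked locking: initialize exactly once
            if _model is None:
                _model = load_model()
    return _model
```

The trade-off is that the first request after scale-out absorbs the loading latency, so pair this with a request timeout generous enough to cover it.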

Configure a startup probe to signal readiness precisely:

containers:
  - image: gcr.io/PROJECT_ID/IMAGE
    startupProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 0
      periodSeconds: 5
      failureThreshold: 12
      timeoutSeconds: 3

With failureThreshold: 12 and periodSeconds: 5, Cloud Run waits up to 60 seconds for your health endpoint before declaring startup failed.
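A startup probe is only as useful as the endpoint behind it. A minimal stdlib sketch of a /healthz handler that returns 503 until initialization completes (simulated here with an event flag), then 200:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

ready = threading.Event()  # set this once initialization has finished

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz" and ready.is_set():
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        else:
            self.send_response(503)  # probe keeps retrying until ready
            self.end_headers()

    def log_message(self, *args):
        pass  # keep probe traffic out of the request logs

def serve(port=8080):
    """Start the health server on a background thread and return it."""
    server = HTTPServer(("", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

In a real service you would run your application server instead and flip `ready` at the end of initialization; the point is that the probe should reflect actual readiness, not just process liveness.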

Eliminate cold starts with minimum instances:

gcloud run services update SERVICE_NAME \
  --min-instances=1 \
  --region=REGION

This keeps one instance always warm, preventing the first request after an idle period from paying the full cold start penalty.


Step 4: Fix Downstream Dependency Timeouts

Always set explicit timeouts on outbound calls. Never rely on Cloud Run's request timeout as an implicit downstream timeout.

Python (requests):

response = requests.get(
    'https://api.example.com/data',
    timeout=(5, 30)  # (connect_timeout_secs, read_timeout_secs)
)

Node.js (fetch API):

const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), 30000);
try {
  const response = await fetch(url, { signal: controller.signal });
} finally {
  clearTimeout(timeoutId);
}

Cloud SQL via Unix socket: Include connect_timeout in the connection string and configure the pool with explicit limits:

postgresql+pg8000://user:pass@/dbname?unix_sock=/cloudsql/PROJECT:REGION:INSTANCE/.s.PGSQL.5432&connect_timeout=10

Set pool_timeout and statement_timeout to values well below your Cloud Run request timeout so pool exhaustion surfaces as an explicit error rather than a silent hang that reaches the outer timeout.
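One way to keep every downstream timeout "well below" the outer limit is to carry a deadline budget through the request: record the start time, then size each outbound timeout from what remains. A minimal sketch (the class name and margin value are illustrative):

```python
import time

class DeadlineBudget:
    """Track time remaining against Cloud Run's request timeout."""

    def __init__(self, total_secs, safety_margin_secs=5.0):
        # Leave a margin so we can still build and return an error response
        self.expires_at = time.monotonic() + total_secs - safety_margin_secs

    def remaining(self):
        return max(0.0, self.expires_at - time.monotonic())

    def timeout_for_call(self, cap_secs):
        """Timeout for one outbound call: its cap, or less if budget is low."""
        left = self.remaining()
        if left <= 0:
            raise TimeoutError("request deadline budget exhausted")
        return min(cap_secs, left)
```

Usage looks like `requests.get(url, timeout=budget.timeout_for_call(30))`: early calls get their full cap, late calls get squeezed, and a request that has already used its budget fails fast instead of hanging into the 504.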


Step 5: CPU Throttling and Concurrency Tuning

By default, Cloud Run throttles CPU to near-zero when no requests are being processed. This causes:

  • Background threads and goroutines to stall between requests
  • Database keep-alives to expire, forcing expensive reconnections on the next request
  • In-container scheduled tasks to fire unreliably

Allocate CPU always-on for compute-heavy services:

gcloud run services update SERVICE_NAME \
  --no-cpu-throttling \
  --region=REGION

Tune concurrency to your workload type:

  • CPU-bound requests: --concurrency=1 so each instance handles one request at a time without starvation
  • I/O-bound requests: --concurrency=80 or higher to maximize throughput per instance

Mismatched concurrency settings cause CPU starvation under load, manifesting as latency spikes that cascade into timeout failures.
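A quick way to sanity-check a concurrency setting is Little's law: average in-flight requests ≈ requests per second × mean latency, and instances needed ≈ in-flight / concurrency per instance. An illustrative helper (the 1.5× headroom factor is an assumption, not a Cloud Run constant):

```python
import math

def instances_needed(rps, mean_latency_secs, concurrency, headroom=1.5):
    """Estimate the instance count via Little's law, with burst headroom."""
    in_flight = rps * mean_latency_secs  # average concurrent requests
    return math.ceil(in_flight * headroom / concurrency)
```

For example, 100 RPS at 2 s mean latency with --concurrency=1 needs on the order of 300 instances; if your max-instances cap is below that, queuing and timeouts follow.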


Monitoring and Alerting

Track these key Cloud Monitoring metrics to catch timeout regressions proactively:

  • run.googleapis.com/request_latencies — watch p99 approaching your configured timeout value
  • run.googleapis.com/request_count filtered by response_code_class=5xx or response_code=504
  • run.googleapis.com/container/instance_count — sudden drops can indicate crash-loops from startup timeouts

Create a Cloud Monitoring alerting policy that fires when the 504 error rate exceeds 1% of total requests over a 5-minute window to catch regressions before they impact significant user traffic.
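If you also export these counts to your own tooling, the same 1% threshold is trivial to express in code; the numbers below are illustrative:

```python
def should_alert(count_504, total_requests, threshold=0.01):
    """True when 504s exceed the threshold share of traffic in the window."""
    if total_requests == 0:
        return False  # no traffic in the window: nothing to alert on
    return count_504 / total_requests > threshold
```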

Complete Diagnostic and Fix Script

#!/usr/bin/env bash
# GCP Cloud Run Timeout — Diagnostic & Fix Script
# Prerequisites: gcloud CLI authenticated with sufficient IAM permissions
# Usage: PROJECT_ID=my-project SERVICE=my-service REGION=us-central1 bash fix-cloudrun-timeout.sh

set -euo pipefail

PROJECT_ID="${PROJECT_ID:?Error: export PROJECT_ID first}"
SERVICE="${SERVICE:?Error: export SERVICE first}"
REGION="${REGION:-us-central1}"

echo "=== [1] Current timeout and scaling configuration ==="
gcloud run services describe "${SERVICE}" \
  --region="${REGION}" \
  --project="${PROJECT_ID}" \
  --format='table(spec.template.spec.timeoutSeconds,spec.template.spec.containerConcurrency,spec.template.metadata.annotations)'

echo ""
echo "=== [2] Recent HTTP 504 Gateway Timeout errors (last 2 hours) ==="
gcloud logging read \
  'resource.type="cloud_run_revision" AND httpRequest.status=504' \
  --project="${PROJECT_ID}" \
  --freshness=2h \
  --limit=25 \
  --format='table(timestamp,httpRequest.latency,httpRequest.requestUrl)'

echo ""
echo "=== [3] Container startup failures ==="
gcloud logging read \
  'resource.type="cloud_run_revision" AND textPayload:"failed to start"' \
  --project="${PROJECT_ID}" \
  --freshness=2h \
  --limit=10 \
  --format='table(timestamp,textPayload)'

echo ""
echo "=== [4] Current minimum instances setting ==="
gcloud run services describe "${SERVICE}" \
  --region="${REGION}" \
  --project="${PROJECT_ID}" \
  --format='value(spec.template.metadata.annotations)'

echo ""
echo "=== [5] Recommended fixes — uncomment and run the relevant line ==="
echo ""
echo "# Fix 1: Raise request timeout to the maximum (3600 seconds = 1 hour)"
echo "# gcloud run services update ${SERVICE} --timeout=3600 --region=${REGION} --project=${PROJECT_ID}"
echo ""
echo "# Fix 2: Keep one warm instance to eliminate cold starts"
echo "# gcloud run services update ${SERVICE} --min-instances=1 --region=${REGION} --project=${PROJECT_ID}"
echo ""
echo "# Fix 3: Disable CPU throttling for compute-intensive workloads"
echo "# gcloud run services update ${SERVICE} --no-cpu-throttling --region=${REGION} --project=${PROJECT_ID}"
echo ""
echo "# Fix 4: Create a Cloud Run Job for long-running batch work (up to 168-hour tasks)"
echo "# gcloud run jobs create ${SERVICE}-job --image=IMAGE --task-timeout=86400 --region=${REGION} --project=${PROJECT_ID}"

echo ""
echo "Diagnostic complete. Review output above before applying any fix."

Error Medic Editorial

Error Medic Editorial is a team of senior DevOps, SRE, and cloud infrastructure engineers with production experience operating workloads on GCP, AWS, and Azure. Our troubleshooting guides are grounded in real incident post-mortems and cross-referenced with official platform documentation.
