GCP Cloud Run Timeout: Fix HTTP 504, Deadline Exceeded & Startup Timeout Errors
Fix GCP Cloud Run timeout errors including HTTP 504 and deadline exceeded. Increase the timeout to 3600s, set min-instances, and offload long tasks to Cloud Run Jobs.
- Cloud Run's default request timeout is 300 seconds; any HTTP request exceeding this limit returns HTTP 504 Gateway Timeout—raise it up to 3600 seconds for processing-heavy endpoints
- Container startup timeouts fire when your process fails to listen on the PORT environment variable within approximately 4 minutes, commonly caused by heavy model loading or slow dependency initialization
- CPU is throttled to near-zero between requests by default, causing cold-start latency spikes that cascade into visible timeout errors for the first user after an idle period
- Workloads exceeding 3600 seconds must move to Cloud Run Jobs (168-hour max task timeout) or Cloud Tasks—standard HTTP services cannot accommodate them
- Quick fix: gcloud run services update SERVICE --timeout=3600 --region=REGION raises the request limit to the maximum (1 hour)
| Method | When to Use | Time to Implement | Risk |
|---|---|---|---|
| Increase --timeout to 3600s | Request genuinely needs more processing time | < 5 min | Low |
| Set --min-instances=1 | Cold start latency is triggering first-request timeouts | < 5 min | Low (minor cost increase) |
| Optimize container startup | Startup probe fails; container takes >30s to bind PORT | Hours–days | Low |
| Offload to Cloud Tasks | Background or async work where the caller should not wait | Hours | Medium (architecture change) |
| Use Cloud Run Jobs | Batch processing, ETL, or jobs exceeding 1 hour | Hours | Medium (new job resource) |
| Disable CPU throttling (--no-cpu-throttling) | Compute-heavy workloads stall mid-request under load | < 5 min | Low (cost increase) |
Understanding GCP Cloud Run Timeouts
Cloud Run timeouts are not a single error—they appear at multiple layers of the request lifecycle. Identifying the correct layer is what determines which fix to apply.
The Three Timeout Layers
1. Request Timeout (HTTP 504 Gateway Timeout)
This is the most common timeout. Cloud Run enforces a hard limit on how long any single HTTP request can take. The default is 300 seconds (5 minutes). When a request exceeds this limit, Cloud Run terminates it and the client receives:
HTTP/1.1 504 Gateway Timeout
In Cloud Logging, you will see entries like:
httpRequest.status: 504
textPayload: The request was aborted because there was no available instance.
The maximum configurable timeout is 3600 seconds (1 hour).
2. Container Startup Timeout
When Cloud Run spins up a new instance, it expects the process to listen on the PORT environment variable within the startup window (approximately 4 minutes). If initialization takes too long—loading ML models, warming large caches, or building slow connection pools—startup fails and the triggering request receives:
Container failed to start and listen on the port defined by the PORT environment variable within the allotted startup time.
This typically surfaces as HTTP 500 or 503 on the first request after a scale-out event.
3. Downstream Dependency Timeout
Your container starts and responds quickly in isolation, but makes blocking calls to Cloud SQL, Firestore, an external REST API, or another Cloud Run service that hangs. The Cloud Run request timeout eventually fires. This manifests as:
- gRPC: DEADLINE_EXCEEDED status
- Python: requests.exceptions.ReadTimeout
- Node.js: Error: ETIMEDOUT or AbortError
- Java: java.net.SocketTimeoutException
- Go: context deadline exceeded
Step 1: Diagnose Which Timeout You Have
Check Cloud Logging for the 504 signature:
gcloud logging read \
'resource.type="cloud_run_revision" AND httpRequest.status=504' \
--project=PROJECT_ID \
--limit=20 \
--format=json
Check for container startup failures:
gcloud logging read \
'resource.type="cloud_run_revision" AND textPayload:"failed to start"' \
--project=PROJECT_ID \
--limit=10
Verify your current timeout setting:
gcloud run services describe SERVICE_NAME \
--region=REGION \
--format='value(spec.template.spec.timeoutSeconds)'
Identify slow requests in Cloud Monitoring:
Navigate to Cloud Run > Metrics > Request Latency and examine the p99 percentile. If p99 approaches your configured timeout, requests are hitting the ceiling under normal or peak load.
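If you export latency samples from your logs or metrics, the same ceiling check can be reproduced programmatically. A minimal sketch, assuming latencies are in seconds; the function name and the 0.9 headroom factor are illustrative choices, not part of any GCP API:

```python
import statistics

def p99_near_timeout(latencies_s, timeout_s, headroom=0.9):
    """Return True when p99 request latency is within `headroom` of the
    configured Cloud Run timeout, i.e. requests are hitting the ceiling."""
    # statistics.quantiles with n=100 yields 99 cut points; the last one
    # is the 99th percentile.
    p99 = statistics.quantiles(latencies_s, n=100)[-1]
    return p99 >= headroom * timeout_s

# 99 fast requests plus one tail request near the 300 s default timeout
samples = [10.0] * 99 + [295.0]
print(p99_near_timeout(samples, timeout_s=300))  # True: the tail hits the ceiling
```

If this returns True under normal load, raising the timeout only hides the problem; profile the slow path first.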
Step 2: Fix Request Timeout Issues
Option A — Increase the request timeout
For requests that legitimately need more processing time (file exports, ML inference, data transformation):
gcloud run services update SERVICE_NAME \
--timeout=3600 \
--region=REGION
Declare it in service.yaml for repeatable deployments:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
name: my-service
spec:
template:
spec:
timeoutSeconds: 3600
containers:
- image: gcr.io/PROJECT_ID/IMAGE
Apply with:
gcloud run services replace service.yaml --region=REGION
Option B — Offload to Cloud Tasks for async processing
For work where the client should not wait:
1. Receive the request, validate input, and enqueue a Cloud Task pointing to a dedicated handler URL.
2. Return 202 Accepted immediately with a task or job ID.
3. Cloud Tasks invokes your handler with its own retry and timeout configuration (up to 30 minutes per attempt).
4. The caller polls a status endpoint or receives a webhook callback when done.
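The steps above can be sketched as a handler that acknowledges before the work happens. This is a stdlib-only illustration of the pattern: the in-memory queue stands in for Cloud Tasks (in production you would call google-cloud-tasks' create_task instead), and the status endpoint mentioned in the comment is hypothetical:

```python
import json
import uuid
from queue import Queue

# Stand-in for the Cloud Tasks queue so the pattern is runnable here.
task_queue: Queue = Queue()

def handle_export_request(payload: dict) -> tuple[int, dict]:
    """Validate, enqueue, and acknowledge immediately with 202.
    The slow work runs later in a dedicated handler, outside the
    caller's request/response cycle."""
    if "dataset" not in payload:
        return 400, {"error": "missing 'dataset'"}
    task_id = str(uuid.uuid4())
    task_queue.put({"task_id": task_id, "body": json.dumps(payload)})
    # Caller polls GET /status/<task_id> (hypothetical endpoint) for progress.
    return 202, {"task_id": task_id}

status, body = handle_export_request({"dataset": "sales_2024"})
print(status)  # 202
```

The key property: the HTTP response time is now bounded by validation plus enqueue latency, regardless of how long the export itself takes.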
Option C — Use Cloud Run Jobs for batch work
For workloads with no HTTP client waiting (ETL pipelines, reports, ML training):
gcloud run jobs create my-batch-job \
--image=gcr.io/PROJECT_ID/IMAGE \
--task-timeout=86400 \
--region=REGION
Cloud Run Jobs support task timeouts up to 168 hours (7 days).
Step 3: Fix Container Startup Timeouts
Minimize container startup time:
- Use smaller base images such as Alpine or distroless instead of full Debian or Ubuntu.
- Move heavy initialization (model loading, cache warmup, DB pool creation) out of the startup path and into the first request handler or a background thread or goroutine.
- Reduce Dockerfile layer count to speed up image pulls from Artifact Registry.
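Moving heavy initialization out of the startup path usually means lazy, once-only loading. A minimal sketch of the double-checked pattern in Python; load_model is a placeholder for whatever expensive work your container does:

```python
import threading

_model = None
_model_lock = threading.Lock()

def load_model():
    """Placeholder for an expensive load (e.g. reading model weights from disk)."""
    return {"weights": [0.1, 0.2, 0.3]}

def get_model():
    """Double-checked lazy initialization: the container binds PORT
    immediately at startup, and the first request pays the load cost
    exactly once, even under concurrent traffic."""
    global _model
    if _model is None:
        with _model_lock:
            if _model is None:
                _model = load_model()
    return _model
```

Call get_model() inside the request handler instead of at import time; startup then completes in milliseconds and the probe passes before any heavy work begins.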
Configure a startup probe to signal readiness precisely:
containers:
- image: gcr.io/PROJECT_ID/IMAGE
startupProbe:
httpGet:
path: /healthz
port: 8080
initialDelaySeconds: 0
periodSeconds: 5
failureThreshold: 12
timeoutSeconds: 3
With failureThreshold: 12 and periodSeconds: 5, Cloud Run waits up to 60 seconds for your health endpoint before declaring startup failed.
Eliminate cold starts with minimum instances:
gcloud run services update SERVICE_NAME \
--min-instances=1 \
--region=REGION
This keeps one instance always warm, preventing the first request after an idle period from paying the full cold start penalty.
Step 4: Fix Downstream Dependency Timeouts
Always set explicit timeouts on outbound calls. Never rely on Cloud Run's request timeout as an implicit downstream timeout.
Python (requests):
response = requests.get(
'https://api.example.com/data',
timeout=(5, 30) # (connect_timeout_secs, read_timeout_secs)
)
Node.js (fetch API):
const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), 30000);
try {
const response = await fetch(url, { signal: controller.signal });
} finally {
clearTimeout(timeoutId);
}
Cloud SQL via Unix socket: Include connect_timeout in the connection string and configure the pool with explicit limits:
postgresql+pg8000://user:pass@/dbname?unix_sock=/cloudsql/PROJECT:REGION:INSTANCE&connect_timeout=10
Set pool_timeout and statement_timeout to values well below your Cloud Run request timeout so pool exhaustion surfaces as an explicit error rather than a silent hang that reaches the outer timeout.
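One way to keep every downstream timeout below the outer limit is to derive it from the time remaining in the request. A sketch of deadline budgeting; the constants and function name are illustrative, not a GCP or requests API:

```python
import time

REQUEST_TIMEOUT_S = 3600  # the service's configured Cloud Run --timeout
SAFETY_MARGIN_S = 5       # reserve time to build and return a response

def remaining_budget(request_start: float, cap: float = 30.0) -> float:
    """Derive a per-call downstream timeout from the time left in the
    outer Cloud Run request, so no inner call can outlive it."""
    elapsed = time.monotonic() - request_start
    left = REQUEST_TIMEOUT_S - elapsed - SAFETY_MARGIN_S
    return max(0.0, min(cap, left))

start = time.monotonic()
# Pass the shrinking budget to each outbound call, e.g.:
# requests.get(url, timeout=remaining_budget(start))
print(remaining_budget(start))  # 30.0 (capped, right after the request starts)
```

When the budget reaches zero you can fail fast with an explicit error instead of letting the request hang until Cloud Run's 504 fires.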
Step 5: CPU Throttling and Concurrency Tuning
By default, Cloud Run throttles CPU to near-zero when no requests are being processed. This causes:
- Background threads and goroutines to stall between requests
- Database keep-alives to expire, forcing expensive reconnections on the next request
- In-container scheduled tasks to fire unreliably
Allocate CPU always-on for compute-heavy services:
gcloud run services update SERVICE_NAME \
--no-cpu-throttling \
--region=REGION
Tune concurrency to your workload type:
- CPU-bound requests: --concurrency=1 so each instance handles one request at a time without starvation
- I/O-bound requests: --concurrency=80 or higher to maximize throughput per instance
Mismatched concurrency settings cause CPU starvation under load, manifesting as latency spikes that cascade into timeout failures.
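Little's law (in-flight requests = arrival rate × latency) gives a quick back-of-envelope check for whether a concurrency setting is viable at your traffic level. A small sketch; the function name is illustrative:

```python
import math

def instances_needed(rps: float, avg_latency_s: float, concurrency: int) -> int:
    """Little's law: in-flight requests = arrival rate x latency.
    Divide by per-instance concurrency to estimate instance count."""
    in_flight = rps * avg_latency_s
    return math.ceil(in_flight / concurrency)

# 200 req/s at 0.5 s each = 100 requests in flight at any moment.
print(instances_needed(200, 0.5, concurrency=80))  # 2 instances (I/O-bound)
print(instances_needed(200, 0.5, concurrency=1))   # 100 instances (CPU-bound)
```

Compare the result against your max-instances limit; if the estimate exceeds it, requests will queue and eventually time out under peak load.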
Monitoring and Alerting
Track these key Cloud Monitoring metrics to catch timeout regressions proactively:
- run.googleapis.com/request_latencies — watch p99 approaching your configured timeout value
- run.googleapis.com/request_count filtered by response_code_class=5xx or response_code=504
- run.googleapis.com/container/instance_count — sudden drops can indicate crash-loops from startup timeouts
Create a Cloud Monitoring alerting policy that fires when the 504 error rate exceeds 1% of total requests over a 5-minute window to catch regressions before they impact significant user traffic.
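The alert condition itself is simple enough to express directly, which is useful for testing the threshold logic before wiring it into a Cloud Monitoring policy. A sketch, assuming you have per-window counts available; the function name is illustrative:

```python
def should_alert(count_5xx: int, total: int, threshold: float = 0.01) -> bool:
    """Fire when 5xx (including 504) errors exceed 1% of requests
    in the evaluation window. Empty windows never alert."""
    if total == 0:
        return False
    return count_5xx / total > threshold

print(should_alert(12, 1000))  # True: 1.2% > 1%
print(should_alert(5, 1000))   # False: 0.5%
```

Guarding the zero-total case matters: low-traffic windows otherwise produce divide-by-zero errors or noisy alerts.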
Diagnostic and Fix Script
#!/usr/bin/env bash
# GCP Cloud Run Timeout — Diagnostic & Fix Script
# Prerequisites: gcloud CLI authenticated with sufficient IAM permissions
# Usage: PROJECT_ID=my-project SERVICE=my-service REGION=us-central1 bash fix-cloudrun-timeout.sh
set -euo pipefail
PROJECT_ID="${PROJECT_ID:?Error: export PROJECT_ID first}"
SERVICE="${SERVICE:?Error: export SERVICE first}"
REGION="${REGION:-us-central1}"
echo "=== [1] Current timeout and scaling configuration ==="
gcloud run services describe "${SERVICE}" \
--region="${REGION}" \
--project="${PROJECT_ID}" \
--format='table(spec.template.spec.timeoutSeconds,spec.template.spec.containerConcurrency,spec.template.metadata.annotations)'
echo ""
echo "=== [2] Recent HTTP 504 Gateway Timeout errors (last 2 hours) ==="
gcloud logging read \
'resource.type="cloud_run_revision" AND httpRequest.status=504' \
--project="${PROJECT_ID}" \
--freshness=2h \
--limit=25 \
--format='table(timestamp,httpRequest.latency,httpRequest.requestUrl)'
echo ""
echo "=== [3] Container startup failures ==="
gcloud logging read \
'resource.type="cloud_run_revision" AND textPayload:"failed to start"' \
--project="${PROJECT_ID}" \
--freshness=2h \
--limit=10 \
--format='table(timestamp,textPayload)'
echo ""
echo "=== [4] Current minimum instances setting ==="
gcloud run services describe "${SERVICE}" \
--region="${REGION}" \
--project="${PROJECT_ID}" \
--format='value(spec.template.metadata.annotations)'
echo ""
echo "=== [5] Recommended fixes — uncomment and run the relevant line ==="
echo ""
echo "# Fix 1: Raise request timeout to the maximum (3600 seconds = 1 hour)"
echo "# gcloud run services update ${SERVICE} --timeout=3600 --region=${REGION} --project=${PROJECT_ID}"
echo ""
echo "# Fix 2: Keep one warm instance to eliminate cold starts"
echo "# gcloud run services update ${SERVICE} --min-instances=1 --region=${REGION} --project=${PROJECT_ID}"
echo ""
echo "# Fix 3: Disable CPU throttling for compute-intensive workloads"
echo "# gcloud run services update ${SERVICE} --no-cpu-throttling --region=${REGION} --project=${PROJECT_ID}"
echo ""
echo "# Fix 4: Create a Cloud Run Job for long-running batch work (up to 168-hour tasks)"
echo "# gcloud run jobs create ${SERVICE}-job --image=IMAGE --task-timeout=86400 --region=${REGION} --project=${PROJECT_ID}"
echo ""
echo "Diagnostic complete. Review output above before applying any fix."
Error Medic Editorial
Error Medic Editorial is a team of senior DevOps, SRE, and cloud infrastructure engineers with production experience operating workloads on GCP, AWS, and Azure. Our troubleshooting guides are grounded in real incident post-mortems and cross-referenced with official platform documentation.
Sources
- https://cloud.google.com/run/docs/configuring/request-timeout
- https://cloud.google.com/run/docs/troubleshooting
- https://cloud.google.com/run/docs/container-contract
- https://cloud.google.com/run/docs/configuring/min-instances
- https://cloud.google.com/run/docs/configuring/cpu-allocation
- https://stackoverflow.com/questions/tagged/google-cloud-run