Why does my Lambda function time out only in production but not locally?

Local environments skip the network path between Lambda and the services it calls, and they skip VPC routing entirely. In production, a VPC-attached function reaches the internet (and any AWS service without a VPC endpoint) only through a NAT Gateway, and cold starts add initialization time. Production databases may also be under load, causing query latency spikes. Enable X-Ray in production and compare Init Duration with Duration in the REPORT log lines to isolate whether the overhead is cold-start or execution time.
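The REPORT comparison can be scripted. The following is a minimal sketch, assuming REPORT lines shaped like the sample shown in this guide; `parse_report` and `cold_start_share` are hypothetical helper names, and only the duration fields are extracted.

```python
import re

# Matches "<Name>: <number> ms" for the duration fields of a REPORT line.
# Longer names come first so "Duration" does not shadow them.
_FIELD = re.compile(
    r"(Billed Duration|Init Duration|Restore Duration|Duration): ([\d.]+) ms"
)


def parse_report(line: str) -> dict:
    """Return the millisecond fields found in a REPORT line, keyed by name."""
    return {name: float(value) for name, value in _FIELD.findall(line)}


def cold_start_share(line: str) -> float:
    """Fraction of (init + duration) spent in init; 0.0 for warm starts."""
    fields = parse_report(line)
    init = fields.get("Init Duration", 0.0)
    total = init + fields.get("Duration", 0.0)
    return init / total if total else 0.0
```

A share close to zero across slow invocations points at execution time (downstream calls, queries); a large share points at cold-start overhead.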

My Lambda has the right IAM policy but still gets AccessDeniedException — why?

Three common causes: (1) The policy uses a wildcard resource but an SCP (Service Control Policy) at the AWS Organization level restricts the action — check with your cloud team. (2) The policy was just attached and Lambda is reusing a cached execution environment with the old credentials — force a new cold start by updating the function configuration (e.g., add an environment variable). (3) The resource ARN in the policy doesn't match exactly — especially watch for region or account ID mismatches in cross-account scenarios.
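For cause (3), a field-by-field comparison makes region and account mismatches easy to spot. This is a small sketch rather than an official tool: `split_arn` and `arn_mismatches` are hypothetical helpers, and the comparison is literal, so a policy ARN containing wildcards will be reported as differing in the resource field.

```python
ARN_FIELDS = ("prefix", "partition", "service", "region", "account", "resource")


def split_arn(arn: str) -> dict:
    """Split an ARN into its six fields; the resource part may contain colons."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return dict(zip(ARN_FIELDS, parts))


def arn_mismatches(policy_arn: str, actual_arn: str) -> list:
    """Return the names of the ARN fields where the two ARNs differ."""
    policy, actual = split_arn(policy_arn), split_arn(actual_arn)
    return [field for field in ARN_FIELDS if policy[field] != actual[field]]
```

Pass the ARN from the policy statement and the ARN from the AccessDeniedException message; a result of `["region"]` or `["account"]` identifies the mismatch directly.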

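How do I prevent API Gateway from returning an error when Lambda takes longer than 29 seconds?

API Gateway enforces a default integration timeout of 29 seconds and reports an overrun as 504 Gateway Timeout (a 502 points to a malformed response instead). For Regional and private REST APIs the timeout can be raised through a quota increase request, which may come with a lower account-level throttle quota. In most cases it is better to redesign long-running operations using the async pattern: have Lambda return immediately with a 202 Accepted and a job ID, process the work asynchronously, and expose a polling endpoint or use WebSockets/SNS to notify the client on completion. Alternatively, use Application Load Balancer (ALB) as the trigger. ALB has a configurable idle timeout of up to 4,000 seconds, although the function itself still stops at Lambda's 900-second maximum.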
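The async pattern can be sketched as two handlers. Everything named here is an assumption for illustration: `JOBS` is an in-memory stand-in for a durable store (for example a DynamoDB table), and `submit_handler` / `status_handler` are hypothetical names. An in-memory dict would not survive across execution environments in a real deployment.

```python
import json
import uuid

# Hypothetical in-memory job store, used only to keep the sketch runnable.
JOBS = {}


def submit_handler(event, context):
    """Record a pending job and return 202 Accepted with its ID right away."""
    job_id = str(uuid.uuid4())
    JOBS[job_id] = {"status": "PENDING", "input": event.get("body")}
    # A real handler would enqueue the work here for a separate worker function.
    return {
        "statusCode": 202,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"jobId": job_id}),
    }


def status_handler(event, context):
    """Polling endpoint: report the job's status, or 404 for an unknown ID."""
    job_id = (event.get("pathParameters") or {}).get("jobId")
    job = JOBS.get(job_id)
    if job is None:
        return {"statusCode": 404, "body": json.dumps({"message": "unknown job"})}
    return {
        "statusCode": 200,
        "body": json.dumps({"jobId": job_id, "status": job["status"]}),
    }
```

The client calls the submit route once, then polls the status route with the returned `jobId` until the worker marks the job finished.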

What is the difference between Lambda throttling and Lambda rate limiting?

Lambda throttling (TooManyRequestsException) occurs when concurrent executions exceed your account or function limit; it is a capacity control. Rate limiting in the context of Lambda usually refers to request-rate quotas on the Lambda API itself. For synchronous invocations the requests-per-second quota is 10 times your concurrency quota, so a tight loop of short invocations from application code can hit the request-rate quota before it exhausts execution concurrency. Use AWS SDK retry configuration with adaptive mode to handle both.

After fixing my IAM role, Lambda still returns 403 — what am I missing?

403 from API Gateway to Lambda is a resource-based policy issue on the Lambda function itself, not the execution role. The execution role controls what Lambda can do (call other AWS services). The resource-based policy controls who can invoke Lambda. Run `aws lambda get-policy --function-name my-function` to inspect it. If the policy is missing or the source ARN is wrong (e.g., wrong API Gateway ARN or stage), API Gateway cannot call Lambda regardless of IAM policies.

AWS Lambda Timeout, 403 Forbidden, 502 Bad Gateway & Throttling: The Complete Troubleshooting Guide

Fix AWS Lambda timeout, 403 Forbidden, 502 Bad Gateway, throttling, and access denied errors with step-by-step diagnosis commands and proven configuration fixes

Last updated: February 23, 2026

Last verified: February 23, 2026


Key Takeaways

Lambda timeouts are caused by insufficient timeout settings (default 3s), blocked VPC egress, cold-start overhead, or downstream dependencies hanging — increase timeout up to 15 minutes and add connection pooling
403 Forbidden and Access Denied errors stem from missing IAM permissions on the function's execution role or resource-based policy; 502 Bad Gateway from API Gateway means Lambda returned a malformed response or crashed during execution
Throttling (429 TooManyRequestsException) occurs when concurrent executions exceed the account limit (default 1,000) or a reserved concurrency cap — use provisioned concurrency, SQS buffering, or request a limit increase
Service Unavailable (503) and rate limit errors require exponential backoff with jitter; never use fixed-interval retries against AWS APIs
Always check CloudWatch Logs (/aws/lambda/<function-name>), X-Ray traces, and the function's CloudWatch metrics (Errors, Throttles, Duration) before guessing at a root cause

Fix Approaches Compared
Error / Symptom	Root Cause	Fix Method	Time to Apply	Risk
Task timed out after X seconds	Timeout too short or downstream hang	Increase timeout + add async pattern	< 5 min	Low
403 Forbidden (API Gateway)	Missing resource-based policy	Add lambda:InvokeFunction permission	< 5 min	Low
AccessDeniedException in logs	IAM role missing action	Attach inline or managed IAM policy	5-10 min	Low
502 Bad Gateway	Malformed Lambda response body	Fix response JSON structure in handler	10-30 min	Medium
TooManyRequestsException (throttle)	Concurrency limit hit	Reserve concurrency + SQS buffer	30-60 min	Medium
ServiceUnavailableException	Transient AWS service fault	Exponential backoff + retry logic	15-30 min	Low
ENI creation timeout (VPC)	Subnet has no free IPs / NAT missing	Expand CIDR or add NAT Gateway	30-90 min	High
Cold start > timeout	Large deployment package	Lambda SnapStart or provisioned concurrency	1-2 hrs	Medium

Understanding AWS Lambda Error Patterns

AWS Lambda failures fall into four categories: execution errors (your code or its timeout), permission errors (IAM and resource policies), integration errors (API Gateway ↔ Lambda contract violations), and capacity errors (throttling and service limits). Each category requires a different diagnostic path.

Diagnosing Lambda Timeout (`Task timed out after X.XX seconds`)

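The most common Lambda error. You will see this in CloudWatch Logs (timestamp elided):

END RequestId: abc123
REPORT RequestId: abc123 Duration: 3000.00 ms Billed Duration: 3000 ms Init Duration: 450.12 ms
<timestamp> abc123 Task timed out after 3.00 seconds

If the log instead shows Runtime.ExitError with "Runtime exited with error: signal: killed", the function most likely ran out of memory rather than time; compare Max Memory Used with Memory Size in the full REPORT line.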

Step 1 — Identify where time is being spent. Enable AWS X-Ray active tracing and add subsegment annotations around external calls. If X-Ray is not available, add console.time() / console.timeEnd() (Node.js) or time.perf_counter() (Python) around each I/O boundary.
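When X-Ray is not an option, the manual timing described in Step 1 can be wrapped in a small context manager. This is a minimal sketch: `timed` is a hypothetical helper, and the `time.sleep` calls stand in for real database and HTTP calls.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, sink):
    """Add the wall-clock seconds spent inside the with-block to sink[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        sink[label] = sink.get(label, 0.0) + (time.perf_counter() - start)


def handler(event, context):
    timings = {}
    with timed("db_query", timings):
        time.sleep(0.01)  # stand-in for a database call
    with timed("http_call", timings):
        time.sleep(0.02)  # stand-in for an outbound HTTP request
    # One line per invocation, in milliseconds, is easy to search in CloudWatch Logs.
    print({label: round(seconds * 1000, 1) for label, seconds in timings.items()})
    return timings
```

Because the timing is recorded in a `finally` block, a boundary that raises is still measured, which is often the one that matters.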

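Step 2 — Check VPC configuration. A VPC-attached function reaches your subnets through Elastic Network Interfaces (ENIs) that Lambda creates when the function is created or its VPC settings change, so ENI setup no longer adds to each cold start. Subnet IP exhaustion blocks that setup, and a missing NAT Gateway route (or VPC endpoint) makes calls to the internet and to AWS service endpoints hang until the function times out. Verify: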

aws ec2 describe-network-interfaces \
  --filters Name=description,Values="AWS Lambda VPC ENI*" \
  --query 'NetworkInterfaces[].{SubnetId:SubnetId,Status:Status,PrivateIp:PrivateIpAddress}'

Step 3 — Check downstream dependencies. Execution environments are not shared between concurrent invocations, so every concurrent invocation holds its own connections. If your function calls RDS, ElastiCache, or an external HTTP API, verify those endpoints are reachable and not saturated. For RDS, use RDS Proxy to pool connections; each concurrent invocation opening its own DB connection is a common cause of timeouts under load.

Step 4 — Increase timeout (to a sensible max). For background jobs use up to 900 seconds (15 minutes). For functions behind API Gateway, stay below its default 29-second integration timeout so the function fails with its own log entry before the gateway gives up:

aws lambda update-function-configuration \
  --function-name my-function \
  --timeout 25
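Raising the timeout helps only up to a point. For batch-style work, a function can also watch its own deadline and stop cleanly. The sketch below relies on the Python runtime's `context.get_remaining_time_in_millis()`; `process_items` and the 2-second `SAFETY_MARGIN_MS` are illustrative assumptions, not recommended values.

```python
SAFETY_MARGIN_MS = 2000  # hypothetical margin; tune to your slowest single item


def process_items(items, context, work=lambda item: item):
    """Process items until the deadline is near; return (done, unprocessed)."""
    done = []
    for index, item in enumerate(items):
        # Stop before starting an item that might not finish in time.
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            return done, items[index:]
        done.append(work(item))
    return done, []
```

The unprocessed remainder can be re-queued or returned to the caller, which turns a hard "Task timed out" failure into a partial result that can be resumed.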

Diagnosing 403 Forbidden and Access Denied

Two distinct surfaces produce these errors.

API Gateway → Lambda: 403 Forbidden

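This means API Gateway does not have permission to invoke the Lambda function. The resource-based policy on the Lambda function must grant lambda:InvokeFunction to the API Gateway principal. The same misconfiguration can also surface as a 500 with "Invalid permissions on Lambda function" in the API Gateway execution logs, so check those logs as well.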

Error in API Gateway test console:

{
  "message": "Forbidden"
}

Fix — add the invocation permission:

aws lambda add-permission \
  --function-name my-function \
  --statement-id apigateway-prod \
  --action lambda:InvokeFunction \
  --principal apigateway.amazonaws.com \
  --source-arn "arn:aws:execute-api:us-east-1:123456789012:abc123def/*/GET/endpoint"
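After adding the permission, it can be worth checking that the stored policy actually covers the route being called. The following is a rough sketch, assuming the policy document has the shape that `aws lambda get-policy` returns for permissions created with `--source-arn` (an `ArnLike` condition on `AWS:SourceArn`). `allows_api_gateway` is a hypothetical helper, and `fnmatchcase` only approximates IAM's ArnLike matching.

```python
import json
from fnmatch import fnmatchcase


def _source_arn_pattern(statement):
    """Return the ArnLike AWS:SourceArn pattern of a statement, if present."""
    arn_like = statement.get("Condition", {}).get("ArnLike", {})
    for key, value in arn_like.items():
        if key.lower() == "aws:sourcearn":  # condition keys are case-insensitive
            return value
    return None


def allows_api_gateway(policy_json: str, method_arn: str) -> bool:
    """True if an Allow statement lets API Gateway invoke for method_arn."""
    for stmt in json.loads(policy_json).get("Statement", []):
        principal = stmt.get("Principal", {})
        service = principal.get("Service") if isinstance(principal, dict) else None
        actions = stmt.get("Action", [])
        actions = [actions] if isinstance(actions, str) else actions
        if stmt.get("Effect") != "Allow" or service != "apigateway.amazonaws.com":
            continue
        if "lambda:InvokeFunction" not in actions:
            continue
        pattern = _source_arn_pattern(stmt)
        # No condition means any API Gateway source is allowed.
        if pattern is None or fnmatchcase(method_arn, pattern):
            return True
    return False
```

Feed it the `Policy` string from `get-policy` and the execute-api ARN of the failing route (stage, method, and path included); a `False` result usually means the stage or method segment of the source ARN is wrong.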

IAM execution role: AccessDeniedException

You will see this in CloudWatch Logs when the Lambda tries to call another AWS service:

botocore.exceptions.ClientError: An error occurred (AccessDeniedException)
when calling the GetSecretValue operation: User:
arn:aws:sts::123456789012:assumed-role/my-lambda-role/my-function
is not authorized to perform: secretsmanager:GetSecretValue
on resource: arn:aws:secretsmanager:us-east-1:123456789012:secret:my-secret

Fix — attach the required policy to the execution role:

aws iam put-role-policy \
  --role-name my-lambda-role \
  --policy-name SecretsAccess \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Action": ["secretsmanager:GetSecretValue"],
      "Resource": "arn:aws:secretsmanager:us-east-1:123456789012:secret:my-secret*"
    }]
  }'

Always use the principle of least privilege — scope the Resource to the exact ARN, not *.

Diagnosing 502 Bad Gateway

When API Gateway returns 502 Bad Gateway, it received a response from Lambda that does not conform to the expected integration response format. The Lambda function itself did not time out — it returned something malformed.

Common causes:

Handler returned None / undefined instead of a dict/object with statusCode
body field is not a string (must be JSON-stringified)
Unhandled exception propagated to the runtime

API Gateway execution logs (enable in Stage settings) will show:

Endpoint response body before transformations: null
Execution failed due to configuration error: Malformed Lambda proxy response

Correct response format (Python):

import json

def handler(event, context):
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": "ok"})  # body MUST be a string
    }

Common mistake — returning a dict as body directly:

# WRONG - causes 502
return {"statusCode": 200, "body": {"message": "ok"}}

# RIGHT
return {"statusCode": 200, "body": json.dumps({"message": "ok"})}
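The causes listed above can be caught in a unit test before deployment. This is a minimal sketch of such a check; `proxy_response_problems` is a hypothetical helper that covers only the mistakes named in this section and is not a full implementation of API Gateway's validation.

```python
def proxy_response_problems(response) -> list:
    """Return a list of reasons the value would be a malformed proxy response."""
    if not isinstance(response, dict):
        return ["handler must return a dict, got " + type(response).__name__]
    problems = []
    status = response.get("statusCode")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(status, int) or isinstance(status, bool):
        problems.append("statusCode must be an integer")
    body = response.get("body")
    if body is not None and not isinstance(body, str):
        problems.append("body must be a string (use json.dumps)")
    if not isinstance(response.get("headers", {}), dict):
        problems.append("headers must be a dict")
    return problems
```

Calling it on the handler's return value in a test, and asserting the list is empty, turns a production 502 into a local test failure.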

Diagnosing Lambda Throttling and Rate Limits

Throttled invocations return TooManyRequestsException (HTTP 429). There are two throttle types:

Account-level concurrency limit: Default 1,000 concurrent executions per region. Shared across all functions.
Function-level reserved concurrency: If a function has reserved concurrency set to N, it will throttle at N concurrent executions regardless of account limit.

Check current account limits and usage:

# View account concurrency limit
aws lambda get-account-settings

# View per-function reserved concurrency
aws lambda get-function-concurrency --function-name my-function

# View throttle metrics for last hour
aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name Throttles \
  --dimensions Name=FunctionName,Value=my-function \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 300 \
  --statistics Sum

Fix options in order of preference:

SQS as a buffer: Place an SQS queue in front of Lambda. Lambda polls at a controlled rate, smoothing bursts. Set MaximumConcurrency on the event source mapping.
Provisioned concurrency: Keeps N execution environments warm. Eliminates cold starts and guarantees capacity.
Limit increase request: Submit via AWS Support for account-level concurrency increases (allow 48-72 hours).

# Set provisioned concurrency on an alias
aws lambda put-provisioned-concurrency-config \
  --function-name my-function \
  --qualifier prod \
  --provisioned-concurrent-executions 50

Diagnosing Service Unavailable (503)

ServiceUnavailableException is a transient AWS-side fault. Your code should implement exponential backoff with jitter:

import boto3, time, random
from botocore.config import Config

client = boto3.client('lambda', config=Config(
    retries={
        'max_attempts': 5,
        'mode': 'adaptive'  # adaptive mode adds client-side rate limiting
    }
))

For custom retry logic:

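def invoke_with_backoff(client, **kwargs):
    """Invoke with capped exponential backoff plus jitter on ServiceException."""
    base, cap, max_retries = 0.5, 30, 5  # delays in seconds
    for attempt in range(1, max_retries + 2):
        try:
            return client.invoke(**kwargs)
        except client.exceptions.ServiceException:
            if attempt > max_retries:
                raise  # retries exhausted; surface the last error
            # Delay doubles per attempt up to the cap; jitter spreads out retrying clients
            time.sleep(min(cap, base * 2 ** attempt) + random.uniform(0, 1))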

Centralized Diagnostic Runbook

For any Lambda error, run this triage sequence before changing configuration:

# 1. Tail recent log events
aws logs tail /aws/lambda/my-function --since 1h --format short

# 2. List error events from the last hour
aws logs filter-log-events \
  --log-group-name /aws/lambda/my-function \
  --filter-pattern "?ERROR ?error ?Exception" \
  --start-time $(($(date +%s) - 3600))000 \
  --query 'events[].message'

# 3. Check function configuration
aws lambda get-function-configuration --function-name my-function \
  --query '{Timeout:Timeout,MemorySize:MemorySize,Role:Role,VpcConfig:VpcConfig}'

# 4. Inspect execution role policies
ROLE=$(aws lambda get-function-configuration --function-name my-function \
  --query 'Role' --output text | awk -F/ '{print $NF}')
aws iam list-attached-role-policies --role-name "$ROLE"
aws iam list-role-policies --role-name "$ROLE"

# 5. Check resource-based policy
aws lambda get-policy --function-name my-function --query 'Policy' --output text | python3 -m json.tool

Frequently Asked Questions


#!/usr/bin/env bash
# Lambda Error Triage Script
# Usage: FUNCTION=my-function REGION=us-east-1 bash triage-lambda.sh

set -euo pipefail
FUNCTION=${FUNCTION:?"Set FUNCTION env var"}
REGION=${REGION:-us-east-1}
SINCE=${SINCE:-1h}

echo "=== Lambda Triage: $FUNCTION ==="

echo ""
echo "--- Function Configuration ---"
aws lambda get-function-configuration \
  --function-name "$FUNCTION" --region "$REGION" \
  --query '{Timeout:Timeout,Memory:MemorySize,Runtime:Runtime,Role:Role,VpcConfig:VpcConfig,LastModified:LastModified}'

echo ""
echo "--- Concurrency Settings ---"
aws lambda get-function-concurrency \
  --function-name "$FUNCTION" --region "$REGION" 2>/dev/null || echo "No reserved concurrency set"

echo ""
echo "--- Account Concurrency Limit ---"
aws lambda get-account-settings --region "$REGION" \
  --query 'AccountLimit.{ConcurrentExecutions:ConcurrentExecutions,UnreservedConcurrentExecutions:UnreservedConcurrentExecutions}'

echo ""
echo "--- Resource-Based Policy ---"
aws lambda get-policy \
  --function-name "$FUNCTION" --region "$REGION" \
  --query 'Policy' --output text 2>/dev/null | python3 -m json.tool || echo "No resource-based policy"

echo ""
echo "--- Execution Role Policies ---"
ROLE=$(aws lambda get-function-configuration \
  --function-name "$FUNCTION" --region "$REGION" \
  --query 'Role' --output text | awk -F/ '{print $NF}')
echo "Role: $ROLE"
aws iam list-attached-role-policies --role-name "$ROLE" \
  --query 'AttachedPolicies[].PolicyName'
aws iam list-role-policies --role-name "$ROLE" \
  --query 'PolicyNames'

echo ""
echo "--- Recent Errors (last $SINCE) ---"
aws logs tail "/aws/lambda/$FUNCTION" \
  --since "$SINCE" \
  --filter-pattern '?ERROR ?Exception ?Timeout ?throttl' \
  --region "$REGION" 2>/dev/null | tail -50 || echo "No log group found or no matching events"

echo ""
echo "--- Throttle Metric (last 30 min) ---"
START=$(date -u -d '30 minutes ago' +%Y-%m-%dT%H:%M:%SZ 2>/dev/null || \
  date -u -v-30M +%Y-%m-%dT%H:%M:%SZ)
END=$(date -u +%Y-%m-%dT%H:%M:%SZ)
aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name Throttles \
  --dimensions Name=FunctionName,Value="$FUNCTION" \
  --start-time "$START" --end-time "$END" \
  --period 300 --statistics Sum \
  --region "$REGION" \
  --query 'sort_by(Datapoints,&Timestamp)[].{Time:Timestamp,Throttles:Sum}'

echo ""
echo "--- Duration P99 (last 30 min) ---"
# Percentiles must be requested with --extended-statistics, not --statistics
aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name Duration \
  --dimensions Name=FunctionName,Value="$FUNCTION" \
  --start-time "$START" --end-time "$END" \
  --period 300 --extended-statistics p99 \
  --region "$REGION" \
  --query 'sort_by(Datapoints,&Timestamp)[].{Time:Timestamp,P99ms:ExtendedStatistics.p99}' || echo "Could not retrieve Duration p99"

echo ""
echo "=== Triage complete ==="

Error Medic Editorial

The Error Medic Editorial team comprises senior DevOps engineers and SRE practitioners with hands-on experience operating large-scale AWS workloads. We focus on actionable, command-level troubleshooting guides grounded in real production incidents.
