Error Medic

AWS Lambda Timeout, 403 Forbidden, 502 Bad Gateway & Throttling: The Complete Troubleshooting Guide

Fix AWS Lambda timeout, 403 Forbidden, 502 Bad Gateway, throttling, and access denied errors with step-by-step diagnosis commands and proven configuration fixes

Key Takeaways
  • Lambda timeouts are caused by insufficient timeout settings (default 3s), blocked VPC egress, cold-start overhead, or downstream dependencies hanging — increase timeout up to 15 minutes and add connection pooling
  • 403 Forbidden and Access Denied errors stem from missing IAM permissions on the function's execution role or resource-based policy; 502 Bad Gateway from API Gateway means Lambda returned a malformed response or crashed during execution
  • Throttling (429 TooManyRequestsException) occurs when concurrent executions exceed the account limit (default 1,000) or a reserved concurrency cap — use provisioned concurrency, SQS buffering, or request a limit increase
  • Service Unavailable (503) and rate limit errors require exponential backoff with jitter; never use fixed-interval retries against AWS APIs
  • Always check CloudWatch Logs (/aws/lambda/<function-name>), X-Ray traces, and the function's CloudWatch metrics (Errors, Throttles, Duration) before guessing at a root cause
Fix Approaches Compared
| Error / Symptom | Root Cause | Fix Method | Time to Apply | Risk |
|---|---|---|---|---|
| Task timed out after X seconds | Timeout too short or downstream hang | Increase timeout + add async pattern | < 5 min | Low |
| 403 Forbidden (API Gateway) | Missing resource-based policy | Add lambda:InvokeFunction permission | < 5 min | Low |
| AccessDeniedException in logs | IAM role missing action | Attach inline or managed IAM policy | 5-10 min | Low |
| 502 Bad Gateway | Malformed Lambda response body | Fix response JSON structure in handler | 10-30 min | Medium |
| TooManyRequestsException (throttle) | Concurrency limit hit | Reserve concurrency + SQS buffer | 30-60 min | Medium |
| ServiceUnavailableException | Transient AWS service fault | Exponential backoff + retry logic | 15-30 min | Low |
| ENI creation timeout (VPC) | Subnet has no free IPs / NAT missing | Expand CIDR or add NAT Gateway | 30-90 min | High |
| Cold start > timeout | Large deployment package | Lambda SnapStart or provisioned concurrency | 1-2 hrs | Medium |

Understanding AWS Lambda Error Patterns

AWS Lambda failures fall into four categories: execution errors (your code or its timeout), permission errors (IAM and resource policies), integration errors (API Gateway ↔ Lambda contract violations), and capacity errors (throttling and service limits). Each category requires a different diagnostic path.
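As a rough illustration of the four categories, the sketch below sorts a log message into one of them by substring. The pattern list and the `classify_error` name are assumptions made for this example, not an exhaustive or official mapping of Lambda errors.

```python
import re

# Hypothetical patterns for the four categories described above.
# They are illustrative only and do not cover every Lambda error.
CATEGORY_PATTERNS = [
    ("execution", re.compile(r"Task timed out|Runtime\.ExitError")),
    ("permission", re.compile(r"AccessDenied|not authorized to perform|Forbidden")),
    ("integration", re.compile(r"Malformed Lambda proxy response|Bad Gateway")),
    ("capacity", re.compile(r"TooManyRequestsException|Rate exceeded|Throttl")),
]


def classify_error(message: str) -> str:
    """Return the first category whose pattern matches, else 'unknown'."""
    for category, pattern in CATEGORY_PATTERNS:
        if pattern.search(message):
            return category
    return "unknown"
```

A classifier like this is only a first-pass router: it tells you which diagnostic section to read next, not what the root cause is.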


Diagnosing Lambda Timeout (Task timed out after X.XX seconds)

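The most common Lambda error. You will see this in CloudWatch Logs:

END RequestId: abc123
REPORT RequestId: abc123 Duration: 3000.00 ms Billed Duration: 3000 ms Init Duration: 450.12 ms
<timestamp> abc123 Task timed out after 3.00 seconds

If the log instead shows Runtime.ExitError with "Runtime exited with error: signal: killed", the function most likely ran out of memory rather than time; compare Max Memory Used with the configured memory size in the REPORT line.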

Step 1 — Identify where time is being spent. Enable AWS X-Ray active tracing and add subsegment annotations around external calls. If X-Ray is not available, add console.time() / console.timeEnd() (Node.js) or time.perf_counter() (Python) around each I/O boundary.
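For the manual timing approach in Python, a small context manager keeps the measurement code out of the business logic. This is a minimal sketch: the `timed` helper, its `sink` parameter, and the `fetch_config` label are names invented for the example, and the `time.sleep` call stands in for a real I/O call.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, sink=print):
    """Report how long the wrapped block took, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        sink(f"{label} took {elapsed_ms:.1f} ms")


def handler(event, context):
    with timed("fetch_config"):
        time.sleep(0.01)  # stand-in for a real network or database call
    return {"statusCode": 200, "body": "ok"}
```

Anything printed inside a Lambda handler lands in the function's CloudWatch log stream, so the timings can be read next to the REPORT line for the same request.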

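Step 2 - Check VPC configuration. A VPC-attached function reaches the network through Elastic Network Interfaces (ENIs) that Lambda creates in your subnets. Subnets with no free IP addresses, and private subnets with no route to a NAT Gateway or VPC endpoint, are common causes of calls that hang until the timeout. List the function's ENIs to verify: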

aws ec2 describe-network-interfaces \
  --filters Name=description,Values="AWS Lambda VPC ENI*" \
  --query 'NetworkInterfaces[].{SubnetId:SubnetId,Status:Status,PrivateIp:PrivateIpAddress}'

Step 3 - Check downstream dependencies. Lambda does not guarantee that anything, including open connections, is shared between invocations; state survives only while an execution environment stays warm. If your function calls RDS, ElastiCache, or an external HTTP API, verify those endpoints are reachable and not saturated. For RDS, use RDS Proxy to pool connections: each Lambda invocation opening its own DB connection is a common cause of timeouts under load.
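Alongside RDS Proxy, a common complementary pattern is to create the connection once per execution environment and reuse it on warm invocations. The sketch below assumes a hypothetical `connect` factory that you supply (for example, a function wrapping your database driver's connect call); it is not part of any AWS SDK.

```python
# Module-level state lives as long as the execution environment stays warm.
_connection = None


def get_connection(connect):
    """Return a cached connection, creating it on first use.

    `connect` is a caller-supplied factory (hypothetical here).
    """
    global _connection
    if _connection is None:
        _connection = connect()
    return _connection
```

Reuse is opportunistic: Lambda may recycle the environment at any time, so the code must still tolerate a stale or closed connection and reconnect.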

Step 4 - Increase timeout (to a sensible max). For background jobs use up to 900 seconds (15 minutes). For functions behind API Gateway, keep the timeout at or below 29 seconds, the default API Gateway integration timeout:

aws lambda update-function-configuration \
  --function-name my-function \
  --timeout 29
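Raising the timeout does not help if a batch simply has more work than fits in one invocation. One option is to check the remaining time from the Lambda context object and stop early. The sketch below uses `context.get_remaining_time_in_millis()`, which the Python runtime provides; `process_items`, `work`, and the 2-second `SAFETY_MARGIN_MS` are assumptions for illustration.

```python
SAFETY_MARGIN_MS = 2000  # assumed buffer; tune to your own cleanup cost


def process_items(items, context, work):
    """Process items until the invocation is close to its timeout.

    Returns (done, remaining) so the caller can re-queue unfinished items.
    """
    done = []
    for index, item in enumerate(items):
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            return done, items[index:]
        done.append(work(item))
    return done, []
```

Returning the unfinished items lets the handler hand them to a queue or a follow-up invocation instead of being killed mid-item.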

Diagnosing 403 Forbidden and Access Denied

Two distinct surfaces produce these errors.

API Gateway → Lambda: 403 Forbidden

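A 403 here is returned by API Gateway itself, before your function code runs. Common causes include a missing or invalid API key, an API Gateway resource policy or AWS WAF rule that denies the request, and a request to a stage or custom domain path that is not mapped. A missing lambda:InvokeFunction grant for the API Gateway principal in the function's resource-based policy usually surfaces as a 500 with "Invalid permissions on Lambda function" in the execution logs rather than a 403, but it is quick to verify and fix, so rule it out first.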

Error in API Gateway test console:

{
  "message": "Forbidden"
}

Fix — add the invocation permission:

aws lambda add-permission \
  --function-name my-function \
  --statement-id apigateway-prod \
  --action lambda:InvokeFunction \
  --principal apigateway.amazonaws.com \
  --source-arn "arn:aws:execute-api:us-east-1:123456789012:abc123def/*/GET/endpoint"
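To confirm whether the grant is already present, you can inspect the policy JSON that `aws lambda get-policy` returns. The sketch below is one way to do that offline; `allows_apigateway_invoke` and `_as_list` are names invented here, and the check deliberately ignores any SourceArn condition, so a True result means "some API Gateway grant exists", not "this specific route is allowed".

```python
import json

APIGW_PRINCIPAL = "apigateway.amazonaws.com"


def _as_list(value):
    """IAM policy fields may hold a single string or a list of strings."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def allows_apigateway_invoke(policy_json: str) -> bool:
    """Return True if any Allow statement lets API Gateway invoke the function."""
    statements = json.loads(policy_json).get("Statement", [])
    if isinstance(statements, dict):  # a lone statement may not be wrapped in a list
        statements = [statements]
    for statement in statements:
        principal = statement.get("Principal", {})
        services = _as_list(principal.get("Service")) if isinstance(principal, dict) else []
        if (
            statement.get("Effect") == "Allow"
            and APIGW_PRINCIPAL in services
            and "lambda:InvokeFunction" in _as_list(statement.get("Action"))
        ):
            return True
    return False
```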

IAM execution role: AccessDeniedException

You will see this in CloudWatch Logs when the Lambda tries to call another AWS service:

botocore.exceptions.ClientError: An error occurred (AccessDeniedException)
when calling the GetSecretValue operation: User:
arn:aws:sts::123456789012:assumed-role/my-lambda-role/my-function
is not authorized to perform: secretsmanager:GetSecretValue
on resource: arn:aws:secretsmanager:us-east-1:123456789012:secret:my-secret

Fix — attach the required policy to the execution role:

aws iam put-role-policy \
  --role-name my-lambda-role \
  --policy-name SecretsAccess \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Action": ["secretsmanager:GetSecretValue"],
      "Resource": "arn:aws:secretsmanager:us-east-1:123456789012:secret:my-secret-??????"
    }]
  }'

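Apply the principle of least privilege: scope Resource to the specific secret's ARN rather than a bare *. The wildcard suffix in the example only accounts for the random characters Secrets Manager appends to secret ARNs.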
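The denial message already names the missing action and the resource, so a narrowly scoped statement can be drafted straight from it. The sketch below assumes the message follows the "not authorized to perform: ... on resource: ..." wording shown above; `statement_from_denial` is a hypothetical helper, and its output should be reviewed before being attached to a role.

```python
import re

# Matches the wording of the AccessDeniedException message shown above.
_DENIED = re.compile(
    r"not authorized to perform: (?P<action>\S+)\s+on resource: (?P<resource>\S+)"
)


def statement_from_denial(message):
    """Draft an Allow statement for the denied action, or None if no match."""
    match = _DENIED.search(message)
    if match is None:
        return None
    return {
        "Effect": "Allow",
        "Action": [match["action"]],
        "Resource": match["resource"],
    }
```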


Diagnosing 502 Bad Gateway

When API Gateway returns 502 Bad Gateway with a Lambda proxy integration, the response it received from Lambda does not conform to the expected format, either because the handler returned something malformed or because the invocation failed with a function error.

Common causes:

  1. Handler returned None / undefined instead of a dict/object with statusCode
  2. body field is not a string (must be JSON-stringified)
  3. Unhandled exception propagated to the runtime

API Gateway execution logs (enable in Stage settings) will show:

Endpoint response body before transformations: null
Execution failed due to configuration error: Malformed Lambda proxy response

Correct response format (Python):

import json

def handler(event, context):
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": "ok"})  # body MUST be a string
    }

Common mistake — returning a dict as body directly:

# WRONG - causes 502
return {"statusCode": 200, "body": {"message": "ok"}}

# RIGHT
return {"statusCode": 200, "body": json.dumps({"message": "ok"})}
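One way to guard against all three causes at once is a decorator that normalizes whatever the handler returns. The following is a minimal sketch, not an official pattern: `proxy_response` and `_error` are names invented here, and mapping every failure to a generic 500 is an assumption you may want to refine.

```python
import json
from functools import wraps


def _error(status, message):
    """Build a well-formed proxy response carrying an error message."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": message}),
    }


def proxy_response(func):
    """Ensure the handler always returns a valid proxy integration response."""
    @wraps(func)
    def wrapper(event, context):
        try:
            result = func(event, context)
        except Exception as exc:
            print(f"unhandled exception: {exc!r}")  # goes to CloudWatch Logs
            return _error(500, "Internal server error")
        if not isinstance(result, dict) or "statusCode" not in result:
            print(f"malformed handler result: {result!r}")
            return _error(500, "Internal server error")
        body = result.get("body")
        if body is not None and not isinstance(body, str):
            result = {**result, "body": json.dumps(body)}  # body must be a string
        return result
    return wrapper
```

Applied as `@proxy_response` above a handler, this turns a dict body into a JSON string and converts crashes or `None` returns into an explicit 500, which is easier to trace than a bare 502.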

Diagnosing Lambda Throttling and Rate Limits

Throttled invocations return TooManyRequestsException (HTTP 429). There are two throttle types:

  • Account-level concurrency limit: Default 1,000 concurrent executions per region. Shared across all functions.
  • Function-level reserved concurrency: If a function has reserved concurrency set to N, it will throttle at N concurrent executions regardless of account limit.
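To judge how close a workload is to either limit, a useful rule of thumb is that steady-state concurrency is roughly requests per second multiplied by average duration in seconds. The sketch below applies that estimate; the function names and the sample numbers are illustrative, and real traffic is burstier than an average suggests.

```python
import math


def estimated_concurrency(requests_per_second, avg_duration_seconds):
    """Approximate concurrent executions needed for a steady request rate."""
    return math.ceil(requests_per_second * avg_duration_seconds)


def will_throttle(requests_per_second, avg_duration_seconds, limit=1000):
    """True if the estimate exceeds the given concurrency limit."""
    return estimated_concurrency(requests_per_second, avg_duration_seconds) > limit


print(estimated_concurrency(200, 0.5))  # 200 req/s at 500 ms each -> 100
print(will_throttle(5000, 0.5))         # 2500 needed vs. 1000 limit -> True
```

Pass the function's reserved concurrency as `limit` to check the function-level cap instead of the account-level default.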

Check current account limits and usage:

# View account concurrency limit
aws lambda get-account-settings

# View per-function reserved concurrency
aws lambda get-function-concurrency --function-name my-function

# View throttle metrics for last hour
aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name Throttles \
  --dimensions Name=FunctionName,Value=my-function \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 300 \
  --statistics Sum

Fix options in order of preference:

  1. SQS as a buffer: Place an SQS queue in front of Lambda. Lambda polls at a controlled rate, smoothing bursts. Set MaximumConcurrency on the event source mapping.
  2. Provisioned concurrency: Keeps N execution environments warm. Eliminates cold starts and guarantees capacity.
  3. Limit increase request: Submit via AWS Support for account-level concurrency increases (allow 48-72 hours).

# Set provisioned concurrency on an alias
aws lambda put-provisioned-concurrency-config \
  --function-name my-function \
  --qualifier prod \
  --provisioned-concurrent-executions 50

Diagnosing Service Unavailable (503)

ServiceUnavailableException is a transient AWS-side fault. Your code should implement exponential backoff with jitter:

import boto3, time, random
from botocore.config import Config

client = boto3.client('lambda', config=Config(
    retries={
        'max_attempts': 5,
        'mode': 'adaptive'  # adaptive mode adds client-side rate limiting
    }
))

For custom retry logic:

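import random
import time

def invoke_with_backoff(client, **kwargs):
    """Invoke with exponential backoff plus jitter; give up after 5 retries."""
    base, cap, attempt = 0.5, 30, 0
    while True:
        try:
            return client.invoke(**kwargs)
        # The Lambda client models service-side faults on Invoke as ServiceException
        except client.exceptions.ServiceException:
            attempt += 1
            if attempt > 5:
                raise
            # Exponential delay capped at `cap` seconds, plus up to 1 s of random jitter
            sleep = min(cap, base * 2 ** attempt) + random.uniform(0, 1)
            time.sleep(sleep)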
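The function above adds a small random offset to a fixed exponential delay. A variant often called "full jitter" instead draws the whole delay at random between zero and the exponential ceiling, which spreads retries from many clients more evenly. The sketch below shows only the delay calculation; `full_jitter_delay` and its injectable `rng` parameter are assumptions made so the function is easy to test, not part of boto3.

```python
import random


def full_jitter_delay(attempt, base=0.5, cap=30.0, rng=random.random):
    """Return a delay in seconds drawn uniformly from [0, min(cap, base * 2**attempt)]."""
    ceiling = min(cap, base * 2 ** attempt)
    return rng() * ceiling
```

To use it in the retry loop, replace the `sleep = ...` line with `sleep = full_jitter_delay(attempt)`.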

Centralized Diagnostic Runbook

For any Lambda error, run this triage sequence before changing configuration:

# 1. Tail recent log events
aws logs tail /aws/lambda/my-function --since 1h --format short

# 2. List error events from the last hour
aws logs filter-log-events \
  --log-group-name /aws/lambda/my-function \
  --filter-pattern "?ERROR ?error ?Exception" \
  --start-time $(($(date +%s) - 3600))000 \
  --query 'events[].message'

# 3. Check function configuration
aws lambda get-function-configuration --function-name my-function \
  --query '{Timeout:Timeout,MemorySize:MemorySize,Role:Role,VpcConfig:VpcConfig}'

# 4. Inspect execution role policies
ROLE=$(aws lambda get-function-configuration --function-name my-function \
  --query 'Role' --output text | awk -F/ '{print $NF}')
aws iam list-attached-role-policies --role-name "$ROLE"
aws iam list-role-policies --role-name "$ROLE"

# 5. Check resource-based policy
aws lambda get-policy --function-name my-function --query 'Policy' --output text | python3 -m json.tool

Lambda Error Triage Script

The bash script below bundles the runbook checks into a single pass:
#!/usr/bin/env bash
# Lambda Error Triage Script
# Usage: FUNCTION=my-function REGION=us-east-1 bash triage-lambda.sh

set -euo pipefail
FUNCTION=${FUNCTION:?"Set FUNCTION env var"}
REGION=${REGION:-us-east-1}
SINCE=${SINCE:-1h}

echo "=== Lambda Triage: $FUNCTION ==="

echo ""
echo "--- Function Configuration ---"
aws lambda get-function-configuration \
  --function-name "$FUNCTION" --region "$REGION" \
  --query '{Timeout:Timeout,Memory:MemorySize,Runtime:Runtime,Role:Role,VpcConfig:VpcConfig,LastModified:LastModified}'

echo ""
echo "--- Concurrency Settings ---"
aws lambda get-function-concurrency \
  --function-name "$FUNCTION" --region "$REGION" 2>/dev/null || echo "No reserved concurrency set"

echo ""
echo "--- Account Concurrency Limit ---"
aws lambda get-account-settings --region "$REGION" \
  --query 'AccountLimit.{ConcurrentExecutions:ConcurrentExecutions,UnreservedConcurrentExecutions:UnreservedConcurrentExecutions}'

echo ""
echo "--- Resource-Based Policy ---"
aws lambda get-policy \
  --function-name "$FUNCTION" --region "$REGION" \
  --query 'Policy' --output text 2>/dev/null | python3 -m json.tool || echo "No resource-based policy"

echo ""
echo "--- Execution Role Policies ---"
ROLE=$(aws lambda get-function-configuration \
  --function-name "$FUNCTION" --region "$REGION" \
  --query 'Role' --output text | awk -F/ '{print $NF}')
echo "Role: $ROLE"
aws iam list-attached-role-policies --role-name "$ROLE" \
  --query 'AttachedPolicies[].PolicyName'
aws iam list-role-policies --role-name "$ROLE" \
  --query 'PolicyNames'

echo ""
echo "--- Recent Errors (last $SINCE) ---"
aws logs tail "/aws/lambda/$FUNCTION" \
  --since "$SINCE" \
  --filter-pattern '?ERROR ?Exception ?Timeout ?throttl' \
  --region "$REGION" 2>/dev/null | tail -50 || echo "No log group found or no matching events"

echo ""
echo "--- Throttle Metric (last 30 min) ---"
START=$(date -u -d '30 minutes ago' +%Y-%m-%dT%H:%M:%SZ 2>/dev/null || \
  date -u -v-30M +%Y-%m-%dT%H:%M:%SZ)
END=$(date -u +%Y-%m-%dT%H:%M:%SZ)
aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name Throttles \
  --dimensions Name=FunctionName,Value="$FUNCTION" \
  --start-time "$START" --end-time "$END" \
  --period 300 --statistics Sum \
  --region "$REGION" \
  --query 'sort_by(Datapoints,&Timestamp)[].{Time:Timestamp,Throttles:Sum}'

echo ""
echo "--- Duration P99 (last 30 min) ---"
aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name Duration \
  --dimensions Name=FunctionName,Value="$FUNCTION" \
  --start-time "$START" --end-time "$END" \
  --period 300 --extended-statistics p99 \
  --region "$REGION" \
  --query 'sort_by(Datapoints,&Timestamp)[].{Time:Timestamp,P99ms:ExtendedStatistics.p99}' 2>/dev/null || echo "Could not retrieve p99 Duration datapoints"

echo ""
echo "=== Triage complete ==="

Error Medic Editorial

The Error Medic Editorial team comprises senior DevOps engineers and SRE practitioners with hands-on experience operating large-scale AWS workloads. We focus on actionable, command-level troubleshooting guides grounded in real production incidents.
