Error Medic

How to Fix AWS API Rate Limit and Timeout Errors (ThrottlingException & HTTP 429)

Resolve AWS API rate limits (ThrottlingException, HTTP 429) and timeouts. Learn root causes, how to implement exponential backoff, and optimize SDK settings.

Key Takeaways
  • Root Cause 1: Exceeding AWS service-specific account quotas or API burst limits, triggering ThrottlingException or HTTP 429.
  • Root Cause 2: Inefficient API usage, such as aggressive polling of control-plane APIs (e.g., DescribeInstances) without caching.
  • Root Cause 3: Default SDK timeout settings that are too short for long-running operations or congested networks, causing ConnectTimeoutError or ReadTimeoutError.
  • Quick Fix: Implement exponential backoff with jitter, use AWS SDK 'adaptive' retry modes, and extend read timeouts in your client configuration.
AWS API Error Mitigation Strategies
Method | When to Use | Time to Implement | Risk Level
SDK Adaptive Retries & Jitter | Immediate mitigation for frequent ThrottlingExceptions | Minutes | Low
Service Quota Increase | Sustained high-traffic needs on data-plane APIs | 1-3 Days | Low
Client-Side Caching (e.g., SSM/Secrets) | Heavy read/describe operations on configuration data | 1-2 Days | Medium
Tune SDK Timeouts | Frequent ReadTimeoutError on heavy payloads (e.g., S3) | Minutes | Medium

Understanding AWS API Rate Limits and Timeouts

When integrating with Amazon Web Services (AWS), your applications interact with either the control plane (managing and configuring resources) or the data plane (accessing or mutating data). To maintain service stability and prevent noisy-neighbor impact, AWS enforces API rate limits using a token bucket algorithm: each request consumes a token, and tokens are replenished at a fixed rate up to a maximum burst capacity. When your request volume outpaces the replenishment rate, AWS rejects the excess requests with throttling errors. Conversely, if network latency spikes or a heavy request takes too long, your client may drop the connection, resulting in a timeout error.
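The token bucket model described above can be sketched in a few lines of Python. The capacity and refill rate below are illustrative values, not actual AWS limits:

```python
import time

class TokenBucket:
    """Minimal token bucket: up to `capacity` tokens, refilled at `refill_rate` per second."""
    def __init__(self, capacity, refill_rate):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity
        self.last = time.monotonic()

    def try_acquire(self):
        now = time.monotonic()
        # Replenish tokens for the elapsed time, capped at bucket capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True   # request allowed
        return False      # request throttled (AWS would return HTTP 429 here)

bucket = TokenBucket(capacity=5, refill_rate=1)
results = [bucket.try_acquire() for _ in range(7)]
# A burst of 5 requests drains the bucket; further requests fail until tokens refill
```

This is why short bursts succeed while sustained high request rates get throttled: the burst spends saved-up tokens, but the steady-state rate is bounded by the refill rate.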

Identifying the Exact Error

Before implementing fixes, you must identify the exact exception being thrown by your SDK. Common error signatures include:

  • botocore.exceptions.ClientError: An error occurred (Throttling) when calling the [Operation] operation: Rate exceeded
  • ProvisionedThroughputExceededException (Common in DynamoDB)
  • TooManyRequestsException (HTTP 429)
  • botocore.exceptions.ConnectTimeoutError: Connect timeout on endpoint URL
  • botocore.exceptions.ReadTimeoutError: Read timeout on endpoint URL
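A small helper (the names here are our own, not part of any SDK) can normalize these signatures into a single "is this a throttle?" check:

```python
# Error codes that signal throttling across common AWS services
# (illustrative, not exhaustive; each service defines its own codes).
THROTTLING_CODES = {
    "Throttling",
    "ThrottlingException",
    "TooManyRequestsException",
    "ProvisionedThroughputExceededException",
    "RequestLimitExceeded",  # EC2's throttling code
}

def is_throttling_error(error_code, http_status=None):
    """Return True if the error looks like an AWS rate-limit response."""
    return error_code in THROTTLING_CODES or http_status == 429
```

With boto3, the code comes from `error.response['Error']['Code']` and the status from `error.response['ResponseMetadata']['HTTPStatusCode']`.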

Step 1: Diagnosing the Root Cause

1. Analyze CloudTrail Logs and CloudWatch Metrics Start your investigation in AWS CloudTrail. Filter the event history by Event name or Error code. Look for errorCode: ThrottlingException. Next, navigate to CloudWatch and inspect the Usage namespace for the specific AWS service. Services like API Gateway, DynamoDB, and EC2 publish specific CallCount and ThrottleCount metrics.
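CloudTrail's LookupEvents API returns each record's payload as a JSON string in the CloudTrailEvent field, and the errorCode lives inside that JSON. A sketch of filtering one page of results for throttling errors (the sample page below is fabricated for illustration, shaped like a boto3 `cloudtrail.lookup_events()` response):

```python
import json

def throttled_events(lookup_events_page):
    """Extract (eventName, eventTime) for records whose errorCode indicates throttling."""
    hits = []
    for record in lookup_events_page.get("Events", []):
        detail = json.loads(record["CloudTrailEvent"])
        if detail.get("errorCode") in ("Throttling", "ThrottlingException", "RequestLimitExceeded"):
            hits.append((detail.get("eventName"), detail.get("eventTime")))
    return hits

# Fabricated sample page for illustration
sample_page = {
    "Events": [
        {"CloudTrailEvent": json.dumps({
            "eventName": "DescribeInstances",
            "eventTime": "2024-01-01T00:00:00Z",
            "errorCode": "ThrottlingException",
        })},
        {"CloudTrailEvent": json.dumps({
            "eventName": "RunInstances",
            "eventTime": "2024-01-01T00:00:05Z",
        })},
    ]
}
```

Counting these hits per event name quickly shows which API call is burning your quota.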

2. Differentiate Control Plane vs. Data Plane Limits A common anti-pattern is treating control plane APIs like a database. For instance, constantly calling ec2:DescribeInstances to check server state will rapidly exhaust your limits, as control plane limits are significantly lower than data plane limits (and are rarely increased by AWS Support). If you need state changes, rely on EventBridge events rather than aggressive polling.

3. Identify Timeout Signatures Timeouts are client-side or network-level phenomena. A ConnectTimeoutError usually implies a network configuration issue (e.g., exhausted NAT Gateway ports, restrictive Security Groups, or DNS resolution failures). A ReadTimeoutError indicates the TCP handshake succeeded, but AWS took longer to process the request than your SDK's configured read threshold allows.

Step 2: Fixing the Errors

Fix A: Implement Exponential Backoff, Jitter, and Adaptive Retries

The most robust code-level fix for HTTP 429/Throttling errors is a resilient retry strategy. Default AWS SDK retry counts (often 3 to 5) are insufficient during severe throttling events.

You must implement Exponential Backoff (increasing the delay between retries exponentially) combined with Jitter (adding randomization to the delay). Jitter is crucial because it prevents the "thundering herd" problem—where multiple blocked processes retry at the exact same millisecond, immediately exhausting the rate limit again. Modern AWS SDKs (like Boto3 for Python or the AWS SDK for Go V2) offer built-in adaptive retry modes that handle this automatically by analyzing the token bucket state.
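A minimal "full jitter" implementation in pure Python (function names are ours; in production, prefer the SDK's built-in adaptive mode):

```python
import random
import time

def backoff_delays(max_attempts=10, base=0.5, cap=20.0):
    """Yield full-jitter delays: uniform(0, min(cap, base * 2**attempt))."""
    for attempt in range(max_attempts):
        yield random.uniform(0, min(cap, base * (2 ** attempt)))

def call_with_backoff(fn, is_retryable, max_attempts=10):
    """Call fn(), sleeping with exponential backoff + jitter between retryable failures."""
    for attempt, delay in enumerate(backoff_delays(max_attempts)):
        try:
            return fn()
        except Exception as exc:
            if attempt == max_attempts - 1 or not is_retryable(exc):
                raise
            time.sleep(delay)
```

The randomization matters: because each client draws its delay from the full interval, retries from many blocked processes spread out instead of landing in the same millisecond.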

Fix B: Adjust SDK Timeout Configurations

If you are dealing with timeouts rather than throttling, you must override the default SDK configurations.

  • Connect Timeout: Increase this slightly if you have high-latency routing, but generally keep it low (e.g., 5-10 seconds) to fail fast on dead network paths.
  • Read Timeout: Increase this substantially (e.g., 60-120 seconds) for operations known to take time, such as large S3 multipart uploads, executing complex Athena queries, or invoking cold-starting Lambda functions.
Fix C: Optimize API Call Patterns (Caching and Batching)

Reduce your aggregate API footprint:

  • Caching: Do not call secretsmanager:GetSecretValue or ssm:GetParameter on every function invocation. Cache the results in memory and refresh them asynchronously using a Time-To-Live (TTL).
  • Batching: Utilize batch operations wherever possible. Instead of calling sqs:SendMessage in a loop, aggregate payloads and use sqs:SendMessageBatch. Similarly, use dynamodb:BatchWriteItem instead of individual PutItem requests.
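The caching pattern can be sketched with a simple TTL wrapper (a hand-rolled illustration; `fetch_secret` stands in for a real call such as `secretsmanager:GetSecretValue`):

```python
import time

class TTLCache:
    """Cache a fetch() result for ttl_seconds, refetching only after expiry."""
    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if now >= self._expires_at:
            # Only hit the AWS API once per TTL window
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value

calls = {"n": 0}
def fetch_secret():
    calls["n"] += 1
    return "s3cr3t"

cache = TTLCache(fetch_secret, ttl_seconds=300)
cache.get(); cache.get(); cache.get()
# Three reads, one underlying API call
```

With a 5-minute TTL, a function invoked 1,000 times per minute makes roughly one GetSecretValue call every 300 seconds instead of 1,000 per minute.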
Fix D: Request a Service Quota Increase

If your architecture is optimized, you are caching aggressively, and you are still hitting limits on scalable data-plane resources, you need a quota increase. Navigate to the AWS Service Quotas Console, search for the service and the specific API operation, and submit an increase request. Be prepared to provide AWS Support with your use case, current architecture, and the specific CloudWatch metrics showing the throttling.

Complete Example: Resilient Boto3 Client Configuration

The snippet below pulls the fixes together: adaptive retries with extended timeouts, applied to a frequently throttled control-plane call.
import boto3
from botocore.config import Config
from botocore.exceptions import ClientError

# Custom configuration to handle AWS API Rate Limits and Timeouts
# This config enables 'adaptive' retries (which handles backoff/jitter internally)
# and extends the default connection and read timeouts.
custom_boto_config = Config(
    region_name='us-east-1',
    signature_version='v4',
    retries={
        'max_attempts': 10,  # Raise the ceiling (boto3's 'standard' mode defaults to 3 attempts)
        'mode': 'adaptive'   # Dynamically adjusts retry rate based on throttle responses
    },
    connect_timeout=10,      # Seconds to wait to establish a TCP connection
    read_timeout=60          # Seconds to wait for a response from the server
)

# Initialize the AWS client with the custom configuration
ec2_client = boto3.client('ec2', config=custom_boto_config)

try:
    # Example API call that is frequently targeted by rate limits (Control Plane)
    response = ec2_client.describe_instances()
    print(f"Successfully retrieved {len(response.get('Reservations', []))} reservations.")
    
except ClientError as error:
    if error.response['Error']['Code'] in ('Throttling', 'ThrottlingException', 'RequestLimitExceeded'):
        print("CRITICAL: Rate limit exceeded even after 10 adaptive retries. Check quotas.")
    else:
        print(f"AWS API ClientError: {error}")
except Exception as error:
    print(f"A network timeout or system error occurred: {error}")

Error Medic Editorial

Error Medic Editorial is a collective of senior Site Reliability Engineers and Cloud Architects dedicated to documenting, analyzing, and resolving complex infrastructure incidents, cloud rate limits, and systemic deployment failures.
