Error Medic

Fixing 'ThrottlingException: Rate exceeded' and AWS API Timeouts

Resolve AWS API rate limits (ThrottlingException) and timeouts by implementing exponential backoff, jitter, and requesting service quota increases.

Key Takeaways
  • Root Cause 1: Exceeding the allowed API request rate (Requests Per Second) for a specific AWS service, triggering a ThrottlingException.
  • Root Cause 2: Network latency or backend service degradation causing the AWS SDK to hit its configured read/connect timeout threshold.
  • Quick Fix Summary: Implement exponential backoff with jitter in your retry logic, and request a Service Quota increase if the baseline traffic genuinely exceeds defaults.
Fix Approaches Compared
| Method | When to Use | Time | Risk |
| --- | --- | --- | --- |
| Exponential Backoff & Jitter | Client-side mitigation for bursty traffic causing intermittent ThrottlingExceptions. | Medium | Low |
| Service Quota Increase | Sustained high traffic that consistently hits the default API limits. | Slow (AWS Approval) | None |
| Caching API Responses | Read-heavy workloads polling the same AWS resources (e.g., DescribeInstances). | Medium | Medium (Stale Data) |
| Tuning SDK Timeouts | Addressing 'Connect timeout on endpoint' or 'Read timeout' errors. | Fast | Low |

Understanding the Error

When interacting with Amazon Web Services (AWS) via the CLI, SDKs, or direct API calls, you may encounter rate-limiting and timeout errors. These mechanisms are designed to protect AWS infrastructure from abuse and ensure fair usage among all tenants. However, for high-throughput applications, they often surface as sudden, disruptive failures.

The most common error messages you will encounter are:

  • ThrottlingException: Rate exceeded
  • ProvisionedThroughputExceededException (DynamoDB specific)
  • botocore.exceptions.ClientError: An error occurred (ThrottlingException) when calling the [API] operation: Rate exceeded
  • Read timeout on endpoint URL: "https://ec2.us-east-1.amazonaws.com/"
  • Connect timeout on endpoint URL

AWS APIs operate on a token bucket algorithm. You are granted a specific number of tokens (API calls) per second, plus a burst capacity. Once the bucket is empty, subsequent requests are rejected, and AWS returns an HTTP 400 (Bad Request) or HTTP 503 (Service Unavailable) status with a throttling error. Timeouts, on the other hand, occur when the SDK waits longer than its configured threshold for a connection or a response, which can be caused by network drops or transient AWS backend latency.
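The token bucket model can be sketched in a few lines of Python. The rate and burst values below are illustrative only, not actual AWS quotas:

```python
import time

class TokenBucket:
    """Toy model of the per-action rate limiter AWS applies."""

    def __init__(self, rate, burst):
        self.rate = rate          # tokens refilled per second
        self.capacity = burst     # maximum burst size
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens for elapsed time, capped at burst capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False              # this request would be throttled

# A bucket allowing 5 requests/sec with a burst of 10:
bucket = TokenBucket(rate=5, burst=10)
results = [bucket.allow() for _ in range(15)]  # 15 back-to-back calls
# The first ~10 succeed (burst); the rest fail until tokens refill
```

This is why a short burst succeeds and then everything fails at once: the burst drains the bucket faster than the steady refill rate can replace tokens.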

Step 1: Diagnose the Bottleneck

Before applying a fix, you must determine whether you are dealing with a hard rate limit or a network timeout. You can use AWS CloudTrail and Amazon CloudWatch to analyze the API calls.

  1. Analyze CloudWatch Metrics: Navigate to CloudWatch and check the Usage metrics for the specific service. Look for CallCount and compare it against your known quotas.
  2. Inspect SDK Logs: Enable debug logging in your AWS SDK (e.g., boto3.set_stream_logger('') in Python). Look for the HTTP status codes. A 429 Too Many Requests or 400 Bad Request with ThrottlingException confirms a rate limit. A timeout will usually throw a lower-level socket or urllib3 exception.
  3. Check Service Quotas: Go to the AWS Service Quotas console to see your current limits for the affected API operation. For example, the DescribeInstances API in EC2 has a strict default limit.

Step 2: Implement Client-Side Resiliency (Exponential Backoff)

The standard, AWS-recommended approach to handling ThrottlingException is implementing exponential backoff with jitter. Most modern AWS SDKs (like Boto3 for Python or the AWS SDK for Node.js) have built-in retry mechanisms, but they may need tuning for high-concurrency environments.

When a request fails due to rate limiting, the client should wait a short amount of time before retrying. If the retry fails, the wait time is doubled (exponential backoff). To prevent the "thundering herd" problem where multiple clients retry at the exact same millisecond, you add "jitter" (randomized delay) to the wait time.
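The wait-time calculation can be written down directly. This sketch uses the "full jitter" variant, in which the entire delay is randomized between zero and the exponential cap; the base and cap values are illustrative:

```python
import random

def backoff_delay(attempt, base=1.0, cap=30.0):
    """Full-jitter delay: a random value in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** attempt))

delays = [backoff_delay(a) for a in range(5)]
# Upper bounds grow as 1, 2, 4, 8, 16 seconds; actual sleeps are randomized
```

Randomizing the full window, rather than adding a small jitter on top of a fixed delay, spreads retries from many clients most evenly across time.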

If you are writing custom HTTP clients or need aggressive retry policies, you must implement this logic manually. See the Code Block section for a robust Python example.

Step 3: Request a Service Quota Increase

If your application legitimately requires a higher API request rate than the default limits, you must request a quota increase. This is not an immediate fix; it requires AWS support approval.

  1. Open the Service Quotas console in AWS.
  2. Select the AWS service (e.g., Amazon EC2).
  3. Search for the specific API or quota (e.g., DescribeInstances, RunInstances).
  4. Select the quota and click Request quota increase.
  5. Enter the desired value and provide a clear, detailed business justification. AWS Support will review the request and may ask for architectural details to ensure you aren't masking a poorly optimized application.

Step 4: Optimize API Usage (Caching and Batching)

Often, rate limits are hit because of inefficient API usage rather than pure scale.

  • Caching: If multiple microservices frequently call sts:AssumeRole or ec2:DescribeInstances, implement a local cache (like Redis or an in-memory TTL cache) to store the results. AWS responses rarely change second-by-second.
  • Batching: Instead of iterating through a list of 100 instance IDs and calling DescribeInstances 100 times, pass all 100 IDs in a single DescribeInstances API call. Most AWS 'Describe' APIs support bulk queries.
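A minimal in-memory TTL cache around a describe-style call might look like the following sketch. The 30-second TTL and the fake fetch function are illustrative choices, not AWS recommendations:

```python
import time

class TTLCache:
    """Cache results of expensive calls for `ttl` seconds."""

    def __init__(self, ttl=30.0):
        self.ttl = ttl
        self._store = {}  # key -> (expiry, value)

    def get_or_fetch(self, key, fetch):
        entry = self._store.get(key)
        now = time.monotonic()
        if entry and entry[0] > now:
            return entry[1]                # fresh hit: no API call made
        value = fetch()                    # miss or expired: call AWS
        self._store[key] = (now + self.ttl, value)
        return value

calls = []
def fake_describe():
    calls.append(1)                        # stand-in for ec2.describe_instances()
    return {"Reservations": []}

cache = TTLCache(ttl=30.0)
cache.get_or_fetch("instances", fake_describe)
cache.get_or_fetch("instances", fake_describe)  # served from cache
# Only one underlying "API call" was made
```

Every cache hit is one fewer token drained from the service's bucket, which directly lowers your effective request rate.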

Step 5: Tuning SDK Timeouts

If you are experiencing timeouts (Read timeout or Connect timeout), the issue is often related to the SDK's default configuration or the network path (e.g., NAT Gateway exhaustion).

AWS SDKs have two primary timeout settings:

  • Connect Timeout: The time the SDK will wait to establish a TCP connection to the AWS endpoint.
  • Read Timeout: The time the SDK will wait for a response from the server after the connection is established.

You can override these defaults. For instance, in Boto3, you use the botocore.config.Config object. In heavily loaded Lambda functions or containers running in congested subnets, slightly increasing the connect timeout (e.g., from 1 second to 5 seconds) can resolve intermittent failures.

Code Block: Configuring Timeouts and Retries in Python

```python
import random
import time

import boto3
from botocore.config import Config
from botocore.exceptions import ClientError

# 1. Configure SDK timeouts and built-in retries.
# The 'adaptive' retry mode dynamically rate-limits requests on the client side.
custom_config = Config(
    retries={
        'max_attempts': 10,
        'mode': 'adaptive'
    },
    connect_timeout=5,  # increase connect timeout to 5 seconds
    read_timeout=60     # increase read timeout to 60 seconds
)

ec2_client = boto3.client('ec2', config=custom_config)

# 2. Manual exponential backoff with jitter (if not relying on the SDK).
# AWS services signal throttling under several error codes.
THROTTLING_CODES = ('ThrottlingException', 'Throttling', 'TooManyRequestsException')

def call_aws_with_backoff(api_func, *args, **kwargs):
    max_retries = 5
    base_delay = 1  # seconds

    for attempt in range(max_retries):
        try:
            return api_func(*args, **kwargs)
        except ClientError as e:
            if e.response['Error']['Code'] not in THROTTLING_CODES:
                raise  # re-raise non-throttling errors
            if attempt == max_retries - 1:
                raise  # give up after the final attempt

            # Calculate delay: (2 ** attempt) * base_delay, plus random jitter
            delay = (2 ** attempt) * base_delay
            jitter = random.uniform(0, 0.5 * delay)
            sleep_time = delay + jitter

            print(f"Throttled. Retrying in {sleep_time:.2f} seconds...")
            time.sleep(sleep_time)

# Example usage of the manual backoff wrapper:
# response = call_aws_with_backoff(ec2_client.describe_instances)
```

Error Medic Editorial

The Error Medic Editorial team consists of senior Site Reliability Engineers and Cloud Architects dedicated to documenting obscure infrastructure edge cases and providing actionable, production-ready solutions.
