Error Medic

Fixing 'ThrottlingException: Rate exceeded' and AWS API Timeouts

Resolve AWS API rate limits (ThrottlingException) and timeouts by implementing exponential backoff, jitter, and requesting service quota increases.

Key Takeaways
  • Root Cause 1: Exceeding the allowed API request rate (Requests Per Second) for a specific AWS service, triggering a ThrottlingException.
  • Root Cause 2: Network latency or backend service degradation causing the AWS SDK to hit its configured read/connect timeout threshold.
  • Quick Fix Summary: Implement exponential backoff with jitter in your retry logic, and request a Service Quota increase if the baseline traffic genuinely exceeds defaults.
Fix Approaches Compared
| Method | When to Use | Time | Risk |
| --- | --- | --- | --- |
| Exponential Backoff & Jitter | Client-side mitigation for bursty traffic causing intermittent ThrottlingExceptions. | Medium | Low |
| Service Quota Increase | Sustained high traffic that consistently hits the default API limits. | Slow (AWS Approval) | None |
| Caching API Responses | Read-heavy workloads polling the same AWS resources (e.g., DescribeInstances). | Medium | Medium (Stale Data) |
| Tuning SDK Timeouts | Addressing 'Connect timeout on endpoint' or 'Read timeout' errors. | Fast | Low |

Understanding the Error

When interacting with Amazon Web Services (AWS) via the CLI, SDKs, or direct API calls, you may encounter rate-limiting and timeout errors. These mechanisms are designed to protect AWS infrastructure from abuse and ensure fair usage among all tenants. However, for high-throughput applications, they often surface as sudden, disruptive failures.

The most common error messages you will encounter are:

  • ThrottlingException: Rate exceeded
  • ProvisionedThroughputExceededException (DynamoDB specific)
  • botocore.exceptions.ClientError: An error occurred (ThrottlingException) when calling the [API] operation: Rate exceeded
  • Read timeout on endpoint URL: "https://ec2.us-east-1.amazonaws.com/"
  • Connect timeout on endpoint URL

AWS APIs operate on a token bucket algorithm. You are granted a specific number of tokens (API calls) per second, plus a burst capacity. Once the bucket is empty, subsequent requests are rejected, and AWS returns an HTTP 400 (Bad Request) or HTTP 503 (Service Unavailable) status with a throttling error. Timeouts, on the other hand, occur when the SDK waits longer than its configured threshold for a connection or a response, which can be caused by network drops or transient AWS backend latency.
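The token bucket model can be sketched in a few lines of Python. The rate and burst values below are illustrative only, not actual AWS quotas:

```python
import time

class TokenBucket:
    """Toy model of the per-action rate limiter AWS applies."""

    def __init__(self, rate, burst):
        self.rate = rate          # tokens refilled per second
        self.capacity = burst     # maximum burst size
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens for elapsed time, capped at burst capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False              # this request would be throttled

# A bucket allowing 5 requests/sec with a burst of 10:
bucket = TokenBucket(rate=5, burst=10)
results = [bucket.allow() for _ in range(15)]  # 15 back-to-back calls
# The first ~10 succeed (burst); the rest fail until tokens refill
```

This is why a short burst succeeds and then everything fails at once: the burst drains the bucket faster than the steady refill rate can replace tokens.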

Step 1: Diagnose the Bottleneck

Before applying a fix, you must determine whether you are dealing with a hard rate limit or a network timeout. You can use AWS CloudTrail and Amazon CloudWatch to analyze the API calls.

  1. Analyze CloudWatch Metrics: Navigate to CloudWatch and check the Usage metrics for the specific service. Look for CallCount and compare it against your known quotas.
  2. Inspect SDK Logs: Enable debug logging in your AWS SDK (e.g., boto3.set_stream_logger('') in Python). Look for the HTTP status codes. A 429 Too Many Requests or 400 Bad Request with ThrottlingException confirms a rate limit. A timeout will usually throw a lower-level socket or urllib3 exception.
  3. Check Service Quotas: Go to the AWS Service Quotas console to see your current limits for the affected API operation. For example, the DescribeInstances API in EC2 has a strict default limit.

Step 2: Implement Client-Side Resiliency (Exponential Backoff)

The standard, AWS-recommended approach to handling ThrottlingException is implementing exponential backoff with jitter. Most modern AWS SDKs (like Boto3 for Python or the AWS SDK for Node.js) have built-in retry mechanisms, but they may need tuning for high-concurrency environments.

When a request fails due to rate limiting, the client should wait a short amount of time before retrying. If the retry fails, the wait time is doubled (exponential backoff). To prevent the "thundering herd" problem where multiple clients retry at the exact same millisecond, you add "jitter" (randomized delay) to the wait time.
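The wait-time calculation can be written down directly. This sketch uses the "full jitter" variant, in which the entire delay is randomized between zero and the exponential cap; the base and cap values are illustrative:

```python
import random

def backoff_delay(attempt, base=1.0, cap=30.0):
    """Full-jitter delay: a random value in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** attempt))

delays = [backoff_delay(a) for a in range(5)]
# Upper bounds grow as 1, 2, 4, 8, 16 seconds; actual sleeps are randomized
```

Randomizing the full window, rather than adding a small jitter on top of a fixed delay, spreads retries from many clients most evenly across time.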

If you are writing custom HTTP clients or need aggressive retry policies, you must implement this logic manually. See the Code Block section for a robust Python example.

Step 3: Request a Service Quota Increase

If your application legitimately requires a higher API request rate than the default limits, you must request a quota increase. This is not an immediate fix; it requires AWS support approval.

  1. Open the Service Quotas console in AWS.
  2. Select the AWS service (e.g., Amazon EC2).
  3. Search for the specific API or quota (e.g., DescribeInstances, RunInstances).
  4. Select the quota and click Request quota increase.
  5. Enter the desired value and provide a clear, detailed business justification. AWS Support will review the request and may ask for architectural details to ensure you aren't masking a poorly optimized application.

Step 4: Optimize API Usage (Caching and Batching)

Often, rate limits are hit because of inefficient API usage rather than pure scale.

  • Caching: If multiple microservices frequently call sts:AssumeRole or ec2:DescribeInstances, implement a local cache (like Redis or an in-memory TTL cache) to store the results. AWS responses rarely change second-by-second.
  • Batching: Instead of iterating through a list of 100 instance IDs and calling DescribeInstances 100 times, pass all 100 IDs in a single DescribeInstances API call. Most AWS 'Describe' APIs support bulk queries.
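A minimal in-memory TTL cache around a describe-style call might look like the following sketch. The 30-second TTL and the fake fetch function are illustrative choices, not AWS recommendations:

```python
import time

class TTLCache:
    """Cache results of expensive calls for `ttl` seconds."""

    def __init__(self, ttl=30.0):
        self.ttl = ttl
        self._store = {}  # key -> (expiry, value)

    def get_or_fetch(self, key, fetch):
        entry = self._store.get(key)
        now = time.monotonic()
        if entry and entry[0] > now:
            return entry[1]                # fresh hit: no API call made
        value = fetch()                    # miss or expired: call AWS
        self._store[key] = (now + self.ttl, value)
        return value

calls = []
def fake_describe():
    calls.append(1)                        # stand-in for ec2.describe_instances()
    return {"Reservations": []}

cache = TTLCache(ttl=30.0)
cache.get_or_fetch("instances", fake_describe)
cache.get_or_fetch("instances", fake_describe)  # served from cache
# Only one underlying "API call" was made
```

Every cache hit is one fewer token drained from the service's bucket, which directly lowers your effective request rate.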

Step 5: Tuning SDK Timeouts

If you are experiencing timeouts (Read timeout or Connect timeout), the issue is often related to the SDK's default configuration or the network path (e.g., NAT Gateway exhaustion).

AWS SDKs have two primary timeout settings:

  • Connect Timeout: The time the SDK will wait to establish a TCP connection to the AWS endpoint.
  • Read Timeout: The time the SDK will wait for a response from the server after the connection is established.

You can override these defaults. For instance, in Boto3, you use the botocore.config.Config object. In heavily loaded Lambda functions or containers running in congested subnets, slightly increasing the connect timeout (e.g., from 1 second to 5 seconds) can resolve intermittent failures.

Code Block: Configuring Timeouts and Retries in Python

```python
import random
import time

import boto3
from botocore.config import Config
from botocore.exceptions import ClientError

# 1. Configure SDK timeouts and built-in retries.
# The 'adaptive' retry mode dynamically rate-limits requests on the client side.
custom_config = Config(
    retries={
        'max_attempts': 10,
        'mode': 'adaptive'
    },
    connect_timeout=5,  # increase connect timeout to 5 seconds
    read_timeout=60     # increase read timeout to 60 seconds
)

ec2_client = boto3.client('ec2', config=custom_config)

# 2. Manual exponential backoff with jitter (if not relying on the SDK).
# AWS services signal throttling under several error codes.
THROTTLING_CODES = ('ThrottlingException', 'Throttling', 'TooManyRequestsException')

def call_aws_with_backoff(api_func, *args, **kwargs):
    max_retries = 5
    base_delay = 1  # seconds

    for attempt in range(max_retries):
        try:
            return api_func(*args, **kwargs)
        except ClientError as e:
            if e.response['Error']['Code'] not in THROTTLING_CODES:
                raise  # re-raise non-throttling errors
            if attempt == max_retries - 1:
                raise  # give up after the final attempt

            # Calculate delay: (2 ** attempt) * base_delay, plus random jitter
            delay = (2 ** attempt) * base_delay
            jitter = random.uniform(0, 0.5 * delay)
            sleep_time = delay + jitter

            print(f"Throttled. Retrying in {sleep_time:.2f} seconds...")
            time.sleep(sleep_time)

# Example usage of the manual backoff wrapper:
# response = call_aws_with_backoff(ec2_client.describe_instances)
```

Error Medic Editorial

The Error Medic Editorial team consists of senior Site Reliability Engineers and Cloud Architects dedicated to documenting obscure infrastructure edge cases and providing actionable, production-ready solutions.
