Error Medic

How to Fix DynamoDB ThrottlingException (Read & Write Throttled Requests)

Diagnose and resolve AWS DynamoDB ThrottlingException. Learn how to fix read/write throttled requests, optimize partitions, and handle on-demand throttling.

Key Takeaways
  • Hot partitions caused by uneven access patterns are the most common cause of DynamoDB throttling.
  • Exceeding Provisioned Throughput (RCU/WCU) or On-Demand peak limits triggers a ThrottlingException.
  • Quick Fix: Implement exponential backoff and jitter in your application's retry logic.
  • Long-term Fix: Redesign your partition keys to evenly distribute read and write workloads.
DynamoDB Throttling Fix Approaches Compared
| Method | When to Use | Time to Implement | Cost Impact |
| --- | --- | --- | --- |
| Switch to On-Demand | Unpredictable workloads with sudden spikes | Minutes | Potentially Higher |
| Increase Auto-Scaling Max | Traffic is growing steadily but hitting caps | Minutes | Moderate Increase |
| Implement Exponential Backoff | Immediate mitigation for SDK ThrottlingExceptions | Hours | None |
| Redesign Partition Keys | Persistent hot partitions despite high capacity | Days/Weeks | Lower (Efficiency) |

Understanding DynamoDB Throttling

When you interact with AWS DynamoDB, you might occasionally encounter a ThrottlingException or see metrics indicating ReadThrottleEvents and WriteThrottleEvents. This occurs when your application's request rate exceeds the capacity allocated to your DynamoDB table or secondary indexes (GSIs/LSIs).

The exact error message typically looks like this:

botocore.errorfactory.ProvisionedThroughputExceededException: An error occurred (ProvisionedThroughputExceededException) when calling the PutItem operation: The level of configured provisioned throughput for the table was exceeded. Consider increasing your provisioning level with the UpdateTable API.

Or in newer AWS SDKs, simply: ThrottlingException: Rate of requests exceeds the allowed throughput.

Why Does Throttling Happen?

DynamoDB distributes your data across multiple physical partitions based on the partition key. Throttling generally falls into two categories:

  1. Table-Level Throttling: You have simply exceeded the total Read Capacity Units (RCUs) or Write Capacity Units (WCUs) provisioned for the table or index. If you are using On-Demand mode, you might have exceeded the table's peak throughput: On-Demand accommodates up to double your previous traffic peak, and a newly created On-Demand table starts with a baseline peak of 2,000 write request units and 6,000 read request units (total throughput is also capped by the default 40,000 request unit per-table quota).
  2. Partition-Level Throttling (Hot Partitions): Even if your table has plenty of overall capacity, if a disproportionate amount of traffic targets a single partition key (e.g., all users writing to a status = 'active' key simultaneously), that specific physical partition will throttle. Each partition has a hard limit of 3,000 RCUs or 1,000 WCUs per second.
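Before blaming hot partitions, it helps to compute what your workload actually needs. A minimal sketch of the capacity-unit arithmetic (1 KB per WCU, 4 KB per RCU, eventually consistent reads cost half):

```python
import math

def required_wcu(item_size_bytes, writes_per_sec):
    # One WCU covers one write per second for an item up to 1 KB.
    return writes_per_sec * math.ceil(item_size_bytes / 1024)

def required_rcu(item_size_bytes, reads_per_sec, consistent=False):
    # One RCU covers one strongly consistent read per second for an item
    # up to 4 KB; eventually consistent reads cost half as much.
    units = reads_per_sec * math.ceil(item_size_bytes / 4096)
    return units if consistent else math.ceil(units / 2)

# A 1.5 KB item written 100 times/sec needs 200 WCUs.
print(required_wcu(1500, 100))
```

If the numbers this yields are well below your provisioned capacity and you still see throttle events, suspect a hot partition rather than a capacity shortfall.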

Step 1: Diagnose the Root Cause

Before changing capacity modes or rewriting code, you must determine what is being throttled and why.

1. Check CloudWatch Metrics: Navigate to the DynamoDB console and view the monitoring tab for your table. Look at:

  • ReadThrottleEvents and WriteThrottleEvents: Confirm if reads or writes (or both) are throttling.
  • ConsumedReadCapacityUnits vs ProvisionedReadCapacityUnits: If consumed is hitting the provisioned line, you have table-level throttling.
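These metrics can also be pulled programmatically instead of through the console. A sketch using boto3's CloudWatch client (the table name 'Orders' is a placeholder; credentials are required for the actual API call):

```python
from datetime import datetime, timedelta, timezone

def throttle_metric_params(table_name, metric):
    # Parameters for CloudWatch GetMetricStatistics over the last hour.
    # metric is e.g. 'ReadThrottleEvents' or 'WriteThrottleEvents'.
    now = datetime.now(timezone.utc)
    return {
        'Namespace': 'AWS/DynamoDB',
        'MetricName': metric,
        'Dimensions': [{'Name': 'TableName', 'Value': table_name}],
        'StartTime': now - timedelta(hours=1),
        'EndTime': now,
        'Period': 60,
        'Statistics': ['Sum'],
    }

def recent_throttles(table_name):
    # Sum throttle events for the past hour (requires AWS credentials).
    import boto3
    cw = boto3.client('cloudwatch')
    return {
        m: sum(p['Sum'] for p in
               cw.get_metric_statistics(**throttle_metric_params(table_name, m))['Datapoints'])
        for m in ('ReadThrottleEvents', 'WriteThrottleEvents')
    }
```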

2. Enable CloudWatch Contributor Insights: This is the most critical diagnostic tool for hot partitions. By enabling Contributor Insights for DynamoDB, you can identify the exact partition keys that are being accessed most frequently.

If Contributor Insights shows a single key dominating the access pattern, you have a hot key issue. If access is evenly distributed but you are still throttling, you need more capacity.
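Contributor Insights can be enabled per table (or per index) with a single API call. A sketch assuming boto3; the table and index names are placeholders:

```python
def insights_params(table_name, index_name=None):
    # Request parameters for the UpdateContributorInsights API.
    params = {'TableName': table_name, 'ContributorInsightsAction': 'ENABLE'}
    if index_name:
        params['IndexName'] = index_name
    return params

def enable_contributor_insights(table_name, index_name=None):
    # Requires AWS credentials and dynamodb:UpdateContributorInsights permission.
    import boto3
    client = boto3.client('dynamodb')
    return client.update_contributor_insights(**insights_params(table_name, index_name))
```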

Step 2: Immediate Mitigations

If production is suffering, you need immediate relief.

Tactic A: Switch to On-Demand Capacity

If you are currently on Provisioned Capacity and traffic is spiking unpredictably, switching to On-Demand can instantly relieve table-level throttling. Note that On-Demand still has partition limits (3,000 read units / 1,000 write units per partition).
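The switch itself is a single UpdateTable call. A sketch assuming boto3 (the table name is a placeholder, and note AWS permits only one billing-mode change per table per 24 hours):

```python
def switch_to_on_demand(table_name):
    # Requires AWS credentials; billing mode can only be changed
    # once per 24 hours for a given table.
    import boto3
    client = boto3.client('dynamodb')
    return client.update_table(TableName=table_name,
                               BillingMode='PAY_PER_REQUEST')
```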

Tactic B: Implement Exponential Backoff and Jitter

AWS SDKs have built-in retries, but if you are doing aggressive batch operations (BatchWriteItem, BatchGetItem), you must handle unprocessed items manually. Always use exponential backoff with jitter to avoid the thundering herd problem.
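A low-effort complement is to raise the SDK's own retry budget and use its adaptive mode, which adds client-side rate limiting on top of backoff. A sketch assuming boto3/botocore; the jitter helper shows the "full jitter" variant:

```python
import random

def backoff_with_jitter(attempt, base=0.1, cap=20.0):
    # "Full jitter": sleep a random duration between zero and the
    # capped exponential value base * 2^attempt.
    return random.uniform(0, min(cap, base * (2 ** attempt)))

def adaptive_dynamodb_client(max_attempts=10):
    # Adaptive retry mode layers client-side rate limiting on top of
    # the SDK's exponential backoff (requires boto3/botocore installed).
    import boto3
    from botocore.config import Config
    cfg = Config(retries={'max_attempts': max_attempts, 'mode': 'adaptive'})
    return boto3.client('dynamodb', config=cfg)
```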

Step 3: Long-Term Fixes and Architecture Changes

Fixing Hot Partitions: If your issue is a hot partition, throwing more money (capacity) at the problem will not work because physical partition limits are hard limits. You must redesign your access pattern.

  • Add Random Suffixes (Write Sharding): If you are writing logs to a single date key (e.g., 2023-10-25), append a random number between 1 and 100 (2023-10-25.42). This distributes writes across 100 partitions. Note: This makes reading the data harder, as you must query all 100 suffixes.
  • Use Calculated Suffixes: Instead of random numbers, use a known attribute like UserId % 10 as the suffix.
  • Caching with DAX: If you have a read-heavy hot key (e.g., a global configuration item requested by every client), place DynamoDB Accelerator (DAX) or ElastiCache/Redis in front of DynamoDB to absorb the read traffic.
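The two sharding variants above can be sketched in a few lines (the shard count of 10 and key values are illustrative; reads must fan out across every shard and merge client-side):

```python
import random

NUM_SHARDS = 10  # illustrative shard count

def sharded_key(base_key):
    # Write sharding: append a random suffix so writes spread
    # across NUM_SHARDS partition key values.
    return f"{base_key}.{random.randrange(NUM_SHARDS)}"

def all_shard_keys(base_key):
    # Reads become scatter-gather: query every shard and merge results.
    return [f"{base_key}.{i}" for i in range(NUM_SHARDS)]

def calculated_shard_key(base_key, user_id):
    # Deterministic alternative: derive the suffix from a known attribute
    # so a given user's data always lands on (and reads from) one shard.
    return f"{base_key}.{user_id % NUM_SHARDS}"
```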

Fixing Global Secondary Index (GSI) Throttling: A heavily throttled GSI will eventually cause the base table to throttle writes as well (backpressure). Always ensure your GSIs have equal or greater WCU capacity than the base table, or use On-Demand for both.
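If a provisioned GSI is the bottleneck, its throughput can be raised independently of the base table via UpdateTable. A sketch assuming boto3 (names and capacity values are placeholders; only valid for tables in Provisioned mode):

```python
def raise_gsi_capacity(table_name, index_name, read_units, write_units):
    # Requires AWS credentials; fails on On-Demand tables, where
    # GSI capacity is managed automatically.
    import boto3
    client = boto3.client('dynamodb')
    return client.update_table(
        TableName=table_name,
        GlobalSecondaryIndexUpdates=[{
            'Update': {
                'IndexName': index_name,
                'ProvisionedThroughput': {
                    'ReadCapacityUnits': read_units,
                    'WriteCapacityUnits': write_units,
                },
            }
        }],
    )
```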

Throttling on On-Demand Tables

A common misconception is that On-Demand tables cannot be throttled. They can. Throttling on On-Demand tables usually happens when:

  1. Traffic spikes to more than double the previous historical peak within 30 minutes.
  2. You hit the hard partition limits (3,000 read units / 1,000 write units per partition).

If you expect a massive 10x spike (like a Super Bowl ad), you can temporarily switch to Provisioned mode, provision for the expected peak, and then switch back to On-Demand to pre-warm the partitions.
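That pre-warming sequence might look like the sketch below (boto3 assumed; table name and peak values are placeholders, and the switch back must wait out the once-per-24-hours billing-mode change limit):

```python
def prewarm_partitions(table_name, peak_rcu, peak_wcu):
    # Step 1: switch to Provisioned at the expected peak so DynamoDB
    # splits partitions to serve it. Requires AWS credentials.
    import boto3
    client = boto3.client('dynamodb')
    client.update_table(
        TableName=table_name,
        BillingMode='PROVISIONED',
        ProvisionedThroughput={'ReadCapacityUnits': peak_rcu,
                               'WriteCapacityUnits': peak_wcu},
    )
    # Wait until the table returns to ACTIVE status.
    client.get_waiter('table_exists').wait(TableName=table_name)
    # Step 2 (at least 24 hours later, per the billing-mode change limit):
    # client.update_table(TableName=table_name, BillingMode='PAY_PER_REQUEST')
```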

Code Example: Handling Unprocessed Batch Items with Backoff

```python
import random
import time

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource('dynamodb')

# Handle UnprocessedItems from BatchWriteItem with exponential backoff.
# Note: a single BatchWriteItem request accepts at most 25 items.
def batch_write_with_backoff(table_name, items, max_retries=5):
    retries = 0

    # Prepare the initial request format
    request_items = [{'PutRequest': {'Item': item}} for item in items]
    unprocessed = {table_name: request_items}

    while unprocessed and retries < max_retries:
        try:
            response = dynamodb.meta.client.batch_write_item(RequestItems=unprocessed)
            unprocessed = response.get('UnprocessedItems', {})

            if unprocessed:
                # Exponential backoff with jitter before retrying leftovers
                sleep_time = (2 ** retries) * 0.1 + random.uniform(0, 0.1)
                print(f"Throttled. Retrying in {sleep_time:.2f} seconds...")
                time.sleep(sleep_time)
                retries += 1

        except ClientError as e:
            if e.response['Error']['Code'] in (
                'ProvisionedThroughputExceededException', 'ThrottlingException'
            ):
                sleep_time = (2 ** retries) * 0.1 + random.uniform(0, 0.1)
                time.sleep(sleep_time)
                retries += 1
            else:
                raise

    if unprocessed:
        print("Failed to process all items after max retries.")
```

Error Medic Editorial

Our SRE team specializes in AWS database performance, distributed systems, and resolving complex infrastructure bottlenecks at scale.
