Error Medic

Fixing 'ProvisionedThroughputExceededException': DynamoDB Slow Query & Timeout Troubleshooting

Resolve DynamoDB slow query, timeout, and throttling errors (ProvisionedThroughputExceededException) with actionable scaling, indexing, and query optimization strategies.

Key Takeaways
  • Root Cause 1: Insufficient provisioned read/write capacity units (RCU/WCU) leading to throttling.
  • Root Cause 2: Inefficient queries using 'Scan' instead of 'Query', or lacking proper Global Secondary Indexes (GSIs).
  • Root Cause 3: Hot partitions caused by poorly distributed partition keys.
  • Quick Fix Summary: Enable On-Demand capacity or Auto Scaling, convert Scans to Queries, and implement exponential backoff.
DynamoDB Performance Fix Approaches Compared
Method | When to Use | Time to Implement | Risk / Cost Impact
Switch to On-Demand Capacity | Immediate relief for unpredictable traffic spikes | 5 mins | High cost for sustained high traffic
Add/Optimize GSIs | Queries filter on non-key attributes frequently | Hours to days | Increases storage and WCU costs
Implement Exponential Backoff | Handling transient network or throttling timeouts | 1-2 hours | Low risk, requires code deployment
Redesign Partition Keys | Persistent hot partition issues and table locks | Weeks | High risk, requires data migration

Understanding DynamoDB Slow Queries and Timeouts

When working with Amazon DynamoDB, a slow query or a sudden timeout usually points to one of a few common architectural or configuration bottlenecks. Unlike a traditional RDBMS, DynamoDB does not suffer from "table locks" in the usual sense, but it does experience partition-level throttling and throughput limits that manifest as extreme latency or connection timeouts. The most common error developers see is ProvisionedThroughputExceededException, often alongside elevated HTTP 5xx errors and SDK timeouts (SdkClientException: Unable to execute HTTP request: Read timed out).

Step 1: Diagnose the Bottleneck

Before changing configurations, you must identify whether the issue is throttling, inefficient querying, or network-level timeouts.

  1. Check CloudWatch Metrics: Look at ProvisionedReadCapacityUnits vs ConsumedReadCapacityUnits, and specifically monitor ReadThrottleEvents and WriteThrottleEvents. If throttles are high, your capacity is too low or you have a hot partition.
  2. Enable Contributor Insights: This DynamoDB feature helps identify the exact partition keys that are being accessed most frequently (the "hot keys").
  3. Review the Code: Are you using Scan or Query? A Scan operation reads the entire table before filtering, which is notoriously slow and expensive. A Query uses the partition key and is highly efficient.

Step 2: Immediate Mitigation (The Quick Fix)

If your production environment is currently failing with timeouts and throttling, the fastest mitigation is adjusting capacity.

  • Switch to On-Demand Capacity: If you are currently using Provisioned capacity without Auto Scaling, switch the table to On-Demand. This allows DynamoDB to instantly accommodate the traffic spike, though it comes at a higher per-request cost.
  • Increase Provisioned Capacity: If you prefer to stay on Provisioned, manually bump the Read Capacity Units (RCUs) and Write Capacity Units (WCUs) well above the current consumed metrics.
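For the second option, the capacity bump can be applied directly from the CLI. This is a sketch, not a tuned recommendation: the table name and unit values are placeholders, and the right numbers should come from your own ConsumedReadCapacityUnits/ConsumedWriteCapacityUnits metrics in CloudWatch.

```shell
# Raise RCU/WCU well above current consumption.
# Values are illustrative; size them from your CloudWatch metrics.
aws dynamodb update-table \
    --table-name YourTableName \
    --provisioned-throughput ReadCapacityUnits=500,WriteCapacityUnits=500
```

Note that this only applies to tables in provisioned mode; on-demand tables scale per request and have no RCU/WCU setting to raise.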

Step 3: Long-Term Fixes and Optimization

To prevent slow queries and timeouts permanently, you need to address the root causes at the application and schema level.

1. Replace Scans with Queries

Never use Scan for real-time application access. If you need to retrieve items based on attributes that are not the primary key, create a Global Secondary Index (GSI). This allows you to perform a fast Query against the new index.
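As an example, a GSI keyed on a Status attribute could be added from the CLI. The table, index, and attribute names here are illustrative (they match the Query example later in this guide), and the ProvisionedThroughput block is only required for tables in provisioned mode:

```shell
# Add a GSI on the Status attribute so the access pattern becomes a
# Query against StatusIndex instead of a full-table Scan.
aws dynamodb update-table \
    --table-name Users \
    --attribute-definitions AttributeName=Status,AttributeType=S \
    --global-secondary-index-updates '[{
        "Create": {
            "IndexName": "StatusIndex",
            "KeySchema": [{"AttributeName": "Status", "KeyType": "HASH"}],
            "Projection": {"ProjectionType": "ALL"},
            "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5}
        }
    }]'
```

The index backfills asynchronously; wait for its IndexStatus to become ACTIVE (visible via describe-table) before routing queries to it.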

2. Implement Exponential Backoff

When DynamoDB throttles a request, it expects the client to retry. Ensure your AWS SDK is configured with an appropriate retry policy. The default SDKs usually handle this, but if you sit behind a strict API Gateway timeout (e.g., 29 seconds), the retries may cause the API to time out before DynamoDB succeeds.
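The retry pattern itself can be sketched as a small shell helper (here wrapping an arbitrary command; in application code you would configure the SDK's built-in retry policy instead). The function name and the commented get-item usage are illustrative:

```shell
# Retry a command with exponential backoff until it succeeds.
# Sketch only: production code should also add random jitter
# and cap the maximum sleep.
retry_with_backoff() {
    local max_retries=$1; shift
    local attempt=0
    until "$@"; do
        attempt=$((attempt + 1))
        if [ "$attempt" -gt "$max_retries" ]; then
            echo "giving up after $max_retries retries" >&2
            return 1
        fi
        sleep $((2 ** (attempt - 1)))   # 1s, 2s, 4s, ...
    done
}

# Example: retry a throttled read up to 5 times.
# retry_with_backoff 5 aws dynamodb get-item --table-name YourTableName \
#     --key '{"UserID": {"S": "u-123"}}'
```

Without jitter, many clients retrying on the same schedule can re-throttle each other in synchronized waves, which is why the AWS SDKs default to jittered backoff.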

3. Resolve Hot Partitions

DynamoDB partitions data based on the partition key. If a massive volume of reads/writes targets the same key simultaneously, that specific partition will hit its hard limit (3,000 RCUs or 1,000 WCUs per partition), even if the table overall has plenty of capacity. This mimics the symptoms of a "table lock". Fix this by appending a random suffix to the partition key (write sharding) or ensuring your keys have high cardinality (e.g., UserID instead of Status=Active).
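The write-sharding idea can be sketched as a pair of key helpers. The shard count, function names, and "key#shard" format are illustrative choices, not a DynamoDB convention:

```shell
# Write sharding: append a random shard suffix so writes to one logical
# key are spread across NUM_SHARDS physical partition keys.
# Reads must then fan out: query every "key#0" .. "key#N-1" and merge.
NUM_SHARDS=10

shard_write_key() {
    echo "$1#$((RANDOM % NUM_SHARDS))"
}

shard_read_keys() {
    local i=0
    while [ "$i" -lt "$NUM_SHARDS" ]; do
        echo "$1#$i"
        i=$((i + 1))
    done
}
```

The trade-off is explicit: writes spread evenly, but every read of the logical key now costs one Query per shard, so keep NUM_SHARDS as small as your write volume allows.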

Command Reference

# 1. Check CloudWatch for Throttling Events
aws cloudwatch get-metric-statistics \
    --namespace AWS/DynamoDB \
    --metric-name ReadThrottleEvents \
    --dimensions Name=TableName,Value=YourTableName \
    --start-time $(date -u -d '-1 hour' '+%Y-%m-%dT%H:%M:%SZ') \
    --end-time $(date -u '+%Y-%m-%dT%H:%M:%SZ') \
    --period 300 \
    --statistics Sum

# 2. Update Table to On-Demand Capacity (Immediate Fix)
aws dynamodb update-table \
    --table-name YourTableName \
    --billing-mode PAY_PER_REQUEST

# 3. Example of a bad SCAN vs good QUERY in AWS CLI
# BAD (Slow, consumes high RCU):
aws dynamodb scan \
    --table-name Users \
    --filter-expression "#st = :status" \
    --expression-attribute-names '{"#st": "Status"}' \
    --expression-attribute-values '{ ":status": {"S": "ACTIVE"} }'

# GOOD (Fast, uses GSI):
aws dynamodb query \
    --table-name Users \
    --index-name StatusIndex \
    --key-condition-expression "#st = :status" \
    --expression-attribute-names '{"#st": "Status"}' \
    --expression-attribute-values '{ ":status": {"S": "ACTIVE"} }'

Error Medic Editorial

Error Medic Editorial is a team of certified Cloud Architects and SREs dedicated to resolving the toughest infrastructure bottlenecks and database performance issues.
