DynamoDB Slow Query, Timeout & Table Lock: Complete Troubleshooting Guide
Fix DynamoDB slow queries, ProvisionedThroughputExceededException, and timeout errors. Step-by-step diagnosis with AWS CLI commands and proven solutions.
- Hot partitions are the #1 root cause: a poorly chosen partition key concentrates traffic on a single shard, exhausting its 1,000 WCU/3,000 RCU slice and triggering ProvisionedThroughputExceededException.
- DynamoDB has no traditional table locks, but TransactionConflictException and throttling produce identical symptoms — requests queue, latency spikes, and SDKs surface timeout errors after exhausting retries.
- Scan operations are catastrophic at scale: a full-table Scan on a 100 GB table can monopolize all provisioned RCUs for minutes to hours, throttling everything else; replacing Scans with Query + GSI routinely cuts p99 latency by one to two orders of magnitude.
- Quick fix sequence: (1) switch to on-demand capacity for instant headroom, (2) enable exponential backoff with jitter in your SDK, (3) add a high-cardinality suffix to your partition key, (4) evaluate DynamoDB Accelerator (DAX) for read-heavy workloads.
| Method | When to Use | Time to Apply | Risk |
|---|---|---|---|
| Switch to On-Demand Capacity | Unpredictable traffic spikes; provisioned capacity consistently exhausted | < 5 minutes (instant via console/CLI) | Low — pay-per-request pricing may increase cost |
| Increase Provisioned RCU/WCU | Predictable load with known baseline; want cost ceiling | < 2 minutes; takes effect immediately | Low — requires forecast accuracy to avoid over/under-provisioning |
| Redesign Partition Key (write sharding) | Chronic hot-partition errors on a specific key value (e.g., user_id='admin') | Hours to days — requires data migration | Medium — schema change; rollback is complex |
| Add a GSI on query attributes | Slow queries due to full-table Scans; filtering on non-key attributes | Minutes to create; backfill takes hours for large tables | Low — non-destructive; GSI has its own capacity |
| Deploy DynamoDB Accelerator (DAX) | Read-heavy workloads (>80% reads); p99 latency must be < 1 ms | 30–60 minutes (cluster provisioning) | Medium — adds infrastructure cost and DAX-specific SDK requirement |
| Tune SDK retry/backoff settings | ProvisionedThroughputExceededException on burst traffic; SDK swallows retries silently | Minutes — code change only | Low — increased retry count raises end-to-end latency on failures |
| Enable DynamoDB Auto Scaling | Recurring throttling at predictable but variable peaks (e.g., daily batch jobs) | Minutes to configure; scaling lag is 5–10 minutes | Low — scaling lag means brief throttling during sudden spikes is still possible |
Understanding DynamoDB Slow Queries, Timeouts, and "Table Lock" Behavior
DynamoDB is engineered for single-digit-millisecond latency at any scale, but production teams routinely file incidents titled "DynamoDB table lock" or "DynamoDB hung" when they actually mean: requests are timing out, retries are exhausted, and the application has stalled. Understanding why separates a 5-minute fix from a multi-hour war room.
DynamoDB distributes data across internal partitions. Each partition handles a ceiling of 3,000 Read Capacity Units (RCUs) and 1,000 Write Capacity Units (WCUs) per second. When a single logical partition key attracts more traffic than its partition can serve, DynamoDB returns ProvisionedThroughputExceededException. AWS SDKs retry with exponential backoff by default — but if the condition persists, the SDK raises a final timeout exception that your application surfaces as a "slow query" or "connection timed out" error.
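The retry behavior is easier to reason about with a small sketch. This models truncated exponential backoff with full jitter, the strategy boto3's 'standard' retry mode is based on; the base delay and cap constants here are illustrative, not the SDK's internal values:

```python
import random

def backoff_delay(attempt, base=0.05, cap=20.0):
    """Truncated exponential backoff with full jitter.

    attempt: 0-based retry attempt number.
    Returns a delay in seconds drawn uniformly from
    [0, min(cap, base * 2**attempt)].
    """
    return random.uniform(0, min(cap, base * (2 ** attempt)))

# The no-jitter ceiling grows geometrically until it hits the cap:
ceilings = [min(20.0, 0.05 * 2 ** a) for a in range(10)]
```

The jitter matters: without it, every throttled client retries in lockstep and re-creates the same traffic spike that caused the throttling.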
Symptoms and Their Exact Error Messages
Developers encounter several distinct error signatures depending on SDK and language:
Python (boto3):
botocore.exceptions.ClientError: An error occurred (ProvisionedThroughputExceededException)
when calling the GetItem operation: The level of configured provisioned throughput for
the table was exceeded. Consider increasing your provisioning level with the
UpdateTable API.
Java (AWS SDK v2):
software.amazon.awssdk.services.dynamodb.model.ProvisionedThroughputExceededException:
The level of configured provisioned throughput for the table was exceeded.
Node.js:
ProvisionedThroughputExceededException: The level of configured provisioned throughput
for the table was exceeded.
at Request.extractError (/node_modules/aws-sdk/lib/protocol/json.js:51:27)
Transaction conflict (mimics a lock):
TransactionCanceledException: Transaction cancelled, please refer cancellation reasons
for specific reasons [TransactionConflict]
General timeout (SDK gave up retrying):
RequestTimeout: Request took too long to complete. (Node.js)
ReadTimeout: Read timeout on endpoint URL (Python)
connect ETIMEDOUT (Node.js low-level)
Step 1: Diagnose — Isolate the Root Cause in CloudWatch
Before touching any configuration, pull the CloudWatch metrics that tell the truth. Five key metrics, all in the AWS/DynamoDB namespace, are worth examining:
| Metric | Namespace | What it reveals |
|---|---|---|
| ThrottledRequests | AWS/DynamoDB | Non-zero value confirms throttling is happening |
| SuccessfulRequestLatency | AWS/DynamoDB | p99 > 10 ms on GetItem = partition hotspot |
| ConsumedReadCapacityUnits | AWS/DynamoDB | Compare to provisioned; near 100% = exhausted |
| SystemErrors | AWS/DynamoDB | Non-zero = DynamoDB internal errors (contact AWS Support) |
| ReturnedItemCount | AWS/DynamoDB | High count on Query/Scan = overfetching |
Use AWS CLI to pull throttle history for the last hour:
aws cloudwatch get-metric-statistics \
--namespace AWS/DynamoDB \
--metric-name ThrottledRequests \
--dimensions Name=TableName,Value=YOUR_TABLE_NAME \
--start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
--period 60 \
--statistics Sum
If ThrottledRequests Sum > 0 in any period, you have confirmed throttling. Now identify whether it's reads or writes:
# Check consumed vs provisioned reads
aws dynamodb describe-table --table-name YOUR_TABLE_NAME \
--query 'Table.{RCU:ProvisionedThroughput.ReadCapacityUnits,
WCU:ProvisionedThroughput.WriteCapacityUnits,
TableStatus:TableStatus,
ItemCount:ItemCount,
SizeBytes:TableSizeBytes}'
To detect hot partitions specifically, enable DynamoDB Contributor Insights:
aws dynamodb update-contributor-insights \
--table-name YOUR_TABLE_NAME \
--contributor-insights-action ENABLE
Once enabled (takes ~15 minutes to populate), navigate to DynamoDB Console → Table → Contributor Insights to see the top-N partition key values consuming the most capacity. If one key accounts for >30% of traffic, you have a hot partition.
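The >30% rule of thumb is simple to apply programmatically. A minimal sketch, assuming you have already exported top-key samples as (key, consumed-capacity) pairs from Contributor Insights or your own access logs — the input shape is an assumption for illustration, not the Contributor Insights API response format:

```python
def find_hot_keys(samples, threshold=0.30):
    """Return partition key values consuming more than `threshold`
    of the total observed capacity.

    samples: iterable of (partition_key, consumed_capacity_units) pairs.
    """
    totals = {}
    for key, cu in samples:
        totals[key] = totals.get(key, 0) + cu
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    return [k for k, cu in totals.items() if cu / grand_total > threshold]

samples = [("admin", 700), ("alice", 150), ("bob", 150)]
print(find_hot_keys(samples))  # ['admin'] — 70% of traffic on one key
```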
Step 2: Fix — Apply the Right Remedy
Fix A: Immediate Relief — Switch to On-Demand or Increase Provisioned Capacity
For immediate production relief without code changes:
# Option 1: Switch to on-demand (instant, no capacity planning required)
aws dynamodb update-table \
--table-name YOUR_TABLE_NAME \
--billing-mode PAY_PER_REQUEST
# Option 2: Double provisioned capacity (adjust numbers to your baseline)
aws dynamodb update-table \
--table-name YOUR_TABLE_NAME \
--provisioned-throughput ReadCapacityUnits=500,WriteCapacityUnits=200
Note: You can switch between billing modes once per 24 hours. On-demand eliminates throttling caused by capacity exhaustion but does not fix hot partitions — a single key still hits per-partition limits.
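If you stay on provisioned mode, size the new value from observed consumption rather than guessing. A minimal sketch converting a CloudWatch ConsumedReadCapacityUnits Sum into a per-second target — the 60-second period and 2x headroom factor are assumptions to tune for your workload:

```python
import math

def required_rcu(max_consumed_per_period, period_seconds=60, headroom=2.0):
    """Convert a peak ConsumedReadCapacityUnits Sum (per CloudWatch period)
    into a per-second provisioned RCU target with headroom."""
    per_second = max_consumed_per_period / period_seconds
    return math.ceil(per_second * headroom)

# Peak of 15,000 consumed RCUs in a 60 s period = 250 RCU/s sustained,
# so provision 500 RCU for 2x headroom.
print(required_rcu(15000))  # 500
```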
Fix B: Eliminate Full-Table Scans
A Scan reads every item in your table. On a 10 GB table provisioned with only 5 RCUs, a single eventually consistent Scan needs on the order of three days of capacity (each RCU covers at most 8 KB/s of eventually consistent reads), and it throttles every other read while it runs. Replace with Query plus a Global Secondary Index:
# WRONG — full-table Scan: reads (and bills) every item, then filters client-side
from boto3.dynamodb.conditions import Attr, Key

response = table.scan(
    FilterExpression=Attr('status').eq('pending')
)

# RIGHT — Query via a GSI on the 'status' attribute: reads only matching items
response = table.query(
    IndexName='status-index',
    KeyConditionExpression=Key('status').eq('pending')
)
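The scan-cost arithmetic generalizes. A minimal sketch of the lower-bound estimate, using DynamoDB's documented read units (1 RCU = one strongly consistent 4 KB read per second, or two eventually consistent reads):

```python
def scan_duration_seconds(table_size_bytes, provisioned_rcu,
                          eventually_consistent=True):
    """Lower-bound wall-clock time for a full Scan limited only by RCU.

    1 RCU = one strongly consistent 4 KB read/s,
    or two eventually consistent 4 KB reads/s.
    """
    bytes_per_rcu_per_sec = 8192 if eventually_consistent else 4096
    return table_size_bytes / (provisioned_rcu * bytes_per_rcu_per_sec)

hours = scan_duration_seconds(10 * 1024**3, 5) / 3600
print(f"{hours:.0f} hours")  # ~73 hours for 10 GB at 5 RCU
```

This is a floor, not a forecast: parallel Scan segments, item-size rounding, and competing traffic only make the real impact worse.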
Create the GSI if it does not exist:
aws dynamodb update-table \
--table-name YOUR_TABLE_NAME \
--attribute-definitions AttributeName=status,AttributeType=S \
--global-secondary-index-updates \
'[{"Create":{"IndexName":"status-index",
"KeySchema":[{"AttributeName":"status","KeyType":"HASH"}],
"Projection":{"ProjectionType":"ALL"},
"ProvisionedThroughput":{"ReadCapacityUnits":100,"WriteCapacityUnits":50}}}]'
Fix C: Write Sharding for Hot Partition Keys
If your partition key is inherently low-cardinality (e.g., tenant_id with one dominant tenant), distribute writes across virtual shards:
import random
from boto3.dynamodb.conditions import Key

def write_with_sharding(table, item, shard_count=10):
    """Append a random shard suffix so writes spread across partitions."""
    shard_suffix = random.randint(0, shard_count - 1)
    item['pk'] = f"{item['original_pk']}#{shard_suffix}"
    table.put_item(Item=item)

def read_all_shards(table, original_pk, shard_count=10):
    """Fan out reads across all shards and merge results."""
    results = []
    for shard in range(shard_count):
        sharded_key = f"{original_pk}#{shard}"
        resp = table.query(
            KeyConditionExpression=Key('pk').eq(sharded_key)
        )
        results.extend(resp['Items'])
    return results
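One design note: a random suffix makes retries non-idempotent, because a retried write may land on a different shard and create a duplicate. A deterministic hash of a stable attribute avoids that — a sketch, assuming each item carries some unique attribute such as an order ID (the attribute is illustrative):

```python
import hashlib

def shard_suffix(original_pk, unique_attr, shard_count=10):
    """Deterministically map (partition key, unique attribute) to a shard,
    so retries of the same logical write always target the same shard."""
    digest = hashlib.sha256(f"{original_pk}#{unique_attr}".encode()).hexdigest()
    return int(digest, 16) % shard_count

# Same inputs always yield the same shard, and the shard is in range:
s1 = shard_suffix("tenant-42", "order-1001")
s2 = shard_suffix("tenant-42", "order-1001")
print(s1 == s2, 0 <= s1 < 10)  # True True
```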
Fix D: Configure SDK Retry Behavior
The default retry configuration in the AWS SDKs (3 attempts in boto3's 'standard' retry mode) may be too low for burst scenarios. Tune with exponential backoff and jitter:
import boto3
from botocore.config import Config
# Increase max retries and use standard (exponential+jitter) retry mode
config = Config(
retries={
'max_attempts': 10,
'mode': 'standard' # enables exponential backoff with jitter
},
connect_timeout=5,
read_timeout=10
)
dynamodb = boto3.resource('dynamodb', config=config)
Fix E: Add DAX for Read-Heavy Workloads
For workloads where reads dominate (>80% of operations) and sub-millisecond p99 read latency is required, DynamoDB Accelerator (DAX) is a fully managed write-through cache:
# Create a DAX cluster (requires VPC configuration)
aws dax create-cluster \
--cluster-name my-dax-cluster \
--node-type dax.r5.large \
--replication-factor 2 \
--iam-role-arn arn:aws:iam::ACCOUNT_ID:role/DAXRole \
--subnet-group-name my-dax-subnet-group
Switch your application to the DAX client (Python example):
import amazondax
import boto3
dax = amazondax.AmazonDaxClient.resource(
endpoints=['my-dax-cluster.abc123.dax-clusters.us-east-1.amazonaws.com:8111']
)
table = dax.Table('YOUR_TABLE_NAME')
# Eventually consistent GetItem and Query calls are now served from the DAX cache;
# strongly consistent reads pass through to DynamoDB.
Step 3: Prevent Recurrence — Enable Auto Scaling and Alarms
# Register the table as a scalable target
aws application-autoscaling register-scalable-target \
--service-namespace dynamodb \
--resource-id table/YOUR_TABLE_NAME \
--scalable-dimension dynamodb:table:ReadCapacityUnits \
--min-capacity 5 \
--max-capacity 1000
# Set a scaling policy targeting 70% utilization
aws application-autoscaling put-scaling-policy \
--service-namespace dynamodb \
--resource-id table/YOUR_TABLE_NAME \
--scalable-dimension dynamodb:table:ReadCapacityUnits \
--policy-name ReadAutoScaling \
--policy-type TargetTrackingScaling \
--target-tracking-scaling-policy-configuration \
'{"TargetValue":70.0,
"PredefinedMetricSpecification":{"PredefinedMetricType":"DynamoDBReadCapacityUtilization"}}'
# Alert on throttling
aws cloudwatch put-metric-alarm \
--alarm-name DynamoDB-ThrottledRequests \
--metric-name ThrottledRequests \
--namespace AWS/DynamoDB \
--dimensions Name=TableName,Value=YOUR_TABLE_NAME \
--statistic Sum \
--period 60 \
--threshold 10 \
--comparison-operator GreaterThanThreshold \
--evaluation-periods 2 \
--alarm-actions arn:aws:sns:us-east-1:ACCOUNT_ID:ops-alerts
Appendix: Full Diagnostic Script
The checks from the steps above are bundled into a single runnable script:
#!/usr/bin/env bash
# DynamoDB Slow Query / Timeout Diagnostic Script
# Usage: TABLE_NAME=my-table AWS_REGION=us-east-1 bash dynamodb_diagnose.sh
set -euo pipefail
TABLE=${TABLE_NAME:?"Set TABLE_NAME"}
REGION=${AWS_REGION:-us-east-1}
END=$(date -u +%Y-%m-%dT%H:%M:%SZ)
START=$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ 2>/dev/null || date -u -v-1H +%Y-%m-%dT%H:%M:%SZ)
echo "=== Table Status ==="
aws dynamodb describe-table --region "$REGION" --table-name "$TABLE" \
--query 'Table.{Status:TableStatus,Items:ItemCount,SizeBytes:TableSizeBytes,
RCU:ProvisionedThroughput.ReadCapacityUnits,
WCU:ProvisionedThroughput.WriteCapacityUnits,
BillingMode:BillingModeSummary.BillingMode}'
echo ""
echo "=== GSIs and their capacity ==="
aws dynamodb describe-table --region "$REGION" --table-name "$TABLE" \
--query 'Table.GlobalSecondaryIndexes[*].{Name:IndexName,
Status:IndexStatus,
RCU:ProvisionedThroughput.ReadCapacityUnits,
WCU:ProvisionedThroughput.WriteCapacityUnits,
Items:ItemCount}'
echo ""
echo "=== ThrottledRequests (last 60 min, 1-min resolution) ==="
aws cloudwatch get-metric-statistics \
--region "$REGION" \
--namespace AWS/DynamoDB \
--metric-name ThrottledRequests \
--dimensions Name=TableName,Value="$TABLE" \
--start-time "$START" \
--end-time "$END" \
--period 60 \
--statistics Sum \
--query 'sort_by(Datapoints, &Timestamp)[*].{Time:Timestamp,Throttles:Sum}'
echo ""
echo "=== SuccessfulRequestLatency p99 (last 60 min) ==="
for OP in GetItem PutItem Query Scan; do
echo "--- $OP ---"
aws cloudwatch get-metric-statistics \
--region "$REGION" \
--namespace AWS/DynamoDB \
--metric-name SuccessfulRequestLatency \
--dimensions Name=TableName,Value="$TABLE" Name=Operation,Value="$OP" \
--start-time "$START" \
--end-time "$END" \
--period 3600 \
--statistics Maximum Average \
--extended-statistics p99 \
--query 'Datapoints[0].{p99:ExtendedStatistics.p99,Max:Maximum,Avg:Average}'
done
echo ""
echo "=== ConsumedReadCapacityUnits vs Provisioned ==="
aws cloudwatch get-metric-statistics \
--region "$REGION" \
--namespace AWS/DynamoDB \
--metric-name ConsumedReadCapacityUnits \
--dimensions Name=TableName,Value="$TABLE" \
--start-time "$START" \
--end-time "$END" \
--period 60 \
--statistics Sum \
--query 'max_by(Datapoints, &Sum).{MaxConsumedPerMin:Sum,Time:Timestamp}'
echo ""
echo "=== Contributor Insights status ==="
aws dynamodb describe-contributor-insights \
--region "$REGION" \
--table-name "$TABLE" \
--query '{Status:ContributorInsightsStatus}' 2>/dev/null || {
  echo "Contributor Insights not enabled. Enable with:"
  echo "aws dynamodb update-contributor-insights --table-name $TABLE --contributor-insights-action ENABLE"
}
echo ""
echo "=== Recommend: Check Scan operations in last hour ==="
aws cloudwatch get-metric-statistics \
--region "$REGION" \
--namespace AWS/DynamoDB \
--metric-name ReturnedItemCount \
--dimensions Name=TableName,Value="$TABLE" Name=Operation,Value=Scan \
--start-time "$START" \
--end-time "$END" \
--period 3600 \
--statistics Sum \
--query 'Datapoints[0].Sum' 2>/dev/null || true
echo "If the value above is non-zero, full-table Scans are occurring; replace them with Query + GSI."
echo ""
echo "=== Auto Scaling status ==="
aws application-autoscaling describe-scalable-targets \
--region "$REGION" \
--service-namespace dynamodb \
--query "ScalableTargets[?ResourceId=='table/$TABLE'].{Dimension:ScalableDimension,Min:MinCapacity,Max:MaxCapacity}" \
2>/dev/null || echo "Auto Scaling not configured for this table."
echo ""
echo "Diagnosis complete. Review ThrottledRequests and p99 latency above."

Error Medic Editorial
Error Medic Editorial is a team of senior SREs and cloud engineers with collective experience managing multi-petabyte DynamoDB deployments at Fortune 500 companies and high-growth startups. Our guides are tested against real production incidents and peer-reviewed for technical accuracy before publication.
Sources
- https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html
- https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-partition-key-design.html
- https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Query.html
- https://repost.aws/knowledge-center/dynamodb-high-latency
- https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DAX.html
- https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/contributorinsights_HowItWorks.html
- https://stackoverflow.com/questions/24067283/dynamodb-provisionedthroughputexceededexception