Resolving Cassandra Read Timeout During Query: LOCAL_QUORUM, ONE, and SERIAL
Comprehensive guide to fixing Cassandra read timeouts (ReadTimeoutException) at LOCAL_QUORUM, ONE, and CAS SERIAL consistencies. Diagnose tombstones, GC pauses, and I/O bottlenecks, then apply targeted fixes.
- Tombstone Overload: Scanning too many tombstones is the #1 cause of Cassandra read timeouts. Use nodetool tablestats to check.
- Garbage Collection (GC) Pauses: Stop-the-world JVM pauses exceeding read_request_timeout_in_ms will drop requests.
- Network or Disk I/O Bottlenecks: Slow disks or saturated NICs cause replica responses to miss the coordinator's timeout window.
- Unoptimized Queries: Large partition reads or ALLOW FILTERING without partition keys overwhelm the coordinator.
- Quick Fix: Check system.log for 'Scanned over X tombstones', run nodetool tpstats for dropped messages, and isolate the offending table.
| Method | When to Use | Time to Execute | Risk Level |
|---|---|---|---|
| Force Major Compaction | High tombstone count causing timeouts on specific tables | Hours to Days | High (Disk I/O intensive) |
| Tune GC / Heap | Long JVM pauses seen in gc.log or system.log | Minutes (Requires Restart) | Medium |
| Downgrade Consistency (e.g., QUORUM to ONE) | Emergency mitigation to restore availability | Immediate (App side) | High (Data staleness/Inconsistency) |
| Increase read_request_timeout_in_ms | Queries legitimately take longer due to payload size | Minutes (Rolling Restart) | Medium (Masks underlying issue) |
Understanding the Error
When working with Apache Cassandra, one of the most dreaded errors a developer or operator can encounter is the ReadTimeoutException. This error indicates that the coordinator node did not receive enough responses from replicas within the configured timeout window (default is typically 5000ms for reads).
Depending on your application's consistency requirements, the exact error message will vary. You might see:
- Cassandra timeout during read query at consistency LOCAL_QUORUM (2 responses were required but only 1 replica responded)
- Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded)
- Cassandra timeout during read query at consistency LOCAL_ONE
- Cassandra timeout during cas write query at consistency SERIAL
The last one is particularly interesting because it occurs during Lightweight Transactions (LWTs). A Compare-And-Set (CAS) write utilizes the Paxos consensus protocol, and the read phase of this protocol must achieve SERIAL or LOCAL_SERIAL consistency. If the Paxos ballot fails to reach a quorum in time, this timeout is thrown.
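To make the CAS path concrete, here is a hypothetical example of the kind of statement that exercises it. The table `ks.locks` and its columns are placeholders invented for illustration; on a real cluster you would run the statement through cqlsh or a driver, and its Paxos read phase would execute at SERIAL (or LOCAL_SERIAL) consistency.

```shell
#!/bin/bash
# Hypothetical example: an LWT (CAS) write. `ks.locks`, `owner`, and
# `lock_name` are placeholder names; execute via cqlsh on a real cluster,
# e.g. cqlsh -e "$cas_stmt".
cas_stmt="UPDATE ks.locks SET owner = 'node-a'
  WHERE lock_name = 'leader' IF owner = null;"

echo "$cas_stmt"
```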
Root Causes
Why do these timeouts happen? The distributed nature of Cassandra means the bottleneck could be anywhere from the disk of a single replica to the network fabric.
- Tombstone Overload: When you delete data in Cassandra, it isn't removed immediately. A marker called a 'tombstone' is written. During a read, Cassandra must scan and filter out these tombstones. If a query scans thousands of tombstones to find a few live rows, the CPU overhead and memory pressure will cause a timeout.
- Garbage Collection (GC) Pauses: Cassandra runs on the JVM. If the heap is poorly tuned, or if massive queries are creating high object churn, the JVM will trigger a 'Stop-the-World' garbage collection. If this pause exceeds read_request_timeout_in_ms (default 5 seconds), the coordinator assumes the node is dead for that request.
- Hardware Bottlenecks: High CPU utilization, saturated disk I/O (i.e., high iowait), or network packet loss will delay replica responses.
- Bad Query Patterns: Unbounded partition reads, cross-partition queries using ALLOW FILTERING, or fetching massive payloads in a single request.
Step 1: Diagnose the Bottleneck
Before changing any configurations, you must identify why the reads are slow. Log into a Cassandra node and use the built-in diagnostic tools.
Check for Dropped Messages
Run nodetool tpstats. Look at the Dropped column for READ and MUTATION stages. If you see high numbers of dropped reads, the node is shedding load because it cannot process requests fast enough.
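This check is easy to script. The sketch below parses a captured sample of `nodetool tpstats` output (hard-coded here so it runs anywhere); on a live node, replace the sample with the real command's output.

```shell
#!/bin/bash
# Sketch: flag dropped READ messages in `nodetool tpstats` output.
# The sample below is a stand-in for `nodetool tpstats` on a live node.
tpstats_sample='Message type           Dropped
READ                       152
MUTATION                     0
RANGE_SLICE                  3'

dropped_reads=$(echo "$tpstats_sample" | awk '$1 == "READ" {print $2}')
if [ "$dropped_reads" -gt 0 ]; then
    echo "WARNING: $dropped_reads dropped reads - this node is shedding load"
fi
```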
Analyze Latency Histograms
Run nodetool proxyhistograms. This provides a distribution of coordinator-level latencies. If the 99th percentile for reads is close to or above your timeout threshold (e.g., 5000000 microseconds), the cluster is generally degraded.
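The comparison against the timeout can likewise be automated. This sketch extracts the p99 read latency from a captured sample of `nodetool proxyhistograms` output (the sample stands in for the live command) and compares it to the 5,000,000 microsecond default.

```shell
#!/bin/bash
# Sketch: compare p99 coordinator read latency against the default 5s
# read timeout. The sample stands in for `nodetool proxyhistograms`.
sample='Percentile       Read Latency      Write Latency
                     (micros)           (micros)
50%                    654.95             545.79
99%                5839588.00           12108.97'

p99=$(echo "$sample" | awk '$1 == "99%" {print $2}')
awk -v p99="$p99" 'BEGIN {
    if (p99 >= 5000000)
        print "p99 read latency " p99 " us exceeds the 5s read timeout"
    else
        print "p99 read latency " p99 " us is within the timeout"
}'
```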
Identify the Offending Table
Run nodetool toppartitions or nodetool tablestats. Look for tables with a high Maximum tombstones per slice or poor read latency.
Check the system.log (usually in /var/log/cassandra/system.log) for tombstone warnings:
WARN [ReadStage-2] 2023-10-27 10:00:00,000 ReadCommand.java:400 - Read 1000 live rows and 50000 tombstone cells for query SELECT * FROM keyspace.table WHERE...
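A useful number to pull out of such a warning is the tombstone-to-live-row ratio. This sketch parses a hard-coded copy of the warning line above; on a node you would feed it the matching lines grepped from system.log.

```shell
#!/bin/bash
# Sketch: compute tombstones scanned per live row from a tombstone warning.
# The line is hard-coded here; on a node, grep it from system.log.
line='WARN [ReadStage-2] 2023-10-27 10:00:00,000 ReadCommand.java:400 - Read 1000 live rows and 50000 tombstone cells for query SELECT * FROM keyspace.table WHERE...'

ratio=$(echo "$line" | awk '{
    for (i = 1; i <= NF; i++) {
        if ($(i+1) == "live")      live = $i   # "1000 live rows"
        if ($(i+1) == "tombstone") dead = $i   # "50000 tombstone cells"
    }
    printf "%d", dead / live
}')
echo "scanned $ratio tombstones per live row"
```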
Step 2: Implement the Fix
Fix A: Resolving Tombstone Issues
If your logs are full of tombstone warnings, you have a data model or maintenance problem.
- Run Compaction: If data has passed its gc_grace_seconds, running compaction will permanently remove the tombstones. Use nodetool compact <keyspace> <table>.
- Adjust Query Limits: Ensure your application isn't doing massive slice queries over highly deleted partitions.
- Change Compaction Strategy: If the table has heavy updates/deletes, switch from SizeTieredCompactionStrategy (STCS) to LeveledCompactionStrategy (LCS), which handles overwrites and deletes much more efficiently.
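For reference, the strategy switch is a single ALTER TABLE. The table name `ks.events` below is a placeholder; run the statement through cqlsh on the real table (note that changing strategy triggers recompaction, which is itself I/O intensive).

```shell
#!/bin/bash
# Sketch: the CQL that switches a table to LeveledCompactionStrategy.
# `ks.events` is a placeholder; run via e.g. cqlsh -e "$alter_stmt".
alter_stmt="ALTER TABLE ks.events
  WITH compaction = {'class': 'LeveledCompactionStrategy'};"

echo "$alter_stmt"
```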
Fix B: Mitigating GC Pauses
Check the GC logs (gc.log). If you see pauses lasting 3-10 seconds, you need JVM tuning.
- Enable G1GC: If you are still using CMS on an older Cassandra version, switch to G1GC in jvm.options.
- Increase Heap: Ensure MAX_HEAP_SIZE is adequately set (typically 8GB to 31GB, never more than 32GB due to compressed oops).
- Reduce Churn: Stop querying large batches of data at once. Implement pagination using paging state.
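As a starting point, a minimal jvm.options fragment for G1 might look like the following. The pause target and heap size are illustrative values only; tune them for your hardware and workload.

```properties
### Illustrative G1 settings for jvm.options -- tune for your hardware
-XX:+UseG1GC
-XX:MaxGCPauseMillis=300
### Fixed heap, below the 32GB compressed-oops threshold
-Xms16G
-Xmx16G
```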
Fix C: Addressing CAS / SERIAL Timeouts
The message 'Cassandra timeout during cas write query at consistency SERIAL' means your LWTs are failing.
- LWTs require 4 round trips. They are extremely sensitive to latency.
- Ensure network latency between nodes (especially across data centers) is minimal.
- Check for high contention. If multiple threads are constantly trying to CAS the exact same partition key simultaneously, Paxos ballots will continuously fail and retry until the timeout is hit. Redesign the application logic to reduce contention on single partitions.
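One common way to reduce contention is to add exponential backoff between CAS retries instead of hammering the same partition. The sketch below only computes the backoff schedule; the application's actual LWT call would go inside the loop, and the base delay and cap are illustrative, not Cassandra defaults.

```shell
#!/bin/bash
# Sketch: exponential backoff schedule for retrying a contended LWT.
# The real CAS call belongs inside the loop; delays are illustrative.
backoff_ms() {
    # attempt N -> 100ms * 2^N, capped at 1600ms
    local ms=$(( 100 * (1 << $1) ))
    [ "$ms" -gt 1600 ] && ms=1600
    echo "$ms"
}

for attempt in 0 1 2 3; do
    echo "attempt $attempt: back off $(backoff_ms "$attempt")ms, then retry the CAS"
done
```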
Fix D: The Band-Aid (cassandra.yaml)
If you have legitimately heavy queries (e.g., analytical workloads) and you cannot optimize them further, you can increase the timeout limits in cassandra.yaml. Warning: This masks the symptom; it does not cure the disease.
Edit /etc/cassandra/cassandra.yaml:
read_request_timeout_in_ms: 10000
range_request_timeout_in_ms: 20000
cas_contention_timeout_in_ms: 3000
Requires a rolling restart of the cluster to take effect.
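The edit itself can be scripted per node as part of the rolling restart. To keep the sketch runnable anywhere, it writes a stand-in yaml file first; on a real node you would point it at /etc/cassandra/cassandra.yaml instead (a .bak copy of the original is kept).

```shell
#!/bin/bash
# Sketch: bump read_request_timeout_in_ms in cassandra.yaml, keeping a
# backup. A stand-in file is created so the demo runs without Cassandra;
# on a real node, set YAML=/etc/cassandra/cassandra.yaml and skip printf.
YAML="${1:-./cassandra.yaml}"
printf 'read_request_timeout_in_ms: 5000\n' > "$YAML"   # demo stand-in file

sed -i.bak 's/^read_request_timeout_in_ms:.*/read_request_timeout_in_ms: 10000/' "$YAML"
grep '^read_request_timeout_in_ms:' "$YAML"
```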
Diagnostic Script
#!/bin/bash
# Cassandra Diagnostic Script for Read Timeouts
# 1. Check for dropped read messages in the last 5 minutes
echo "=== Dropped Messages ==="
nodetool tpstats | grep -E 'Message type|READ|MUTATION|RangeSlice'
# 2. Check coordinator level latencies (look at 99% line)
echo -e "\n=== Proxy Histograms ==="
nodetool proxyhistograms
# 3. Search system.log for tombstone warnings today
echo -e "\n=== Tombstone Warnings ==="
grep "tombstone" /var/log/cassandra/system.log | tail -n 10
# 4. Check for long GC pauses
echo -e "\n=== Long GC Pauses (> 1000ms) ==="
grep "stopped:" /var/log/cassandra/gc.log | grep -oE 'stopped: [0-9.]+' | awk '$2 > 1.0 {print "Pause of " $2 " seconds found"}'
# 5. Identify tables with the most tombstones
echo -e "\n=== Top Tables by Tombstones ==="
nodetool tablestats | awk '/Table:/ {t=$2} /Maximum tombstones per slice/ {print $NF, t}' | sort -n -r | head -n 5
Error Medic Editorial
The Error Medic Editorial team consists of senior SREs and Database Administrators specializing in distributed systems, NoSQL databases, and high-availability infrastructure.