Cassandra 'Connection Refused' on Port 9042: Complete Troubleshooting Guide
Fix Cassandra connection refused errors on port 9042. Diagnose OOM kills, misconfigured listen_address, firewall blocks, slow queries, and data corruption.
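- Root cause #1: Memory exhaustion. When the JVM's total memory (heap plus off-heap) outgrows physical RAM, the kernel OOM killer terminates it silently and Cassandra writes no log entry. A heap that is too small fails with java.lang.OutOfMemoryError in system.log instead. Check `dmesg | grep -i oom` and size -Xmx in jvm.options to min(RAM/4, 8GB).
- Root cause #2: Address or native transport misconfiguration. listen_address: 0.0.0.0 (invalid) aborts startup. start_native_transport: false keeps port 9042 closed even while the process is running.
- Root cause #3: Firewall rules blocking port 9042, or a full disk. Failed writes trigger commit_failure_policy / disk_failure_policy (default: stop), which shuts down gossip and client transports and closes port 9042.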
- Root cause #4: Commitlog or SSTable corruption prevents node startup — identified by FSReadError or CorruptSSTableException in /var/log/cassandra/system.log.
- Quick fix sequence: run `systemctl status cassandra`, `ss -tlnp | grep 9042`, `dmesg | grep oom`, then inspect system.log for ERROR or FATAL lines before attempting any restart.
| Method | When to Use | Time | Risk |
|---|---|---|---|
| Restart Cassandra service | Service stopped, no corruption errors in log | 2-5 min | Low |
| Resize JVM heap (-Xmx/-Xms) | OOM kills in dmesg, OutOfMemoryError or GC pauses > 5s in system.log | 10 min + restart | Low |
| Fix listen_address / rpc_address | Process running but port 9042 not listening | 5 min + restart | Low |
| Open firewall for port 9042 | nc -zv from client fails, no ACCEPT rule in iptables | 2 min | Low |
| Remove corrupted commitlog | FSReadError on startup, node refuses to start | 15 min | Medium — recent writes may be lost |
| nodetool scrub --skip-corrupted | CorruptSSTableException during reads, node online | 30-120 min | Medium — corrupt rows dropped |
| nodetool repair -pr | Data inconsistency after node recovery | Hours | High — heavy I/O, low-traffic window |
| Restore from snapshot | Severe unrecoverable data corruption | Hours | High — requires recent valid backup |
Understanding Cassandra "Connection Refused" Errors
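When a client receives Connection refused on port 9042, the TCP handshake itself is failing. Either the node's kernel answered the SYN with a reset (Cassandra is not running or not bound to the expected address), or a firewall REJECT rule refused the connection. A firewall that silently DROPs the SYN usually produces a connect timeout rather than a refusal. This differs from authentication errors, where the TCP connection completes before the rejection, and from read timeouts, where the connection succeeds but the query is slow.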
Common error messages:
```text
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed
(tried: /10.0.0.1:9042 (TransportException: [/10.0.0.1:9042] Cannot connect))

cassandra.cluster.NoHostAvailable: ('Unable to connect to any servers',
{'10.0.0.1': ConnectionRefusedError(111, 'Connection refused')})
```
Step 1: Verify Cassandra Service State
```bash
# Check service state
systemctl status cassandra --no-pager -l
# Confirm JVM process exists
ps aux | grep CassandraDaemon | grep -v grep
# Verify port 9042 is bound
ss -tlnp | grep 9042
```
If ss -tlnp | grep 9042 returns nothing, Cassandra is not listening. Read the last startup attempt before restarting:
```bash
tail -100 /var/log/cassandra/system.log | grep -iE "error|exception|fatal|killed"
journalctl -u cassandra -n 50 --no-pager
```
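The three checks above can be folded into one triage function. This is a minimal sketch, not part of any Cassandra tooling. `is_port_listening` and `cassandra_step1_state` are hypothetical helper names. The sketch assumes `ss -tln` prints the local address as a field ending in `:<port>`, as iproute2's `ss` normally does. Matching the `:9042` suffix of a whole field avoids the false positive a bare `grep 9042` gives for a port such as 19042.

```bash
# Hypothetical helpers (not part of Cassandra): a sketch of Step 1 triage.

# Succeeds if `ss -tln` output on stdin has a listening socket on exactly
# this port. The header line is skipped; any field ending in ":<port>" counts.
is_port_listening() {
  awk -v p=":${1:-9042}" '
    NR > 1 {
      for (i = 1; i <= NF; i++) {
        start = length($i) - length(p) + 1
        if (start >= 1 && substr($i, start) == p) found = 1
      }
    }
    END { exit (found ? 0 : 1) }'
}

# Is the JVM up, and is it serving CQL?
cassandra_step1_state() {
  if ! pgrep -f CassandraDaemon >/dev/null; then
    echo "NOT RUNNING: read system.log and journalctl before restarting"
  elif ss -tln | is_port_listening 9042; then
    echo "LISTENING: port 9042 is open; look at the network path (Step 4)"
  else
    echo "RUNNING, NOT LISTENING: check native transport settings (Step 3)"
  fi
}
```

A RUNNING, NOT LISTENING result shortly after a restart can simply mean the node is still starting. The native transport opens near the end of startup.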
Step 2: Diagnose Out-of-Memory (OOM) Kills
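OOM kills are among the most common silent causes of Cassandra failures. When the JVM's total memory (heap plus off-heap) exceeds available RAM, the kernel OOM killer sends it SIGKILL, so Cassandra gets no chance to write anything to its own logs. Running out of heap inside the JVM is different: it surfaces as java.lang.OutOfMemoryError in system.log.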
```bash
# Kernel OOM evidence
dmesg | grep -iE "oom|killed process" | grep -i java
# GC pressure warnings
grep -E "GCInspector.*[0-9]{4,}ms" /var/log/cassandra/system.log | tail -20
```
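The GCInspector grep lists every pause of 1000 ms or more. When you only need the worst one, for example to compare it against a client timeout, a small helper can extract it. This is a sketch, assuming the `GC in <N>ms` wording that GCInspector uses in the warning below. `max_gc_pause_ms` is a hypothetical name.

```bash
# Hypothetical helper (not part of Cassandra): print the longest
# "GC in <N>ms" pause found in a Cassandra log, in milliseconds.
# Prints nothing if the log contains no such lines.
max_gc_pause_ms() {
  local log=${1:-/var/log/cassandra/system.log}
  grep -oE 'GC in [0-9]+ms' "$log" \
    | grep -oE '[0-9]+' \
    | sort -n \
    | tail -1
}
```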
A GC pause warning of the kind that typically precedes an outage:
```text
WARN [GCInspector] GCInspector.java:286 - G1 Young Generation GC in 11432ms.
G1 Eden Space: 8388608 -> 0; G1 Old Gen: 7516192768 -> 7814037504;
```
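Pauses this long stall the node completely: clients time out, and peers' failure detectors can mark the node down. Set the heap in /etc/cassandra/jvm.options (Cassandra 3.x) or /etc/cassandra/jvm-server.options (Cassandra 4.0+). A heap too large for the machine invites kernel OOM kills, while one too small causes long GC pauses and OutOfMemoryError: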
```text
# Rule: min(total_RAM / 4, 8GB). If you ever size above this rule, stay
# below 31GB so the JVM keeps compressed object pointers.
# For a 32GB RAM server:
-Xms8G
-Xmx8G
```
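Set -Xms equal to -Xmx to prevent heap-resize pauses. Also verify the open-file limit of the running JVM. `ulimit -n` in your own shell shows only that shell's limit, not the one the service was started with:

```bash
# Max open files for the running Cassandra JVM (minimum 100000 for production)
grep "Max open files" /proc/$(pgrep -f CassandraDaemon | head -1)/limits
```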
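The min(total_RAM / 4, 8GB) rule from the jvm.options comment is easy to compute on the host itself. This is a minimal sketch, assuming a Linux host, since it reads MemTotal from /proc/meminfo. It works in megabytes so that small machines do not round down to 0 GB. `recommended_heap_mb` and `print_heap_flags` are hypothetical names, and the output is a starting point, not a tuned value.

```bash
# Hypothetical helpers (not part of Cassandra): size the heap by the
# min(total_RAM / 4, 8GB) rule. RAM and result are in megabytes.
recommended_heap_mb() {
  local ram_mb=$1
  local heap_mb=$(( ram_mb / 4 ))
  if [ "$heap_mb" -gt 8192 ]; then
    heap_mb=8192
  fi
  echo "$heap_mb"
}

# Print matching -Xms/-Xmx lines for this machine.
# Linux only: MemTotal in /proc/meminfo is reported in kB.
print_heap_flags() {
  local ram_mb heap_mb
  ram_mb=$(awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo)
  heap_mb=$(recommended_heap_mb "$ram_mb")
  # Same value for both flags so the heap never resizes
  printf -- '-Xms%sM\n-Xmx%sM\n' "$heap_mb" "$heap_mb"
}
```

For a 32GB server, `recommended_heap_mb 32768` prints 8192, matching the -Xms8G/-Xmx8G example above.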
Step 3: Check Native Transport Configuration
If the process is running but port 9042 is not listening:
```bash
grep -E "^(listen_address|rpc_address|broadcast_rpc_address|native_transport_port|start_native_transport)" \
  /etc/cassandra/cassandra.yaml
```
Correct production settings:
```yaml
listen_address: 10.0.0.1           # Specific IP; never 0.0.0.0
rpc_address: 0.0.0.0               # Accept client connections on all interfaces
broadcast_rpc_address: 10.0.0.1    # IP advertised to clients
native_transport_port: 9042
start_native_transport: true
```
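Critical mistake: listen_address: 0.0.0.0 is invalid and causes startup failure. Always use a specific IP for listen_address. When rpc_address is 0.0.0.0, broadcast_rpc_address must be set to a concrete IP, because drivers cannot be told to connect to a wildcard address. Cassandra refuses to start without it.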
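The three rules in this step can be checked mechanically before a restart. This is a rough sketch, not a YAML parser. It assumes each setting is an unindented `key: value` line, as in the stock cassandra.yaml layout. It ignores commented-out lines and strips trailing comments and double quotes. `yaml_value` and `check_native_transport_config` are hypothetical names.

```bash
# Hypothetical helpers (not part of Cassandra); a line-based sketch, not a YAML parser.

# Print the value of an unindented top-level key from a cassandra.yaml-style file.
# Commented-out lines never match because they start with '#'.
yaml_value() {
  local key=$1 file=$2
  awk -v k="$key" '
    index($0, k ":") == 1 {
      v = substr($0, length(k) + 2)
      sub(/#.*/, "", v)               # drop trailing comment
      gsub(/^[ \t]+|[ \t]+$/, "", v)  # trim whitespace
      gsub(/"/, "", v)                # drop double quotes
      print v
      exit
    }' "$file"
}

# Report the Step 3 misconfigurations; returns the number of problems found.
check_native_transport_config() {
  local yaml=${1:-/etc/cassandra/cassandra.yaml}
  local problems=0 listen rpc brpc native
  listen=$(yaml_value listen_address "$yaml")
  rpc=$(yaml_value rpc_address "$yaml")
  brpc=$(yaml_value broadcast_rpc_address "$yaml")
  native=$(yaml_value start_native_transport "$yaml")

  if [ "$listen" = "0.0.0.0" ]; then
    echo "ERROR: listen_address is 0.0.0.0; set the node's own IP"
    problems=$((problems + 1))
  fi
  if [ "$rpc" = "0.0.0.0" ] && [ -z "$brpc" ]; then
    echo "ERROR: rpc_address is 0.0.0.0 but broadcast_rpc_address is unset"
    problems=$((problems + 1))
  fi
  if [ "$native" = "false" ]; then
    echo "ERROR: start_native_transport is false; port 9042 will not open"
    problems=$((problems + 1))
  fi
  if [ "$problems" -eq 0 ]; then
    echo "OK: no native transport misconfiguration found"
  fi
  return "$problems"
}
```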
Step 4: Firewall and Network Verification
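```bash
# Test TCP reachability from the application server.
# "Connection refused": nothing listening, or a REJECT rule.
# A hang or timeout usually means a DROP rule or a routing problem.
nc -zv <cassandra_node_ip> 9042
# iptables (legacy or iptables-nft backend)
iptables -L INPUT -n -v --line-numbers | grep -E "9042|DROP|REJECT"
# nftables (default firewall backend on recent Debian, Ubuntu, and RHEL releases)
nft list ruleset | grep -B2 -A2 9042
```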
Default ports to allow through the firewall:
| Port | Purpose |
|---|---|
| 7000 | Inter-node gossip and messaging (plain) |
| 7001 | Inter-node gossip and messaging (TLS) |
| 7199 | JMX, used by nodetool; restrict to admin hosts |
| 9042 | Native CQL (clients) |
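nc is not always installed on application servers. Bash can open TCP connections itself through its /dev/tcp pseudo-device. That is enough to separate the cases described under Understanding Cassandra "Connection Refused" Errors. An immediate refusal means nothing is listening or a REJECT rule answered, while a timeout usually points at a DROP rule or a routing problem. This is a sketch, assuming bash and GNU `timeout` are present on the client. `probe_port` and `classify_connect_result` are hypothetical names, and the 3-second limit is an arbitrary choice.

```bash
# Hypothetical helpers (not part of Cassandra): a bash-only TCP probe.

# Map the exit status and stderr of a connection attempt to a verdict.
classify_connect_result() {
  local rc=$1 out=$2
  if [ "$rc" -eq 0 ]; then
    echo "open"
  elif [ "$rc" -eq 124 ]; then
    # GNU timeout exits with 124 when the command ran out of time
    echo "timeout"
  else
    case $out in
      *"Connection refused"*) echo "refused" ;;
      *) echo "error" ;;
    esac
  fi
}

# Try one TCP connection to host:port using bash's /dev/tcp; give up after 3 s.
probe_port() {
  local host=$1 port=${2:-9042} out rc
  # host and port are passed as positional args, not spliced into the script text
  out=$(timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' probe "$host" "$port" 2>&1) && rc=0 || rc=$?
  classify_connect_result "$rc" "$out"
}

# Example: check every port from the table above against one node
# for port in 7000 7001 7199 9042; do
#   printf '%s %s\n' "$port" "$(probe_port 10.0.0.1 "$port")"
# done
```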
Step 5: Slow Queries and Timeout Troubleshooting
If connections succeed but you see OperationTimedOutException or WriteTimeoutException, the problem is throughput, not connectivity.
```bash
# Thread pool saturation: rising Pending/Blocked per stage, plus non-zero
# counts in the dropped-message table at the end of the output
nodetool tpstats
# Examine coordinator latency percentiles
nodetool proxyhistograms
# Check compaction backlog
nodetool compactionstats
# Identify hot tables
nodetool tablestats <keyspace>.<table> | grep -E "Maximum|Mean|Local read|Local write"
```
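Cassandra is shedding or queueing work faster than it can complete it when either of these shows up:
- non-zero Dropped counts for read, mutation, or other message types in the message-type table at the end of nodetool tpstats
- steadily growing Pending counts in ReadStage or MutationStage

To find the individual slow queries, check the slow query threshold in cassandra.yaml. It applies to Cassandra 3.10+, with 500 ms as the default, and 4.1+ also accepts the form slow_query_log_timeout: 500ms:

```yaml
slow_query_log_timeout_in_ms: 500
```

Slow operations are written at DEBUG level, so the details land in debug.log. system.log only carries a one-line notice that some operations were slow:

```bash
grep "operations were slow" /var/log/cassandra/debug.log | tail -20
```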
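To alert on dropped messages rather than eyeballing tpstats, the dropped-message table can be filtered with awk. This is a sketch with two format assumptions, which hold for the 3.x and 4.x output as far as this guide can tell:
- the table starts with a line beginning `Message type`;
- each following row has an upper-case message type in the first column and the Dropped count in the second.

`report_dropped_messages` is a hypothetical name.

```bash
# Hypothetical helper (not part of nodetool): read `nodetool tpstats` output on
# stdin, print every message type with dropped > 0, and exit 1 if there was
# at least one (exit 0 when nothing was dropped).
report_dropped_messages() {
  awk '
    /^Message type/ { in_dropped = 1; next }
    in_dropped && $1 ~ /^[A-Z_][A-Z0-9_]*$/ && $2 ~ /^[0-9]+$/ && $2 > 0 {
      printf "%s dropped=%s\n", $1, $2
      any = 1
    }
    END { exit (any ? 1 : 0) }'
}

# Example:
# nodetool tpstats | report_dropped_messages || echo "node is shedding load"
```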
Step 6: Data Corruption Diagnosis and Recovery
Commitlog corruption prevents startup with:
```text
ERROR [main] CassandraDaemon.java:689 - Exception encountered during startup
org.apache.cassandra.io.FSReadError: java.io.IOException:
Corrupt commitlog /var/lib/cassandra/commitlog/CommitLog-7-1639000000000.log
```
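Commitlog recovery. Moving the commitlog aside discards every write it held that had not yet been flushed to SSTables on this node. With a replication factor above 1, a repair afterwards restores those writes from the other replicas: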
```bash
systemctl stop cassandra
mv /var/lib/cassandra/commitlog /var/lib/cassandra/commitlog.bak.$(date +%s)
mkdir -p /var/lib/cassandra/commitlog
chown cassandra:cassandra /var/lib/cassandra/commitlog
systemctl start cassandra
```
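The same move can be wrapped in a function that refuses to run while the service is still active. It also copies the original directory's owner and mode onto the fresh one instead of hard-coding cassandra:cassandra. This is a sketch with three assumptions:
- GNU coreutils is installed, for `--reference`;
- Cassandra runs as the systemd unit `cassandra`, as in the commands above;
- a node started outside systemd would not be detected by the running check.

`quarantine_commitlog` is a hypothetical name.

```bash
# Hypothetical helper (not part of Cassandra): move a commitlog directory aside
# and recreate it empty with the same owner and mode. Prints the backup path.
quarantine_commitlog() {
  local dir=${1:-/var/lib/cassandra/commitlog}
  local backup
  backup="${dir}.bak.$(date +%s)"

  # Assumes a systemd unit named "cassandra"
  if systemctl is-active --quiet cassandra 2>/dev/null; then
    echo "Refusing: the cassandra service is still active; stop it first" >&2
    return 1
  fi
  if [ ! -d "$dir" ]; then
    echo "No commitlog directory at $dir" >&2
    return 1
  fi

  mv "$dir" "$backup" || return 1
  mkdir -p "$dir" \
    && chown --reference="$backup" "$dir" \
    && chmod --reference="$backup" "$dir" \
    || return 1
  echo "$backup"
}

# Example:
# systemctl stop cassandra && quarantine_commitlog && systemctl start cassandra
```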
For SSTable corruption discovered at runtime:
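```bash
# Online scrub: snapshots the table first (unless --no-snapshot is given),
# then skips corrupt partitions and rewrites the valid data
nodetool scrub --skip-corrupted <keyspace> <table>
# Offline scrub for severe corruption; run it as the cassandra user so the
# rewritten SSTables keep the right ownership
systemctl stop cassandra
sudo -u cassandra sstablescrub --skip-corrupted <keyspace> <table>
systemctl start cassandra
# Restore replica consistency. -pr repairs only this node's primary ranges,
# so run it on every node in turn
nodetool repair -pr <keyspace>
```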
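You can take the `<keyspace> <table>` arguments for the scrub commands straight from the log. The SSTable paths in CorruptSSTableException lines map back to their data directories. This is a sketch with two assumptions:
- the default layout `<data_dir>/<keyspace>/<table>-<table_id>/<sstable>-Data.db`;
- the exception message includes the path of the corrupt -Data.db file.

`corrupt_tables_from_log` is a hypothetical name.

```bash
# Hypothetical helper (not part of Cassandra): print unique "keyspace table"
# pairs whose SSTables appear in CorruptSSTableException lines of a log.
# Assumes the default <data_dir>/<keyspace>/<table>-<table_id>/ layout.
corrupt_tables_from_log() {
  local log=${1:-/var/log/cassandra/system.log}
  grep 'CorruptSSTableException' "$log" \
    | grep -oE '/[^ :]+-Data\.db' \
    | awk -F/ '{
        ks  = $(NF - 2)
        tbl = $(NF - 1)
        sub(/-[0-9a-f]+$/, "", tbl)   # strip the "-<table_id>" directory suffix
        print ks, tbl
      }' \
    | sort -u
}

# Example: feed the result to the online scrub
# corrupt_tables_from_log | while read -r ks tbl; do
#   nodetool scrub --skip-corrupted "$ks" "$tbl"
# done
```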
Step 7: Validate Cluster Health
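```bash
# All nodes should show UN (Up/Normal)
nodetool status
# Verify gossip propagation: endpoint lines, their STATUS, and RPC_READY
# (true once that node's native transport is accepting clients)
nodetool gossipinfo | grep -E "^[^[:space:]]|STATUS|RPC_READY" | head -40
```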
Healthy nodetool status output:
```text
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load       Tokens  Owns   Host ID              Rack
UN  10.0.0.1  45.6 GiB   256     33.3%  abc123-...-def456    rack1
UN  10.0.0.2  46.1 GiB   256     33.3%  ghi789-...-jkl012    rack1
UN  10.0.0.3  44.9 GiB   256     33.4%  mno345-...-pqr678    rack1
```
Any node showing DN (Down/Normal) is unreachable. Investigate with journalctl -u cassandra -n 200 --no-pager on that host immediately.
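For scripted checks, the status column of `nodetool status` can be filtered so that only problem nodes are printed. This is a sketch, assuming the two-letter state code is the first column of each node row, as in the sample output above. `nodes_not_up_normal` is a hypothetical name. Joining (UJ), leaving (UL) and moving (UM) nodes are reported too, since they are not in the steady Up/Normal state.

```bash
# Hypothetical helper (not part of nodetool): read `nodetool status` on stdin,
# print "<state> <address>" for every node that is not UN, and exit 1 if any
# were found.
nodes_not_up_normal() {
  awk '
    $1 ~ /^[UD][NLJM]$/ && $1 != "UN" { print $1, $2; bad = 1 }
    END { exit (bad ? 1 : 0) }'
}

# Example:
# nodetool status | nodes_not_up_normal || echo "cluster is not fully UN"
```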
Rapid Diagnostic Script
The script below collects the read-only checks from this guide in one pass on the affected node.
```bash
#!/usr/bin/env bash
# Cassandra Connection Refused -- Rapid Diagnostic Script
# Run as root or the cassandra OS user on the affected node
# Usage: bash cassandra-diag.sh 2>&1 | tee /tmp/cass-diag-$(date +%Y%m%d-%H%M%S).log
#
# No `set -e`: every check should run even when an earlier one fails.
# pipefail makes a pipeline fail when grep finds nothing, so the
# "|| echo ..." fallbacks below fire as intended.
set -uo pipefail

CASS_LOG="/var/log/cassandra/system.log"
CASS_YAML="/etc/cassandra/cassandra.yaml"

echo "=== [1] Cassandra Service Status ==="
systemctl status cassandra --no-pager -l 2>&1 | head -20 || true
echo ""

echo "=== [2] Port 9042 Listening? ==="
# ':9042' followed by whitespace matches port 9042 only, not e.g. 19042
ss -tlnp | grep -E ':9042[[:space:]]' || echo "ALERT: Nothing is listening on port 9042"
echo ""

echo "=== [3] Recent Errors in system.log ==="
if [ -f "$CASS_LOG" ]; then
  grep -iE "error|exception|fatal|oom|killed|corrupt" "$CASS_LOG" | tail -30
else
  echo "system.log not found at $CASS_LOG"
fi
echo ""

echo "=== [4] OOM Killer Activity ==="
dmesg 2>/dev/null | grep -iE "oom|killed process" | grep -i java | tail -10 \
  || echo "No OOM kills found in dmesg (or dmesg not readable by this user)"
echo ""

echo "=== [5] GC Pause Warnings (>= 1000ms) ==="
grep -E "GCInspector.*[0-9]{4,}ms" "$CASS_LOG" 2>/dev/null | tail -10 \
  || echo "No long GC pauses found"
echo ""

echo "=== [6] Network Address Configuration ==="
if [ -f "$CASS_YAML" ]; then
  grep -E "^(listen_address|rpc_address|broadcast_rpc_address|native_transport_port|start_native_transport)" \
    "$CASS_YAML"
else
  echo "cassandra.yaml not found at $CASS_YAML"
fi
echo ""

echo "=== [7] JVM Heap Settings ==="
# 3.x keeps the heap in jvm.options; 4.0+ in jvm-server.options
found_opts=0
for f in /etc/cassandra/jvm.options \
         /etc/cassandra/jvm-server.options \
         /etc/cassandra/jvm11-server.options \
         /etc/cassandra/jvm17-server.options; do
  if [ -f "$f" ]; then
    found_opts=1
    echo "File: $f"
    grep -E "^-Xm[sx]" "$f" || echo "  (no uncommented -Xms/-Xmx in this file)"
  fi
done
[ "$found_opts" -eq 1 ] || echo "No JVM options file found at expected paths"
echo ""

echo "=== [8] Firewall Rules for Port 9042 ==="
iptables -L INPUT -n --line-numbers 2>/dev/null | grep -E "9042|DROP|REJECT" | head -10 \
  || nft list ruleset 2>/dev/null | grep -B2 -A2 9042 \
  || echo "No matching rules found (or iptables/nft unavailable)"
echo ""

echo "=== [9] Disk Usage ==="
df -h /var/lib/cassandra /var/log/cassandra 2>/dev/null || df -h /
echo ""

echo "=== [10] Cluster Status ==="
nodetool status 2>/dev/null || echo "nodetool unavailable -- Cassandra may not be running"
echo ""

echo "=== [11] Thread Pools and Dropped Messages ==="
# Unfiltered on purpose: the dropped-message rows (READ, MUTATION, ...) contain
# neither "Stage" nor "Dropped", so grepping for those words would hide them
nodetool tpstats 2>/dev/null || echo "nodetool unavailable"
echo ""

echo "=== [12] Compaction Backlog ==="
nodetool compactionstats 2>/dev/null | head -20 || echo "nodetool unavailable"
echo ""

echo "=== Diagnostic Complete ==="
echo "Reference: https://cassandra.apache.org/doc/latest/cassandra/troubleshooting/index.html"
```

Error Medic Editorial
Error Medic Editorial is a team of senior DevOps engineers and SREs with hands-on experience operating Apache Cassandra clusters at scale in production. Our troubleshooting guides are derived from real incident postmortems and reviewed for technical accuracy against current Cassandra documentation.
Sources
- https://cassandra.apache.org/doc/latest/cassandra/troubleshooting/index.html
- https://cassandra.apache.org/doc/latest/cassandra/configuration/cass_yaml_file.html
- https://docs.datastax.com/en/cassandra-oss/3.x/cassandra/tools/toolsNodetool.html
- https://stackoverflow.com/questions/21173931/nohostavailableexception-all-hosts-tried-for-query-failed
- https://github.com/apache/cassandra/blob/trunk/doc/source/troubleshooting/index.rst