Why does Cassandra refuse connections immediately after a system reboot?

Two common causes: (1) The systemd unit is not enabled for auto-start; fix with `systemctl enable cassandra`. (2) Cassandra typically needs 2-5 minutes after the process starts to replay its commit log and initialize before it accepts client connections. During this window other nodes report it as `DN` in `nodetool status`; a node joining the ring for the first time shows `UJ` (Up/Joining) instead. Wait for the state to reach `UN` (Up/Normal). If the service never reaches `UN`, check `journalctl -u cassandra -n 100 --no-pager` for fatal startup errors.

What does 'All host(s) tried for query failed' mean and how do I fix it?

This `NoHostAvailableException` from the DataStax driver means every contact point returned a connection-level error before any CQL query was attempted. Diagnosis: (1) verify the node is running with `systemctl status cassandra`; (2) test TCP reachability from the app server with `nc -zv <cassandra_node_ip> 9042`; (3) if `nc` succeeds but the driver still fails, check for TLS or authentication mismatches between the driver configuration and the server's cassandra.yaml.

How much JVM heap should I allocate to prevent Cassandra OOM kills?

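Use `min(total_RAM / 4, 8GB)` as the starting point and never exceed 31 GB. At roughly 32 GB and above, the JVM can no longer use compressed object pointers (CompressedOops), so every reference doubles in size and the effective heap capacity drops. For a 64 GB RAM server the rule gives `-Xms8G -Xmx8G`; going higher (for example 16G with G1) is a deliberate tuning decision, not the default. Always match Xms to Xmx to eliminate resize pauses. Leave the remaining RAM for the OS page cache, which Cassandra uses heavily to accelerate SSTable reads without touching the JVM heap.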
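
The sizing rule can be sketched as a small shell helper. This is only an illustration of the min(RAM/4, 8 GB) arithmetic from this guide; the function name `recommended_heap_mb` is made up for the example and is not part of Cassandra.

```shell
# Hypothetical helper: apply the guide's min(total_RAM / 4, 8 GB) rule.
# Input: total RAM in MB. Output: suggested heap size in MB.
recommended_heap_mb() {
  total_mb=$1
  quarter_mb=$(( total_mb / 4 ))
  cap_mb=8192
  if [ "$quarter_mb" -lt "$cap_mb" ]; then
    echo "$quarter_mb"
  else
    echo "$cap_mb"
  fi
}

# Example: derive the flags for this host from /proc/meminfo (Linux only).
if [ -r /proc/meminfo ]; then
  host_mb=$(awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo)
  heap_mb=$(recommended_heap_mb "$host_mb")
  echo "-Xms${heap_mb}M"
  echo "-Xmx${heap_mb}M"
fi
```

The two printed lines can be pasted into the JVM options file; treat the result as a starting point and adjust after observing GC behavior.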

Cassandra is running and port 9042 is open but queries are timing out — where do I start?

Run `nodetool tpstats` first. Non-zero `Dropped` counts for message types such as `READ` or `MUTATION`, or a sustained `Pending` backlog in `ReadStage`, `MutationStage`, or `ViewMutationStage`, mean Cassandra is shedding work. Next run `nodetool compactionstats`: a large pending compaction backlog degrades read performance significantly. Finally check partition sizes with `nodetool tablestats <keyspace>.<table>`: a `Compacted partition maximum bytes` orders of magnitude above `Compacted partition mean bytes` indicates an oversized partition, which often becomes a hot spot receiving disproportionate traffic.
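
One rough way to spot an oversized partition is to compare the largest and mean compacted partition sizes for a table. The sketch below assumes the `nodetool tablestats` output for a single table contains `Compacted partition maximum bytes` and `Compacted partition mean bytes` lines; the function name `partition_skew` is hypothetical.

```shell
# Hypothetical helper: print max/mean compacted partition size as a whole number.
# Reads `nodetool tablestats <keyspace>.<table>` output for ONE table on stdin.
partition_skew() {
  awk -F': *' '
    /Compacted partition maximum bytes/ { max = $2 }
    /Compacted partition mean bytes/    { mean = $2 }
    END {
      if (mean > 0) printf "%d\n", max / mean
      else          print "n/a"
    }'
}

# Usage (not run here):
#   nodetool tablestats <keyspace>.<table> | partition_skew
```

A ratio in the hundreds or thousands suggests that a few partitions are far larger than the rest and deserves a closer look at the data model.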

How do I distinguish commitlog corruption from SSTable corruption?

Commitlog corruption prevents startup: you see `Corrupt commitlog` or `FSReadError` in the first lines of the startup log and the process exits immediately. SSTable corruption is discovered during reads (producing `CorruptSSTableException`) or via `nodetool verify <keyspace> <table>` while the node stays running. Commitlog corruption puts only writes not yet flushed to SSTables at risk (usually the last few seconds) and is resolved by moving the corrupt segments aside. SSTable corruption may cause permanent row-level data loss and requires `nodetool scrub` or a backup restore.
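
The distinction can be sketched as a log classifier. This is a simplified illustration that keys only on the strings named in this guide (`Corrupt commitlog`, `FSReadError`, `CorruptSSTableException`); the function name `classify_corruption` is hypothetical, and real logs may need broader patterns.

```shell
# Hypothetical helper: classify corruption evidence in log text read from stdin.
# Prints "commitlog", "sstable", or "none".
classify_corruption() {
  awk '
    /CorruptSSTableException/                      { sstable = 1 }
    /[Cc]ommit[Ll]og/ && /[Cc]orrupt|FSReadError/  { commitlog = 1 }
    END {
      if (commitlog)    print "commitlog"
      else if (sstable) print "sstable"
      else              print "none"
    }'
}

# Usage (not run here):
#   classify_corruption < /var/log/cassandra/system.log
```

Commitlog evidence takes priority in the output because it blocks startup, while SSTable corruption can be handled with the node online.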

Cassandra 'Connection Refused' on Port 9042: Complete Troubleshooting Guide

Fix Cassandra connection refused errors on port 9042. Diagnose OOM kills, misconfigured listen_address, firewall blocks, slow queries, and data corruption with

Last updated: February 23, 2026

Last verified: February 23, 2026

Key Takeaways

Root cause #1: Memory exhaustion triggers a silent OS OOM kill, and Cassandra writes no log entry. Check `dmesg | grep -i oom` and size -Xmx in jvm.options to min(RAM/4, 8GB) so that heap plus off-heap memory fits in physical RAM.
Root cause #2: Native transport misconfiguration — listen_address set to 0.0.0.0 (invalid) or start_native_transport: false prevents port 9042 from opening even when the process is running.
Root cause #3: Firewall rules blocking port 9042, or a full disk causing Cassandra to halt new connections to prevent data loss.
Root cause #4: Commitlog or SSTable corruption prevents node startup — identified by FSReadError or CorruptSSTableException in /var/log/cassandra/system.log.
Quick fix sequence: run `systemctl status cassandra`, `ss -tlnp | grep 9042`, `dmesg | grep -i oom`, then inspect system.log for ERROR or FATAL lines before attempting any restart.

Fix Approaches Compared
Method	When to Use	Time	Risk
Restart Cassandra service	Service stopped, no corruption errors in log	2-5 min	Low
Increase JVM heap (-Xmx)	OOM kills in dmesg or GC pauses > 5s in system.log	10 min + restart	Low
Fix listen_address / rpc_address	Process running but port 9042 not listening	5 min + restart	Low
Open firewall for port 9042	nc -zv from client fails, no ACCEPT rule in iptables	2 min	Low
Remove corrupted commitlog	FSReadError on startup, node refuses to start	15 min	Medium — recent writes may be lost
nodetool scrub --skip-corrupted	CorruptSSTableException during reads, node online	30-120 min	Medium — corrupt rows dropped
nodetool repair -pr	Data inconsistency after node recovery	Hours	High — heavy I/O, low-traffic window
Restore from snapshot	Severe unrecoverable data corruption	Hours	High — requires recent valid backup

Understanding Cassandra "Connection Refused" Errors

When a client receives Connection refused on port 9042, the TCP handshake itself is failing: Cassandra is not running, is not bound to the expected address, or a firewall is rejecting the SYN packet with a reset (a firewall that silently drops the SYN produces a connection timeout instead). This differs from authentication errors (the TCP connection completes before the server rejects the client) and read timeouts (the connection succeeds but the query is slow).

Common error messages:

com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed
  (tried: /10.0.0.1:9042 (TransportException: [/10.0.0.1:9042] Cannot connect))

cassandra.cluster.NoHostAvailable: ('Unable to connect to any servers',
  {'10.0.0.1': ConnectionRefusedError(111, 'Connection refused')})
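
As a rough triage aid, an error string can be sorted into the three categories described above (refused, timeout, authentication). The sketch below is a simple substring match; the function name `classify_conn_error` and the exact patterns are assumptions for illustration, and driver messages vary by version.

```shell
# Hypothetical helper: map a driver error message to a coarse category.
classify_conn_error() {
  case "$1" in
    *"Connection refused"*)                 echo "refused" ;;
    *"timed out"*|*"TimedOut"*|*"Timeout"*) echo "timeout" ;;
    *"Authentication"*|*"Bad credentials"*) echo "auth" ;;
    *)                                      echo "unknown" ;;
  esac
}

# Usage (not run here):
#   classify_conn_error "$(tail -n 1 app-error.log)"   # app-error.log is a placeholder
```

A message that only says `Cannot connect`, like the Java example above, falls into `unknown`; in that case test the port directly from the client host.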

Step 1: Verify Cassandra Service State

# Check service state
systemctl status cassandra --no-pager -l

# Confirm JVM process exists
ps aux | grep CassandraDaemon | grep -v grep

# Verify port 9042 is bound
ss -tlnp | grep 9042

If ss -tlnp | grep 9042 returns nothing, Cassandra is not listening. Read the last startup attempt before restarting:

tail -100 /var/log/cassandra/system.log | grep -iE "error|exception|fatal|killed"
journalctl -u cassandra -n 50 --no-pager
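
A plain `grep 9042` also matches unrelated text such as port 19042 or a PID containing 9042. A stricter check, sketched below, parses the local address column of `ss -tln` output. The function name `port_listening` is hypothetical, and the column layout (state in column 1, local address in column 4) is assumed from typical `ss` output.

```shell
# Hypothetical helper: read `ss -tln` output on stdin and report whether
# a LISTEN socket is bound to exactly the given port.
port_listening() {
  awk -v port="$1" '
    $1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
    END {
      if (found) { print "listening"; exit 0 }
      print "not listening"
      exit 1
    }'
}

# Usage (not run here):
#   ss -tln | port_listening 9042
```

The exit status mirrors the printed result, so the helper can also be chained with `&&` or `||` in a script.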

Step 2: Diagnose Out-of-Memory (OOM) Kills

OOM kills are the most frequent silent cause of Cassandra failures. The Linux kernel terminates the JVM without writing to Cassandra's logs.

# Kernel OOM evidence
dmesg | grep -iE "oom|killed process" | grep -i java

# GC pressure warnings
grep -E "GCInspector.*[0-9]{4,}ms" /var/log/cassandra/system.log | tail -20

A GC pause warning of the kind that precedes a failure:

WARN  [GCInspector] GCInspector.java:286 - G1 Young Generation GC in 11432ms.
G1 Eden Space: 8388608 -> 0; G1 Old Gen: 7516192768 -> 7814037504;
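
To rank pauses instead of just listing matching lines, the duration can be extracted from each GCInspector line. This sketch assumes the `... GC in <N>ms` wording shown in the sample above; the function name `gc_pauses_over` is hypothetical.

```shell
# Hypothetical helper: print GC pause durations (ms) at or above a threshold.
# Reads system.log lines on stdin; first argument is the threshold in ms.
gc_pauses_over() {
  threshold_ms=$1
  sed -n 's/.*GCInspector.* GC in \([0-9][0-9]*\)ms.*/\1/p' |
    awk -v t="$threshold_ms" '$1 >= t'
}

# Usage (not run here):
#   gc_pauses_over 1000 < /var/log/cassandra/system.log | sort -n | tail -5
```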

Pauses over 10 seconds leave the node unresponsive long enough for peers to mark it down, and usually signal heap pressure. Fix the heap size in /etc/cassandra/jvm.options (Cassandra 3.x) or /etc/cassandra/jvm-server.options (Cassandra 4.x):

# Rule: min(total_RAM / 4, 8GB). Never exceed 31GB.
# For a 32GB RAM server:
-Xms8G
-Xmx8G

Set -Xms equal to -Xmx to prevent resize pauses. Verify OS file descriptor limits:

ulimit -n  # Minimum 100000 for production Cassandra
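
Note that `ulimit -n` reports the limit of the current shell, which can differ from the limit the service manager applied to the Cassandra process. A minimal sketch for reading the running process's own limit from `/proc/<pid>/limits` follows; the function name `soft_nofile_limit` is hypothetical.

```shell
# Hypothetical helper: extract the soft "Max open files" limit from the
# contents of a /proc/<pid>/limits file supplied on stdin.
soft_nofile_limit() {
  awk '/^Max open files/ { print $4 }'
}

# Usage (not run here; assumes a single CassandraDaemon process):
#   pid=$(pgrep -f CassandraDaemon | head -n 1)
#   soft_nofile_limit < "/proc/$pid/limits"
```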

Step 3: Check Native Transport Configuration

If the process is running but port 9042 is not listening:

grep -E "^(listen_address|rpc_address|broadcast_rpc_address|native_transport_port|start_native_transport)" \
  /etc/cassandra/cassandra.yaml

Correct production settings:

listen_address: 10.0.0.1         # Specific IP — never 0.0.0.0
rpc_address: 0.0.0.0             # Accept client connections on all interfaces
broadcast_rpc_address: 10.0.0.1  # IP advertised to clients
native_transport_port: 9042
start_native_transport: true

Critical mistake: listen_address: 0.0.0.0 is invalid and causes startup failure. Always use a specific IP for listen_address and pair rpc_address: 0.0.0.0 with broadcast_rpc_address.
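
The rules above can be sketched as a small lint pass over cassandra.yaml. This is an illustration that checks only the three conditions named in this section; the function name `check_transport_config` and its messages are made up for the example.

```shell
# Hypothetical helper: lint the address settings of a cassandra.yaml
# supplied on stdin. Prints each problem found, or "ok".
check_transport_config() {
  awk '
    /^listen_address:/         { listen = $2 }
    /^rpc_address:/            { rpc = $2 }
    /^broadcast_rpc_address:/  { bcast = $2 }
    /^start_native_transport:/ { native = $2 }
    END {
      bad = 0
      if (listen == "0.0.0.0") {
        print "listen_address must not be 0.0.0.0"; bad = 1
      }
      if (rpc == "0.0.0.0" && bcast == "") {
        print "rpc_address 0.0.0.0 requires broadcast_rpc_address"; bad = 1
      }
      if (native == "false") {
        print "start_native_transport is false"; bad = 1
      }
      if (!bad) print "ok"
      exit bad
    }'
}

# Usage (not run here):
#   check_transport_config < /etc/cassandra/cassandra.yaml
```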

Step 4: Firewall and Network Verification

# Test TCP reachability from the application server
nc -zv <cassandra_node_ip> 9042

# iptables (RHEL, Amazon Linux, older Ubuntu)
iptables -L INPUT -n -v --line-numbers | grep -E "9042|DROP|REJECT"

# nftables (Debian 12+, Ubuntu 22.04+)
nft list ruleset | grep -B2 -A2 9042

Required open ports:

Port	Purpose
7000	Inter-node gossip (plain)
7001	Inter-node gossip (TLS)
7199	JMX monitoring
9042	Native CQL (clients)
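
The port list can be probed in one pass. The sketch below assumes an `nc` build that supports `-z` and `-w`; the function names `probe` and `check_ports` are hypothetical, and `probe` is kept separate so it can be swapped for another tool.

```shell
# Default probe: succeeds if a TCP connection to host:port opens within 3 s.
# Assumes an nc build that supports -z (scan only) and -w (timeout).
probe() {
  nc -z -w 3 "$1" "$2" >/dev/null 2>&1
}

# Hypothetical helper: report the state of each port from the table above.
check_ports() {
  host=$1
  for port in 7000 7001 7199 9042; do
    if probe "$host" "$port"; then
      echo "$port open"
    else
      echo "$port unreachable"
    fi
  done
}

# Usage (not run here):
#   check_ports <cassandra_node_ip>
```

Application servers only need 9042. Ports 7000 and 7001 matter between Cassandra nodes, and 7001 is expected to be unreachable when inter-node TLS is not enabled.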

Step 5: Slow Queries and Timeout Troubleshooting

If connections succeed but you see OperationTimedOutException or WriteTimeoutException, the problem is throughput, not connectivity.

# Check thread pool saturation — non-zero Dropped is critical
nodetool tpstats

# Examine latency percentiles
nodetool proxyhistograms

# Check compaction backlog
nodetool compactionstats

# Identify hot tables
nodetool tablestats <keyspace>.<table> | grep -iE "maximum|mean|local read|local write"

Non-zero Dropped counts for message types such as READ or MUTATION, or a growing Pending column for ReadStage, MutationStage, or ViewMutationStage, mean Cassandra is shedding work. Enable slow query logging in cassandra.yaml:

slow_query_log_timeout_in_ms: 500

Then review:

# Slow operations are reported at debug level, so search debug.log
grep -i "operations were slow" /var/log/cassandra/debug.log | tail -20
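
To pull only the dropped-message counters out of `nodetool tpstats`, the output can be filtered as sketched below. This assumes the output contains a section headed `Message type ... Dropped` with one message type per row, as in common Cassandra versions; the function name `dropped_messages` is hypothetical.

```shell
# Hypothetical helper: print message types with a non-zero Dropped count.
# Reads `nodetool tpstats` output on stdin.
dropped_messages() {
  awk '
    /^Message type/ { in_dropped = 1; next }
    in_dropped && $2 ~ /^[0-9]+$/ && $2 > 0 { print $1, $2 }'
}

# Usage (not run here):
#   nodetool tpstats | dropped_messages
```

Empty output means nothing has been dropped since the node started; the counters are cumulative, so compare two runs a few minutes apart to see whether drops are still occurring.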

Step 6: Data Corruption Diagnosis and Recovery

Commitlog corruption prevents startup with:

ERROR [main] CassandraDaemon.java:689 - Exception encountered during startup
org.apache.cassandra.io.FSReadError: java.io.IOException:
  Corrupt commitlog /var/lib/cassandra/commitlog/CommitLog-7-1639000000000.log

Commitlog recovery (discards writes on this node that were not yet flushed to SSTables, usually the last few seconds; run a repair afterwards to recover them from replicas):

systemctl stop cassandra
mv /var/lib/cassandra/commitlog /var/lib/cassandra/commitlog.bak.$(date +%s)
mkdir -p /var/lib/cassandra/commitlog
chown cassandra:cassandra /var/lib/cassandra/commitlog
systemctl start cassandra

For SSTable corruption discovered at runtime:

# Online scrub — drops corrupt rows, preserves valid data
nodetool scrub --skip-corrupted <keyspace> <table>

# Offline scrub for severe corruption
systemctl stop cassandra
sstablescrub --skip-corrupted <keyspace> <table>
systemctl start cassandra

# Restore replica consistency
nodetool repair -pr <keyspace>

Step 7: Validate Cluster Health

# All nodes should show UN (Up/Normal)
nodetool status

# Verify gossip propagation
nodetool gossipinfo | grep -E "STATUS|ENDPOINT_IP" | head -20

Healthy nodetool status output:

Datacenter: datacenter1
=======================
Status=Up/Down  |/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns  Host ID           Rack
UN  10.0.0.1    45.6 GiB   256     33.3% abc123-...-def456  rack1
UN  10.0.0.2    46.1 GiB   256     33.3% ghi789-...-jkl012  rack1
UN  10.0.0.3    44.9 GiB   256     33.4% mno345-...-pqr678  rack1

Any node showing DN (Down/Normal) is unreachable. Investigate with journalctl -u cassandra -n 200 --no-pager on that host immediately.
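
On larger clusters it helps to print only the nodes that are not `UN`. The sketch below filters `nodetool status` output by the two-letter state code in the first column, matching the layout shown above; the function name `nodes_not_normal` is hypothetical.

```shell
# Hypothetical helper: list state and address of every node that is not UN.
# Reads `nodetool status` output on stdin.
nodes_not_normal() {
  awk '$1 ~ /^[UD][NLJM]$/ && $1 != "UN" { print $1, $2 }'
}

# Usage (not run here):
#   nodetool status | nodes_not_normal
```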

Frequently Asked Questions

#!/usr/bin/env bash
# Cassandra Connection Refused -- Rapid Diagnostic Script
# Run as root or the cassandra OS user on the affected node
# Usage: bash cassandra-diag.sh 2>&1 | tee /tmp/cass-diag-$(date +%Y%m%d-%H%M%S).log

set -uo pipefail

CASS_LOG="/var/log/cassandra/system.log"
CASS_YAML="/etc/cassandra/cassandra.yaml"

echo "=== [1] Cassandra Service Status ==="
systemctl status cassandra --no-pager -l 2>&1 | head -20 || true

echo ""
echo "=== [2] Port 9042 Listening? ==="
ss -tlnp | grep 9042 || echo "ALERT: Nothing is listening on port 9042"

echo ""
echo "=== [3] Recent Errors in system.log ==="
if [ -f "$CASS_LOG" ]; then
  grep -iE "error|exception|fatal|oom|killed|corrupt" "$CASS_LOG" | tail -30
else
  echo "system.log not found at $CASS_LOG"
fi

echo ""
echo "=== [4] OOM Killer Activity ==="
dmesg | grep -iE "oom|killed process" | grep -i java | tail -10 \
  || echo "No OOM kills found in dmesg"

echo ""
echo "=== [5] GC Pause Warnings (>= 1000ms) ==="
grep -E "GCInspector.*[0-9]{4,}ms" "$CASS_LOG" 2>/dev/null | tail -10 \
  || echo "No long GC pauses found"

echo ""
echo "=== [6] Network Address Configuration ==="
if [ -f "$CASS_YAML" ]; then
  grep -E "^(listen_address|rpc_address|broadcast_rpc_address|native_transport_port|start_native_transport)" \
    "$CASS_YAML"
else
  echo "cassandra.yaml not found at $CASS_YAML"
fi

echo ""
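echo "=== [7] JVM Heap Settings ==="
# Heap flags live in jvm.options (3.x) or jvm-server.options (4.x).
# Report every candidate file that exists instead of stopping at the first.
found_jvm_opts=0
for f in /etc/cassandra/jvm.options \
         /etc/cassandra/jvm-server.options \
         /etc/cassandra/jvm11-server.options \
         /etc/cassandra/jvm17-server.options; do
  if [ -f "$f" ]; then
    found_jvm_opts=1
    echo "File: $f"
    grep -E "^-Xm[sx]" "$f" || echo "  (no explicit -Xms/-Xmx set in this file)"
  fi
done
[ "$found_jvm_opts" -eq 1 ] || echo "No JVM options file found at expected paths"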

echo ""
echo "=== [8] Firewall Rules for Port 9042 ==="
iptables -L INPUT -n --line-numbers 2>/dev/null | grep -E "9042|DROP|REJECT" | head -10 \
  || nft list ruleset 2>/dev/null | grep -B2 -A2 9042 \
  || echo "Unable to inspect firewall rules"

echo ""
echo "=== [9] Disk Usage ==="
df -h /var/lib/cassandra /var/log/cassandra 2>/dev/null || df -h /

echo ""
echo "=== [10] Cluster Status ==="
nodetool status 2>/dev/null || echo "nodetool unavailable -- Cassandra may not be running"

echo ""
echo "=== [11] Thread Pool Dropped Messages ==="
nodetool tpstats 2>/dev/null | grep -E "Stage|Dropped" | head -30 \
  || echo "nodetool unavailable"

echo ""
echo "=== [12] Compaction Backlog ==="
nodetool compactionstats 2>/dev/null | head -20 || echo "nodetool unavailable"

echo ""
echo "=== Diagnostic Complete ==="
echo "Reference: https://cassandra.apache.org/doc/latest/cassandra/troubleshooting/index.html"

Error Medic Editorial

Error Medic Editorial is a team of senior DevOps engineers and SREs with hands-on experience operating Apache Cassandra clusters at scale in production. Our troubleshooting guides are derived from real incident postmortems and reviewed for technical accuracy against current Cassandra documentation.
