Error Medic

Cassandra 'Connection Refused' on Port 9042: Complete Troubleshooting Guide

Fix Cassandra connection refused errors on port 9042. Diagnose OOM kills, misconfigured listen_address, firewall blocks, slow queries, and data corruption with

Key Takeaways
  • Root cause #1: JVM heap exhaustion triggers a silent OS OOM kill — Cassandra writes no log entry. Check `dmesg | grep -i oom` and raise -Xmx in jvm.options to min(RAM/4, 8GB).
  • Root cause #2: Native transport misconfiguration — listen_address set to 0.0.0.0 (invalid) or start_native_transport: false prevents port 9042 from opening even when the process is running.
  • Root cause #3: Firewall rules blocking port 9042, or a full disk causing Cassandra to halt new connections to prevent data loss.
  • Root cause #4: Commitlog or SSTable corruption prevents node startup — identified by FSReadError or CorruptSSTableException in /var/log/cassandra/system.log.
  • Quick fix sequence: run `systemctl status cassandra`, `ss -tlnp | grep 9042`, and `dmesg | grep -i oom`, then inspect system.log for ERROR or FATAL lines before attempting any restart.
Fix Approaches Compared
Method | When to Use | Time | Risk
Restart Cassandra service | Service stopped, no corruption errors in log | 2-5 min | Low
Increase JVM heap (-Xmx) | OOM kills in dmesg or GC pauses > 5s in system.log | 10 min + restart | Low
Fix listen_address / rpc_address | Process running but port 9042 not listening | 5 min + restart | Low
Open firewall for port 9042 | nc -zv from client fails, no ACCEPT rule in iptables | 2 min | Low
Remove corrupted commitlog | FSReadError on startup, node refuses to start | 15 min | Medium (recent writes may be lost)
nodetool scrub --skip-corrupted | CorruptSSTableException during reads, node online | 30-120 min | Medium (corrupt rows dropped)
nodetool repair -pr | Data inconsistency after node recovery | Hours | High (heavy I/O; run in a low-traffic window)
Restore from snapshot | Severe unrecoverable data corruption | Hours | High (requires a recent valid backup)

Understanding Cassandra "Connection Refused" Errors

When a client receives Connection refused on port 9042, the TCP handshake itself is failing: Cassandra is not running, is not bound to the expected address, or a firewall is actively rejecting the connection. A firewall that silently drops the SYN packet produces a connect timeout instead of a refusal. Both cases differ from authentication errors (the TCP connection completes before the server rejects the login) and read timeouts (the connection succeeds but the query is slow).

Common error messages:

com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed
  (tried: /10.0.0.1:9042 (TransportException: [/10.0.0.1:9042] Cannot connect))

cassandra.cluster.NoHostAvailable: ('Unable to connect to any servers',
  {'10.0.0.1': ConnectionRefusedError(111, 'Connection refused')})
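
To tell a refusal apart from a silent drop from the client side, a probe along the following lines may help. This is a minimal sketch, assuming bash (for its /dev/tcp redirection) and the GNU coreutils `timeout` command; `classify_probe` and `probe_port` are hypothetical helper names, not part of any Cassandra tooling.

```shell
# Sketch: classify a TCP connection attempt to the CQL port.
# classify_probe and probe_port are hypothetical helper names.

# Map the exit status of a timeout-wrapped connect attempt to a label.
classify_probe() {
  case "$1" in
    0)   echo "open" ;;      # handshake completed, something is listening
    124) echo "timeout" ;;   # GNU timeout fired: packets are probably being dropped
    *)   echo "refused" ;;   # fast failure: refused, or another immediate error
  esac
}

# Try to open a TCP connection with bash's /dev/tcp and report the outcome.
probe_port() {
  local host="$1" port="${2:-9042}" rc=0
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null || rc=$?
  classify_probe "$rc"
}
```

For example, `probe_port 10.0.0.1 9042` run from the application server prints `open`, `timeout`, or `refused`. A `refused` result points at the node itself (service down or wrong bind address), while `timeout` points at the network path.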

Step 1: Verify Cassandra Service State

# Check service state
systemctl status cassandra --no-pager -l

# Confirm JVM process exists
ps aux | grep CassandraDaemon | grep -v grep

# Verify port 9042 is bound
ss -tlnp | grep 9042

If ss -tlnp | grep 9042 returns nothing, Cassandra is not listening. Read the last startup attempt before restarting:

tail -100 /var/log/cassandra/system.log | grep -iE "error|exception|fatal|killed"
journalctl -u cassandra -n 50 --no-pager

Step 2: Diagnose Out-of-Memory (OOM) Kills

OOM kills are the most frequent silent cause of Cassandra failures. The Linux kernel terminates the JVM without writing to Cassandra's logs.

# Kernel OOM evidence
dmesg | grep -iE "oom|killed process" | grep -i java

# GC pressure warnings
grep -E "GCInspector.*[0-9]{4,}ms" /var/log/cassandra/system.log | tail -20
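
The GC grep above matches any pause of four or more digits. To filter by a specific threshold, the pause length can be extracted and compared numerically. The sketch below assumes GCInspector lines contain the wording "GC in <N>ms"; `gc_pauses_over` is a hypothetical helper name.

```shell
# gc_pauses_over is a hypothetical helper. It assumes GCInspector log lines
# contain "... GC in <N>ms" and reads log text on stdin.
gc_pauses_over() {
  local threshold_ms="$1"
  # Extract the pause length, then keep values at or above the threshold.
  sed -nE 's/.*GCInspector.* in ([0-9]+)ms.*/\1/p' |
    awk -v t="$threshold_ms" '$1 + 0 >= t + 0'
}
```

A possible invocation is `gc_pauses_over 5000 < /var/log/cassandra/system.log | sort -n | tail -5`, which lists the five longest pauses of at least five seconds.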

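A long GC pause warning looks like this:

WARN  [GCInspector] GCInspector.java:286 - G1 Young Generation GC in 11432ms.
G1 Eden Space: 8388608 -> 0; G1 Old Gen: 7516192768 -> 7814037504;

Pauses of this length usually mean the heap is too small for the workload: peers may mark the node down while it is paused, and an OutOfMemoryError can follow, at which point the JVM exits. Adjust the heap in /etc/cassandra/jvm.options (Cassandra 3.x) or /etc/cassandra/jvm-server.options (Cassandra 4.0+); the exact file name depends on the installed version: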

# Rule: min(total_RAM / 4, 8GB). Never exceed 31GB.
# For a 32GB RAM server:
-Xms8G
-Xmx8G

Set -Xms equal to -Xmx to prevent resize pauses. Verify OS file descriptor limits:

ulimit -n  # Minimum 100000 for production Cassandra
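
The sizing rule above can be turned into a small calculation. This is a sketch of the guide's min(total_RAM / 4, 8GB) rule of thumb only; `recommended_heap_mb` and `print_heap_flags` are hypothetical helper names, and reading /proc/meminfo assumes Linux.

```shell
# recommended_heap_mb is a hypothetical helper implementing the rule of thumb
# min(total_RAM / 4, 8 GB), with all values in MB.
recommended_heap_mb() {
  local total_mb="$1"
  local quarter=$(( total_mb / 4 ))
  local cap=8192
  if [ "$quarter" -lt "$cap" ]; then echo "$quarter"; else echo "$cap"; fi
}

# Read total RAM from /proc/meminfo (Linux only) and print matching JVM flags.
print_heap_flags() {
  local total_kb heap_mb
  total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  heap_mb=$(recommended_heap_mb $(( total_kb / 1024 )))
  printf -- '-Xms%sM\n-Xmx%sM\n' "$heap_mb" "$heap_mb"
}
```

Running `print_heap_flags` on the node prints a pair of equal -Xms and -Xmx lines that can be reviewed before copying them into the JVM options file.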

Step 3: Check Native Transport Configuration

If the process is running but port 9042 is not listening:

grep -E "^(listen_address|rpc_address|broadcast_rpc_address|native_transport_port|start_native_transport)" \
  /etc/cassandra/cassandra.yaml

Correct production settings:

listen_address: 10.0.0.1         # Specific IP — never 0.0.0.0
rpc_address: 0.0.0.0             # Accept client connections on all interfaces
broadcast_rpc_address: 10.0.0.1  # IP advertised to clients
native_transport_port: 9042
start_native_transport: true

Critical mistake: listen_address: 0.0.0.0 is invalid and causes startup failure. Always use a specific IP for listen_address and pair rpc_address: 0.0.0.0 with broadcast_rpc_address.
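
The checks described above can be automated before a restart. The following is a minimal sketch, assuming the settings appear as simple top-level `key: value` lines; it is not a YAML parser, and `yaml_value` and `check_cassandra_yaml` are hypothetical helper names.

```shell
# yaml_value is a hypothetical helper: print the value of a top-level key,
# with inline comments, quotes, and whitespace removed.
yaml_value() {
  sed -nE "s/^$2:[[:space:]]*([^#]*).*/\1/p" "$1" | head -n 1 | tr -d "[:space:]\"'"
}

# check_cassandra_yaml is a hypothetical helper that flags the address
# mistakes described in this step. It returns non-zero if any are found.
check_cassandra_yaml() {
  local yaml="$1" problems=0
  local listen rpc bcast native
  listen=$(yaml_value "$yaml" listen_address)
  rpc=$(yaml_value "$yaml" rpc_address)
  bcast=$(yaml_value "$yaml" broadcast_rpc_address)
  native=$(yaml_value "$yaml" start_native_transport)

  if [ "$listen" = "0.0.0.0" ]; then
    echo "listen_address must be a specific IP, not 0.0.0.0"; problems=1
  fi
  if [ "$rpc" = "0.0.0.0" ] && [ -z "$bcast" ]; then
    echo "rpc_address 0.0.0.0 requires broadcast_rpc_address"; problems=1
  fi
  if [ "$native" = "false" ]; then
    echo "start_native_transport is false; port 9042 will not open"; problems=1
  fi
  [ "$problems" -eq 0 ] && echo "OK"
  return "$problems"
}
```

A typical call would be `check_cassandra_yaml /etc/cassandra/cassandra.yaml`, which prints either `OK` or one line per problem found.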

Step 4: Firewall and Network Verification

# Test TCP reachability from the application server
nc -zv <cassandra_node_ip> 9042

# iptables (RHEL, Amazon Linux, older Ubuntu)
iptables -L INPUT -n -v --line-numbers | grep -E "9042|DROP|REJECT"

# nftables (Debian 12+, Ubuntu 22.04+)
nft list ruleset | grep -B2 -A2 9042

Required open ports:

Port Purpose
7000 Inter-node gossip (plain)
7001 Inter-node gossip (TLS)
7199 JMX monitoring
9042 Native CQL (clients)
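
If the checks show that no ACCEPT rule covers these ports, the rules can be drafted before touching the firewall. The sketch below only prints iptables commands for review and changes nothing; `print_allow_rules` is a hypothetical helper name, and the subnets in the usage note are placeholders to replace with your own.

```shell
# print_allow_rules is a hypothetical helper. It prints iptables commands
# (a dry run) for one source CIDR and a list of TCP ports; it does not
# modify the firewall. Review the output before running it as root.
print_allow_rules() {
  local source_cidr="$1"; shift
  local port
  for port in "$@"; do
    echo "iptables -I INPUT -p tcp -s ${source_cidr} --dport ${port} -j ACCEPT"
  done
}
```

For example, `print_allow_rules 10.0.1.0/24 9042` drafts a rule for an application subnet, and `print_allow_rules 10.0.0.0/24 7000 7001` drafts rules for peer nodes. Restricting each port to the sources that need it keeps JMX (7199) and gossip off client networks.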

Step 5: Slow Queries and Timeout Troubleshooting

If connections succeed but you see OperationTimedOutException or WriteTimeoutException, the problem is throughput, not connectivity.

# Check thread pool saturation — non-zero Dropped is critical
nodetool tpstats

# Examine latency percentiles
nodetool proxyhistograms

# Check compaction backlog
nodetool compactionstats

# Identify hot tables
nodetool tablestats <keyspace>.<table> | grep -E "Maximum|Mean|Local read|Local write"

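Non-zero counts in the Dropped column of the message-type summary (for example READ or MUTATION), or steadily growing Pending counts for ReadStage, MutationStage, or ViewMutationStage, mean Cassandra is shedding or queuing work. Enable slow query logging in cassandra.yaml: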

slow_query_log_timeout_in_ms: 500

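Then review the slow-operation entries. Depending on the Cassandra version and logging configuration, the details may be written to debug.log rather than system.log, so search both:

grep -i "slow" /var/log/cassandra/system.log /var/log/cassandra/debug.log | tail -20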
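
Scanning tpstats output by eye is error-prone on a busy node. The sketch below pulls out only the message types with a non-zero dropped count. It assumes the common layout in which tpstats prints a header line starting with "Message type" followed by rows of the form `<type> <count>`; `dropped_messages` is a hypothetical helper name.

```shell
# dropped_messages is a hypothetical helper. It assumes tpstats prints a
# "Message type ... Dropped" header followed by "<type> <count> ..." rows,
# and reads that output on stdin.
dropped_messages() {
  awk '
    /^Message type/ { in_dropped = 1; next }   # start of the dropped section
    in_dropped && NF >= 2 && $2 ~ /^[0-9]+$/ && $2 > 0 { print $1, $2 }
  '
}
```

Used as `nodetool tpstats | dropped_messages`, it prints nothing on a healthy node and one line per affected message type otherwise.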

Step 6: Data Corruption Diagnosis and Recovery

Commitlog corruption prevents startup with:

ERROR [main] CassandraDaemon.java:689 - Exception encountered during startup
org.apache.cassandra.io.FSReadError: java.io.IOException:
  Corrupt commitlog /var/lib/cassandra/commitlog/CommitLog-7-1639000000000.log

Commitlog recovery (risks losing any writes on this node that were not yet flushed to SSTables; run a repair afterwards to recover them from other replicas):

systemctl stop cassandra
mv /var/lib/cassandra/commitlog /var/lib/cassandra/commitlog.bak.$(date +%s)
mkdir -p /var/lib/cassandra/commitlog
chown cassandra:cassandra /var/lib/cassandra/commitlog
systemctl start cassandra

For SSTable corruption discovered at runtime:

# Online scrub — drops corrupt rows, preserves valid data
nodetool scrub --skip-corrupted <keyspace> <table>

# Offline scrub for severe corruption
systemctl stop cassandra
sstablescrub --skip-corrupted <keyspace> <table>
systemctl start cassandra

# Restore replica consistency
nodetool repair -pr <keyspace>
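
Choosing between the two recovery paths above depends on which signature appears in the log. The following sketch maps the signatures quoted in this guide to the matching path. It is an illustration only: `classify_corruption` is a hypothetical helper name, and real logs may contain other corruption messages that it does not recognize.

```shell
# classify_corruption is a hypothetical helper. It reads log text on stdin
# and names the recovery path from this guide that matches the signature.
classify_corruption() {
  local log
  log=$(cat)
  if printf '%s\n' "$log" | grep -q 'Corrupt commitlog'; then
    echo "commitlog: move the commitlog directory aside, then restart"
  elif printf '%s\n' "$log" | grep -q 'CorruptSSTableException'; then
    echo "sstable: run nodetool scrub, then nodetool repair -pr"
  else
    echo "none: no corruption signature found"
  fi
}
```

For instance, `tail -500 /var/log/cassandra/system.log | classify_corruption` prints a single line naming the suggested path.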

Step 7: Validate Cluster Health

# All nodes should show UN (Up/Normal)
nodetool status

# Verify gossip propagation
nodetool gossipinfo | grep -E "STATUS|RPC_ADDRESS" | head -20

Healthy nodetool status output:

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns  Host ID           Rack
UN  10.0.0.1    45.6 GiB   256     33.3% abc123-...-def456  rack1
UN  10.0.0.2    46.1 GiB   256     33.3% ghi789-...-jkl012  rack1
UN  10.0.0.3    44.9 GiB   256     33.4% mno345-...-pqr678  rack1

Any node showing DN (Down/Normal) is unreachable. Investigate with journalctl -u cassandra -n 200 --no-pager on that host immediately.
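
On larger clusters it can help to list only the nodes that are not UN. This is a minimal sketch, assuming node rows begin with the two-letter status code and carry the address in the second column, as in the sample output above; `unhealthy_nodes` is a hypothetical helper name.

```shell
# unhealthy_nodes is a hypothetical helper. It assumes node rows start with a
# two-letter code (U or D, then N, L, J, or M) with the address in column 2,
# and reads nodetool status output on stdin.
unhealthy_nodes() {
  awk '$1 ~ /^[UD][NLJM]$/ && $1 != "UN" { print $2, $1 }'
}
```

Used as `nodetool status | unhealthy_nodes`, it prints one `address status` pair per node that is down, joining, leaving, or moving, and nothing when every node is UN.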

Rapid Diagnostic Script

#!/usr/bin/env bash
# Cassandra Connection Refused -- Rapid Diagnostic Script
# Run as root or the cassandra OS user on the affected node
# Usage: bash cassandra-diag.sh 2>&1 | tee /tmp/cass-diag-$(date +%Y%m%d-%H%M%S).log

set -uo pipefail

CASS_LOG="/var/log/cassandra/system.log"
CASS_YAML="/etc/cassandra/cassandra.yaml"

echo "=== [1] Cassandra Service Status ==="
systemctl status cassandra --no-pager -l 2>&1 | head -20 || true

echo ""
echo "=== [2] Port 9042 Listening? ==="
ss -tlnp | grep 9042 || echo "ALERT: Nothing is listening on port 9042"

echo ""
echo "=== [3] Recent Errors in system.log ==="
if [ -f "$CASS_LOG" ]; then
  grep -iE "error|exception|fatal|oom|killed|corrupt" "$CASS_LOG" | tail -30
else
  echo "system.log not found at $CASS_LOG"
fi

echo ""
echo "=== [4] OOM Killer Activity ==="
dmesg | grep -iE "oom|killed process" | grep -i java | tail -10 \
  || echo "No OOM kills found in dmesg"

echo ""
echo "=== [5] GC Pause Warnings (>= 1000ms) ==="
grep -E "GCInspector.*[0-9]{4,}ms" "$CASS_LOG" 2>/dev/null | tail -10 \
  || echo "No long GC pauses found"

echo ""
echo "=== [6] Network Address Configuration ==="
if [ -f "$CASS_YAML" ]; then
  grep -E "^(listen_address|rpc_address|broadcast_rpc_address|native_transport_port|start_native_transport)" \
    "$CASS_YAML"
else
  echo "cassandra.yaml not found at $CASS_YAML"
fi

echo ""
echo "=== [7] JVM Heap Settings ==="
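# Track whether any options file exists; the loop's exit status alone cannot tell us.
jvm_found=0
for f in /etc/cassandra/jvm.options \
         /etc/cassandra/jvm-server.options \
         /etc/cassandra/jvm11-server.options \
         /etc/cassandra/jvm17-server.options; do
  if [ -f "$f" ]; then
    jvm_found=1
    echo "File: $f"
    grep -E "^-Xm[sx]" "$f" || echo "  (no explicit -Xms/-Xmx set in this file)"
  fi
done
[ "$jvm_found" -eq 1 ] || echo "No JVM options file found at expected paths"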

echo ""
echo "=== [8] Firewall Rules for Port 9042 ==="
iptables -L INPUT -n --line-numbers 2>/dev/null | grep -E "9042|DROP|REJECT" | head -10 \
  || nft list ruleset 2>/dev/null | grep -B2 -A2 9042 \
  || echo "Unable to inspect firewall rules"

echo ""
echo "=== [9] Disk Usage ==="
df -h /var/lib/cassandra /var/log/cassandra 2>/dev/null || df -h /

echo ""
echo "=== [10] Cluster Status ==="
nodetool status 2>/dev/null || echo "nodetool unavailable -- Cassandra may not be running"

echo ""
echo "=== [11] Thread Pool Dropped Messages ==="
nodetool tpstats 2>/dev/null | grep -E "Stage|Dropped" | head -30 \
  || echo "nodetool unavailable"

echo ""
echo "=== [12] Compaction Backlog ==="
nodetool compactionstats 2>/dev/null | head -20 || echo "nodetool unavailable"

echo ""
echo "=== Diagnostic Complete ==="
echo "Reference: https://cassandra.apache.org/doc/latest/cassandra/troubleshooting/index.html"

Error Medic Editorial

Error Medic Editorial is a team of senior DevOps engineers and SREs with hands-on experience operating Apache Cassandra clusters at scale in production. Our troubleshooting guides are derived from real incident postmortems and reviewed for technical accuracy against current Cassandra documentation.
