Error Medic

PostgreSQL 'Connection Refused': Fix Service Failures, OOM, High CPU, Segfaults, and Crashes

Fix PostgreSQL connection refused errors fast. Step-by-step commands for service not starting, disk full, OOM kills, too many connections, segfaults, and high CPU.

Key Takeaways
  • 'Connection refused' almost always means PostgreSQL is not running or is listening on the wrong interface—check with systemctl status postgresql and tail the server log in /var/log/postgresql/
  • Disk full (ENOSPC in WAL directory), OOM kills by the Linux kernel, and wrong file permissions on PGDATA or the Unix socket directory are the three most common startup failure causes
  • FATAL: too many connections is solved immediately by terminating idle sessions and permanently by deploying PgBouncer in transaction pooling mode—raising max_connections alone is only a stopgap
  • Segfaults and core dumps are crash-safe in PostgreSQL: the server replays WAL on restart, but you must identify the root cause (bad RAM, a known bug, or a corrupt extension) to prevent recurrence
  • High CPU from autovacuum storms or missing indexes is diagnosed with pg_stat_activity and EXPLAIN (ANALYZE, BUFFERS); kill runaway queries and tune autovacuum cost parameters for lasting relief
Fix Approaches Compared
Method | When to Use | Time | Risk
Restart PostgreSQL service | Service crashed, inactive, or stuck | < 1 min | Low
Free disk space / expand volume | ENOSPC in logs, WAL directory >80% full | 5–30 min | Low
Adjust max_connections and restart | FATAL: too many connections at startup | 2–5 min | Medium — requires restart
Deploy PgBouncer (transaction mode) | Persistent connection exhaustion in production | 30–60 min | Low — zero downtime
Tune shared_buffers / work_mem + restart | OOM kills, postmaster memory pressure | 10 min + restart | Medium
Terminate idle/long-running queries | High CPU, lock waits, autovacuum blocked | < 1 min | Low
Fix PGDATA and socket permissions | Permission denied on startup or connect | 2–5 min | Low
Upgrade PostgreSQL minor version | Segfault from a known upstream bug | 1–4 hours | High — test on replica first

Understanding 'PostgreSQL Connection Refused'

When you run psql and see:

psql: error: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed:
    Connection refused

or the TCP variant:

psql: error: connection to server at "127.0.0.1", port 5432 failed: Connection refused

it means either PostgreSQL never started, it crashed after starting, or it is listening on a different address or port than you expect. Each scenario has a distinct signature in the logs.
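
The three cases can often be told apart from the wording of the error itself. A minimal sketch of that triage logic, using the two sample messages above (the printed labels are illustrative, not psql output):

```shell
#!/usr/bin/env bash
# Heuristic triage of a psql connection error string.
# The labels printed here are my own, not psql output.
classify() {
  case "$1" in
    *'.s.PGSQL.'*'Connection refused'*)
      echo "socket refused: server down, or unix_socket_directories points elsewhere" ;;
    *'port '*'Connection refused'*)
      echo "tcp refused: server down, or listen_addresses/port differ from what you expect" ;;
    *'No such file or directory'*)
      echo "no socket file: server never started; check systemctl status" ;;
    *)
      echo "unrecognized error" ;;
  esac
}

classify 'psql: error: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: Connection refused'
classify 'psql: error: connection to server at "127.0.0.1", port 5432 failed: Connection refused'
```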


Step 1: Confirm Whether PostgreSQL Is Running

systemctl status postgresql          # Debian/Ubuntu
systemctl status postgresql@14-main  # Debian multi-version
systemctl status postgresql-14       # RHEL/CentOS

If the unit is failed or activating (auto-restart), pull the crash reason:

journalctl -u postgresql --since "30 minutes ago" --no-pager

Also check the PostgreSQL server log:

tail -n 100 /var/log/postgresql/postgresql-14-main.log   # Debian/Ubuntu default
# RHEL path:
tail -n 100 /var/lib/pgsql/data/log/postgresql-$(date +%a).log

Startup failure signatures and their causes:

Log line | Root cause
No space left on device | Disk full — WAL or data directory
Out of memory | Linux OOM killer terminated the postmaster
Address already in use | Port 5432 held by another process
data directory ... has wrong ownership | PGDATA not owned by the postgres user
permission denied on socket path | /var/run/postgresql permissions wrong
database system identifier differs | PGDATA points at the wrong cluster
server process (PID N) was terminated by signal 11 | Segfault / core dump
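
The signatures above can be grepped for mechanically. A sketch that maps the first recognized signature in a log to its cause; the sample log fed to it is fabricated for illustration:

```shell
#!/usr/bin/env bash
# Map known startup-failure signatures (from the table above) to a
# one-line diagnosis. Reads a server log on stdin.
diagnose_log() {
  awk '
    /No space left on device/  { print "disk full (free space under pg_wal)"; found=1; exit }
    /Out of memory/            { print "OOM kill (check dmesg)"; found=1; exit }
    /Address already in use/   { print "port conflict (check ss -tlnp)"; found=1; exit }
    /wrong ownership/          { print "PGDATA ownership (chown postgres:postgres)"; found=1; exit }
    /terminated by signal 11/  { print "segfault (enable core dumps)"; found=1; exit }
    END { if (!found) print "no known signature" }
  '
}

# Fabricated two-line sample log:
diagnose_log <<'EOF'
LOG:  starting PostgreSQL 14.10
FATAL:  could not write lock file "postmaster.pid": No space left on device
EOF
```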

Step 2: Fix Disk Full (ENOSPC)

PostgreSQL will refuse all connections and crash the moment it cannot write a WAL segment. Check disk usage first:

df -h /var/lib/postgresql
du -sh /var/lib/postgresql/*/main/pg_wal/
du -sh /var/log/postgresql/

Immediate remediation:

# Delete WAL backup history files older than 7 days; never delete
# WAL segment files by hand, PostgreSQL recycles those itself
find /var/lib/postgresql/*/main/pg_wal/ -name '*.backup' -mtime +7 -delete

# Delete old server logs
find /var/log/postgresql/ -name '*.log' -mtime +14 -delete

# A pg_dump written to /tmp can also fill the volume
du -sh /tmp

# Stale temporary sort files (safe to delete only while the server is stopped)
find /var/lib/postgresql/*/main/base/pgsql_tmp/ -type f -delete 2>/dev/null

After freeing space, start PostgreSQL and run VACUUM FULL on the most bloated tables to reclaim physical space; note that VACUUM FULL takes an ACCESS EXCLUSIVE lock, so schedule it off-peak.
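
The ">80% full" trigger from the comparison table is easy to encode as a monitoring guard. A sketch with hard-coded numbers; in practice the used/total figures would come from parsing df -k on the WAL volume:

```shell
#!/usr/bin/env bash
# Warn when filesystem usage crosses a threshold percentage.
# Inputs are illustrative; feed real numbers from `df -k`.
disk_warn() {   # disk_warn USED_KB TOTAL_KB THRESHOLD_PCT
  awk -v u="$1" -v t="$2" -v th="$3" 'BEGIN {
    pct = u * 100 / t
    printf "%.0f%% used: %s\n", pct, (pct >= th ? "WARN" : "OK")
  }'
}

disk_warn 85 100 80   # prints "85% used: WARN"
disk_warn 40 100 80   # prints "40% used: OK"
```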


Step 3: Diagnose and Fix Out-of-Memory (OOM) Kills

The Linux kernel OOM killer terminates the PostgreSQL postmaster when system memory is exhausted. You will see in dmesg:

kernel: Out of memory: Killed process 18342 (postgres) total-vm:8388608kB, anon-rss:7340032kB

Check for OOM events:

dmesg | grep -i 'out of memory\|killed process\|oom_kill'
grep -i 'oom\|killed' /var/log/syslog /var/log/messages 2>/dev/null | tail -30  # syslog on Debian, messages on RHEL

Memory configuration in postgresql.conf:

shared_buffers = 2GB            # ~25% of total RAM
work_mem = 64MB                 # per sort/hash node; one query may use several
maintenance_work_mem = 512MB    # VACUUM, CREATE INDEX, ALTER TABLE
effective_cache_size = 6GB      # planner hint only — 50–75% of total RAM
max_connections = 100           # lower this to reduce peak memory footprint
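
As the work_mem comment hints, a rough worst-case memory bound is shared_buffers plus max_connections times work_mem; this understates reality, since one query can make several work_mem-sized allocations. A back-of-envelope check for the values above:

```shell
#!/usr/bin/env bash
# Crude peak-memory estimate in MB:
#   shared_buffers + max_connections * work_mem
# (an underestimate: a single query may allocate work_mem more than once)
peak_mb() {   # peak_mb SHARED_BUFFERS_MB WORK_MEM_MB MAX_CONNECTIONS
  echo $(( $1 + $2 * $3 ))
}

peak_mb 2048 64 100   # 2048 + 64*100 = 8448 MB, over budget on an 8 GB host
```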

Protect the postmaster from the OOM killer permanently via systemd:

# /etc/systemd/system/postgresql.service.d/oom.conf
[Service]
OOMScoreAdjust=-1000

Then apply the drop-in:

systemctl daemon-reload && systemctl restart postgresql

Step 4: Resolve 'Too Many Connections'

The full error is:

FATAL:  remaining connection slots are reserved for non-replication superuser connections

or simply FATAL: too many connections. PostgreSQL reserves a few slots for superusers (superuser_reserved_connections, default 3), so you can still connect as postgres:

psql -U postgres -c "SELECT count(*), state FROM pg_stat_activity GROUP BY state ORDER BY 1 DESC;"

Terminate stale idle connections without a restart:

SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle'
  AND state_change < now() - interval '5 minutes'
  AND pid <> pg_backend_pid();

Raise the hard limit in postgresql.conf and restart (note that each additional slot raises the worst-case memory footprint):

max_connections = 500

For a production fix without repeated restarts, install PgBouncer:

apt-get install pgbouncer   # Debian/Ubuntu
yum install pgbouncer       # RHEL

Minimal /etc/pgbouncer/pgbouncer.ini:

[databases]
mydb = host=127.0.0.1 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
pool_mode = transaction
max_client_conn = 2000
default_pool_size = 20
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt

Step 5: Fix Permission Denied Errors

OS-level permission errors prevent startup:

pg_ctl: could not open file ".../postmaster.pid": Permission denied
could not connect to server: Permission denied
  Is the server running locally and accepting connections on socket "/var/run/postgresql/.s.PGSQL.5432"?

Fix ownership:

chown -R postgres:postgres /var/lib/postgresql
chmod 700 /var/lib/postgresql/14/main
chown postgres:postgres /var/run/postgresql
chmod 775 /var/run/postgresql
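
After fixing ownership, verify the mode numerically rather than eyeballing ls -l. A sketch using GNU stat, demonstrated on a throwaway directory since the real PGDATA path varies by distribution:

```shell
#!/usr/bin/env bash
# Check that a directory has the expected octal mode (700 for PGDATA).
# Uses GNU stat; demonstrated on a temp dir, not a real PGDATA.
check_mode() {   # check_mode DIR EXPECTED_OCTAL_MODE
  local mode
  mode=$(stat -c '%a' "$1")
  if [ "$mode" = "$2" ]; then
    echo "OK: $1 is $mode"
  else
    echo "BAD: $1 is $mode, want $2"
  fi
}

d=$(mktemp -d)
chmod 700 "$d"
check_mode "$d" 700   # prints "OK: ... is 700"
rmdir "$d"
```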

For database-level authentication errors (FATAL: password authentication failed or FATAL: no pg_hba.conf entry for host), edit /etc/postgresql/14/main/pg_hba.conf:

# TYPE  DATABASE  USER    ADDRESS          METHOD
host    all       myapp   10.0.0.0/8       scram-sha-256
local   all       all                      peer

Reload without a full restart:

SELECT pg_reload_conf();
-- or from the shell:
systemctl reload postgresql
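
When editing pg_hba.conf, remember that rules are matched top to bottom and the first matching line wins, so a broad trust rule near the top silently overrides stricter rules below it. A sketch that flags weak auth methods in a fabricated rules sample:

```shell
#!/usr/bin/env bash
# Flag weak auth methods (trust, cleartext password) in pg_hba.conf-style
# input. The sample rules below are fabricated for illustration.
audit_hba() {   # reads pg_hba-format lines on stdin
  awk '!/^[[:space:]]*#/ && NF >= 4 {
    method = $NF
    if (method == "trust" || method == "password")
      printf "WEAK line %d: %s\n", NR, $0
  }'
}

audit_hba <<'EOF'
# TYPE  DATABASE  USER  ADDRESS      METHOD
host    all       all   0.0.0.0/0    trust
host    all       app   10.0.0.0/8   scram-sha-256
EOF
```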

Step 6: Handle Segfaults and Core Dumps

PostgreSQL logs a segfault as:

LOG:  server process (PID 22791) was terminated by signal 11: Segmentation fault
DETAIL:  Failed process was running: SELECT * FROM large_table ORDER BY id;
LOG:  terminating any other active server processes
LOG:  all server processes terminated; reinitializing

PostgreSQL is crash-safe: it replays WAL on the next startup and returns to a consistent state. Start the service and verify:

systemctl start postgresql
tail -f /var/log/postgresql/postgresql-14-main.log
# Wait for: LOG: database system is ready to accept connections

To diagnose the root cause, enable core dumps:

# Allow postgres user to dump cores
ulimit -c unlimited
echo '/tmp/core.%e.%p.%t' | tee /proc/sys/kernel/core_pattern

# Reproduce and analyze
gdb $(which postgres) /tmp/core.postgres.22791.*
(gdb) bt full
(gdb) info registers
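
If no core file appears, the usual culprit is a zero core-size limit in the environment that launched postgres. You can sanity-check the ulimit mechanics in a subshell; for the actual service, set LimitCORE=infinity in the systemd unit instead, since an interactive ulimit does not affect a running daemon:

```shell
#!/usr/bin/env bash
# Confirm a raised core-size limit is visible inside a subshell.
# For postgres itself, use LimitCORE=infinity in the unit file.
( ulimit -c unlimited 2>/dev/null
  echo "core size limit: $(ulimit -c)" )
```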

Common segfault causes:

  • Known PostgreSQL bug — upgrade to the latest minor release (e.g., 14.10 → 14.12).
  • Faulty RAM — run memtest86+ for a full memory test cycle.
  • Corrupt or incompatible extension — comment out extensions in shared_preload_libraries one at a time and restart.
  • Kernel version incompatibility — check the PostgreSQL release notes for your version.

Step 7: Diagnose High CPU and Slow Queries

High CPU without an outright crash degrades response times until the service becomes effectively unavailable.

-- Active queries running longer than 30 seconds
SELECT pid,
       now() - query_start AS duration,
       left(query, 100)    AS query_snippet,
       state,
       wait_event_type,
       wait_event
FROM pg_stat_activity
WHERE state <> 'idle'
  AND query_start < now() - interval '30 seconds'
ORDER BY duration DESC;

-- Tables with the most dead tuples (autovacuum bloat)
SELECT relname,
       n_dead_tup,
       n_live_tup,
       last_autovacuum,
       last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 15;

Kill a runaway query:

-- Replace 12345 with a pid taken from pg_stat_activity
SELECT pg_cancel_backend(12345);    -- sends SIGINT (graceful, cancels the query)
SELECT pg_terminate_backend(12345); -- sends SIGTERM (forceful, drops the session)

Tune autovacuum to be less disruptive:

autovacuum_vacuum_scale_factor = 0.01   # trigger earlier on large tables
autovacuum_vacuum_cost_delay = 2ms      # reduce I/O throttle
autovacuum_max_workers = 4
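
For context, autovacuum fires on a table once dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor × reltuples (PostgreSQL's documented formula). A quick calculator comparing the stock 0.2 scale factor with the 0.01 above on a 10-million-row table:

```shell
#!/usr/bin/env bash
# Dead-tuple count that triggers autovacuum on a table:
#   threshold + scale_factor * reltuples
av_trigger() {   # av_trigger THRESHOLD SCALE_FACTOR RELTUPLES
  awk -v t="$1" -v s="$2" -v n="$3" 'BEGIN { printf "%d\n", t + s * n }'
}

av_trigger 50 0.2 10000000    # default settings:  2000050
av_trigger 50 0.01 10000000   # with the 0.01 above: 100050
```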

Step 8: Port Already in Use

If PostgreSQL cannot bind port 5432:

ss -tlnp | grep 5432
lsof -i TCP:5432

Kill the conflicting process or change PostgreSQL's port:

# postgresql.conf
port = 5433

Update all application connection strings and reload.


Monitoring and Prevention

  • Disk alerting: alert at 80% full on the PostgreSQL volume.
  • Connection monitoring: alert when pg_stat_activity count exceeds 80% of max_connections.
  • pg_stat_statements: enable this extension to track aggregate query performance over time.
  • pg_basebackup + WAL archiving: maintain a point-in-time recovery target of at least 24 hours.
  • Prometheus + postgres_exporter: provides dashboards for connections, cache hit ratio, dead tuples, and replication lag.
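
The connection-monitoring bullet reduces to one division. A sketch with illustrative numbers; live values would come from counting pg_stat_activity rows and SHOW max_connections:

```shell
#!/usr/bin/env bash
# Alert when current connections exceed a percentage of max_connections.
# Inputs are illustrative; in production feed values from psql.
conn_alert() {   # conn_alert CURRENT MAX THRESHOLD_PCT
  awk -v c="$1" -v m="$2" -v th="$3" 'BEGIN {
    pct = c * 100 / m
    printf "%d/%d (%.0f%%): %s\n", c, m, pct, (pct >= th ? "ALERT" : "ok")
  }'
}

conn_alert 450 500 80   # prints "450/500 (90%): ALERT"
conn_alert 100 500 80   # prints "100/500 (20%): ok"
```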

All-in-One Diagnostic Script

#!/usr/bin/env bash
# postgresql-diagnose.sh — Rapid triage for PostgreSQL failures
# Run as root or the postgres OS user

set -euo pipefail

PG_LOG_DIR="${PG_LOG_DIR:-/var/log/postgresql}"
PGDATA="${PGDATA:-/var/lib/postgresql}"
PG_PORT="${PGPORT:-5432}"

sep() { echo; echo "=== $* ==="; }

# ── 1. Service status ──────────────────────────────────────────────────────
sep "1. PostgreSQL Service Status"
systemctl status postgresql --no-pager -l 2>/dev/null \
  || systemctl status 'postgresql*' --no-pager -l 2>/dev/null \
  || pg_ctl status 2>/dev/null \
  || echo "Cannot determine service status"

# ── 2. Listening sockets ───────────────────────────────────────────────────
sep "2. Port ${PG_PORT} Listeners"
ss -tlnp | grep "${PG_PORT}" || echo "Nothing listening on port ${PG_PORT}"

# ── 3. Disk space ──────────────────────────────────────────────────────────
sep "3. Disk Usage"
df -h "${PGDATA}"
echo "--- WAL directory size ---"
find "${PGDATA}" -maxdepth 3 -type d -name pg_wal -exec du -sh {} + 2>/dev/null || true
echo "--- Server log directory size ---"
du -sh "${PG_LOG_DIR}" 2>/dev/null || true

# ── 4. Recent log errors ───────────────────────────────────────────────────
sep "4. Recent FATAL/ERROR/PANIC in Server Log"
LATEST_LOG=$(ls -t "${PG_LOG_DIR}"/*.log 2>/dev/null | head -1 || true)
if [[ -n "${LATEST_LOG}" ]]; then
  grep -E 'FATAL|ERROR|PANIC|segfault|oom' "${LATEST_LOG}" | tail -50 || echo "No matching entries in ${LATEST_LOG}"
else
  echo "No log files found in ${PG_LOG_DIR}"
fi

# ── 5. OOM events ──────────────────────────────────────────────────────────
sep "5. OOM Kill Events (dmesg)"
dmesg | grep -i 'out of memory\|oom_kill\|killed process' | tail -20 || echo "None found"

# ── 6. Connection counts ───────────────────────────────────────────────────
sep "6. Current Connection Counts"
psql -U postgres -Aqt -c \
  "SELECT lpad(count(*)::text,5) || ' ' || coalesce(state,'(null)') AS state_count
   FROM pg_stat_activity GROUP BY state ORDER BY count(*) DESC;" \
  2>/dev/null || echo "Cannot connect to PostgreSQL — skipping live checks"

# ── 7. Long-running queries ────────────────────────────────────────────────
sep "7. Queries Running > 30 Seconds"
psql -U postgres -c \
  "SELECT pid,
          now() - query_start                   AS duration,
          wait_event_type || '/' || wait_event  AS wait,
          left(query, 80)                        AS query_snippet
   FROM pg_stat_activity
   WHERE state <> 'idle'
     AND query_start < now() - interval '30 seconds'
   ORDER BY duration DESC
   LIMIT 20;" \
  2>/dev/null || true

# ── 8. Dead-tuple bloat ────────────────────────────────────────────────────
sep "8. Top Tables by Dead Tuples (run VACUUM ANALYZE on these)"
psql -U postgres -c \
  "SELECT relname,
          n_dead_tup,
          n_live_tup,
          round(n_dead_tup::numeric / NULLIF(n_live_tup + n_dead_tup,0) * 100, 1) AS dead_pct,
          last_autovacuum::date
   FROM pg_stat_user_tables
   WHERE n_dead_tup > 10000
   ORDER BY n_dead_tup DESC
   LIMIT 10;" \
  2>/dev/null || true

# ── 9. File permissions ────────────────────────────────────────────────────
sep "9. Critical Directory Permissions"
ls -ld /var/run/postgresql "${PGDATA}" 2>/dev/null || true

# ── 10. Memory pressure ────────────────────────────────────────────────────
sep "10. System Memory"
free -h

echo
echo "Diagnostic complete. Share this output with your DBA or open a support ticket."

Error Medic Editorial

The Error Medic Editorial team is composed of senior DevOps engineers and SREs with combined decades of experience operating PostgreSQL clusters at scale — from single-server deployments to multi-terabyte high-availability installations. They specialize in Linux performance tuning, database reliability engineering, incident response, and post-mortem analysis.
