Error Medic

Troubleshooting PostgreSQL Connection Refused & Service Failures

A complete DevOps guide to fixing PostgreSQL connection refused errors, service crashes, OOM, disk full, high CPU, and too many connections.

Key Takeaways
  • Connection refused usually means the PostgreSQL service is stopped, restarting, or listening on the wrong interface.
  • Service crashes are frequently caused by Out of Memory (OOM) killer, Disk Full errors on the WAL directory, or permission issues.
  • Connection limits ('too many connections') will block new clients; use a connection pooler like PgBouncer instead of blindly increasing max_connections.
  • Quick Fix: Check service status ('systemctl status postgresql') and inspect the latest logs ('journalctl -u postgresql' or '/var/log/postgresql/') to identify the exact root cause.
Fix Approaches Compared
| Method | When to Use | Time | Risk |
| --- | --- | --- | --- |
| Restart Service (`systemctl restart`) | Service is stopped or hung without data corruption | Low | Low |
| Modify `listen_addresses` / `pg_hba.conf` | Network connections are rejected or routing fails | Medium | Low |
| Increase `max_connections` | Seeing `FATAL: sorry, too many clients already` in logs | Low | Medium (can cause OOM) |
| Install & Configure PgBouncer | High connection churn or persistent "too many connections" | High | Low |
| Tune `shared_buffers` & `work_mem` | Experiencing OOM kills or high memory utilization | Medium | Medium (requires restart) |
| Clear Disk Space / Expand Volume | Logs show `No space left on device` (disk full) | Medium | High (if deleting wrong files) |

Understanding the Error: PostgreSQL Connection Refused

When you encounter a psql: error: connection to server at "localhost" (127.0.0.1), port 5432 failed: Connection refused error, it signifies that the client application successfully reached the host, but no PostgreSQL server process was listening on the specified port. This is arguably the most common PostgreSQL error, but it's often a symptom of a deeper issue, ranging from simple configuration oversights to severe resource exhaustion leading to a crash.

In this comprehensive guide, we will troubleshoot not only the standard "connection refused" scenario but also the cascading failures that cause it: service not starting, out of memory (OOM) kills, disk full errors, high CPU utilization, and connection limit exhaustion.

Step 1: Diagnose the Initial State

The first step is always to check the status of the PostgreSQL service and examine the most recent logs. A stopped service is the primary reason for a connection refusal.

1.1 Check Service Status

Use systemctl to verify if the service is active, inactive, or failed. If the service is inactive, it explains the connection refusal. The critical question is why it stopped.
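A minimal sketch of this triage step, with the service state hardcoded for illustration (on a real host you would capture it from `systemctl is-active postgresql`):

```shell
# Sketch: interpret the service state. In production, replace the hardcoded
# value with: state=$(systemctl is-active postgresql)
state="failed"   # example value for illustration
case "$state" in
  active)   diagnosis="Service is up; the refusal is likely listen_addresses, port, or firewall." ;;
  inactive) diagnosis="Service is stopped; start it and check logs for why it exited." ;;
  failed)   diagnosis="Service crashed on startup; inspect journalctl -u postgresql for FATAL/PANIC." ;;
  *)        diagnosis="Unknown state: $state" ;;
esac
echo "$diagnosis"
```

The `failed` branch is the one that warrants the deepest log dive, since it means the service tried to start and could not.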

1.2 Inspect PostgreSQL Logs

The logs are the ultimate source of truth. Depending on your OS and installation, they are typically found in /var/log/postgresql/, /var/lib/pgsql/data/log/, or accessible via journalctl -u postgresql. Look for FATAL, PANIC, or ERROR messages immediately preceding the time the connections started failing.
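The filtering step can be sketched as follows; the log excerpt below is a fabricated sample for illustration, and on a real server you would point `grep` at your actual log file (e.g. one under /var/log/postgresql/):

```shell
# Sketch: filter a log excerpt for FATAL/PANIC lines. The log text is a
# fabricated sample; on a real host run something like:
#   sudo grep -E 'FATAL|PANIC' /var/log/postgresql/*.log
log='2024-05-01 03:12:01 UTC LOG:  checkpoint starting: time
2024-05-01 03:12:44 UTC FATAL:  sorry, too many clients already
2024-05-01 03:13:02 UTC PANIC:  could not write to file "pg_wal/000000010000000000000042": No space left on device'

critical=$(printf '%s\n' "$log" | grep -E 'FATAL|PANIC')
echo "$critical"
```

In this sample the FATAL line points at connection exhaustion (Scenario B below) and the PANIC line at a full disk (Scenario C).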

Step 2: Root Cause Analysis and Fixes

Based on your log analysis, you will typically fall into one of the following root cause categories.

Scenario A: The Service is Not Starting (Configuration or Permissions)

If systemctl status postgresql shows the service as failed immediately after trying to start, the issue is likely configuration or filesystem permissions.

  • Listen Addresses and Port: Ensure postgresql.conf has listen_addresses = '*' (or a specific IP) and port = 5432. If it's set to localhost but you attempt to connect externally, the connection will be refused by the OS before it even reaches Postgres.
  • pg_hba.conf: While a pg_hba.conf rejection usually results in a "FATAL: no pg_hba.conf entry for host" or "permission denied" error rather than a pure "connection refused", severe misconfigurations here can prevent startup or block specific subnets.
  • Permission Denied: If the postgres system user cannot read the data directory, configuration files, or SSL certificates, the service will crash on startup. Fix ownership using chown -R postgres:postgres /var/lib/postgresql/data (adjust path as necessary) and ensure permissions are strictly 0700 for the data directory.
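A quick way to check the effective `listen_addresses` is sketched below; the config text is inlined as a sample here, and on a real server you would read the actual file (the path varies by distribution):

```shell
# Sketch: extract listen_addresses from postgresql.conf. Sample text is
# inlined for illustration; on a real host run something like:
#   awk -F"'" '/^listen_addresses/ {print $2}' /etc/postgresql/15/main/postgresql.conf
conf="listen_addresses = 'localhost'
port = 5432"

listen=$(printf '%s\n' "$conf" | awk -F"'" '/^listen_addresses/ {print $2}')
echo "listen_addresses is: $listen"   # 'localhost' means remote clients get connection refused
```

If the value is `localhost` and remote clients must connect, change it to `'*'` (or a specific interface IP) and restart the service.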
Scenario B: "Too Many Connections"

If the service is running but refusing new connections, you might see FATAL: sorry, too many clients already in the logs. This often presents as a "connection refused" or a timeout to the end-user application layer.

  • The Band-Aid Fix: You can increase max_connections in postgresql.conf. However, this is dangerous because every PostgreSQL connection forks a new OS process, consuming a baseline amount of memory. Setting this too high is a primary trigger for OOM kills.
  • The Engineering Fix: Implement a connection pooler like PgBouncer or Odyssey. A pooler sits between your application and PostgreSQL, multiplexing thousands of application connections onto a small, fixed number of actual database connections, drastically reducing memory overhead and context switching.
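A minimal PgBouncer configuration for the engineering fix might look like the sketch below; the database name, pool sizes, and auth settings are illustrative placeholders to adapt to your environment:

```ini
; pgbouncer.ini (illustrative values only)
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 2000   ; app-side connections PgBouncer will accept
default_pool_size = 20   ; actual server connections per database/user pair
```

Note that `pool_mode = transaction` gives the best multiplexing but is incompatible with session-level features such as session-scoped prepared statements and advisory locks; use `session` mode if your application relies on those.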
Scenario C: Resource Exhaustion (OOM, High CPU, Disk Full)

When a database server runs out of critical resources, the OS or the PostgreSQL engine itself will take drastic measures, leading to a crash or a forced restart. During this recovery phase, connections will be refused.

  • Out of Memory (OOM) and Segfaults: If you see Out of memory: Killed process in your system logs (dmesg or /var/log/messages), the Linux OOM killer terminated a PostgreSQL process to keep the OS alive; when it hits the postmaster or a backend, the whole cluster goes through crash recovery. This happens when memory settings like shared_buffers, work_mem, or maintenance_work_mem are too high for the available system RAM. A segmentation fault (segfault) or core dump can also occur due to memory corruption, extensions with memory leaks, or hardware issues.
    • Fix: Tune postgresql.conf. A general rule of thumb is to set shared_buffers to 25% of total RAM. Strictly limit work_mem because this allocation can be multiplied by the number of active connections and the number of sort/hash operations in a single complex query. Lower max_connections if no pooler is used.
  • Disk Full: A PANIC: could not write to file "pg_wal/xlog..." : No space left on device error will immediately crash the database to prevent silent data corruption.
    • Fix: Free up disk space immediately. You may need to clear old log files (not WAL files!), expand the logical volume, or set up WAL archiving to move Write-Ahead Logs to a cheaper storage tier like S3 to prevent the pg_wal directory from filling up.
  • High CPU and Slow Queries: A system pinned at 100% CPU might not actively refuse network connections at the socket level, but connection timeouts from the application will masquerade as connection failures. High CPU is often caused by missing indexes forcing sequential scans on massive tables, unoptimized complex joins, or severe bloat.
    • Fix: Enable and use the pg_stat_statements extension to identify the most time-consuming queries. Use EXPLAIN ANALYZE on these queries to understand the query plan, add appropriate indexes, and ensure autovacuum is keeping statistics up to date.
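The 25%-of-RAM rule of thumb for `shared_buffers` can be sketched as simple shell arithmetic; total RAM is hardcoded here (16 GB in kB) for illustration, though on Linux you could read it from /proc/meminfo:

```shell
# Sketch of the 25%-of-RAM sizing rule for shared_buffers. RAM is hardcoded
# for illustration; on Linux you could instead use:
#   mem_total_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
mem_total_kb=16777216                              # 16 GB expressed in kB
shared_buffers_mb=$(( mem_total_kb / 4 / 1024 ))   # 25% of RAM, converted to MB
echo "suggested: shared_buffers = ${shared_buffers_mb}MB"
```

Treat the result as a starting point, not a law; very large or very small hosts, and workloads with big `work_mem` demands, may warrant a different split.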
Scenario D: Crashes and Core Dumps

If the PostgreSQL logs show server process (PID ...) was terminated by signal 11 (a segfault) or signal 6 (an abort), or mention generating a core dump without an obvious OOM message, PostgreSQL encountered a critical internal failure. (Termination by signal 9 with no core dump usually points back to the OOM killer or a manual kill -9, since SIGKILL never produces a core dump.)

  • Hardware Issues: Faulty RAM or failing local NVMe/SSD storage can cause unexpected panics that look like software bugs.
  • Software Bugs: While rare in core PostgreSQL, bugs do exist. Ensure you are running the latest minor release for your major version (e.g., if you are on 15.2, upgrade to the newest 15.x). Also scrutinize any third-party extensions (such as PostGIS or TimescaleDB): they run in the same memory space as the server, so a bug in an extension will crash the entire database.

Step 3: Preventative Maintenance and SRE Best Practices

To avoid waking up to a critical "connection refused" alert, implement proactive monitoring and SRE practices:

  1. Monitor Connection Utilization: Set up alerts when active connections reach 80% of max_connections.
  2. Monitor Disk Space Strictly: Alert at 75%, 85%, and 90% utilization of the data and WAL directories. A full disk is an avoidable outage.
  3. Mandatory Connection Pooling: Make PgBouncer a standard component of your database architecture, even for small workloads, to protect the database engine.
  4. Autovacuum Tuning: Ensure autovacuum is aggressively configured. Default settings are often too passive for high-write databases, leading to table bloat, slow queries (high CPU), and eventually transaction ID wraparound issues.
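The 80% connection-utilization alert from point 1 can be sketched as shell arithmetic; the counts are hardcoded for illustration, and in production you would fetch them from the database:

```shell
# Sketch of an 80% connection-utilization alert. Counts are hardcoded for
# illustration; in production, fetch them with something like:
#   psql -Atc "SELECT count(*) FROM pg_stat_activity;"
#   psql -Atc "SHOW max_connections;"
active=170
max_connections=200
pct=$(( active * 100 / max_connections ))
if [ "$pct" -ge 80 ]; then
  echo "ALERT: connection utilization at ${pct}% of max_connections"
fi
```

Wiring this check into your monitoring system (cron, Prometheus exporter, etc.) turns a 3 a.m. outage into a daytime capacity-planning ticket.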

Quick Diagnostic Commands

```bash
# 1. Check if the PostgreSQL service is running and view its status
sudo systemctl status postgresql

# 2. View the last 50 lines of PostgreSQL logs to find PANIC, FATAL, or ERROR messages
sudo journalctl -u postgresql -n 50 --no-pager

# 3. Check for Out of Memory (OOM) kills in system logs
sudo dmesg -T | grep -i 'out of memory\|postgres\|killed'

# 4. Check disk space, specifically the partition holding /var/lib/postgresql
df -h

# 5. Fix ownership if permission denied errors are found on startup (adjust path for your OS)
sudo chown -R postgres:postgres /var/lib/postgresql/15/main
sudo chmod 0700 /var/lib/postgresql/15/main

# 6. Check active connections and limits (run via psql if accessible)
# psql -U postgres -c "SHOW max_connections;"
# psql -U postgres -c "SELECT count(*) FROM pg_stat_activity;"
```

Error Medic Editorial

Our SRE and DevOps editorial team specializes in deep-dive troubleshooting for mission-critical Linux infrastructure and open-source databases. We turn late-night pagers into robust, scalable architectures.
