Database Error Troubleshooting: Connection, Performance & Recovery
Database errors sit at the critical path of nearly every application. When your database is down or degraded, everything downstream fails — APIs return 500 errors, queues back up, and users see broken pages. Quick, accurate diagnosis is essential because database issues compound rapidly under load.
The most common database problems fall into three buckets: connection failures, performance degradation, and data integrity issues. Connection errors (too many connections, connection refused, authentication failures) are usually the fastest to diagnose but can be caused by anything from misconfigured connection strings to exhausted connection pools to network segmentation. Performance issues (slow queries, lock contention, replication lag) require more investigation and often involve query analysis, index tuning, or resource scaling. Data integrity issues (corruption, failed crash recovery) are the rarest but the most serious, because resolving them often means restoring from backup.
This section covers 37 troubleshooting articles across 12 database technologies: relational databases like PostgreSQL, MySQL, MariaDB, SQL Server, and Oracle DB; document stores like MongoDB; key-value stores like Redis and DynamoDB; search engines like Elasticsearch; time-series databases like InfluxDB; and wide-column stores like Cassandra. Each guide targets specific error messages and symptoms you'll encounter in production.
Whether you're facing a deadlock at peak traffic, a replication lag that's growing faster than you can diagnose, or a connection pool that's silently exhausting itself, these guides walk you through systematic diagnosis and resolution.
Common Patterns & Cross-Cutting Themes
Connection Exhaustion & Pool Management
"Too many connections" is one of the most common database errors in production, and it's rarely solved by just increasing max_connections. Every database connection consumes memory on the server — PostgreSQL uses roughly 10 MB per connection, MySQL around 1–3 MB. Raising the limit without increasing server resources just delays the crash.
The real fix is proper connection pooling. Use a connection pooler like PgBouncer for PostgreSQL, ProxySQL for MySQL, or built-in pool settings in your ORM (pool size, idle timeout, max lifetime). Set your pool size to match your actual concurrent query needs — a good starting point is 2× your CPU cores. Close connections properly in your application code; leaked connections from unclosed transactions or abandoned sessions are the most common pool drain.
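The mechanics a pooler provides can be sketched in a few lines. This is a minimal illustration using a stdlib queue, not a production pooler — real tools like PgBouncer or your ORM's pool also handle idle timeout, max lifetime, and health checks. The `make_conn` factory is a hypothetical stand-in for whatever opens a real DB-API connection.

```python
import queue

class SimplePool:
    """Minimal fixed-size connection pool: illustrates checkout/return
    semantics only. Real poolers also enforce idle timeout, max
    connection lifetime, and liveness checks."""
    def __init__(self, make_conn, size):
        self._q = queue.Queue(maxsize=size)
        for _ in range(size):
            self._q.put(make_conn())

    def acquire(self, timeout=5.0):
        # Blocks until a connection is free. Refusing to open extra
        # connections is what caps the total at the pool size.
        return self._q.get(timeout=timeout)

    def release(self, conn):
        # Forgetting this call is exactly a "leaked connection":
        # the pool shrinks until acquire() starts timing out.
        self._q.put(conn)

# Hypothetical usage with a stand-in connection factory.
pool = SimplePool(make_conn=lambda: object(), size=4)
conn = pool.acquire()
try:
    pass  # run queries on conn
finally:
    pool.release(conn)  # always release, even on error
```

The `try/finally` pattern at the bottom is the application-side half of the fix: every code path that acquires a connection must release it, or the pool drains exactly as described above.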
For serverless and auto-scaling architectures, connection pooling is especially critical. Each Lambda invocation or container replica opens its own connection, and with hundreds of concurrent instances, you can easily exceed database limits. Use RDS Proxy (AWS), a sidecar pooler, or a connection-aware middleware layer.
Lock Contention & Deadlocks
Deadlocks and lock waits occur when multiple transactions compete for the same rows or tables. The database detects circular dependencies and kills one transaction with a deadlock error. Lock waits, on the other hand, cause queries to hang until the blocking transaction commits or the wait timeout expires.
To diagnose, check your database's lock monitoring: pg_locks and pg_stat_activity in PostgreSQL, SHOW ENGINE INNODB STATUS in MySQL, or db.currentOp() in MongoDB. Identify the blocking query and the waiting query. Common causes include: updating rows in different orders across transactions, long-running transactions holding locks unnecessarily, and missing indexes forcing table scans that acquire more locks than needed.
Prevention strategies: keep transactions as short as possible, access rows in a consistent order, use appropriate isolation levels (READ COMMITTED is usually sufficient), and add indexes to reduce lock scope. For unavoidable deadlocks, implement retry logic in your application — retry the entire transaction, not just the failed statement.
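The retry advice above can be sketched as a small wrapper. The `DeadlockDetected` exception here is a hypothetical stand-in for your driver's real error (e.g. psycopg2 raises a deadlock error class; MySQL reports error 1213); the wrapper retries the whole transaction function, never an individual statement.

```python
import random
import time

class DeadlockDetected(Exception):
    """Stand-in for the driver's deadlock error (assumption:
    substitute your driver's actual exception class)."""

def run_in_transaction(txn_fn, retries=3, base_delay=0.05):
    """Retry the *whole* transaction on deadlock, with jittered
    exponential backoff. txn_fn is expected to open, execute, and
    commit the transaction itself."""
    for attempt in range(retries + 1):
        try:
            return txn_fn()
        except DeadlockDetected:
            if attempt == retries:
                raise  # give up after the final attempt
            # Jitter keeps the two deadlocked transactions from
            # retrying in lockstep and deadlocking again.
            time.sleep(base_delay * (2 ** attempt) * random.random())
```

Retrying the entire `txn_fn` matters because the database rolled the whole transaction back when it chose a deadlock victim; re-running only the failed statement would continue from state that no longer exists.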
Query Performance & Slow Queries
Slow queries are the most common cause of database performance degradation. A query that takes 10 ms with 1,000 rows might take 10 seconds with 1,000,000 rows if it's doing a sequential scan instead of using an index.
Start with EXPLAIN ANALYZE (PostgreSQL), EXPLAIN (MySQL), or the equivalent in your database. Look for sequential scans on large tables, nested loop joins with high row estimates, and sorts on unindexed columns. The most impactful fix is usually adding the right index — a composite index covering the WHERE clause and ORDER BY can turn a 10-second query into a 10-millisecond query.
Enable slow query logging with a low threshold (100–500 ms) and review it regularly. Watch for N+1 query patterns from ORMs, missing pagination on large result sets, and queries that return more columns than needed. Sometimes the fix is application-level: caching, denormalization, or restructuring the data access pattern.
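The N+1 pattern mentioned above is worth seeing concretely: one query for the parent rows, then one more per parent for its children — which a lazily-loading ORM emits without you writing a loop of queries at all. A minimal illustration with stdlib SQLite (schema and data are made up for the example):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INT, title TEXT);
    INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bo');
    INSERT INTO books VALUES (1, 1, 'A'), (2, 1, 'B'), (3, 2, 'C');
""")

def fetch_n_plus_one():
    # 1 query for authors + 1 query *per author* for their books.
    out, queries = {}, 1
    authors = db.execute("SELECT id, name FROM authors").fetchall()
    for aid, name in authors:
        rows = db.execute("SELECT title FROM books WHERE author_id = ?",
                          (aid,)).fetchall()
        queries += 1
        out[name] = [t for (t,) in rows]
    return out, queries

def fetch_joined():
    # One JOIN replaces all the per-author round trips.
    out = {}
    rows = db.execute("""SELECT a.name, b.title FROM authors a
                         LEFT JOIN books b ON b.author_id = a.id""")
    for name, title in rows:
        out.setdefault(name, [])
        if title is not None:
            out[name].append(title)
    return out, 1

print(fetch_n_plus_one()[1])  # 3 queries for 2 authors (1 + N)
print(fetch_joined()[1])      # 1 query
```

With 2 authors the difference is trivial; with 10,000 parent rows the N+1 version issues 10,001 round trips, which is why it shows up so reliably in slow query logs.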
Replication Lag & High Availability
Replication lag means your read replicas are behind the primary, serving stale data. Small, consistent lag (under 1 second) is usually acceptable, but steadily growing lag indicates the replica can't keep up with write volume and will eventually fall so far behind that it has to be rebuilt from a fresh copy of the primary.
Common causes: the replica has less CPU/IO capacity than the primary, long-running queries on the replica block replication, large transactions (bulk imports) create replication bursts, or network latency between primary and replica is too high. Monitor replication lag continuously — in PostgreSQL check pg_stat_replication, in MySQL use SHOW REPLICA STATUS (SHOW SLAVE STATUS before MySQL 8.0.22), in MongoDB check rs.status().
For critical reads that cannot tolerate stale data, always read from the primary. Design your application to be aware of replication topology: route writes to the primary and reads to replicas, with a fallback to the primary when strong consistency is required. If replication consistently can't keep up, consider scaling the primary vertically or sharding horizontally.
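One way to make the application topology-aware is a small router in the data access layer. This is a sketch under simplifying assumptions — the "connections" are plain labels standing in for real engine or pool objects, and replica health checking is omitted:

```python
import itertools

class TopologyRouter:
    """Route writes to the primary and reads to replicas, falling
    back to the primary when the caller needs strong consistency.
    Connection objects are stand-in labels here (assumption)."""
    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def for_query(self, is_write=False, strong=False):
        # A stale read is only acceptable when neither flag is set
        # and at least one replica exists.
        if is_write or strong or self._replicas is None:
            return self.primary
        return next(self._replicas)  # simple round-robin over replicas

router = TopologyRouter("primary", ["replica-1", "replica-2"])
print(router.for_query(is_write=True))  # primary
print(router.for_query())               # one of the replicas
print(router.for_query(strong=True))    # primary
```

The `strong=True` escape hatch is the key design choice: rather than guessing which reads tolerate staleness, the call site declares it, so read-your-own-writes flows can opt into the primary explicitly.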
Quick Troubleshooting Guide
| Symptom | Likely Cause | First Step |
|---|---|---|
| Connection refused | Database not running or wrong host/port | Verify service status; check host, port, and firewall rules |
| Too many connections | Connection pool exhaustion or connection leaks | Implement connection pooling; check for leaked connections; review pool size |
| Authentication failed | Wrong credentials or pg_hba.conf / user grants | Verify username/password; check host-based auth config and user privileges |
| Deadlock detected | Transactions locking rows in conflicting order | Add retry logic; ensure consistent row access order; shorten transactions |
| Lock wait timeout exceeded | Long-running transaction blocking others | Identify blocking query; kill or wait for it; add missing indexes |
| Slow queries / high latency | Missing indexes or inefficient query plans | Run EXPLAIN ANALYZE; add indexes on WHERE/JOIN columns; optimize query |
| Replication lag increasing | Replica under-provisioned or large write bursts | Scale replica resources; break large transactions into smaller batches |
| Disk space full | Uncontrolled data growth or WAL/binlog accumulation | Free space; set up log rotation; archive or purge old data; increase storage |
| Out of memory (OOM killed) | work_mem or buffer pool too large for available RAM | Tune memory settings; reduce max_connections; upgrade instance size |
| Corrupted data or crash recovery | Unclean shutdown, disk failure, or bug | Restore from backup; run integrity checks (pg_checksums, mysqlcheck) |