Ultimate Linux Troubleshooting Guide: Fixing 'Permission Denied', 'Connection Refused', '502 Bad Gateway', and 'OOM Killer'
Comprehensive SRE guide to resolving common Linux system errors: permission denied, port 22 connection refused, 502 Bad Gateway across stacks, and Linux OOM.
- 'Permission denied' errors usually stem from incorrect file ownership (chown), missing execution bits (chmod +x), or SELinux/AppArmor enforcing policies.
- 'Connection refused' on Port 22 or 80 is often caused by stopped services (sshd, nginx/apache), restrictive firewall rules (iptables, ufw, AWS Security Groups), or services bound to localhost instead of 0.0.0.0.
- '502 Bad Gateway' indicates a reverse proxy (Nginx/OpenResty/Apache) cannot communicate with the upstream application server (Node.js, Gunicorn, PHP-FPM) due to the upstream crashing, timing out, or listening on the wrong socket/port.
- Resource exhaustion manifests as 'OOM killer' terminating processes to free memory, or database 'Too many connections' when connection pools run dry. Monitoring and resource limits are key fixes.
| Error Symptom | Primary Root Cause Category | Diagnostic Tool | First Remediation Step |
|---|---|---|---|
| permission denied linux | File System / Security Context | ls -la, getfacl, dmesg (for SELinux) | Check user execution bit (chmod +x) or chown to current user. |
| port 22 connection refused | Networking / Daemon State | systemctl status sshd, netstat -tulpn | Verify SSH daemon is running and Security Groups/Firewall allow port 22. |
| 502 bad gateway | Reverse Proxy / Upstream | tail -f /var/log/nginx/error.log | Restart upstream service (e.g., systemctl restart php8.1-fpm or pm2 restart all). |
| oom killer linux | Memory / Resource Limits | dmesg -T \| grep -i oom | Add swap space or optimize application memory footprint. |
| 1040 too many connections | Database Connection Pooling | SHOW PROCESSLIST; (MySQL) | Increase max_connections or implement a connection pooler like ProxySQL (PgBouncer for PostgreSQL). |
Comprehensive Linux System Diagnostics
As a DevOps engineer or SRE, encountering errors is a daily reality. However, the vast majority of server outages, deployment failures, and application crashes boil down to a handful of fundamental categories: permission issues, network connectivity failures, reverse proxy upstream errors, and resource exhaustion. This guide provides a deep dive into diagnosing and resolving these critical failure modes.
1. Resolving 'Permission Denied' Errors
The permission denied linux error is perhaps the most ubiquitous hurdle for developers and system administrators. Linux relies on a strict Discretionary Access Control (DAC) system, often augmented by Mandatory Access Control (MAC) like SELinux or AppArmor.
Shell Scripts and Executables
When you encounter permission denied shell script or zsh permission denied kali linux upon trying to run a script, the issue is almost always a missing executable bit. Files created or downloaded do not automatically have execute permissions for security reasons.
Diagnosis:
Run ls -la script.sh. If the output looks like -rw-r--r--, there is no x (execute) flag.
Resolution:
chmod +x script.sh
./script.sh
This same principle applies when installing packages manually, such as facing google chrome stable_current_amd64 deb permission denied. Always ensure you are running installation commands with elevated privileges (sudo dpkg -i package.deb) and that the file itself is readable.
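The execute-bit check and fix described above can be sketched as a small helper. The function name ensure_executable is hypothetical, and the sketch only adds the bit for the file's owner; treat it as an illustration rather than a hardened tool.

```shell
#!/usr/bin/env bash
# Sketch: report whether a file is missing its execute bit and add it.
# ensure_executable is a hypothetical helper name, not a standard tool.
ensure_executable() {
    local file="$1"
    if [ ! -e "$file" ]; then
        echo "missing: $file"
        return 1
    fi
    if [ -x "$file" ]; then
        echo "already executable: $file"
    else
        # u+x grants execute to the owner only, which is the least we need.
        chmod u+x "$file" && echo "added execute bit: $file"
    fi
}
```

Calling ensure_executable script.sh before ./script.sh gives a one-line answer to whether the execute bit was the problem at all.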
Repository and Configuration Files
Errors like bash etc yum repos d kubernetes repo permission denied usually happen when standard users try to write to system directories. Using sudo echo "..." > /etc/... fails because the redirection (>) is performed by your own unprivileged shell before sudo runs; only the echo command is elevated, not the write to the file.
Resolution:
Use tee to write to protected files:
echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
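The tee pattern above can be wrapped in two small helpers, one that overwrites and one that appends. The names write_protected and append_protected and the ELEVATE variable are assumptions made for this sketch; ELEVATE defaults to sudo and exists only so the functions can be tried without root.

```shell
#!/usr/bin/env bash
# Sketch: write a line to a root-owned file through tee.
# write_protected, append_protected, and ELEVATE are hypothetical names.
write_protected() {
    local target="$1" line="$2"
    # tee runs under sudo, so tee (not the calling shell) opens the file.
    printf '%s\n' "$line" | ${ELEVATE:-sudo} tee "$target" > /dev/null
}

append_protected() {
    local target="$1" line="$2"
    # tee -a appends instead of truncating the target.
    printf '%s\n' "$line" | ${ELEVATE:-sudo} tee -a "$target" > /dev/null
}
```

Prefer the append form when the target file may already hold entries you want to keep.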
Advanced Permissions: Azure, NTFS, and Kali
- azure cloud shell permission denied: Often relates to mounted storage. Ensure the storage account linked to your Cloud Shell hasn't had its IAM roles modified.
- ntfs deny permissions: When mounting Windows NTFS drives in Linux, standard chmod won't work. You must specify the uid, gid, dmask, and fmask options in your /etc/fstab or mount command.
- kali linux permission denied / ubuntu permission denied: When operating in highly secure or hardened distributions, always verify if AppArmor or SELinux is actively blocking an action, even if standard file permissions look correct. Check dmesg or /var/log/audit/audit.log.
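To make the SELinux/AppArmor check above quicker, the following sketch filters log text for the usual denial markers. The helper name mac_denials is hypothetical, and the two patterns are assumptions based on the common record formats (SELinux AVC records and AppArmor DENIED records); other audit setups may log differently.

```shell
#!/usr/bin/env bash
# Sketch: pull likely SELinux/AppArmor denials out of log text on stdin.
# mac_denials is a hypothetical helper name.
mac_denials() {
    # SELinux AVC records contain "avc:  denied"; AppArmor records contain
    # apparmor="DENIED". Spacing in AVC records varies, hence the "+".
    grep -E 'avc: +denied|apparmor="DENIED"'
}

# Typical usage (reading these sources usually needs root):
#   sudo cat /var/log/audit/audit.log | mac_denials
#   sudo dmesg | mac_denials
```

If this prints nothing while the action still fails, the cause is more likely ordinary ownership or mode bits than a MAC policy.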
2. SSH and Network 'Connection Refused' Errors
A connection refused error means the packet reached the destination host but the connection was actively rejected: either no service is listening on that port, so the kernel answers with a TCP reset, or a firewall REJECT rule turned the connection away. Firewalls that DROP packets produce a timeout instead of a refusal.
SSH: Port 22 Connection Refused
Whether it's ec2 port 22 connection refused, cpanel port 22 connection refused, or a generic port 22 connection refused, being locked out of your server is a critical emergency.
Diagnosis & Resolution:
- Is the service running? If you have out-of-band access (like AWS Systems Manager or a hypervisor console), run sudo systemctl status sshd (the unit is named ssh on Debian and Ubuntu). If it's dead, start it with sudo systemctl start sshd.
- Is it listening on the right interface? Check /etc/ssh/sshd_config to ensure ListenAddress is correct and it isn't bound strictly to a local interface if you need external access.
- Firewalls and Security Groups: For aws port 22 connection refused or port 22 connection refused ec2, confirm that the instance's AWS Security Group allows inbound TCP on port 22 from your current IP address, and update the rules via the AWS Console if it does not. A Security Group silently drops disallowed traffic, which usually shows up as a timeout, so an explicit refusal more often points to a stopped sshd or a host firewall REJECT rule.
- Cisco and Network Gear: If you see cisco connection refused or the remote system refused the connection cisco (often seen in putty connection refused scenarios), verify that the management plane hasn't locked out your IP due to failed login attempts (ACLs or SSH rate limiting).
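Because a refusal and a timeout point to different causes, it helps to classify the failure before working through the checklist above. The sketch below is one way to do that; probe_port is a hypothetical helper that assumes bash (for its /dev/tcp pseudo-device) and the coreutils timeout command are available on the client.

```shell
#!/usr/bin/env bash
# Sketch: classify a TCP port as open, refused, or timeout.
# probe_port is a hypothetical helper name.
probe_port() {
    local host="$1" port="$2" wait_secs="${3:-3}"
    # bash opens a TCP connection when redirecting to /dev/tcp/HOST/PORT.
    timeout "$wait_secs" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
    case $? in
        0)   echo "open" ;;     # handshake completed: something is listening
        124) echo "timeout" ;;  # no reply in time: packets are likely dropped
        *)   echo "refused" ;;  # immediate failure: closed port, REJECT rule,
                                # or another instant error such as no route
    esac
}
```

A result of refused suggests looking at the daemon and its ListenAddress first, while timeout suggests a Security Group, network ACL, or DROP rule in the path.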
Public Key Authentication Denied
Errors like ubuntu permission denied publickey or git github com permission denied publickey linux mean the server rejected your SSH key.
Resolution for GitHub:
- Ensure your key is loaded: ssh-add -l.
- Ensure the key is added to your GitHub account.
- Test the connection: ssh -T git@github.com.
Resolution for Ubuntu/EC2:
Ensure the private key permissions are secure (chmod 400 private_key.pem) and that the corresponding public key is correctly placed in ~/.ssh/authorized_keys on the server.
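The private key permission check can be scripted as a rough guard before connecting. In this sketch, check_key_perms is a hypothetical helper, it treats only modes 400 and 600 as acceptable (a simplification), and it relies on the GNU coreutils form of stat; BSD and macOS stat use different flags.

```shell
#!/usr/bin/env bash
# Sketch: warn when a private key file is readable by group or others.
# check_key_perms is a hypothetical helper; stat -c is GNU coreutils syntax.
check_key_perms() {
    local key="$1" mode
    mode="$(stat -c '%a' "$key")" || return 2
    case "$mode" in
        400|600) echo "ok: $key is mode $mode" ;;
        *)       echo "too open: $key is mode $mode, run chmod 400 $key"
                 return 1 ;;
    esac
}
```

Running it against private_key.pem before ssh -i avoids the client rejecting the key for being too widely readable.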
Web Traffic: Port 80 Connection Refused
If you encounter aws port 80 connection refused or nodejs connection refused, verify that your web server (Nginx/Apache) or Node.js application is actively running and bound to 0.0.0.0 (all interfaces) rather than 127.0.0.1 (localhost only).
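To spot services bound to localhost only, the listening sockets can be filtered by local address. This is a minimal sketch: loopback_only is a hypothetical helper, and it assumes the column layout of ss -tlnH, where the fourth column is Local Address:Port (adding -u or other socket types inserts a Netid column and shifts the fields).

```shell
#!/usr/bin/env bash
# Sketch: list listening sockets that only accept local connections.
# loopback_only is a hypothetical helper that reads `ss -tlnH` output on stdin.
loopback_only() {
    # Column 4 is Local Address:Port; match IPv4 127.x and IPv6 [::1].
    awk '$4 ~ /^(127\.|\[::1\])/ { print $4 }'
}

# Typical usage:
#   ss -tlnH | loopback_only
```

Any application port that appears in this output is unreachable from other hosts until the service is rebound to 0.0.0.0 or a specific external interface.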
3. Untangling '502 Bad Gateway' Errors
The 502 Bad Gateway error is the bane of modern web architectures. It means that an edge server or reverse proxy (Nginx, Apache, OpenResty, Tengine, AWS ALB) received an invalid response from the upstream application server.
The Anatomy of a 502
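When Nginx (openresty 502 bad gateway, 502 bad gateway ubuntu) passes a request to a backend, it expects a timely, valid HTTP response. If the backend is down, refuses the connection, or closes it before sending a complete response, Nginx returns a 502. A backend that is reachable but simply takes too long usually produces a 504 Gateway Timeout instead.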
Framework-Specific 502 Troubleshooting
- Node.js (node js 502 bad gateway, 502 bad gateway node js): The Node application (Express/NestJS) likely crashed due to an unhandled exception or ran out of memory. Check PM2 logs (pm2 logs) or systemd logs (journalctl -u my-node-app). Ensure your reverse proxy configuration points to the correct local port.
- Django & Python (django 502 bad gateway, 502 bad gateway elastic beanstalk django): In WSGI setups (Gunicorn/uWSGI), a 502 usually means Gunicorn is not running, or it crashed while processing a heavy request. In AWS Elastic Beanstalk, check /var/log/eb-engine.log and /var/log/web.stdout.log. Often, this is caused by a syntax error in the code causing the WSGI worker to fail on boot.
- PHP & PHP-FPM (php bad gateway, phpmyadmin 502 bad gateway, valet 502 bad gateway): Nginx communicates with PHP via a Unix socket (e.g., /run/php/php8.1-fpm.sock) or a TCP port. A 502 means the PHP-FPM service is down, or the socket path in your Nginx config (fastcgi_pass) doesn't match the actual socket path. Restart PHP-FPM.
- Magento 2 (502 bad gateway magento 2, magento 502 bad gateway): Magento is resource-intensive. A 502 often occurs during compilation (setup:di:compile), caching, or heavy Elasticsearch queries that cause the PHP process to time out. Increase max_execution_time and memory_limit in php.ini.
- Monitoring Tools (librenms 502 bad gateway): LibreNMS relies on PHP-FPM and a database. If the poller consumes all resources, PHP-FPM might drop connections. Check the LibreNMS validate script (./validate.php).
- Alternative Proxies (openresty 502, 502 bad gateway tengine): These Nginx forks operate similarly. Check their respective error logs. Often, Lua scripts in OpenResty might crash, leading to a gateway error.
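Across all of these stacks, the proxy's error log usually says which kind of upstream failure occurred. The sketch below counts a few common signatures in an Nginx error log. The helper name classify_upstream_errors and the category labels are hypothetical, and the pattern list is illustrative rather than exhaustive.

```shell
#!/usr/bin/env bash
# Sketch: count common upstream failure signatures in an Nginx error log
# read from stdin. classify_upstream_errors is a hypothetical helper name.
classify_upstream_errors() {
    awk '
        /connect\(\).*\(111: Connection refused\)/      { n["upstream not listening"]++ }
        /connect\(\).*\(2: No such file or directory\)/ { n["socket path missing"]++ }
        /connect\(\).*\(13: Permission denied\)/        { n["socket permission denied"]++ }
        /upstream prematurely closed connection/        { n["upstream closed connection early"]++ }
        /upstream timed out/                            { n["upstream timed out"]++ }
        END { for (k in n) printf "%d %s\n", n[k], k }
    ' | sort -rn
}

# Typical usage:
#   sudo tail -n 500 /var/log/nginx/error.log | classify_upstream_errors
```

A dominant "socket path missing" count points at a fastcgi_pass or proxy_pass mismatch, while "upstream not listening" points at a stopped or crashed backend.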
4. Resource Exhaustion: OOM Killer and Connection Limits
When a server runs out of critical resources, the symptoms can be chaotic, ranging from dropped connections to suddenly terminated processes.
Out of Memory: linux oom and oom killer linux
The Linux kernel has an Out-Of-Memory (OOM) killer. When the system's RAM and swap are exhausted, the kernel steps in and terminates a process to save the system from a complete crash. It ranks processes by a "badness" score (visible in /proc/<pid>/oom_score), which tends to single out memory-hungry applications like Java, MySQL, or Node.js.
Diagnosis: If your service mysteriously restarts or you get a 502 Bad Gateway, always check if the OOM killer was invoked:
sudo dmesg -T | grep -i 'killed process'
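To see which processes were killed rather than just whether the OOM killer ran, the kernel log lines can be reduced to PID and name pairs. This is a sketch: oom_victims is a hypothetical helper, and it assumes the kernel's usual "Killed process PID (name)" wording.

```shell
#!/usr/bin/env bash
# Sketch: extract "PID name" pairs for OOM-killed processes from kernel log
# text on stdin. oom_victims is a hypothetical helper name.
oom_victims() {
    sed -nE 's/.*Killed process ([0-9]+) \(([^)]+)\).*/\1 \2/p'
}

# Typical usage:
#   sudo dmesg -T | oom_victims
#   sudo journalctl -k --no-pager | oom_victims
```

If the same process name shows up repeatedly, that service is the first candidate for a memory limit or a leak investigation.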
Resolution:
- Add Swap Space: sudo fallocate -l 2G /swapfile && sudo chmod 600 /swapfile && sudo mkswap /swapfile && sudo swapon /swapfile.
- Optimize application memory usage.
- Increase the server's physical RAM.
- Tune oom_score_adj to protect critical services (use with extreme caution).
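Before tuning oom_score_adj, it is worth checking which processes the kernel currently ranks as the most likely victims. The following sketch reads the per-process oom_score and comm files under /proc; top_oom_scores is a hypothetical helper name, and the output format (score, PID, name) is a choice made for this example.

```shell
#!/usr/bin/env bash
# Sketch: list the processes the OOM killer currently ranks highest.
# top_oom_scores is a hypothetical helper name.
top_oom_scores() {
    local limit="${1:-5}" dir score name
    for dir in /proc/[0-9]*; do
        # Processes can exit while we iterate, so skip unreadable entries.
        score="$(cat "$dir/oom_score" 2>/dev/null)" || continue
        name="$(cat "$dir/comm" 2>/dev/null)" || continue
        printf '%s %s %s\n' "$score" "${dir#/proc/}" "$name"
    done | sort -rn | head -n "$limit"
}
```

The oom_score_adj value ranges from -1000 (never kill) to 1000, so a critical service near the top of this list is a candidate for a negative adjustment, while everything else is better addressed by fixing memory usage.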
Database Exhaustion: 1040 too many connections
Databases have a hard limit on concurrent connections. If your application leaks connections or experiences a massive traffic spike, you will see 1040 too many connections (MySQL/MariaDB) or similar errors in PostgreSQL, often manifesting as too many connections rds in AWS.
Diagnosis:
Log into the database (if possible) and run SHOW PROCESSLIST; (MySQL) or check pg_stat_activity (PostgreSQL) to see where connections are coming from.
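For MySQL, the process list is easier to read once connections are grouped by client. The sketch below assumes the tab-separated batch output of SHOW PROCESSLIST, with the columns Id, User, Host, and db first; connections_by_client is a hypothetical helper, and IPv6 client addresses are not handled.

```shell
#!/usr/bin/env bash
# Sketch: count MySQL connections per user and client host.
# connections_by_client is a hypothetical helper; it expects the tab-separated
# batch output of SHOW PROCESSLIST on stdin.
connections_by_client() {
    awk -F '\t' '
        NR > 1 {                      # skip the header row
            split($3, hostport, ":")  # Host column is "address:port"
            count[$2 "@" hostport[1]]++
        }
        END { for (c in count) printf "%d %s\n", count[c], c }
    ' | sort -rn
}

# Typical usage (connection options omitted):
#   mysql --batch -e 'SHOW PROCESSLIST' | connections_by_client
```

One client with a far higher count than the rest usually indicates a connection leak or an oversized pool on that application server.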
Resolution:
- Immediate: Restart the application servers to forcefully close dangling connections, or temporarily increase max_connections in your database parameter group/config.
- Long-term: Implement connection pooling. Use PgBouncer for PostgreSQL or ProxySQL for MySQL. Ensure your application framework is correctly configured to return connections to the pool after use.
By systematically checking logs, understanding the flow of network traffic, and monitoring system resources, DevOps teams can quickly identify the root cause of these common Linux system errors and restore service reliability.
#!/bin/bash
# Quick SRE Diagnostic Script for Linux Servers
echo "--- Checking for recent OOM Killer invocations ---"
# dmesg can require root when kernel.dmesg_restrict is enabled
sudo dmesg -T | grep -iE "out of memory|killed process" | tail -n 5
echo -e "\n--- Checking listening ports (investigate connection refused) ---"
# Anchor on the whole port number so :22 does not also match :2222
sudo ss -tulpn | grep -E ":(22|80|443|3000|8080)[[:space:]]"
echo -e "\n--- Checking Nginx Error Logs (investigate 502 Bad Gateway) ---"
sudo tail -n 10 /var/log/nginx/error.log 2>/dev/null || echo "Nginx logs not found."
echo -e "\n--- Checking Memory Usage ---"
free -h
Error Medic Editorial
Written by our team of Senior DevOps and Site Reliability Engineers. We specialize in untangling complex Linux, cloud, and containerized infrastructure issues, providing actionable solutions for developers and sysadmins.