Why does my Docker container keep restarting with exit code 137?

Exit code 137 (128 + SIGKILL) means the container's main process was killed with SIGKILL. This is most often done by the Linux OOM killer, although a manual docker kill or a docker stop that timed out produces the same code. Verify with: docker inspect --format='{{.State.OOMKilled}}' <container_name>. If it returns true, set or raise the memory limit with docker run -m 2g your-image, or add mem_limit: 2g to the service in docker-compose.yml. Also run free -h to check whether the host itself is low on memory; if so, add swap space or move to a larger instance.

What causes 'no space left on device' in Docker and how do I fix it quickly?

Docker fills disk by accumulating image layers, stopped container filesystems, build cache, and anonymous volumes under /var/lib/docker. Check breakdown with: docker system df. For an immediate fix run: docker system prune -a --volumes (removes all unused resources). For a permanent fix, move Docker's data directory to a larger partition by setting 'data-root' in /etc/docker/daemon.json and restarting the daemon.

Why is my Docker container returning 502 or 504 errors behind nginx or Traefik?

502 Bad Gateway means the proxy cannot reach the upstream container, most likely because it is down or has crashed. Run docker ps -a and docker logs <container_name> to confirm. 504 Gateway Timeout means the container is alive but does not respond within the proxy timeout. Check docker stats for CPU or memory saturation, and run docker exec <container_name> top -b -n1 to find the bottleneck. Also verify the port binding with docker port <container_name>.

How do I debug a Docker container that crashes immediately on startup?

Run the container in the foreground so you see all stdout/stderr: docker run -it --rm your-image. Check the configured entrypoint and command with: docker inspect your-image --format='{{.Config.Entrypoint}} {{.Config.Cmd}}'. For containers that have already crashed and stopped, read the exit code with: docker inspect --format='{{.State.ExitCode}} OOM={{.State.OOMKilled}}' <container_name>. The common codes mean:

- 1: usually an application error, so check the logs.
- 137: the process received SIGKILL, usually from an OOM kill.
- 126: the entrypoint could not be executed, often because of a permission problem.
- 127: the entrypoint binary was not found.
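The exit-code rules above can be folded into a small lookup helper. This is a sketch and not part of Docker. The function name explain_exit_code is hypothetical. The 125/126/127 meanings follow the docker run reference, and any code above 128 is decoded as 128 plus a signal number using the shell's kill -l.

```shell
# explain_exit_code CODE (hypothetical helper): rough meaning of an exit code.
explain_exit_code() {
  code=$1
  case "$code" in
    0)   echo "$code: exited normally" ;;
    125) echo "$code: docker run failed before the container started" ;;
    126) echo "$code: command found but not executable (often permissions)" ;;
    127) echo "$code: command not found" ;;
    *)
      if [ "$code" -gt 128 ] && [ "$code" -le 192 ]; then
        # 128 + N means the process was terminated by signal N.
        sig=$((code - 128))
        echo "$code: killed by signal SIG$(kill -l "$sig") ($sig)"
      else
        echo "$code: application error (check docker logs)"
      fi
      ;;
  esac
}
```

For example: explain_exit_code "$(docker inspect --format='{{.State.ExitCode}}' <container_name>)". Pair the result with the OOMKilled field, because 137 on its own cannot distinguish an OOM kill from a manual docker kill.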

Docker Permission Denied: Complete Fix Guide for Crashes, OOM, Disk Full, 502/504 & More

Q: How do I fix 'Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock'?

This means your Linux user is not in the docker group. Run: sudo usermod -aG docker $USER, then apply it to the current shell with: newgrp docker (or log out and back in). Confirm with: groups | grep docker, then test with: docker ps. If a CI job runs inside a container that mounts the host's /var/run/docker.sock, pass --group-add $(stat -c '%g' /var/run/docker.sock) to that container.

Fix docker permission denied, OOM kills, no space left on device, 502/504 errors, and high CPU with step-by-step Linux commands and a diagnostic script.

Last updated: February 23, 2026

Last verified: February 23, 2026


Key Takeaways

Permission denied on /var/run/docker.sock is caused by your Linux user not belonging to the docker group — fix with: sudo usermod -aG docker $USER && newgrp docker
Exit code 137 means the container received SIGKILL, usually from the Linux kernel's OOM killer. Set or raise a memory limit with docker run -m 2g, or with mem_limit: 2g in docker-compose.yml.
'no space left on device' errors require pruning stopped containers, dangling images, and unused volumes with docker system prune -a --volumes
502 Bad Gateway and connection refused errors usually mean one of three things: the container crashed, it never bound its port, or the Docker daemon is down. Check systemctl status docker and docker logs.
Quick fix summary: (1) verify daemon is running, (2) fix user group membership, (3) prune disk space, (4) set memory/CPU resource limits, (5) read docker logs for root cause

Fix Approaches Compared
Method	When to Use	Time	Risk
sudo usermod -aG docker $USER	Permission denied on docker.sock	< 1 min + re-login	Low
docker system prune -a --volumes	No space left on device / disk full	1–10 min	Medium — deletes unused data
docker run -m 2g	Container OOM killed (exit code 137)	< 1 min	Low
sudo systemctl restart docker	Daemon unresponsive / 502 Bad Gateway	< 1 min	Medium — stops all containers
Edit daemon.json data-root	Docker partition permanently full	5–15 min + migration	High — requires data copy
docker update --cpus='1.5'	Container consuming excessive CPU	< 1 min	Low
DOCKER_BUILDKIT=1 docker build	Slow Docker image builds	Varies	None
sudo journalctl -xeu docker.service	Daemon fails to start / core dumps	Diagnostic only	None

Understanding Docker Errors on Linux

Docker errors range from a simple Unix socket permission mismatch to kernel-level OOM kills and filesystem exhaustion. Every problem has a clear diagnostic path. This guide covers each failure class with the exact error strings you will see and the commands to resolve them.

Exact Error Messages You Will Encounter

Permission denied connecting to the daemon:

Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: dial unix /var/run/docker.sock: connect: permission denied

Container OOM killed:

Error response from daemon: Cannot start container <id>: [8] System error: cannot allocate memory
Killed

A container killed by the OOM killer exits with code 137 (128 + SIGKILL).

Disk / filesystem full:

Error response from daemon: no space left on device
Write /var/lib/docker/tmp/GetImageBlob: no space left on device

Daemon not running:

Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

DNS or registry connection refused:

dial tcp: lookup registry-1.docker.io: connection refused

502 / 504 from reverse proxy (nginx, Traefik, Caddy):

502 Bad Gateway
504 Gateway Timeout
upstream connect error or disconnect/reset before headers. reset reason: connection failure

Step 1: Verify the Docker Daemon Is Running

Start every investigation here: if the daemon is down, every docker command fails.

sudo systemctl status docker

If output shows Active: failed or Active: inactive:

sudo systemctl start docker

If it refuses to start, inspect the systemd journal:

sudo journalctl -xeu docker.service --no-pager | tail -60

Common daemon startup failures include a corrupted /var/lib/docker, invalid JSON in /etc/docker/daemon.json, and a port or socket conflict. Validate the config file before restarting (the --validate flag requires Docker Engine 23.0 or later):

sudo dockerd --validate --config-file /etc/docker/daemon.json
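On engines older than 23.0, dockerd has no --validate flag. A minimal fallback is to at least check the JSON syntax, assuming python3 is installed on the host. The helper name validate_daemon_json is hypothetical. The fallback only catches syntax errors, not unknown or conflicting keys.

```shell
# validate_daemon_json [FILE] (hypothetical helper)
validate_daemon_json() {
  f=${1:-/etc/docker/daemon.json}
  if command -v dockerd >/dev/null 2>&1 \
     && dockerd --help 2>&1 | grep -q -- '--validate'; then
    # Docker Engine 23.0+: dockerd checks both syntax and option values.
    dockerd --validate --config-file "$f"
  else
    # Fallback: JSON syntax only; unknown or conflicting keys slip through.
    python3 -m json.tool "$f" >/dev/null && echo "$f: valid JSON syntax"
  fi
}
```

Usage: validate_daemon_json && sudo systemctl restart docker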

If the config is malformed, back it up and then reset it to a safe default:

sudo cp /etc/docker/daemon.json /etc/docker/daemon.json.bak
echo '{}' | sudo tee /etc/docker/daemon.json
sudo systemctl restart docker

Step 2: Fix Docker Permission Denied Errors

The permission denied error on /var/run/docker.sock is the most common Docker issue on Linux. Docker's Unix socket is owned by the docker group; only root or group members can access it.

Check your current group membership:

groups $USER

Add your user to the docker group:

sudo usermod -aG docker $USER

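Apply the change to the current terminal without logging out. This starts a subshell that has the new group; other open terminals are unaffected: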

newgrp docker

Or fully log out and back in, then verify:

groups | grep docker
docker ps
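The same check can be run against the socket itself by comparing the socket's group ID with the groups of the current shell. This is a sketch with a hypothetical helper name, check_socket_group. It relies on GNU stat -c '%g' and on id -G, which reports the groups of the running process. That means it flags the common case where usermod ran but the session was never refreshed.

```shell
# check_socket_group [SOCKET] (hypothetical helper)
# Exit 0: this shell can use the socket; 1: missing group; 2: socket not found.
check_socket_group() {
  sock=${1:-/var/run/docker.sock}
  if [ ! -e "$sock" ]; then
    echo "missing: $sock"
    return 2
  fi
  if [ "$(id -u)" -eq 0 ]; then
    echo "ok: running as root"
    return 0
  fi
  sock_gid=$(stat -c '%g' "$sock")
  # id -G lists the numeric groups of the current process.
  for gid in $(id -G); do
    if [ "$gid" = "$sock_gid" ]; then
      echo "ok: this shell has group $sock_gid"
      return 0
    fi
  done
  echo "denied: this shell lacks group $sock_gid (run newgrp docker or log in again)"
  return 1
}
```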

Security note: Members of the docker group have effective root access to the host system. For production environments, consider rootless Docker instead:

dockerd-rootless-setuptool.sh install
export DOCKER_HOST=unix://$XDG_RUNTIME_DIR/docker.sock

CI/CD jobs that run in a container with the host socket mounted, where the container's user is not in the socket's group:

docker run -v /var/run/docker.sock:/var/run/docker.sock \
  --group-add $(stat -c '%g' /var/run/docker.sock) \
  your-image

Step 3: Diagnose 502, 504, and Connection Refused Errors

These errors appear when a reverse proxy (nginx, Traefik, Caddy) cannot reach the upstream container. The proxy returns 502 Bad Gateway when the container is down and 504 Gateway Timeout when the container is alive but responding too slowly.

Check all container states:

docker ps -a

Containers with status Exited or Restarting are your culprits. Read their logs:

docker logs --tail=200 <container_name>
docker logs --since=30m <container_name>

Inspect the exit code and OOM status:

docker inspect <container_id> \
  --format='ExitCode: {{.State.ExitCode}} | OOMKilled: {{.State.OOMKilled}} | Error: {{.State.Error}}'

Verify port bindings are correct:

docker port <container_id>
ss -tlnp | grep <expected_port>

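Test the application from inside the container. This assumes the image ships curl and the app exposes a /health endpoint; adjust the path to match your app: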

docker exec -it <container_id> curl -v http://localhost:<internal_port>/health

For 504 timeouts, check whether the app is deadlocked or CPU-starved:

docker exec <container_id> ps aux
docker exec <container_id> top -b -n1 | head -20
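The checks in this step can be condensed into a first-pass triage. This is a sketch only: triage_upstream is a hypothetical helper. It reads just three fields from docker inspect (.State.Status, .State.ExitCode, .State.OOMKilled) to tell a stopped upstream (the typical 502 case) from a running one.

```shell
# triage_upstream NAME (hypothetical helper): first-pass 502/504 triage.
triage_upstream() {
  name=$1
  out=$(docker inspect --format '{{.State.Status}} {{.State.ExitCode}} {{.State.OOMKilled}}' "$name" 2>/dev/null) \
    || { echo "$name: no such container"; return 1; }
  # Split "status exitcode oomkilled" into $1 $2 $3.
  set -- $out
  if [ "$1" = "running" ]; then
    echo "$name: running; on 502 check port bindings, on 504 check docker stats"
  else
    echo "$name: $1 (exit $2, OOMKilled=$3); expect 502 from the proxy"
  fi
}
```

Usage: triage_upstream <container_name>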

Step 4: Fix OOM and Out of Memory Errors

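Exit code 137 means the process received SIGKILL. That is most often the OOM killer, but docker kill or a docker stop that outlived its timeout produces the same code, so confirm: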

docker inspect <container_id> --format='{{.State.OOMKilled}}'
# Returns: true
dmesg | grep -iE 'out of memory|oom|killed process'
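To sweep every container at once instead of inspecting them one by one, loop over docker ps -aq and keep the containers whose OOMKilled flag is true. The helper name list_oom_killed is hypothetical. The flag only describes each container's most recent exit.

```shell
# list_oom_killed (hypothetical helper): containers whose last exit was an OOM kill.
list_oom_killed() {
  for id in $(docker ps -aq); do
    docker inspect --format '{{.Name}} {{.State.ExitCode}} {{.State.OOMKilled}}' "$id"
  done | awk '$3 == "true" { sub(/^\//, "", $1); print $1 " (exit " $2 ")" }'
}
```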

Set a container memory limit at run time:

docker run -m 2g --memory-swap 2g your-image

Setting --memory-swap equal to -m disables swap for that container. Set it larger to permit swap.

In docker-compose.yml:

services:
  app:
    image: your-image
    mem_limit: 2g
    memswap_limit: 2g

Update a running container without restarting:

docker update --memory 2g --memory-swap 2g <container_name>

Check host memory availability:

free -h
cat /proc/meminfo | grep -E 'MemAvailable|SwapFree'

If the host itself is under memory pressure, add a swap file:

sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
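Re-running the tee -a line above appends a duplicate /etc/fstab entry each time. Below is a small idempotent variant. The function name is hypothetical, and it takes an optional fstab path argument, which is handy for testing against a scratch file.

```shell
# ensure_swap_fstab_entry [SWAPFILE] [FSTAB] (hypothetical helper)
ensure_swap_fstab_entry() {
  swapfile=${1:-/swapfile}
  fstab=${2:-/etc/fstab}
  # Look for an existing line whose first field is the swap file path.
  if awk -v f="$swapfile" '$1 == f { found = 1 } END { exit !found }' "$fstab"; then
    echo "already present: $swapfile"
  else
    echo "$swapfile none swap sw 0 0" >> "$fstab"
    echo "added: $swapfile"
  fi
}
```

Run it from a root shell (for example after sudo -i): ensure_swap_fstab_entry /swapfile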

Step 5: Fix Docker Disk Full and No Space Left on Device

Docker accumulates data aggressively: image layers, stopped container filesystems, anonymous volumes, and build cache. First, understand the breakdown:

docker system df
docker system df -v
df -h /var/lib/docker
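For monitoring, or as a pre-flight check before builds, the df output can be turned into a threshold test. This is a sketch with hypothetical helper names. It uses df -P so each filesystem fits on one line. The 85% default is an arbitrary example, not a Docker recommendation.

```shell
# disk_usage_pct DIR (hypothetical helper): prints the Use% of DIR's filesystem.
disk_usage_pct() {
  # df -P output: header line, then one line per filesystem; field 5 is Use%.
  df -P "${1:-/var/lib/docker}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# check_docker_disk [DIR] [LIMIT] (hypothetical helper): returns 1 at or above LIMIT%.
check_docker_disk() {
  dir=${1:-/var/lib/docker}
  limit=${2:-85}
  pct=$(disk_usage_pct "$dir")
  [ -n "$pct" ] || return 2
  if [ "$pct" -ge "$limit" ]; then
    echo "WARNING: $dir filesystem at ${pct}% (limit ${limit}%)"
    return 1
  fi
  echo "OK: $dir filesystem at ${pct}%"
}
```

Usage: check_docker_disk "$(docker info --format '{{.DockerRootDir}}')" 85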

Incremental cleanup (more selective than a full prune):

docker container prune    # remove stopped containers
docker image prune        # remove dangling images only
docker image prune -a     # remove ALL images not used by a container
docker volume prune       # remove unused volumes (Docker 23.0+: anonymous only; add -a for named)
docker builder prune      # remove build cache

Full cleanup (removes all unused resources):

docker system prune -a --volumes

Move Docker data to a larger disk (permanent fix):

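Stop Docker. Also stop docker.socket if your install has it, so socket activation cannot restart the daemon mid-copy: sudo systemctl stop docker docker.socket
Copy data to new location: sudo rsync -aP /var/lib/docker/ /mnt/large-disk/docker/
Update /etc/docker/daemon.json, adding the key alongside any settings already in the file: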

{
  "data-root": "/mnt/large-disk/docker"
}

Start Docker: sudo systemctl start docker
Verify: docker info | grep 'Docker Root Dir'
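If daemon.json already holds other settings (log rotation, DNS, and so on), writing only the data-root object would silently drop them. Below is a minimal merge helper, assuming python3 is available. set_daemon_json_key is a hypothetical name, and its value argument must itself be valid JSON.

```shell
# set_daemon_json_key FILE KEY JSON_VALUE (hypothetical helper)
# Adds or replaces one top-level key and keeps every other setting.
set_daemon_json_key() {
  file=$1 key=$2 value=$3
  python3 - "$file" "$key" "$value" <<'PY'
import json, os, sys
path, key, raw = sys.argv[1], sys.argv[2], sys.argv[3]
cfg = {}
if os.path.exists(path) and os.path.getsize(path) > 0:
    with open(path) as fh:
        cfg = json.load(fh)
cfg[key] = json.loads(raw)
with open(path, "w") as fh:
    json.dump(cfg, fh, indent=2)
    fh.write("\n")
PY
}
```

From a root shell: set_daemon_json_key /etc/docker/daemon.json data-root '"/mnt/large-disk/docker"'. Then validate the file with dockerd --validate before starting Docker. The same helper works for keys such as log-driver or log-opts.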

Automate cleanup with cron to prevent recurrence:

# Add to root's crontab (sudo crontab -e):
0 3 * * 0 docker system prune -f --filter 'until=168h' >> /var/log/docker-cleanup.log 2>&1

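Enable log rotation in /etc/docker/daemon.json to keep container logs from filling the disk. The setting applies only to containers created after the daemon restarts; existing containers keep their old log settings until they are recreated: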

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
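The rotation settings only reach containers created after the change, so it helps to list the containers still running without them. This is a sketch with a hypothetical function name. It assumes the daemon records its default log options on each container at creation time, which is what makes HostConfig.LogConfig.Config show max-size for newly created containers.

```shell
# containers_without_log_rotation (hypothetical helper)
containers_without_log_rotation() {
  for id in $(docker ps -aq); do
    # An empty max-size means the json-file log can grow without bound.
    docker inspect --format '{{.Name}} {{.HostConfig.LogConfig.Type}} {{index .HostConfig.LogConfig.Config "max-size"}}' "$id"
  done | awk '$2 == "json-file" && $3 == "" { sub(/^\//, "", $1); print $1 }'
}
```

Recreate the listed containers to pick up the new settings, for example with docker compose up -d --force-recreate for Compose services.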

Step 6: Fix Docker High CPU Usage

Identify the offending container:

docker stats --no-stream
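To filter that output, ask docker stats for only the name and CPU columns and compare them against a threshold. The helper cpu_hogs and its 80% default are illustrative assumptions. Note that docker stats reports CPU relative to a single core, so 150% means one and a half cores.

```shell
# cpu_hogs [THRESHOLD] (hypothetical helper): containers at or above THRESHOLD% CPU.
cpu_hogs() {
  threshold=${1:-80}
  docker stats --no-stream --format '{{.Name}} {{.CPUPerc}}' \
    | awk -v t="$threshold" '{ pct = $2; sub(/%$/, "", pct); if (pct + 0 >= t + 0) print $1, $2 }'
}
```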

Apply CPU limits:

# Limit to 1.5 cores at run time
docker run --cpus='1.5' your-image

# Update a running container without restart
docker update --cpus='1.5' <container_name>
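Under the hood, --cpus is shorthand for a CFS quota. The Docker documentation describes --cpus=1.5 as equivalent to --cpu-period=100000 --cpu-quota=150000. The conversion is sketched below. The function name is hypothetical, and 100000 microseconds is the default period.

```shell
# cpus_to_quota CPUS [PERIOD_US] (hypothetical helper)
cpus_to_quota() {
  cpus=$1
  period=${2:-100000}
  # Quota = cores x period, rounded to avoid float artifacts such as 0.7 -> 69999.
  awk -v c="$cpus" -v p="$period" \
    'BEGIN { printf "--cpu-period=%d --cpu-quota=%d\n", p, int(c * p + 0.5) }'
}
```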

In docker-compose.yml (the deploy.resources syntax, which Docker Compose v2 and Swarm both honor):

services:
  app:
    deploy:
      resources:
        limits:
          cpus: '1.50'

Profile the process inside the container:

docker exec -it <container_id> top -b -n3
docker exec -it <container_id> sh -c 'ps aux --sort=-%cpu | head -20'

Step 7: Analyze Container Crashes and Core Dumps

When a container crashes with a segmentation fault or generates a core dump:

# Check dmesg for segfaults or signal 11
dmesg | grep -E 'segfault|core dumped|signal 11'

# Get crash context from journald
sudo journalctl -u docker --since '1 hour ago' | grep -iE 'fatal|panic|segfault|core'

# Read the crash log from the container
docker logs --tail=200 <container_id>

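# Enable core dumps and ptrace for deep debugging.
# Files appear in /cores only if the host-wide /proc/sys/kernel/core_pattern
# points there, e.g. /cores/core.%e.%p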
docker run --ulimit core=-1 \
  --cap-add SYS_PTRACE \
  -v /tmp/cores:/cores \
  your-image
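Whether anything lands in /cores depends on the host's kernel.core_pattern, which is global rather than per-container. The sketch below reports where dumps will go. The helper name is hypothetical, and the optional argument lets it read a copy of the pattern file for testing. Two assumptions apply: a pattern starting with | hands the dump to a helper program on the host, and an absolute path is resolved from the crashing process's view of the filesystem.

```shell
# core_dump_destination [PATTERN_FILE] (hypothetical helper)
core_dump_destination() {
  pattern=$(cat "${1:-/proc/sys/kernel/core_pattern}")
  # '|...' = pipe to a handler; '/...' = file path; anything else = relative path.
  case "$pattern" in
    '|'*) echo "pipe: ${pattern#|}" ;;
    /*)   echo "file: ${pattern%/*}" ;;
    *)    echo "cwd: relative pattern '$pattern' (written to the process's working directory)" ;;
  esac
}
```

If it reports a pipe (for example to systemd-coredump), the dump is handled on the host and typically inspected there with coredumpctl. It is not written into the container's /cores mount.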

Identify the crash log location inside the container:

docker exec <container_id> ls /var/crash/ 2>/dev/null || echo 'No /var/crash directory'
docker exec <container_id> find /var/log -name '*.log' -newer /proc/1 2>/dev/null | head -10

Step 8: Fix Slow Docker Performance

BuildKit makes image builds significantly faster. It has been the default builder since Docker Engine 23.0; on older engines, enable it per build:

DOCKER_BUILDKIT=1 docker build -t myapp .

Or enable it permanently in /etc/docker/daemon.json:

{"features": {"buildkit": true}}

Reduce build context with .dockerignore:

node_modules
.git
*.log
dist
__pycache__
.pytest_cache

Fix slow DNS resolution, since a misbehaving resolver can add multi-second delays to every network call:

# Test DNS inside a container
docker run --rm busybox nslookup google.com

# If slow, override DNS in /etc/docker/daemon.json:
# {"dns": ["8.8.8.8", "8.8.4.4"]}
# Then: sudo systemctl restart docker

Verify the storage driver is overlay2 (not the slow devicemapper loop mode):

docker info | grep 'Storage Driver'
# Should show: Storage Driver: overlay2

To switch to overlay2, add to /etc/docker/daemon.json:

{"storage-driver": "overlay2"}

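Then restart Docker. Note: this does NOT migrate existing images or containers. Anything created under the old driver is no longer visible and must be pulled or rebuilt.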

Step 9: Emergency Docker Recovery

If Docker is completely non-functional:

# 1. Gracefully stop all containers
docker stop $(docker ps -q) 2>/dev/null || true

# 2. Stop the daemon
sudo systemctl stop docker

# 3. Validate config
sudo dockerd --validate --config-file /etc/docker/daemon.json

# 4. If config is corrupt, reset it
echo '{}' | sudo tee /etc/docker/daemon.json

# 5. Restart
sudo systemctl start docker

# LAST RESORT: Full reset — loses all containers, images, and volumes
sudo systemctl stop docker
sudo rm -rf /var/lib/docker/*
sudo systemctl start docker
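Before running the last-resort commands above, save named volumes while the daemon is still up, because the reset deletes them. This is a sketch with a hypothetical helper, backup_volume. It assumes the busybox image can be pulled and that one tarball per volume is enough for your data; databases are safer backed up with their own dump tools.

```shell
# backup_volume VOLUME [DEST_DIR] (hypothetical helper)
backup_volume() {
  vol=$1
  dest=${2:-$PWD}
  # Mount the volume read-only and write VOLUME.tar.gz into DEST_DIR on the host.
  docker run --rm \
    -v "$vol":/data:ro \
    -v "$dest":/backup \
    busybox tar czf "/backup/$vol.tar.gz" -C /data .
}
```

Usage, with /srv/backups as an example destination: for v in $(docker volume ls -q); do backup_volume "$v" /srv/backups; done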

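Docker Diagnostic Script

The script below collects the following in a single pass:

- daemon status
- group membership
- socket permissions
- disk usage
- container state
- OOM and crash events
- daemon logs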

#!/usr/bin/env bash
# Docker Comprehensive Diagnostics Script
# Usage: bash docker-diag.sh 2>&1 | tee /tmp/docker-diag.log

set -uo pipefail

HR='================================================================='

echo "$HR"
echo "DOCKER DIAGNOSTICS — $(date)"
echo "$HR"

echo ""
echo "--- 1. Daemon Status ---"
systemctl is-active docker 2>/dev/null && echo "Daemon: RUNNING" || echo "Daemon: STOPPED"
systemctl is-enabled docker 2>/dev/null | xargs -I{} echo "Enabled: {}"

echo ""
echo "--- 2. Docker Version ---"
docker version 2>/dev/null | head -8 || echo "ERROR: Cannot connect to daemon"

echo ""
echo "--- 3. User Group Check ---"
case " $(id -nG) " in
  *" docker "*) echo "OK: Current session has the docker group" ;;
  *) echo "WARNING: Not in docker group. Fix: sudo usermod -aG docker $USER && newgrp docker" ;;
esac

echo ""
echo "--- 4. Docker Socket Permissions ---"
ls -la /var/run/docker.sock 2>/dev/null || echo "WARNING: docker.sock not found"

echo ""
echo "--- 5. Disk Usage ---"
df -h /var/lib/docker 2>/dev/null || df -h / 2>/dev/null
echo ""
docker system df 2>/dev/null || echo "(cannot reach daemon)"

echo ""
echo "--- 6. All Containers (running + stopped) ---"
docker ps -a 2>/dev/null || echo "(cannot reach daemon)"

echo ""
echo "--- 7. Container Resource Usage ---"
docker stats --no-stream 2>/dev/null || echo "(cannot reach daemon)"

echo ""
echo "--- 8. OOM Events (dmesg) ---"
dmesg 2>/dev/null | grep -iE 'out of memory|oom_kill|killed process' | tail -20 \
  || echo "No OOM events found (or dmesg requires root)"

echo ""
echo "--- 9. Daemon Logs (last hour) ---"
sudo journalctl -u docker --no-pager --since '1 hour ago' 2>/dev/null | tail -40 \
  || echo "Cannot read journald (try: sudo journalctl -u docker)"

echo ""
echo "--- 10. Daemon Config ---"
if [ -f /etc/docker/daemon.json ]; then
  echo "Contents of /etc/docker/daemon.json:"
  cat /etc/docker/daemon.json
else
  echo "No /etc/docker/daemon.json found (using all defaults)"
fi

echo ""
echo "--- 11. Storage Driver and Root Dir ---"
docker info 2>/dev/null | grep -E 'Storage Driver|Docker Root Dir|Logging Driver|Cgroup Driver' \
  || echo "(cannot reach daemon)"

echo ""
echo "--- 12. Crash Indicators (segfault / core dump) ---"
dmesg 2>/dev/null | grep -iE 'segfault|signal 11|core dumped' | tail -10 \
  || echo "No segfaults found in dmesg"

echo ""
echo "$HR"
echo "Diagnostics complete. Review warnings above."
echo "Full output saved to: /tmp/docker-diag.log (if redirected)"
echo "$HR"

Error Medic Editorial

Error Medic Editorial is a team of senior DevOps engineers and SREs with combined decades of experience managing containerized workloads on Linux in production. We specialize in Docker, Kubernetes, and cloud-native infrastructure troubleshooting — translating real incident postmortems into actionable, command-first guides.
