Why does my service show 'Failed with result oom-kill' even though the host still has free RAM?

Usually this means the service hit its per-cgroup MemoryMax ceiling, not the host's physical memory limit. A systemd-oomd kill can be reported with the same result; the next question covers how to tell them apart. systemd enforces whatever value is set in the unit file, and a limit on a parent slice such as system.slice caps every unit beneath it. Run `systemctl show myapp.service --property=MemoryMax,MemoryCurrent` to compare the configured limit against live usage. If MemoryMax is lower than what the workload needs, raise it with `systemctl edit myapp.service`, run `systemctl daemon-reload`, and restart the service so it picks up the new limit.
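If you run this check often, the comparison can be scripted. The sketch below assumes `systemctl show` prints both properties as KEY=VALUE lines in bytes, with `infinity` meaning no limit. `memory_headroom` is a hypothetical helper name, not a systemd tool.

```bash
# Hypothetical helper: reads `systemctl show` KEY=VALUE lines on stdin and
# reports MemoryCurrent as a share of MemoryMax (both in bytes).
memory_headroom() {
  awk -F= '
    $1 == "MemoryMax"     { max = $2 }
    $1 == "MemoryCurrent" { cur = $2 }
    END {
      # "infinity" (or any non-numeric value) means no ceiling is configured
      if (max !~ /^[0-9]+$/ || max == 0) { print "no MemoryMax limit set"; exit }
      # "[not set]" appears when memory accounting is disabled for the unit
      if (cur !~ /^[0-9]+$/) { print "MemoryCurrent unavailable"; exit }
      printf "%.1f%% of MemoryMax in use (%d of %d MiB)\n", 100 * cur / max, cur / 1048576, max / 1048576
    }'
}

# Usage:
#   systemctl show myapp.service --property=MemoryMax,MemoryCurrent | memory_headroom
```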

What is the difference between the kernel OOM killer and systemd-oomd, and how do I know which one fired?

The kernel OOM killer acts only when an allocation cannot be satisfied even after reclaim, either host-wide or inside a cgroup that reached its MemoryMax. It is reactive: the host can stall under heavy reclaim before it fires, and it may kill processes holding file or database locks without warning. systemd-oomd is proactive: it polls PSI (Pressure Stall Information) counters and swap usage, and kills whole cgroups with SIGKILL when memory pressure stays above a threshold for a sustained period. Identify the source with `journalctl -k | grep -i 'out of memory'` for kernel kills (cgroup-limit kills appear as 'Memory cgroup out of memory') versus `journalctl -u systemd-oomd` for oomd-initiated kills. An oomd journal entry starts with: 'Killed /system.slice/myapp.service due to memory pressure'.
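The same distinction can be applied line by line. This is a minimal sketch built on the message prefixes quoted above. `classify_oom_line` is a hypothetical helper, and exact wording can vary between kernel and systemd versions, so treat the patterns as a starting point.

```bash
# Hypothetical helper: label one journal line by the OOM mechanism that wrote it.
classify_oom_line() {
  case "$1" in
    *"Memory cgroup out of memory"*) echo "kernel-cgroup" ;;  # a cgroup limit such as MemoryMax was hit
    *"Out of memory"*)               echo "kernel-global" ;;  # the whole host ran out of memory
    *systemd-oomd*Killed*)           echo "systemd-oomd" ;;   # proactive kill based on PSI or swap
    *)                               echo "unrelated" ;;
  esac
}

# Usage: print only OOM-related lines from the current boot, tagged by source
#   journalctl -b --no-pager | while IFS= read -r line; do
#     kind=$(classify_oom_line "$line")
#     [ "$kind" = unrelated ] || printf '%s: %s\n' "$kind" "$line"
#   done
```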

My service logs 'permission denied' but the file is owned by the correct user—what else could cause this?

Three additional enforcement layers exist beyond POSIX ownership: (1) systemd sandboxing: check `systemctl show myapp.service | grep ProtectSystem`. ProtectSystem=strict makes the entire file hierarchy read-only for the service (except /dev, /proc and /sys), and ProtectSystem=full makes /usr, /boot and /etc read-only, regardless of file ownership; fix by adding `ReadWritePaths=/your/path` in a drop-in. (2) SELinux or AppArmor MAC policies: inspect with `ausearch -m avc -ts recent` (SELinux) or `journalctl | grep 'apparmor="DENIED"'` (AppArmor); on SELinux, correct labels with `restorecon -Rv /path`. (3) PrivateTmp=yes: creates an isolated /tmp namespace, so any hardcoded /tmp paths point to a service-private directory invisible to the host.
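Layer (1) can be checked mechanically before you start editing. The sketch below reads `ProtectSystem` and `ReadWritePaths` from `systemctl show` output and reports whether a given path would be read-only. `check_sandbox_write` is hypothetical and deliberately simplified. It models only ProtectSystem= (strict, full, yes) and ignores ProtectHome=, ReadOnlyPaths= and InaccessiblePaths=.

```bash
# Hypothetical helper: does ProtectSystem= make $1 read-only for the service?
# Reads `systemctl show` KEY=VALUE lines on stdin. Simplified model only.
check_sandbox_write() {
  awk -F= -v target="$1" '
    function under(p, d) { return p == d || index(p, d "/") == 1 }
    $1 == "ProtectSystem"  { mode = $2 }
    $1 == "ReadWritePaths" { rw = $2 }
    END {
      n = split(rw, paths, " ")
      for (i = 1; i <= n; i++) {
        sub(/^[-+]/, "", paths[i])              # drop optional "-" or "+" prefix
        if (paths[i] != "" && under(target, paths[i])) { print "writable (ReadWritePaths)"; exit }
      }
      # strict: everything except the API filesystems /dev, /proc and /sys
      if (mode == "strict" && !under(target, "/dev") && !under(target, "/proc") && !under(target, "/sys")) ro = 1
      # yes and full: /usr, /boot and /efi; full adds /etc
      if ((mode == "yes" || mode == "full") && (under(target, "/usr") || under(target, "/boot") || under(target, "/efi"))) ro = 1
      if (mode == "full" && under(target, "/etc")) ro = 1
      print (ro ? "read-only (ProtectSystem=" mode ")" : "not blocked by ProtectSystem")
    }'
}

# Usage:
#   systemctl show myapp.service --property=ProtectSystem,ReadWritePaths | check_sandbox_write /var/lib/myapp/data
```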

How do I find which process inside a service is consuming excessive CPU?

Use `systemd-cgtop -d 1` for a live per-cgroup CPU breakdown. `/sys/fs/cgroup/system.slice/myapp.service/cpu.stat` shows cumulative CPU time for the service's cgroup as a whole. For process-level detail, list the PIDs in the same directory's `cgroup.procs` file (or under `systemctl status myapp.service`) and watch them with `top -p PID1,PID2`. If the CPU spike comes not from the process itself but from a restart loop, check the restart count with `systemctl show myapp.service --property=NRestarts` and count failures in the journal with `journalctl -u myapp.service --since '10 minutes ago' | grep -c Failed`.
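Because cpu.stat only aggregates the cgroup, a per-process view has to start from cgroup.procs. This is a minimal sketch that assumes cgroup v2 and Linux /proc. The hypothetical `cgroup_cpu_ticks` function prints each PID with its cumulative user plus system CPU time in clock ticks (see `getconf CLK_TCK`), highest first. It reports a running total rather than a live percentage, so sample it twice and compare if you need a rate.

```bash
# Hypothetical helper: per-process CPU time for the PIDs listed in a cgroup.procs file.
# Output: PID TICKS COMMAND, sorted by ticks (user + system, cumulative), highest first.
cgroup_cpu_ticks() {
  while read -r pid; do
    [ -r "/proc/$pid/stat" ] || continue          # the process may have exited meanwhile
    awk -v pid="$pid" '{
      lp = index($0, "(")                         # the command name sits inside (...)
      match($0, /[)][^)]*$/)                      # the last ")" closes it
      comm = substr($0, lp + 1, RSTART - lp - 1)
      split(substr($0, RSTART + 2), f, " ")       # f[1] is field 3 (state)
      print pid, f[12] + f[13], comm              # fields 14 (utime) and 15 (stime)
    }' "/proc/$pid/stat" 2>/dev/null
  done < "$1" | sort -k2,2nr
}

# Usage:
#   cgroup_cpu_ticks /sys/fs/cgroup/system.slice/myapp.service/cgroup.procs | head
```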

Why does `journalctl -u myapp.service` show no output even though the service is clearly failing?

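This can happen when the process writes its logs somewhere other than stdout/stderr (for example to its own log file), or when stdout/stderr are not connected to the journal. Verify with `systemctl show myapp.service --property=StandardOutput,StandardError`. StandardOutput should be 'journal' or 'journal+console', and StandardError should be the same or 'inherit', which follows StandardOutput. If StandardOutput shows 'null' or 'inherit' (which duplicates standard input, normally /dev/null), add `StandardOutput=journal` and `StandardError=journal` to the [Service] section, reload, and restart. Also run journalctl as root or as a member of the systemd-journal group, otherwise it may show only your own user's entries. If a custom `SyslogIdentifier=` is set, you can also filter on it with `journalctl -t myidentifier`.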
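The StandardOutput/StandardError check can be scripted as well. This sketch assumes the two properties arrive as KEY=VALUE lines from `systemctl show`. `check_journal_routing` is a hypothetical helper. It treats any value beginning with `journal` as journal-bound and resolves StandardError=inherit to the StandardOutput value.

```bash
# Hypothetical helper: reads StandardOutput/StandardError KEY=VALUE lines on stdin
# and reports any stream that does not end up in the journal.
check_journal_routing() {
  awk -F= '
    $1 == "StandardOutput" { out = $2 }
    $1 == "StandardError"  { err = $2 }
    END {
      if (err == "inherit") err = out             # stderr follows stdout
      if (out !~ /^journal/) print "stdout goes to " out ", not the journal"
      if (err !~ /^journal/) print "stderr goes to " err ", not the journal"
      if (out ~ /^journal/ && err ~ /^journal/) print "ok"
    }'
}

# Usage:
#   systemctl show myapp.service --property=StandardOutput,StandardError | check_journal_routing
```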

systemd OOM Killed, Failed, High CPU & Permission Denied: Complete Troubleshooting Guide

Fix systemd OOM kills, failed services, high CPU, and permission denied errors with journalctl diagnostics, MemoryMax tuning, and cgroup fixes.

Last updated: February 23, 2026

Last verified: February 23, 2026


Key Takeaways

OOM kills (result 'oom-kill') occur when a service exceeds its per-cgroup MemoryMax ceiling, when the whole host runs out of memory, or when systemd-oomd proactively terminates a cgroup under sustained memory pressure; the first and last cases can happen while the host still has free RAM
Failed services ('systemd[1]: myapp.service: Failed with result exit-code') stem from ExecStart path errors, missing runtime dependencies, sandboxing directives blocking filesystem access, or insufficient POSIX/SELinux/AppArmor permissions
Quick fix path: run 'journalctl -xe -u SERVICE' to pinpoint the failure type, then apply the matching remedy—raise MemoryMax, fix file ownership, add ReadWritePaths, set OOMPolicy=continue, or adjust CPUQuota and StartLimitBurst
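The quick fix path amounts to a lookup from the result name in the `Failed with result '...'` line to the section of this guide to read first. The sketch below works under that assumption. `suggest_fix` is hypothetical, and its mapping covers only the results discussed here, not every result systemd can report.

```bash
# Hypothetical helper: map the result in a "Failed with result '...'" line to a section of this guide.
suggest_fix() {
  result=$(printf '%s\n' "$1" | sed -n "s/.*Failed with result '\([^']*\)'.*/\1/p")
  case "$result" in
    oom-kill)         echo "Step 2: raise MemoryMax/MemoryHigh or tune systemd-oomd" ;;
    exit-code)        echo "Steps 3 and 6: check permissions, sandboxing and the ExecStart path" ;;
    timeout)          echo "Step 6: raise TimeoutStartSec= or correct Type=" ;;
    start-limit-hit)  echo "Step 4: stop the restart loop, then run systemctl reset-failed" ;;
    signal|core-dump) echo "Step 5: inspect the crash with coredumpctl" ;;
    *)                echo "unrecognised result '$result': read journalctl -xe -u SERVICE" ;;
  esac
}

# Usage:
#   suggest_fix "$(journalctl -u myapp.service --no-pager | grep 'Failed with result' | tail -n 1)"
```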

Fix Approaches Compared
Method	When to Use	Time	Risk
Increase MemoryMax via drop-in override	Service OOM-killed; workload legitimately needs more RAM than the current limit	5 min	Low — a higher ceiling still enforces limits
Set OOMScoreAdjust=-900 in unit file	Critical service must survive kernel-level OOM events on a busy host	2 min	Medium — can destabilize host if RAM is fully exhausted
Tune /etc/systemd/oomd.conf thresholds	systemd-oomd kills services under normal load; pressure thresholds too aggressive	10 min	Low — raises the bar before proactive kills fire
Add ReadWritePaths= to unit file	ProtectSystem=strict or ProtectSystem=full blocks writes to a required directory	5 min	Low — minimal sandbox relaxation scoped to one path
restorecon / semanage fcontext (SELinux)	AVC denial in audit log causes permission denied despite correct POSIX ownership	10 min	Low — restores intended MAC policy label
CPUQuota= plus StartLimitBurst=	Service restart-loops or spikes CPU, threatening host stability	5 min	Low — throttles CPU and limits restart velocity

Understanding systemd OOM and Service Failures

systemd places every service inside a cgroup (control group), giving the kernel granular visibility into per-service resource consumption. When the memory charged to a service's cgroup (anonymous memory plus page cache and kernel memory) reaches its MemoryMax= ceiling and cannot be reclaimed, or when host-level memory pressure becomes critical, one of two OOM killers fires:

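Kernel OOM killer: activated only when an allocation cannot be satisfied even after reclaim. It scores the candidate processes (every process on the host, or only those in the cgroup that hit its limit) and terminates the highest-scoring one. Kills are abrupt and can leave data inconsistent if the target was mid-write or holding database locks.
systemd-oomd (systemd ≥ v247): a userspace daemon that monitors PSI (Pressure Stall Information) metrics and swap usage and proactively kills cgroups before the kernel must act. It only manages units and slices that opt in through ManagedOOMMemoryPressure= or ManagedOOMSwap=, which some distributions enable by default. Its kills are cleaner but can be over-aggressive with default thresholds.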

Exact Error Messages to Look For

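In journalctl, host-wide kernel OOM events produce the lines below. When a cgroup limit such as MemoryMax= triggered the kill, the kernel line starts with 'Memory cgroup out of memory:' instead of 'Out of memory:':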

kernel: Out of memory: Killed process 14321 (java) total-vm:8192000kB, anon-rss:6291456kB, file-rss:0kB
systemd[1]: myapp.service: Main process exited, code=killed, status=9/KILL
systemd[1]: myapp.service: Failed with result 'oom-kill'.
systemd[1]: Failed to start My Application Service.

systemd-oomd kills look like:

systemd-oomd[812]: Killed /system.slice/myapp.service due to memory pressure for /system.slice/myapp.service being 70.12% > 60.00% for > 20s with reclaim activity
systemd[1]: myapp.service: Consumed 14.231s CPU time.
systemd[1]: myapp.service: Failed with result 'oom-kill'.

Permission denied service failures:

myapp[14400]: open /var/lib/myapp/data/records.db: permission denied
systemd[1]: myapp.service: Main process exited, code=exited, status=1/FAILURE
systemd[1]: myapp.service: Failed with result 'exit-code'.

Missing executable:

systemd[1]: myapp.service: Failed at step EXEC spawning /usr/local/bin/myapp: No such file or directory
systemd[1]: myapp.service: Main process exited, code=exited, status=203/EXEC
systemd[1]: Failed to start My Application Service.

Step 1: First-Pass Diagnosis

Always read the journal before touching any configuration:

# Full status with recent log lines
systemctl status myapp.service

# Last 200 lines of service journal
journalctl -u myapp.service -n 200 --no-pager

# Kernel ring buffer for OOM events since last boot
journalctl -k -b | grep -iE 'out of memory|oom|killed process'

# All errors and above since last boot
journalctl -b -p err..emerg --no-pager

# systemd-oomd kills in the last hour
journalctl -u systemd-oomd --since '1 hour ago'

# All failed units on the system
systemctl list-units --state=failed

Step 2: Diagnose and Fix OOM Kills

Identify current memory usage per cgroup:

# Real-time cgroup resource view
systemd-cgtop -d 1 -n 5

# Current limit and usage for a specific service
systemctl show myapp.service --property=MemoryMax,MemoryHigh,MemoryCurrent

# Kernel-level cgroup memory counters
cat /sys/fs/cgroup/system.slice/myapp.service/memory.current
cat /sys/fs/cgroup/system.slice/myapp.service/memory.max
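On cgroup v2 the same directory also contains memory.events, whose `oom_kill` counter records how many processes the kernel OOM killer has killed in this cgroup. systemd-oomd kills are not counted there, which makes the counter a handy cross-check. Here is a minimal sketch with a hypothetical `cgroup_oom_kills` helper that takes the cgroup directory as its argument:

```bash
# Hypothetical helper: how many processes has the kernel OOM killer killed in this cgroup?
cgroup_oom_kills() {
  # $1: cgroup directory, e.g. /sys/fs/cgroup/system.slice/myapp.service
  events="$1/memory.events"
  if [ ! -r "$events" ]; then
    echo "cannot read $events (cgroup v1, or the unit is not running)" >&2
    return 1
  fi
  # memory.events holds one "key value" pair per line; oom_kill is the kill counter
  awk '$1 == "oom_kill" { print $2; found = 1 } END { if (!found) print 0 }' "$events"
}

# Usage:
#   cgroup_oom_kills /sys/fs/cgroup/system.slice/myapp.service
```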

Increase the memory ceiling with a drop-in override (preferred over editing the unit file directly):

systemctl edit myapp.service

In the editor, add:

[Service]
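# Hard ceiling: if usage cannot be reclaimed below this, the kernel OOM-kills inside the cgroup
MemoryMax=4G
# Soft ceiling: above this the kernel throttles the service and reclaims memory aggressively
MemoryHigh=3G
# The default, 'stop', stops the unit when any of its processes is OOM-killed; 'continue' keeps it
# running, although an OOM kill of the main process still ends the service
OOMPolicy=continue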
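`systemctl edit` opens an editor, which does not suit provisioning scripts. Writing the drop-in file directly has the same effect, because `systemctl edit` saves its changes as `override.conf` in `/etc/systemd/system/<unit>.d/`. The helper below is a hypothetical sketch that takes the base directory as a parameter, so you can try it against a scratch directory first. On a real host, pass /etc/systemd/system, then reload and restart. It checks only the size format, not that MemoryHigh is below MemoryMax.

```bash
# Hypothetical helper: write the MemoryMax/MemoryHigh override without an editor.
# $1 base directory (/etc/systemd/system on a real host), $2 unit name,
# $3 MemoryMax value, $4 MemoryHigh value (e.g. 4G and 3G).
write_memory_dropin() {
  for v in "$3" "$4"; do
    # Accept plain bytes, a K/M/G/T suffix, or a percentage of physical RAM
    if ! printf '%s\n' "$v" | grep -Eq '^[0-9]+([KMGT]|%)?$'; then
      echo "invalid size: $v" >&2
      return 1
    fi
  done
  mkdir -p "$1/$2.d" || return 1
  printf '[Service]\nMemoryMax=%s\nMemoryHigh=%s\n' "$3" "$4" > "$1/$2.d/override.conf"
}

# Usage on a real host:
#   write_memory_dropin /etc/systemd/system myapp.service 4G 3G
#   systemctl daemon-reload && systemctl restart myapp.service
```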

Apply the change:

systemctl daemon-reload
systemctl restart myapp.service

Protect a critical service from the kernel OOM killer:

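OOMScoreAdjust= sets oom_score_adj for the service's processes, which shifts the oom_score the kernel computes for each one. Processes with lower scores are killed later, and -1000 means never kill. This setting influences only the kernel OOM killer, not systemd-oomd: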

[Service]
# Range: -1000 (immune) to 1000 (kill first). Use -900 for critical services.
OOMScoreAdjust=-900
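After restarting, you can confirm the adjustment reached the running processes by reading /proc directly. This sketch uses a hypothetical `show_oom_adj` helper that prints oom_score_adj and the resulting oom_score for each PID given. On a live system, `systemctl show -p MainPID --value myapp.service` returns the main PID.

```bash
# Hypothetical helper: show oom_score_adj and the kernel's oom_score for each PID.
show_oom_adj() {
  for pid in "$@"; do
    if [ -r "/proc/$pid/oom_score_adj" ]; then
      printf '%s adj=%s score=%s\n' "$pid" \
        "$(cat "/proc/$pid/oom_score_adj")" "$(cat "/proc/$pid/oom_score")"
    else
      printf '%s not running\n' "$pid"
    fi
  done
}

# Usage:
#   show_oom_adj "$(systemctl show -p MainPID --value myapp.service)"
```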

Tune systemd-oomd to reduce false-positive kills:

Edit /etc/systemd/oomd.conf:

[OOM]
# Intervene only when swap is 90%+ consumed
SwapUsedLimit=90%
# Memory pressure percentage threshold before considering a kill
DefaultMemoryPressureLimit=80%
# Sustained pressure window before a kill is issued
DefaultMemoryPressureDurationSec=30s

Restart oomd: systemctl restart systemd-oomd
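To see the pressure figures systemd-oomd works from, read the PSI files yourself: /proc/pressure/memory for the whole host (kernel 4.20 or later with PSI enabled) or memory.pressure in a service's cgroup directory. This sketch (hypothetical `psi_over_limit`) compares the avg10 value of the `some` or `full` line against a limit you supply. It does not assume which line or averaging window your systemd-oomd version uses internally.

```bash
# Hypothetical helper: compare the avg10 value of a PSI line with a limit.
# $1 PSI file, $2 "some" or "full", $3 limit in percent (e.g. 60)
psi_over_limit() {
  awk -v kind="$2" -v limit="$3" '
    $1 == kind {
      split($2, kv, "=")                           # $2 looks like avg10=12.34
      verdict = (kv[2] + 0 > limit + 0) ? "over" : "under"
      printf "%s avg10=%s%% (%s %s%%)\n", kind, kv[2], verdict, limit
    }' "$1"
}

# Usage:
#   psi_over_limit /proc/pressure/memory full 60
#   psi_over_limit /sys/fs/cgroup/system.slice/myapp.service/memory.pressure some 80
```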

Add swap space to extend available memory:

free -h && swapon --show
fallocate -l 4G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
echo '/swapfile none swap sw 0 0' >> /etc/fstab
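Running the final command twice leaves a duplicate entry in /etc/fstab. Here is a sketch of an idempotent alternative. `add_fstab_swap` is a hypothetical helper that takes the fstab path as a parameter (use /etc/fstab on a real host) and appends the entry only when no uncommented line already names the swap file.

```bash
# Hypothetical helper: add a swap entry to an fstab file only if it is missing.
# $1 fstab path (/etc/fstab on a real host), $2 swap file path
add_fstab_swap() {
  # An uncommented entry has the swap file as its first field
  if awk -v dev="$2" '$1 == dev { found = 1 } END { exit !found }' "$1" 2>/dev/null; then
    echo "entry for $2 already present"
  else
    printf '%s none swap sw 0 0\n' "$2" >> "$1"
  fi
}

# Usage:
#   add_fstab_swap /etc/fstab /swapfile
```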

Step 3: Fix Permission Denied Errors

Permission denied in systemd services originates from three distinct enforcement layers. You must check all three.

Layer 1 — POSIX filesystem permissions:

# Determine which user the service runs as
systemctl show myapp.service --property=User,Group

# Trace the full path for permission bits
namei -l /var/lib/myapp/data

# Correct ownership
chown -R myapp:myapp /var/lib/myapp/
chmod 750 /var/lib/myapp/

Layer 2 — systemd sandboxing directives:

Hardened unit files often apply sandboxing directives that make parts of the filesystem read-only or inaccessible to the service, silently overriding POSIX permissions:

systemctl show myapp.service | grep -E 'ProtectSystem|ProtectHome|ReadOnly|PrivateTmp|InaccessiblePaths|NoNewPrivileges'

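ProtectSystem=strict makes the entire file hierarchy read-only for the service except /dev, /proc and /sys; ProtectSystem=full makes /usr, /boot and /etc read-only. Either way, carve out writable paths with a drop-in: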

[Service]
ProtectSystem=strict
# Explicitly carve out writable paths
ReadWritePaths=/var/lib/myapp /run/myapp

Note: PrivateTmp=yes creates an isolated /tmp namespace, so paths your code hardcodes under /tmp will not be the same directory the host sees.

Layer 3 — SELinux or AppArmor MAC policies:

For SELinux (RHEL, Fedora, CentOS Stream):

# Find AVC denials from the last 10 minutes ('recent' means 10 minutes ago)
ausearch -m avc -ts recent 2>/dev/null | tail -30
# Also visible in the journal
journalctl | grep 'avc: denied' | tail -20
# Restore the correct file context
restorecon -Rv /var/lib/myapp/
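# If restorecon does not resolve it, build a module from this service's denials only
# (-c filters by command name; review myapp_local.te before installing)
ausearch -c myapp -m avc --raw | audit2allow -M myapp_local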
semodule -i myapp_local.pp
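Before you generate a module with audit2allow, it helps to see exactly which process, permission and object were denied. This minimal sketch (hypothetical `summarize_avc`) extracts those fields from AVC records on stdin. It assumes the usual `avc: denied { perm } for ... comm=... name=... scontext=... tcontext=...` layout and does not decode hex-encoded names.

```bash
# Hypothetical helper: one summary line per AVC denial read from stdin.
summarize_avc() {
  awk '/avc: +denied/ {
    perm = $0
    sub(/.*[{] */, "", perm)          # drop everything up to "{ "
    sub(/ *[}].*/, "", perm)          # and everything from " }" on
    comm = name = scon = tcon = "?"
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^comm=/)     { comm = substr($i, 6); gsub(/"/, "", comm) }
      if ($i ~ /^name=/)     { name = substr($i, 6); gsub(/"/, "", name) }
      if ($i ~ /^scontext=/) scon = substr($i, 10)
      if ($i ~ /^tcontext=/) tcon = substr($i, 10)
    }
    printf "%s denied %s on %s (%s -> %s)\n", comm, perm, name, scon, tcon
  }'
}

# Usage:
#   ausearch -m avc -ts recent | summarize_avc
```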

For AppArmor (Ubuntu, Debian):

aa-status
journalctl | grep 'apparmor="DENIED"' | tail -20
# Put profile in complain mode to stop blocking without disabling logging
aa-complain /etc/apparmor.d/usr.bin.myapp

Step 4: Investigate and Throttle High CPU

Identify the consuming service and process:

# Live cgroup CPU view, 1-second refresh
systemd-cgtop -d 1

# Kernel cgroup CPU accounting
cat /sys/fs/cgroup/system.slice/myapp.service/cpu.stat

Detect restart-loop CPU amplification — a service crashing and restarting hundreds of times per hour will saturate CPU even if the process itself is lightweight:

# Count start/stop events in the last 10 minutes
journalctl -u myapp.service --since '10 minutes ago' | grep -cE 'Started|Stopped|Failed'

# Check cumulative restart count
systemctl show myapp.service --property=NRestarts,ActiveEnterTimestamp

Apply CPU quota and restart rate limiting in a drop-in:

[Service]
# Hard CPU quota: 50% of one logical core
CPUQuota=50%
# Scheduling weight relative to other services (default 100)
CPUWeight=50

# Restart on failure, waiting 30 seconds between attempts
Restart=on-failure
RestartSec=30s

[Unit]
# Start rate limiting is a [Unit] option: give up after 3 starts within any 5-minute window
StartLimitIntervalSec=300
StartLimitBurst=3
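To confirm the quota reached the kernel, compare it with cpu.max in the service's cgroup (cgroup v2), which holds `<quota> <period>` in microseconds. Assuming the default 100 ms period (CPUQuotaPeriodSec= unset), CPUQuota=50% should appear as `50000 100000`. The following sketch shows that arithmetic with a hypothetical `expected_cpu_max` helper that accepts whole percentages only:

```bash
# Hypothetical helper: the cpu.max line expected for a whole-number CPUQuota percentage.
# $1 percentage without "%", $2 period in microseconds (systemd default 100000)
expected_cpu_max() {
  period=${2:-100000}
  echo "$(( $1 * period / 100 )) $period"
}

# Usage on cgroup v2:
#   [ "$(cat /sys/fs/cgroup/system.slice/myapp.service/cpu.max)" = "$(expected_cpu_max 50)" ] && echo "quota applied"
```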

Step 5: Capture and Analyze Core Dumps

On systems where kernel.core_pattern pipes to systemd-coredump, core dumps are stored by it and managed with coredumpctl:

# List recorded dumps
coredumpctl list

# Detail for the most recent crash
coredumpctl info

# Extract the core file for analysis
coredumpctl dump -o /tmp/myapp.core -- /usr/bin/myapp

# Open in GDB
gdb /usr/bin/myapp /tmp/myapp.core
(gdb) bt full
(gdb) info registers

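If no dumps appear, check that /proc/sys/kernel/core_pattern pipes to systemd-coredump and that the unit does not set LimitCORE=0, then review /etc/systemd/coredump.conf: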

[Coredump]
Storage=external
Compress=yes
ProcessSizeMax=8G
ExternalSizeMax=8G
KeepFree=1G
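Dumps reach coredumpctl only when kernel.core_pattern pipes them to systemd-coredump. This sketch (hypothetical `coredump_handler`) classifies a core_pattern string. On a live system, pass it `"$(cat /proc/sys/kernel/core_pattern)"`. The apport branch reflects Ubuntu's crash reporter, which can own core_pattern instead.

```bash
# Hypothetical helper: classify a kernel.core_pattern value.
coredump_handler() {
  case "$1" in
    '|'*systemd-coredump*) echo "systemd-coredump" ;;   # coredumpctl will see dumps
    '|'*apport*)           echo "apport" ;;             # Ubuntu crash reporter owns dumps
    '|'*)                  echo "other pipe handler" ;;
    *)                     echo "file pattern: $1" ;;   # cores are written as plain files
  esac
}

# Usage:
#   coredump_handler "$(cat /proc/sys/kernel/core_pattern)"
```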

Step 6: Fix Services That Will Not Start

If systemctl start myapp.service immediately fails or stalls:

# Validate the unit file syntax before applying any changes
systemd-analyze verify /etc/systemd/system/myapp.service

# Run the ExecStart command manually as the service user (this does not reproduce the unit's sandboxing)
sudo -u myapp /usr/bin/myapp --config /etc/myapp/config.yaml

# Inspect dependencies and the startup chain; ordering cycles are logged as 'Found ordering cycle' in the journal
systemctl list-dependencies myapp.service
systemd-analyze critical-chain myapp.service

# Trace startup timing
systemd-analyze blame | head -20

If the service times out before reporting ready, it likely starts slowly or uses the wrong Type=:

[Service]
# Default is 90s; increase for slow-starting JVM or Python apps
TimeoutStartSec=300
# Type=simple suits most foreground processes; use Type=notify only if the process calls sd_notify(), and Type=forking only if it daemonizes itself
Type=simple

After every unit file change, reload and verify:

systemctl daemon-reload
systemctl restart myapp.service
systemctl status myapp.service
journalctl -u myapp.service -n 30 --no-pager

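Complete Diagnostic Script

Save the script below as systemd-diagnose.sh and run it as root to gather most of the checks from this guide into one report.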

#!/usr/bin/env bash
# systemd Service Diagnostic Script
# Usage: sudo bash systemd-diagnose.sh myapp.service

SVC=${1:-myapp.service}
DIVIDER='------------------------------------------------------------'

echo "=== systemd Diagnostic Report: $SVC ==="
date
echo

echo '--- Unit Status ---'
systemctl status "$SVC" 2>&1
echo "$DIVIDER"

echo '--- Recent Journal (last 60 lines) ---'
journalctl -u "$SVC" -n 60 --no-pager 2>&1
echo "$DIVIDER"

echo '--- Resource Limits and Security Properties ---'
systemctl show "$SVC" \
  --property=MemoryMax,MemoryHigh,MemoryCurrent,MemorySwapMax \
  --property=CPUQuotaPerSecUSec,CPUWeight,OOMPolicy,OOMScoreAdjust \
  --property=NRestarts,User,Group,ProtectSystem,PrivateTmp \
  --property=ReadWritePaths,InaccessiblePaths,NoNewPrivileges 2>&1
echo "$DIVIDER"

echo '--- Kernel OOM Events (current boot) ---'
journalctl -k -b | grep -iE 'out of memory|oom kill|killed process' | tail -20
echo "$DIVIDER"

echo '--- systemd-oomd Events (last 24 hours) ---'
journalctl -u systemd-oomd --since '24 hours ago' --no-pager 2>&1 | tail -20
echo "$DIVIDER"

echo '--- All Failed Units ---'
systemctl list-units --state=failed
echo "$DIVIDER"

echo '--- cgroup Memory Counters ---'
CGPATH="/sys/fs/cgroup/system.slice/${SVC}"
if [ -d "$CGPATH" ]; then
  echo "memory.current : $(cat ${CGPATH}/memory.current 2>/dev/null || echo N/A)"
  echo "memory.max     : $(cat ${CGPATH}/memory.max 2>/dev/null || echo N/A)"
  echo "memory.high    : $(cat ${CGPATH}/memory.high 2>/dev/null || echo N/A)"
  echo "cpu.stat       :"
  cat "${CGPATH}/cpu.stat" 2>/dev/null || echo 'N/A'
else
  echo "cgroup path not found: $CGPATH (service may not be running)"
fi
echo "$DIVIDER"

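echo '--- Unit File Syntax Validation ---'
# Pass the unit's file path to verify, resolved from the loaded unit
UNIT_FILE=$(systemctl show -p FragmentPath --value "$SVC")
if [ -n "$UNIT_FILE" ]; then
  systemd-analyze verify "$UNIT_FILE" 2>&1
else
  echo "no unit file found for $SVC"
fi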
echo "$DIVIDER"

echo '--- Dependency Chain ---'
systemctl list-dependencies "$SVC" --no-pager 2>&1
echo "$DIVIDER"

echo '--- SELinux AVC Denials (last 10 minutes) ---'
if command -v ausearch >/dev/null 2>&1; then
  ausearch -m avc -ts recent 2>/dev/null | tail -20
else
  echo 'ausearch not available; check AppArmor:'
  journalctl | grep 'apparmor="DENIED"' | tail -20
fi
echo
echo '=== Diagnostic Complete ==='

Error Medic Editorial

Error Medic Editorial is a team of senior Linux engineers and site reliability engineers with extensive experience operating large-scale infrastructure on RHEL, Ubuntu, Debian, and Arch Linux. Our guides prioritize exact error messages, real diagnostic commands, and root-cause analysis over surface-level workarounds.
