Resolving Kubernetes ImagePullBackOff, CrashLoopBackOff, and OOMKilled Errors
A comprehensive guide to diagnosing and fixing critical Kubernetes pod failures, including ImagePullBackOff, OOMKilled, CrashLoopBackOff, and network errors.
- ImagePullBackOff usually stems from incorrect image names, missing tags, or missing authentication secrets for private registries.
- CrashLoopBackOff indicates your container starts but exits prematurely; application logs are the primary diagnostic tool.
- OOMKilled means the container exceeded its memory limit; you must either optimize application memory usage or increase the limit.
- Network-related errors like 'connection refused' or 'timeout' often indicate node-level egress issues, firewall rules blocking access to the registry, or DNS resolution failures.
- Always start troubleshooting with 'kubectl describe pod' to review the event log, which provides the exact reason for the failure.
| Method | When to Use | Time | Risk |
|---|---|---|---|
| kubectl describe pod | Initial diagnosis for state issues like ImagePullBackOff or OOMKilled | Fast (< 1 min) | None (Read-only) |
| kubectl logs | Investigating application-level crashes (CrashLoopBackOff) | Fast (< 2 mins) | None (Read-only) |
| Adjusting Resource Limits | Fixing frequent OOMKilled errors | Medium (Requires redeploy) | Low (May impact node capacity) |
| Updating imagePullSecrets | Fixing authentication issues with private registries | Medium (Requires secret update and pod restart) | Low |
Understanding Kubernetes Pod Errors
When deploying applications to Kubernetes, pod lifecycle errors are inevitable. A pod might fail to start, continuously restart, or abruptly terminate. Understanding the mechanics behind errors like ImagePullBackOff, CrashLoopBackOff, and OOMKilled is essential for maintaining high availability. This guide dives deep into these common states, exploring their root causes and providing actionable resolution steps.
The ImagePullBackOff and ErrImagePull States
The deployment process begins with the kubelet attempting to pull the specified container image from a registry. If the pull fails, Kubernetes places the pod in the ErrImagePull state. As retries continue to fail, the kubelet increases the delay between attempts exponentially (the 'backoff'), and the pod transitions to the ImagePullBackOff state.
Root Causes:
- Typographical Errors: The most frequent cause is a simple typo in the image repository name or the tag. If the registry cannot locate `my-app:v1.0.1` because the actual tag is `v1.0.2`, the pull will fail.
- Authentication Failures: Private registries require credentials. If the `imagePullSecrets` are missing from the pod specification, or if the secret contains invalid or expired credentials, the registry will return an unauthorized error.
- Network and TLS Issues: The Kubernetes node must be able to reach the container registry over the network. Errors like `connection refused` or `timeout` point to firewall rules blocking outbound traffic on port 443, or DNS resolution failures on the node. A `certificate expired` error indicates that the registry's SSL certificate is invalid, or that the node does not trust the Certificate Authority that signed it.
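To illustrate the authentication case, a pod spec pulling from a private registry might look like the following sketch (the registry host, image tag, and secret name are placeholders for this example, not values from your cluster):

```yaml
# Illustrative pod spec for a private registry.
# Image name, tag, and secret name are assumptions for this example.
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: my-app
      # Double-check that both the repository and this exact tag exist.
      image: registry.example.com/team/my-app:v1.0.2
  imagePullSecrets:
    # Must be a kubernetes.io/dockerconfigjson secret in the same namespace.
    - name: private-reg-cred
```

If `imagePullSecrets` is omitted here (and not attached to the pod's service account), pulls from the private registry will fail with an unauthorized error.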
Diagnostic Steps:
The primary tool here is the describe command. Running `kubectl describe pod <pod-name>` will reveal the specific error in the `Events` section. Look for messages like `Failed to pull image... rpc error: code = Unknown desc = Error response from daemon: pull access denied`.
Deciphering CrashLoopBackOff
A CrashLoopBackOff indicates that Kubernetes successfully pulled the image and started the container, but the main process inside the container immediately crashed or exited. Kubernetes then attempts to restart the container, leading to a loop of crashes and restarts.
Root Causes:
- Application Bugs: Unhandled exceptions or fatal errors in the application code during startup.
- Configuration Errors: Missing required environment variables, incorrectly mounted ConfigMaps, or malformed configuration files.
- Permissions Issues: The application might be trying to write to a read-only filesystem or bind to a privileged port (under 1024) without the necessary `SecurityContext` capabilities, resulting in a permission denied error.
- Liveness Probe Failures: If a liveness probe is configured too aggressively and the application takes too long to initialize, Kubernetes may kill the container before it's ready, triggering a restart loop.
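For the liveness-probe case, a more forgiving probe configuration can break the restart loop. The sketch below uses illustrative timings and a hypothetical `/healthz` endpoint; size the delays to your application's real startup time:

```yaml
# Illustrative container snippet: give a slow-starting app room to
# initialize before liveness checks begin, so Kubernetes does not
# kill it prematurely. All values are placeholders.
containers:
  - name: my-app
    image: registry.example.com/team/my-app:v1.0.2
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 30   # wait before the first check
      periodSeconds: 10
      failureThreshold: 3       # tolerate transient failures before restarting
```

On newer clusters, a separate `startupProbe` is often a cleaner way to protect slow startups, since it suspends liveness checks until the app has started once.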
Diagnostic Steps:
To understand why the application is crashing, you must inspect its output. Use `kubectl logs <pod-name>`. If the container is currently in a backoff state and not running, use the `--previous` flag (`kubectl logs <pod-name> --previous`) to view the logs from the last failed execution.
Resolving OOMKilled (Out of Memory)
An OOMKilled status means the container's processes consumed more memory than the limit allocated to it in the pod specification. When this threshold is breached, the Linux kernel's Out-Of-Memory (OOM) killer terminates the container process to protect the stability of the node.
Root Causes:
- Inadequate Memory Limits: The configured memory limit in the deployment YAML is simply too low for the application's normal baseline operation or peak load requirements.
- Memory Leaks: The application code contains a memory leak, causing its footprint to grow continuously over time until it inevitably hits the limit.
- Spike in Workload: A sudden influx of requests or a resource-intensive background job causes a temporary but fatal spike in memory consumption.
Diagnostic Steps:
Running `kubectl describe pod <pod-name>` will show the `Last State` of the container as `Terminated` with `Reason: OOMKilled`. To determine whether the issue is a sudden spike or a slow leak, monitor the pod's memory usage over time using tools like Prometheus and Grafana, or basic metrics via `kubectl top pod <pod-name>`.
Step-by-Step Fixes
Fixing Image Pull Issues
- Verify the Image: Manually check your container registry (e.g., Docker Hub, AWS ECR, GCP GCR) to confirm the exact spelling of the image repository and the existence of the specific tag.
- Validate Secrets: If using a private registry, ensure a `kubernetes.io/dockerconfigjson` secret exists in the same namespace as the pod. Verify its contents by decoding the base64 string, and ensure the pod spec references it correctly under `imagePullSecrets`.
- Check Node Connectivity: If you suspect network timeouts or connection refused errors, SSH into one of the Kubernetes worker nodes and attempt to pull the image manually using `docker pull` or `crictl pull` to isolate node-level network issues from Kubernetes configuration issues.
Fixing Crash Loops
- Analyze the Stack Trace: The output of `kubectl logs` is your source of truth. Look for stack traces or explicit error messages from your application framework.
- Review Configuration: Cross-reference the environment variables expected by your application with those provided in the deployment YAML, ConfigMaps, and Secrets.
- Test Locally: Attempt to run the exact same container image locally using Docker with the same environment variables to reproduce the crash outside the Kubernetes environment.
Mitigating OOMKilled
- Increase Limits: If the application legitimately requires more memory, increase `resources.limits.memory` in your deployment specification. Ensure you also adjust `resources.requests.memory` appropriately.
- Profile the Application: If raising the limit only delays the inevitable crash, your application likely has a memory leak. Use language-specific profiling tools (e.g., pprof for Go, VisualVM for Java, memory profilers for Node.js/Python) to identify the source of the leak and patch the code.
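The limits fix above might look like the following in a deployment's container spec. The values are illustrative placeholders, not recommendations; size them from observed usage under normal and peak load:

```yaml
# Illustrative resource settings: the request guides scheduling,
# while the limit is the OOM-kill threshold. Values are placeholders.
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"   # container is OOMKilled if usage exceeds this
```

Keeping the request close to the real baseline helps the scheduler place pods sensibly, while the limit caps worst-case consumption on the node.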
Quick Reference: Diagnostic and Fix Commands

```shell
# 1. Initial investigation: identify pods in a failed state.
kubectl get pods -n <namespace>

# 2. Diagnose ImagePullBackOff, OOMKilled, or scheduling issues:
#    scroll to the 'Events' section at the bottom of the output.
kubectl describe pod <pod-name> -n <namespace>

# 3. Diagnose CrashLoopBackOff: view the application logs.
kubectl logs <pod-name> -n <namespace>

# If the pod is currently crashing, view the logs of the previous instantiation.
kubectl logs <pod-name> -n <namespace> --previous

# 4. Check resource utilization to anticipate OOMKilled errors
#    (requires metrics-server).
kubectl top pod <pod-name> -n <namespace>

# 5. Fix missing image pull secrets: create the secret for a private registry.
kubectl create secret docker-registry private-reg-cred \
  --docker-server=https://index.docker.io/v1/ \
  --docker-username=my-user \
  --docker-password=my-password \
  --docker-email=my-email@example.com -n <namespace>

# Then, patch the service account or deployment to use this secret:
# kubectl patch serviceaccount default -p '{"imagePullSecrets": [{"name": "private-reg-cred"}]}' -n <namespace>
```

Error Medic Editorial
The Error Medic Editorial team consists of seasoned DevOps engineers and Site Reliability Experts dedicated to demystifying complex cloud-native challenges and providing practical, battle-tested solutions.