Error Medic

How to Fix ImagePullBackOff and Evicted Pods in Kubernetes

Comprehensive guide to troubleshooting and fixing Kubernetes ImagePullBackOff, ErrImagePull, and Evicted pod statuses. Learn root causes and permanent fixes.

Key Takeaways
  • ImagePullBackOff usually means the container image is missing, the tag is wrong, or registry authentication (ImagePullSecrets) is failing.
  • Evicted pods are typically caused by node resource pressure, most commonly exhausted memory or ephemeral storage.
  • Use 'kubectl describe pod <pod-name>' to identify the exact reason for ImagePullBackOff or Eviction.
  • The 'cluster-autoscaler.kubernetes.io/safe-to-evict' annotation controls whether the Cluster Autoscaler can evict a pod during node scale-down.
  • Clear evicted pods in bulk using 'kubectl delete pods --field-selector status.phase=Failed'.
Common Fix Approaches Compared
Method                           | When to Use                                                                 | Time   | Risk
Verify Image Tag/Name            | 'kubectl describe' shows 'NotFound' or 'manifest unknown'                   | Low    | Low
Create ImagePullSecret           | Pulling from a private registry returns 'Unauthorized' or 'Access Denied'   | Medium | Low
Increase Node Resources/Requests | Pods are Evicted due to memory or ephemeral-storage pressure                | High   | Medium
Add safe-to-evict Annotation     | Cluster Autoscaler refuses to scale down a node due to local storage pods   | Low    | Low

Understanding ImagePullBackOff and Evicted Pods in Kubernetes

When managing a Kubernetes cluster, whether it's on Azure Kubernetes Service (AKS), Amazon EKS, Google GKE, or Docker Desktop, encountering pod lifecycle errors is inevitable. Two of the most common and disruptive statuses you will encounter are ImagePullBackOff (often preceded by ErrImagePull) and Evicted.

While they manifest differently, both indicate that Kubernetes cannot run your workload as requested. ImagePullBackOff is a failure at the container startup phase, whereas Evicted means a running pod was forcefully terminated by the kubelet to save the node from complete resource starvation.

Diagnosing ImagePullBackOff and ErrImagePull

The ImagePullBackOff status means that Kubernetes tried to pull the container image specified in your pod manifest, failed, and is now backing off (delaying) further attempts. The initial failure state is ErrImagePull.

Root Causes of ImagePullBackOff
  1. Typo in the Image Name or Tag: The most common cause. If you specify nginx:latestt instead of nginx:latest, the container runtime cannot find the manifest.
  2. Private Registry Authentication: If you are using a private registry (like Azure Container Registry or AWS ECR) and haven't provided the correct credentials via an ImagePullSecret, the registry will reject the pull request with an Unauthorized error.
  3. Network Constraints: The worker node might not have outbound internet access or DNS resolution to reach the container registry.
  4. Rate Limiting: Docker Hub and other public registries impose rate limits. If your cluster shares a single NAT gateway IP, you might be hitting the 'toomanyrequests' error.
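Before touching the cluster, you can often rule causes 1 and 2 in or out from your workstation. A minimal sketch using the Docker CLI (assumes Docker is installed locally; the private-registry names are placeholders):

```shell
# Check whether the image and tag actually exist in the registry.
# A typo'd tag fails with "manifest unknown"; a private registry
# you are not logged into fails with "unauthorized".
docker manifest inspect nginx:latest > /dev/null && echo "image found"

# For a private registry, log in first, then retry:
# docker login myregistry.azurecr.io
# docker manifest inspect myregistry.azurecr.io/my-app:v1
```

If the pull works locally but fails on the node, suspect causes 3 or 4 (node networking or rate limiting) instead.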
Step 1: Diagnose the Pull Failure

To find out exactly why the image pull is failing, describe the pod:

kubectl describe pod <pod-name> -n <namespace>

Scroll to the Events section at the bottom. You will likely see something like:

Failed to pull image "myregistry.azurecr.io/my-app:v1": rpc error: code = Unknown desc = Error response from daemon: Get "https://myregistry.azurecr.io/v2/my-app/manifests/v1": unauthorized: authentication required
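If you prefer to pull the events directly rather than scrolling through the full describe output, kubectl can filter them; a sketch (pod and namespace are placeholders):

```shell
# Show only the events for the failing pod, oldest first
kubectl get events -n <namespace> \
  --field-selector involvedObject.name=<pod-name> \
  --sort-by=.lastTimestamp
```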

Step 2: Fix ImagePullBackOff
  • For Typos: Correct the deployment manifest and run kubectl apply -f deployment.yaml.
  • For Private Registries: Create a secret containing your Docker credentials:
kubectl create secret docker-registry my-registry-secret \
  --docker-server=myregistry.azurecr.io \
  --docker-username=<your-username> \
  --docker-password=<your-password> \
  --docker-email=<your-email>

Then, add imagePullSecrets to your Pod spec:

spec:
  containers:
  - name: my-app
    image: myregistry.azurecr.io/my-app:v1
  imagePullSecrets:
  - name: my-registry-secret
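As an alternative to editing every Pod spec, you can attach the secret to the namespace's ServiceAccount so that all pods it runs inherit the credential. A sketch, assuming the pods use the default ServiceAccount:

```shell
# Every pod in <namespace> that runs under the "default"
# ServiceAccount will now use this pull secret automatically
kubectl patch serviceaccount default -n <namespace> \
  -p '{"imagePullSecrets": [{"name": "my-registry-secret"}]}'
```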

Understanding Pod Eviction in Kubernetes

An Evicted pod status means the kubelet on a worker node terminated the pod. This is not a crash; it's a deliberate action taken by the node to preserve its own stability.

The Kubernetes Eviction Policy

Kubernetes closely monitors node resources. If a node starts running out of critical, incompressible resources (such as memory or disk space), the kubelet triggers node-pressure eviction.

Common eviction triggers include:

  • MemoryPressure: The node is running out of RAM.
  • DiskPressure / Ephemeral Storage: The node's root filesystem or the container runtime's image filesystem is full. Pods writing large amounts of data to emptyDir volumes or their local container filesystem without requesting ephemeral-storage limits are prime culprits.
  • PIDPressure: Too many processes are running on the node.
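These triggers surface as node conditions, which you can inspect directly; a sketch (node name is a placeholder):

```shell
# Check which pressure conditions are set on a single node
kubectl describe node <node-name> | grep -A 8 "Conditions:"

# Or list, per node, every condition that is currently True
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.status.conditions[?(@.status=="True")].type}{"\n"}{end}'
```

A healthy node typically shows only Ready=True; MemoryPressure, DiskPressure, or PIDPressure being True explains the evictions.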

An Evicted pod leaves behind a tombstone record: the pod itself is dead, but the API object remains so you can inspect the eviction reason and status.

Step 1: Diagnose Pod Eviction

Describe the evicted pod to see the exact reason:

kubectl describe pod <evicted-pod-name>

Look at the Status and Message fields. You'll often see something like: Message: The node was low on resource: ephemeral-storage. Container my-app was using 50Gi, which exceeds its request of 0.
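The same reason and message are stored as plain status fields, so you can extract them without the full describe output; a sketch using kubectl's JSONPath output:

```shell
# Print just the eviction reason and message for a dead pod
kubectl get pod <evicted-pod-name> \
  -o jsonpath='{.status.reason}{": "}{.status.message}{"\n"}'
```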

Step 2: Prevent Eviction

To prevent pods from getting evicted:

  1. Set Resource Requests and Limits: Always define requests and limits for CPU, memory, and importantly, ephemeral-storage.
  2. Optimize Logging: If an application logs excessively to stdout/stderr, those logs consume local disk space until log rotation occurs. Use log forwarding to offload them.
  3. Use Persistent Volumes: Don't use emptyDir for large datasets. Attach a PersistentVolumeClaim (PVC).
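Point 1 looks like this in a container spec; the values below are illustrative placeholders, not sizing recommendations:

```yaml
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
    ephemeral-storage: "1Gi"
  limits:
    memory: "512Mi"
    ephemeral-storage: "2Gi"
```

With an ephemeral-storage request set, the scheduler accounts for the pod's disk usage, and the eviction message "exceeds its request of 0" goes away.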

The Role of cluster-autoscaler.kubernetes.io/safe-to-evict

Sometimes you want pods to be evicted, specifically when the Cluster Autoscaler is trying to scale down an underutilized node. By default, the autoscaler will not evict certain pods, such as those using local storage (emptyDir). This prevents the node from scaling down.

If your pod uses emptyDir strictly for temporary, non-critical cache and you want the autoscaler to feel free to terminate it to save cloud costs, add the following annotation to your pod spec:

metadata:
  annotations:
    "cluster-autoscaler.kubernetes.io/safe-to-evict": "true"

Conversely, if you have a critical pod that should never be randomly evicted during scale-down, you can set this to "false".
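To apply the annotation to a workload that is already running, you can patch the Deployment's pod template in place; a sketch ("my-app" is a placeholder name):

```shell
# Mark the pods of an existing Deployment as not safe to evict
# during Cluster Autoscaler scale-down
kubectl patch deployment my-app --type merge -p \
  '{"spec":{"template":{"metadata":{"annotations":{"cluster-autoscaler.kubernetes.io/safe-to-evict":"false"}}}}}'
```

Patching the template triggers a rolling restart, so the annotation lands on the new pods.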

Cleaning Up Evicted Pods

Kubernetes does not automatically delete evicted pods immediately because it assumes you want to read their failure messages. Over time, these can clutter your dashboard and CLI output.

You can manually clean up evicted pods with a field selector: kubectl delete pods --field-selector status.phase=Failed. Note that this also removes pods that reached the Failed phase for other reasons.
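If you want to delete only pods whose status reason is Evicted, and nothing else in the Failed phase, a sketch using jq (an extra tool, not part of kubectl):

```shell
# List Evicted pods as "namespace name" pairs, then delete each one
kubectl get pods --all-namespaces -o json |
  jq -r '.items[]
         | select(.status.reason == "Evicted")
         | "\(.metadata.namespace) \(.metadata.name)"' |
  while read -r ns name; do
    kubectl delete pod -n "$ns" "$name"
  done
```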

Quick Command Reference

# 1. Diagnose ImagePullBackOff by checking pod events
kubectl describe pod <pod-name> -n <namespace>

# 2. Create an ImagePullSecret for a private registry
kubectl create secret docker-registry my-registry-key \
  --docker-server=your-registry.com \
  --docker-username=your-user \
  --docker-password=your-pwd \
  --docker-email=your-email@example.com

# 3. Find all Evicted pods across all namespaces
kubectl get pods --all-namespaces | grep Evicted

# 4. Clean up (delete) all Evicted pods in the current namespace
kubectl delete pods --field-selector status.phase=Failed

# 5. Clean up all Evicted pods in ALL namespaces
kubectl delete pods --all-namespaces --field-selector status.phase=Failed

Error Medic Editorial

Error Medic Editorial is a team of certified Kubernetes administrators and DevOps engineers dedicated to simplifying cloud-native troubleshooting and site reliability.
