Error Medic

Resolving "Helm Upgrade Failed": Stuck Releases, Immutable Fields, and Timeouts

Comprehensive guide to fixing 'Helm upgrade failed' errors. Learn how to resolve stuck operations, bypass immutable field restrictions, and fix timeout issues.

Key Takeaways
  • Root Cause 1: 'Another operation is in progress' occurs when a previous Helm operation crashes or is interrupted, leaving a stuck state secret.
  • Root Cause 2: 'Field is immutable' happens when attempting to modify locked Kubernetes resource specifications, such as StatefulSet volume claims.
  • Root Cause 3: 'Timed out waiting for the condition' is caused by pods failing to reach a ready state before the Helm wait threshold expires.
  • Quick Fix: Clear stuck releases by deleting the pending Helm secret or rolling back; fix immutable fields by deleting the conflicting resource; resolve timeouts by inspecting pod events and readiness probes.
Helm Upgrade Failure Resolution Methods
Method | When to Use | Time to Execute | Risk Level
------ | ----------- | --------------- | ----------
Helm Rollback | The new deployment configuration is fundamentally broken and you need to restore service. | Minutes | Low
Delete Stuck Helm Secret | The 'another operation is in progress' error after a crashed CI/CD pipeline. | Fast | Medium
Targeted Resource Deletion | 'Field is immutable' errors on specific resources such as Jobs or StatefulSets. | Variable | Medium
helm upgrade --force | Resource definitions have changed drastically and standard 3-way merge patching fails. | Fast | High

Understanding the "Helm Upgrade Failed" Error

When a helm upgrade command fails, it can leave your Kubernetes cluster in an inconsistent state, halt CI/CD pipelines, and cause significant deployment anxiety. Helm acts as a package manager for Kubernetes, using a complex process called a 3-way merge patch to compare the old manifest, the new manifest, and the live cluster state. When this process encounters a conflict, a timeout, or a corrupted state history, the upgrade fails.

In this comprehensive guide, we will break down the most common "Helm upgrade failed" scenarios, their root causes, and the exact steps to remediate them safely in a production environment.

Scenario 1: "Another operation (install/upgrade/rollback) is in progress"

The Symptom:

Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress

The Root Cause: Helm tracks the state of releases using Kubernetes Secrets (or ConfigMaps in older versions) stored in the same namespace as the release. When you initiate a helm upgrade, Helm immediately creates a new secret representing the target release version with a status of pending-upgrade.

If the Helm client is killed (e.g., your CI runner times out, network connection drops, or a user presses Ctrl+C), this secret remains stuck in the pending-upgrade state. The next time you try to upgrade, Helm sees this pending secret and assumes another user or pipeline is currently operating on the release.

The Fix: Manually Clearing the Stuck State

To resolve this, you must manipulate Helm's backend storage.

  1. Identify the stuck secret: Run the following command to find the secrets Helm uses to track your release: kubectl get secrets -n <your-namespace> -l owner=helm,name=<your-release-name>

  2. Look for the pending release: You will typically see a list of secrets like sh.helm.release.v1.my-app.v1, ...v2, ...v3. Look at the labels or decode the secret to find the one stuck in pending.

  3. Delete or modify the stuck secret: The most straightforward approach is to delete the stuck pending secret, reverting Helm's pointer back to the previous successful release. kubectl delete secret sh.helm.release.v1.<your-release-name>.v<stuck-revision> -n <your-namespace>

Once deleted, verify the status using helm history <your-release-name> -n <your-namespace>. The status should no longer show as pending, and you can safely retry your upgrade.
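The steps above can be sketched as follows. Helm 3 labels each release secret with its status and version, so you can list statuses directly with kubectl's `-L` (label-columns) flag instead of decoding the secret payload; the release and namespace names here are placeholders:

```bash
# Sketch: find which revision secret is stuck (placeholder names).
kubectl get secrets -n my-namespace \
  -l owner=helm,name=my-release \
  -L status,version

# Illustrative output -- the pending-upgrade row is the one to delete:
# NAME                               TYPE                 DATA   AGE   STATUS            VERSION
# sh.helm.release.v1.my-release.v3   helm.sh/release.v1   1      2d    deployed          3
# sh.helm.release.v1.my-release.v4   helm.sh/release.v1   1      5m    pending-upgrade   4
```

Deleting the `pending-upgrade` secret leaves the last `deployed` revision as Helm's current state.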

Scenario 2: "Field is immutable"

The Symptom:

Error: UPGRADE FAILED: cannot patch "my-statefulset": Invalid value: ... field is immutable

The Root Cause: Kubernetes enforces immutability on certain fields within specific API resources. The most notorious examples are:

  • volumeClaimTemplates within a StatefulSet.
  • spec.completions or spec.selector within a Job.
  • spec.selector (the Pod label selector) in a Deployment.

Helm tries to perform an in-place update (PATCH) on these resources. If your Helm chart changes an immutable field (for instance, increasing the storage size in a StatefulSet template), the Kubernetes API server outright rejects the request, causing the Helm upgrade to fail.

The Fix: Targeted Deletion or Workarounds

There are a few ways to handle this, depending on the resource:

  1. For StatefulSets (storage changes): You cannot simply change the volumeClaimTemplates. You must manually resize the PersistentVolumeClaims (PVCs) associated with the StatefulSet via kubectl edit pvc, and then update the StatefulSet. If you are changing something other than storage, you may need to delete the StatefulSet using the --cascade=orphan flag, which removes the controller but leaves its Pods running, then let Helm recreate the controller: kubectl delete statefulset my-statefulset -n <namespace> --cascade=orphan. Afterwards, run helm upgrade again.

  2. For Jobs: Jobs are generally meant to run to completion and remain for log inspection. If a Helm chart includes a Job (e.g., a database migration hook) and you change its spec, Helm will fail. The standard practice is to add a Helm hook annotation so the Job is automatically deleted upon completion or before a new one is created: "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded. If you are stuck right now, manually delete the Job with kubectl delete job <job-name> and retry.

  3. Using --force: You can append --force to your helm upgrade command. In Helm 3 this makes Helm replace conflicting resources (a PUT) instead of patching them; in Helm 2 it went further and deleted and recreated them. Warning: replacement can cause downtime as resources are rewritten wholesale, and in Helm 3 it may still be rejected for truly immutable fields, in which case targeted deletion (above) is the reliable path.
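As a concrete sketch of options 1 and 2, the commands below recreate a StatefulSet controller without killing its Pods and clear a conflicting Job. All names (my-statefulset, my-migration-job, my-release, ./my-chart, prod) are placeholders for your own:

```bash
# Delete only the StatefulSet controller object; its Pods keep running.
kubectl delete statefulset my-statefulset -n prod --cascade=orphan

# Helm recreates the controller with the new spec on the next upgrade.
helm upgrade my-release ./my-chart -n prod --wait

# For an immutable Job from a previous release, delete it and retry.
kubectl delete job my-migration-job -n prod
```

Because the Pods are orphaned rather than deleted, this avoids downtime for the StatefulSet's workload while the controller is swapped out.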

Scenario 3: "Timed out waiting for the condition"

The Symptom:

Error: UPGRADE FAILED: timed out waiting for the condition

The Root Cause: This happens when you run helm upgrade with the --wait or --atomic flag. Helm applies the manifests to the cluster and then monitors the resources to ensure they reach a "Ready" state. By default, Helm waits 5 minutes (5m0s).

If your Pods fail to start (e.g., CrashLoopBackOff, ImagePullBackOff, failing Readiness Probes, or lacking CPU/Memory quota), Helm will wait until the timeout is reached, then fail the upgrade.

The Fix: Diagnostics and Timeout Adjustments

  1. Investigate the Pods immediately: While Helm is hanging, or immediately after it fails, check the pod status with kubectl get pods -n <namespace> and identify pods that are not 1/1 Running.

  2. Check Pod Events and Logs: Run kubectl describe pod <failing-pod-name> -n <namespace> and look at the Events section at the bottom. You will likely see the real root cause there: Failed to pull image, Liveness probe failed, Insufficient memory, etc. Fix the underlying application or configuration issue.

  3. Increase the Timeout: If your application is simply slow to start (e.g., a massive Java application or a database executing complex schema migrations on startup), the default 5 minutes might not be enough. Increase it using the --timeout flag: helm upgrade my-release my-chart/ --wait --timeout 15m
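The triage sequence above can be sketched as a few commands; pod, release, and namespace names are placeholders:

```bash
# List pods that never reached the Running phase.
kubectl get pods -n prod --field-selector=status.phase!=Running

# Read the Events section for the real failure reason.
kubectl describe pod my-app-7c9f8-abcde -n prod | tail -n 20

# If the container crashed, inspect logs from the previous attempt.
kubectl logs my-app-7c9f8-abcde -n prod --previous

# Once the root cause is fixed, retry with a longer deadline.
helm upgrade my-release ./my-chart -n prod --wait --timeout 15m
```

Note that a pod in CrashLoopBackOff is in the Running phase, so also scan the READY column of plain `kubectl get pods` output for anything not fully ready.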

Scenario 4: CRD Upgrades and "Release Not Found"

Sometimes you'll see errors related to Custom Resource Definitions (CRDs). By design, Helm 3 does not update the CRDs shipped in a chart's crds/ directory during a helm upgrade, to prevent accidental data loss in custom resources. If your new chart version requires a newer CRD, the upgrade fails with validation errors about unknown fields.

The Fix: You must manually apply the new CRDs before running the helm upgrade: kubectl apply -f https://raw.githubusercontent.com/vendor/project/v2.0.0/crds/crd.yaml

Similarly, if an initial helm install fails, Helm still records the failed release. A subsequent helm upgrade can then fail with "has no deployed releases". Use helm upgrade --install so Helm treats it as a fresh installation when no valid deployment exists, or uninstall or roll back the failed release first.
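A minimal sketch of the idempotent pattern, with placeholder release, chart, and namespace names:

```bash
# Works whether or not a prior install succeeded: installs if absent,
# upgrades if present, and rolls back automatically on failure.
helm upgrade --install my-release ./my-chart \
  --namespace prod --create-namespace \
  --atomic --timeout 10m

# Alternatively, discard a failed first install and start clean.
helm uninstall my-release -n prod
helm install my-release ./my-chart -n prod
```

The `--atomic` flag makes the first command safe to use as the single deploy step in a CI/CD pipeline.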

Best Practices to Avoid Upgrade Failures

  • Always use --atomic in CI/CD: This flag combines --wait with an automatic rollback. If the upgrade fails, Helm automatically restores the previous state, preventing your environment from being left broken.
  • Test upgrades with --dry-run: Running helm upgrade --dry-run --debug will validate your templates and render the manifests without applying them, catching syntax errors and obvious template failures early.
  • Use the Helm Diff Plugin: Before upgrading, use helm diff upgrade my-release my-chart/ to see exactly what Kubernetes resources will be modified, added, or deleted. This allows you to spot immutable field changes before they break your pipeline.
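The last two practices can be sketched as follows; the release, chart, and namespace names are placeholders, and the plugin URL assumes the databus23/helm-diff repository is still current:

```bash
# One-time setup of the diff plugin.
helm plugin install https://github.com/databus23/helm-diff

# Preview exactly what an upgrade would change, without applying it.
helm diff upgrade my-release ./my-chart -n prod

# Render and validate templates client/server-side without deploying.
helm upgrade my-release ./my-chart -n prod --dry-run --debug
```

A diff that shows a change to an immutable field (such as a Deployment's spec.selector) is your cue to plan a targeted deletion before the pipeline runs.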

Full Remediation Script

```bash
#!/usr/bin/env bash
# Diagnostic and Remediation Script for Stuck Helm Releases

NAMESPACE="your-app-namespace"
RELEASE_NAME="your-release-name"

# 1. List all secrets associated with the Helm release to identify the stuck revision
echo "Fetching Helm release secrets..."
kubectl get secrets -n "$NAMESPACE" -l owner=helm,name="$RELEASE_NAME" | awk '{print $1}'

# 2. Check the status of the release history
helm history "$RELEASE_NAME" -n "$NAMESPACE"

# 3. If the latest revision is stuck in 'pending-upgrade', find its exact secret name
# Replace the value below with the stuck revision from the history command
STUCK_REVISION="5"
STUCK_SECRET="sh.helm.release.v1.${RELEASE_NAME}.v${STUCK_REVISION}"

# 4. Delete the stuck secret to revert state to the previous successful release
echo "Deleting stuck secret: $STUCK_SECRET"
kubectl delete secret "$STUCK_SECRET" -n "$NAMESPACE"

# 5. Verify the release is no longer stuck
helm history "$RELEASE_NAME" -n "$NAMESPACE"

# 6. Retry the upgrade with extended timeout and atomic rollback
helm upgrade "$RELEASE_NAME" ./my-chart \
  --namespace "$NAMESPACE" \
  --atomic \
  --timeout 10m
```

Error Medic Editorial

Error Medic Editorial is composed of Senior Site Reliability Engineers and DevOps specialists dedicated to providing actionable, production-ready solutions for Kubernetes, Helm, and cloud-native infrastructure challenges.
