Error Medic

How to Fix "cert-manager certificate expired" (x509: certificate has expired) in Kubernetes

Fix Kubernetes cert-manager certificate expired errors (x509). Learn to diagnose failed renewals, troubleshoot ACME challenges, and force manual certificate renewals.

Key Takeaways
  • Certificates usually expire because automated renewal has been failing silently, most often because HTTP-01 or DNS-01 validation is blocked.
  • Stuck CertificateRequests, Let's Encrypt rate limits, and misconfigured Ingress classes are the most common root causes.
  • Quick Fix: once the underlying cause is fixed, run 'cmctl renew <cert-name>' to force a renewal, or delete the failing Challenge and CertificateRequest objects so cert-manager recreates them.
  • If the cert-manager webhook is down, the API server rejects changes to every cert-manager resource; always confirm that the cert-manager pods are Running and Ready.
Fix Approaches Compared
| Method | When to Use | Time | Risk |
|---|---|---|---|
| Force Renewal (cmctl) | Validation failed temporarily due to a transient network or API issue | 1 min | Low |
| Delete Secret & Request | State is corrupted, or the CertificateRequest is stuck in a failed loop | 2 min | Medium (the endpoint serves no certificate until the new one is issued) |
| Reconfigure Issuer/Challenge | The DNS-01/HTTP-01 challenge is permanently failing due to infrastructure changes | 15 min | Low |
| Restart cert-manager Webhook | Requests are rejected with 'failed calling webhook' errors | 2 min | Low |

Understanding the Error

When a cert-manager certificate expires in a Kubernetes cluster, users typically experience immediate outages on secure endpoints. Browsers throw NET::ERR_CERT_DATE_INVALID, APIs return curl: (60) SSL certificate problem: certificate has expired, and your Ingress controller logs will be flooded with x509: certificate has expired or is not yet valid errors.

cert-manager is designed to automatically renew certificates before they expire (usually 30 days prior for Let's Encrypt). If a certificate has actually expired, it means the automated renewal process has been failing silently for weeks. This failure is rarely a bug in cert-manager itself; rather, it is almost always an infrastructure issue preventing the ACME server (like Let's Encrypt) from validating domain ownership via HTTP-01 or DNS-01 challenges.

The cert-manager Resource Chain

To troubleshoot effectively, you must understand the resource chain cert-manager uses to issue a certificate:

  1. Certificate: The top-level resource declaring the desired certificate (DNS names, issuer, and target Secret).
  2. CertificateRequest: Created by cert-manager for a Certificate whenever a new keypair/certificate is needed.
  3. Order: Created for ACME issuers to represent the order placed with the CA.
  4. Challenge: Created for the Order, one per domain, to prove domain control.

When a certificate expires, the breakdown usually happens at the Challenge or Order level.
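To see the chain for yourself, you can print each object next to its owner. The sketch below is a minimal example, assuming each object's first ownerReference points at its parent (CertificateRequest to Certificate, Order to CertificateRequest, Challenge to Order); trace_chain is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: list CertificateRequests, Orders and Challenges in one namespace,
# each with the name of its owner, to follow the chain down from a Certificate.
# Assumption: the first ownerReference of each object is its parent.

trace_chain() {
  local ns="$1"
  local kind
  for kind in certificaterequest order challenge; do
    echo "== ${kind}"
    # Missing fields are printed as <none> by custom-columns.
    kubectl get "$kind" -n "$ns" \
      -o custom-columns='NAME:.metadata.name,OWNER:.metadata.ownerReferences[0].name' \
      --no-headers
  done
}

# Usage: trace_chain <namespace>
```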

Step 1: Diagnose the Failure

Do not start deleting resources blindly. Find out exactly why the renewal failed.

1. Check the Certificate Status

Run the following command to check the READY status of your certificates: kubectl get certificates -A

If the READY column says False, describe the certificate to read the events: kubectl describe certificate <certificate-name> -n <namespace>

Look at the Status > Conditions block and the Events list at the bottom. A Ready condition set to False, or an Issuing condition, usually carries a Reason and Message explaining why issuance failed.
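On a cluster with many certificates, it can help to print only those that are not Ready, together with their expiry. A sketch using kubectl's JSONPath output and the Certificate's status.notAfter field; list_unready_certs is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: print namespace, name, Ready status and notAfter for every
# Certificate whose Ready condition is not "True" (including missing status).

list_unready_certs() {
  kubectl get certificates -A \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\t"}{.status.notAfter}{"\n"}{end}' \
    | awk -F'\t' '$3 != "True" { print }'
}

# Usage: list_unready_certs
```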

2. Trace the Request, Order, and Challenge

If the Certificate is stuck issuing, move down the chain: kubectl get certificaterequest,order,challenge -n <namespace>

You will likely see a challenge object that has been pending for a long time. Describe it: kubectl describe challenge <challenge-name> -n <namespace>

The Status section (State and Reason) and the Events of the Challenge tell you what went wrong, either in cert-manager's own self-check or in the ACME server's attempt to validate your domain. Common reasons include:

  • Waiting for HTTP-01 challenge propagation: failed to perform self check GET request
  • DNS record for _acme-challenge.example.com not found
  • 403 Forbidden or Rate limit exceeded
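To read the state and reason of every pending Challenge at once, the relevant fields can be pulled into columns. A sketch, assuming the Challenge fields spec.dnsName, spec.type, status.state and status.reason; show_challenges is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: one line per Challenge across all namespaces, with the domain,
# challenge type, current state and the reason cert-manager recorded.

show_challenges() {
  kubectl get challenges -A \
    -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,DOMAIN:.spec.dnsName,TYPE:.spec.type,STATE:.status.state,REASON:.status.reason'
}

# Usage: show_challenges
```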

Step 2: Resolve Common Root Causes

Root Cause A: HTTP-01 Challenge Failing

HTTP-01 challenges work by spinning up a temporary pod and Ingress route to serve a specific token. If Let's Encrypt cannot reach this token via http://<your-domain>/.well-known/acme-challenge/<token>, validation fails.

Fixes:

  1. Ingress Class mismatch: Ensure your Issuer or ClusterIssuer specifies the correct ingress class name (e.g., nginx or traefik). If you recently upgraded your ingress controller, the class name might have changed.
  2. Firewalls/WAF: Check if a Web Application Firewall (like Cloudflare or AWS WAF) is blocking HTTP traffic to .well-known/acme-challenge/ paths.
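  3. Global Redirects: Let's Encrypt follows HTTP-to-HTTPS redirects (to ports 80 or 443 only) and does not validate the certificate presented on the HTTPS hop, so an expired certificate alone does not break HTTP-01. Validation does fail when the redirect target cannot serve the token, for example when it points at a different host or port, or at an HTTPS route that does not include the solver path. Temporarily disable the HTTPS redirect, or configure your ingress to bypass redirects for the .well-known/acme-challenge/ path.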
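You can reproduce the HTTP-01 check from outside the cluster by fetching the token URL and comparing the body with the key cert-manager expects. A minimal sketch, assuming the Challenge fields spec.dnsName, spec.token and spec.key and that curl is installed; check_http01 is a hypothetical helper name. Run it from a network comparable to the public internet, since an in-cluster request may bypass the firewall or WAF you are trying to test.

```bash
#!/usr/bin/env bash
# Sketch: fetch http://<domain>/.well-known/acme-challenge/<token> for one
# Challenge and compare the response body with the Challenge's key.

check_http01() {
  local ns="$1" challenge="$2"
  local domain token key url body
  domain="$(kubectl get challenge "$challenge" -n "$ns" -o jsonpath='{.spec.dnsName}')"
  token="$(kubectl get challenge "$challenge" -n "$ns" -o jsonpath='{.spec.token}')"
  key="$(kubectl get challenge "$challenge" -n "$ns" -o jsonpath='{.spec.key}')"
  if [ -z "$domain" ] || [ -z "$token" ] || [ -z "$key" ]; then
    echo "FAIL: could not read challenge ${ns}/${challenge}"
    return 1
  fi
  url="http://${domain}/.well-known/acme-challenge/${token}"
  # -L follows redirects; -k ignores TLS errors on an HTTPS redirect target
  # so that an expired certificate does not hide a routing problem.
  if ! body="$(curl -sSLk --max-time 10 "$url")"; then
    echo "FAIL: request to ${url} failed"
    return 1
  fi
  if [ "$body" = "$key" ]; then
    echo "OK: ${domain} serves the expected key"
  else
    echo "FAIL: ${url} returned an unexpected body"
    return 1
  fi
}

# Usage: check_http01 <namespace> <challenge-name>
```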

Root Cause B: DNS-01 Challenge Failing

DNS-01 challenges create a TXT record _acme-challenge.<your-domain>. cert-manager needs API access to your DNS provider (e.g., AWS Route53, Cloudflare, Google Cloud DNS) to create this record.

Fixes:

  1. IAM Permissions: If using AWS IRSA or GCP Workload Identity, verify that the cert-manager service account is still annotated with the correct role and that the role's trust policy and DNS permissions have not been changed or revoked.
  2. API Token Expiration: If you are using a Kubernetes Secret to store a Cloudflare or DigitalOcean API token, check if the token itself has expired or been revoked.
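  3. Propagation Delay: DNS changes sometimes take longer to propagate than expected, or cert-manager's self-check queries resolvers that cannot see the record (for example with split-horizon DNS). These checks are tuned on the cert-manager controller rather than in the Issuer, for example with the --dns01-recursive-nameservers and --dns01-recursive-nameservers-only flags.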
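To check whether the TXT record is actually visible, you can query a public resolver for _acme-challenge.<domain> and look for the Challenge's key. A minimal sketch, assuming dig is installed, that spec.key holds the expected TXT value, and that 8.8.8.8 is an acceptable default resolver; check_dns01 is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: confirm that the TXT record cert-manager created for a DNS-01
# Challenge is visible through a given resolver.

check_dns01() {
  local ns="$1" challenge="$2" resolver="${3:-8.8.8.8}"
  local domain key records
  domain="$(kubectl get challenge "$challenge" -n "$ns" -o jsonpath='{.spec.dnsName}')"
  key="$(kubectl get challenge "$challenge" -n "$ns" -o jsonpath='{.spec.key}')"
  if [ -z "$domain" ] || [ -z "$key" ]; then
    echo "FAIL: could not read challenge ${ns}/${challenge}"
    return 1
  fi
  # dig prints TXT values wrapped in double quotes; strip them before comparing.
  records="$(dig +short TXT "_acme-challenge.${domain}" "@${resolver}" | tr -d '"')"
  if printf '%s\n' "$records" | grep -qxF -- "$key"; then
    echo "OK: TXT record for ${domain} is visible via ${resolver}"
  else
    echo "FAIL: expected TXT value not found via ${resolver}"
    return 1
  fi
}

# Usage: check_dns01 <namespace> <challenge-name> [resolver-ip]
```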

Root Cause C: Webhook Unavailability

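Sometimes the cert-manager-webhook pod crashes or the ValidatingWebhookConfiguration gets out of sync with the webhook. The Kubernetes API server then rejects create and update requests for cert-manager resources, typically with 'failed calling webhook' errors, so no new CertificateRequests, Orders, or Challenges can be created.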

Fix: Check the cert-manager pods: kubectl get pods -n cert-manager

If the webhook pod is crash-looping or has restarts, check its logs. You may need to delete the webhook pod to force a restart, or re-apply the cert-manager manifests if the webhook's own TLS certificates have expired.
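Instead of deleting the webhook pod by hand, a rollout restart recreates it and lets you wait until the replacement is ready. A sketch assuming a default Helm install, with namespace cert-manager and a Deployment named cert-manager-webhook (both depend on your release and may differ); webhook_logs and restart_webhook are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch: inspect and restart the cert-manager webhook.
# Assumption: namespace "cert-manager", Deployment "cert-manager-webhook".

webhook_logs() {
  local ns="${1:-cert-manager}" deploy="${2:-cert-manager-webhook}"
  kubectl get pods -n "$ns"
  # Recent webhook log lines; serving and TLS errors usually show up here.
  kubectl logs -n "$ns" "deploy/${deploy}" --tail=50
}

restart_webhook() {
  local ns="${1:-cert-manager}" deploy="${2:-cert-manager-webhook}"
  # Replace the pods through the Deployment, then wait for the rollout.
  kubectl rollout restart "deploy/${deploy}" -n "$ns"
  kubectl rollout status "deploy/${deploy}" -n "$ns" --timeout=120s
}

# Usage: webhook_logs [namespace] [deployment]; restart_webhook [namespace] [deployment]
```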

Step 3: Force the Renewal Process

Once you have resolved the underlying infrastructure issue (e.g., fixed the Ingress route or updated the DNS API token), cert-manager uses exponential backoff and might not retry immediately. You should force it to retry.

Using cmctl (Recommended): If you have the cmctl CLI tool installed: cmctl renew <certificate-name> -n <namespace>
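After forcing a renewal you usually want to block until the Certificate is usable again. A sketch that pairs cmctl renew with kubectl wait on the Ready condition; renew_and_wait and the 5-minute default timeout are assumptions. For an expired certificate Ready is False, so the wait lasts until reissuance; for a still-valid certificate Ready is already True and the wait returns immediately.

```bash
#!/usr/bin/env bash
# Sketch: force a renewal with cmctl, then wait for the Certificate's Ready
# condition. Returns non-zero if cmctl fails or the wait times out.

renew_and_wait() {
  local ns="$1" cert="$2" timeout="${3:-5m}"
  cmctl renew "$cert" -n "$ns" || return 1
  kubectl wait --for=condition=Ready "certificate/${cert}" -n "$ns" --timeout="$timeout"
}

# Usage: renew_and_wait <namespace> <certificate-name> [timeout]
```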

Using kubectl (Manual approach): If you don't have cmctl, you can trigger a renewal by deleting the stuck CertificateRequest and the underlying Secret (the one named in the Certificate's spec.secretName): kubectl delete certificaterequest <certificaterequest-name> -n <namespace>, then kubectl delete secret <secret-name> -n <namespace>. Note: deleting the Secret removes the expired certificate entirely until the new one is issued, so clients get hard TLS failures instead of expired-certificate warnings.
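The manual path can be scripted so that only the requests belonging to one Certificate are deleted. A minimal sketch, assuming each CertificateRequest's first ownerReference names its Certificate; requests_for_cert and manual_renew are hypothetical helper names, and deleting the Secret is opt-in because of the outage described above.

```bash
#!/usr/bin/env bash
# Sketch: delete the CertificateRequests owned by one Certificate and,
# optionally, the Secret it writes to.

requests_for_cert() {
  local ns="$1" cert="$2"
  kubectl get certificaterequest -n "$ns" --no-headers \
    -o custom-columns='NAME:.metadata.name,OWNER:.metadata.ownerReferences[0].name' \
    | awk -v c="$cert" '$2 == c { print $1 }'
}

manual_renew() {
  local ns="$1" cert="$2" delete_secret="${3:-no}"
  local req secret
  for req in $(requests_for_cert "$ns" "$cert"); do
    kubectl delete certificaterequest "$req" -n "$ns"
  done
  if [ "$delete_secret" = "yes" ]; then
    # WARNING: the endpoint has no certificate until the new one is issued.
    secret="$(kubectl get certificate "$cert" -n "$ns" -o jsonpath='{.spec.secretName}')"
    [ -n "$secret" ] || { echo "no spec.secretName on ${ns}/${cert}"; return 1; }
    kubectl delete secret "$secret" -n "$ns"
  fi
}

# Usage: manual_renew <namespace> <certificate-name> [yes to also delete the Secret]
```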

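Alternatively, some operators add a dummy annotation to the Certificate (for example kubectl annotate certificate <name> -n <namespace> force-renew=$(date +%s) --overwrite). This is not a documented renewal trigger and at most makes the controller re-examine the resource, so prefer cmctl renew.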

Step 4: Verification

Watch the logs of the cert-manager controller to ensure the issuance succeeds: kubectl logs -n cert-manager -l app=cert-manager -f

You should see lines indicating the Order was created, the Challenge was presented, and finally, the Certificate was issued successfully. Verify the new expiration date: echo | openssl s_client -showcerts -servername your-domain.com -connect your-domain.com:443 2>/dev/null | openssl x509 -inform pem -noout -dates
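It is also worth comparing what cert-manager stored in the Secret with what the endpoint actually serves; if they differ, the ingress controller may not have reloaded the new Secret. A sketch assuming openssl and base64 are installed and that the Secret uses the standard tls.crt key (openssl reads the first certificate, the leaf, from the chain); the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: expiry of the certificate in the Secret vs. the one served live.

secret_enddate() {
  local ns="$1" secret="$2"
  kubectl get secret "$secret" -n "$ns" -o jsonpath='{.data.tls\.crt}' \
    | base64 -d | openssl x509 -noout -enddate
}

served_enddate() {
  local host="$1"
  echo | openssl s_client -servername "$host" -connect "${host}:443" 2>/dev/null \
    | openssl x509 -noout -enddate
}

# Exit status 0 if the served certificate stays valid for at least N more days.
served_valid_for_days() {
  local host="$1" days="$2"
  echo | openssl s_client -servername "$host" -connect "${host}:443" 2>/dev/null \
    | openssl x509 -noout -checkend "$((days * 86400))"
}

# Usage: secret_enddate <namespace> <secret-name>; served_enddate <host>
#        served_valid_for_days <host> 30 && echo "good for 30+ days"
```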

Command Quick Reference

```bash
# 1. Check the status of all certificates in the cluster
kubectl get certificates -A

# 2. Describe the failing certificate to find the root cause
kubectl describe certificate <certificate-name> -n <namespace>

# 3. Check for stuck CertificateRequests, Orders, or Challenges
kubectl get certificaterequest,order,challenge -n <namespace>

# 4. Describe the specific challenge to see why validation failed
kubectl describe challenge <challenge-name> -n <namespace>

# 5. Check cert-manager controller logs for detailed API errors
kubectl logs -n cert-manager deploy/cert-manager

# 6. FORCE RENEWAL (Once the infrastructure issue is fixed)
# Option A: Using cmctl (Recommended)
cmctl renew <certificate-name> -n <namespace>

# Option B: Using kubectl to clean up stuck requests
kubectl delete certificaterequest <certificaterequest-name> -n <namespace>
kubectl delete challenge <challenge-name> -n <namespace>

# 7. Verify the newly issued certificate date on your live endpoint
echo | openssl s_client -showcerts -servername example.com -connect example.com:443 2>/dev/null | openssl x509 -inform pem -noout -dates
```
Error Medic Editorial

Our team of Senior DevOps and Site Reliability Engineers specializes in Kubernetes infrastructure, observability, and cloud-native troubleshooting.
