Troubleshooting 'cert-manager certificate expired' and 'x509: certificate has expired' in Kubernetes
Fix 'cert-manager certificate expired' errors by verifying ClusterIssuer status, checking Order and Challenge resources, and forcing a manual certificate renewal.
- Always check the 'Ready' status of the Certificate object using 'kubectl get cert'.
- Trace the issuance pipeline: Certificate -> CertificateRequest -> Order -> Challenge.
- DNS-01 challenge failures (e.g., IAM permissions, propagation delays) are the most common root cause of expired certificates.
- Use 'cmctl renew' or safely delete the TLS Secret to force cert-manager to re-issue an expired certificate.
| Method | When to Use | Time | Risk |
|---|---|---|---|
| cmctl renew | Certificate is stuck in 'Issuing' or expired but configuration is correct. | < 1 min | Low |
| Delete TLS Secret | Certificate controller is completely stuck or secret is corrupted. | 1-2 mins | Medium (Causes immediate downtime if cert wasn't already expired) |
| Restart cert-manager pods | Webhook errors or controller cache is out of sync. | 2 mins | Low |
| Update Issuer/ClusterIssuer | Credentials expired or rate limits reached (switching to staging). | 5 mins | Low |
Understanding the Error
When you encounter a cert-manager certificate expired or x509: certificate has expired error in your Kubernetes cluster, cert-manager has failed to renew the TLS certificate before its expiration date. By default, cert-manager starts renewal once two thirds of the certificate's lifetime has elapsed, which is 30 days before expiry for a 90-day Let's Encrypt certificate. If renewal has still not succeeded by the expiration date, your applications will experience downtime as clients (such as web browsers or API consumers) reject the TLS connection.
Typical error messages associated with this include:
- Browser: NET::ERR_CERT_DATE_INVALID
- Curl/CLI: curl: (60) SSL certificate problem: certificate has expired
- Kubernetes Events: Issuing certificate as Secret does not exist or is missing data, followed by stalled progress.
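To make the renewal window concrete, here is a minimal sketch of the arithmetic. It assumes cert-manager's default of starting renewal when one third of the certificate's lifetime remains; a renewBefore value set on the Certificate would override that. The function name renewal_epoch is hypothetical.

```shell
# Hypothetical helper: given notBefore and notAfter as Unix epochs,
# print the epoch at which renewal would start under the assumed
# default (one third of the lifetime remaining).
renewal_epoch() {
  local not_before="$1" not_after="$2"
  local lifetime=$(( not_after - not_before ))
  echo $(( not_after - lifetime / 3 ))
}
```

For a 90-day certificate (7776000 seconds) issued at epoch 0, renewal_epoch 0 7776000 prints 5184000, which is day 60.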
Step 1: Diagnose the Issuance Pipeline
cert-manager uses a specific pipeline to issue certificates. When a certificate expires, it means one of these pipeline resources is stuck or has failed. You must trace the failure from the top down.
First, check the status of your certificates:
kubectl get certificates -A
Look for certificates where READY is False. The expiry time itself is recorded in each Certificate's status.notAfter field, which appears as Not After in the describe output.
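On a busy cluster it can help to filter that table down to the failing rows. The following is a sketch only: it assumes the default column order printed with -A (NAMESPACE, NAME, READY, ...), and the function name not_ready_certs is hypothetical.

```shell
# Hypothetical helper: reads the table printed by
# `kubectl get certificates -A` on stdin and prints "namespace/name"
# for every row whose READY column is not "True".
not_ready_certs() {
  awk 'NR > 1 && $3 != "True" { print $1 "/" $2 }'
}
```

Usage: kubectl get certificates -A | not_ready_certs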
Next, describe the specific broken certificate to look for events:
kubectl describe certificate <cert-name> -n <namespace>
If the certificate is stuck, check the CertificateRequest:
kubectl get certificaterequest -n <namespace>
kubectl describe certificaterequest <request-name> -n <namespace>
If using ACME (like Let's Encrypt), the CertificateRequest will spawn an Order. Check the Order:
kubectl get orders -n <namespace>
kubectl describe order <order-name> -n <namespace>
Finally, the Order spawns a Challenge (either HTTP-01 or DNS-01). This is where most failed renewals get stuck:
kubectl get challenges -n <namespace>
kubectl describe challenge <challenge-name> -n <namespace>
Look closely at the Events section of the Challenge description. You will often see errors like 'Waiting for DNS-01 challenge propagation: DNS record for "_acme-challenge.example.com" not found' or 'Error presenting challenge: unauthorized'.
Step 2: Common Root Causes and Fixes
1. DNS-01 Challenge Propagation Failures
If you are using DNS-01 challenges, cert-manager needs to create a TXT record in your DNS provider (e.g., Route53, Cloudflare). If the IAM role, API token, or service account used by the ClusterIssuer has expired or lacks permissions, the challenge will fail.
Fix: Verify the credentials referenced in your ClusterIssuer Secret. Update the API token if it has expired. Ensure your DNS provider's API is reachable from the cluster.
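To confirm whether the TXT record was ever created, you can query DNS directly. The sketch below only builds the record name to look up; it assumes the standard ACME convention that a wildcard name is validated at its base domain, and the function name acme_txt_name is hypothetical.

```shell
# Hypothetical helper: print the TXT record name that a DNS-01 challenge
# for the given domain uses. A leading "*." is dropped because wildcard
# names are validated at the base domain.
acme_txt_name() {
  local domain="${1#\*.}"
  echo "_acme-challenge.${domain}"
}
```

Usage, assuming dig is installed: dig +short TXT "$(acme_txt_name example.com)". An empty answer while the Challenge is pending suggests the record was never written, which points back at the provider credentials.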
2. Let's Encrypt Rate Limits
If you have been repeatedly deleting and recreating certificates, you may hit Let's Encrypt's strict rate limits (e.g., 5 duplicate certificates per week).
Fix: Look for 429 Too Many Requests in the Order or Challenge logs. If you hit this, you must wait until the rate limit resets. In the future, always use the Let's Encrypt Staging environment (https://acme-staging-v02.api.letsencrypt.org/directory) for testing.
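Before retrying, it is worth confirming which ACME endpoint an issuer actually points at. This is a small sketch; the function name acme_environment is hypothetical, and the production URL is Let's Encrypt's public v2 directory.

```shell
# Hypothetical helper: classify an ACME directory URL as Let's Encrypt
# staging, Let's Encrypt production, or something else.
acme_environment() {
  case "$1" in
    https://acme-staging-v02.api.letsencrypt.org/directory) echo "staging" ;;
    https://acme-v02.api.letsencrypt.org/directory)         echo "production" ;;
    *)                                                      echo "other" ;;
  esac
}
```

Usage: acme_environment "$(kubectl get clusterissuer <issuer-name> -o jsonpath='{.spec.acme.server}')"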
3. cert-manager Webhook Issues
The cert-manager webhook component intercepts and validates cert-manager resources. If the webhook's own internal TLS certificate is expired or misconfigured, it will block all certificate issuance.
Error symptom: Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post "https://cert-manager-webhook... x509: certificate has expired or is not yet valid.
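Fix: Restart the webhook pod first. If that does not help, delete the webhook's mutating and validating webhook configurations and restart the Deployment. These configurations are created by the cert-manager installation manifests or Helm chart, so re-apply your installation afterwards (for example with helm upgrade or kubectl apply) to restore them: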
kubectl delete mutatingwebhookconfiguration cert-manager-webhook
kubectl delete validatingwebhookconfiguration cert-manager-webhook
kubectl rollout restart deployment/cert-manager-webhook -n cert-manager
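After the restart, you can check that the webhook is serving again before retrying issuance. The sketch below assumes the Deployment name and namespace used in the commands above; the function name webhook_ready is hypothetical.

```shell
# Hypothetical helper: succeed only if the webhook Deployment reports at
# least one ready replica. An empty value (no ready replicas, or a
# failed kubectl call) is treated as zero.
webhook_ready() {
  local ready
  ready=$(kubectl get deployment cert-manager-webhook -n cert-manager \
    -o jsonpath='{.status.readyReplicas}')
  [ "${ready:-0}" -ge 1 ]
}
```

Usage: webhook_ready && echo "webhook is serving". If cmctl is installed, cmctl check api performs a similar check against the cert-manager API.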
4. HTTP-01 Challenge Routing Issues
For HTTP-01 challenges, cert-manager temporarily spins up an ingress resource and a pod to serve a specific token at /.well-known/acme-challenge/. If your Ingress Controller (like NGINX or ALB) isn't routing traffic properly, or if you have strict network policies blocking external access to this path, the challenge will fail.
Fix: Check your Ingress Controller logs. Ensure no WAF (Web Application Firewall) is blocking the Let's Encrypt validation servers. Verify that the temporary ingress resource created by cert-manager is acquiring an IP address.
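You can reproduce the validation request yourself from outside the cluster. This sketch only builds the URL the ACME server would fetch; the function name challenge_url is hypothetical, and the token is assumed to come from the pending Challenge resource (its spec.token field).

```shell
# Hypothetical helper: build the URL that is fetched during an HTTP-01
# challenge for a given domain and token.
challenge_url() {
  echo "http://$1/.well-known/acme-challenge/$2"
}
```

Usage: curl -sS -o /dev/null -w '%{http_code}\n' "$(challenge_url example.com <token>)". Anything other than 200 suggests the request is not reaching the solver pod.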
Step 3: Forcing a Manual Renewal
Once you have identified and fixed the root cause (e.g., fixed IAM permissions for DNS-01), you need to tell cert-manager to try again.
The safest way is using cmctl (the cert-manager CLI tool):
cmctl renew <cert-name> -n <namespace>
If you don't have cmctl installed, you can trigger a renewal by deleting the Kubernetes Secret that stores the expired certificate data. cert-manager will detect that the Secret is missing and immediately kick off a new issuance process:
kubectl delete secret <cert-secret-name> -n <namespace>
Warning: Only do this if the certificate is already expired, as deleting the secret will temporarily break TLS for any pods currently mounting it until the new one is issued.
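The two options can be wrapped in one helper that prefers cmctl and never deletes anything on its own. This is a sketch under that design choice; the function name force_renew is hypothetical.

```shell
# Hypothetical helper: trigger re-issuance with cmctl when it is
# available, otherwise print the manual fallback instead of deleting
# the Secret automatically.
force_renew() {
  local cert="$1" ns="$2"
  if command -v cmctl >/dev/null 2>&1; then
    cmctl renew "$cert" -n "$ns"
  else
    echo "cmctl not found; fallback: kubectl delete secret <cert-secret-name> -n $ns"
    return 1
  fi
}
```

After triggering the renewal, kubectl wait --for=condition=Ready certificate/<cert-name> -n <namespace> --timeout=5m blocks until the Certificate reports Ready or the timeout expires.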
Monitoring and Prevention
To prevent this from happening again, implement proactive monitoring. Do not rely solely on cert-manager's internal mechanisms.
- Use Prometheus and Alertmanager to alert on the certmanager_certificate_expiration_timestamp_seconds metric.
- Set up alerts for any Certificate objects where the Ready condition is False for more than 1 hour.
- Implement synthetic checks (like Blackbox Exporter) to externally probe your endpoints and alert if an SSL certificate has less than 14 days of validity remaining.
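The expiry metric is a Unix timestamp, so the alert condition is a simple subtraction. The sketch below shows that comparison in shell; the function name expires_within_days is hypothetical, and the 14-day threshold is only the example value from the list above.

```shell
# Hypothetical helper: succeed if an expiry timestamp (Unix epoch, as
# exposed by the metric) is fewer than N days after "now". The third
# argument overrides the current time and defaults to date +%s.
expires_within_days() {
  local expiry="$1" days="$2" now="${3:-$(date +%s)}"
  [ $(( expiry - now )) -lt $(( days * 86400 )) ]
}
```

The same condition written as a Prometheus expression would look like certmanager_certificate_expiration_timestamp_seconds - time() < 14 * 86400.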
#!/bin/bash
# Diagnostic script to inspect a stuck cert-manager certificate, from Certificate down to Challenge
NAMESPACE="your-namespace"
CERT_NAME="your-cert-name"

echo "=== 1. Certificate Status ==="
kubectl describe certificate "$CERT_NAME" -n "$NAMESPACE" | tail -n 20

# printf is used for the headers below because bash's echo prints "\n" literally
printf '\n=== 2. Associated CertificateRequest ===\n'
# 2>/dev/null hides the jsonpath index error raised when the selector matches nothing
CR_NAME=$(kubectl get certificaterequest -n "$NAMESPACE" -l "cert-manager.io/certificate-name=$CERT_NAME" -o jsonpath='{.items[0].metadata.name}' 2>/dev/null)
if [ -n "$CR_NAME" ]; then
  kubectl describe certificaterequest "$CR_NAME" -n "$NAMESPACE" | tail -n 20
else
  echo "No CertificateRequest matched the label selector."
fi

printf '\n=== 3. Associated Order ===\n'
ORDER_NAME=$(kubectl get order -n "$NAMESPACE" -l "cert-manager.io/certificate-name=$CERT_NAME" -o jsonpath='{.items[0].metadata.name}' 2>/dev/null)
if [ -n "$ORDER_NAME" ]; then
  kubectl describe order "$ORDER_NAME" -n "$NAMESPACE" | tail -n 20
  printf '\n=== 4. Associated Challenges ===\n'
  CHALLENGE_NAMES=$(kubectl get challenge -n "$NAMESPACE" -l "acme.cert-manager.io/order-name=$ORDER_NAME" -o name)
  for CHALLENGE in $CHALLENGE_NAMES; do
    echo "Describing $CHALLENGE:"
    kubectl describe "$CHALLENGE" -n "$NAMESPACE" | tail -n 20
  done
else
  echo "No Order matched the label selector; skipping Challenges."
fi

# Command to force renewal once the issue is fixed:
# cmctl renew $CERT_NAME -n $NAMESPACE
Error Medic Editorial
Senior Site Reliability Engineers specializing in Kubernetes, cloud-native infrastructure, and automated TLS certificate management.