Kubernetes has a certificate problem that most teams don’t realize until something breaks.
The cluster itself uses certificates for internal authentication (API server, kubelets, etcd). Your applications need certificates for ingress TLS termination. Your microservices need certificates for mTLS between pods. And each layer has different management mechanisms, different expiry timelines, and different failure modes.
A certificate expiring at the cluster infrastructure layer takes down the entire cluster (kubectl stops working, pods can’t be scheduled, nothing deploys). A certificate expiring at the ingress layer takes down your public-facing services. A certificate expiring in the service mesh breaks internal communication.
This guide covers all three layers: what certificates exist, how to manage them, and how to monitor them so nothing expires without warning.
Layer 1: Cluster Infrastructure Certificates
These are the certificates Kubernetes itself uses for internal component authentication. They’re created at cluster initialization and expire after 1 year (kubeadm default).
What Exists
/etc/kubernetes/pki/
├── ca.crt + ca.key # Cluster CA (signs everything below)
├── apiserver.crt + apiserver.key # API server TLS (what kubectl connects to)
├── apiserver-kubelet-client.crt # API server → kubelet authentication
├── apiserver-etcd-client.crt # API server → etcd authentication
├── front-proxy-ca.crt + key # Aggregation layer CA
├── front-proxy-client.crt + key # Aggregation layer client
├── etcd/
│ ├── ca.crt + ca.key # etcd CA (separate from cluster CA)
│ ├── server.crt + server.key # etcd server TLS
│ ├── peer.crt + peer.key # etcd peer communication
│ └── healthcheck-client.crt + key # etcd health checks
└── sa.key + sa.pub # Service account token signing
The Danger
kubeadm clusters: Certificates expire after 1 year. If nobody renews them:
- kubectl returns “certificate has expired”
- New pods can’t be scheduled
- Existing pods continue running but can’t be managed
- The cluster is effectively frozen
Managed clusters (EKS, GKE, AKS): The provider handles infrastructure certificate rotation automatically. You never see these certificates.
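Before reaching for kubeadm at all, you can read expiry straight off the certificate files with openssl — useful because it works even when an expired cert has already locked kubectl out. A minimal sketch: on a control-plane node `CERT` would be `/etc/kubernetes/pki/apiserver.crt`; the self-contained demo below generates a throwaway cert valid for 90 days instead (GNU `date` assumed).

```shell
# Sketch: compute days until a certificate expires using only openssl.
# On a real control-plane node, point CERT at /etc/kubernetes/pki/apiserver.crt.
CERT=demo.crt
openssl req -x509 -newkey rsa:2048 -nodes -keyout demo.key -out "$CERT" \
  -days 90 -subj "/CN=demo" 2>/dev/null

end_date=$(openssl x509 -in "$CERT" -noout -enddate | cut -d= -f2)
end_epoch=$(date -d "$end_date" +%s)          # GNU date; BSD date needs -j -f
days_left=$(( (end_epoch - $(date +%s)) / 86400 ))
echo "days until expiry: $days_left"
```

Looping this over every `.crt` under `/etc/kubernetes/pki/` gives you the same picture as `kubeadm certs check-expiration`, with no dependency on kubeadm.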
How to Manage
# Check expiry dates (kubeadm)
kubeadm certs check-expiration
# Renew all certificates
kubeadm certs renew all

# Renewal alone is not enough: the control-plane static pods
# (kube-apiserver, kube-controller-manager, kube-scheduler, etcd)
# must restart to load the new certs. Restarting kubelet does NOT
# restart static pods; briefly moving their manifests does:
mkdir -p /tmp/k8s-manifests
mv /etc/kubernetes/manifests/* /tmp/k8s-manifests/ && sleep 30 \
  && mv /tmp/k8s-manifests/* /etc/kubernetes/manifests/

# Automate renewal with a cron job (run monthly, well inside the 1-year window)
0 0 1 * * /usr/bin/kubeadm certs renew all >> /var/log/kubeadm-renew.log 2>&1
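On systemd hosts, a timer unit is an alternative to cron that logs to the journal and catches missed runs. A sketch — the unit file names are illustrative:

```ini
# /etc/systemd/system/kubeadm-cert-renew.service
[Unit]
Description=Renew kubeadm-managed certificates

[Service]
Type=oneshot
ExecStart=/usr/bin/kubeadm certs renew all

# /etc/systemd/system/kubeadm-cert-renew.timer
[Unit]
Description=Monthly kubeadm certificate renewal

[Timer]
OnCalendar=monthly
Persistent=true

[Install]
WantedBy=timers.target
```

Enable with `systemctl enable --now kubeadm-cert-renew.timer`. The control-plane static pods still need a restart after each renewal.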
Monitoring
# Prometheus alert for cluster cert expiry. Kubernetes exposes no
# per-file expiry metric of its own; x509_cert_not_after below comes
# from x509-certificate-exporter watching /etc/kubernetes/pki/.
- alert: KubernetesClusterCertExpiringSoon
  expr: |
    (x509_cert_not_after - time()) / 86400 < 30
  for: 15m
  labels:
    severity: critical
  annotations:
    summary: "Kubernetes cluster certificate expires in < 30 days"
Layer 2: Ingress Certificates (cert-manager)
These are the TLS certificates that terminate HTTPS for your public-facing services. cert-manager is the standard tool for managing them.
Setup
# 1. Install cert-manager
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/latest/download/cert-manager.yaml
# 2. Create a ClusterIssuer (Let's Encrypt)
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: platform-team@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-key
    solvers:
    - http01:
        ingress:
          class: nginx
    - dns01:
        cloudDNS:
          project: my-gcp-project
          # credentials via Workload Identity or a serviceAccountSecretRef
      selector:
        dnsNames:
        - "*.example.com"
# 3. Annotate your Ingress (automatic certificate provisioning)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  tls:
  - hosts:
    - app.example.com
    secretName: app-tls  # cert-manager creates and manages this Secret
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app
            port:
              number: 80
That’s it. cert-manager handles: key generation → CSR → ACME challenge → certificate issuance → Secret creation → renewal at 2/3 lifetime.
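It is worth spot-checking what cert-manager actually stored. Against a real cluster (the Secret name is the one from the example above) you would run `kubectl get secret app-tls -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -subject -enddate`; the self-contained sketch below round-trips a cert through base64 exactly the way a Secret stores it, then inspects it.

```shell
# Sketch: Secrets store tls.crt base64-encoded; decode and inspect with openssl.
# We generate a throwaway cert standing in for what cert-manager would issue.
openssl req -x509 -newkey rsa:2048 -nodes -keyout tls.key -out tls.crt \
  -days 30 -subj "/CN=app.example.com" 2>/dev/null

b64=$(base64 < tls.crt | tr -d '\n')   # the form stored in the Secret's data field
echo "$b64" | base64 -d | openssl x509 -noout -subject -enddate
```

If the subject or expiry is not what you expect, check the Certificate resource's status before blaming the ingress controller.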
For Internal Services (Private CA)
# Vault issuer for internal certificates
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: vault-internal
spec:
  vault:
    server: https://vault.internal:8200
    path: pki_int/sign/internal-service
    auth:
      kubernetes:
        role: cert-manager
        mountPath: /v1/auth/kubernetes
        # also needs a token source: a secretRef or serviceAccountRef
---
# Internal service certificate
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: payment-api-tls
  namespace: production
spec:
  secretName: payment-api-tls
  issuerRef:
    name: vault-internal
    kind: ClusterIssuer
  dnsNames:
  - payment-api.production.svc.cluster.local
  - payment-api.internal.example.com
  duration: 720h    # 30 days
  renewBefore: 240h # renew 10 days before expiry
  privateKey:
    algorithm: ECDSA
    size: 256
Monitoring cert-manager
# Prometheus alerts for cert-manager (metrics exposed by cert-manager itself)
- alert: CertManagerCertNotReady
  expr: certmanager_certificate_ready_status{condition="False"} == 1
  for: 30m
  labels:
    severity: warning
  annotations:
    summary: "Certificate {{ $labels.name }} in {{ $labels.namespace }} is not ready"

- alert: CertManagerCertExpiringSoon
  expr: (certmanager_certificate_expiration_timestamp_seconds - time()) < 604800  # 7 days
  labels:
    severity: critical
  annotations:
    summary: "Certificate {{ $labels.name }} expires in < 7 days"
Layer 3: Service-to-Service mTLS (Service Mesh)
For encrypting and authenticating all pod-to-pod traffic.
Istio (Most Common)
# Enable strict mTLS mesh-wide
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system  # root namespace, so this applies mesh-wide
spec:
  mtls:
    mode: STRICT
# Istio automatically:
# - Injects Envoy sidecar into every pod
# - Issues per-pod certificates (24h validity, SPIFFE ID)
# - Rotates certificates before expiry
# - Encrypts all pod-to-pod traffic with mTLS
# - No application code changes required
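Flipping the whole mesh to STRICT at once can break traffic to pods that don't have a sidecar yet. A common migration pattern is a namespace-scoped override that stays PERMISSIVE until every workload is injected — a sketch, with an illustrative namespace name:

```yaml
# Namespace-scoped PeerAuthentication overrides the mesh-wide default
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: legacy-apps   # hypothetical namespace still being migrated
spec:
  mtls:
    mode: PERMISSIVE       # accept both mTLS and plaintext during migration
```

Delete the override once the namespace is fully injected, and the mesh-wide STRICT policy takes effect again.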
Linkerd (Simpler Alternative)
# Install Linkerd (mTLS on by default; recent releases install CRDs first)
linkerd install --crds | kubectl apply -f -
linkerd install | kubectl apply -f -
# Inject into a namespace (existing pods need a restart to get the proxy)
kubectl annotate namespace production linkerd.io/inject=enabled
kubectl rollout restart deploy -n production
# Verify mTLS is active
linkerd viz edges deployment -n production
# Shows: all connections secured with mTLS
Monitoring Service Mesh Certificates
# Istio: check certificate status for a pod
istioctl proxy-config secret <pod-name> -n production
# Shows: certificate chain, expiry, SPIFFE ID
# Linkerd: check a pod's identity certificate
linkerd identity -n production <pod-name>
# Shows: certificate issuer, expiry, trust anchors
The Complete Monitoring Stack
Monitor all three layers with a unified approach:
# Prometheus rules covering all certificate layers
groups:
- name: certificate-monitoring
  rules:
  # Layer 1: client certs presented to the API server (a histogram metric;
  # for file-level coverage of /etc/kubernetes/pki, add an exporter such
  # as x509-certificate-exporter)
  - alert: ClusterCertExpiring
    expr: |
      apiserver_client_certificate_expiration_seconds_count{job="apiserver"} > 0
      and on(job) histogram_quantile(0.01,
        sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job="apiserver"}[5m]))
      ) < 30 * 86400
  # Layer 2: Ingress (cert-manager)
  - alert: IngressCertExpiring
    expr: (certmanager_certificate_expiration_timestamp_seconds - time()) / 86400 < 7
  # Layer 3: service mesh (exact metric name varies by Istio version)
  - alert: IstioCertError
    expr: increase(istio_agent_cert_rotation_failure_total[5m]) > 0
  # External probe via blackbox_exporter (what clients actually see)
  - alert: ExternalCertExpiring
    expr: (probe_ssl_earliest_cert_expiry - time()) / 86400 < 14
Common Failures and Fixes
cert-manager: “error presenting challenge”
Cause: DNS-01 challenge can’t create TXT record (wrong credentials, missing permissions).
# Debug: follow the chain Certificate → CertificateRequest → Order → Challenge
kubectl get certificate,certificaterequest,order,challenge -n <namespace>
kubectl describe challenge <challenge-name> -n <namespace>
kubectl logs -n cert-manager deploy/cert-manager -f | grep -i error
Ingress serves old certificate after renewal
Cause: Ingress controller didn’t detect the Secret update.
# Force ingress controller to reload
kubectl rollout restart deployment ingress-nginx-controller -n ingress-nginx
# Or verify the controller watches Secrets (nginx-ingress does by default)
Istio: “upstream connect error or disconnect/reset before headers”
Cause: mTLS handshake failure between pods (certificate expired, CA mismatch, sidecar not injected).
# Check if sidecar is injected
kubectl get pod <pod> -o jsonpath='{.spec.containers[*].name}'
# Should include "istio-proxy"
# Check certificate validity
istioctl proxy-config secret <pod> | grep "VALID"
FAQ
Q: Do I need cert-manager if I use a service mesh?
A: Yes. They handle different layers: cert-manager manages ingress certificates (public-facing TLS), while the service mesh manages internal mTLS certificates (pod-to-pod). The two barely overlap, though cert-manager can also act as the mesh's CA via the istio-csr integration.
Q: What about Gateway API (replacing Ingress)?
A: cert-manager supports Gateway API via its gateway-shim controller. Same concept: annotate your Gateway resource with cert-manager.io/cluster-issuer and cert-manager provisions certificates for its TLS listeners. (Certificates attach to the Gateway, not to HTTPRoutes.)
Q: How do I handle certificates for non-HTTP services (gRPC, databases)?
A: Use cert-manager Certificate resources directly (not via Ingress annotations). Mount the resulting Secret as a volume in your pod. Your application loads the cert from the mounted path.
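The pattern from the answer above can be sketched as a Pod spec (names reuse the payment-api example from Layer 2; the image and mount path are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: payment-api
  namespace: production
spec:
  containers:
  - name: payment-api
    image: example/payment-api:latest  # hypothetical image
    volumeMounts:
    - name: tls
      mountPath: /etc/tls
      readOnly: true
    # the app loads /etc/tls/tls.crt, /etc/tls/tls.key, /etc/tls/ca.crt
  volumes:
  - name: tls
    secret:
      secretName: payment-api-tls  # the Secret cert-manager maintains
```

Because the volume is backed by the Secret, cert-manager's renewals update the mounted files; the application must re-read them (or be restarted) to pick up rotated certs.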
Q: Should I use Let’s Encrypt or a private CA for internal services? A: Private CA (Vault, AWS PCA, self-signed CA via cert-manager). Internal services don’t need public trust. Private CAs give you: no rate limits, custom validity periods, no external dependency, and no information leakage to CT logs.
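For the "self-signed CA via cert-manager" option, the bootstrap is a short chain of resources — a sketch, with illustrative names:

```yaml
# 1. A selfSigned issuer that can mint the root
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned-bootstrap
spec:
  selfSigned: {}
---
# 2. The root CA certificate itself (stored in the cert-manager namespace,
#    which is where ClusterIssuers read their Secrets from)
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: internal-root-ca
  namespace: cert-manager
spec:
  isCA: true
  commonName: internal-root-ca
  secretName: internal-root-ca
  issuerRef:
    name: selfsigned-bootstrap
    kind: ClusterIssuer
---
# 3. A CA issuer backed by that root; internal Certificates reference this one
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: internal-ca
spec:
  ca:
    secretName: internal-root-ca
```

Workloads then trust `internal-root-ca` instead of the public roots, which is exactly the private-trust property the answer above argues for.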