Certificate Management for DevOps Teams: Stop Treating Certs as an Afterthought

You’ve automated everything else. Infrastructure is Terraform. Deployments are CI/CD. Monitoring is Prometheus + Grafana. Secrets are in Vault. Scaling is automatic.

But certificates? Somebody manually requests them from a portal. Somebody downloads a ZIP file. Somebody SCPs it to a server. Somebody remembers to renew it in 11 months. Maybe.

This is the gap. DevOps teams that deploy 50 services a week still manage certificates like it’s a manual IT process from 2010. And then they’re surprised when an expired cert takes down production at 2 AM on a Saturday.

Here’s how to fix it — treating certificates as infrastructure, not tickets.

The DevOps Certificate Manifesto

Certificates should be:

Declared in code (not requested via email/portal)
Provisioned automatically (not downloaded and uploaded manually)
Renewed without human intervention (not tracked in spreadsheets)
Monitored like any other infrastructure (not discovered during outages)
Ephemeral where possible (short-lived, disposable, auto-replaced)

If your certificate process requires a human to do anything other than write the initial configuration, it’s not automated enough.

Pattern 1: Certificates as Code (Terraform)

Declare certificates in your infrastructure code. They’re provisioned alongside the infrastructure that uses them.

AWS (ACM + Route53 + ALB)

# Certificate declared in Terraform
resource "aws_acm_certificate" "api" {
  domain_name               = "api.example.com"
  subject_alternative_names = ["api-v2.example.com"]
  validation_method         = "DNS"

  lifecycle {
    create_before_destroy = true
  }
}

# Hosted zone that serves api.example.com
data "aws_route53_zone" "main" {
  name = "example.com"
}

# DNS validation (automatic)
resource "aws_route53_record" "cert_validation" {
  for_each = {
    for dvo in aws_acm_certificate.api.domain_validation_options : dvo.domain_name => dvo
  }
  allow_overwrite = true
  zone_id         = data.aws_route53_zone.main.zone_id
  name            = each.value.resource_record_name
  type            = each.value.resource_record_type
  records         = [each.value.resource_record_value]
  ttl             = 60
}

# Wait for validation
resource "aws_acm_certificate_validation" "api" {
  certificate_arn         = aws_acm_certificate.api.arn
  validation_record_fqdns = [for r in aws_route53_record.cert_validation : r.fqdn]
}

# Attach to ALB (certificate auto-renews via ACM)
resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.main.arn
  port              = 443
  protocol          = "HTTPS"
  certificate_arn   = aws_acm_certificate_validation.api.certificate_arn
  # ... default_action (and optionally ssl_policy) omitted for brevity
}

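Result: terraform apply creates the certificate, validates it via DNS, and attaches it to the load balancer. ACM renews it automatically as long as the certificate stays in use and its validation CNAME records remain in DNS, so there is no ongoing maintenance.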

GCP (Managed Certificate + Load Balancer)

resource "google_compute_managed_ssl_certificate" "api" {
  name = "api-cert"
  managed {
    domains = ["api.example.com"]
  }
}

resource "google_compute_target_https_proxy" "api" {
  name             = "api-proxy"
  url_map          = google_compute_url_map.api.id
  ssl_certificates = [google_compute_managed_ssl_certificate.api.id]
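}

Result: Google provisions and renews the certificate automatically, but provisioning only completes once api.example.com resolves to the load balancer’s IP address, so manage that DNS record in the same configuration.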

Pattern 2: Certificates in Kubernetes (cert-manager)

For Kubernetes workloads, cert-manager is the de facto standard. Certificates become Kubernetes resources, managed the same way as Deployments and Services.

The GitOps Way

# In your Helm chart or Kustomize overlay:
# charts/my-app/templates/certificate.yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: {{ .Release.Name }}-tls
  namespace: {{ .Release.Namespace }}
spec:
  secretName: {{ .Release.Name }}-tls-secret
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
  {{- range .Values.ingress.hosts }}
    - {{ . }}
  {{- end }}
  privateKey:
    algorithm: ECDSA
    size: 256

# values.yaml
ingress:
  hosts:
    - api.example.com
    - api-v2.example.com

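Result: Deploy the app → the certificate is provisioned automatically. Delete the app → the Certificate resource is removed with the release (cert-manager leaves the issued Secret in place unless it runs with --enable-certificate-owner-ref). Deploy to 10 environments → each gets its own certificate automatically.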

Monitoring cert-manager (Prometheus)

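# ServiceMonitor for cert-manager metrics
# (the official Helm chart can create one for you with prometheus.servicemonitor.enabled=true)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cert-manager
  namespace: cert-manager
spec:
  selector:
    matchLabels:
      app: cert-manager
  endpoints:
  # Must match the name of the metrics port (9402) on the cert-manager Service
  - port: http-metrics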

# Alert rules
groups:
- name: cert-manager
  rules:
  - alert: CertificateNotReady
    expr: certmanager_certificate_ready_status{condition="False"} == 1
    for: 15m
    annotations:
      summary: "Certificate {{ $labels.namespace }}/{{ $labels.name }} failed to issue"

  - alert: CertificateExpiringSoon
    expr: (certmanager_certificate_expiration_timestamp_seconds - time()) / 86400 < 7
    annotations:
      summary: "Certificate {{ $labels.namespace }}/{{ $labels.name }} expires in < 7 days"
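
Why a 7-day threshold is a late signal: when spec.renewBefore is not set, cert-manager documents that it renews a certificate once one third of its actual lifetime remains. A minimal sketch of that arithmetic, assuming a 90-day Let’s Encrypt certificate; `renewal_time` is a hypothetical helper, not part of cert-manager.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def renewal_time(not_before: datetime, not_after: datetime,
                 renew_before: Optional[timedelta] = None) -> datetime:
    """Return when renewal should start.

    Mirrors cert-manager's documented default: if renew_before is unset,
    renew once 1/3 of the issued certificate's lifetime remains.
    """
    lifetime = not_after - not_before
    if renew_before is None:
        renew_before = lifetime / 3
    return not_after - renew_before


issued = datetime(2025, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(days=90)  # typical Let's Encrypt lifetime
start = renewal_time(issued, expires)
print(start.date(), (expires - start).days)  # 2025-03-02 30
```

With that default, a certificate that trips the 7-day alert has already been failing to renew for roughly 23 days, which is why CertificateNotReady is the alert that catches problems early.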

Pattern 3: Certificates in CI/CD Pipelines

For services that run outside Kubernetes and aren’t behind a cloud-managed load balancer:

GitHub Actions: Request + Deploy + Verify

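name: Deploy with Certificate
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Assumes SSH access to deploy@server is already set up on the runner
      # (key + known_hosts) and that certbot manages the server's certificate.
      - name: Renew certificate (if needed)
        run: |
          # Check whether the current cert expires within 30 days
          CERT=/etc/letsencrypt/live/app.example.com/fullchain.pem
          EXPIRY=$(ssh deploy@server "openssl x509 -enddate -noout -in $CERT" | cut -d= -f2)
          EXPIRY_EPOCH=$(date -d "$EXPIRY" +%s)
          NOW_EPOCH=$(date +%s)
          DAYS_LEFT=$(( (EXPIRY_EPOCH - NOW_EPOCH) / 86400 ))

          if [ "$DAYS_LEFT" -lt 30 ]; then
            echo "Certificate expires in $DAYS_LEFT days, renewing"
            # certbot renew also skips certs that are not yet due; it needs root,
            # so this assumes passwordless sudo for certbot on the server
            ssh deploy@server "sudo certbot renew --deploy-hook 'systemctl reload nginx'"
          fi

      - name: Deploy application
        run: |
          # Your normal deployment steps
          ssh deploy@server "cd /app && git pull && docker compose up -d"

      - name: Verify certificate
        run: |
          sleep 10
          SERVED=$(echo | openssl s_client -connect app.example.com:443 -servername app.example.com 2>/dev/null)
          CERT_INFO=$(echo "$SERVED" | openssl x509 -noout -subject -enddate)
          echo "$CERT_INFO"
          # Fail if the served cert is for the wrong name...
          echo "$CERT_INFO" | grep -q "app.example.com" || exit 1
          # ...or if it expires within 7 days (604800 seconds)
          echo "$SERVED" | openssl x509 -noout -checkend 604800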
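
The date arithmetic in the renewal step is easy to get subtly wrong in shell. A minimal Python sketch of the same check, assuming the input is the `notAfter=...` line printed by `openssl x509 -enddate -noout`; it relies on the standard library’s `ssl.cert_time_to_seconds`, and `days_left` / `needs_renewal` are hypothetical helper names.

```python
import ssl
import time
from typing import Optional

SECONDS_PER_DAY = 86400


def days_left(enddate_line: str, now: Optional[float] = None) -> int:
    """Whole days until expiry, parsed from openssl's 'notAfter=<date>' output."""
    # openssl x509 -enddate prints e.g. "notAfter=Jan  5 09:34:43 2018 GMT"
    _, _, cert_time = enddate_line.strip().partition("=")
    expiry = ssl.cert_time_to_seconds(cert_time)  # stdlib parser for this format
    if now is None:
        now = time.time()
    # int() truncates toward zero, like bash's $(( ... / 86400 ))
    return int((expiry - now) / SECONDS_PER_DAY)


def needs_renewal(enddate_line: str, threshold_days: int = 30,
                  now: Optional[float] = None) -> bool:
    """Same decision as the workflow step: renew when fewer than threshold_days remain."""
    return days_left(enddate_line, now) < threshold_days


line = "notAfter=Jan  5 09:34:43 2018 GMT"
print(days_left(line, now=1515144883 - 10 * SECONDS_PER_DAY))  # 10
```

Passing `now` explicitly keeps the function deterministic, which makes the threshold logic easy to unit-test before it guards a production deploy.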

Pattern 4: Certificate Monitoring as Code

Your monitoring stack should treat certificate expiry the same as disk space or memory usage:

Prometheus + Blackbox Exporter

# blackbox.yml: the stock tcp_connect module does not negotiate TLS, so it
# never reports probe_ssl_earliest_cert_expiry. Define a TLS-enabled module:
modules:
  tcp_tls:
    prober: tcp
    tcp:
      tls: true

# prometheus.yml: probe all TLS endpoints
scrape_configs:
  - job_name: 'tls-certificates'
    metrics_path: /probe
    params:
      module: [tcp_tls]
    static_configs:
      - targets:
        - api.example.com:443
        - app.example.com:443
        - admin.example.com:443
        - payments.example.com:443
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115

# Alert rules
groups:
- name: certificates
  rules:
  - alert: TLSCertExpiring30Days
    expr: (probe_ssl_earliest_cert_expiry - time()) / 86400 < 30
    labels:
      severity: warning
    annotations:
      summary: "TLS cert for {{ $labels.instance }} expires in < 30 days"

  - alert: TLSCertExpiring7Days
    expr: (probe_ssl_earliest_cert_expiry - time()) / 86400 < 7
    labels:
      severity: critical
    annotations:
      summary: "CRITICAL: TLS cert for {{ $labels.instance }} expires in < 7 days"
      runbook: "https://wiki.internal/runbooks/certificate-renewal"
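
Both rules evaluate the same expression with different thresholds, so a certificate under 7 days matches both. A minimal Python sketch of that tiering, for example in a cron-based checker that runs outside Prometheus; `expiry_severity` and its return values are illustrative, with the thresholds copied from the rules above.

```python
import time
from typing import Optional

# Thresholds mirror the TLSCertExpiring30Days / TLSCertExpiring7Days rules.
WARNING_DAYS = 30
CRITICAL_DAYS = 7


def expiry_severity(expiry_ts: float, now: Optional[float] = None) -> Optional[str]:
    """Return 'critical', 'warning', or None for a certificate expiry timestamp."""
    if now is None:
        now = time.time()
    days = (expiry_ts - now) / 86400  # same expression as the PromQL rules
    if days < CRITICAL_DAYS:
        return "critical"  # the 30-day rule matches too; report only the worse one
    if days < WARNING_DAYS:
        return "warning"
    return None
```

In Alertmanager, an inhibit rule that suppresses the warning while the critical alert fires for the same instance gives the same one-alert-per-certificate behavior.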

Grafana Dashboard

# Days until expiry for all monitored endpoints
(probe_ssl_earliest_cert_expiry - time()) / 86400

# Count of certificates expiring within 30 days
count(probe_ssl_earliest_cert_expiry - time() < 86400 * 30)

# Certificate issuer distribution (probe_ssl_last_chain_info carries the issuer as a label)
count by (issuer) (probe_ssl_last_chain_info)

Pattern 5: Internal Certificates with Vault

For internal services that need mTLS or private certificates:

# In your deployment script or container entrypoint:
# 1. Authenticate to Vault (using K8s service account or CI JWT)
export VAULT_TOKEN=$(vault write -field=token auth/kubernetes/login \
  role=my-app jwt="$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)")

# 2. Request a short-lived certificate. The response contains the private key,
#    so keep every file this script writes readable by its owner only.
umask 077
CERT_JSON=$(mktemp)
vault write -format=json pki/issue/internal-service \
  common_name="my-app.production.svc.cluster.local" \
  ttl="72h" > "$CERT_JSON"

# 3. Extract cert and key
jq -r '.data.certificate' "$CERT_JSON" > /etc/ssl/app.pem
jq -r '.data.private_key' "$CERT_JSON" > /etc/ssl/app-key.pem
jq -r '.data.ca_chain[]' "$CERT_JSON" >> /etc/ssl/app.pem

# 4. Clean up
rm -f "$CERT_JSON"
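
Steps 3 and 4 can also be done without jq or any temp file. A minimal Python sketch, assuming the JSON shape returned by `vault write -format=json pki/issue/...` (`data.certificate`, `data.private_key`, `data.ca_chain`, as used in the jq calls above); `write_bundle` and the output paths are hypothetical.

```python
import json
import os


def write_bundle(raw_json: str, cert_path: str, key_path: str) -> None:
    """Split a Vault PKI issue response into a cert bundle and a key file."""
    data = json.loads(raw_json)["data"]
    # Leaf certificate first, then the CA chain (same result as the jq steps).
    pems = [data["certificate"], *data.get("ca_chain", [])]
    with open(cert_path, "w") as f:
        f.write("\n".join(pem.strip() for pem in pems) + "\n")
    # Create the key file as 0600 up front instead of tightening it afterwards.
    fd = os.open(key_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        os.fchmod(f.fileno(), 0o600)  # also fixes a pre-existing, looser file
        f.write(data["private_key"].strip() + "\n")
```

Fed from a pipe (`vault write -format=json pki/issue/... | python3 script.py`, calling `write_bundle(sys.stdin.read(), "/etc/ssl/app.pem", "/etc/ssl/app-key.pem")`), the private key never touches /tmp.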

With Vault Agent running alongside the service, issuance and renewal happen automatically:

# vault-agent-config.hcl
template {
  source      = "/vault/templates/cert.tpl"
  destination = "/etc/ssl/app.pem"
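  # command runs in Vault Agent's own environment: this works when the agent
  # shares a host or container with nginx; a separate sidecar container
  # usually needs a different reload mechanism
  command     = "nginx -s reload"
  # Vault Agent re-renders the template as the cert approaches expiry,
  # then runs the command above so nginx picks up the new cert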
}

The Anti-Patterns (What NOT to Do)

❌ Certificates in Git

# NEVER commit certificates or keys to source control
# .gitignore should include:
*.pem
*.key
*.crt
*.pfx
*.p12
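
A .gitignore only matches file names: it does not stop `git add -f`, and it does not catch a key pasted into a YAML or .env file. A minimal sketch of a content-based check that a pre-commit hook could run over the files passed to it, assuming PEM-encoded keys; `find_private_keys` and `main` are hypothetical names, and dedicated secret scanners cover many more formats.

```python
import re
import sys
from typing import Iterable, List

# Matches PEM headers such as "BEGIN PRIVATE KEY", "BEGIN RSA PRIVATE KEY",
# "BEGIN EC PRIVATE KEY", "BEGIN OPENSSH PRIVATE KEY", "BEGIN ENCRYPTED PRIVATE KEY".
PRIVATE_KEY_HEADER = re.compile(r"-----BEGIN (?:[A-Z0-9]+ )*PRIVATE KEY-----")


def find_private_keys(paths: Iterable[str]) -> List[str]:
    """Return the paths whose contents contain a PEM private key header."""
    hits = []
    for path in paths:
        try:
            with open(path, "r", encoding="utf-8", errors="ignore") as f:
                if PRIVATE_KEY_HEADER.search(f.read()):
                    hits.append(path)
        except OSError:
            continue  # deleted or unreadable files are skipped
    return hits


def main(argv: List[str]) -> int:
    """Exit status for a hook: 1 if any file contains a private key."""
    hits = find_private_keys(argv)
    for path in hits:
        print(f"private key found in {path}", file=sys.stderr)
    return 1 if hits else 0
```

In the hook script itself, finish with `sys.exit(main(sys.argv[1:]))` so a match blocks the commit.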

❌ Long-Lived Certificates for Dynamic Infrastructure

If your infrastructure scales up/down daily, don’t use 1-year certificates that require manual renewal. Use short-lived certificates (hours/days) that are issued at deploy time and expire naturally.

❌ Shared Wildcard Certificates Across Environments

# DON'T: Same wildcard cert on dev, staging, and production
*.example.com        → deployed everywhere

# DO: Separate certificates per environment
dev.example.com      → cert from Let's Encrypt (auto-renewed)
staging.example.com  → cert from Let's Encrypt (auto-renewed)
api.example.com      → cert from Let's Encrypt (auto-renewed)

With a shared wildcard, compromising the least-protected environment (often dev) exposes the same private key that production uses.
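
The blast radius is easy to underestimate because a wildcard matches every name one label deep. A minimal sketch of the matching rule clients apply (RFC 6125 style: a left-most `*` label covers exactly one label), useful for auditing which hostnames a shared key actually protects; `wildcard_covers` is a hypothetical helper.

```python
def wildcard_covers(pattern: str, hostname: str) -> bool:
    """True if a certificate name like '*.example.com' covers hostname.

    Only a full left-most '*' label is treated as a wildcard, and it
    matches exactly one label (not the apex, not two labels deep).
    """
    pattern = pattern.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")
    if not pattern.startswith("*."):
        return pattern == hostname
    suffix = pattern[1:]  # e.g. ".example.com"
    if not hostname.endswith(suffix):
        return False
    label = hostname[: -len(suffix)]
    return label != "" and "." not in label


hosts = ["dev.example.com", "staging.example.com", "api.example.com",
         "example.com", "db.internal.example.com"]
print([h for h in hosts if wildcard_covers("*.example.com", h)])
# ['dev.example.com', 'staging.example.com', 'api.example.com']
```

Every name in that output would be exposed if the one shared key leaked from any of those environments.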

❌ Manual Renewal Reminders

If your certificate management strategy involves calendar reminders or Jira tickets for renewal, it’s not automated — it’s a human process pretending to be managed. Automate it or accept that you’ll have outages.

The Maturity Model for DevOps Certificate Management

Level	Description	Characteristics
0	Chaos	Manual everything. Certs expire without warning.
1	Tracked	Spreadsheet/monitoring exists. Still manual renewal.
2	Automated	ACME/cert-manager handles renewal. Monitoring alerts on failure.
3	Codified	Certificates declared in IaC. Provisioned with infrastructure.
4	Ephemeral	Short-lived certs. No renewal needed. Issued at deploy, expire naturally.

Many DevOps teams sit at Level 1 or 2. The goal is Level 3 or 4.

FAQ

Q: Should every service have its own certificate? A: Yes. One certificate per service (or per endpoint). Shared certificates (especially wildcards) create shared risk. If one service’s key is compromised, all services sharing that certificate are affected.

Q: How do I handle certificates for local development? A: Use mkcert — it generates locally-trusted certificates for localhost and custom domains. No browser warnings, no self-signed cert hacks. For team-wide dev environments, use a shared private CA with cert-manager.

Q: What about certificates for non-HTTP services (databases, message queues)? A: The same principles apply. Use cert-manager Certificate resources (mount the Secret as a volume), Vault PKI (request at startup), or your certificate lifecycle management (CLM) platform’s agent. The protocol doesn’t matter; the lifecycle management is the same.

Q: How do I convince my team to invest in certificate automation? A: Calculate the cost of your last certificate outage (or estimate the next one). Include engineer time (emergency response at 2 AM), revenue loss, customer trust impact, and post-mortem time. Compare that to the cost of setting up cert-manager (a few hours) or an ACME client such as certbot (an afternoon). In most cases a single avoided outage pays for the setup.
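
The comparison is simple arithmetic. A minimal sketch with placeholder inputs: every number below is a hypothetical value to replace with your own figures, not a benchmark, and customer trust impact is left out because it rarely reduces to a single number.

```python
from dataclasses import dataclass


@dataclass
class OutageCost:
    # All inputs are hypothetical placeholders; substitute your own figures.
    responders: int          # engineers pulled into the incident
    hours: float             # time to fix (assumed equal to downtime here)
    hourly_rate: float       # loaded cost per engineer-hour
    revenue_per_hour: float  # revenue lost while the service is down
    postmortem_hours: float  # total engineer-hours spent on follow-up

    def total(self) -> float:
        response = self.responders * self.hours * self.hourly_rate
        followup = self.postmortem_hours * self.hourly_rate
        return response + self.hours * self.revenue_per_hour + followup


def payback_outages(setup_hours: float, hourly_rate: float, outage: OutageCost) -> float:
    """Fraction of one avoided outage needed to pay for the automation work."""
    return (setup_hours * hourly_rate) / outage.total()


outage = OutageCost(responders=3, hours=2, hourly_rate=100,
                    revenue_per_hour=5_000, postmortem_hours=8)
print(outage.total())                             # 11400
print(round(payback_outages(8, 100, outage), 3))  # 0.07
```

A result below 1.0 means the automation pays for itself before the first avoided outage is even over.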
