Every time you connect to a website over HTTPS, send an encrypted email, sign a software package, or authenticate to a VPN, you’re relying on Public Key Infrastructure. PKI is the system that makes digital trust possible — it answers the question “how do I know this server/person/device is who they claim to be?” using mathematics instead of blind faith.
Yet most engineers interact with PKI only when something breaks: an expired certificate takes down production, a chain validation error blocks a deployment, or an auditor asks “show me your PKI architecture” and nobody can answer.
This guide covers PKI from first principles through enterprise deployment — what it is, how it works, how to design a hierarchy, and where organizations get it wrong.
PKI in One Paragraph
PKI is a system for issuing, managing, and validating digital certificates. A Certificate Authority (CA) signs certificates that bind a public key to an identity (domain name, organization, device). Clients verify these certificates by checking the signature chain back to a trusted root. If the chain is valid and the certificate isn’t expired or revoked, the identity is trusted and encrypted communication can begin.
That’s it. Everything else — hierarchies, intermediates, CRLs, OCSP, key ceremonies, HSMs — is implementation detail that makes this basic model work securely at scale.
The Core Components
1. Certificate Authority (CA)
The CA vouches for identities by signing their public keys; the Root CA’s certificate is the trust anchor. There are two types:
- Public CAs (DigiCert, Let’s Encrypt, Sectigo): trusted by default in mainstream browsers and operating systems. Used for public-facing websites and services. You don’t control the root.
- Private CAs (Microsoft AD CS, EJBCA, Vault PKI, AWS Private CA): trusted only within your organization. You own the root. Used for internal mTLS, device authentication, Wi-Fi (EAP-TLS), VPN, and code signing.
2. Certificates (X.509)
A certificate is a signed document containing:
- Subject — who this certificate identifies (domain name, organization, device)
- Public key — the cryptographic key bound to this identity
- Issuer — which CA signed this certificate
- Validity — when the certificate is valid (notBefore → notAfter)
- Extensions — what the certificate can be used for (TLS server auth, client auth, code signing)
- Signature — the CA’s digital signature over all the above
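To make the field list concrete, here is a minimal sketch that models those fields as a Python dataclass. It is a toy stand-in, not an X.509 parser: the class name `Certificate`, the helper names, and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Certificate:
    """Toy stand-in for an X.509 certificate (hypothetical, not a parser)."""
    subject: str                               # identity the certificate names
    public_key: bytes                          # key bound to that identity
    issuer: str                                # CA that signed the certificate
    not_before: datetime                       # start of the validity window
    not_after: datetime                        # end of the validity window
    extended_key_usage: tuple[str, ...] = ()   # e.g. ("serverAuth",)
    signature: bytes = b""                     # CA signature over the fields above


def is_within_validity(cert: Certificate, now: datetime) -> bool:
    """True when `now` falls inside notBefore..notAfter (inclusive)."""
    return cert.not_before <= now <= cert.not_after


def allows_usage(cert: Certificate, usage: str) -> bool:
    """True when the certificate's extensions list the requested usage."""
    return usage in cert.extended_key_usage


example = Certificate(
    subject="bank.com",
    public_key=b"placeholder-key-bytes",
    issuer="Example Issuing CA",
    not_before=datetime(2025, 1, 1, tzinfo=timezone.utc),
    not_after=datetime(2025, 4, 1, tzinfo=timezone.utc),
    extended_key_usage=("serverAuth",),
)
print(is_within_validity(example, datetime(2025, 2, 1, tzinfo=timezone.utc)))
print(allows_usage(example, "codeSigning"))
```

A real certificate encodes these fields in ASN.1/DER, and the signature covers the encoded body, so treat this only as a mental model of what a validator reads.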
3. Trust Store
Operating systems and some browsers ship with a pre-installed list of trusted Root CA certificates (on the order of 150 roots, varying by vendor). When a client receives a certificate, it builds a chain from the certificate up to a root in its trust store. If the chain is complete and all signatures verify, the certificate is trusted.
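One way to see a trust store from the client side is Python's standard `ssl` module. The sketch below only inspects the default client context on the machine where it runs; the counts and paths it prints depend on the OS and OpenSSL build, so no particular output is implied.

```python
import ssl

# create_default_context() loads the platform's default CA certificates
# and turns on certificate and hostname verification for client use.
context = ssl.create_default_context()

print("verify mode:", context.verify_mode)        # CERT_REQUIRED by default
print("hostname check:", context.check_hostname)  # True by default

# Where OpenSSL looks for trusted roots on this machine.
print(ssl.get_default_verify_paths())

# Counts of certificates currently loaded into this context. Roots that
# live in a CA directory (capath) are loaded lazily, so this can read low.
print(context.cert_store_stats())
```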
4. Registration Authority (RA)
The RA handles identity verification before the CA signs. For public CAs, this means domain validation (HTTP challenge, DNS record). For enterprise CAs, this might mean checking Active Directory group membership or manager approval.
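For the public-CA case, the ACME protocol (RFC 8555) specifies what the requester must publish to pass the HTTP and DNS challenges. The sketch below computes those values with the standard library. The token and account thumbprint are made-up inputs; in practice the CA supplies the token and the thumbprint is derived from the account key.

```python
import base64
import hashlib


def key_authorization(token: str, account_thumbprint: str) -> str:
    """RFC 8555 section 8.1: the token, a dot, and the account key thumbprint."""
    return f"{token}.{account_thumbprint}"


def http01_path(token: str) -> str:
    """Path the CA fetches over HTTP; the body must be the key authorization."""
    return f"/.well-known/acme-challenge/{token}"


def dns01_record_name(domain: str) -> str:
    """Name of the TXT record the CA queries for a DNS challenge."""
    return f"_acme-challenge.{domain}"


def dns01_txt_value(token: str, account_thumbprint: str) -> str:
    """TXT value: unpadded base64url of SHA-256(key authorization)."""
    key_auth = key_authorization(token, account_thumbprint).encode("ascii")
    digest = hashlib.sha256(key_auth).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


# Hypothetical inputs for illustration only.
token = "example-token"
thumbprint = "example-thumbprint"
print(http01_path(token))
print(dns01_record_name("bank.com"), "TXT", dns01_txt_value(token, thumbprint))
```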
5. Revocation Infrastructure
When a certificate needs to be invalidated before expiry (key compromise, employee termination), revocation mechanisms let clients find out:
- CRL (Certificate Revocation List) — a signed list of revoked serial numbers, published periodically
- OCSP (Online Certificate Status Protocol) — real-time per-certificate status checks
- Short-lived certificates: the modern approach. Issue certificates valid for hours or days so that a compromised certificate expires before revocation would matter, which makes revocation largely unnecessary
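The CRL option in the list above can be sketched as a lookup against a list that itself has a freshness window. This is a toy model under stated assumptions: the `Crl` class and the three status strings are hypothetical, and a real CRL is a signed ASN.1 structure whose signature must also be verified.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Crl:
    """Toy CRL (hypothetical): a freshness window plus revoked serials."""
    this_update: datetime
    next_update: datetime
    revoked_serials: frozenset[int]


def revocation_status(serial: int, crl: Crl, now: datetime) -> str:
    """Return 'revoked', 'unknown' (the CRL is stale), or 'good'."""
    if serial in crl.revoked_serials:
        return "revoked"
    if now > crl.next_update:
        # A stale CRL cannot prove that a certificate is still good.
        return "unknown"
    return "good"


crl = Crl(
    this_update=datetime(2025, 6, 1, tzinfo=timezone.utc),
    next_update=datetime(2025, 6, 8, tzinfo=timezone.utc),
    revoked_serials=frozenset({0x1A2B}),
)
now = datetime(2025, 6, 3, tzinfo=timezone.utc)
print(revocation_status(0x1A2B, crl, now))
print(revocation_status(0x9999, crl, now))
```

How a client treats "unknown" is a policy decision: failing closed is safer, failing open keeps services up when the CRL endpoint is unreachable.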
How Certificate Validation Works
When your browser connects to https://bank.com:
1. Browser receives server's certificate
2. Checks: is the certificate expired? → No → continue
3. Checks: does the SAN match the hostname? → bank.com matches → continue
4. Finds the issuer (Intermediate CA) → checks its signature → valid
5. Finds the Intermediate's issuer (Root CA) → is it in my trust store? → Yes
6. Chain complete: End Entity → Intermediate → Root (trusted)
7. Checks revocation (OCSP staple or CRL) → not revoked
8. Certificate is trusted → TLS handshake continues → encrypted session established
If any step fails — expired, wrong hostname, broken chain, untrusted root, revoked — the connection is rejected.
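The control flow of those steps can be sketched in a few dozen lines. This is deliberately a toy: `toy_sign` is a SHA-256 hash standing in for a real asymmetric signature, hostname matching is exact (no wildcards), and every name here (`Cert`, `issue`, `validate`) is an assumption for illustration. Use a vetted TLS library for real validation.

```python
import hashlib
from dataclasses import dataclass, replace
from datetime import datetime, timezone


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str
    key: str                    # toy key material, not a real key pair
    not_before: datetime
    not_after: datetime
    san: tuple[str, ...] = ()
    signature: str = ""


def toy_sign(issuer_key: str, cert: Cert) -> str:
    """Placeholder for an asymmetric signature over the certificate body."""
    body = f"{cert.subject}|{cert.issuer}|{cert.key}|{','.join(cert.san)}"
    return hashlib.sha256(f"{issuer_key}|{body}".encode()).hexdigest()


def issue(cert: Cert, issuer_key: str) -> Cert:
    """Return a copy of `cert` carrying the issuer's toy signature."""
    return replace(cert, signature=toy_sign(issuer_key, cert))


def validate(leaf, hostname, intermediates, trust_store, revoked, now) -> str:
    """Follow the numbered steps; return 'trusted' or the failure reason."""
    if not (leaf.not_before <= now <= leaf.not_after):        # step 2
        return "expired"
    if hostname not in leaf.san:                              # step 3
        return "hostname mismatch"
    current = leaf
    for _ in range(len(intermediates) + 1):                   # steps 4-6
        issuer = trust_store.get(current.issuer) or intermediates.get(current.issuer)
        if issuer is None:
            return "no path to trusted root"
        if current.signature != toy_sign(issuer.key, current):
            return "bad signature"
        if not (issuer.not_before <= now <= issuer.not_after):
            return "expired"
        if current.issuer in trust_store:
            break                                             # reached a trusted root
        current = issuer
    else:
        return "no path to trusted root"
    if leaf.subject in revoked:                               # step 7
        return "revoked"
    return "trusted"                                          # step 8


start = datetime(2025, 1, 1, tzinfo=timezone.utc)
end = datetime(2026, 1, 1, tzinfo=timezone.utc)
root = issue(Cert("Example Root CA", "Example Root CA", "root-key", start, end), "root-key")
inter = issue(Cert("Example Issuing CA", "Example Root CA", "issuing-key", start, end), "root-key")
leaf = issue(Cert("bank.com", "Example Issuing CA", "leaf-key", start, end, ("bank.com",)), "issuing-key")

now = datetime(2025, 6, 1, tzinfo=timezone.utc)
print(validate(leaf, "bank.com", {inter.subject: inter}, {root.subject: root}, set(), now))
```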
PKI Hierarchy Design
Why Hierarchies Exist
A single CA that signs everything is a single point of catastrophic failure. If its key is compromised, every certificate it ever issued is untrusted. Hierarchies solve this:
- Root CA — offline, air-gapped, stored in HSM. Used only to sign Intermediate CAs. If an Intermediate is compromised, revoke it and issue a new one. The Root survives.
- Intermediate CA (Issuing CA) — online, handles daily certificate issuance. Can be revoked without destroying the entire trust hierarchy.
1-Tier (Don’t Do This in Production)
Root/Issuing CA (online) → End Entity Certificates
The CA is both the trust anchor and the daily issuer. If compromised, everything is lost. Acceptable only for development and testing.
2-Tier (Standard for Most Organizations)
Root CA (offline, HSM) → Issuing CA (online) → End Entity Certificates
The Root signs the Issuing CA’s certificate, then goes offline. The Issuing CA handles all certificate requests. If the Issuing CA is compromised: revoke it, power up the Root for a signing ceremony, sign a new Issuing CA, and re-issue affected certificates. The trust anchor (Root) is preserved.
3-Tier (Large Enterprises, Government)
Root CA (offline) → Policy CA → Issuing CA (online) → End Entity Certificates
Adds a Policy CA layer for organizations that need multiple Issuing CAs with different policies (one for TLS, one for code signing, one for device identity). Each Policy CA defines a scope; Issuing CAs under it inherit that scope.
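The reason tiers matter is blast radius: compromising a CA affects everything beneath it and nothing beside it. A minimal sketch, assuming a hypothetical `Ca` tree with made-up CA and certificate names, shows the difference between losing one Issuing CA and losing the Root.

```python
from dataclasses import dataclass, field


@dataclass
class Ca:
    """Hypothetical node in a CA hierarchy."""
    name: str
    online: bool
    children: list["Ca"] = field(default_factory=list)
    issued: list[str] = field(default_factory=list)   # end-entity subjects


def blast_radius(ca: Ca) -> list[str]:
    """End-entity certificates that lose trust if `ca` is compromised."""
    affected = list(ca.issued)
    for child in ca.children:
        affected.extend(blast_radius(child))
    return affected


# A 2-tier hierarchy with two Issuing CAs scoped by purpose.
tls_ca = Ca("TLS Issuing CA", online=True, issued=["api.internal", "web.internal"])
code_ca = Ca("Code Signing Issuing CA", online=True, issued=["build-pipeline"])
root = Ca("Root CA", online=False, children=[tls_ca, code_ca])

print(blast_radius(tls_ca))   # only the TLS certificates
print(blast_radius(root))     # everything in the hierarchy
```

Keeping the Root offline is what makes the second case unlikely; splitting Issuing CAs by purpose is what keeps the first case small.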
Enterprise PKI: Where It Gets Real
Microsoft AD CS (The Most Common Enterprise PKI)
Most enterprises run Microsoft Active Directory Certificate Services. A typical deployment:
Offline Root CA (standalone, Windows Server, powered off)
↓ signs (during annual ceremony)
Online Issuing CA (enterprise, AD-integrated, auto-enrollment)
↓ issues
Certificates for: users, computers, servers, Wi-Fi, VPN, code signing
Auto-enrollment is the killer feature: domain-joined machines automatically request and receive certificates based on Group Policy templates. No manual CSR submission. No human involvement.
Common problems:
- Root CA server hasn’t been powered on in 2 years (CRL expired, chain breaks)
- Issuing CA certificate approaching expiry (nobody tracks it)
- Certificate templates with overly permissive settings (any user can request any cert)
- No monitoring on certificate issuance (rogue certificates go undetected)
Cloud-Native PKI
For organizations without on-premises infrastructure:
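- AWS Private CA: managed CA service, HSM-backed, API-driven. List pricing has been about $400/month per general-purpose CA plus per-certificate fees starting at $0.75; confirm current rates before budgeting.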
- Google CAS — Certificate Authority Service, integrates with GKE and Workload Identity.
- HashiCorp Vault PKI — self-hosted, issues short-lived certificates via API. Popular for service mesh and microservices.
- cert-manager — Kubernetes-native, automates certificate lifecycle for cluster workloads.
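For rough budgeting of the AWS Private CA option, the arithmetic can be sketched as below. It uses the flat figures quoted in that bullet as defaults and ignores volume tiers, other CA modes, and any price changes, so treat the result as an estimate rather than a quote. The function name is an assumption.

```python
def private_ca_monthly_cost(
    ca_count: int,
    certs_issued: int,
    ca_fee: float = 400.00,       # per CA per month, figure from the text
    per_cert_fee: float = 0.75,   # per certificate, figure from the text
) -> float:
    """Flat-rate monthly estimate: CA fees plus per-certificate fees."""
    return ca_count * ca_fee + certs_issued * per_cert_fee


# One CA issuing 200 certificates in a month.
print(private_ca_monthly_cost(1, 200))
# Two CAs (for example prod and staging) issuing 1,000 certificates.
print(private_ca_monthly_cost(2, 1000))
```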
Hybrid Approach (Most Practical)
On-premises Root CA (offline, HSM, key ceremony)
↓
Cloud Issuing CA (AWS Private CA or Vault PKI)
↓
Certificates for cloud workloads, containers, APIs
Root stays under your physical control. Issuing CA runs in the cloud for availability and automation. Best of both worlds.
PKI Failures: What Goes Wrong
1. The Forgotten Root CA
The Root CA was set up 5 years ago on a Windows Server in a closet. Nobody remembers the admin password. The CRL it publishes expired 2 years ago. Every certificate in the hierarchy is technically unverifiable because the CRL endpoint returns a stale file. Nobody notices until a Java application (which actually checks CRLs) starts rejecting certificates.
Prevention: Monitor Root CA CRL expiry. Schedule annual ceremonies to publish fresh CRLs.
2. Certificate Template Misconfiguration
An AD CS template allows any authenticated user to request a certificate with the “Client Authentication” EKU and any Subject Alternative Name. An attacker (or curious intern) requests a certificate with SAN: administrator@domain.com and uses it to authenticate as the domain admin.
Prevention: Restrict templates. Don’t let requesters supply their own Subject or SAN by default; enable “Supply in the request” only on templates with tightly scoped enrollment permissions, ideally with CA manager approval.
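An audit for this pattern can be sketched as a simple predicate over template settings. The `Template` fields below are hypothetical booleans and sets chosen for readability; they do not mirror the actual AD CS attribute names, and the group and EKU lists are illustrative rather than exhaustive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    """Hypothetical, simplified view of a certificate template."""
    name: str
    ekus: frozenset[str]
    enrollee_supplies_subject: bool
    manager_approval: bool
    enroll_principals: frozenset[str]


# Illustrative lists, not exhaustive.
BROAD_GROUPS = frozenset({"Authenticated Users", "Domain Users", "Everyone"})
AUTH_EKUS = frozenset({"Client Authentication", "Smart Card Logon", "Any Purpose"})


def is_impersonation_risk(t: Template) -> bool:
    """Flag templates where a broad group can request an auth cert for any identity."""
    return (
        t.enrollee_supplies_subject
        and not t.manager_approval
        and bool(t.ekus & AUTH_EKUS)
        and bool(t.enroll_principals & BROAD_GROUPS)
    )


risky = Template(
    name="UserAuth-Legacy",
    ekus=frozenset({"Client Authentication"}),
    enrollee_supplies_subject=True,
    manager_approval=False,
    enroll_principals=frozenset({"Authenticated Users"}),
)
print(is_impersonation_risk(risky))
```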
3. No Certificate Inventory
The organization has 3,000 certificates across 500 servers, 50 load balancers, and 12 cloud accounts. Nobody has a complete list. Certificates expire without warning. New certificates are issued without tracking. When an auditor asks “how many certificates do you have?”, the answer is “we don’t know.”
Prevention: Continuous certificate discovery. Scan all endpoints. Query all cloud APIs. Monitor CT logs. Build and maintain a single inventory.
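The "single inventory" part mostly comes down to deduplicating what different discovery sources report. A minimal sketch, assuming each source yields `(fingerprint, subject, not_after, location)` tuples (a made-up shape, as are the sample hosts and accounts):

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class InventoryEntry:
    fingerprint: str
    subject: str
    not_after: date
    locations: set[str] = field(default_factory=set)


def merge_sources(*sources):
    """Combine discovery results into one inventory keyed by fingerprint."""
    inventory: dict[str, InventoryEntry] = {}
    for source in sources:
        for fingerprint, subject, not_after, location in source:
            entry = inventory.setdefault(
                fingerprint, InventoryEntry(fingerprint, subject, not_after)
            )
            # The same certificate is often deployed in several places.
            entry.locations.add(location)
    return inventory


# Hypothetical results from a network scan and a cloud API query.
network_scan = [("ab12", "api.example.internal", date(2025, 9, 1), "10.0.0.5:443")]
cloud_api = [
    ("ab12", "api.example.internal", date(2025, 9, 1), "cloud:acct-1/lb-7"),
    ("cd34", "vpn.example.internal", date(2025, 7, 15), "cloud:acct-2/lb-1"),
]

inventory = merge_sources(network_scan, cloud_api)
for entry in inventory.values():
    print(entry.subject, entry.not_after, sorted(entry.locations))
```

Keying on the fingerprint rather than the subject matters because several distinct certificates can share one subject during a renewal overlap.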
4. Chain Breaks After CA Migration
The organization migrates from an old CA to a new one. New certificates are issued by the new CA. But some clients still have only the old CA in their trust store. The new certificates fail validation on those clients. Or worse: the old CA’s certificate expires, and old certificates (still in use on legacy systems) become unverifiable.
Prevention: Plan CA migrations with overlap periods. Distribute new CA certificates to all trust stores before issuing end-entity certificates from the new CA.
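The "distribute before you issue" rule can be turned into a pre-cutover gate. The sketch below assumes you can export, per client, the set of CA fingerprints it trusts; the function names, client names, and fingerprints are all hypothetical.

```python
def clients_missing_ca(trust_stores: dict[str, set[str]], new_ca_fingerprint: str) -> list[str]:
    """Clients whose trust store does not yet contain the new CA."""
    return sorted(
        client
        for client, trusted in trust_stores.items()
        if new_ca_fingerprint not in trusted
    )


def safe_to_cut_over(trust_stores: dict[str, set[str]], new_ca_fingerprint: str) -> bool:
    """True only when every known client already trusts the new CA."""
    return not clients_missing_ca(trust_stores, new_ca_fingerprint)


# Hypothetical export of trusted CA fingerprints per client.
trust_stores = {
    "laptop-fleet": {"old-ca-fp", "new-ca-fp"},
    "legacy-java-app": {"old-ca-fp"},
    "k8s-nodes": {"old-ca-fp", "new-ca-fp"},
}

print(clients_missing_ca(trust_stores, "new-ca-fp"))
print(safe_to_cut_over(trust_stores, "new-ca-fp"))
```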
PKI Maturity Model
| Level | Description | Characteristics |
|---|---|---|
| 1 | Ad-hoc | Certificates managed manually. No inventory. Outages from expired certs. |
| 2 | Tracked | Spreadsheet or basic monitoring. Know what exists. Still manual renewal. |
| 3 | Automated | ACME/cert-manager for most certificates. Automated renewal. Monitoring with alerts. |
| 4 | Governed | Policy-driven issuance. Certificate templates enforced. Audit trail. Ownership mapped. |
| 5 | Optimized | Short-lived certificates. Zero-touch lifecycle. Crypto-agility. PQC-ready. |
Most organizations are at Level 1-2. The goal is Level 4 minimum for production environments.
Getting Started: Practical Next Steps
If you have no PKI today:
- Use Let’s Encrypt (ACME) for public-facing TLS certificates
- Use cert-manager in Kubernetes for automated lifecycle
- For internal mTLS: start with Vault PKI or Smallstep — they’re simpler than AD CS for greenfield deployments
If you have PKI but it’s a mess:
- Run a certificate discovery scan across all infrastructure
- Identify certificates with no owner and no renewal automation
- Prioritize: fix the ones expiring soonest, automate the ones renewed most frequently
- Document your hierarchy (even if it’s just a diagram on a wiki page)
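Once a discovery scan has produced records, the triage in the list above is a sort and a filter. A minimal sketch, with a hypothetical `CertRecord` shape and an assumed 30-day urgency horizon:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CertRecord:
    subject: str
    not_after: date
    owner: Optional[str]   # None when nobody claims the certificate
    automated: bool        # True when renewal is already automated


def triage(records, today: date, horizon_days: int = 30):
    """Return (expiring soonest first, unowned and unautomated)."""
    urgent = sorted(
        (r for r in records if (r.not_after - today).days <= horizon_days),
        key=lambda r: r.not_after,
    )
    orphaned = [r for r in records if r.owner is None and not r.automated]
    return urgent, orphaned


# Made-up records for illustration.
records = [
    CertRecord("api.example.internal", date(2025, 6, 10), "platform-team", False),
    CertRecord("old-vpn.example.internal", date(2025, 12, 1), None, False),
    CertRecord("mesh.example.internal", date(2025, 6, 5), None, True),
]

urgent, orphaned = triage(records, today=date(2025, 6, 1))
print([r.subject for r in urgent])
print([r.subject for r in orphaned])
```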
If you have mature PKI and want to level up:
- Implement crypto-agility (ability to swap algorithms without re-architecting)
- Build a CBOM (Cryptographic Bill of Materials) — inventory every algorithm and key
- Start testing post-quantum algorithms in non-production
- Reduce certificate lifetimes toward the 47-day maximum that the CA/Browser Forum has scheduled for public TLS certificates (phased in through 2029)
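Shorter lifetimes are mainly an automation problem, which a little arithmetic makes visible. The sketch below assumes renewal happens once two thirds of the lifetime has elapsed; that fraction is a common convention chosen here for illustration, not a requirement.

```python
import math


def renewals_per_year(lifetime_days: int, renew_at_fraction: float = 2 / 3) -> int:
    """Renewals needed per certificate per year, rounded up."""
    interval_days = lifetime_days * renew_at_fraction
    return math.ceil(365 / interval_days)


for lifetime in (398, 90, 47):
    print(f"{lifetime}-day certificates: {renewals_per_year(lifetime)} renewals/year")
```

Multiply the 47-day figure by the size of your inventory and manual renewal stops being an option.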
FAQ
Q: Do I need PKI if I just use Let’s Encrypt? A: Let’s Encrypt IS PKI — it’s a public CA that issues certificates. You’re using PKI, you’re just not running your own CA. You need your own private CA when you need certificates for internal services, mTLS, device authentication, or anything that shouldn’t be publicly trusted.
Q: How many CAs do I need? A: Minimum for production: 1 offline Root CA + 1 online Issuing CA (2-tier). Add more Issuing CAs for: different environments (prod/staging), different purposes (TLS vs code signing), or different regions.
Q: Is PKI expensive? A: Public certificates (Let’s Encrypt): free. Private CA (Vault, EJBCA): free software, cost is operational. Cloud CA (AWS Private CA): $400/month + per-cert fees. On-premises with HSMs: $50K-$200K+ upfront. The cost scales with security requirements and compliance needs.
Q: What’s the difference between PKI and CLM? A: PKI is the infrastructure (CAs, hierarchies, trust models). CLM (Certificate Lifecycle Management) is the operational practice of managing certificates issued by that infrastructure (discovery, monitoring, renewal, deployment). You need both.