Your IAM team manages 5,000 human identities with military precision: SSO, MFA, access reviews, automated offboarding, behavioral analytics. Meanwhile, 200,000 machine identities — TLS certificates, SSH keys, API tokens, service accounts, workload credentials — are scattered across your infrastructure with no inventory, no ownership, no rotation, and no offboarding process.
This isn’t a niche problem. Machine identities are the credentials that authenticate your servers, encrypt your data, sign your code, and connect your services. When they’re compromised, the blast radius is often larger than a compromised human identity — because machines have broader access, operate 24/7, and don’t trigger behavioral anomaly alerts.
The Scale of the Problem
A typical mid-size enterprise (2,000 employees):
| Identity Type | Estimated Count | Managed? |
|---|---|---|
| Human identities (AD/SSO) | 2,500 | ✅ Yes (IAM team) |
| TLS certificates | 3,000-8,000 | ⚠️ Partially |
| SSH keys (authorized_keys entries) | 15,000-30,000 | ❌ Rarely |
| API keys and tokens | 10,000-50,000 | ❌ Almost never |
| Service account credentials | 5,000-15,000 | ⚠️ Partially |
| Container/workload identities | 50,000-200,000 | ⚠️ If using service mesh |
| Total machine identities | ~100,000-300,000 | ~10-20% managed |
The ratio: roughly 40-120 machine identities per human identity (100,000-300,000 machine identities against 2,500 human ones). And the machine identities have less governance, less monitoring, and less lifecycle management than the human ones.
Why Machine Identities Are Different
Human identity management is mature because it maps to organizational processes:
| Event | Human Identity | Machine Identity |
|---|---|---|
| Creation | HR hires → IT provisions | Developer deploys → credential created (no ticket, no approval) |
| Access review | Quarterly review (SOX, SOC 2) | Never reviewed (or reviewed without understanding) |
| Role change | Manager approves → access updated | Service changes → old credentials persist alongside new |
| Termination | HR terminates → access revoked same day | Service decommissioned → credentials forgotten |
| Compromise | Lock account, force password reset | ??? (often: nobody knows the credential exists) |
The fundamental gap: human identities have organizational lifecycle events. Machine identities don’t. Nobody sends an “offboarding ticket” when a microservice is decommissioned. Nobody does an “access review” for API tokens.
The Risks: What Happens When Machine Identities Are Unmanaged
Risk 1: Expired Certificates Cause Outages
The #1 operational risk. Certificates expire on a fixed date. If nobody is tracking them, services go down without warning. Average cost: $100K-$500K per incident (revenue loss + emergency response + customer impact).
Real examples: Microsoft Teams (2020), Spotify (2020), and the 2021 expiry of the DST Root CA X3 root that Let’s Encrypt certificate chains relied on, which affected millions of older devices.
Risk 2: Stolen Credentials Enable Breaches
SSH keys, API tokens, and service account credentials that are never rotated become permanent attack vectors. An attacker who obtains a 3-year-old SSH key gets the same access the key granted on the day it was created.
Real examples: Uber breach (2022), where the attacker found hardcoded privileged credentials in a script after gaining initial access. Codecov (2021), where a tampered upload script exfiltrated credentials from CI/CD environment variables.
Risk 3: Orphaned Credentials Create Backdoors
When engineers leave or services are decommissioned, their machine credentials persist. These orphaned credentials are:
- Not monitored (nobody knows they exist)
- Not rotated (no owner to perform rotation)
- Still active (granting the same access as when created)
- Perfect for attackers (low-profile, no behavioral baseline to trigger alerts)
Risk 4: Compliance Failures
Auditors increasingly ask about machine identity governance:
- “Show me your certificate inventory” (SOC 2, ISO 27001)
- “How do you manage SSH keys?” (CIS Benchmarks, NIST 800-53)
- “What’s your key rotation schedule?” (PCI DSS Requirements 3.5-3.6 in v3.2.1; 3.6-3.7 in v4.0)
- “How do you handle credential offboarding?” (SOX, SOC 2)
If you can’t answer these questions with evidence, it’s a finding.
Building a Machine Identity Management Program
Phase 1: Visibility (Months 1-3)
You can’t manage what you can’t see. Build a complete inventory:
TLS Certificates:
- Network scanning (all TLS ports across all IP ranges)
- Cloud API queries (AWS ACM, Azure Key Vault, GCP Certificate Manager)
- Kubernetes Secret enumeration (all clusters, all namespaces)
- Certificate Transparency log monitoring (detect certificates issued for your domains)
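As a starting point for the network-scanning item, the sketch below reads a certificate's expiry date from a live endpoint and converts it to days remaining. It is a minimal illustration, not a full scanner: the function names are hypothetical, it checks one host at a time, and it uses Python's default certificate verification, so an already expired or untrusted certificate raises an error instead of returning a date.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Whole days between `now` and a certificate's notAfter timestamp.

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    for example "Jun  1 12:00:00 2030 GMT".
    """
    expiry_seconds = ssl.cert_time_to_seconds(not_after)
    expiry = datetime.fromtimestamp(expiry_seconds, tz=timezone.utc)
    return (expiry - now).days


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect to host:port and return the leaf certificate's notAfter field.

    Default verification is on, so expired or untrusted certificates raise
    ssl.SSLCertVerificationError; a real scanner would log that as a finding.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


if __name__ == "__main__":
    # "example.com" is a placeholder; substitute hosts from your own ranges.
    remaining = days_until_expiry(
        fetch_not_after("example.com"), datetime.now(timezone.utc)
    )
    print(f"example.com: {remaining} days until expiry")
```

Looping this over every address and TLS port in your ranges gives a rough first inventory; cloud API queries and Kubernetes Secret enumeration cover certificates that are not reachable over the network.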
SSH Keys:
- Scan all servers for authorized_keys files
- Inventory CI/CD SSH secrets
- Check configuration management for deployed keys
- Identify keys with no identifiable owner (no comment field, unknown fingerprint)
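The last item, finding keys with no identifiable owner, can be approximated by parsing each authorized_keys file and flagging entries without a comment. The sketch below is a simplification under stated assumptions: it splits lines on whitespace (so quoted options that contain spaces are not handled), it recognizes only a small set of common key types, and the class and function names are hypothetical.

```python
import base64
import hashlib
from dataclasses import dataclass
from typing import List, Optional

# Common key types only; security-key and certificate types are omitted.
KEY_TYPES = {
    "ssh-ed25519", "ssh-rsa", "ssh-dss",
    "ecdsa-sha2-nistp256", "ecdsa-sha2-nistp384", "ecdsa-sha2-nistp521",
}


@dataclass
class AuthorizedKey:
    key_type: str
    fingerprint: str  # OpenSSH-style "SHA256:..." fingerprint of the key blob
    comment: str      # empty string when the entry carries no comment


def parse_authorized_keys_line(line: str) -> Optional[AuthorizedKey]:
    """Parse one authorized_keys line; return None for blanks and comments."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    tokens = line.split()
    # Options may precede the key type, so search for the first known type.
    for i, token in enumerate(tokens[:-1]):
        if token in KEY_TYPES:
            try:
                blob = base64.b64decode(tokens[i + 1], validate=True)
            except ValueError:
                return None  # malformed base64, not a usable key
            digest = hashlib.sha256(blob).digest()
            fingerprint = "SHA256:" + base64.b64encode(digest).decode().rstrip("=")
            return AuthorizedKey(token, fingerprint, " ".join(tokens[i + 2:]))
    return None


def unowned_keys(text: str) -> List[AuthorizedKey]:
    """Return entries with no comment field, i.e. no hint about the owner."""
    parsed = (parse_authorized_keys_line(line) for line in text.splitlines())
    return [key for key in parsed if key is not None and not key.comment]
```

The fingerprint lets you match the same key across many servers, which is often the fastest way to trace an uncommented key back to a person or service.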
API Tokens and Service Accounts:
- Query cloud IAM (AWS IAM users/roles, GCP service accounts, Azure service principals)
- Inventory secrets managers (Vault, AWS Secrets Manager)
- Scan CI/CD platforms for stored secrets
- Check application configurations for embedded credentials
Output: a single inventory that records, for each credential, its type, owner, purpose, creation date, last used date, expiry (if any), and management status (automated/manual/unmanaged).
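One possible shape for an inventory record is sketched below. The field names mirror the attributes listed above; the class names, the string values for credential types, and the choice to represent a missing owner as `None` are illustrative assumptions, not a required schema.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class ManagementStatus(Enum):
    AUTOMATED = "automated"
    MANUAL = "manual"
    UNMANAGED = "unmanaged"


@dataclass(frozen=True)
class MachineIdentity:
    credential_type: str        # e.g. "tls_certificate", "ssh_key", "api_token"
    owner: Optional[str]        # None marks a credential with no known owner
    purpose: str
    created: date
    last_used: Optional[date]   # None if no usage has been observed
    expires: Optional[date]     # None for credentials that never expire
    status: ManagementStatus

    @property
    def is_orphaned(self) -> bool:
        """True when nobody is recorded as responsible for the credential."""
        return self.owner is None


# Example record with made-up values.
record = MachineIdentity(
    credential_type="api_token",
    owner=None,
    purpose="legacy reporting integration",
    created=date(2021, 3, 1),
    last_used=None,
    expires=None,
    status=ManagementStatus.UNMANAGED,
)
```

Keeping `owner`, `last_used`, and `expires` optional makes the gaps explicit, which is what the ownership phase needs to work from.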
Phase 2: Ownership (Months 3-6)
Every machine identity needs an owner — someone responsible for its lifecycle:
- TLS certificates: Owned by the team that operates the service
- SSH keys: Owned by the individual (personal keys) or team (service account keys)
- API tokens: Owned by the team that created the integration
- Service accounts: Owned by the team that operates the workload
For orphaned credentials (no identifiable owner): assign to the infrastructure/security team for investigation and potential decommissioning.
Phase 3: Policy (Months 6-9)
Define and enforce standards:
Machine Identity Policy:
├── TLS Certificates
│ ├── Maximum validity: 90 days (public), 1 year (internal)
│ ├── Minimum key size: ECDSA P-256 or RSA 2048
│ ├── Approved CAs: Let's Encrypt (public), Vault PKI (internal)
│ ├── Renewal: Automated (ACME/cert-manager) — no manual renewal
│ └── Monitoring: Alert at 30, 14, 7 days before expiry
├── SSH Keys
│ ├── Algorithm: Ed25519 only
│ ├── Maximum age: 12 months
│ ├── Passphrase: Required for interactive keys
│ ├── Shared keys: Prohibited
│ └── Target: Migrate to SSH certificates within 18 months
├── API Tokens
│ ├── Maximum lifetime: 90 days (rotate quarterly)
│ ├── Storage: Secrets manager only (never in code/config)
│ ├── Scope: Minimum necessary permissions
│ └── Audit: Log all usage, alert on anomalies
└── Service Accounts
  ├── Naming convention: svc-{team}-{purpose}
  ├── Permissions: Least privilege, reviewed quarterly
  ├── Credentials: Short-lived where possible (workload identity)
  └── Offboarding: Disable when associated service is decommissioned
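A policy like this is easier to enforce when the limits are expressed as data and checked automatically. The sketch below encodes a few of the limits from the policy above (TLS validity, SSH algorithm and age, API token lifetime) and returns a list of violations. The constant and function names are hypothetical, and "12 months" is approximated as 365 days.

```python
from datetime import date
from typing import List

# Limits taken from the policy above; names are illustrative.
MAX_TLS_VALIDITY_DAYS = {"public": 90, "internal": 365}
ALLOWED_SSH_ALGORITHMS = {"ssh-ed25519"}
MAX_SSH_KEY_AGE_DAYS = 365        # "12 months", approximated
MAX_API_TOKEN_LIFETIME_DAYS = 90


def check_tls(scope: str, not_before: date, not_after: date) -> List[str]:
    """Check a certificate's validity period; scope is "public" or "internal"."""
    violations = []
    validity = (not_after - not_before).days
    limit = MAX_TLS_VALIDITY_DAYS[scope]
    if validity > limit:
        violations.append(f"validity {validity}d exceeds {limit}d for {scope} certificates")
    return violations


def check_ssh_key(algorithm: str, created: date, today: date) -> List[str]:
    """Check an SSH key's algorithm and age."""
    violations = []
    if algorithm not in ALLOWED_SSH_ALGORITHMS:
        violations.append(f"algorithm {algorithm} is not allowed")
    age = (today - created).days
    if age > MAX_SSH_KEY_AGE_DAYS:
        violations.append(f"key age {age}d exceeds {MAX_SSH_KEY_AGE_DAYS}d")
    return violations


def check_api_token(created: date, today: date) -> List[str]:
    """Check whether an API token is overdue for rotation."""
    age = (today - created).days
    if age > MAX_API_TOKEN_LIFETIME_DAYS:
        return [f"token age {age}d exceeds {MAX_API_TOKEN_LIFETIME_DAYS}d"]
    return []
```

Running checks like these against the Phase 1 inventory turns the policy into a report of concrete violations that owners can act on.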
Phase 4: Automation (Months 9-12)
Automate lifecycle operations:
- Certificate renewal: ACME, cert-manager, CLM platform
- SSH key rotation: Ansible’s authorized_key module with the exclusive option (which removes any keys not listed), or SSH certificates
- API token rotation: Secrets manager with TTL-based expiry
- Service account credential rotation: Cloud-native workload identity (no static credentials)
- Offboarding: Tie credential decommissioning to service lifecycle (delete service → revoke credentials)
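The offboarding item is the one most often left manual, so here is a minimal sketch of the idea: record which credentials belong to which service, and revoke them all when the service is decommissioned. Everything here is hypothetical: the `CredentialRegistry` class, the credential ID format, and the `revoke` callback, which in practice would call your CA, secrets manager, or cloud IAM API.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set


class CredentialRegistry:
    """Tracks which credentials belong to which service (in memory only)."""

    def __init__(self, revoke: Callable[[str], None]) -> None:
        self._by_service: Dict[str, Set[str]] = defaultdict(set)
        self._revoke = revoke  # callback that performs the actual revocation

    def register(self, service: str, credential_id: str) -> None:
        """Record a credential at creation time, tied to its service."""
        self._by_service[service].add(credential_id)

    def decommission(self, service: str) -> List[str]:
        """Revoke every credential registered to `service`; return their ids."""
        credential_ids = sorted(self._by_service.pop(service, set()))
        for credential_id in credential_ids:
            self._revoke(credential_id)
        return credential_ids


# Usage with a stand-in revoke function that only records what it was given.
revoked: List[str] = []
registry = CredentialRegistry(revoke=revoked.append)
registry.register("billing", "tls:billing.internal")
registry.register("billing", "token:billing-payments")
registry.register("search", "ssh:search-deploy")
registry.decommission("billing")
```

The design point is that registration happens when the credential is created, so decommissioning never depends on someone remembering what a service used.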
Phase 5: Monitoring and Governance (Ongoing)
- Expiry monitoring: Alert before any credential expires
- Usage monitoring: Detect unused credentials (candidates for removal)
- Anomaly detection: Alert on credentials used from unexpected locations/times
- Compliance reporting: Generate evidence for auditors (inventory, rotation history, access reviews)
- Quarterly reviews: Review all machine identities, confirm ownership, validate necessity
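The first two monitoring items reduce to simple date comparisons once the inventory exists. The sketch below uses the 30/14/7-day alert thresholds from the Phase 3 TLS policy; the 90-day cutoff for "unused" is an assumption chosen for illustration, as are the function names.

```python
from datetime import date
from typing import Optional

ALERT_THRESHOLDS_DAYS = (30, 14, 7)  # from the Phase 3 TLS policy
UNUSED_AFTER_DAYS = 90               # assumption: not specified in the text


def expiry_alert(expires: date, today: date) -> Optional[str]:
    """Return the tightest alert level that applies, or None if none does."""
    remaining = (expires - today).days
    if remaining < 0:
        return "expired"
    for threshold in sorted(ALERT_THRESHOLDS_DAYS):
        if remaining <= threshold:
            return f"expires within {threshold} days"
    return None


def is_unused(last_used: Optional[date], today: date) -> bool:
    """Flag credentials never seen in use, or idle past the cutoff."""
    if last_used is None:
        return True
    return (today - last_used).days > UNUSED_AFTER_DAYS
```

Credentials flagged by `is_unused` are candidates for review, not automatic deletion: some legitimate credentials, such as disaster-recovery keys, are rarely exercised.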
The Organizational Question
The biggest barrier to machine identity management isn’t technical — it’s organizational. Who owns this program?
| Option | Pros | Cons |
|---|---|---|
| Security team | Understands risk, drives compliance | May lack operational context |
| Infrastructure/Platform team | Operates the systems, understands dependencies | May deprioritize vs feature work |
| IAM team (extended scope) | Already manages human identities, has governance processes | May lack technical depth for certificates/keys |
| Dedicated Machine Identity team | Focused, accountable | Requires headcount investment |
Recommendation: Extend the IAM team’s scope to include machine identities. They already have the governance muscle (policies, reviews, audits). They need technical support from infrastructure/security for implementation. The worst option: nobody owns it (current state at most organizations).
FAQ
Q: Where do I start if I have zero visibility today?
A: Start with TLS certificates: they’re the most visible (network-scannable) and have the most immediate risk (expiry = outage). Run a network scan across all your IP ranges on port 443. That single action will reveal certificates you didn’t know existed.
Q: How do I justify the investment to leadership?
A: Calculate the cost of your last certificate outage (or estimate one). Multiply by the probability of recurrence (67% of organizations have one per year). Compare to the cost of a CLM platform + operational time. The ROI is usually obvious.
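The estimate described in this answer can be written as a short calculation. The sketch below is illustrative only: the function names are hypothetical, the $300K incident cost is simply a value inside the $100K-$500K range quoted earlier, the $120K program cost is a made-up figure, and the model assumes at most one outage per year.

```python
def expected_annual_outage_cost(cost_per_incident: float, annual_probability: float) -> float:
    """Expected yearly loss: incident cost weighted by its yearly probability."""
    if not 0.0 <= annual_probability <= 1.0:
        raise ValueError("annual_probability must be between 0 and 1")
    return cost_per_incident * annual_probability


def net_annual_benefit(
    cost_per_incident: float,
    annual_probability: float,
    annual_program_cost: float,
) -> float:
    """Expected loss avoided minus what the program costs per year.

    Assumes the program prevents the outage entirely, which is optimistic.
    """
    return expected_annual_outage_cost(cost_per_incident, annual_probability) - annual_program_cost


# Illustrative inputs: $300K incident, 67% yearly probability,
# and a hypothetical $120K yearly cost for platform plus staff time.
benefit = net_annual_benefit(300_000, 0.67, 120_000)
print(f"Net annual benefit: ${benefit:,.0f}")
```

Replace the inputs with your own outage history and quotes; the structure of the argument matters more than these particular numbers.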
Q: Should I buy a platform or build with open-source tools?
A: For TLS certificates: cert-manager (Kubernetes) plus an ACME client (traditional servers) covers most automation needs for free. For SSH keys: Ansible + SSH certificates (Smallstep/Vault) works. For unified visibility across all identity types: you likely need a platform (commercial CLM/machine identity solution). Build for automation, buy for visibility.
Q: How long does it take to get to a mature state?
A: Visibility (inventory): 1-3 months. Basic automation (cert renewal): 3-6 months. Full governance (policy, ownership, reviews): 12-18 months. The key: start with visibility. Everything else builds on knowing what you have.