QCecuring - Enterprise Security Solutions

Service Mesh and mTLS (Istio, Linkerd)

Shivam Sharma

Key Takeaways

  • Service meshes inject sidecar proxies that handle mTLS transparently — applications send plaintext to localhost, the proxy encrypts to the destination
  • Certificates are issued per-pod with short lifetimes (24 hours default in Istio) and rotated automatically — no manual certificate management
  • SPIFFE identities (spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>, for example spiffe://cluster.local/ns/production/sa/frontend) provide workload identity tied to Kubernetes service accounts
  • mTLS in a service mesh provides encryption and authentication, and the mesh’s identity-based policies add authorization, all without any application code changes

A service mesh is an infrastructure layer that manages service-to-service communication in Kubernetes, including automatic mTLS encryption. Sidecar proxies (Envoy in Istio, linkerd2-proxy in Linkerd) are injected alongside the application container in every meshed pod. These proxies intercept the pod’s network traffic, establish mTLS connections with other proxies, and obtain and rotate their certificates from the mesh control plane, all transparently to the application. By default neither mesh relies on traditional revocation lists for workload certificates; short lifetimes limit how long a compromised certificate stays useful. The application sends plaintext HTTP to localhost; the proxy encrypts it with mTLS before it leaves the pod.


Why it matters

  • Zero-code encryption — applications don’t need TLS libraries, certificate loading, or connection management. The sidecar proxy handles everything. Existing applications get mTLS without modification.
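  • Automatic certificate lifecycle: the mesh control plane issues short-lived workload certificates (24 hours by default in both Istio and Linkerd) and rotates them continuously. Workload certificates need no cert-manager, no ACME, and no manual renewal. The mesh’s own root and issuer certificates are longer-lived and still need a rotation plan.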
  • Workload identity — each pod gets a cryptographic identity (SPIFFE ID) derived from its Kubernetes service account. This identity is used for both authentication and authorization policy.
  • Policy enforcement — authorization policies can specify “service A can call service B but not service C” based on cryptographic identity. Network policies alone can’t provide this level of granularity.
  • Observability — because the proxy handles all traffic, the mesh provides metrics (latency, error rates, throughput) and distributed tracing without application instrumentation.
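
To make the workload identity format concrete, here is a minimal Python sketch that parses a SPIFFE ID of the Kubernetes form used above and converts it to the principal string that an Istio AuthorizationPolicy expects. The names `WorkloadIdentity` and `parse_spiffe_id` are hypothetical helpers written for this illustration, not part of any mesh API.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class WorkloadIdentity:
    """Hypothetical container for the three parts of a mesh workload identity."""
    trust_domain: str
    namespace: str
    service_account: str

    def to_principal(self) -> str:
        # Istio AuthorizationPolicy principals omit the "spiffe://" scheme.
        return f"{self.trust_domain}/ns/{self.namespace}/sa/{self.service_account}"


def parse_spiffe_id(spiffe_id: str) -> WorkloadIdentity:
    """Parse spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>."""
    parts = urlsplit(spiffe_id)
    if parts.scheme != "spiffe" or not parts.netloc:
        raise ValueError(f"not a SPIFFE ID: {spiffe_id!r}")
    segments = parts.path.strip("/").split("/")
    # Expect exactly: ns, <namespace>, sa, <service-account>, none of them empty.
    if (
        len(segments) != 4
        or segments[0] != "ns"
        or segments[2] != "sa"
        or not all(segments)
    ):
        raise ValueError(f"unexpected path layout: {parts.path!r}")
    return WorkloadIdentity(parts.netloc, segments[1], segments[3])


identity = parse_spiffe_id("spiffe://cluster.local/ns/production/sa/frontend")
print(identity.to_principal())  # cluster.local/ns/production/sa/frontend
```

The trust domain `cluster.local` is Istio's default; a mesh configured with a different trust domain would produce a different prefix.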

How it works

  1. Sidecar injection — when a pod is created, the mesh injects a sidecar proxy container (via mutating webhook). The proxy intercepts all inbound and outbound traffic.
  2. Identity bootstrapping — the proxy authenticates to the mesh control plane using its Kubernetes service account token (projected volume).
  3. Certificate issuance: the control plane (Istiod in Istio, the identity controller in Linkerd) issues a short-lived X.509 certificate that carries the workload’s identity in the subject alternative name (SAN). In Istio this is the SPIFFE ID, encoded as a URI SAN.
  4. mTLS connection: when pod A calls pod B, proxy A initiates mTLS to proxy B. Both present their certificates, and each verifies that the other’s certificate chains to the mesh CA.
  5. Traffic flows encrypted — application traffic is encrypted between proxies. The application itself sends/receives plaintext on localhost.
  6. Certificate rotation — before the certificate expires, the proxy requests a new one from the control plane. No disruption to active connections.
  7. Authorization check — the receiving proxy checks authorization policy: is the caller’s SPIFFE identity allowed to access this endpoint?
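
Step 6 can be sketched as a small calculation: given a certificate's validity window, when should the proxy ask for a replacement? The Python below assumes a renewal threshold expressed as a fraction of the lifetime left in reserve. The function name `renewal_time` and the default `grace_ratio` of 0.5 are illustrative assumptions; each mesh picks its own threshold.

```python
from datetime import datetime, timedelta, timezone


def renewal_time(not_before: datetime, not_after: datetime, grace_ratio: float = 0.5) -> datetime:
    """Return the moment at which a replacement certificate should be requested.

    grace_ratio is the fraction of the total lifetime kept in reserve at the
    end of the validity window (an assumed, illustrative default of 0.5).
    """
    if not 0.0 < grace_ratio < 1.0:
        raise ValueError("grace_ratio must be between 0 and 1")
    lifetime = not_after - not_before
    return not_after - lifetime * grace_ratio


issued = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
expires = issued + timedelta(hours=24)
print(renewal_time(issued, expires).isoformat())  # 2024-01-01T12:00:00+00:00
```

Renewing well before expiry is what leaves room for retries: if the control plane is briefly unavailable at the renewal point, the old certificate is still valid while the proxy tries again.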

In real systems

Istio — enabling strict mTLS:

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: STRICT  # All traffic must be mTLS (no plaintext allowed)
---
# Authorization policy: only frontend can call backend
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: backend-access
  namespace: production
spec:
  selector:
    matchLabels:
      app: backend
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/production/sa/frontend"]
    to:
    - operation:
        methods: ["GET", "POST"]
        paths: ["/api/*"]
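
To show how a rule like the one in the AuthorizationPolicy above is evaluated, here is a simplified Python model of an ALLOW policy: a request passes if at least one rule matches its source principal, method, and path. `Rule`, `_matches`, and `is_allowed` are hypothetical names for this sketch, which covers only exact, prefix (`/api/*`), and suffix (`*/info`) matching and is not Istio's implementation.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """One simplified ALLOW rule: who may call, with which methods and paths."""
    principals: list[str]
    methods: list[str]
    paths: list[str]


def _matches(pattern: str, value: str) -> bool:
    # "*" alone matches any non-empty value; a trailing "*" is a prefix match;
    # a leading "*" is a suffix match; anything else must match exactly.
    if pattern == "*":
        return value != ""
    if pattern.endswith("*"):
        return value.startswith(pattern[:-1])
    if pattern.startswith("*"):
        return value.endswith(pattern[1:])
    return pattern == value


def is_allowed(rules: list[Rule], principal: str, method: str, path: str) -> bool:
    """Allow the request if any rule matches principal, method, and path."""
    return any(
        any(_matches(p, principal) for p in rule.principals)
        and method in rule.methods
        and any(_matches(p, path) for p in rule.paths)
        for rule in rules
    )


backend_rules = [
    Rule(
        principals=["cluster.local/ns/production/sa/frontend"],
        methods=["GET", "POST"],
        paths=["/api/*"],
    )
]
frontend = "cluster.local/ns/production/sa/frontend"
print(is_allowed(backend_rules, frontend, "GET", "/api/orders"))     # True
print(is_allowed(backend_rules, frontend, "DELETE", "/api/orders"))  # False
print(is_allowed(backend_rules, "cluster.local/ns/production/sa/batch", "GET", "/api/orders"))  # False
```

Because the principal comes from the peer certificate verified during the mTLS handshake, a caller cannot satisfy the rule by setting a header or spoofing an IP address.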

Linkerd — mTLS is on by default:

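# Install Linkerd (mTLS is enabled automatically for meshed pods)
# Linkerd 2.12 and later install the CRDs as a separate first step
linkerd install --crds | kubectl apply -f -
linkerd install | kubectl apply -f -

# Inject the sidecar into every deployment in the namespace
kubectl get deploy -n production -o yaml | linkerd inject - | kubectl apply -f -

# Verify mTLS is active (requires the viz extension: linkerd viz install | kubectl apply -f -)
linkerd viz edges deployment -n production
# SRC          DST          SRC_NS       DST_NS       SECURED
# frontend     backend      production   production   √

# Check certificate details for the pods behind a label selector (app=backend is an example)
linkerd identity -n production -l app=backend
# Shows: the proxy's certificate, including subject, issuer, and validity period (typically 24h)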

Istio certificate configuration:

# Custom CA integration (use external CA instead of Istio's built-in)
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    caCertificates:
    - pem: |
        -----BEGIN CERTIFICATE-----
        <your-root-ca>
        -----END CERTIFICATE-----
  values:
    pilot:
      env:
        EXTERNAL_CA: ISTIOD_RA_KUBERNETES_API

Checking mTLS status:

# Istio: summarize the mesh configuration that applies to a pod
istioctl x describe pod frontend-abc123 -n production
# Shows: matching services, routing rules, and the mTLS/authentication policy in effect

# Check certificate expiry on a specific proxy
istioctl proxy-config secret frontend-abc123 -n production
# Shows: certificate serial number, validity window (NOT BEFORE / NOT AFTER), and whether it is currently valid

Where it breaks

Permissive mode hides unencrypted traffic: Istio’s default PERMISSIVE mode accepts both mTLS and plaintext. It is meant for migration (gradually onboarding services), but teams often leave it in place permanently. Services without sidecars keep communicating in plaintext, and nobody notices because the mesh doesn’t reject them. Move to STRICT mode once migration is complete; until then, permissive mode provides a false sense of security.

Control plane outage blocks new pods — the mesh control plane (Istiod) issues certificates to new pods. If Istiod is down, new pods can’t get certificates and can’t establish mTLS connections. Existing pods continue working (their certificates are still valid), but new deployments, scaling events, and pod restarts fail. Run multiple Istiod replicas and monitor control plane health.

Non-mesh services can’t communicate: after enabling STRICT mTLS, any client without a sidecar (legacy apps, external callers, jobs without injection) can’t reach mesh services, because the receiving proxy rejects plaintext connections. You must either inject sidecars into all communicating services, or create explicit, narrowly scoped exceptions: a workload-level PeerAuthentication that sets PERMISSIVE or DISABLE, optionally for specific ports only via portLevelMtls.
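
The way such exceptions interact with a namespace-wide STRICT policy can be sketched in Python. The model below follows Istio's documented precedence (workload-specific policy first, then namespace-wide, then mesh-wide, with PERMISSIVE as the fallback) but is deliberately simplified: it ignores UNSET inheritance and tie-breaking between multiple matching policies. `PeerAuth` and `effective_mode` are hypothetical names, `istio-system` is assumed to be the root namespace, and the `legacy-metrics` workload is invented for the example.

```python
from dataclasses import dataclass, field
from typing import Optional

ROOT_NAMESPACE = "istio-system"  # assumption: the default Istio root namespace


@dataclass
class PeerAuth:
    """Simplified stand-in for a PeerAuthentication resource."""
    namespace: str
    mode: str                               # "STRICT", "PERMISSIVE", or "DISABLE"
    selector: Optional[dict] = None         # matchLabels; None means no selector
    port_modes: dict = field(default_factory=dict)  # port -> mode override


def effective_mode(policies: list, namespace: str, labels: dict, port: int) -> str:
    """Resolve the mTLS mode a workload port ends up with."""
    workload = namespace_wide = mesh_wide = None
    for p in policies:
        if p.selector is not None:
            # Workload-specific: same namespace and every selector label present.
            if p.namespace == namespace and p.selector.items() <= labels.items():
                workload = p
        elif p.namespace == namespace:
            namespace_wide = p
        elif p.namespace == ROOT_NAMESPACE:
            mesh_wide = p
    if workload is not None:
        # Port-level overrides apply only within a workload-specific policy.
        return workload.port_modes.get(port, workload.mode)
    for p in (namespace_wide, mesh_wide):
        if p is not None:
            return p.mode
    return "PERMISSIVE"  # Istio's behavior when no policy applies


policies = [
    PeerAuth("production", "STRICT"),
    PeerAuth("production", "STRICT", selector={"app": "legacy-metrics"},
             port_modes={9090: "DISABLE"}),
]
print(effective_mode(policies, "production", {"app": "backend"}, 8080))         # STRICT
print(effective_mode(policies, "production", {"app": "legacy-metrics"}, 9090))  # DISABLE
print(effective_mode(policies, "staging", {"app": "backend"}, 8080))            # PERMISSIVE
```

The last line is the case described under permissive mode: a namespace with no policy at all silently accepts plaintext.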


Operational insight

The service mesh’s short-lived certificates (24 hours) fundamentally change the certificate management problem. Instead of managing long-lived certificates that might expire unexpectedly, you have certificates that expire constantly and are renewed constantly. The failure mode shifts from “certificate expired because nobody renewed it” to “certificate renewal failed because the control plane is unhealthy.” This means your monitoring focus should be on control plane health (Istiod availability, certificate issuance latency, SDS errors) rather than individual certificate expiry. If the control plane is healthy, certificates are always fresh. If it’s unhealthy, you have at most 24 hours before existing certificates expire and traffic starts failing, and usually less, because every certificate was issued some time before the outage began and the oldest one expires first.
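
A short Python sketch makes the outage budget concrete. Given each workload's certificate expiry and the time the control plane went down, it reports how long until the first certificate lapses and which workloads have lost a valid certificate by a given moment. The fleet, its expiry times, and the function names are invented for illustration.

```python
from datetime import datetime, timedelta, timezone


def time_to_first_expiry(expiries: dict, outage_start: datetime) -> timedelta:
    """How long after the outage begins until the first certificate expires."""
    return min(expiries.values()) - outage_start


def expired_workloads(expiries: dict, at: datetime) -> list:
    """Names of workloads whose certificate has expired at the given time."""
    return sorted(name for name, not_after in expiries.items() if not_after <= at)


outage_start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
# Hypothetical fleet: remaining lifetime differs because issuance times differ.
fleet = {
    "frontend": outage_start + timedelta(hours=13),
    "backend": outage_start + timedelta(hours=18),
    "worker": outage_start + timedelta(hours=23),
}
print(time_to_first_expiry(fleet, outage_start))                     # 13:00:00
print(expired_workloads(fleet, outage_start + timedelta(hours=20)))  # ['backend', 'frontend']
```

In this example the real budget is 13 hours, not 24, which is why alerting on control plane health needs to fire long before any certificate is close to expiry.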

