Cryptographic hash functions are the silent workhorses of modern security. Every TLS handshake, every Git commit, every blockchain transaction, and every password stored in a database relies on hash functions to maintain integrity and trust. Yet many developers use them without fully understanding their properties, limitations, or when one algorithm should be chosen over another.
This guide breaks down how hash functions work, examines the major algorithms from MD5 through BLAKE3, and provides practical guidance on choosing the right hash function for your use case.
What Is a Cryptographic Hash Function?
A cryptographic hash function is a mathematical algorithm that takes an arbitrary-length input and produces a fixed-length output (called a digest or hash). Unlike encryption, hashing is a one-way operation — you cannot recover the original input from the hash output.
import hashlib
# Same input always produces the same output
message = b"Hello, World!"
digest = hashlib.sha256(message).hexdigest()
print(digest) # dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f
Essential Properties
Every secure cryptographic hash function must satisfy these properties:
| Property | Description | What It Prevents |
|---|---|---|
| Deterministic | Same input always produces same output | Unpredictable behavior |
| Pre-image Resistance | Given hash h, infeasible to find m where H(m) = h | Reversing the hash |
| Second Pre-image Resistance | Given m₁, infeasible to find m₂ ≠ m₁ where H(m₁) = H(m₂) | Forging a different message with same hash |
| Collision Resistance | Infeasible to find any m₁ ≠ m₂ where H(m₁) = H(m₂) | Attacker-chosen message pairs that share a hash (the generic cost is set by the birthday bound) |
| Avalanche Effect | Small input change causes drastic output change | Pattern analysis |
| Fixed Output Length | Output size is constant regardless of input size | Length-based information leakage |
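Two of these properties, determinism and fixed output length, are easy to check directly. The following is a minimal sketch using Python's standard `hashlib`; the sample inputs are arbitrary choices for illustration.

```python
import hashlib

# Inputs of very different lengths (arbitrary illustrative values)
inputs = [b"", b"a", b"Hello, World!", b"x" * 1_000_000]

for data in inputs:
    first = hashlib.sha256(data).hexdigest()
    second = hashlib.sha256(data).hexdigest()
    assert first == second   # deterministic: same input, same digest
    assert len(first) == 64  # fixed length: 256 bits = 64 hex characters
    print(f"{len(data):>9} bytes -> {first[:16]}...")
```

The one-megabyte input and the empty input both yield a 64-character hex digest.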
The Avalanche Effect in Practice
Changing even a single bit of the input should flip, on average, about half of the output bits:
import hashlib
hash1 = hashlib.sha256(b"Hello").hexdigest()
hash2 = hashlib.sha256(b"Hellp").hexdigest() # One character different
print(f"Input 1: Hello -> {hash1}")
print(f"Input 2: Hellp -> {hash2}")
# Completely different outputs despite minimal input change
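To put a number on the avalanche effect, one can count how many output bits differ between two digests. This is a small sketch, not a statistical test; the helper name `bit_difference` is made up for this example, and the input is changed by exactly one bit.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

original = b"Hello"
# Flip the lowest bit of the last byte, so the inputs differ in exactly one bit
flipped = original[:-1] + bytes([original[-1] ^ 0x01])

d1 = hashlib.sha256(original).digest()
d2 = hashlib.sha256(flipped).digest()

changed = bit_difference(d1, d2)
total = len(d1) * 8
print(f"{changed} of {total} output bits changed ({changed / total:.0%})")
```

For a well-behaved 256-bit hash the count should land near 128, although any single pair will deviate somewhat from exactly half.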
MD5: Broken but Still Everywhere
MD5 (Message Digest 5) produces a 128-bit hash and was designed by Ronald Rivest in 1991. It’s fast and widely supported, but cryptographically broken.
Why MD5 Is Insecure
- 2004: Wang et al. demonstrated practical collision attacks
- 2008: Researchers created a rogue CA certificate using MD5 collisions
- 2012: The Flame malware exploited MD5 collisions in Windows Update certificates
# MD5 collision example - two different files with the same MD5 hash
# This is trivially achievable with tools like HashClash
md5sum file1.bin file2.bin
# d131dd02c5e6eec4...  file1.bin  (digest truncated for illustration)
# d131dd02c5e6eec4...  file2.bin  (same hash, different content!)
When MD5 Is Still Acceptable
MD5 remains acceptable only for non-security purposes:
- Checksums for data integrity in trusted environments (detecting accidental corruption)
- Hash table keys and deduplication
- Cache invalidation identifiers
Never use MD5 for: digital signatures, certificate fingerprints, password hashing, or any security-critical application.
SHA-1: Deprecated Since 2017
SHA-1 produces a 160-bit hash and was the standard for over a decade. The SHAttered attack, published in 2017 by researchers at Google and CWI Amsterdam, demonstrated a practical collision, producing two different PDF files with identical SHA-1 hashes at an estimated cost of approximately $110,000 in compute.
# SHA-1 is deprecated - don't use for new applications
openssl dgst -sha1 document.pdf
# Git still uses SHA-1 object IDs by default (SHA-256 repositories are supported but opt-in)
Public CAs stopped issuing SHA-1 certificates, and major browsers stopped trusting them, by 2017. If you encounter SHA-1 in production systems, prioritize migration.
SHA-2 Family: The Current Standard
SHA-2, designed by the NSA and published by NIST in 2001, remains the most widely deployed family of hash functions. It includes several variants:
| Variant | Output Size | Block Size | Rounds | Primary Use Case |
|---|---|---|---|---|
| SHA-224 | 224 bits | 512 bits | 64 | Truncated SHA-256 |
| SHA-256 | 256 bits | 512 bits | 64 | General purpose, TLS, blockchain |
| SHA-384 | 384 bits | 1024 bits | 80 | Truncated SHA-512 |
| SHA-512 | 512 bits | 1024 bits | 80 | High-security applications |
| SHA-512/256 | 256 bits | 1024 bits | 80 | 64-bit optimized alternative to SHA-256 |
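The output and block sizes in this table can be read back from `hashlib`, which exposes them (in bytes) as `digest_size` and `block_size`. The sketch below covers only the four variants that `hashlib` always provides; SHA-512/256 is left out because its availability depends on the underlying OpenSSL build.

```python
import hashlib

# Variants that hashlib guarantees on every platform
variants = ["sha224", "sha256", "sha384", "sha512"]

# Map each name to (output bits, block bits)
sizes = {}
for name in variants:
    h = hashlib.new(name)
    sizes[name] = (h.digest_size * 8, h.block_size * 8)
    print(f"{name}: output={sizes[name][0]} bits, block={sizes[name][1]} bits")
```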
SHA-256 in Practice
SHA-256 is the default choice for most applications:
# File integrity verification
sha256sum important-release.tar.gz
# Certificate fingerprint
openssl x509 -in cert.pem -fingerprint -sha256 -noout
# Git commit hashing (Git 2.29+ supports SHA-256)
git init --object-format=sha256
# HMAC-SHA256 for message authentication
import hashlib
import hmac
key = b"secret-key"
message = b"authenticate this message"
mac = hmac.new(key, message, hashlib.sha256).hexdigest()
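For the file-integrity case, a Python counterpart to `sha256sum` might look like the sketch below. It reads the file in chunks so that large files never need to fit in memory; the function name `sha256_file` and the 1 MiB chunk size are assumptions made for this example.

```python
import hashlib
from pathlib import Path

def sha256_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-256 digest of a file, reading it in fixed-size chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as f:
        # read() returns b"" at end of file, which ends the loop
        while chunk := f.read(chunk_size):
            hasher.update(chunk)
    return hasher.hexdigest()

# Usage (the path is hypothetical):
# print(sha256_file(Path("important-release.tar.gz")))
```

Feeding the data through `update()` piece by piece gives the same digest as hashing the whole file at once.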
SHA-512 for 64-bit Systems
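On 64-bit processors without hardware SHA acceleration, SHA-512 is often faster than SHA-256 because it processes larger blocks using native 64-bit words. On CPUs with dedicated SHA-256 instructions (for example Intel SHA Extensions), SHA-256 is usually the faster of the two, so measure on your target hardware:
# Benchmark on your own system; the figures below are illustrative and vary widely by CPU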
openssl speed sha256 sha512
# SHA-256: ~1200 MB/s
# SHA-512: ~1800 MB/s (faster on 64-bit!)
Security Status of SHA-2
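No practical attacks are known against the full SHA-2 functions. The best published collision attacks apply only to reduced-round variants: for SHA-256, an attack on 31 of the 64 rounds costs approximately 2⁶⁵ operations, while the full algorithm retains its generic 2¹²⁸ collision resistance with a comfortable margin. One structural caveat is that SHA-256 and SHA-512 permit length extension, so authenticate messages with HMAC rather than a bare hash of key and message.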
SHA-3 (Keccak): The Alternative Standard
SHA-3, based on the Keccak sponge construction, was standardized by NIST in 2015 as a backup to SHA-2. It uses a fundamentally different design — if a structural weakness is found in SHA-2’s Merkle-Damgård construction, SHA-3 would remain unaffected.
Sponge Construction
Unlike SHA-2’s Merkle-Damgård structure, SHA-3 uses a sponge construction:
- Absorb phase: Input blocks are XORed into the state
- Squeeze phase: Output blocks are extracted from the state
import hashlib
# SHA-3 variants
sha3_256 = hashlib.sha3_256(b"Hello, World!").hexdigest()
sha3_512 = hashlib.sha3_512(b"Hello, World!").hexdigest()
# SHAKE - extendable output functions (XOF)
shake_128 = hashlib.shake_128(b"Hello, World!").hexdigest(32) # 32 bytes output
shake_256 = hashlib.shake_256(b"Hello, World!").hexdigest(64) # 64 bytes output
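The squeeze phase is what makes SHAKE an extendable-output function: the caller decides how many bytes to take from one output stream. The short sketch below illustrates a consequence that is easy to overlook, namely that a shorter output is a prefix of a longer one for the same input.

```python
import hashlib

message = b"Hello, World!"

shorter = hashlib.shake_128(message).digest(16)  # squeeze 16 bytes
longer = hashlib.shake_128(message).digest(64)   # squeeze 64 bytes from the same input

# Both outputs come from the same stream, so the short one is a prefix of the long one
print(longer.startswith(shorter))

# The fixed-length SHA-3 functions take no length argument
print(len(hashlib.sha3_256(message).digest()))
```

Because of this prefix relationship, outputs of different lengths derived from the same input should not be treated as independent values.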
SHA-3 Variants
| Variant | Output Size | Security Level | Notes |
|---|---|---|---|
| SHA3-224 | 224 bits | 112-bit | Drop-in SHA-224 replacement |
| SHA3-256 | 256 bits | 128-bit | Drop-in SHA-256 replacement |
| SHA3-384 | 384 bits | 192-bit | Drop-in SHA-384 replacement |
| SHA3-512 | 512 bits | 256-bit | Drop-in SHA-512 replacement |
| SHAKE128 | Variable | 128-bit | Extendable output |
| SHAKE256 | Variable | 256-bit | Extendable output |
When to Use SHA-3
- When regulatory requirements mandate algorithm diversity
- In systems requiring defense-in-depth against structural attacks on Merkle-Damgård
- When you need variable-length output (SHAKE)
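- In post-quantum contexts, where several NIST post-quantum standards (such as ML-KEM and ML-DSA) build on SHA-3 and SHAKE internally (SHA-3 itself has no inherent quantum-resistance advantage over SHA-2 at the same output size)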
BLAKE2 and BLAKE3: Speed Without Compromise
BLAKE2
BLAKE2 (2012) was designed to be faster than MD5, SHA-1, and SHA-2 in software while offering security comparable to SHA-3. It comes in two variants:
- BLAKE2b: Optimized for 64-bit platforms, up to 64-byte output
- BLAKE2s: Optimized for 32-bit platforms, up to 32-byte output
import hashlib
# BLAKE2b - faster than SHA-256 on 64-bit systems
digest = hashlib.blake2b(b"Hello, World!", digest_size=32).hexdigest()
# BLAKE2b with key (built-in MAC, no need for HMAC construction)
keyed = hashlib.blake2b(b"message", key=b"secret-key-here!", digest_size=32).hexdigest()
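A keyed BLAKE2b digest still has to be verified with a constant-time comparison, just like an HMAC. The following sketch wraps the keyed call above in two helper functions; the names `blake2b_mac` and `blake2b_verify` are invented for this example, and the key is a placeholder.

```python
import hashlib
import hmac

def blake2b_mac(key: bytes, message: bytes) -> bytes:
    """Compute a 32-byte keyed BLAKE2b tag (hashlib accepts keys up to 64 bytes)."""
    return hashlib.blake2b(message, key=key, digest_size=32).digest()

def blake2b_verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(blake2b_mac(key, message), tag)

key = b"secret-key-here!"  # placeholder; generate a random key in practice
tag = blake2b_mac(key, b"message")

print(blake2b_verify(key, b"message", tag))   # genuine message
print(blake2b_verify(key, b"messagE", tag))   # modified message
```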
BLAKE3
BLAKE3 (2020) takes performance further with a Merkle tree structure enabling:
- Parallelism: Leverages SIMD and multi-core processors
- Streaming: Incremental updates without buffering
- Speed: 3-5x faster than SHA-256 on modern hardware
// BLAKE3 in Rust (requires the `blake3` crate as a dependency)
fn main() {
    // One-shot hashing
    let hash = blake3::hash(b"Hello, World!");
    println!("{}", hash.to_hex());
    // Incremental hashing
    let mut hasher = blake3::Hasher::new();
    hasher.update(b"chunk 1");
    hasher.update(b"chunk 2");
    let hash = hasher.finalize();
    println!("{}", hash.to_hex());
}
# BLAKE3 CLI tool - extremely fast file hashing
b3sum large-file.iso
# Processes multi-GB files in seconds using all CPU cores
Performance Comparison
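Approximate single-threaded throughput for long messages on a modern x86_64 processor. Treat these as rough orders of magnitude; actual results depend heavily on the CPU, the library, and hardware acceleration: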
| Algorithm | Speed (GB/s) | Output Size | Security Level |
|---|---|---|---|
| MD5 | ~0.7 | 128 bits | Broken |
| SHA-1 | ~0.9 | 160 bits | Broken |
| SHA-256 | ~1.2 | 256 bits | 128-bit |
| SHA-512 | ~1.8 | 512 bits | 256-bit |
| SHA3-256 | ~0.8 | 256 bits | 128-bit |
| BLAKE2b | ~2.0 | Up to 512 bits | Up to 256-bit |
| BLAKE3 | ~6.0+ | 256 bits | 128-bit |
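Since published figures rarely match a given machine, it can be worth measuring locally. The sketch below times the algorithms that ship with Python's `hashlib`; BLAKE3 is not in the standard library and is therefore omitted. The helper name `throughput_mb_s`, the 4 MiB buffer, and the repeat count are arbitrary choices for this example.

```python
import hashlib
import time

def throughput_mb_s(name: str, data: bytes, repeats: int = 3) -> float:
    """Return the best observed throughput in MB/s for one hashlib algorithm."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hashlib.new(name, data).digest()
        best = min(best, time.perf_counter() - start)
    return len(data) / best / 1e6

payload = b"\x00" * (4 * 1024 * 1024)  # 4 MiB test buffer
algorithms = ["md5", "sha1", "sha256", "sha512", "sha3_256", "blake2b"]
results = {name: throughput_mb_s(name, payload) for name in algorithms}

# Print fastest first
for name, speed in sorted(results.items(), key=lambda item: -item[1]):
    print(f"{name:<10} {speed:8.0f} MB/s")
```

Python's call overhead is negligible at this buffer size, so the ranking mostly reflects the underlying native implementations.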
HMAC: Hash-Based Message Authentication
HMAC (Hash-based Message Authentication Code) combines a hash function with a secret key to provide both integrity and authentication:
import hmac
import hashlib
# API request signing
def sign_request(secret_key: bytes, method: str, path: str, body: bytes, timestamp: str) -> str:
    message = f"{method}\n{path}\n{timestamp}\n".encode() + body
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()
# Verification
def verify_signature(secret_key: bytes, message: bytes, signature: str) -> bool:
    expected = hmac.new(secret_key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)  # Constant-time comparison
Important: Always use hmac.compare_digest() for signature verification to prevent timing attacks.
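A short round trip shows how signing and verification fit together. This is a self-contained sketch with its own small helpers; the key, endpoint, timestamp, and request body are hypothetical values chosen for the example.

```python
import hashlib
import hmac

def sign(secret_key: bytes, message: bytes) -> str:
    """Return the hex HMAC-SHA256 tag for a message."""
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()

def verify(secret_key: bytes, message: bytes, signature: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign(secret_key, message), signature)

secret = b"demo-secret"              # hypothetical key
timestamp = "2024-01-01T00:00:00Z"   # hypothetical timestamp
message = f"POST\n/v1/orders\n{timestamp}\n".encode() + b'{"qty": 1}'
signature = sign(secret, message)

# Same request with the quantity changed from 1 to 9
tampered = message[:-2] + b"9}"

print(verify(secret, message, signature))    # untouched request
print(verify(secret, tampered, signature))   # altered body
print(verify(b"wrong-key", message, signature))
```

Only the untouched request signed with the correct key verifies; changing one byte of the body or using a different key makes verification fail.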
Password Hashing: A Different Category
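General-purpose hash functions like SHA-256 are not suitable for password hashing, because they are designed to be fast and an attacker can test billions of guesses per second. Password hashing requires salted algorithms that are intentionally slow, and ideally memory-hard, to resist brute-force attacks.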
Argon2 (Recommended)
Winner of the 2015 Password Hashing Competition:
from argon2 import PasswordHasher
from argon2.exceptions import VerifyMismatchError
ph = PasswordHasher(
    time_cost=3,        # Number of iterations
    memory_cost=65536,  # 64 MiB of memory (value is in KiB)
    parallelism=4,      # Number of threads
)
# Hash a password (the salt is generated and embedded automatically)
hashed = ph.hash("user-password-here")
# Verify
try:
    ph.verify(hashed, "user-password-here")
except VerifyMismatchError:
    print("Invalid password")
bcrypt
Battle-tested and widely supported:
import bcrypt
# Hash with automatic salt generation
password = b"user-password"
hashed = bcrypt.hashpw(password, bcrypt.gensalt(rounds=12))
# Verify
if bcrypt.checkpw(password, hashed):
    print("Password matches")
Password Hashing Comparison
| Algorithm | Memory-Hard | GPU Resistant | Max Input | Recommended |
|---|---|---|---|---|
| Argon2id | Yes | Yes | Unlimited | New applications |
| bcrypt | No | Moderate | 72 bytes | Legacy/compatibility |
| scrypt | Yes | Yes | Unlimited | When Argon2 unavailable |
| PBKDF2 | No | No | Unlimited | Only if mandated (FIPS) |
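When Argon2 is unavailable, scrypt can be used straight from Python's standard library through `hashlib.scrypt` (available when Python is built against OpenSSL 1.1 or later). The sketch below is one possible shape; the helper names and the cost parameters are assumptions for illustration and should be tuned for the deployment hardware.

```python
import hashlib
import hmac
import os
from typing import Tuple

# Illustrative cost parameters (roughly 16 MiB of memory); tune for your hardware
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}

def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key) for a new password, using a fresh random salt."""
    salt = os.urandom(16)
    key = hashlib.scrypt(password.encode(), salt=salt, **SCRYPT_PARAMS)
    return salt, key

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    candidate = hashlib.scrypt(password.encode(), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, expected)

salt, key = hash_password("user-password-here")
print(verify_password("user-password-here", salt, key))
print(verify_password("wrong-password", salt, key))
```

Unlike the Argon2 and bcrypt libraries above, this approach leaves it to the application to store the salt and parameters alongside the derived key.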
Use Cases and Algorithm Selection
Integrity Verification
# Software distribution
sha256sum release-v2.1.0.tar.gz > SHA256SUMS
gpg --detach-sign SHA256SUMS # Writes a detached signature to SHA256SUMS.sig
# Verification: check the signature first, then the checksums
gpg --verify SHA256SUMS.sig SHA256SUMS
sha256sum -c SHA256SUMS
Digital Signatures
Hash functions are integral to digital signature schemes. The document is hashed first, then the hash is signed with the private key:
# Sign a document
openssl dgst -sha256 -sign private.key -out signature.bin document.pdf
# Verify
openssl dgst -sha256 -verify public.key -signature signature.bin document.pdf
Certificate Fingerprints
TLS certificates use hash functions for fingerprinting and chain validation:
# View certificate fingerprint
openssl x509 -in server.crt -fingerprint -sha256 -noout
# SHA256 Fingerprint=A1:B2:C3:D4:...
Blockchain and Merkle Trees
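Bitcoin uses double-SHA-256, while Ethereum uses Keccak-256. Keccak-256 is the original Keccak submission rather than the standardized SHA3-256: the two differ in padding, so hashlib.sha3_256 does not reproduce Ethereum hashes.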
import hashlib
# Bitcoin-style double SHA-256
def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()
# Merkle tree node
def merkle_parent(left: bytes, right: bytes) -> bytes:
    return double_sha256(left + right)
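Building on the parent-node idea, a whole list of leaf hashes can be folded into a single root. The sketch below follows Bitcoin's convention of duplicating the last node when a level has an odd number of entries; the function name `merkle_root` and the sample transactions are made up for this example, and Bitcoin's byte-order handling is deliberately left out.

```python
import hashlib
from typing import List

def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(leaves: List[bytes]) -> bytes:
    """Fold a non-empty list of leaf hashes into a single root hash."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        # Hash each adjacent pair to form the next level up
        level = [double_sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# Hypothetical transactions, hashed to form the leaves
leaves = [double_sha256(tx) for tx in (b"tx1", b"tx2", b"tx3")]
print(merkle_root(leaves).hex())
```

Changing any single leaf changes the root, which is what lets a block header commit to every transaction with one 32-byte value.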
Choosing the Right Hash Function
┌─────────────────────────────────────────────────┐
│           Hash Function Decision Tree           │
├─────────────────────────────────────────────────┤
│                                                 │
│  Is it for password storage?                    │
│  ├── YES → Use Argon2id (or bcrypt)             │
│  └── NO ↓                                       │
│                                                 │
│  Is it for message authentication?              │
│  ├── YES → Use HMAC-SHA-256                     │
│  └── NO ↓                                       │
│                                                 │
│  Is FIPS compliance required?                   │
│  ├── YES → Use SHA-256 or SHA3-256              │
│  └── NO ↓                                       │
│                                                 │
│  Do you need maximum performance?               │
│  ├── YES → Use BLAKE3                           │
│  └── NO ↓                                       │
│                                                 │
│  Default → SHA-256 (universal compatibility)    │
│                                                 │
└─────────────────────────────────────────────────┘
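The same decision logic can be written as a small function, which is handy for documenting a team policy in code. This is only a sketch: the function name and flags are invented for the example, and it checks the FIPS requirement before performance because BLAKE3 is not a FIPS-approved algorithm.

```python
def choose_hash(
    password_storage: bool = False,
    message_authentication: bool = False,
    fips_required: bool = False,
    max_performance: bool = False,
) -> str:
    """Return a recommended algorithm name for the given requirements."""
    if password_storage:
        return "Argon2id"
    if message_authentication:
        return "HMAC-SHA-256"
    if fips_required:
        return "SHA-256"  # SHA3-256 is an equally valid FIPS-approved choice
    if max_performance:
        return "BLAKE3"
    return "SHA-256"      # default: universal compatibility

print(choose_hash(password_storage=True))
print(choose_hash(max_performance=True))
print(choose_hash(max_performance=True, fips_required=True))
```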
Quick Reference
| Use Case | Recommended Algorithm | Avoid |
|---|---|---|
| General integrity | SHA-256 | MD5, SHA-1 |
| High-performance hashing | BLAKE3 | MD5 |
| Password storage | Argon2id | SHA-256, MD5 |
| Digital signatures | SHA-256 or SHA-384 | SHA-1, MD5 |
| FIPS-compliant systems | SHA-2 or SHA-3 | BLAKE2/3 |
| API authentication | HMAC-SHA-256 | Plain hash comparison |
| File deduplication | BLAKE3 or SHA-256 | MD5 (if security matters) |
| Certificate fingerprints | SHA-256 | SHA-1, MD5 |
Migrating Away from Deprecated Algorithms
If your systems still use MD5 or SHA-1, here’s a migration approach:
- Inventory: Scan codebases and configurations for deprecated algorithms
- Classify: Determine if usage is security-critical or non-critical
- Plan: For security-critical uses, migrate to SHA-256 minimum
- Dual-hash period: During transition, compute both old and new hashes
- Verify: Confirm all consumers support the new algorithm
- Deprecate: Remove old algorithm support
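For the dual-hash period, both digests can be computed in a single pass over the data so the transition does not double the I/O cost. The sketch below assumes MD5 as the legacy algorithm and SHA-256 as its replacement; the function name `dual_digest` is made up for this example.

```python
import hashlib
from typing import Dict, Iterable

def dual_digest(chunks: Iterable[bytes], old: str = "md5", new: str = "sha256") -> Dict[str, str]:
    """Hash a stream of chunks with the old and new algorithm in one pass."""
    hashers = {name: hashlib.new(name) for name in (old, new)}
    for chunk in chunks:
        for hasher in hashers.values():
            hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}

digests = dual_digest([b"Hello, ", b"World!"])
print(digests["md5"])      # kept for consumers that have not migrated yet
print(digests["sha256"])   # the value new consumers should rely on
```

Publishing both values side by side lets consumers switch at their own pace before the old digest is removed.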
Tools like QCecuring’s cryptographic bill of materials (CBOM) can automate the discovery of deprecated hash functions across your infrastructure, identifying every instance of MD5 or SHA-1 usage in certificates, code, and configurations.
Key Takeaways
- MD5 and SHA-1 are broken — never use them for security purposes. Migrate any remaining usage to SHA-256 or better.
- SHA-256 is the safe default for most applications. It’s universally supported, FIPS-compliant, and has no known practical weaknesses.
- SHA-3 provides algorithm diversity — use it when you need independence from SHA-2’s design or when regulations require it.
- BLAKE3 offers superior performance for applications where speed matters and FIPS compliance isn’t required.
- Password hashing requires specialized algorithms — Argon2id is the current best choice; never use general-purpose hash functions for passwords.
- HMAC adds authentication — always use HMAC (not plain hashing) when you need to verify both integrity and origin.
- The avalanche effect is your friend — it’s what makes hash functions useful for detecting even the smallest changes to data.
- Plan for algorithm agility — design systems that can swap hash functions without architectural changes, because today’s secure algorithm may be tomorrow’s deprecated one.
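One simple way to build in algorithm agility is to store the algorithm name next to every digest, so that verification never has to guess which function produced a value. The following is a minimal sketch of that idea; the `name:hexdigest` format, the function names, and the allow-list are assumptions for illustration rather than an established standard.

```python
import hashlib
import hmac

DEFAULT_ALGORITHM = "sha256"
ALLOWED_ALGORITHMS = {"sha256", "sha512", "sha3_256", "blake2b"}

def tagged_digest(data: bytes, algorithm: str = DEFAULT_ALGORITHM) -> str:
    """Return a digest prefixed with the algorithm that produced it."""
    if algorithm not in ALLOWED_ALGORITHMS:
        raise ValueError(f"algorithm not allowed: {algorithm}")
    return f"{algorithm}:{hashlib.new(algorithm, data).hexdigest()}"

def verify_tagged(data: bytes, tagged: str) -> bool:
    """Verify data against a tagged digest, rejecting algorithms off the allow-list."""
    algorithm, _, expected = tagged.partition(":")
    if algorithm not in ALLOWED_ALGORITHMS:
        return False
    actual = hashlib.new(algorithm, data).hexdigest()
    return hmac.compare_digest(actual.encode(), expected.encode())

stored = tagged_digest(b"release contents")
print(stored.split(":")[0])
print(verify_tagged(b"release contents", stored))
```

Switching the default later only requires changing `DEFAULT_ALGORITHM`; previously stored values keep verifying until their algorithm is removed from the allow-list.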