Database Sharding Cheat Sheet

Database sharding strategies — horizontal partitioning, shard keys, consistent hashing, resharding, and cross-shard query handling.

Last Updated: May 1, 2025

Sharding Strategies

StrategyHow It WorksBest For
Range-BasedData split by value rangesRange queries, simple
Hash-Basedhash(key) % N determines shardEven distribution
Directory-BasedLookup table maps key → shardFlexible rebalancing
Geo-BasedPartitioned by regionLow latency for local users

Shard Key Selection

ItemDescription
High CardinalityMany distinct values — 10M users across 4 shards works
Even DistributionAvoid hot spots — hash user_id, not range on date
Query LocalityKeep related data on same shard
Avoid Cross-Shard JoinsDenormalize or use app-level joins

Common Architectures

ItemDescription
Vitess (MySQL)YouTube sharding middleware — auto resharding, connection pooling
Citus (PostgreSQL)Distributed PostgreSQL — shards + coordinator, SQL-compatible
DynamoDBAWS managed — partition key, auto-scaling
CassandraRing-based consistent hashing — no master, gossip protocol

Cross-Shard Challenges

ItemDescription
JoinsImpossible at DB level — app-level or denormalize
Transactions2PC is slow — use eventual consistency or Saga pattern
Unique ConstraintsUse UUIDv7 or global sequence service
ReshardingUse consistent hashing to minimize data movement
Pro Tip: Choose your shard key carefully — changing it later requires resharding ALL data. Hash user_id for even distribution.