Sharding Strategies
| Strategy | How It Works | Best For |
|---|
| Range-Based | Data split by value ranges | Range queries, simple |
| Hash-Based | hash(key) % N determines shard | Even distribution |
| Directory-Based | Lookup table maps key → shard | Flexible rebalancing |
| Geo-Based | Partitioned by region | Low latency for local users |
Shard Key Selection
| Item | Description |
|---|
High Cardinality | Many distinct values — 10M users across 4 shards works |
Even Distribution | Avoid hot spots — hash user_id, not range on date |
Query Locality | Keep related data on same shard |
Avoid Cross-Shard Joins | Denormalize or use app-level joins |
Common Architectures
| Item | Description |
|---|
Vitess (MySQL) | YouTube sharding middleware — auto resharding, connection pooling |
Citus (PostgreSQL) | Distributed PostgreSQL — shards + coordinator, SQL-compatible |
DynamoDB | AWS managed — partition key, auto-scaling |
Cassandra | Ring-based consistent hashing — no master, gossip protocol |
Cross-Shard Challenges
| Item | Description |
|---|
Joins | Impossible at DB level — app-level or denormalize |
Transactions | 2PC is slow — use eventual consistency or Saga pattern |
Unique Constraints | Use UUIDv7 or global sequence service |
Resharding | Use consistent hashing to minimize data movement |
Pro Tip: Choose your shard key carefully — changing it later requires resharding ALL data. Hash user_id for even distribution.