Core Concepts
| Concept | Description |
|---|---|
| Scalability | Ability to handle growth: horizontal (more machines) vs vertical (bigger machine) |
| Availability | Percentage of time operational: 99.9% ≈ 8.76 h downtime/year |
| CAP Theorem | At most 2 of 3 can be guaranteed at once: Consistency, Availability, Partition Tolerance. Partitions cannot be ruled out in a distributed system, so the practical choice is consistency vs availability during a partition |
| Latency | Time to serve a request: P50 (median), P95, P99 (tail latency matters most) |
| Throughput | Requests handled per second, measured in QPS or RPS |
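
The availability and latency rows can be checked with a few lines of arithmetic. This is a minimal sketch, not a monitoring implementation: the function names, the sample latencies, and the nearest-rank percentile method are assumptions chosen for illustration.

```python
import math

HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(availability_pct: float) -> float:
    """Downtime budget per year implied by an availability percentage."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample that has at least
    pct percent of all samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99):
        print(f"{pct}% -> {downtime_hours_per_year(pct):.2f} h/year")

    # Hypothetical latencies in ms: mostly fast, with two slow outliers.
    latencies_ms = [12, 15, 11, 14, 13, 250, 16, 12, 13, 900]
    for p in (50, 95, 99):
        print(f"P{p} = {percentile(latencies_ms, p)} ms")
```

With these sample values the median stays at 13 ms while P99 lands on the 900 ms outlier, which is why the tail is tracked separately from the average.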
CAP Theorem Choices
| Choice | You Get | Best For |
|---|---|---|
| CP | Strong consistency during partitions; some requests may be rejected | Banking, payments |
| AP | Availability during partitions; reads may be stale | Social media, CDNs |
| CA | Both, but only while no partition occurs | Single-node databases |
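
The CP and AP rows can be illustrated with a toy model. This is a deliberately simplified sketch, not how any real database works: the `Replica` class, its `mode` labels, and the `partitioned` flag are all hypothetical, and replication is modelled as a direct assignment to a peer.

```python
class Replica:
    """Toy single-value replica. `mode` is "CP" or "AP" (illustrative labels)."""

    def __init__(self, mode: str) -> None:
        self.mode = mode
        self.value = None
        self.partitioned = False  # True when this replica cannot reach its peer

    def write(self, value, peer: "Replica") -> bool:
        """Return True if the write was accepted."""
        if self.partitioned:
            if self.mode == "CP":
                # CP: refuse the write rather than let replicas diverge.
                return False
            # AP: accept locally; the peer keeps the old value for now.
            self.value = value
            return True
        # Healthy link: replicate synchronously to the peer.
        self.value = value
        peer.value = value
        return True


def demo(mode: str):
    """Write once while healthy, then once during a partition."""
    a, b = Replica(mode), Replica(mode)
    a.write("v1", b)
    a.partitioned = b.partitioned = True
    accepted = a.write("v2", b)
    return accepted, a.value, b.value


if __name__ == "__main__":
    print("CP:", demo("CP"))
    print("AP:", demo("AP"))
```

In the CP run the second write is rejected and both replicas still agree; in the AP run it is accepted and the two replicas diverge until the partition heals.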
Design Process
| Step | Description |
|---|---|
| 1. Requirements | Functional + non-functional (scale, latency, durability) |
| 2. Estimation | QPS, storage, bandwidth: 1M DAU x 10 req/day = ~115 QPS avg |
| 3. API Design | REST/GraphQL/gRPC: endpoints, request/response |
| 4. Data Model | SQL vs NoSQL: schemas, indexes, partition keys |
| 5. High-Level | Boxes and arrows: services, DBs, caches, queues |
| 6. Deep Dive | Critical components: SPOFs, bottlenecks |
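
The estimation step is plain arithmetic, sketched below. Only the 1M DAU and 10 requests/day figures come from the table; the peak factor of 2 and the 1 KB stored per request are hypothetical placeholders to replace with your own assumptions.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_qps(daily_active_users: int, requests_per_user_per_day: float) -> float:
    """Average queries per second, spread evenly over a day."""
    return daily_active_users * requests_per_user_per_day / SECONDS_PER_DAY


def peak_qps(avg_qps: float, peak_factor: float = 2.0) -> float:
    """Peak QPS as a multiple of the average (peak_factor is an assumption)."""
    return avg_qps * peak_factor


def daily_storage_gb(requests_per_day: int, bytes_per_request: int) -> float:
    """Storage added per day in GB (1 GB = 10**9 bytes)."""
    return requests_per_day * bytes_per_request / 10**9


if __name__ == "__main__":
    avg = average_qps(1_000_000, 10)
    print(f"average QPS: {avg:.1f}")
    print(f"peak QPS (assumed 2x): {peak_qps(avg):.1f}")
    # Assumes 1 KB stored per request, purely for illustration.
    print(f"storage/day: {daily_storage_gb(10_000_000, 1_000):.1f} GB")
```

10M requests over 86,400 seconds works out to roughly 115.7 QPS on average, matching the table's estimate. Real traffic is rarely flat, so size for the peak rather than the average.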
Key Tradeoffs
| Tradeoff | Description |
|---|---|
| Read vs Write | Indexes speed reads, slow writes. Caches add consistency issues. |
| Sync vs Async | Sync = strong consistency, higher latency. Async = eventual, better throughput. |
| SQL vs NoSQL | SQL = ACID, joins, schema. NoSQL = flexible, horizontal scaling, BASE. |
| Monolith vs Microservices | Monolith = simpler. Microservices = independent scaling, ops complexity. |
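
The "caches add consistency issues" point can be seen in a small cache-aside sketch. The `CachedStore` class is hypothetical, with plain dicts standing in for the database and the cache; the `invalidate` flag exists only to show what happens when a write skips invalidation.

```python
class CachedStore:
    """Cache-aside over an in-memory dict that stands in for a database."""

    def __init__(self) -> None:
        self.db = {}
        self.cache = {}

    def read(self, key):
        if key in self.cache:
            return self.cache[key]  # cache hit: fast, but possibly stale
        value = self.db.get(key)    # cache miss: fall back to the database
        if value is not None:
            self.cache[key] = value
        return value

    def write(self, key, value, invalidate: bool = True) -> None:
        self.db[key] = value
        if invalidate:
            # Drop the cached copy so the next read reloads from the database.
            self.cache.pop(key, None)


if __name__ == "__main__":
    store = CachedStore()
    store.write("user:1", "alice")
    print(store.read("user:1"))  # loads from db and fills the cache

    store.write("user:1", "bob", invalidate=False)
    print(store.read("user:1"))  # stale: the cache still holds the old value

    store.write("user:1", "carol")
    print(store.read("user:1"))  # fresh: the cache entry was invalidated
```

Reads get faster, but every write path now has to remember the cache. Invalidating on write is one common option; expiring entries after a TTL is another, at the cost of a bounded window of staleness.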
Pro Tip: Start with a monolith and split when you have clear bounded contexts. Premature distribution creates complexity without the scale to justify it.