T39: System Design - Caching, Queues & Patterns

Caches pre-chop expensive reads. Queues decouple slow work from the user request. CDNs put a copy near every user. Load balancers and replication absorb failures. A system design deep-dive is stitching these together until the latency and throughput numbers fall into place.

Caching: Fast Memory Between You and The Database

A cache stores the result of a slow or expensive operation in fast memory. The canonical pattern is cache-aside: app checks cache; on miss, reads DB and fills cache; on hit, skips DB entirely.

sequenceDiagram
    participant App
    participant Cache as Redis
    participant DB as Database
    App->>Cache: GET key
    alt cache hit
        Cache-->>App: value (fast)
    else cache miss
        Cache-->>App: nil
        App->>DB: SELECT ...
        DB-->>App: row
        App->>Cache: SET key value ttl
    end

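// Cache-aside in Node.js
// Assumes `redis` is an ioredis-style client and `db` is a node-postgres-style pool.
async function getUser(id) {
    const key = `user:${id}`;
    const cached = await redis.get(key);
    if (cached) return JSON.parse(cached); // hit: skip the DB entirely

    // The query resolves to a result object; the matching rows are in `rows`.
    const { rows } = await db.query("SELECT * FROM users WHERE id = $1", [id]);
    const user = rows[0] ?? null;

    // Fill the cache with a 5-minute TTL; do not cache a missing user.
    if (user) await redis.set(key, JSON.stringify(user), "EX", 300);
    return user;
}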

The two hard problems of caching are invalidation (when do you throw stale data out) and stampede (when many requests miss at once and hammer the DB). Bound staleness with TTLs and write-through updates; prevent stampedes with single-flight locks on misses, so one request refills the key while the others wait for its result.
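The single-flight idea can be sketched inside one Node.js process: concurrent misses for the same key share a single in-flight promise, so only the first caller runs the expensive load. This is a minimal sketch, not a library API; `singleFlight`, `inFlight`, and `loader` are hypothetical names, and a fleet of app servers would need a shared lock (for example in Redis) on top of this.

```javascript
// Single-flight sketch: concurrent callers for the same key share one load.
// `singleFlight` and `inFlight` are illustrative names, not a library API.
const inFlight = new Map();

function singleFlight(key, loader) {
    // If a load for this key is already running, reuse its promise.
    if (inFlight.has(key)) return inFlight.get(key);

    const promise = Promise.resolve()
        .then(loader)
        // Clear the entry on success or failure so later misses can retry.
        .finally(() => inFlight.delete(key));

    inFlight.set(key, promise);
    return promise;
}
```

In `getUser`, the DB read and cache fill on a miss could be wrapped as `singleFlight(key, () => loadAndFill(id))`, where `loadAndFill` is a hypothetical helper holding the miss path.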

Where to Cache

Browser cache - closest to user, controlled by Cache-Control headers
CDN (edge cache) - static assets, public API responses. Global, cheap, fast
Application cache - in-process memory or Redis. Good for per-user data and hot rows
Database cache - the DB's own buffer pool. Free, already tuned

Message Queues: Decouple Slow Work

Any operation that takes more than a few hundred milliseconds should not block the user. Queues let the app accept the job and return immediately; a worker reads the queue and does the slow work later.
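The accept-now, work-later split can be sketched with an in-memory array standing in for a real queue. Everything here is an assumption for illustration: `handleUpload`, `drainQueue`, and the job shape are hypothetical, and a plain array has none of the durability a broker such as Kafka, RabbitMQ, or SQS provides.

```javascript
// In-memory stand-in for a real queue; jobs are lost if the process dies.
const jobQueue = [];
let nextJobId = 1;

// API side: record the job and answer immediately with 202 Accepted.
function handleUpload(file) {
    const job = { id: nextJobId++, file, status: "queued" };
    jobQueue.push(job);
    return { statusCode: 202, body: { jobId: job.id } };
}

// Worker side: pull jobs one at a time and do the slow work
// off the request path. `processJob` is supplied by the caller.
async function drainQueue(processJob) {
    const finished = [];
    while (jobQueue.length > 0) {
        const job = jobQueue.shift();
        job.result = await processJob(job);
        job.status = "done";
        finished.push(job);
    }
    return finished;
}
```

The request handler never awaits `processJob`, so its response time depends only on the enqueue, not on the resize or transcode.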

sequenceDiagram
    participant User
    participant API
    participant Q as Queue (Kafka/SQS)
    participant W as Worker
    participant DB
    User->>API: POST /upload
    API->>Q: enqueue job
    API-->>User: 202 Accepted (fast)
    W->>Q: pull job
    W->>W: resize, transcode, scan
    W->>DB: write result

Queues also absorb traffic spikes. If the worker can process 1000/sec and a spike pushes 10,000/sec, the queue flattens the curve instead of dropping requests. Kafka, RabbitMQ, and SQS each make different trade-offs around ordering, durability, and replay.
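The flattening has a cost that is worth estimating: the backlog has to drain afterwards. A rough sketch of that arithmetic follows; the 60-second spike and the 500/sec baseline are assumed inputs, not figures from the text, and `spikeBacklog` is a hypothetical helper.

```javascript
// Backlog arithmetic for a traffic spike; all inputs are illustrative.
function spikeBacklog({ arrivalPerSec, workerPerSec, spikeSeconds, baselinePerSec }) {
    // During the spike, the queue grows by the gap between arrivals and capacity.
    const backlog = Math.max(0, arrivalPerSec - workerPerSec) * spikeSeconds;
    // Afterwards, only the spare capacity above baseline traffic drains it.
    const spare = workerPerSec - baselinePerSec;
    const drainSeconds = spare > 0 ? backlog / spare : Infinity;
    return { backlog, drainSeconds };
}

const spikeResult = spikeBacklog({
    arrivalPerSec: 10000,
    workerPerSec: 1000,
    spikeSeconds: 60,
    baselinePerSec: 500,
});
console.log(spikeResult); // { backlog: 540000, drainSeconds: 1080 }
```

Under these assumptions a one-minute spike leaves 540,000 queued jobs and takes 18 minutes to clear, so queue depth and job age are the numbers to alert on.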

Load Balancers and Redundancy

A load balancer sits in front of identical app servers and spreads requests. Three jobs: distribute load, detect dead servers (health checks), terminate TLS. Run at least two of everything - load balancer, app, database replica - so any single failure is absorbed.

Client -> DNS -> LB (primary) --> app1
                 LB (standby)     app2
                                  app3
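Two of the three jobs, distributing load and skipping dead servers, can be sketched as a round-robin picker with a health flag per backend. This is a toy model under stated assumptions: `RoundRobinBalancer` and its methods are hypothetical names, and a real balancer would set the health flag from periodic probes rather than a manual call.

```javascript
// Round-robin over healthy backends; a sketch, not a production balancer.
class RoundRobinBalancer {
    constructor(servers) {
        this.servers = servers.map((name) => ({ name, healthy: true }));
        this.cursor = 0;
    }

    // A real balancer would call this from periodic health-check probes.
    markHealth(name, healthy) {
        const server = this.servers.find((s) => s.name === name);
        if (server) server.healthy = healthy;
    }

    // Return the next healthy server, skipping any marked dead.
    pick() {
        for (let i = 0; i < this.servers.length; i++) {
            const server = this.servers[this.cursor];
            this.cursor = (this.cursor + 1) % this.servers.length;
            if (server.healthy) return server.name;
        }
        throw new Error("no healthy servers");
    }
}
```

With `app1`, `app2`, and `app3` registered, marking `app2` unhealthy makes `pick()` alternate between the other two until it is marked healthy again.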

CDNs: A Copy Near Every User

A Content Delivery Network caches your static assets (and sometimes API responses) at hundreds of edge locations around the globe. First user in Tokyo pays the full trip to your origin in Virginia. Next 10,000 users in Tokyo hit the Tokyo edge in 10ms.

// What to put on the CDN
- images, videos, fonts, JS/CSS bundles
- rarely-changing API responses with Cache-Control
- HTML for logged-out pages
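The list above maps fairly directly onto `Cache-Control` values. The sketch below is one possible policy, not a recommendation: `cacheControlFor`, the `/api/public/` prefix, and every TTL are assumptions, and the year-long asset TTL only makes sense if asset filenames are fingerprinted with a content hash.

```javascript
// Map a request path to a Cache-Control value; paths and TTLs are assumptions.
function cacheControlFor(path, { loggedIn = false } = {}) {
    // Fingerprinted static assets never change, so cache them for a year.
    if (/\.(js|css|png|jpg|svg|woff2|mp4)$/.test(path)) {
        return "public, max-age=31536000, immutable";
    }
    // Rarely-changing API responses: short browser TTL, longer edge TTL.
    if (path.startsWith("/api/public/")) {
        return "public, max-age=60, s-maxage=300";
    }
    // Logged-out HTML can sit at the edge; per-user pages must not.
    return loggedIn ? "private, no-store" : "public, max-age=0, s-maxage=60";
}
```

`s-maxage` applies only to shared caches such as a CDN, which is how the edge can hold a response longer than the browser does.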

Monolith vs Microservices

Do not start with microservices. Every split adds a network hop, a deploy target, and a failure mode.

Monolith: one codebase, one deploy. Fast to iterate, simple to debug. Breaks down at ~50 engineers or when one component becomes an obvious bottleneck.
Microservices: separate codebases, separate deploys, API or queue between. Each team owns a service. Pays off at scale, costs a lot up front. Extract only when the monolith is visibly painful.

Back-of-Envelope Numbers Worth Memorizing

L1 cache: ~1 ns. Memory: ~100 ns. SSD: ~100 us. Network round trip same region: ~1 ms. Cross-region: ~100 ms.
A single modern server handles ~10k-100k req/sec for simple JSON.
Postgres handles ~10k writes/sec / ~50k reads/sec before tuning.
Redis handles ~100k-1M ops/sec.
100M events/day = ~1,160/sec average (100M / 86,400 sec). Budget ~10k/sec at peak, assuming peaks run close to 10x the average.
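The events-per-day conversion is worth being able to redo for any number. A small sketch, where `perSecond` is a hypothetical helper and the peak factor of 10 is an assumption to replace with your own traffic shape:

```javascript
// Convert a daily event count into average and peak per-second rates.
const SECONDS_PER_DAY = 24 * 60 * 60; // 86,400

function perSecond(eventsPerDay, peakFactor) {
    const average = eventsPerDay / SECONDS_PER_DAY;
    return {
        average: Math.round(average),
        peak: Math.round(average * peakFactor),
    };
}

console.log(perSecond(100_000_000, 10)); // { average: 1157, peak: 11574 }
```

Rounded for mental math, that is the ~1,160/sec average and roughly 10k/sec peak quoted above.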

Key Takeaways

Cache-aside is the default: check cache, miss -> hit DB -> fill cache. Watch for stampedes and invalidation
Queues make the API respond fast by handing slow work to workers. They also flatten traffic spikes
Run two of everything behind a load balancer so no single failure takes the system down
CDNs buy global latency for pennies. Push every static asset and cacheable response to the edge
Monolith first, microservices only when the monolith is visibly painful. Extraction is cheaper than un-extraction
Keep a rough numbers table in your head: ns, us, ms latencies and per-component throughput