T39: System Design - Caching, Queues & Patterns

Caches keep the results of expensive reads ready to serve. Queues decouple slow work from the user request. CDNs put a copy near every user. Load balancers and replication absorb failures. A system design deep-dive means stitching these together until the latency and throughput numbers fall into place.

Caching: Fast Memory Between You and The Database

A cache stores the result of a slow or expensive operation in fast memory. The canonical pattern is cache-aside: app checks cache; on miss, reads DB and fills cache; on hit, skips DB entirely.

sequenceDiagram
    participant App
    participant Cache as Redis
    participant DB as Database
    App->>Cache: GET key
    alt cache hit
        Cache-->>App: value (fast)
    else cache miss
        Cache-->>App: nil
        App->>DB: SELECT ...
        DB-->>App: row
        App->>Cache: SET key value ttl
    end

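// Cache-aside in Node.js
// Assumes an ioredis-style `redis` client and a node-postgres-style `db` pool.
async function getUser(id) {
    const key = `user:${id}`;

    // Hit: return the cached copy and skip the database entirely.
    const cached = await redis.get(key);
    if (cached) return JSON.parse(cached);

    // Miss: with node-postgres, query() resolves to a result object with a `rows` array.
    const { rows } = await db.query("SELECT * FROM users WHERE id = $1", [id]);
    const user = rows[0] ?? null;

    // Fill the cache with a 300-second TTL; skip caching when the user does not exist.
    if (user) await redis.set(key, JSON.stringify(user), "EX", 300);
    return user;
}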

The two hard problems of caching are invalidation (when do you throw stale data out) and stampede (when many requests miss at once and hammer the DB). Fix with TTLs, write-through updates, and single-flight locks on misses.
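
The single-flight idea can be sketched in a few lines. This is a minimal in-process version, not a library API: `inFlight`, `singleFlight`, `slowDbRead`, and `demo` are illustrative names, and the 20 ms delay stands in for a real database read. It only deduplicates misses inside one Node.js process; deduplicating across several app servers would need a shared lock, which this sketch does not attempt.

```javascript
// Single-flight: concurrent misses for the same key share one loader call.
const inFlight = new Map();

function singleFlight(key, loader) {
    // If a load for this key is already running, reuse its promise.
    if (inFlight.has(key)) return inFlight.get(key);

    const promise = Promise.resolve()
        .then(loader)
        // Clear the entry on success or failure so later calls can load again.
        .finally(() => inFlight.delete(key));

    inFlight.set(key, promise);
    return promise;
}

// Fake slow database read (hypothetical stand-in for a real query).
let dbCalls = 0;
async function slowDbRead(id) {
    dbCalls += 1;
    await new Promise((resolve) => setTimeout(resolve, 20));
    return { id, name: "Ada" };
}

// 100 simultaneous misses for the same key.
async function demo() {
    const results = await Promise.all(
        Array.from({ length: 100 }, () =>
            singleFlight("user:42", () => slowDbRead(42))
        )
    );
    return { dbCalls, resultCount: results.length };
}
```

Calling `demo()` resolves with `dbCalls` equal to 1 and `resultCount` equal to 100: every caller gets the row, but the database is read once. In a cache-aside function, the loader would be the "read DB and fill cache" step.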

Where to Cache

Message Queues: Decouple Slow Work

Any operation that takes more than a few hundred milliseconds should not block the user. Queues let the app accept the job and return immediately; a worker reads the queue and does the slow work later.

sequenceDiagram
    participant User
    participant API
    participant Q as Queue (Kafka/SQS)
    participant W as Worker
    participant DB
    User->>API: POST /upload
    API->>Q: enqueue job
    API-->>User: 202 Accepted (fast)
    W->>Q: pull job
    W->>W: resize, transcode, scan
    W->>DB: write result
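
The flow above can be sketched with an in-memory array standing in for the queue. Everything here is an assumption made for illustration: `handleUpload`, `runWorkerOnce`, and the `results` map are hypothetical names, and a real system would use a durable broker such as Kafka or SQS rather than process memory.

```javascript
// In-memory stand-ins for the queue and the result store (illustration only).
const queue = [];
const results = new Map();
let nextJobId = 1;

// API side: accept the job, enqueue it, and answer immediately.
function handleUpload(file) {
    const jobId = nextJobId++;
    queue.push({ jobId, file });
    return { status: 202, jobId }; // 202 Accepted: work is queued, not done
}

// Worker side: pull one job, do the slow work, store the result.
// Resolves to false when the queue is empty.
async function runWorkerOnce(processFile) {
    const job = queue.shift();
    if (!job) return false;
    results.set(job.jobId, await processFile(job.file));
    return true;
}
```

The handler never calls `processFile`, so its response time does not depend on how slow the resize or transcode step is. The client can use the returned `jobId` to poll for the result later.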

Queues also absorb traffic spikes. If the worker can process 1000/sec and a spike pushes 10,000/sec, the queue flattens the curve instead of dropping requests. Kafka, RabbitMQ, and SQS each make different trade-offs around ordering, durability, and replay.
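
To make the spike arithmetic concrete, here is a small simulation. The worker rate (1000/sec) and spike rate (10,000/sec) come from the text; the 5-second spike length and the 500/sec traffic afterwards are assumptions chosen for the example, and `backlogOverTime` is a hypothetical helper.

```javascript
// Returns the queue depth at the end of each second.
function backlogOverTime(arrivalsPerSecond, workerRate) {
    let backlog = 0;
    return arrivalsPerSecond.map((arrivals) => {
        // Workers drain at most workerRate jobs each second.
        backlog = Math.max(0, backlog + arrivals - workerRate);
        return backlog;
    });
}

// Assumed traffic: a 5-second spike at 10,000/sec, then 100 seconds at 500/sec.
const traffic = [...Array(5).fill(10000), ...Array(100).fill(500)];
const backlog = backlogOverTime(traffic, 1000);

const peak = Math.max(...backlog);           // 5 s * (10,000 - 1,000) = 45,000 jobs
const drainedAt = backlog.indexOf(0) + 1;    // second at which the queue is empty again
console.log({ peak, drainedAt });            // { peak: 45000, drainedAt: 95 }
```

During the spike the backlog grows by 9,000 jobs per second. Afterwards the workers have 500/sec of spare capacity, so the 45,000 queued jobs take 90 seconds to clear. No request is dropped, but jobs enqueued at the peak wait much longer than usual, so the queue trades latency for availability.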

Load Balancers and Redundancy

A load balancer sits in front of identical app servers and spreads requests. Three jobs: distribute load, detect dead servers (health checks), terminate TLS. Run at least two of everything - load balancer, app, database replica - so any single failure is absorbed.
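
Two of those jobs, distributing load and skipping dead servers, can be sketched as a round-robin picker. This is a toy model under stated assumptions: `RoundRobinBalancer`, `markHealth`, and `pick` are hypothetical names, health is set by hand instead of by a periodic probe, and TLS termination is left out.

```javascript
class RoundRobinBalancer {
    constructor(serverNames) {
        // Every server starts healthy; a real balancer would probe them.
        this.servers = serverNames.map((name) => ({ name, healthy: true }));
        this.next = 0;
    }

    // Record the outcome of a health check for one server.
    markHealth(name, healthy) {
        const server = this.servers.find((s) => s.name === name);
        if (server) server.healthy = healthy;
    }

    // Return the next healthy server in rotation, or null if none is healthy.
    pick() {
        for (let i = 0; i < this.servers.length; i += 1) {
            const server = this.servers[this.next];
            this.next = (this.next + 1) % this.servers.length;
            if (server.healthy) return server.name;
        }
        return null;
    }
}

const lb = new RoundRobinBalancer(["app1", "app2", "app3"]);
const beforeFailure = [lb.pick(), lb.pick(), lb.pick()]; // app1, app2, app3
lb.markHealth("app2", false);                            // app2 fails its health check
const afterFailure = [lb.pick(), lb.pick(), lb.pick()];  // app1, app3, app1
```

Once `app2` is marked unhealthy, the rotation continues over the remaining servers, which is how a single failure gets absorbed without clients noticing.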

Client -> DNS -> LB (primary) --> app1
                    LB (standby)   app2
                                   app3

CDNs: A Copy Near Every User

A Content Delivery Network caches your static assets (and sometimes API responses) at hundreds of edge locations around the globe. The first user in Tokyo pays the full round trip to your origin in Virginia. The next 10,000 users in Tokyo hit the Tokyo edge in roughly 10 ms.

// What to put on the CDN
- images, videos, fonts, JS/CSS bundles
- rarely-changing API responses with Cache-Control
- HTML for logged-out pages
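
One way to express the list above is a function that picks a `Cache-Control` header per response. The directives used (`public`, `private`, `no-store`, `max-age`, `s-maxage`, `immutable`) are standard HTTP; everything else is an assumption for the sketch: the name `cacheControlFor`, the file-extension list, the `/api/` prefix, the TTL values, and the premise that static files have fingerprinted names and that every `/api/` route here is rarely-changing.

```javascript
function cacheControlFor(path, { loggedIn = false } = {}) {
    // Static assets: assumed to have fingerprinted filenames, so cache for a year.
    if (/\.(png|jpe?g|gif|webp|svg|mp4|webm|woff2?|js|css)$/.test(path)) {
        return "public, max-age=31536000, immutable";
    }
    // Anything personalized must never be stored by a shared cache.
    if (loggedIn) return "private, no-store";
    // Rarely-changing API responses: let the edge keep them briefly (assumed 60 s).
    if (path.startsWith("/api/")) return "public, s-maxage=60";
    // Logged-out HTML: the edge caches it (assumed 5 min), browsers revalidate.
    return "public, max-age=0, s-maxage=300";
}

console.log(cacheControlFor("/assets/app.3f9a1c.js"));        // public, max-age=31536000, immutable
console.log(cacheControlFor("/api/plans"));                   // public, s-maxage=60
console.log(cacheControlFor("/pricing"));                     // public, max-age=0, s-maxage=300
console.log(cacheControlFor("/account", { loggedIn: true })); // private, no-store
```

`s-maxage` applies only to shared caches such as a CDN, so the edge can hold a page for a while even when browsers are told to revalidate.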

Monolith vs Microservices

Do not start with microservices. Every split adds a network hop, a deploy target, and a failure mode.

Back-of-Envelope Numbers Worth Memorizing

Key Takeaways

  • Cache-aside is the default: check cache, miss -> hit DB -> fill cache. Watch for stampedes and invalidation
  • Queues make the API respond fast by handing slow work to workers. They also flatten traffic spikes
  • Run two of everything behind a load balancer so no single failure takes the system down
  • CDNs buy low latency worldwide for pennies. Push every static asset and cacheable response to the edge
  • Monolith first, microservices only when the monolith is visibly painful. Extraction is cheaper than un-extraction
  • Keep a rough numbers table in your head: ns, µs, ms latencies and per-component throughput