CS101 teaches Big O notation, but in production, memory rules. Ulrich Drepper's classic 2007 paper, "What Every Programmer Should Know About Memory," explains why code that looks linear can behave super-linearly once you thrash caches or wander across NUMA boundaries. Data structures and access patterns that maximize locality (think B-trees with page-sized nodes, Structure of Arrays (SoA) versus Array of Structures (AoS) layouts, ring buffers) are not academic details; they're the difference between CPUs working and CPUs waiting. Here's the executive version: cache-friendly data structures turn compute you're already paying for into throughput you can actually use.
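To make the locality point concrete, here is a minimal C++ sketch of the two layouts; the `ParticleAoS` and `ParticlesSoA` types and their field names are illustrative inventions, not from any particular codebase. Summing one field over the SoA layout streams only the bytes it needs, while the AoS version drags three unused fields through the cache on every iteration.

```cpp
#include <vector>

// Array of Structures (AoS): each particle's fields are interleaved in memory.
// A scan over `x` still pulls y, z, and mass into cache, wasting bandwidth.
struct ParticleAoS { float x, y, z, mass; };

float sum_x_aos(const std::vector<ParticleAoS>& ps) {
    float total = 0.0f;
    for (const auto& p : ps) total += p.x;  // 4 useful bytes per 16-byte stride
    return total;
}

// Structure of Arrays (SoA): each field is contiguous, so the same scan
// touches only `x` and every cached byte is useful.
struct ParticlesSoA {
    std::vector<float> x, y, z, mass;
};

float sum_x_soa(const ParticlesSoA& ps) {
    float total = 0.0f;
    for (float v : ps.x) total += v;
    return total;
}
```

Both functions do the same arithmetic; the difference shows up in cache-line utilization and memory bandwidth, which is exactly where the "linear-looking" code starts to diverge from linear behavior.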
Storage engines are data structures with budgets
Every database storage engine is a data structure with a profit-and-loss statement attached. B+ trees, optimized for fast disk-based reads and range scans, trade higher write costs (write amplification) for excellent read locality; log-structured merge-trees (LSM trees) flip that, optimizing for high write rates at the cost of compaction work and read amplification. Neither is better. Each is a conscious algorithmic trade-off with direct operational consequences (IOPS, SSD wear, CPU burn during compaction). If your workload is write-heavy with batched reads, an LSM tree makes sense. If your workload is read-latency sensitive with range scans, B+ trees often win. Your choice is a data-structure selection problem mapped onto cloud bills and SLOs. Treat it that way.
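As a rough illustration of how that trade-off shows up in numbers, here is a back-of-envelope sketch. Every parameter (page size, record size, level count) is invented, and the formulas are crude textbook approximations, not measurements of any real engine.

```cpp
#include <cstdio>

// Crude per-operation cost estimates for a B+ tree versus a leveled LSM tree.
// All parameters are assumptions for illustration only.
int main() {
    const double page_bytes   = 4096.0; // B+ tree page / LSM block size
    const double record_bytes = 128.0;  // size of one logical record
    const double lsm_levels   = 4.0;    // levels a point read may have to probe
    const double batch        = page_bytes / record_bytes; // records per flushed page

    // B+ tree: each update dirties (and eventually rewrites) one random page,
    // but a point read or range scan walks a single contiguous leaf path.
    const double btree_write_pages_per_update = 1.0;
    const double btree_read_pages_per_lookup  = 1.0;

    // LSM tree: updates are batched in a memtable and flushed sequentially, so
    // the immediate write cost per update is a fraction of a page (ignoring
    // later compaction); point reads may probe every level (Bloom filters help).
    const double lsm_write_pages_per_update = 1.0 / batch;
    const double lsm_read_pages_per_lookup  = lsm_levels;

    std::printf("B+ tree: %.2f page writes/update, %.2f page reads/lookup\n",
                btree_write_pages_per_update, btree_read_pages_per_lookup);
    std::printf("LSM:     %.2f page writes/update, %.2f page reads/lookup\n",
                lsm_write_pages_per_update, lsm_read_pages_per_lookup);
    return 0;
}
```

Even under these toy assumptions, the shape of the bill is visible: the LSM tree pays for cheap, sequential writes with extra read probes and background compaction; the B+ tree pays a full random page write per update to keep reads cheap.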
Not convinced? There's a pointed paper by Frank McSherry, Michael Isard, and Derek Murray, "Scalability! But at what COST?", that asks a blunt question: how many machines do you need before your hip, cool parallel system beats a competent single thread? They call the metric "COST" (Configuration that Outperforms a Single Thread), and for many published systems the answer is "a lot," sometimes hundreds of cores. If a better algorithm or data structure obliterates your need for a cluster, that's not simply an engineering flex; it's millions of dollars saved and a smaller attack surface.
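To see what the metric actually measures, here is a toy calculation in the same spirit; the runtimes are invented, and the perfect-linear-scaling assumption is generous to the distributed system.

```cpp
#include <cstdio>

// Toy illustration of the COST idea: the core count at which a "scalable"
// system finally outperforms a competent single thread. Numbers are invented.
int main() {
    const double single_thread_secs   = 300.0;   // a tuned single-threaded run
    const double cluster_one_core_secs = 2500.0; // the scalable system on one core
    // Generously assume the cluster scales perfectly linearly with core count.
    int cores = 1;
    while (cluster_one_core_secs / cores > single_thread_secs) ++cores;
    std::printf("COST = %d cores\n", cores);     // 9 cores just to break even
    return 0;
}
```

Nine cores to merely match one good thread is the optimistic case here; the paper's point is that real systems often need far more, and some never catch up at all.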