CS101 teaches Big O notation, but in production, memory rules. Ulrich Drepper's classic 2007 paper, "What Every Programmer Should Know About Memory," explains why code that looks linear can behave super-linearly once you thrash caches or wander across NUMA boundaries. Data structures and access patterns that maximize locality (think B-trees with page-sized nodes, Structure of Arrays (SoA) versus Array of Structures (AoS) layouts, ring buffers) are not academic details; they're the difference between CPUs working and CPUs waiting. Here's the executive version: cache-friendly data structures turn compute you're already paying for into throughput you can actually use.
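To make the locality point concrete, here is a minimal C++ sketch of the two layouts; the `ParticleAoS` and `ParticlesSoA` types and their field names are illustrative inventions, not from any particular codebase. Summing one field over the SoA layout streams only the bytes it needs, while the AoS version drags three unused fields through the cache on every iteration.

```cpp
#include <vector>

// Array of Structures (AoS): each particle's fields are interleaved in memory.
// A scan over `x` still pulls y, z, and mass into cache, wasting bandwidth.
struct ParticleAoS { float x, y, z, mass; };

float sum_x_aos(const std::vector<ParticleAoS>& ps) {
    float total = 0.0f;
    for (const auto& p : ps) total += p.x;  // 4 useful bytes per 16-byte stride
    return total;
}

// Structure of Arrays (SoA): each field is contiguous, so the same scan
// touches only `x` and every cached byte is useful.
struct ParticlesSoA {
    std::vector<float> x, y, z, mass;
};

float sum_x_soa(const ParticlesSoA& ps) {
    float total = 0.0f;
    for (float v : ps.x) total += v;
    return total;
}
```

Both functions do the same arithmetic; the difference shows up in cache-line utilization and memory bandwidth, which is exactly where the "linear-looking" code starts to diverge from linear behavior.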
Storage engines are data structures with budgets
Every database storage engine is a data structure with a profit-and-loss statement attached. B+ trees, optimized for fast disk-based reads and range scans, trade higher write costs (write amplification) for excellent read locality; log-structured merge-trees (LSM trees) flip that, optimizing for high write rates at the cost of compaction work and read amplification. Neither is better. Each is a conscious algorithmic trade-off with direct operational consequences (IOPS, SSD wear, CPU burn during compaction). If your workload is write-heavy with batched reads, an LSM tree makes sense. If your workload is read-latency sensitive with range scans, B+ trees often win. Your choice is a data-structure selection problem mapped onto cloud bills and SLOs. Treat it that way.
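As a rough illustration of how that trade-off shows up in numbers, here is a back-of-envelope sketch. Every parameter (page size, record size, level count) is invented, and the formulas are crude textbook approximations, not measurements of any real engine.

```cpp
#include <cstdio>

// Crude per-operation cost estimates for a B+ tree versus a leveled LSM tree.
// All parameters are assumptions for illustration only.
int main() {
    const double page_bytes   = 4096.0; // B+ tree page / LSM block size
    const double record_bytes = 128.0;  // size of one logical record
    const double lsm_levels   = 4.0;    // levels a point read may have to probe
    const double batch        = page_bytes / record_bytes; // records per flushed page

    // B+ tree: each update dirties (and eventually rewrites) one random page,
    // but a point read or range scan walks a single contiguous leaf path.
    const double btree_write_pages_per_update = 1.0;
    const double btree_read_pages_per_lookup  = 1.0;

    // LSM tree: updates are batched in a memtable and flushed sequentially, so
    // the immediate write cost per update is a fraction of a page (ignoring
    // later compaction); point reads may probe every level (Bloom filters help).
    const double lsm_write_pages_per_update = 1.0 / batch;
    const double lsm_read_pages_per_lookup  = lsm_levels;

    std::printf("B+ tree: %.2f page writes/update, %.2f page reads/lookup\n",
                btree_write_pages_per_update, btree_read_pages_per_lookup);
    std::printf("LSM:     %.2f page writes/update, %.2f page reads/lookup\n",
                lsm_write_pages_per_update, lsm_read_pages_per_lookup);
    return 0;
}
```

Even under these toy assumptions, the shape of the bill is visible: the LSM tree pays for cheap, sequential writes with extra read probes and background compaction; the B+ tree pays a full random page write per update to keep reads cheap.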
Not convinced? There's a pointed paper by Frank McSherry, Michael Isard, and Derek Murray, "Scalability! But at what COST?", that asks a blunt question: how many machines do you need before your hip, cool parallel system beats a competent single thread? They call the metric "COST" (Configuration that Outperforms a Single Thread), and for many published systems the answer is "a lot," sometimes hundreds of cores. If a better algorithm or data structure obliterates your need for a cluster, that's not simply an engineering flex; it's millions of dollars saved and a smaller attack surface.
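To see what the metric actually measures, here is a toy calculation in the same spirit; the runtimes are invented, and the perfect-linear-scaling assumption is generous to the distributed system.

```cpp
#include <cstdio>

// Toy illustration of the COST idea: the core count at which a "scalable"
// system finally outperforms a competent single thread. Numbers are invented.
int main() {
    const double single_thread_secs   = 300.0;   // a tuned single-threaded run
    const double cluster_one_core_secs = 2500.0; // the scalable system on one core
    // Generously assume the cluster scales perfectly linearly with core count.
    int cores = 1;
    while (cluster_one_core_secs / cores > single_thread_secs) ++cores;
    std::printf("COST = %d cores\n", cores);     // 9 cores just to break even
    return 0;
}
```

Nine cores to merely match one good thread is the optimistic case here; the paper's point is that real systems often need far more, and some never catch up at all.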