Understanding Moore’s Law Today

Authors: Jacob Sussmilch

The Observation Everyone Misquotes

Moore’s Law is an empirical observation Gordon Moore made in 1965: the number of components on an integrated circuit — at the price point where each component was cheapest to manufacture — was doubling roughly every year, and he predicted this trend would continue. The original claim was as much economic as physical. It wasn’t just that chips were getting more complex; it was that they were getting more complex at a cost the market could absorb. He revised the cadence to every two years in 1975, and as transistors became the dominant component type, the industry increasingly measured complexity in transistor counts. That version became the standard formulation. For five decades, the trend held. The prediction became self-fulfilling: the semiconductor industry organized its R&D roadmaps around maintaining the pace Moore described, and capital allocation followed.
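The cadence claim is easy to sanity-check with arithmetic. A minimal sketch, using the approximate 1971 starting count of the Intel 4004 (the specific function and numbers here are illustrative, not a historical dataset):

```python
# Back-of-envelope check of the two-year doubling cadence.
# Starting point: Intel 4004 (1971), roughly 2,300 transistors.

def projected_transistors(start_count, start_year, year, doubling_years=2.0):
    """Project transistor count under a fixed doubling cadence."""
    doublings = (year - start_year) / doubling_years
    return start_count * 2 ** doublings

# 50 years at a two-year cadence is 25 doublings:
# ~2,300 * 2**25 is roughly 77 billion, in the same range as the
# largest chips actually shipping around 2021.
print(f"{projected_transistors(2300, 1971, 2021):.3g}")
```

The point is not the precision of the projection but its shape: a fixed doubling cadence compounds into ten orders of magnitude over five decades, which is why small changes in the cadence matter so much.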

What Moore’s Law actually says is narrow: transistor density on a chip doubles approximately every two years. What people think it says is broad: computers get twice as fast every two years. These are not the same claim, and the gap between them is where most confusion about the current state of computing lives. It is also worth noting that Moore’s Law comes from an era when the relevant chip was the CPU — a general-purpose processor where more transistors reliably translated into more performance across all workloads. That assumption no longer holds in a world where the most consequential chips are GPUs, TPUs, and other specialized accelerators whose performance depends heavily on the specific workload they are running.

What Shifted

Transistor density scaling has slowed. The cadence has stretched from roughly two years to closer to three, and each new process node (5nm, 3nm, 2nm) requires exponentially more capital to develop. TSMC’s fabrication costs per transistor have stopped falling at the historical rate. The economic version of Moore’s Law — that you get twice the transistors for the same price — is largely over.

But here is what did not end: the rate of improvement in useful computation. By 2018, the compute used in the largest AI training runs had grown by roughly 300,000x since 2012. That figure reflects both hardware gains and a massive increase in willingness to spend — training budgets went from thousands to billions of dollars. The raw number overstates efficiency improvement. However, controlling for spend, the gains are real and far faster than Moore’s Law ever delivered. They come from a different source — not from shrinking transistors, but from specializing hardware, exploiting parallelism, and optimizing software for the specific workloads that matter.
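The arithmetic behind that figure is worth making explicit. A 300,000x increase over roughly six years implies a doubling time far shorter than Moore's two-year cadence (the six-year window is an approximation):

```python
import math

# What doubling time does a 300,000x increase over ~6 years imply?
growth = 300_000
years = 6.0

doublings = math.log2(growth)                  # about 18 doublings
doubling_time_months = years * 12 / doublings

print(f"{doublings:.1f} doublings, one every {doubling_time_months:.1f} months")
# Compare: Moore's Law cadence is one doubling every ~24 months.
```

A doubling time of roughly four months versus twenty-four is the gap between the AI compute trend and the transistor trend, before controlling for the spending increase that drives much of it.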

This is the critical distinction. Moore’s Law was a general-purpose scaling trend. A faster CPU made everything faster — spreadsheets, games, databases, simulations. The improvements were architecture-agnostic. You didn’t need to know what the transistors would be used for in order to benefit from having more of them.

The locus of progress shifted to special-purpose scaling. GPUs are not better CPUs. They are massively parallel processors optimized for the specific math that neural networks require — matrix multiplications, primarily. TPUs go further, stripping away general-purpose functionality to accelerate a narrower set of operations even more efficiently. The gains are real, but they accrue to specific workloads. A modern AI accelerator is dramatically better at training neural networks than a chip from 2012. It is not dramatically better at running a database query.
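The arithmetic case for specializing around matrix multiplication is simple: matmul FLOP counts grow with the product of all three dimensions, so at neural-network sizes a single layer's matmuls dwarf everything else on the chip. A quick sketch, with illustrative sizes that do not correspond to any specific model:

```python
# FLOPs for an (m, k) @ (k, n) matrix multiplication: ~2*m*n*k
# (one multiply and one add per inner-product term).

def matmul_flops(m, k, n):
    return 2 * m * n * k

# Illustrative: a batch of 4,096 token positions pushed through a
# 4,096 x 4,096 weight matrix is on the order of 10^11 FLOPs,
# for a single matmul in a single layer.
print(f"{matmul_flops(4096, 4096, 4096):.3g} FLOPs")
```

When one operation accounts for nearly all the work, a chip that does only that operation well beats a chip that does everything adequately, which is the design logic of the GPU tensor core and the TPU.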

The Consequences of Specialization

This shift from general to special-purpose scaling has implications that the AI discourse largely ignores.

First, it fragments the compute landscape. Under Moore’s Law, there was one frontier: the fastest general-purpose processor. Today there are many frontiers. The best chip for training large language models (NVIDIA H100) is not the best chip for inference (purpose-built inference accelerators optimize differently), which is not the best chip for edge deployment (Apple’s Neural Engine), which is not the best chip for scientific simulation (still often CPUs or FPGAs). There is no single “compute frontier” anymore. There are multiple frontiers, each advancing at different rates, for different workloads.

Second, it makes compute gains contingent on software. A GPU’s theoretical peak performance (measured in FLOPS) is almost never achieved in practice. The gap between theoretical and actual performance (hardware utilization) is where much of the real scaling happens. FlashAttention, for example, restructured the attention computation to keep intermediate results in fast on-chip SRAM rather than repeatedly reading and writing slower HBM, delivering 2-4x speedups on the same hardware. Mixed-precision training doesn’t require new chips either; it uses existing hardware more efficiently by computing at lower numerical precision where full precision isn’t needed.
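The mixed-precision idea can be sketched in a few lines. This is illustrative NumPy, not any framework's actual training machinery: keep a float32 reference, but cast the operands down to float16 so each matrix multiply moves half the bytes:

```python
import numpy as np

# Illustrative sketch of the mixed-precision tradeoff: float16 operands
# halve memory traffic at a small cost in numerical accuracy.
rng = np.random.default_rng(0)
a = rng.standard_normal((256, 256)).astype(np.float32)
b = rng.standard_normal((256, 256)).astype(np.float32)

full = a @ b                                    # float32 baseline
mixed = (a.astype(np.float16) @ b.astype(np.float16)).astype(np.float32)

# Half the bytes moved per operand...
print(a.astype(np.float16).nbytes / a.nbytes)   # 0.5
# ...at a small, usually tolerable, loss of accuracy.
rel_err = np.abs(full - mixed).max() / np.abs(full).max()
print(f"max relative error ~{rel_err:.1e}")
```

In real training systems the recipe is more careful than this sketch (float32 master weights, loss scaling, selective upcasting), but the economics are the same: fewer bytes per operation on unchanged hardware.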

This means that raw hardware specs increasingly understate or overstate actual capability, depending on how good the software stack is. DeepSeek’s engineers trained V3 on 2,048 NVIDIA H800 GPUs — an export-compliant variant with roughly half the interconnect bandwidth of the H100s available to American labs. They compensated through PTX-level programming and architectural innovations like Multi-head Latent Attention that reduced memory bandwidth requirements. The hardware was objectively worse. But the effective compute was competitive.

Third, it shifts the bottleneck from transistors to interconnects and memory. The binding constraint on modern AI training is often not how fast individual chips compute, but how fast data moves between them. Training a trillion-parameter model across thousands of GPUs requires moving enormous volumes of data through interconnects (NVLink, InfiniBand) and in and out of memory (HBM). These components have their own scaling trajectories, and they are not keeping pace with compute scaling. Memory bandwidth, in particular, is widely regarded as the most significant bottleneck in current AI hardware — the “memory wall” that limits how large a model you can efficiently serve. Power consumption is the other constraint that transistor scaling used to solve for free: Dennard scaling — the principle that smaller transistors use proportionally less power — broke down by the mid-2000s. Modern AI training clusters consume megawatts, and the energy cost of scaling is now a first-order strategic concern, not a line item.
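A back-of-envelope roofline calculation shows why bandwidth, not peak FLOPS, is often the binding constraint. The numbers below are illustrative round figures in the ballpark of a current datacenter accelerator, not any vendor's spec sheet:

```python
# Roofline sketch of the "memory wall". Figures are illustrative.
peak_flops = 1.0e15     # ~1 PFLOP/s of low-precision matmul throughput
mem_bw     = 3.0e12     # ~3 TB/s of HBM bandwidth

# FLOPs per byte needed to keep the compute units busy.
ridge = peak_flops / mem_bw
print(f"ridge point: {ridge:.0f} FLOPs/byte")

# Serving one token at batch size 1: each 2-byte weight is read once
# and used for ~2 FLOPs (one multiply-add), so arithmetic intensity is ~1.
intensity = 2 / 2
utilization_cap = min(1.0, intensity / ridge)
print(f"compute utilization cap at batch 1: {utilization_cap:.2%}")
```

On these figures, a workload needs hundreds of FLOPs per byte fetched to be compute-bound, while low-batch inference delivers roughly one; the chip idles waiting on memory no matter how fast its arithmetic units are. That is the memory wall in a single division.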

What This Means for AI Strategy

If you still think in Moore’s Law terms — compute gets cheaper on a predictable schedule, so wait and the problem solves itself — you will misread the current landscape.

The correct framing is that compute improvement is now earned, not given. Under Moore’s Law, you could plan for next year’s chips being roughly twice as capable, regardless of your specific workload. Today, the gains come from architectural innovation (designing chips for specific workloads), software optimization (closing the gap between theoretical and actual performance), and systems engineering (solving interconnect and memory bottlenecks). These are research problems, not manufacturing problems.

This has strategic consequences. For AI labs, it means that compute efficiency research — the work that determines how much useful computation you extract from a given chip — is as important as compute procurement. Buying more GPUs helps, but the marginal return on the next thousand GPUs is lower than the marginal return on a 2x improvement in hardware utilization. DeepSeek demonstrated this concretely: with fewer and less advanced GPUs than their American competitors, they produced competitive models by being better at using what they had.
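The marginal-return claim is just arithmetic. A toy sketch, with hypothetical placeholder figures rather than real fleet numbers:

```python
# Two ways to double effective compute: buy twice the GPUs, or double
# hardware utilization. All figures here are hypothetical placeholders.

def effective_compute(num_gpus, peak_per_gpu, utilization):
    return num_gpus * peak_per_gpu * utilization

base      = effective_compute(1000, 1.0, 0.30)  # 1,000 GPUs at 30% utilization
more_gpus = effective_compute(2000, 1.0, 0.30)  # double the fleet
better_sw = effective_compute(1000, 1.0, 0.60)  # double the utilization

assert more_gpus == better_sw  # identical effective compute...
# ...but one path doubles capital and power spend, while the other is an
# engineering investment that applies to every chip the fleet ever adds.
```

The symmetry in the arithmetic hides an asymmetry in cost, which is why efficiency research competes so favorably with procurement.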

For chip companies, it means the market is fragmenting. NVIDIA’s dominance rests on a software ecosystem (CUDA) as much as on hardware superiority. Competitors like AMD, Intel, Groq, Cerebras, and various startups are attacking different segments of the fragmented compute landscape — inference optimization, sparse computation, wafer-scale integration. No single architecture will dominate the way x86 CPUs once dominated servers and desktops.

For policymakers, it means that export controls on advanced chips are a less complete lever than they appear. Restricting access to cutting-edge fabrication nodes constrains one dimension of compute scaling — the hardware dimension. It does not constrain the software and architectural dimensions, which are where the fastest gains are currently happening. China’s response to chip restrictions has been exactly what this analysis would predict: compensating for hardware limitations with software innovation. That said, software optimization gains may not compound indefinitely the way transistor scaling did — you can only close the utilization gap once. If the hardware gap widens over successive chip generations while software gains plateau, export controls may prove more binding in the long run than the current moment suggests.

The Real Lesson

The most important question in AI compute is no longer “when does the next process node arrive?” It is “how efficiently are we using the hardware we already have?” The answer, for most organizations, is… not very. And that gap — between theoretical and actual performance — is where the next generation of competitive advantage lives.