Monday, February 24, 2025

DeepSeek showed us that scaling isn’t all you need to solve AI compute



Returning closer to the present day, we find the commercial development of AI beholden to "The Bitter Lesson." After Nvidia's CUDA made efficient tensor operations on GPUs practical and deep networks like AlexNet drove unprecedented progress across many fields, the once-diverse methods competing on machine learning benchmarks converged on a single strategy: throwing more compute at deep learning.

There's perhaps no greater example of the bitter lesson than large language models, which displayed incredible emergent capabilities as they were scaled up over the past decade. Could we really reach artificial general intelligence (AGI), that is, systems matching the archetypal depictions of AI seen in Blade Runner or 2001: A Space Odyssey, simply by adding more parameters to these LLMs and more GPUs to the clusters they're trained on?

My work at UCSD was predicated on the belief that this scaling would not lead to true intelligence. And, as we've seen in recent reports out of top AI labs like OpenAI and from luminaries like François Chollet, the way we've been approaching deep learning has hit a wall. "Now everybody is searching for the next big thing," as OpenAI co-founder Ilya Sutskever aptly puts it. Is it possible that, with techniques like applying reinforcement learning to LLMs à la OpenAI's o3, we are ignoring the wisdom of the bitter lesson (though these techniques are undoubtedly computationally intensive)? What if we sought to understand a "theory of everything" for learning, and then doubled down on that?

We have to deconstruct, then reconstruct, how AI models are trained

Rather than relying on black-box approximations, at UCSD we developed breakthrough technology that understands how neural networks actually learn. Deep learning models feature artificial neurons loosely similar to our own: data is filtered forward through them, and error signals are then propagated backward through the network (backpropagation) so that it learns features in the data (this backward step is alien to biology). It is this feature learning mechanism that drives the success of AI in fields as disparate as finance and healthcare.
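To make the forward-then-backward loop concrete, here is a minimal, generic sketch of backpropagation in plain Python. It is a textbook illustration, not the UCSD technology described above: the toy network (two inputs, a layer of tanh neurons, one linear output), the XOR task, the learning rate, and every function name are hypothetical choices made only for this example.

```python
import math
import random

# Hypothetical toy network: inputs -> n_hidden tanh neurons -> 1 linear output.
# params is the tuple (W1, b1, w2, b2).

def init_params(n_in, n_hidden, seed=0):
    """Small random weights; a fixed seed makes runs reproducible."""
    rng = random.Random(seed)
    W1 = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]
    return W1, b1, w2, 0.0

def forward(params, x):
    """Forward pass: filter the input through the hidden neurons, then combine."""
    W1, b1, w2, b2 = params
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y = sum(wj * hj for wj, hj in zip(w2, h)) + b2
    return h, y

def loss_and_grads(params, data):
    """Mean squared error (halved) and its gradients, computed by backpropagation."""
    W1, b1, w2, b2 = params
    n = len(data)
    loss, gb2 = 0.0, 0.0
    gW1 = [[0.0] * len(row) for row in W1]
    gb1 = [0.0] * len(b1)
    gw2 = [0.0] * len(w2)
    for x, target in data:
        h, y = forward(params, x)
        err = y - target
        loss += 0.5 * err * err / n
        dy = err / n  # dLoss/dy for this example
        gb2 += dy
        for j, hj in enumerate(h):
            gw2[j] += dy * hj
            # Chain rule back through the output weight and tanh' = 1 - tanh^2.
            dz = dy * w2[j] * (1.0 - hj * hj)
            gb1[j] += dz
            for i, xi in enumerate(x):
                gW1[j][i] += dz * xi
    return loss, (gW1, gb1, gw2, gb2)

def sgd_step(params, grads, lr):
    """Plain gradient descent: move every parameter against its gradient."""
    (W1, b1, w2, b2), (gW1, gb1, gw2, gb2) = params, grads
    W1 = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W1, gW1)]
    b1 = [b - lr * g for b, g in zip(b1, gb1)]
    w2 = [w - lr * g for w, g in zip(w2, gw2)]
    return W1, b1, w2, b2 - lr * gb2

def train(data, n_hidden=8, lr=0.1, steps=2000, seed=0):
    """Repeat forward pass + backpropagation + update; record the loss each step."""
    params = init_params(len(data[0][0]), n_hidden, seed)
    history = []
    for _ in range(steps):
        loss, grads = loss_and_grads(params, data)
        history.append(loss)
        params = sgd_step(params, grads, lr)
    return params, history

# XOR is not linearly separable, so the hidden neurons must learn useful features.
XOR = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
```

Calling `params, history = train(XOR)` and comparing `history[0]` with `history[-1]` shows the loss falling as the backward pass adjusts the hidden neurons. Everything is written out by hand, rather than with an autodiff library, so that each chain-rule step of the backward pass is visible.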
