OpenAI's Highly Capable New o3 Model Has Large Implications for Its Future
The Debate About Artificial General Intelligence and What It Indicates for OpenAI’s Future
In AI research, Artificial General Intelligence (AGI) is often described as the kind of intelligence that could match or even surpass human abilities across a wide range of tasks. Developing AGI would mark a significant milestone in the progress of AI engineering and the capability of autonomous systems. To measure how close we are to achieving AGI, researchers use tests like the Abstraction and Reasoning Corpus (ARC). These tests challenge AI systems to solve puzzles that require creativity and flexibility—skills humans take for granted but current AI struggles with. On Friday, December 20th, ARC Prize co-founder François Chollet said that the performance of OpenAI’s new o3 model “represents a significant breakthrough in getting AI to adapt to novel tasks.”
However, he also pointed out that “while the new model is very impressive and represents a big milestone on the way towards AGI, I don't believe this is AGI.” He notes that the model still falls short on some relatively simple tasks in the first level of these tests, called ARC-AGI-1, and is unlikely to tackle the tougher challenges of the next level, ARC-AGI-2. His comments have kicked off enormous debate in the AI research community, and to understand why this matters, it’s important to explore what these tests reveal about how far we’ve come—and how far we still have to go—in the quest for AGI.
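To make the tests concrete: ARC tasks are distributed as small JSON files in which each task holds a few demonstration input/output pairs plus held-out test inputs, and every grid is a small matrix of integers from 0 to 9 encoding colors. Below is a minimal sketch of the idea in Python; the toy task and the hand-coded mirror rule are our own illustration, not an official example or solver.

```python
# An ARC-style task: a few demonstration pairs plus a held-out test input.
# Grids are lists of lists of ints in 0-9, each int encoding a color.
# This toy task's hidden rule: reflect each grid left-to-right.
task = {
    "train": [
        {"input": [[1, 0], [2, 0]], "output": [[0, 1], [0, 2]]},
        {"input": [[3, 4], [0, 0]], "output": [[4, 3], [0, 0]]},
    ],
    "test": [{"input": [[5, 0], [0, 6]]}],
}

def mirror(grid):
    """One hand-coded candidate rule: flip every row horizontally."""
    return [row[::-1] for row in grid]

# A deliberately brittle "solver": check a single hypothesis against the demos.
# Humans induce rules like this at a glance; getting a machine to discover the
# right rule for hundreds of never-before-seen tasks is the hard part.
if all(mirror(pair["input"]) == pair["output"] for pair in task["train"]):
    print(mirror(task["test"][0]["input"]))  # [[0, 5], [6, 0]]
```

A person glances at the demonstrations and simply sees the reflection; a system scores on ARC only if it can induce a different rule like this for each novel task, which is exactly the kind of adaptability Chollet is probing.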
At its simplest, AGI has long been defined in practice as a model more “intelligent” than what we have now. Historically, though, the term has been more technical, referring to a system that can perform a wide range of diverse and novel tasks at the human level or better. OpenAI defines AGI as systems that are either “generally smarter than humans” or “highly autonomous systems that outperform humans at most economically valuable work,” depending on where you look. This is in contrast to “weak AI,” which excels only within narrow domains, like AlphaFold’s exceptional success in predicting protein structures. AlphaFold, despite its brilliance, would fail the ARC tests completely—it isn’t built to “think” beyond its specific task.
Transformer-based large language models (LLMs), like OpenAI’s, are currently the leading contenders for achieving AGI-like capabilities. They show remarkable versatility and adaptability, but it’s crucial to keep perspective: adaptability alone doesn’t mean AGI has been reached. In fact, Chollet himself has argued that LLMs, no matter how flexible they become, will never truly achieve AGI.
This context is what makes the ARC Prize so significant. The tests are explicitly designed to challenge systems on abstract reasoning and general problem-solving—skills that humans excel at but AIs still struggle with. In fact, the very name “ARC-AGI” ties these benchmarks to the concept of AGI. Chollet has even suggested that surpassing a specific threshold on evaluations like ARC could signal the arrival of AGI. Yet, despite o3’s impressive progress, it falls short of meeting these criteria.
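In operational terms, “surpassing a threshold” just means scoring above a target accuracy on a hidden evaluation set; the ARC Prize, for instance, set its grand-prize bar at 85%. Here is a hedged sketch of that bookkeeping, where “solver” and “tasks” are placeholders for a real submission and the private evaluation set, and details like the limited number of attempts allowed per test input are ignored.

```python
def arc_score(solver, tasks, threshold=0.85):
    """Return (accuracy, passed): the fraction of tasks solved exactly.

    A task counts only if the solver reproduces every one of its test
    output grids exactly; ARC gives no partial credit for almost-right grids.
    """
    solved = sum(
        all(solver(task["train"], case["input"]) == case["output"]
            for case in task["test"])
        for task in tasks
    )
    accuracy = solved / len(tasks)
    return accuracy, accuracy >= threshold
```

All-or-nothing grading is part of why the benchmark stays hard: a model that “almost” infers the rule scores exactly as well as one that never tries.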
This evolving discourse highlights a shift in how AGI is defined. Recently, Chollet proposed a more refined and measurable benchmark for AGI: an AI that can correctly answer any question answerable by humans. Reflecting on o3’s limitations, he wrote on X, “This shows that it's still feasible to create unsaturated, interesting benchmarks that are easy for humans, yet impossible for AI – without involving specialist knowledge. We will have AGI when creating such evaluations becomes outright impossible.” By this updated definition, AGI is no longer just about general “intellectual” superiority; it’s about universal solvability, a system capable of “understanding” and addressing any human-posed problem.
This redefinition moves AGI closer to what many have traditionally associated with superintelligence—intelligence that far surpasses human ability in virtually every domain. The boundary between AGI and superintelligence is becoming increasingly blurred, with AGI now seen less as a stepping stone and more as synonymous with the latter.
This definitional drift is not new. AGI has always been a slippery concept, shifting with each generation of AI advancements. For years, it has essentially meant “a model smarter than what we currently have.” Inevitably, this line of thinking leads to “the most intelligent thing we can imagine”—superintelligence. But this time, the stakes for pinning down a clear definition feel higher than ever, especially as OpenAI finds itself at the center of this debate.
The evolving definition of AGI has significant implications for OpenAI’s relationship with Microsoft, its largest investor and key strategic partner. Currently, OpenAI operates under a governance structure that restricts Microsoft’s access to any eventual AGI technology, an arrangement designed to ensure AGI is not misused for purely commercial purposes. This restriction, enshrined in OpenAI’s corporate bylaws, reflects the organization's founding principles of prioritizing safety, ethics, and broad societal benefit over profit. However, as OpenAI pivots toward a more commercially viable model to sustain its rapid growth and compete with well-funded rivals such as Google and Amazon, these provisions are increasingly seen as a barrier to attracting the levels of capital required to achieve AGI.
Reports suggest that OpenAI is considering removing this restriction as part of a broader restructuring aimed at aligning its governance with its financial needs. By granting Microsoft access to AGI, OpenAI could secure further investment on top of Microsoft’s already substantial $14 billion stake in the company. This access would not only strengthen Microsoft’s position as a leader in the AI industry but also deepen its integration of OpenAI’s technologies into its suite of products, such as Azure and Office 365. However, this move risks diluting OpenAI’s ethical oversight, as it would transfer significant influence over AGI technology to an outside for-profit corporation whose priorities may not always align with societal well-being.
The best way to think about OpenAI’s incentives is to imagine that Chollet had instead declared the new o3 model to be AGI. Such an announcement would almost certainly have required a response from OpenAI’s board, which is responsible for making the final determination regarding AGI. It is important to note that OpenAI’s current business structure is a for-profit LLC governed by a non-profit board. If the board were to declare o3 to be AGI, the current structure, coupled with the Microsoft agreement, would likely create drastic shifts in OpenAI’s ability to make money from o3. Given that OpenAI has yet to turn a profit and spends an enormous amount of money on its normal operations, the LLC is financially disincentivized from declaring AGI right now.
This disincentive won't last forever; OpenAI is working to convert into a for-profit entity. If that conversion comes to fruition, not only is it unclear who would determine whether AGI has been achieved, but without an independent board imposing bounds on OpenAI’s financial trajectory, a for-profit OpenAI could feasibly shirk ethical responsibility in favor of profit. In this hypothetical, it is also unclear whether announcing AGI would restrict the new for-profit entity the way it would under the current structure. Sam Altman, CEO of OpenAI, has also made his disdain for the current restrictions known, stating:
“When we started, we had no idea we were going to be a product company or that the capital we needed would turn out to be so huge. If we knew those things, we would have picked a different structure.”
With all these moving pieces, pinning down a definition of AGI is of the utmost importance. OpenAI publicly offers two differing definitions of AGI, and an internal tiered system has been discussed in interviews with OpenAI employees, so the term remains extremely slippery. Whether AGI is equivalent to superintelligence or can be determined by arbitrary benchmarks has deep and lasting effects on OpenAI as a business and on the tech sector as a whole.
Thanks for reading Curiosity is All You Need! While writing this, we gained repeated satisfaction from placing words like “intelligence,” “thinking,” or “understanding” in quotes. Given that “AGI” is a microcosm of this sort of imprecise language, tell us what you think about its usefulness as a label for describing these systems! We are working on a piece that tackles adjacent ideas and would love to hear your thoughts.
If you want to read Chollet’s full statement, you can find it here.