Beyond Language: Why Judgment, Not LLMs, Will Define the Future of AGI
- Tinchi Chan, CEO of Infistate
- Dec 17, 2025
- 6 min read

As generative AI plateaus, researchers and investors shift focus to machines that reason, discern, and decide, marking a new era for artificial intelligence.
In April 2024, a quiet but profound shift began to ripple through the world of artificial intelligence. For years, the relentless progress of Large Language Models (LLMs)—like OpenAI’s GPT-4 and Google’s Gemini—had fueled wild speculation about the arrival of Artificial General Intelligence (AGI). Investors poured billions into deep learning, and tech giants raced to outdo one another with ever-larger, more fluent models.
But as the dust settled on a year of viral chatbot demos and synthetic media marvels, a nagging consensus emerged among researchers, entrepreneurs, and policymakers: LLMs, for all their prowess, are not enough. The future of AGI, it turns out, will be defined not by machines that can mimic conversation, but by those that can exercise judgment.

The Judgment Gap
LLMs have dazzled the world with their ability to generate coherent prose, summarize dense documents, and even write code. Yet beneath the surface, their limitations have become glaringly obvious. They hallucinate facts, struggle with logical reasoning, and often fail to understand context or nuance—qualities intrinsic to human judgment.
“We’ve reached a plateau with language models,” says Dr. Samantha Lee, Chief Scientist at MindFrame AI, a Silicon Valley startup. “They’re incredible at pattern recognition, but utterly incapable of making value-based decisions or navigating the gray areas of life. That’s not AGI. That’s autocomplete on steroids.”
This “judgment gap” is not merely academic. In domains like healthcare, law, and finance, the difference between correct syntax and correct decision-making is a matter of life and death. A chatbot might draft a convincing medical report, but it cannot weigh the risks and trade-offs inherent in complex diagnoses. A legal AI can summarize case law, but it cannot assess fairness or intent.
“Judgment is the missing ingredient,” says Lee. “And that’s where the next wave of AI innovation is heading.”
Why LLMs Stalled
The story of LLMs is one of exponential growth—and exponential disappointment. GPT-3’s 175 billion parameters stunned the world in 2020, but the magic was largely statistical. These models work by predicting the next word in a sequence, trained on vast swathes of internet text. Their fluency is uncanny, but their “understanding” is shallow.
“Increasing model size brought diminishing returns,” explains Dr. Rohan Patel, an AI ethicist at MIT. “LLMs were supposed to unlock reasoning, but instead, we got more eloquent nonsense. They struggle with causality, self-reflection, and the ability to learn from new experiences.”
The result has been a series of high-profile failures. In March 2024, an LLM-powered legal assistant fabricated case law in a federal court filing, prompting a nationwide review of AI use in the judiciary. In finance, automated trading bots driven by language models misinterpreted regulatory statements, triggering multimillion-dollar losses.
“LLMs are not AGI,” says Patel. “They’re a foundation, not a destination.”
Judgment: The Next Frontier
So what does it mean for an AI to exercise judgment? More than logic or deduction, judgment is the ability to weigh conflicting evidence, assess risk, and align decisions with human values and goals. It is context-sensitive, adaptive, and often requires creativity.
“Judgment is the faculty that lets us choose between what is technically correct and what is right,” explains Dr. Lina Kovács, a philosopher and AI researcher at Oxford University. “It’s what distinguishes a good doctor from a medical textbook, or a wise judge from a legal database.”
To reach true AGI, experts argue, AI must be able to:
- Evaluate ambiguous scenarios: where rules are unclear or conflicting.
- Incorporate ethical reasoning: balancing efficiency, fairness, and empathy.
- Learn from experience: adapting to feedback and updating beliefs over time.
- Understand context: recognizing when exceptions or novel solutions are warranted.
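One way to picture these capabilities together is a single decision loop that revises a belief with Bayes' rule, discounts it by an ethical cost, and defers to a human near the decision boundary. This is a minimal hypothetical sketch; the names (`judge`, `Decision`) and weights are invented for illustration and do not describe any real system.

```python
from dataclasses import dataclass

@dataclass
class Decision:
    action: str    # what the system recommends
    deferred: bool # True when it hands the call to a human

def bayes_update(prior: float, likelihood: float, evidence_rate: float) -> float:
    """Learn from experience: revise a belief with Bayes' rule."""
    return (likelihood * prior) / evidence_rate

def judge(prior: float, likelihood: float, evidence_rate: float,
          fairness_penalty: float, ambiguity_threshold: float = 0.15) -> Decision:
    belief = bayes_update(prior, likelihood, evidence_rate)
    # Incorporate ethical reasoning: trade raw confidence against a fairness cost.
    score = belief - fairness_penalty
    # Evaluate ambiguous scenarios: near the decision boundary, defer to a human.
    if abs(score - 0.5) < ambiguity_threshold:
        return Decision(action="escalate to human", deferred=True)
    return Decision(action="approve" if score > 0.5 else "decline", deferred=False)

print(judge(prior=0.3, likelihood=0.9, evidence_rate=0.45, fairness_penalty=0.05))
```

With these inputs the updated belief is 0.6, the fairness penalty pulls the score to 0.55, and the loop escalates rather than guessing, which is the deferral behavior the list above asks for.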
The Rise of Judgment-Centric AI
The pivot to judgment is already underway. At the 2024 NeurIPS conference, a new crop of startups and research labs debuted “judgment-centric” AI systems. Unlike LLMs, these models are designed to reason, deliberate, and make decisions under uncertainty.
One such company is Sapiens Logic, based in Toronto. Their flagship product, Athena, is an AI advisor for hospital triage. Rather than simply retrieving information, Athena weighs patient symptoms, hospital resources, and ethical guidelines to recommend care pathways.
“Athena doesn’t just regurgitate protocols,” says Sapiens CEO Dr. Marcus Yuen. “It debates with itself, consults domain experts, and can even defer to a human when faced with moral ambiguity. That’s what judgment looks like.”
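The behavior Yuen describes, weighing clinical urgency against resources and deferring on moral ambiguity, can be sketched in a few lines. Athena's real internals are not public, so the function, weights, and thresholds below are assumptions made purely for illustration.

```python
def triage(urgency: float, bed_availability: float,
           guideline_conflict: bool) -> str:
    """Toy triage recommendation; urgency and bed_availability in [0, 1]."""
    if guideline_conflict:
        # Defer on moral ambiguity rather than guess.
        return "refer to clinician"
    # Weigh the patient's need against hospital resources (invented weights).
    score = 0.7 * urgency + 0.3 * bed_availability
    if score >= 0.6:
        return "immediate care"
    return "standard queue"

print(triage(urgency=0.9, bed_availability=0.2, guideline_conflict=False))
```

The point of the sketch is the escape hatch: unlike a chatbot that always answers, a judgment-oriented system needs an explicit path that returns the decision to a human.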
Meanwhile, DeepMind unveiled a prototype called Deliberator, which uses a hybrid approach combining LLMs with symbolic reasoning and reinforcement learning. It can solve multi-step problems by simulating possible outcomes and choosing the best course of action.
“We’re moving from language models to decision models,” says DeepMind’s head of research, Dr. Priya Singh. “The goal is not to sound human, but to act wisely.”
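The simulate-and-choose idea behind a "decision model" can be reduced to a toy expected-utility search: roll out each candidate action many times and pick the one with the best average outcome. This is a generic sketch of that idea, not DeepMind's actual Deliberator, whose design has not been published here.

```python
import random

def expected_utility(simulate, action, trials=1000, seed=0):
    # Monte Carlo estimate: average the simulated payoff of one action.
    rng = random.Random(seed)
    return sum(simulate(action, rng) for _ in range(trials)) / trials

def choose(actions, simulate):
    # Simulate possible outcomes for each action, pick the best on average.
    return max(actions, key=lambda a: expected_utility(simulate, a))

# Toy world: "safe" pays 1 always; "risky" pays 3 with probability 0.25.
def simulate(action, rng):
    if action == "safe":
        return 1.0
    return 3.0 if rng.random() < 0.25 else 0.0

print(choose(["safe", "risky"], simulate))  # "safe": 1.0 beats roughly 0.75
```

Real systems replace the toy simulator with learned world models and add symbolic constraints, but the loop is the same: deliberate over consequences before acting, rather than emitting the most probable next token.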
Challenges and Controversies
The shift to judgment-based AGI is not without its challenges. Encoding human values—let alone societal norms—into algorithms is a philosophical and technical minefield.
Researchers must grapple with bias, accountability, and the risk of automating flawed human reasoning.
“There’s a danger in assuming judgment is a purely cognitive skill,” warns Dr. Kovács. “It’s shaped by culture, emotion, and lived experience. If we’re not careful, we’ll build machines that reflect our worst prejudices.”
There are also concerns about control and transparency. Judgmental AI systems, by definition, make decisions that are not always easily explained. In high-stakes settings, this “black box” problem could erode trust or lead to catastrophic errors.
“We need mechanisms for oversight and contestability,” says Patel. “Otherwise, we risk creating a new elite of inscrutable machine judges.”
Economic and Social Implications
The implications of judgment-centric AGI extend far beyond the lab. Economists predict a wave of disruption as AI systems move from automating routine tasks to making high-level decisions in management, policy, and creative work.
“Judgment is what separates the CEO from the intern,” says Dr. Anna Morales, a labor economist at Stanford. “If machines can exercise judgment, the last bastion of white-collar work is up for grabs.”
Yet there is also optimism that judgmental AI could augment human capabilities, rather than replace them. In healthcare, for example, AI advisors could help doctors navigate rare diseases or ethical dilemmas. In law, judgmental AI could promote fairness by highlighting implicit biases or suggesting alternative interpretations.
“Think of judgmental AI as a co-pilot, not an autopilot,” says Yuen. “The best outcomes will come from collaboration, not delegation.”
Policy and Regulation: New Frontiers
As the technology matures, regulators are scrambling to keep pace. The European Union is drafting new guidelines for AI systems that exercise independent judgment, focusing on accountability, explainability, and human oversight.
“We need a new social contract for AI,” says EU Commissioner for Digital Affairs, Maria Schmitt. “Judgmental AI must be transparent, contestable, and aligned with democratic values.”
In the U.S., Congress has convened hearings on the role of AI in critical infrastructure, with bipartisan calls for impact assessments and “algorithmic audits” of judgment-based systems.
“We’re entering a new era where machines don’t just follow rules—they choose which rules to follow,” says Senator James Dalton (D-CA). “That raises profound questions about agency and responsibility.”
The Road Ahead
Despite the challenges, the momentum behind judgmental AI is unmistakable. Venture capital is shifting from chatbot startups to companies building decision engines for healthcare, logistics, and governance. Universities are launching new interdisciplinary programs in AI ethics, law, and human-AI collaboration.
For the first time in a decade, the narrative around AGI has changed. The dream is no longer of omniscient talking machines, but of systems that can reason, discern, and act as trusted partners.
“Language was just the first act,” says Lee. “Judgment is the main event.”
Case Study: Judgmental AI in Action
In May 2024, New York Presbyterian Hospital piloted Athena in its emergency department. The results were striking: triage times dropped by 30%, and patient outcomes improved, especially in complex cases involving multiple comorbidities.
“Athena helped us see patterns we’d missed,” says Dr. Julia Rivera, head of emergency medicine. “But more importantly, it knew when to ask for help. Judgment isn’t about having all the answers—it’s knowing when you don’t.”
Similar pilots are underway in legal aid clinics, financial planning firms, and even city governments. In San Francisco, a judgmental AI system is helping allocate scarce housing resources, balancing efficiency, equity, and individual needs.
Conclusion: Judgment as the True Cornerstone
The era of AGI built on language alone is ending. As the world comes to grips with the limitations of LLMs, a new vision is emerging: one where machines are not just eloquent, but wise.
“We’ve spent a decade teaching machines to talk,” says Kovács. “Now we must teach them to think—and, above all, to judge.”
The path to AGI will not be paved with bigger datasets or faster processors, but with the delicate art of discernment. The future belongs to judgment—and to those who can build, guide, and govern it.