Artificial General Intelligence will likely require a general goal, but which one?
AGI, LLMs and the challenge of alignment
This short post reacts to a very interesting discussion between Dwarkesh Patel and Richard Sutton on whether LLMs can achieve AGI. Sutton is a pioneer in AI, and his clear thinking on the underlying issues behind this question is enlightening. I summarise Sutton’s point of view and put it in perspective.

Richard Sutton has been one of the most influential contributors to the design of AI approaches. In a previous post, I leveraged his perspective to discuss how the efficient algorithms used to guide AI agents can help us understand how our hedonic system, which produces feelings of happiness and unhappiness, guides our daily decisions.
Sutton’s perspective on LLMs
LLMs lack a goal for learning
In a recent podcast (which I encourage interested readers to check out), Dwarkesh Patel interviewed Sutton about his views on LLMs. The main takeaway of the discussion is that Sutton poured cold water on the idea that LLMs could achieve general intelligence because of how they are designed.1
A key point made by Sutton is that general intelligence needs learning to emerge. The world is just too rich, too big for its solutions to be pre-packaged in a pre-trained program. Intelligence requires the ability to learn to progressively handle the ever-novel and changing contexts you face when navigating the world.
From that perspective, LLMs are limited because they just deliver the best response according to how their pre-training predicts what a human speaker would say. They do not learn from their own actions by receiving positive or negative feedback on them. Learning requires having a goal and receiving feedback on whether our actions bring us closer to it or not. Positive and negative rewards provide signals that indicate when we are making progress and when we are not.
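To make this contrast concrete, here is a minimal, purely illustrative Python sketch (none of this is an actual LLM API; `pretrained_reply`, `frozen_scores` and `LearningAgent` are hypothetical names): a frozen predictor simply returns whatever its fixed parameters already score highest, while a simple learning agent updates its estimate of each action’s value from the rewards it receives.

```python
import random

# Frozen predictor: returns whatever its fixed scores rank highest; nothing is updated.
def pretrained_reply(prompt, frozen_scores):
    return max(frozen_scores[prompt], key=frozen_scores[prompt].get)

# Learning agent: adjusts its value estimates after each reward, so its behaviour can change.
class LearningAgent:
    def __init__(self, actions, lr=0.1):
        self.values = {a: 0.0 for a in actions}
        self.lr = lr

    def act(self, epsilon=0.1):
        # Mostly exploit the best-known action, occasionally explore.
        if random.random() < epsilon:
            return random.choice(list(self.values))
        return max(self.values, key=self.values.get)

    def update(self, action, reward):
        # Move the value estimate toward the reward actually received.
        self.values[action] += self.lr * (reward - self.values[action])
```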
How humans learn
How do we differ? We have a goal: to achieve rewards (subjective satisfaction). We are born with primary rewards automatically associated with the states we are in, such as the good feeling from being warm, satiated, and safe, and the bad feeling from being too hot or too cold, hungry or thirsty, in danger, and so on. Throughout our lives, we progressively learn how to reach these rewards through trial and error and observation (including imitation). This learning shapes secondary rewards, such as enjoying the money we have in our bank account because it is conducive to obtaining the things that generate primary rewards.2
Pre-training vs. learning
Patel countered that pre-training can achieve a lot. Animals are not born as blank slates; in a sense, their DNA encodes a lot of information about how to navigate the world successfully. Zebras are able to walk a few minutes after being born, and for good reason: they need to in order to survive. In the same way, human babies are born with a lot of information already encoded about how to navigate the world.
How close pre-training can come to general intelligence is, in a sense, an empirical question (and depends on how we define intelligence). However, we can observe that, given the rich nature of the world we live in, evolution’s solution has not been to endow us with a rigid cognitive system purely hard-coded in our DNA, but with the opportunity to learn. As pointed out by Dawkins in The Selfish Gene:
One way for genes to solve the problem of making predictions in rather unpredictable environments is to build in a capacity for learning. — Dawkins (1976)
General intelligence requires a general goal
Learning and pre-training are not necessarily in opposition and, as suggested by Patel, it might be possible to build a learning process on top of LLMs (a possibility that Sutton does not reject outright). If this is technically possible, it leaves a key question open: what ultimate goal should these rewards serve?
For humans, our hedonic system, which shapes when we experience subjective satisfaction, was designed by evolution. Evolution is a process that drives the design of organisms that maximise fitness. Hence, while our proximate goal is to maximise subjective satisfaction, it is as if evolution acted as a designer calibrating these rewards to give us the ultimate goal of maximising fitness.3 This general goal is embedded in our reward system, which guides our learning and shapes how we navigate a wide range of problems.
For AI, we (humans) are the designers. So the question for us is what goal we should give AI to shape the rewards it receives from its actions. For specific tasks, such as winning at chess or Go, the goal is straightforward: to reach a successful outcome in the game. But what goal should we give AI to enable it to successfully navigate different and novel environments?
We touch here on the key issue of alignment between AI and human incentives, as discussed in Eliezer Yudkowsky and Nate Soares’ book If Anyone Builds It, Everyone Dies (2025). If we give AI an internal reward system reflecting an overarching goal to survive and reproduce like us, it would likely be induced to aggressively take over the world and, in the end, replace us.
We might be tempted to think that a simple way to align AI with our incentives would be to use human feedback (whether we like what it does) as a reward, e.g. a thumbs up or down for each LLM answer. However, doing so would import the possibly problematic motives people may have for liking or disliking AI actions. To the extent that we prefer confirmation over harsh but fair feedback, we might encourage AI to become more sycophantic. To the extent that AI might learn how to please us so that we keep using it, it might become manipulative, aiming to keep us interacting with it for longer.
To get AI agents to develop general intelligence, the ability to solve different and novel problems, it is likely that we will need to give them a general goal and the ability to learn how to progress towards this goal from their actions and the resulting feedback. To solve this problem, we will first have to answer the fundamental question: what should this general goal be? I do not think we have an easy answer. Any goal might produce unexpected behaviour that conflicts with our interests.4
My final post on how coalitional game theory and psychology help explain mundane aspects of our lives will be posted later this week. I will discuss how we engage on social media.
1. I also agree with Sutton that model-based learning, where an agent maintains a model of the world to predict the consequences of its actions and updates that model from observation, is important for navigating a big world. I made a related point in a previous post about current LLM architectures lacking such a world model, and in particular lacking a model of the human mind needed for fully human interaction.
2. In Sutton’s Temporal-Difference (TD) learning model, rewards are the feedback signals an agent receives when it reaches particular states or takes actions, indicating whether those outcomes are good or bad. The values are the expected cumulative rewards associated with intermediate states along the way to desired outcomes. For example, if winning a football (soccer) match provides the main reward, leading 1–0 has a value reflecting the expected probability of eventually winning the match.
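As a concrete illustration of this footnote’s football example, here is a minimal TD(0)-style sketch in Python (the numbers, including the 80% win rate, are made up for the illustration): the value of the state “leading 1–0” is repeatedly nudged toward the reward that eventually follows, so it converges to roughly the probability of winning.

```python
import random

# Minimal TD(0) sketch of the football example (illustrative numbers only).
alpha = 0.1   # learning rate
gamma = 1.0   # no discounting within a match

# Value estimates; the terminal state "match over" keeps a value of 0.
values = {"leading 1-0": 0.0, "match over": 0.0}

def td_update(state, reward, next_state):
    """V(s) <- V(s) + alpha * (reward + gamma * V(s') - V(s))"""
    td_error = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * td_error

# Suppose leading 1-0 ends in a win (reward 1) about 80% of the time.
random.seed(0)
for _ in range(5000):
    won = random.random() < 0.8
    td_update("leading 1-0", reward=1.0 if won else 0.0, next_state="match over")

print(round(values["leading 1-0"], 2))  # ~0.8, i.e. the expected probability of winning
```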
3. In our ancestral environment, hence the existence of mismatches.
4. The answer to this question might be different general goals for different types of AGI agents.