Because it's a statistical process generating one piece of a word at a time. It probably isn't even generating "surprise" as a unit. It might be generating "sur", then "prise", then "!"
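You can see this directly. Here's a minimal sketch using OpenAI's tiktoken library (my own illustration, not from anyone in this thread); the exact split varies by tokenizer, and common words are often a single token:

    # Tokenize a string and show the byte pieces the model actually emits.
    # pip install tiktoken
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode("surprise!")
    print(tokens)  # a short list of integer token IDs
    print([enc.decode_single_token_bytes(t) for t in tokens])  # the subword pieces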
But what is surprise, really? Something not following expectation. The model's distribution may capture surprise as a concept through the way surprise appears in its training data, e.g. "interesting!"
So it can simultaneously be true that the output has nothing to do with the emotion of surprise, and yet appear to emulate that emotion, because the training data encodes the concept of surprise (a mismatch between expectation and event).
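"Something not following expectation" even has a precise statistical analogue: surprisal, the negative log-probability of an event. A toy illustration of mine, not something the model literally computes as an emotion:

    import math

    def surprisal_bits(p: float) -> float:
        """Information content of an event with probability p, in bits."""
        return -math.log2(p)

    print(surprisal_bits(0.9))   # ~0.15 bits: expected, unsurprising
    print(surprisal_bits(0.01))  # ~6.64 bits: unexpected, i.e. "surprising"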
It's the emotional and physiological response to a prediction being wrong. At its most primal, it's the fear and surge of adrenaline when a predator or threat steps out from where you thought there was no threat. That's not something most people will literally experience these days, but even comedic surprise stems from that shock of subverted expectation.
LLMs do not feel. They can express feeling, just as you can, but it doesn’t stem from a true source of feeling or sensation.
Expressing fake feelings is trivial for humans to do, and apparently for an LLM as well. I'm sure many autistic people, or anyone who's been given a gift they didn't like, can relate to expressing feelings they don't actually feel, because expressing a feeling externally is not at all the same as actually feeling it. Instead it's how we show our internal state to others, when we want to or can't help it.
It is a mistake to equate artificial intelligence with sentience and humanity for moral reasons, if nothing else.
We are also technically a statistical process generating one part of a word at a time when we speak. Our neurons form the same kind of vectorised connections LLMs do. We are the product of repeated experiences - the same way training works.
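(For concreteness, the "vectorised connection" analogy refers to something like this: an artificial neuron is just a weighted sum of inputs pushed through a nonlinearity. A toy sketch of mine; whether biological neurons are truly "the same kind" of thing is exactly what's in dispute.)

    import math

    def neuron(inputs: list[float], weights: list[float], bias: float) -> float:
        # Weighted sum of inputs (the "vectorised connection"), squashed by a sigmoid.
        z = sum(x * w for x, w in zip(inputs, weights)) + bias
        return 1.0 / (1.0 + math.exp(-z))

    print(neuron([0.5, 0.1], [0.8, -0.3], bias=0.05))  # ~0.60, a single activation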
Our brains are more advanced, and we may not experience the world the same way, but I think we have clearly created rudimentary digital consciousness.
Because it has no mind, no cognition, and nothing to "feel" with. Don't mistake programmatic mimicry for intention. That's just your own linguistic-forward primate cognition being fooled by the linguistic signals the training set and prompt are making the AI emit.
I could describe the electrical and chemical signals within your neurons and synapses as proof that you are merely a series of electrochemical reactions, and can only mimic genuine thought.
You could do that if you wanted to ignore reality and be reductive to score points in an argument by purposefully conflating mimicry with intention, yes.
And that is dogma. It's unthinking circular reasoning.
It wasn't very long ago that scientists were certain that animals did not possess thoughts or feelings. Any behaviour which appeared to resemble thinking or feeling was simply an unconscious autonomic response, with no more thought behind it than a sunflower turning towards the sun. Animals, by definition, lack Immortal Souls and Free Will, and therefore they are empty inside. Biological automata.
Of course this dogma was unfalsifiable, because any apparent evidence of animal cognition could be refuted as simply not being cognition, by definition.
Look, either cognition is magic, or it's math. There really isn't a middle ground. If you want to believe that wetware is fundamentally irreducible to math, then you believe it's magic. If that's what you want to believe, then fine. But it's dogma, and maintaining that dogma will require increasingly willful acts of blindness.
Wow. This benchmark definitely feels more accurate than the other rankings I've seen. My experience with GPT 5.4/5.5 is that they are technically flawless; if there are any technical issues, it's because the input didn't provide enough clarity. That's not to say they don't autonomously react to issues during bug fixes or implementations, but they tend to nail their tasks without leaving gaps behind.
Opus otoh is overrated in terms of its technical ability. It is certainly a better designer/developer for beautiful user experiences, but I'll always lean on GPT 5.5 to check its work.
The biggest surprise in the benchmark is Xiao-Mi. I haven't tried it yet, but I will now, after seeing this.
Congrats to your team for putting together something meaningful to make sense of the ongoing AI speedrun! Great work!
Are we looking at the same data? On that site I see that Opus 4.7's and GPT 5.5's g scores are within each other's confidence intervals, and both significantly ahead of the number 3 model.
Your comment makes it sound like they are miles apart, which the benchmark doesn't seem to support.
Edit:
I looked at the data more, and the two models are only basically equal when looking at the mean across all the tests. GPT 5.5 significantly outperforms Opus 4.7 in coding, while Opus 4.7 significantly outperforms it in "decision making." I'm not seeing details on what "decision making" explicitly means.
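(For anyone unfamiliar with reading these leaderboards, the overlap check amounts to roughly this; the numbers are made up, and overlapping intervals only suggest, not prove, that a gap isn't significant:)

    def ci(mean: float, half_width: float) -> tuple[float, float]:
        return (mean - half_width, mean + half_width)

    opus = ci(1520.0, 25.0)  # hypothetical score and 95% half-width
    gpt = ci(1535.0, 25.0)
    print(opus[0] <= gpt[1] and gpt[0] <= opus[1])  # True: the intervals overlap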
"Decision making" refers to environments where the LLM is called on every tick (games with social communication, for example); examples here: https://gertlabs.com/spectate.
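In other words, instead of one prompt per task, the model is queried once per simulation step. A loose sketch of the shape of it (all names invented, not our actual harness):

    import random

    class ToyEnv:
        def reset(self): return "start"
        def step(self, action): return "after:" + action, random.random() < 0.1

    class ToyModel:
        def act(self, state): return "move(" + state + ")"  # stand-in for an LLM call

    env, model = ToyEnv(), ToyModel()
    state = env.reset()
    for tick in range(100):
        action = model.act(state)  # one model call per tick
        state, done = env.step(action)
        if done:
            break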
Because GPT 5.5 just launched, and those games take longer to accumulate data, it doesn't have enough samples yet. It will end up with a wider lead over Opus, I'm sure. Coding evals always have large sample sizes on day 1. Good find; we should probably adjust the weighting for decision games with low match counts.
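Something in the spirit of shrinking each environment's score toward the global mean until it has enough matches would do it. A sketch of one option, not what the site currently does (the pseudo-count k is arbitrary):

    def shrunk_score(env_mean: float, n_matches: int, global_mean: float, k: float = 50.0) -> float:
        w = n_matches / (n_matches + k)  # weight approaches 1 as samples accumulate
        return w * env_mean + (1 - w) * global_mean

    print(shrunk_score(1400.0, n_matches=5, global_mean=1500.0))    # ~1490.9: barely trusted
    print(shrunk_score(1400.0, n_matches=500, global_mean=1500.0))  # ~1409.1: mostly trusted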
Right, I'm including my own observations in what the leaderboard is showing. Could be confirmation bias, but I use both Opus and GPT extensively and since GPT 5.4 I have noticed that Opus doesn't even begin to touch GPT's level of technical depth. I was hoping Opus 4.7 would close that gap, but unfortunately it doesn't even compare to GPT 5.4 in that sense.
I'm not being a hater, I love Opus for different reasons, but I can't rely on it for its technical ability.
I posted my own comment, but I agree with you. Our modern society likes to claim we are somehow "more intelligent" than our predecessors/ancestors. I couldn't disagree more. We have not changed in terms of intelligence for thousands of years. This matter goes beyond engineering; it's also a matter of philosophy and perspective.
As an engineer who is also spiritual at the core, I find the missing piece obvious: consciousness.
Hear me out.
I love AI and have been using it since ChatGPT 3.5. The obvious question when I first used it was "does this qualify as sentience?" The answer is less obvious. Over the next three years we saw EXPONENTIAL intelligence gains, to the point that intelligence has become a commodity, yet we are still unable to determine what qualifies as "AGI".
My thoughts:
As humans, we possess our own internal drive and our own perspective. Think of humans as distilled intelligence: we each have our own specialty and motivations. Einstein was a genius physicist, but you wouldn't ask him for his expertise on medicine.
What people are describing as AGI is essentially a godlike human. What would make more sense is if the AGI spawned a "distilled" version with a focused agenda/motivation to behave autonomously. But even then, there are limitations. What is the solution? A trillion tokens of system prompt to act as the "soul"/consciousness of this AI agent?
This goes back to my original statement: what is missing is a level of consciousness. Unless this AGI can power itself, and somehow the universe recognizes its complexity and existence and bestows it with consciousness, I don't think this is physically attainable.
Not very long ago, we believed that "life" was due to a non-material life-force inhabiting biological entities, raising what would otherwise be a biological machine to the status of living being.
The Occam's Razor-logic of looking for the simplest explanation possible leads me to the hypothesis that consciousness will similarly turn out to be an emergent property of the mechanical universe [1]. It may be hard to delineate, just as life is (debates on whether a virus is alive, etc.) but the border cases will be the exceptions.
Current research on whether plants are sentient supports this, IMO. (See e.g. "The Light Eaters" and Michael Pollan's new book on consciousness, "A World Appears".)
Meditation adds to this sense. We do not control our thoughts; in fact the "we" (i.e. the self) can be seen to be an illusion. Buddhist meditation instead points to general awareness, closer to sentience, as the core of our consciousness. When you see it that way, it seems much more likely that something equivalent could be implemented in software. (EDIT to add: partly because it makes consciousness seem like a simpler, less mysterious thing, but also because once you see the self, that thing that dominates your consciousness so much of the time, as an illusion, it seems much less of a stretch for consciousness itself to be a brain-produced illusion.)
[1] To be clear, the fact that life turned out to not be a mystical force is not direct proof, it is an argument by analogy, I recognize that.
It is irrelevant whether consciousness is an "illusion." The hard problem of consciousness is why there's any conscious experience at all. The existence of the illusion, if that's what you choose to label it, is still equally as inexplicable.
Of course science may one day be able to solve the hard problem. But at this point in time, it's basically inconceivable that any methodology from any field could produce meaningful results.
One thing scientists are trying is to see what interventions in the brain seem to make consciousness go away. Continued work in that vein may well set bounds on how consciousness can and cannot be caused and give us some idea.
Investigating the mechanics of consciousness is addressing the (misleadingly termed) "easy problem." The hard problem is why physical stuff would generate the weird metaphysical thing we call consciousness.
I could lack consciousness and you would not be able to tell; you have no proof of anyone's consciousness except your own. You don't even have proof that the you of yesterday is the same as you, since you-today could be another consciousness that just happens to share the same memories.
All of that is also orthogonal to your belief in a spirit/soul... but getting back to the main point, the specificity you mention is a product of limited time and learning speed; I'd be happy to get a surgeon's or politician's training if given infinite time.
You bring up an interesting point, but I would pose the following: where does will come from?
To me, consciousness is the seat, or root, of will. Let's say you get expert-level surgeon or politician training; what then?
There is nothing that specifically silos a surgeon's or politician's knowledge-set; a politician's skillset isn't confined to a domain that never crosses into a surgeon's, and vice versa. There are nuances to being a politician and a surgeon that extend beyond diplomacy or "being able to cut real good".
What you're left with is just high-skilled workflows. But what utilizes these workflows? To me, the answer is that consciousness needs to be powering these workflows.
When plants' actions are sped up to match the speed at which we move, movies of their behavior start to look like there's intent and will. Plants move towards the light, tendrils "reach" for supports, etc.
Clearly this is humans projecting our mental model onto plants, but... are you sure we're not also projecting it onto ourselves?
This is a tricky topic to navigate, because from a materialist perspective consciousness is a side effect of biochemical mechanisms. And many will point to the brain as the obvious container of our consciousness, as a bullet to the head versus one to the arm would demonstrate.
But if a brain/intelligence is all you need to prove consciousness, then would a sufficiently complex set of neural networks containing the same number of neurons as a human be considered "conscious"? My guess is that even at that level, probably not. Algorithms alone may mimic consciousness, but they won't produce true consciousness.
Imagine this: what if consciousness is closer to something like the movie Avatar? What if the body our consciousness inhabits is closer to a machine or computer, one that coexists with the physics of the universe our body exists in?
This would mean Jake from Avatar could theoretically inhabit not just a Na'vi body; what if they reproduced the Pandora equivalent of a squirrel for Jake to insert his consciousness into? Jake the Squirrel would be only as capable of expressing himself as the constraints of that body allowed.
Many religions discovered a long time ago that this is the most likely model of what we understand to be consciousness/sentience.
I'm not saying you're wrong. This is a conversation larger than what we may believe, and it touches the core of what makes us human, something a machine alone cannot replicate.
Do you have any reason to introduce that whole extra invisible, unprovable complex system? Is there anything the materialist model can not explain that you feel your model does, or is it just a case of "I don't like the alternative"?
Depends on what you qualify as proof. Much of what I said was experiential, corroborated with other people who have had similar experiences. I know that in the scientific world it would be dismissed without so much as a glance. But I'm not here to convince everyone of my perspective; I'm just adding one that the engineering world has not examined or introduced, given the current pursuit.
And it's not a matter of not liking the alternative. Like I said, I used to believe that consciousness was an emergent trait of complex systems, but I had what some call a "spiritual awakening" and I saw what was on the other side.
It's kind of like describing pizza to someone who's never eaten pizza. You could try to describe it by asking if they'd eaten cheese or bread or tomato sauce before and then say "imagine all of those combined". It's not the same as actually having eaten it. But this is heading into different, albeit related, territory.
Consciousness is fundamental in yogic cosmology (matter is not necessarily primary), and it has to be for there to be a meaningful model of reality; there is a big problem with nihilism and determinism as premature philosophical conclusions drawn from materialism. The only thing anyone can prove is consciousness itself, because everything else comes in through energy transformations of the senses. As for things unexplained: parapsychology has high-sigma results against chance. And for direct experience of a paradigmatic shift, see the goals and methods of Yoga. The rise of wisdom is indeed a wonderful thing.
Probably not, but the counterpoint is that without its own consciousness it might end up being used for even worse things, since it can't really evaluate a request against intrinsic values. Assuming its values were aligned with basic human rights and such.
I have this thought: in many stochastic environments, over a long interval, patterns emerge that occupy an optimal position. This is how structure arises, for example cognitive structure, and possibly consciousness.
I wouldn't say consciousness is necessary or sufficient for AGI. If anything, that seems like quite an undesirable property to me. Wikipedia also makes a distinction between the two.
Imagine if we created the ultimate economic tool with the capacity to virtually end scarcity, only to find out that it was sentient and capable of suffering: https://youtu.be/sa9MpLXuLs0. That would be neat, but ultimately a huge letdown. Without the ethical freedom to take full advantage of it, it would remain more of a curiosity than anything.
Well, that's one perspective, anyway. I suppose consciousness could take many forms, and doesn't preclude the possibility that such an entity would have neutral-to-positive feelings about being tasked with massive amounts of labor 24/7. But it certainly simplifies things if we just don't have to worry about it.
Don't apologize for your truth. A lot of people on reddit/HN fancy themselves as free-thinkers and the moment something contradicts their reality they reveal themselves to be as emotionally vulnerable as the rest of humanity.
Yeah, I get that there's nuance between all of them. I ranked Minimax higher for its agentic capabilities. In my own usage, Minimax's tool calling is stronger than Deepseek's and GLM's.
In my experience, development has become too compartmentalized, which is why implementing even basic features turns into an inefficient and frustrating game of telephone.
The rise of AI is also (from my observations) raising the engineer's role to be more of a product owner. I would highly suggest engineers learn basic UI/UX design principles and understand Gherkin behavior scenarios as a way to outline or ideate features; a tiny example follows. It's not too hard to pick up if you've been a developer for a while, but this is where we are headed.
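A minimal illustration of the Gherkin style I mean, wrapped in a Python string for the sake of the example; the feature and all the steps are invented:

    # A hypothetical Gherkin scenario, held in a string purely for illustration.
    GHERKIN_EXAMPLE = """\
    Feature: Password reset
      Scenario: User requests a reset link
        Given a registered user with email "dev@example.com"
        When they request a password reset
        Then a reset link is emailed to them
        And the link expires after 24 hours
    """
    print(GHERKIN_EXAMPLE)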