Would you say that GPT-4 can reason now? I am not convinced this is the case; it seems like it has just become more consistent at providing us with an output that we consider reasonable, because it was engineered precisely to do that.
Let's assume reasoning entails going beyond the stochastic parrot level. Can LLMs have skills not demonstrated in the training set?
Here is a paper demonstrating that GPT-4 can combine up to 5 skills from a set of 100, effectively covering 100^5 possible skill tuples, while seeing far fewer combinations during training on any specific topic.
> simple probability calculations indicate that GPT-4's reasonable performance on k=5 is suggestive of going beyond "stochastic parrot" behavior (Bender et al., 2021), i.e., it combines skills in ways that it had not seen during training
https://arxiv.org/abs/2310.17567
So they show the ability to freely combine skills, and the k=5 limit measured in this benchmark illustrates that models do generalize: they can apply skills in new combinations correctly, but there is also a ceiling.
The interesting part is how they demonstrate that, for a topic with, say, n=1000 samples in the training set, it is impossible to have enough training examples to cover all tuples of 5 skills, yet models (mostly GPT-4) handle them anyway. Other models top out at tuples of only 2 or 3 skills.
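The counting argument is easy to sanity-check with a back-of-envelope calculation (my own sketch, using unordered combinations; the paper's own estimate differs in detail):

```python
# Rough version of the counting argument: with 100 skills, the number of
# distinct 5-skill combinations dwarfs any plausible number of per-topic
# training examples (n = 1000 is the figure used in the discussion above).
from math import comb

skills = 100
k = 5
n_samples_per_topic = 1000

tuples = comb(skills, k)  # unordered 5-skill combinations: C(100, 5)
print(tuples)                        # 75,287,520 distinct combinations
print(tuples // n_samples_per_topic) # ~75,000 tuples per training sample
```

So even if every one of the 1000 samples exercised a different 5-skill tuple, training would cover only a vanishing fraction of the combination space.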
Models combining skills in new ways are not just parroting. They can perform meaningful work outside their training distribution.
TL;DR: They're close enough to make people argue and publish papers about similarities to the human hippocampus
I have a hunch these models are approximating an important subset of what we call reasoning. In dangerously reductive terms, it's a question of how closely and how much of a function's output we can approximate.
There was at least one paper[1] showing similarities between AI models and the hippocampus. That lines up with another part of human neuroscience: at least part of human reasoning appears to take place inside the hippocampus itself [2].
From my neuroscience background, the takeaways seem to be:
* Carmack is right: we're missing some important bridging concepts for AGI.
* Whether current LLMs can reason depends on how you define reasoning.
I'm unsure whether finding answers in those areas would be a good thing. Rather than alignment issues or misuse, I'm more worried about how quickly people would overreact to it. We might already be seeing that in business.
then you have to define reason, and it gets all philosophical. suffice to say, it's able to take a implies b and b implies c to get a implies c, and make up things along the way so calling it glorified autocomplete is a dishonest representation of its abilities. it doesn't love or feel jealous but it's very good at writing essays about whatever I ask it to. it doesn't need to do more than that to be useful to me, today.
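the a implies b, b implies c, therefore a implies c chain mentioned above is just transitivity of material implication, which you can verify by brute force over all truth assignments (a trivial sketch of my own, not anything from the thread):

```python
# Material implication: "p implies q" is false only when p is true and q is false.
def implies(p: bool, q: bool) -> bool:
    return (not p) or q

# Check transitivity over every truth assignment of (a, b, c).
for a in (False, True):
    for b in (False, True):
        for c in (False, True):
            if implies(a, b) and implies(b, c):
                assert implies(a, c)  # never fails: the chain is valid

print("transitivity holds for all 8 assignments")
```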
the human words we have for consciousness aren't good enough to describe what ChatGPT does. thinking, reasoning, understanding. it processes. it computes. it matrix multiplies. it takes the dot product and the determinant. there are eigenvectors and eigenvalues. it's tensoring and outputting the code and prose I asked it for.
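for concreteness, the core operation being described (dot products, weighted sums) is scaled dot-product attention. here is a toy pure-Python sketch of my own, with tiny made-up vectors, just to show the kind of arithmetic involved:

```python
# Toy scaled dot-product attention: one query attends over key/value pairs.
# This is an illustrative simplification, not any model's actual code.
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(query, keys, values):
    d = len(query)
    # similarity of the query to each key, scaled by sqrt(dimension)
    scores = [dot(query, k) / math.sqrt(d) for k in keys]
    weights = softmax(scores)  # scores become a probability distribution
    # output is the attention-weighted average of the value vectors
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

out = attention([1.0, 0.0],
                [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                [[1.0], [2.0], [3.0]])
print(out)  # by symmetry of the weights, this comes out to [2.0]
```

that's it: similarity scores, a softmax, a weighted sum. whether stacking billions of those amounts to "thinking" is exactly the question being argued here.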
It’s definitely not the case. LLMs of any sort do not in any sense reason or understand anything.
They literally just make stuff up (technically just a continuation of whatever you fed in), which usually sounds good, is often true and sometimes even helpful. Because those are qualities of the training data that was used and is the basis for the stuff it’s making up.
Neural networks are not new, and they're just mathematical systems.
LLMs don't think. At all. They're basically glorified autocorrect. What they're good for is generating a lot of natural-sounding text that fools people into thinking there's more going on than there really is.
I agree. The McCulloch-Pitts paper was published in 1943.
> they're just mathematical systems.
What do you mean by "mathematical system"? AFAIK the GPT-4 model is literally a computer program.
> LLMs don't think. At all.
This is the same assertion that OP made, and I'm still confused as to how anyone could be certain of its truth given that no one actually knows what is going on inside of the GPT-4 program.
> They're basically glorified autocorrect. What they're good for is generating a lot of natural-sounding text that fools people into thinking there's more going on than there really is.
Is that an argument for the claim "LLMs don't think."? It doesn't seem like it to me, but maybe I'm mistaken.
Not new, but we don't understand how they work at the large scale.
I don't think reductionistic arguments hold much water. Sure, neural networks are just matrix multiplication. In the same way that a brain is just a bunch of cells. Understanding the basic building blocks doesn't mean understanding the whole.
We can always say that LLMs don't think if we define "think" as requiring a biological brain, but the fact is that they generate outputs that, from a human perspective, can only plausibly be produced via reasoning. So they, at the very least, have processes that functionally achieve the same goal as reasoning. The "stochastic parrot" metaphor, while apt in its day, has proven obsolete: pretty much all the examples of things that LLMs "could not do" in early papers turn out to be doable by the likes of GPT-4, so arguments against the possibility of LLMs reasoning look like a constant moving of the goalposts.