Lots of predictions in there look good in retrospect, but this is the one whose extent I underestimated:
> We also have a bad habit of changing the definition of machine intelligence when a program gets really good to claim that the problem wasn’t really that hard in the first place
We've done this so much recently that I'm now seeing rewritten definitions of "real" intelligence that most humans do not meet.
There's quite a rational justification for the ever-shifting goalposts. When humans describe some future milestone as finally being "real AI", they're not just describing that milestone, but the many adjacent capabilities they expect it to entail. But we're quite clever, and as with any metric that gets juked, we invariably find ways to achieve the milestone while sidestepping all the capabilities it was supposed to entail.
Chess is the obvious example. At one time, a machine capable of playing chess at a human level was supposed to signal the advent of genuine artificial intelligence. It wasn't that playing chess well makes one intelligent; rather, it was assumed that doing so would entail abstract planning, strategic thought, intuition, and creativity. Of course, we now have software that can crush even a world champion, yet none of those adjacent capabilities emerged at all.
And so I think it's also increasingly obvious that the same thing is happening with chatbots. Many of us thought those 'surrounding capabilities' were finally here, even more so with OpenAI regularly demonstrating exceptional performance on a wide array of distinct benchmarks, such as the bar exam. But once you use the system for a while, it becomes clear that while its knowledge base is unbelievably immense, its 'understanding' of that knowledge is literally zero. It will arbitrarily invent, e.g., API calls that do not exist, mix up utterly simple concepts, and fail to learn from its mistakes in any meaningful way whatsoever.
If you've used ChatGPT for anything, I'm sure you've run into the utterly annoying scenario of:
- "How do I [x]?"
- "Sure! That's easy, just do [A]."
- "No, you're hallucinating."
- "Oh sorry, thanks. You're right, you actually need to do [B]!"
- "No, you're still hallucinating."
- "Oh sorry, you're right. You just need to do [A]."
If a human, even a stupid one, acted this way, you'd assume they were trolling you, especially one gifted with the ability for infinite perfect and complete recall.
Have you used GPT-4 significantly? I've had many experiences* of answers that, as far as I can tell, require strong reasoning skills and a world model. It is unreliable, as you say, but demonstrating those skills even a small fraction of the time would still be proof of their existence.
*: (For example, when I feed GPT-4 Python code I've written, it describes what that code would output if run, despite having no access to a Python interpreter and therefore having to simulate what one would do with my code, and despite my code not being in its training set.)
> especially one gifted with the ability for infinite perfect and complete recall.
You know that LLMs don't have this, right? There is no database they have access to containing their training data. They just have the weights that were optimized in response to seeing that training data.
What makes you think there isn't a limited interpreter, or an expert system effectively akin to an interpreter working behind the scenes? I think this is a fairly easy concept to test. Give it 'code' that is trivial to understand, but in a [hopefully] novel syntax that might also fuzz up guess-the-next-word. I just gave ChatGPT a prompt of:
------
"I'm working with a new computer language. What would be the output of:
IsTrue is not true
If IsTrue is true then print IsTrue
If IsTrue is not true then print IsNotTrue"
------
It hemmed and hawed, and accurately described what the program would do, which basically amounted to repeating the last two lines back to me, but it refused to tell me what the output would be. When I demanded it tell me what the value of "IsTrue" would be, so I could figure out the output, I got:
"I apologize for the confusion, but as an AI language model, I don't have access to the specific values of variables in your code or the ability to execute code directly. In the given context, the value of IsTrue is not specified, so I cannot determine its exact value. It could be either true or a value that is not true, depending on how the variable is defined or assigned in your code."
I then gave it the exact same program in C#, and, unsurprisingly, it immediately gave me not only a far more meaningful description of what the code does, but also the exact output.
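For reference, here is a minimal Python sketch of the same three-line program. It is a hypothetical translation, not the C# version mentioned above (which isn't shown), and it assumes that `print IsTrue` and `print IsNotTrue` are meant to print those names as literal text:

```python
# Hypothetical Python translation of the pseudocode prompt (not the
# commenter's actual C# program). Assumption: "print IsTrue" and
# "print IsNotTrue" print those names as literal strings.

def run() -> list[str]:
    """Simulate the three-line program and collect what it would print."""
    output = []
    is_true = False      # "IsTrue is not true"
    if is_true:          # "If IsTrue is true then print IsTrue"
        output.append("IsTrue")
    if not is_true:      # "If IsTrue is not true then print IsNotTrue"
        output.append("IsNotTrue")
    return output


if __name__ == "__main__":
    for line in run():
        print(line)      # prints: IsNotTrue
```

Under that assumption only the second condition holds, so the program prints a single line: `IsNotTrue`.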
> What makes you think there isn't a limited interpreter
Because OpenAI says there isn't, and there are open source LLMs which show the same (but less profound) ability.
> expert system effectively akin to an interpreter working behind the scenes?
You're describing reasoning again. No one hardcoded an "expert system" for Python into ChatGPT. If it has become one, it is through reading about Python (and the same goes for C#).
Then what would be your hypothesis for why ChatGPT is incapable of reasoning about the most utterly trivial pseudocode examples, which even the worst developer in the world would instantly understand, yet can parse and output complete and sophisticated programs in certain other languages? Another bit of evidence: ChatGPT habitually mixes things up (as is the very nature of LLM systems as currently programmed), yet what you will never see, as in 0% of the time, is it randomly mixing code into a chat response. Again, this is trivially explained if code generation and chat generation are distinct systems, but otherwise...?
Also, expert system [1] need not be in quotes. It's an 'AI' technology dating back to the 60s, essentially just a fancy term for hard-coding queryable, domain-specific knowledge into a system. As an aside, where has OpenAI claimed any of this is false? Or, for that matter, which open source LLMs produce code that's not completely buggered?
Sorry, quick clarification: I asked whether you've been using GPT-4; have you? I agree that GPT-3 / "normal ChatGPT" does not have the abilities I'm talking about.
:) I heard someone say "LLMs are just lossy compression algorithms" dismissively yesterday, and I would love to hear their understanding of what a brain does.