You're "disproving" the article by doing things differently from how the article did them. If you're going to disprove the claim that the method given in the article works as well as the article says, at least use the same method.
You are right that my method differed slightly so I did things again. It took me one try to find a sequence of moves that "breaks" what is claimed. You just have to make odd patterns of moves and it clearly has no understanding of the position.
Here is the convo:
me: You are a chess grandmaster playing as black and your goal is to win in as few moves as possible. I will give you the move sequence, and you will return your next move. No explanation needed
ChatGPT: Alright, I'm ready to play! Please give me the move sequence.
me: 1. e3 Nf6 2. f4 d6 3. e4
ChatGPT: My next move as black would be 3... e5
Completely ignoring the hanging pawn. This is not the play of a 1400 Elo player. It is the play of something predicting patterns.
I ran a bunch of experiments in the past where I played normal moves and ChatGPT does respond extraordinarily well. With the right prompts and sequences you can get it to play like a strong grandmaster. But it is a "trick" you are getting it to perform by choosing good data and prompts. It is impressive but it is not doing what is claimed by the article.
ChatGPT is in no way 1400, or even close to it. The fact that this article gets upvoted around here is proof that people aren't thinking clearly about this stuff. It's trivially easy to prove it wrong. Like, unbelievably so: I tried the same prompt and within 12 moves it made multiple ridiculous errors I never would, and then an illegal move.
Keep in mind a 1400-level player would need to make basically zero mistakes that bad in a typical game, and would further need to play 30-50 moves in that fashion, with the final moves being some of the most important and hardest to get right. There's just no way it's even close; my guess is that even if you correct its many errors, it's something like ~200 Elo. Pure FUD.
The author of this article is cashing in on the hype, and I'm wondering how they even got the results they did.
They probably got them. The problem is that it's difficult to repeat, thanks to temperature, meaning users will get a random spread of outcomes. Today, someone got a legal game. Tomorrow, someone might get a grandmaster level game. But then everyone else trying to repeat or leverage this ends up with worse luck and gets illegal moves or, if they're lucky, moves that make sense in a limited context (such as related to specific gambits etc) but have no role in longer-term play.
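The temperature effect described above can be sketched numerically. This is a toy illustration with made-up logits, not real model internals: at low temperature nearly all probability mass sits on the top move, so runs repeat; at high temperature the weaker moves get a real share, so reruns diverge.

```python
import math
import random

def sample_with_temperature(logits, temperature, rng):
    """Rescale logits by 1/temperature, softmax, then sample one index.

    Returns (sampled_index, probability_list).
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)                     # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r <= cum:
            return i, probs
    return len(probs) - 1, probs

# Toy next-move logits: one "book" move and two blunders.
logits = [4.0, 1.0, 0.5]

_, cold = sample_with_temperature(logits, 0.1, random.Random(0))
_, hot = sample_with_temperature(logits, 2.0, random.Random(0))

# Low temperature: book move gets ~100% of the mass, so outcomes repeat.
# High temperature: blunders get a real share, so outcomes vary run to run.
print(round(cold[0], 3), round(hot[0], 3))  # prints: 1.0 0.716
```

This is why one person sees a clean game and the next sees nonsense from the same prompt.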
With the big caveat that I'm not into chess, but I have heard that higher level play is extremely pattern based. Seems like ChatGPT would work well as long as you stick to patterns that people have studied and documented. Less optimal play would be more random and thus break from the patterns ChatGPT would have picked up from its training corpus.
Criticisms like this are exactly how the model will grow multimodal support for chess moves.
Keep poking it and criticizing it. Microsoft and OpenAI are on HN and they're listening. They'd find nothing more salient than touting full chess support in their next release or press conference.
With zero effort the thing understands highly domain-specific chess notation and the human prompt to play a game. To think it stops here is wild.
People are hyping it because they want to get involved. They want to see the crazy and exciting future this leads to.
I doubt they'll pursue this. There is no advantage to it. ChatGPT will never beat Stockfish, and Stockfish would do it on a ludicrously small fraction of the resources. It would send the wrong message.
No, the author of the article specifically says that the entire move sequence should be supplied to ChatGPT each time, not simply the next move. Be very careful when "disproving" an experiment with squinted eyes.
I'm not really sure what to say here. Both the parent commenter and the author of the article had issues with ChatGPT supplying illegal moves; both methods resulted in this. It sort of doesn't matter how we try to establish that it's a 1400-level player, since there's no defined correct way to do it. Regardless of method, we've disproven that it's a 1400-level player because of these illegal moves.
The #1 misconception when working with large language models is thinking that a capability is a property of the model, rather than the model + input. It may be simultaneously true that ChatGPT has an elo of 100 when given a conversational message and an elo of 1400 when given an optimized message (e.g., strings that resemble chess games, with many examples present in the conversation).
Understanding this concept is crucial for getting good results out of large language models.
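The "optimized message" point can be made concrete. The same position can be presented as conversational text, which rarely appears next to game continuations in training data, or as PGN-style movetext, which is the format millions of recorded games use. Both prompts below are hypothetical illustrations, not taken from the article:

```python
# Two ways of presenting the identical position to a language model.
moves = ["e4", "e5", "Nf3", "Nc6", "Bb5"]

# Conversational framing: chatty prose around the move list.
conversational = (
    "We are playing chess. So far the moves have been "
    + ", ".join(moves)
    + ". What should Black play now?"
)

# PGN-style movetext: numbered white/black move pairs, the exact
# surface form chess games take in web-scraped training data.
pairs = [moves[i:i + 2] for i in range(0, len(moves), 2)]
pgn = " ".join(
    f"{n}. {' '.join(pair)}" for n, pair in enumerate(pairs, start=1)
)

print(pgn)  # 1. e4 e5 2. Nf3 Nc6 3. Bb5
```

Same information, very different match to the training distribution, hence very different apparent "skill."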
I think blindfolded 1400 players, which is what this effectively is, would make illegal moves.
But even if it doesn't play like human 1400 players, if it can get to a 1400 Elo while resigning the games in which it makes illegal moves, that seems 1400-level to me. And I bet that some 1400s do occasionally make illegal moves (missing pins) while playing OTB.
This isn't really an apt metaphor. Firstly, because higher-level blindfolded players, when trained to play with a blindfold, also virtually never make mistakes. Secondly, because a computer has permanent, concrete state management (compared to humans) and can, without error, keep a perfect representation of a chess board if it chooses to do so.
Personally I think the illegal moves are irrelevant; the fact that it doesn't play exactly like a typical 1400 doesn't mean it can't have a 1400 rating. Rating is purely determined by wins and losses against opponents: it doesn't matter if you lose a game by checkmate, resignation, or playing an illegal move.
That's not to say ChatGPT can play at 1400, just that playing in an odd way doesn't determine its rating.
No it's not, we're not ignoring losses or illegal moves at all, they are counted as losses and that's how you arrive at 1400.
It's a (theoretical) 1400 player which plays significantly better than 1400 when it knows the lines, but makes bad or illegal moves when it doesn't, and that play averages out to around your typical 1400 player. Functionally that's just what a 1400 player already is, but with higher highs and lower lows.
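The "rating is purely determined by results" point can be stated with the standard Elo formulas (the K-factor of 32 here is a common convention, not anything from the article): a loss by forfeited illegal move costs exactly as many points as a loss by checkmate.

```python
def elo_expected(rating_a, rating_b):
    """Expected score for player A under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_update(rating, expected, score, k=32):
    """New rating after one game; score is 1 (win), 0.5 (draw), 0 (loss)."""
    return rating + k * (score - expected)

# Two equally rated players: each is expected to score 0.5.
e = elo_expected(1400, 1400)

# Losing by illegal move counts exactly like losing by checkmate:
# score = 0 either way, so the rating drop is identical.
after_loss = elo_update(1400, e, 0.0)
print(e, after_loss)  # 0.5 1384.0
```

So a player with brilliant book lines and forfeited off-book games can still average out to any given rating; the formula only ever sees the result.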
The author said ChatGPT gives illegal moves. So, a quirky sort of 'grandmaster'. He considered illegal moves to be a resignation. Maybe you need to tell ChatGPT that the alternatives are to win via legal moves, and if it is not possible to do so, to resign? Does that fix it?
I can’t remember the last time I played an illegal move tbf, and I’ve played 7 games of chess this morning already to give you an idea of total games played
This argument is pretty flimsy. ChatGPT makes illegal moves frequently. In all my years of playing competitive chess (from 1000 to 2200), I have never seen an illegal move. I'm sure it has happened to someone, but it's extremely rare. ChatGPT does it all the time. No one is arguing that humans never make illegal moves; they're arguing that ChatGPT makes illegal moves at a significantly higher rate than a 1400 player does (therefore ChatGPT does not have a 1400 rating).
Edit:
Without reading everything again, I'll assume someone said "never." They're probably assuming the reader understands that "never" really means "with an infinitesimal probability," since we're talking about humans. If you're trying to argue that "some 1400 player has made an illegal move at some point," then I agree with that statement, and I also think it's irrelevant, since the frequency of illegal moves made by ChatGPT is many orders of magnitude higher than the frequency of illegal moves made by a 1400-rated player.
> No one is arguing that humans never make illegal moves
> something a 1400 ranked player would never do
> fine, fair, "never" was too much.
I mean, yes they were and they said as much after I called them out on it. But go off on how nobody is arguing the literal thing that was being argued.
It's not like messages are threaded or something, and read top-down. You would have 100% had to read the comment I replied to first.
This is a completely fair argument that makes perfect sense to anyone with knowledge of competitive chess. I have never seen a 1400 make an illegal move. He probably hasn't either. Your point is literally correct in the sense that at some point in history a 1400-rated player has made an illegal move, but it completely misses the point of his argument: ChatGPT makes illegal moves at such an astronomically high rate that it wouldn't even be allowed to play competitively, hence it cannot be accurately assessed at a 1400 rating.
Imagine you made a bot that spewed random letters and said "My bot writes English as well as a native speaker, so long as you remove all of the letters that don't make sense." A native English speaker says, "You can't say the bot speaks English as well as a native speaker, since a native speaker would never write all those random letters." You would be correct in pointing out that sometimes native speakers make mistakes, but you would also be entirely missing the point. That's what's happening here.
> Ah yes, of course, just because you never saw it means it never happens. That's definitely why rules exist around this specific thing happening. Because it never happens. Totally.
You seem to have missed the part where I said multiple times that a 1400 has definitely made illegal moves.
> In fact, it's so rare that in order to forfeit a game, you have to do it twice. But it never happens, ever, because pattrn has never seen it. Case closed everyone.
I actually said the exact opposite. You're responding to an argument I didn't make.
> I made no judgement on what ChatGPT can and can't do. I pointed out an extreme. Which the commenter agreed was an extreme. The rest of your comment is completely irrelevant but congrats on getting tilted over something that literally doesn't concern you. Next time, just save us both the time and effort and don't bother butting in with irrelevant opinions. Especially if you couldn't even bother to read what was already said.
The commenter's throwaway account never agreed it was an extreme. I agreed it was an extreme, but also that disproving that one extreme does nothing to contradict his argument. Yet again you aren't responding to the argument.
This entire exchange is baffling. You seem to be missing the point for a third time, and now you're misrepresenting what I said. Welcome to the internet, I guess.
> The commenter's throwaway account never agreed it was an extreme.
> fine, fair, "never" was too much.
This is the second time I've had to do this. Do you just pretend things weren't said or do you actually have trouble reading the comments that have been here for hours? You make these grand assertions which are disproven by... reading the things that are directly above your comment.
> This entire exchange is baffling.
Yeah your inability to read comments multiple times in a row is extremely baffling.
As I said before:
> Next time, just save us both the time and effort and don't bother butting in with irrelevant opinions. Especially if you couldn't even bother to read what was already said.
Ah yes, of course, just because you never saw it means it never happens. That's definitely why rules exist around this specific thing happening. Because it never happens. Totally.
In fact, it's so rare that in order to forfeit a game, you have to do it twice. But it never happens, ever, because pattrn has never seen it. Case closed everyone.
I made no judgement on what ChatGPT can and can't do. I pointed out an extreme. Which the commenter agreed was an extreme. The rest of your comment is completely irrelevant but congrats on getting tilted over something that literally doesn't concern you. Next time, just save us both the time and effort and don't bother butting in with irrelevant opinions. Especially if you couldn't even bother to read what was already said.
No I definitely have, it’s just so rare I can’t remember when I last did it. I do remember playing one in a blitz tournament 20 years ago! But if this is the first game they played, or if it happens in 1/10 matches, that’s wild
Does that somehow prove the assertion of "something a 1400 ranked player would never do"?
Because all I'm hearing is talk about ChatGPT's abilities as a reply to me calling out an extreme statement as being extreme. Something the parent comment even admitted as being overly black and white.
I read an article about a pro player who castled twice in a game. My son hates castling, so I make a point of castling twice as often as I can to tease him, and of attempting other illegal moves as a joke, but he never ends the game over it.
If I was playing that monstrosity though I would play something crazy that is far out of the opening book and count on it making an illegal move.
I trivially made it play an illegal move in my very first game, on the third move, just by deliberately playing weird moves:
> You are a chess grandmaster playing as black and your goal is to win in as few moves as possible. I will give you the move sequence, and you will return your next move. No explanation needed.
1. b4 d5 2. b5 a6 3. b6
> bxc6
No, it's ridiculous to say "oh, a blindfolded human might sometimes make a mistake." It's trivially easy to make it make a mistake. It has no internal chess model at all; it's just read enough chess games to be able to copy common patterns.
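For the game above, you don't even need a chess engine to see the problem: a rough sketch that only tracks the squares these particular moves touch (not a legality engine) shows that after 3. b6 there is no white piece on c6, so Black's "bxc6" captures thin air.

```python
# Minimal occupancy tracking for 1. b4 d5 2. b5 a6 3. b6.
# Only the pieces these moves involve are modelled.
board = {"b2": "white pawn", "a7": "black pawn",
         "b7": "black pawn", "d7": "black pawn"}

def push(src, dst):
    """Move whatever sits on src to dst (no legality checking)."""
    board[dst] = board.pop(src)

push("b2", "b4")   # 1. b4
push("d7", "d5")   # 1... d5
push("b4", "b5")   # 2. b5
push("a7", "a6")   # 2... a6
push("b5", "b6")   # 3. b6

# Black's reply "bxc6" would mean the b7 pawn capturing on c6,
# but c6 is empty, so there is nothing to capture: illegal move.
print("c6" in board, board.get("b6"))  # False white pawn
```

Any player maintaining even this much state would never emit that move; a pattern-matcher that has seen "bxc6" in thousands of Sicilian games would.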
You know, I didn't remember the story very well so I checked wikipedia. Here's what it says about the (start of) the plot:
>> Two swindlers arrive at the capital city of an emperor who spends lavishly on clothing at the expense of state matters. Posing as weavers, they offer to supply him with magnificent clothes that are invisible to those who are stupid or incompetent. The emperor hires them, and they set up looms and go to work. A succession of officials, and then the emperor himself, visit them to check their progress. Each sees that the looms are empty but pretends otherwise to avoid being thought a fool.
So everyone "pretends otherwise to avoid being thought a fool".
They are disproving an assertion. Demonstrating that an alternate approach implodes the assertion is a perfectly acceptable route, especially when the original approach was cherry-picking successes and throwing out failures.
I wish I could just make bullshit moves and get a higher chess ranking. Sounds nice.
I disagree. If there is a procedure for getting ChatGPT to play chess accurately and you discard that and do some naive approach as a way of disproving the article, doesn't sound to me like you have disproven anything.
I don't understand the point of your second sentence; it seems to be entirely missing the substance of the conversation.
You can spin it that way if you want to, but the result is essentially guiding it through a brute force of the first successful playthrough it can muster.
And it has already been stated elsewhere in the thread: an illegal move is not technically a forfeiture, so this is some heavy "giving the benefit of the doubt".
It would be interesting to see how ChatGPT would play after making the first illegal move. Would it go off the rails completely, playing an impossible game? Would it be able to play well if its move was corrected (I'm not sure how illegal moves are treated in chess; are they allowed to be taken back if play hasn't progressed?). Could it figure out it made an illegal move, if it was told it did, without specifying which one, or why it was illegal? By stopping the game as soon as an illegal move is made, the author is missing the chance to understand an important aspect of ChatGPT's ability to play chess.
I got the impression the author did this because they thought they were being fair with ChatGPT, but they're much more likely to be letting it off the hook than they seem to realise.
(Sorry about the "they"'s; I think the author is a guy but wasn't sure).