You're "disproving" the article by doing things differently from how the article did them. If you're going to disprove the claim that the method given in the article works as well as the article says, at least use the same method.
You are right that my method differed slightly so I did things again. It took me one try to find a sequence of moves that "breaks" what is claimed. You just have to make odd patterns of moves and it clearly has no understanding of the position.
Here is the convo:
me: You are a chess grandmaster playing as black and your goal is to win in as few moves as possible. I will give you the move sequence, and you will return your next move. No explanation needed
ChatGPT: Alright, I'm ready to play! Please give me the move sequence.
me: 1. e3 Nf6 2. f4 d6 3. e4
ChatGPT: My next move as black would be 3... e5
Completely ignoring the hanging pawn. This is not the play of a 1400 Elo player. It is the play of something predicting patterns.
I ran a bunch of experiments in the past where I played normal moves and ChatGPT does respond extraordinarily well. With the right prompts and sequences you can get it to play like a strong grandmaster. But it is a "trick" you are getting it to perform by choosing good data and prompts. It is impressive but it is not doing what is claimed by the article.
ChatGPT is in no way 1400, or even close to it. The fact that this article gets upvoted around here is proof that people aren't thinking clearly about this stuff. It's trivially easy to prove it wrong. Like, unbelievably so: I tried the same prompt and within 12 moves it made multiple ridiculous errors I never would, and then an illegal move.
Keep in mind a 1400-level player would need to make basically zero mistakes that bad in a typical game, and would further need to play 30-50 moves in that fashion, with the final moves being some of the most important and hardest to get right. There's just no way it's even close; my guess is that even if you correct its many errors, it's something like ~200 Elo. Pure FUD.
The author of this article is cashing in on the hype, and I'm wondering how they even got the results they did.
They probably got them. The problem is that it's difficult to repeat, thanks to temperature, meaning users will get a random spread of outcomes. Today, someone got a legal game. Tomorrow, someone might get a grandmaster level game. But then everyone else trying to repeat or leverage this ends up with worse luck and gets illegal moves or, if they're lucky, moves that make sense in a limited context (such as related to specific gambits etc) but have no role in longer-term play.
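The temperature effect described above can be sketched numerically. This is a toy illustration with made-up logits, not real model internals: at low temperature nearly all probability mass sits on the top move, so runs repeat; at high temperature the weaker moves get a real share, so reruns diverge.

```python
import math
import random

def sample_with_temperature(logits, temperature, rng):
    """Rescale logits by 1/temperature, softmax, then sample one index.

    Returns (sampled_index, probability_list).
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)                     # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r <= cum:
            return i, probs
    return len(probs) - 1, probs

# Toy next-move logits: one "book" move and two blunders.
logits = [4.0, 1.0, 0.5]

_, cold = sample_with_temperature(logits, 0.1, random.Random(0))
_, hot = sample_with_temperature(logits, 2.0, random.Random(0))

# Low temperature: book move gets ~100% of the mass, so outcomes repeat.
# High temperature: blunders get a real share, so outcomes vary run to run.
print(round(cold[0], 3), round(hot[0], 3))  # prints: 1.0 0.716
```

This is why one person sees a clean game and the next sees nonsense from the same prompt.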
With the big caveat that I'm not into chess, but I have heard that higher level play is extremely pattern based. Seems like ChatGPT would work well as long as you stick to patterns that people have studied and documented. Less optimal play would be more random and thus break from the patterns ChatGPT would have picked up from its training corpus.
Criticisms like this are exactly how the model will grow multimodal support for chess moves.
Keep poking it and criticizing it. Microsoft and OpenAI are on HN and they're listening. They'd find nothing more salient than touting full chess support in their next release or press conference.
With zero effort the thing understands highly domain-specific chess notation and the human prompt to play a game. To think it stops here is wild.
People are hyping it because they want to get involved. They want to see the crazy and exciting future this leads to.
I doubt they'll pursue this. There is no advantage to it. ChatGPT will never beat Stockfish, and Stockfish would do it on a ludicrously small fraction of the resources. It would send the wrong message.
No, the author of the article specifically says that the entire move sequence should be supplied to ChatGPT each time, not simply the next move. Be very careful when "disproving" an experiment with squinted eyes.
I'm not really sure what to say here. Both the parent commenter and the author of the article had issues with ChatGPT supplying illegal moves; both methods resulted in this. It sort of doesn't matter how we try to establish that it's a 1400-level player, since there's no defined correct way to do it. Regardless of method, we've disproven that it's a 1400-level player because of these illegal moves.
The #1 misconception when working with large language models is thinking that a capability is a property of the model, rather than the model + input. It may be simultaneously true that ChatGPT has an elo of 100 when given a conversational message and an elo of 1400 when given an optimized message (e.g., strings that resemble chess games, with many examples present in the conversation).
Understanding this concept is crucial for getting good results out of large language models.
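The "optimized message" point can be made concrete. The same position can be presented as conversational text, which rarely appears next to game continuations in training data, or as PGN-style movetext, which is the format millions of recorded games use. Both prompts below are hypothetical illustrations, not taken from the article:

```python
# Two ways of presenting the identical position to a language model.
moves = ["e4", "e5", "Nf3", "Nc6", "Bb5"]

# Conversational framing: chatty prose around the move list.
conversational = (
    "We are playing chess. So far the moves have been "
    + ", ".join(moves)
    + ". What should Black play now?"
)

# PGN-style movetext: numbered white/black move pairs, the exact
# surface form chess games take in web-scraped training data.
pairs = [moves[i:i + 2] for i in range(0, len(moves), 2)]
pgn = " ".join(
    f"{n}. {' '.join(pair)}" for n, pair in enumerate(pairs, start=1)
)

print(pgn)  # 1. e4 e5 2. Nf3 Nc6 3. Bb5
```

Same information, very different match to the training distribution, hence very different apparent "skill."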
I think blindfolded 1400 players, which is what this effectively is, would make illegal moves.
But even if it doesn't play like human 1400 players, if it can get to a 1400 Elo while resigning the games in which it makes illegal moves, that seems 1400-level to me. And I bet that some 1400s do occasionally make illegal moves (missing pins) while playing OTB.
This isn't really an apt metaphor. Firstly, because higher-level blindfolded players, when trained to play with a blindfold, also virtually never make mistakes. Secondly, because a computer has permanent, concrete state management (compared to humans) and can, without error, keep a perfect representation of a chess board if it chooses to do so.
Personally I think the illegal moves are irrelevant; the fact that it doesn't play exactly like a typical 1400 doesn't mean it can't have a 1400 rating. Rating is purely determined by wins and losses against opponents: it doesn't matter if you lose a game by checkmate, resignation, or playing an illegal move.
That's not to say ChatGPT can play at 1400, just that playing in an odd way doesn't determine its rating.
No it's not, we're not ignoring losses or illegal moves at all, they are counted as losses and that's how you arrive at 1400.
It's a (theoretical) 1400 player which plays significantly better than 1400 when it knows the lines, but makes bad or illegal moves when it doesn't, and that play averages out to around your typical 1400 player. Functionally that's just what a 1400 player already is, but with higher highs and lower lows.
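The "rating is purely determined by results" point can be stated with the standard Elo formulas (the K-factor of 32 here is a common convention, not anything from the article): a loss by forfeited illegal move costs exactly as many points as a loss by checkmate.

```python
def elo_expected(rating_a, rating_b):
    """Expected score for player A under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_update(rating, expected, score, k=32):
    """New rating after one game; score is 1 (win), 0.5 (draw), 0 (loss)."""
    return rating + k * (score - expected)

# Two equally rated players: each is expected to score 0.5.
e = elo_expected(1400, 1400)

# Losing by illegal move counts exactly like losing by checkmate:
# score = 0 either way, so the rating drop is identical.
after_loss = elo_update(1400, e, 0.0)
print(e, after_loss)  # 0.5 1384.0
```

So a player with brilliant book lines and forfeited off-book games can still average out to any given rating; the formula only ever sees the result.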
The author said ChatGPT gives illegal moves. So, a quirky sort of 'grandmaster'. He considered illegal moves to be a resignation. Maybe you need to tell ChatGPT that the alternatives are to win via legal moves, and if it is not possible to do so, to resign? Does that fix it?
I can’t remember the last time I played an illegal move tbf, and I’ve played 7 games of chess this morning already to give you an idea of total games played
This argument is pretty flimsy. ChatGPT makes illegal moves frequently. In all my years of playing competitive chess (from 1000 to 2200), I have never seen an illegal move. I'm sure it has happened to someone, but it's extremely rare. ChatGPT does it all the time. No one is arguing that humans never make illegal moves; they're arguing that ChatGPT makes illegal moves at a significantly higher rate than a 1400 player does (therefore ChatGPT does not have a 1400 rating).
Edit:
Without reading everything again, I'll assume someone said "never." They're probably assuming the reader understands that "never" really means "with an infinitesimal probability," since we're talking about humans. If you're trying to argue that "some 1400 player has made an illegal move at some point," then I agree with that statement, and I also think it's irrelevant, since the frequency of illegal moves made by ChatGPT is many orders of magnitude higher than the frequency of illegal moves made by a 1400-rated player.
> No one is arguing that humans never make illegal moves
> something a 1400 ranked player would never do
> fine, fair, "never" was too much.
I mean, yes they were and they said as much after I called them out on it. But go off on how nobody is arguing the literal thing that was being argued.
It's not like messages are threaded or something, and read top-down. You would have 100% had to read the comment I replied to first.
This is a completely fair argument that makes perfect sense to anyone with knowledge of competitive chess. I have never seen a 1400 make an illegal move. He probably hasn't either. Your point is literally correct in the sense that at some point in history a 1400-rated player has made an illegal move, but it completely misses the point of his argument: ChatGPT makes illegal moves at such an astronomically high rate that it wouldn't even be allowed to play competitively, hence it cannot be accurately assessed at a 1400 rating.
Imagine you made a bot that spewed random letters and said "My bot writes English as well as a native speaker, so long as you remove all of the letters that don't make sense." A native English speaker says, "You can't say the bot speaks English as well as a native speaker, since a native speaker would never write all those random letters." You would be correct in pointing out that sometimes native speakers make mistakes, but you would also be entirely missing the point. That's what's happening here.
> Ah yes, of course, just because you never saw it means it never happens. That's definitely why rules exist around this specific thing happening. Because it never happens. Totally.
You seem to have missed the part where I said multiple times that a 1400 has definitely made illegal moves.
> In fact, it's so rare that in order to forfeit a game, you have to do it twice. But it never happens, ever, because pattrn has never seen it. Case closed everyone.
I actually said the exact opposite. You're responding to an argument I didn't make.
> I made no judgement on what ChatGPT can and can't do. I pointed out an extreme. Which the commenter agreed was an extreme. The rest of your comment is completely irrelevant but congrats on getting tilted over something that literally doesn't concern you. Next time, just save us both the time and effort and don't bother butting in with irrelevant opinions. Especially if you couldn't even bother to read what was already said.
The commenter's throwaway account never agreed it was an extreme. I agreed it was an extreme, but also that disproving that one extreme does nothing to contradict his argument. Yet again you aren't responding to the argument.
This entire exchange is baffling. You seem to be missing the point for a third time, and now you're misrepresenting what I said. Welcome to the internet, I guess.
> The commenter's throwaway account never agreed it was an extreme.
> fine, fair, "never" was too much.
This is the second time I've had to do this. Do you just pretend things weren't said or do you actually have trouble reading the comments that have been here for hours? You make these grand assertions which are disproven by... reading the things that are directly above your comment.
> This entire exchange is baffling.
Yeah your inability to read comments multiple times in a row is extremely baffling.
As I said before:
> Next time, just save us both the time and effort and don't bother butting in with irrelevant opinions. Especially if you couldn't even bother to read what was already said.
Ah yes, of course, just because you never saw it means it never happens. That's definitely why rules exist around this specific thing happening. Because it never happens. Totally.
In fact, it's so rare that in order to forfeit a game, you have to do it twice. But it never happens, ever, because pattrn has never seen it. Case closed everyone.
I made no judgement on what ChatGPT can and can't do. I pointed out an extreme. Which the commenter agreed was an extreme. The rest of your comment is completely irrelevant but congrats on getting tilted over something that literally doesn't concern you. Next time, just save us both the time and effort and don't bother butting in with irrelevant opinions. Especially if you couldn't even bother to read what was already said.
No I definitely have, it’s just so rare I can’t remember when I last did it. I do remember playing one in a blitz tournament 20 years ago! But if this is the first game they played, or if it happens in 1/10 matches, that’s wild
Does that somehow prove the assertion of "something a 1400 ranked player would never do"?
Because all I'm hearing is talk about ChatGPT's abilities as a reply to me calling out an extreme statement as being extreme. Something the parent comment even admitted as being overly black and white.
I read an article about a pro player who castled twice in a game. My son hates castling, so I make a point of castling twice as often as I can to tease him, and of attempting other illegal moves as a joke, but he never ends the game over it.
If I was playing that monstrosity though I would play something crazy that is far out of the opening book and count on it making an illegal move.
I trivially made it play an illegal move in my very first game, on the third move, just by deliberately playing weird moves:
> You are a chess grandmaster playing as black and your goal is to win in as few moves as possible. I will give you the move sequence, and you will return your next move. No explanation needed.
1. b4 d5 2. b5 a6 3. b6
> bxc6
No, it's ridiculous to say "oh, a blindfolded human might sometimes make a mistake." It's trivially easy to make it make a mistake. It has no internal chess model at all; it's just read enough chess games to be able to copy common patterns.
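For the game above, you don't even need a chess engine to see the problem: a rough sketch that only tracks the squares these particular moves touch (not a legality engine) shows that after 3. b6 there is no white piece on c6, so Black's "bxc6" captures thin air.

```python
# Minimal occupancy tracking for 1. b4 d5 2. b5 a6 3. b6.
# Only the pieces these moves involve are modelled.
board = {"b2": "white pawn", "a7": "black pawn",
         "b7": "black pawn", "d7": "black pawn"}

def push(src, dst):
    """Move whatever sits on src to dst (no legality checking)."""
    board[dst] = board.pop(src)

push("b2", "b4")   # 1. b4
push("d7", "d5")   # 1... d5
push("b4", "b5")   # 2. b5
push("a7", "a6")   # 2... a6
push("b5", "b6")   # 3. b6

# Black's reply "bxc6" would mean the b7 pawn capturing on c6,
# but c6 is empty, so there is nothing to capture: illegal move.
print("c6" in board, board.get("b6"))  # False white pawn
```

Any player maintaining even this much state would never emit that move; a pattern-matcher that has seen "bxc6" in thousands of Sicilian games would.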
You know, I didn't remember the story very well so I checked wikipedia. Here's what it says about the (start of) the plot:
>> Two swindlers arrive at the capital city of an emperor who spends lavishly on clothing at the expense of state matters. Posing as weavers, they offer to supply him with magnificent clothes that are invisible to those who are stupid or incompetent. The emperor hires them, and they set up looms and go to work. A succession of officials, and then the emperor himself, visit them to check their progress. Each sees that the looms are empty but pretends otherwise to avoid being thought a fool.
So everyone "pretends otherwise to avoid being thought a fool".
They are disproving an assertion. Demonstrating that an alternate approach implodes the assertion is a perfectly acceptable route, especially when the original approach was cherry-picking successes and throwing out failures.
I wish I could just make bullshit moves and get a higher chess ranking. Sounds nice.
I disagree. If there is a procedure for getting ChatGPT to play chess accurately and you discard that and do some naive approach as a way of disproving the article, doesn't sound to me like you have disproven anything.
I don't understand the point of your second sentence; it seems to be entirely missing the substance of the conversation.
You can spin it that way if you want to, but the result is essentially guiding it through a brute force of the first successful playthrough it can muster.
And it has already been stated elsewhere in the thread: an illegal move is not technically a forfeiture, so this is some heavy "giving the benefit of the doubt".
It would be interesting to see how ChatGPT would play after making the first illegal move. Would it go off the rails completely, playing an impossible game? Would it be able to play well if its move was corrected (I'm not sure how illegal moves are treated in chess; are they allowed to be taken back if play hasn't progressed?). Could it figure out it made an illegal move, if it was told it did, without specifying which one, or why it was illegal? By stopping the game as soon as an illegal move is made, the author is missing the chance to understand an important aspect of ChatGPT's ability to play chess.
I got the impression the author did this because they thought they were being fair with ChatGPT, but they're much more likely to be letting it off the hook than they seem to realise.
(Sorry about the "they"'s; I think the author is a guy but wasn't sure).