• Lvxferre

    From HN comments:

    Is it reasonable to imagine that LLMs should be able to play chess? I feel like we’re expending a whole lot of effort trying to distort a screwdriver until it looks like a spanner, and then wondering why it won’t grip bolts very well. // Why should a language model be good at chess or similar numerical/analytical tasks? // In what way does language resemble chess?

    Logic + ability to correctly interpret the input.

    Tasks like maths, chess, etc. are good gauges of logical reasoning; and logical reasoning is essential for LLMs to perform as language models, because it's the difference between true statements and grammatically correct + semantically meaningless slop. As such, the LLM doesn't need to play chess especially well, but it should be able to provide you with not-blatantly-stupid moves.

    It should also understand the position correctly, based on Forsyth-Edwards notation; that's linguistic ability at its finest.
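
    To make that baseline concrete, here's a minimal sketch, assuming the python-chess library (`pip install chess`); the FEN string below is just the standard starting position, as an example:

    ```python
    # Parse a position given in Forsyth-Edwards notation and list the legal moves.
    import chess

    fen = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
    board = chess.Board(fen)

    # A model that actually "understood" the notation would at least stay
    # inside this set; anything outside it is the chess equivalent of
    # grammatically correct + semantically meaningless slop.
    legal = [board.san(move) for move in board.legal_moves]
    print(legal)  # e.g. ['Nh3', 'Nf3', 'Na3', 'Nc3', 'h3', 'h4', ...]
    ```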

    Of the four models that I tested, only GPT provided a somewhat satisfactory answer (once you disregard that its suggested pawn-to-queen promotion gets you checkmated on the next move; I've seen potatoes and HN users with more depth of reasoning).