Is it reasonable to imagine that LLMs should be able to play chess? I feel like we’re expending a whole lot of effort trying to distort a screwdriver until it looks like a spanner, and then wondering why it won’t grip bolts very well. // Why should a language model be good at chess or similar numerical/analytical tasks? // In what way does language resemble chess?
Logic + ability to correctly interpret the input.
Tasks like maths, chess, etc. are good gauges of logical reasoning, and logical reasoning is essential for LMs to perform as language models, because it’s the difference between true statements and grammatically correct but semantically meaningless slop. As such, the LLM doesn’t need to play chess especially well, but it should be able to give you moves that aren’t blatantly stupid.
It should also be able to understand the position correctly from its Forsyth-Edwards Notation (FEN); that’s linguistic ability at its finest.
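To make the FEN point concrete, here is a minimal sketch of what "understanding the position" involves mechanically. It only expands the piece-placement field into an 8x8 grid and reads the side to move. The function names `parse_fen_board` and `side_to_move` are made up for illustration, and it ignores castling rights, en passant and the move counters.

```python
def parse_fen_board(fen: str) -> list[list[str]]:
    """Expand the piece-placement field of a FEN string into an 8x8 grid.

    Row 0 is rank 8 (Black's back rank), row 7 is rank 1.
    Empty squares are represented as ".".
    """
    placement = fen.split()[0]          # first field: piece placement
    ranks = placement.split("/")
    if len(ranks) != 8:
        raise ValueError(f"expected 8 ranks, got {len(ranks)}")

    board = []
    for rank in ranks:
        row = []
        for ch in rank:
            if ch.isdigit():
                # A digit means that many consecutive empty squares.
                row.extend(["."] * int(ch))
            elif ch in "prnbqkPRNBQK":
                # Lowercase = Black piece, uppercase = White piece.
                row.append(ch)
            else:
                raise ValueError(f"unexpected character {ch!r} in FEN")
        if len(row) != 8:
            raise ValueError(f"rank {rank!r} does not describe 8 squares")
        board.append(row)
    return board


def side_to_move(fen: str) -> str:
    """Return 'w' or 'b' from the second FEN field."""
    return fen.split()[1]


if __name__ == "__main__":
    start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
    for row in parse_fen_board(start):
        print(" ".join(row))
    print("to move:", side_to_move(start))
```

A model that gets the position wrong is effectively failing at this decoding step before any chess reasoning starts.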
Of the four models I tested, only GPT gave a somewhat satisfactory answer (once you disregard that promoting the pawn to a queen gets you checkmated on the next move; I’ve seen potatoes and HN users with more depth of reasoning).