I put this one to the LLMs provided by DDG. The only one to correctly “interpret” the board was GPT-4o mini; Claude and Llama proposed the wrong moves, and Mixtral behaved as if White were the side to move. GPT suggested promoting the pawn to a queen. ¬.¬
Lichess's board analysis, however, was quick to point out the right move:
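The same check can also be done programmatically; here's a minimal sketch assuming Python's `requests` library, Lichess's public `/api/cloud-eval` endpoint, and a placeholder FEN rather than the actual position from this post:

```python
import requests

# Placeholder FEN for illustration; not the position discussed above.
FEN = "8/8/8/8/8/6k1/4p3/6K1 b - - 0 1"

resp = requests.get(
    "https://lichess.org/api/cloud-eval",
    params={"fen": FEN, "multiPv": 1},
    timeout=10,
)
resp.raise_for_status()  # a 404 means the position isn't in the cloud cache
best = resp.json()["pvs"][0]
print("best line:", best["moves"], "eval:", best.get("cp", best.get("mate")))
```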
From HN comments:
Logic + ability to correctly interpret the input.
Tasks like maths, chess, etc. are good gauges for logical reasoning; and logical reasoning is essential for LLMs to perform as language models, because it's the difference between true statements and grammatically correct + semantically meaningless slop. As such, the LLM doesn't need to play chess especially well, but it should be able to give you not-blatantly-stupid moves.
It should also understand the position correctly, based on Forsyth–Edwards Notation (FEN); that's linguistic ability at its finest.
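To make concrete what “understanding the position” from FEN entails, here's a minimal sketch using the `python-chess` library (my choice of tool, not anything from the original test; the FEN below is a placeholder endgame, not the position from this post):

```python
import chess

# Placeholder position: Black to move, with a pawn one step from promotion.
FEN = "8/8/8/8/8/6k1/4p3/6K1 b - - 0 1"

board = chess.Board(FEN)
print(board)  # ASCII rendering of the position
print("side to move:", "White" if board.turn == chess.WHITE else "Black")
print("legal moves:", [board.san(m) for m in board.legal_moves])
```

Getting the side to move and the legal moves right is the bare minimum the models were being asked for; Mixtral failed at the first step, Claude and Llama at the second.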
Of the four models I tested, only GPT provided a somewhat satisfactory answer (once you disregard that promoting the pawn to a queen means being checkmated on the next move; I've seen potatoes and HN users with more depth of reasoning).
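Sanity-checking a candidate move for exactly that kind of blunder is mechanical; a sketch, again assuming `python-chess` and the same placeholder position as above:

```python
import chess

def allows_mate_in_one(board: chess.Board, move: chess.Move) -> bool:
    """True if playing `move` lets the opponent deliver mate immediately."""
    board.push(move)
    try:
        for reply in board.legal_moves:
            board.push(reply)
            mated = board.is_checkmate()
            board.pop()
            if mated:
                return True
        return False
    finally:
        board.pop()  # restore the original position

board = chess.Board("8/8/8/8/8/6k1/4p3/6K1 b - - 0 1")
candidate = board.parse_san("e1=Q")  # the kind of move GPT proposed
print("walks into mate:", allows_mate_in_one(board, candidate))
```

A one-ply lookahead like this is trivially cheap, which is what makes “suggests a move that gets mated on the spot” such a damning failure mode.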