• Lvxferre

    From HN comments:

    Is it reasonable to imagine that LLMs should be able to play chess? I feel like we’re expending a whole lot of effort trying to distort a screwdriver until it looks like a spanner, and then wondering why it won’t grip bolts very well. // Why should a language model be good at chess or similar numerical/analytical tasks? // In what way does language resemble chess?

    Logic + ability to correctly interpret the input.

    Tasks like maths, chess, etc. are good gauges of logical reasoning; and logical reasoning is essential for LLMs to perform as language models, because it's the difference between true statements and grammatically correct + semantically meaningless slop. As such, the LLM doesn't need to play chess especially well, but it should be able to provide you with not-blatantly-stupid moves.

    It should also understand the position correctly, based on Forsyth-Edwards notation; that's linguistic ability at its finest.
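
    To make that baseline concrete, here's a minimal sketch, assuming the python-chess library (`pip install chess`); the FEN string below is just the standard starting position, as an example:

    ```python
    # Parse a position given in Forsyth-Edwards notation and list the legal moves.
    import chess

    fen = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
    board = chess.Board(fen)

    # A model that actually "understood" the notation would at least stay
    # inside this set; anything outside it is the chess equivalent of
    # grammatically correct + semantically meaningless slop.
    legal = [board.san(move) for move in board.legal_moves]
    print(legal)  # e.g. ['Nh3', 'Nf3', 'Na3', 'Nc3', 'h3', 'h4', ...]
    ```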

    Of the four models that I tested, only GPT provided a somewhat satisfactory answer (once you disregard that its suggested pawn-to-queen promotion gets you checkmated on the next move; I've seen potatoes and HN users with more depth of reasoning).