• Phil@awful.systems
    6 hours ago

    So it looks like Mr. “Not consistently candid” has been at it again?

    I will admit that they got me with this one: I genuinely thought the FrontierMath results meant something real. I didn’t think they would be that brazen about rigging a benchmark that was explicitly advertised as being kept private so that AI companies couldn’t train on the questions. More fool me, I guess.