OpenAI released its o3 model in December, bragging about the model’s unparalleled ability to solve math and science problems. The model’s success on the FrontierMath benchmark — solving 25.2% of…
So it looks like Mr. “Not consistently candid” has been at it again?
I will admit that they got me with this one: I genuinely thought the FrontierMath results meant something real. I didn’t think they would be that brazen about rigging a benchmark that was explicitly advertised as being kept private so that AI companies couldn’t train on the questions. More fool me, I guess.