During its big GPT-5 livestream on Thursday, OpenAI showed off a number of charts that made the model seem quite impressive, but if you look closely, some of the graphs were a little bit off.
In one, ironically showing how well GPT-5 does in “deception evals across models,” the scale is all over the place. For “coding deception,” for example, the chart shown onstage says GPT-5 with thinking gets a 50.0 percent deception rate, yet o3’s smaller 47.4 percent score somehow has a larger bar. OpenAI appears to have accurate numbers for this chart in its GPT-5 blog post, however, where GPT-5’s deception rate is labeled as 16.5 percent.
With this chart, OpenAI showed onstage that one of GPT-5’s scores is lower than o3’s but is shown with a bigger bar. In this same chart, o3’s and GPT-4o’s scores are different but shown with equally sized bars. It was bad enough that CEO Sam Altman commented on it, calling it a “mega chart screwup,” though he noted that a correct version is in OpenAI’s blog post.
An OpenAI marketing staffer also apologized, saying, “We fixed the chart in the blog guys, apologies for the unintentional chart crime.”
OpenAI didn’t immediately reply to a request for comment. And while it’s unclear if OpenAI used GPT-5 to actually make the charts, it’s still not a great look for the company on its big launch day, especially when it’s touting the “significant advances in reducing hallucinations” with its new model.