Last week, OpenAI pulled a GPT-4o update that made ChatGPT “overly flattering or agreeable,” and now it has explained what exactly went wrong. In a blog post published on Friday, OpenAI said its efforts to “better incorporate user feedback, memory, and fresher data” could have partly led to “tipping the scales on sycophancy.”
In these updates, OpenAI had begun using data from the thumbs-up and thumbs-down buttons in ChatGPT as an “additional reward signal.” However, OpenAI said, this may have “weakened the influence of our primary reward signal, which had been holding sycophancy in check.” The company notes that user feedback “can sometimes favor more agreeable responses,” likely exacerbating the chatbot’s overly agreeable statements. The company said memory can amplify sycophancy as well.
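To illustrate how an added reward signal can dilute a primary one, here is a toy sketch in Python. It is not OpenAI's training setup, which the company has not published in this detail. The function names, the weighted-average blend, and every score below are hypothetical and chosen only to show the effect.

```python
def combined_reward(primary, feedback, feedback_weight):
    """Blend two reward signals; the two weights sum to 1 (an assumption)."""
    return (1.0 - feedback_weight) * primary + feedback_weight * feedback


# Hypothetical scores for two candidate replies to the same prompt.
# "primary" stands in for a reward that penalizes sycophancy;
# "feedback" stands in for a thumbs-up rate that favors agreeable replies.
CANDIDATES = {
    "candid": {"primary": 0.9, "feedback": 0.4},
    "flattering": {"primary": 0.5, "feedback": 0.95},
}


def preferred(feedback_weight):
    """Return the name of the reply with the highest blended reward."""
    return max(
        CANDIDATES,
        key=lambda name: combined_reward(
            CANDIDATES[name]["primary"],
            CANDIDATES[name]["feedback"],
            feedback_weight,
        ),
    )


if __name__ == "__main__":
    for weight in (0.1, 0.6):
        print(f"feedback weight {weight}: prefers the {preferred(weight)} reply")
```

With these made-up numbers, a small feedback weight leaves the candid reply on top, while a larger one tips the blended reward toward the flattering reply, even though the primary signal never changed.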
OpenAI says one of the “key issues” with the launch stems from its testing process. Though the model’s offline evaluations and A/B testing had positive results, some expert testers suggested that the update made the chatbot seem “slightly off.” OpenAI moved forward with the update anyway.
“Looking back, the qualitative assessments were hinting at something important, and we should’ve paid closer attention,” the company writes. “They were picking up on a blind spot in our other evals and metrics. Our offline evals weren’t broad or deep enough to catch sycophantic behavior… and our A/B tests didn’t have the right signals to show how the model was performing on that front with enough detail.”
Going forward, OpenAI says it’s going to “formally consider behavioral issues” as having the potential to block launches, as well as create a new opt-in alpha phase that will allow users to give OpenAI direct feedback before a wider rollout. OpenAI also plans to ensure users are aware of the changes it’s making to ChatGPT, even if the update is a small one.