Lawyers for The New York Times and Daily News, who are suing OpenAI over the alleged unauthorized use of their content to train AI models, say their case has been complicated by OpenAI engineers' accidental deletion of potentially relevant data.
Last fall, OpenAI agreed to provide The New York Times and Daily News with access to two virtual machines so that lawyers for both publications could search for their copyrighted content within OpenAI's AI training datasets. Virtual machines are software-based computer systems that run within another computer's operating system, commonly used for testing, data backup, and running applications. Lawyers for the publishers say their team has spent more than 150 hours since November 1 searching OpenAI's training data.
On November 14, OpenAI engineers erased all of the publishers' search data stored on one of the virtual machines, according to a letter filed Wednesday evening in the US District Court for the Southern District of New York.
OpenAI was able to recover much of the data. But because the folder structure and file names were irretrievably lost, the recovered data cannot be used to determine where the plaintiffs' copied articles were used to train OpenAI's models, the letter states.
The plaintiffs say they have been forced to recreate their work from scratch, at significant cost in person-hours and computing time. They add that they learned only recently that the recovered data is unusable, meaning an entire week's worth of expert and legal work must be redone; that discovery prompted the supplemental filing.
The plaintiffs' counsel makes clear that they have no reason to believe the deletion was intentional. But, they argue, the incident underscores that OpenAI is in the best position to search its own datasets for potentially infringing content using its own tools.
An OpenAI spokesperson declined to comment.
In a letter filed late on Friday, November 22, counsel for OpenAI responded to the concerns raised Wednesday by lawyers for The Times and Daily News. OpenAI's legal team categorically denied that the company deleted any evidence, instead attributing the issue to a system configuration change requested by the plaintiffs that caused the technical problem.
According to OpenAI's counsel, the plaintiffs had requested a configuration change to one of several machines OpenAI provided for searching training datasets. Implementing that change, the letter says, removed the folder structure and some file names on one hard drive, which was intended to serve only as a temporary cache; no files were permanently lost.
OpenAI has consistently argued that training its models on publicly available data, including articles from The Times and Daily News, constitutes fair use. In other words, the company maintains that its language models, which learn from billions of examples of e-books, essays, and other text to generate human-sounding writing, do not require it to license or pay for those examples, even though the models generate revenue.
That said, OpenAI has signed licensing agreements with a growing number of prominent publishers, including The Associated Press; Axel Springer, parent company of Business Insider; The Financial Times; Dotdash Meredith, owner of People; and News Corp. OpenAI has declined to disclose the terms of these deals, but one partner, Dotdash, is reportedly being paid at least $16 million per year.
OpenAI has not confirmed or denied that it trained its AI systems on any specific copyrighted works without permission.