Meta Discussed Using Copyrighted Content for AI Training Purposes, Lawsuit Reveals

Court documents have revealed that Meta employees discussed using copyrighted material obtained through legally dubious means to train the company’s AI models.
The internal chats were revealed in Kadrey v. Meta, a case brought by a group of authors, including Sarah Silverman, who accuse Meta of using their protected works for machine learning purposes. Employees discussed the ethics of obtaining copies of ebooks through peer-to-peer file sharing, also known as torrenting.
Earlier filings in the case claimed that CEO Mark Zuckerberg gave his blessing to use pirated material to train Meta’s Llama AI models.
The newly filed documents reveal messages from employees working on the Llama models. In one, a research engineer wrote that her approach to using copyrighted material to train AI was to “ask forgiveness, not for permission.”
Senior manager Melanie Kambadur agreed with the engineer, saying that Meta’s lawyers were being “less conservative” than they had been previously.
“Yeah we definitely need to get licenses or approvals on publicly available data still,” Kambadur said, according to the documents. “[D]ifference now is we have more money, more lawyers, more bizdev help, ability to fast track/escalate for speed, and lawyers are being a bit less conservative on approvals.”
The employees also discussed LibGen, a website that aggregates copies of other people’s books and articles. TechCrunch notes that LibGen has previously been fined tens of millions of dollars for copyright infringement. Even so, Meta’s bosses gave the impression that if the company did not use LibGen, it would be at a disadvantage against its competitors.
Director of Product Management at Meta, Sony Theakanath, described LibGen as “essential to meet SOTA [state-of-the-art] numbers across all categories” in an email to Meta AI VP Joelle Pineau.
The documents also suggest that Meta may have scraped content from Reddit, overriding its own past decisions on AI training so that its models had enough data to learn from.
Chaya Nayak, Director of Product Management for generative AI at Meta, suggested that Facebook and Instagram posts alone were not enough to build top-tier AI models. “We need more data,” she wrote.
TechCrunch reports that Meta has added two Supreme Court litigators from the law firm Paul Weiss to its defense team, a sign of how seriously the company is taking the litigation. Last week, PetaPixel reported that Thomson Reuters won an early victory for copyright holders after a judge granted partial summary judgment in the company’s favor in its copyright infringement lawsuit against Ross Intelligence.