If you follow AI products such as ChatGPT and Gemini, you have to accept one of the sad realities about them. We can’t have advanced AI without proper training, and the training process involves exposing the AI to tons of high-quality data. Another reality is that I, as a ChatGPT user, do not want the personal data in my chats to be used for training, even if it would help produce better, more useful models. Similarly, copyright holders aren’t happy with AI firms training their chatbots on their works without consent. Yet it happens all the time. Also, some AI companies might not want to spend the money to obtain consent when they can get the data from shadier corners of the internet.
It’s not just OpenAI that has to face copyright lawsuits, as Meta is fighting its own AI-related copyright infringement case. While the class action suit against Meta isn’t surprising, the revelations that have come from it shed more light on the kind of data AI models like Meta AI use.
Meta reportedly downloaded as much as 82TB of pirated books from illegal sources to train its AI. The figure comes from alleged communications between Meta employees that came to light in the lawsuit. It follows Meta’s admission that it torrented tens of millions of pirated books.
Meta allegedly used 82TB of stolen books to train its AI originally appeared on BGR.com on Mon, 10 Feb 2025 at 12:07:00 EDT.