list here: https://www.dropsitenews.com/api/v1/file/b3555944-e204-4f5e-9a64-e44281b19a82.pdf


Meta has scraped data from the most-trafficked domains on the internet, including news organizations, education platforms, niche forums, personal blogs, and even revenge porn sites, to train its artificial intelligence models, according to a leaked list obtained by Drop Site News. By scraping data from roughly 6 million unique websites, including 100,000 of the top-ranked domains, Meta has generated millions of pages of content to use for its AI-training pipeline.
The sites that Meta scrapes consist of copyrighted content, pirated content, and adult videos, some of whose content is potentially illegally obtained or recorded, as well as news and original content from prominent outlets and content publishers. They include mainstream businesses like Getty Images, Shopify, and Shutterstock, but also extreme pornographic content, including websites advertising explicit sexual content and humiliation porn that exploits teenagers.
While high-profile sites like The New York Times, which has engaged in litigation to prevent its content from being used to train AI models, are absent from the list, the leak shows that Meta often found ways to stop sites from defending themselves from being scraped. The scrapers ignored common web protocols that site owners use to block automated scraping, including “robots.txt,” which is a text file placed on websites aimed at preventing the indexing of content. The data were shared with Drop Site by whistleblowers frustrated over Meta’s support for Israel in conducting its genocide in the Gaza Strip. According to the whistleblowers, the data are indicative of Meta’s unethical and potentially illegal business practices more broadly.
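
For anyone unfamiliar with how robots.txt works, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules, the crawler name `ExampleAIBot`, and the `example.com` URLs are made up for illustration and are not taken from the leaked list. The point is that the file only states what a site owner wants; a crawler has to choose to check it, so nothing technically stops a scraper that ignores it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: the bot name and paths are illustrative
# assumptions, not rules from any real site.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    checks = [
        ("ExampleAIBot", "https://example.com/news/story"),
        ("OtherBot", "https://example.com/news/story"),
        ("OtherBot", "https://example.com/private/page"),
    ]
    for agent, url in checks:
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, url) else "blocked"
        print(f"{agent} -> {url}: {verdict}")
```

A well-behaved crawler would run a check like `is_allowed` before every request and skip anything blocked. A scraper that never performs the check simply downloads the page anyway.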

LEAKED: A New List Reveals Top Websites Meta Is Scraping of Copyrighted Content to Train Its AI
The tech giant is sidestepping guardrails that websites use to prevent being scraped, data show, in a move whistleblowers say is unethical and potentially illegal.
