Leaked list reveals websites Meta is scraping to train its AI (LSA, many Black porn sites among them)

east

Screwed up... till tha casket drops!!
Joined
Aug 5, 2012
Messages
5,633
Reputation
4,955
Daps
18,304
Reppin
The Bronx ➡️ New England
...but not the coli, abw, boxden, nt, or the rest of the multiverse :mjlol:



list here: https://www.dropsitenews.com/api/v1/file/b3555944-e204-4f5e-9a64-e44281b19a82.pdf

Meta has scraped data from the most-trafficked domains on the internet, including news organizations, education platforms, niche forums, personal blogs, and even revenge porn sites, to train its artificial intelligence models, according to a leaked list obtained by Drop Site News. By scraping data from roughly 6 million unique websites, including 100,000 of the top-ranked domains, Meta has generated millions of pages of content for its AI-training pipeline.

The sites that Meta scrapes consist of copyrighted content, pirated content, and adult videos, some of whose content is potentially illegally obtained or recorded, as well as news and original content from prominent outlets and content publishers. They include mainstream businesses like Getty Images, Shopify, and Shutterstock, but also extreme pornographic content, including websites advertising explicit sexual content and humiliation porn that exploits teenagers.

While high-profile sites like The New York Times, which has engaged in litigation to prevent its content from being used to train AI models, are absent from the list, the leak shows that Meta often found ways to stop sites from defending themselves from being scraped. The scrapers ignored common web protocols that site owners use to block automated scraping, including “robots.txt,” which is a text file placed on websites aimed at preventing the indexing of content. The data were shared with Drop Site by whistleblowers frustrated over Meta’s support for Israel in conducting its genocide in the Gaza Strip. According to the whistleblowers, the data are indicative of Meta’s unethical and potentially illegal business practices more broadly.
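
For anyone wondering what "ignoring robots.txt" means in practice, here is a minimal sketch of the check a well-behaved crawler is expected to run before fetching a page. It uses Python's standard `urllib.robotparser`. The robots.txt content, the bot names (`ExampleAIBot`, `OtherBot`), and the example.com URLs are all made up for illustration and are not taken from the leaked list or from Meta's actual crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might publish (assumption, for illustration only).
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text in memory (no network request)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# The site blocks the hypothetical AI bot everywhere.
print(may_fetch(parser, "ExampleAIBot", "https://example.com/articles/1"))  # False
# Other bots may read public pages...
print(may_fetch(parser, "OtherBot", "https://example.com/articles/1"))      # True
# ...but not the /private/ section.
print(may_fetch(parser, "OtherBot", "https://example.com/private/notes"))   # False
```

Nothing enforces this check. robots.txt is a request, not a lock, so a scraper that never runs it (or ignores the answer) can still download the pages, which is the behavior the article describes.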
 

Pazzy

Superstar
Bushed
Joined
Jun 11, 2012
Messages
30,394
Reputation
-6,274
Daps
47,824
Reppin
NULL
...but not the coli, abw, boxden, nt, or the rest of the multiverse :mjlol:

list here: https://www.dropsitenews.com/api/v1/file/b3555944-e204-4f5e-9a64-e44281b19a82.pdf



They are collecting data here too. Read the terms of service.
 

east

The internet is one big data mine. Collecting information is big business. They have been trying to curate the perfect echo chamber for each individual
that's the face-value purpose, there's always an ulterior one tho

In a ceremony in June at Joint Base Myer-Henderson Hall in Arlington, Va., four current and former executives from Meta, OpenAI and Palantir lined up onstage to swear an oath to support and defend the United States. At the event, they were pronounced lieutenant colonels in the Army's new technical innovation unit, Detachment 201, which will advise the Army on new technologies for potential combat.

Last year, Meta changed its policies to allow its A.I. technologies to be used for military purposes. Andrew Bosworth, Meta’s chief technology officer and one of the new lieutenant colonels in Detachment 201, said America’s “national security benefits enormously from American industry bringing these technologies to life.”

Meta declined to comment.
 

Wargames

One Of The Last Real Ones To Do It
Joined
Apr 1, 2013
Messages
29,450
Reputation
6,401
Daps
112,189
Reppin
New York City
...but not the coli, abw, boxden, nt, or the rest of the multiverse :mjlol:



list here: https://www.dropsitenews.com/api/v1/file/b3555944-e204-4f5e-9a64-e44281b19a82.pdf


Can’t have the black male AI say fukk these crackers in the middle of an answer based on source material.

:yeshrug:


Though give it time, the AI will eventually figure it out on its own

:ufdup:
 