bnew


OPINION
GUEST ESSAY

The Internet Is About to Get Much Worse​

Sept. 23, 2023

By Julia Angwin

Ms. Angwin is a contributing Opinion writer and an investigative journalist.

Greg Marston, a British voice actor, recently came across “Connor” online — an A.I.-generated clone of his voice, trained on a recording Mr. Marston had made in 2003. It was his voice uttering things he had never said.

Back then, he had recorded a session for IBM and later signed a release form allowing the recording to be used in many ways. Of course, at that time, Mr. Marston couldn’t envision that IBM would use anything more than the exact utterances he had recorded. Thanks to artificial intelligence, however, IBM was able to sell Mr. Marston’s decades-old sample to websites that are using it to build a synthetic voice that could say anything. Mr. Marston recently discovered his voice emanating from the Wimbledon website during the tennis tournament. (IBM said it is aware of Mr. Marston’s concern and is discussing it with him directly.)

His plight illustrates why many of our economy’s best-known creators are up in arms. We are in a time of eroding trust, as people realize that their contributions to a public space may be taken, monetized and potentially used to compete with them. When that erosion is complete, I worry that our digital public spaces might become even more polluted with untrustworthy content.

Already, artists are deleting their work from X, formerly known as Twitter, after the company said it would be using data from its platform to train its A.I. Hollywood writers and actors are on strike partly because they want to ensure their work is not fed into A.I. systems that companies could try to replace them with. News outlets including The New York Times and CNN have added files to their websites to help prevent A.I. chatbots from scraping their content.
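
Those files are typically robots.txt rules asking known A.I. crawlers to stay out. Below is a minimal sketch of what such a file can look like; the crawler names are the publicly documented ones (GPTBot for OpenAI, CCBot for Common Crawl, Google-Extended for Google's A.I. training), but which bots any particular outlet blocks is an assumption here, not a detail from the article.

```
# robots.txt — illustrative only, not any specific outlet's actual file
User-agent: GPTBot            # OpenAI's crawler
Disallow: /

User-agent: CCBot             # Common Crawl, a frequent source of training data
Disallow: /

User-agent: Google-Extended   # opt-out token for Google's A.I. training
Disallow: /
```

Note that robots.txt is a voluntary convention; a crawler that ignores it can still scrape the pages.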

Authors are suing A.I. outfits, alleging that their books are included in the companies’ training data. OpenAI has argued, in a separate proceeding, that the use of copyrighted data for training A.I. systems is legal under the “fair use” provision of copyright law.

While creators of quality content are contesting how their work is being used, dubious A.I.-generated content is stampeding into the public sphere. NewsGuard has identified 475 A.I.-generated news and information websites in 14 languages. A.I.-generated music is flooding streaming websites and generating royalties for scammers. A.I.-generated books — including a mushroom foraging guide that could lead to mistakes in identifying highly poisonous fungi — are so prevalent on Amazon that the company is asking authors who self-publish on its Kindle platform to declare whether they are using A.I.

This is a classic case of tragedy of the commons, where a common resource is harmed by the profit interests of individuals. The traditional example of this is a public field that cattle can graze upon. Without any limits, individual cattle owners have an incentive to overgraze the land, destroying its value to everybody.

We have commons on the internet, too. Despite all of its toxic corners, it is still full of vibrant portions that serve the public good — places like Wikipedia and Reddit forums, where volunteers often share knowledge in good faith and work hard to keep bad actors at bay.

But these commons are now being overgrazed by rapacious tech companies that seek to feed all of the human wisdom, expertise, humor, anecdotes and advice they find in these places into their for-profit A.I. systems.


Consider, for instance, that the volunteers who build and maintain Wikipedia trusted that their work would be used according to the terms of their site, which requires attribution. Now some Wikipedians are apparently debating whether they have any legal recourse against chatbots that use their content without citing the source.

Regulators are trying to figure it out, too. The European Union is considering the first set of global restrictions on A.I., which would require some transparency from generative A.I. systems, including providing summaries of the copyrighted data used to train them.

That would be a good step forward, since many A.I. systems do not fully disclose the data they were trained on. It has primarily been journalists who have dug up the murky data that lies beneath the glossy surface of the chatbots. A recent investigation detailed in The Atlantic revealed that more than 170,000 pirated books are included in the training data for Meta’s A.I. chatbot, Llama. A Washington Post investigation revealed that OpenAI’s ChatGPT relies on data scraped without consent from hundreds of thousands of websites.

But transparency is hardly enough to rebalance the power between those whose data is being exploited and the companies poised to cash in on the exploitation.

Tim Friedlander, founder and president of the National Association of Voice Actors, has called for A.I. companies to adopt ethical standards. He says that actors need three Cs: consent, control and compensation.


In fact, all of us need the three Cs. Whether we are professional actors or we just post pictures on social media, everyone should have the right to meaningful consent on whether we want our online lives fed into the giant A.I. machines.

And consent should not mean having to locate a bunch of hard-to-find opt-out buttons to click — which is where the industry is heading.

Compensation is harder to figure out, especially since most of the A.I. bots are primarily free services at the moment. But make no mistake, the A.I. industry is planning to and will make money from these systems, and when it does, there will be a reckoning with those whose works fueled the profits.

For people like Mr. Marston, their livelihoods are at stake. He estimates that his A.I. clone has already lost him jobs and will cut into his future earnings significantly. He is working with a lawyer to seek compensation. “I never agreed or consented to having my voice cloned, to see/hear it released to the public, thus competing against myself,” he told me.

But even those of us who don’t have a job directly threatened by A.I. think of writing that novel or composing a song or recording a TikTok or making a joke on social media. If we don’t have any protections from the A.I. data overgrazers, I worry that it will feel pointless to even try to create in public. And that would be a real tragedy.
 

bnew


Garbage AI on Google News​

JOSEPH COX

JAN 18, 2024 AT 9:33 AM

404 Media reviewed multiple examples of AI rip-offs making their way into Google News. Google said it doesn't focus on how an article was produced—by an AI or human—opening the way for more AI-generated articles.



Google News is boosting sites that rip off other outlets by using AI to rapidly churn out content, 404 Media has found. Google told 404 Media that although it tries to address spam on Google News, the company ultimately does not focus on whether a news article was written by an AI or a human, opening the way for more AI-generated content to make its way onto Google News.

The presence of AI-generated content on Google News signals two things: first, the black-box nature of Google News, where entry into its rankings is in the first place an opaque, but apparently gameable, system. Second, Google may not be ready to moderate its News service in the age of consumer-access AI, where essentially anyone is able to churn out a mass of content with little to no regard for its quality or originality.

“I want to read the original stories written by journalists who actually researched them and spoke to primary sources. Any news junkie would,” Brian Penny, a ghostwriter who first flagged some of the seemingly AI-generated articles to 404 Media, said.


Do you know about any other AI-generated content farms? I would love to hear from you. Using a non-work device, you can message me securely on Signal at +44 20 8133 5190. Otherwise, send me an email at joseph@404media.co.

One example was a news site called Worldtimetodays.com, which is littered with full-page and other ads. On Wednesday it published an article about Star Wars fandom. The article was very similar to one published a day earlier on the website Distractify, with even the same author photo. One major difference, though, was that Worldtimetodays.com wrote “Let’s be honest, war of stars fans,” rather than Star Wars fans. Another article is a clear rip-off of a piece from Heavy.com, with Worldtimetodays.com not even bothering to replace the Heavy.com watermarked artwork. Gary Graves, the listed author on Worldtimetodays.com, has published more than 40 articles in a 24-hour period.

Both of these rip-off articles appear in Google News search results. The first appears when searching for “Star Wars theory” and setting the results to the past 24 hours. The second appears when searching for the subject of the article with a similar 24-hour setting.

LEFT: THE DISTRACTIFY ARTICLE. RIGHT: THE ARTICLE ON WORLDTIMETODAYS.COM.​

Aaron Nobel, editor-in-chief of Heavy.com, told 404 Media in an email that “I was not aware of this particular ripoff or this particular website. But over the years we've encountered many other sites that rip and republish content at scale.” Neither Distractify nor Worldtimetodays.com responded to a request for comment.

There are a few different ways to use Google News. One is to simply open the main Google News homepage, where Google surfaces what it thinks are the most important stories of the day. Another is to search for a particular outlet, where you’ll then see recent stories from just that site. A third is to search by “topic,” such as “artificial intelligence,” “Taylor Swift,” or whatever it is you’re interested in. Appearing in topic searches is especially important for outlets looking to garner more attention for their writings on particular beats. 404 Media, at the time of writing, does not appear in topic searches (except, funnily enough, for people writing about 404 Media, like this Fast Company article about us and other worker-owned media outlets). As in, if you searched “CivitAI,” an artificial intelligence company we’ve investigated extensively, our investigations would not appear in Google News; only people aggregating our work or producing their own would.


In another example of AI-generated rip-off content, Penny sent screenshots of search results for news related to the AI tool “midjourney.” At one point, those included articles from sites such as “WatchdogWire” and “Examiner.com.” These articles appear to use the same images, very similar or identical headlines, and pockets of similar text.

The Examiner.com domain was once used by a legitimate news service and went through various owners and iterations. The site adopted its current branding around 2022, according to archived versions of the site on the Wayback Machine. With that in mind, it’s worth remembering that some of these sites that more recently pivoted to AI-generated content may have been accepted into Google News long ago, even before the advent of consumer-level AI.

A SERIES OF GOOGLE NEWS SCREENSHOTS PROVIDED BY PENNY.​

Looking at WatchdogWire and Examiner.com more broadly, both sites regularly publish content with the same art and identical or very similar headlines in quick succession every day. Ahmed Baig, one of the listed authors on WatchdogWire, has published more than 500 articles in the past 30 days, according to his author page. Baig did not respond to a request for comment sent over LinkedIn asking whether he was taking work from other outlets and using AI to reword them. Baig lists himself as the editor-in-chief of WatchdogWire, as well as the head of SEO for a company called Sproutica. A contact email for Examiner.com uses the Sproutica domain.

Someone who replied to a request for comment sent to that address, and who signed off as “Nabeel,” confirmed Examiner.com is using AI to copy other people’s articles. “Sometimes it doesn’t perform well by answering out of context text, therefore, my writer proofread the content,” they wrote. “It's an experiment for now which isn't responding as expected in terms of Google Search. Despite publishing 400+ stories it attracted less than 1000 visits.”

The articles on WatchdogWire and Examiner.com are almost always very similar to those published on Watcher.Guru, another news site which also has a popular Twitter account with 2.1 million followers and which regularly goes viral on the platform. When asked if Watcher.Guru has any connection to WatchdogWire or Examiner.com, a person in control of the Watcher.Guru Twitter account told 404 Media in a direct message that “we are not affiliated with these sites. These sites are using AI to steal our content and featured images.”

In another case, Penny sent a screenshot of a Google News result that showed articles from CBC and another outlet called “PiPa News.” The PiPa News piece appears to be a rewrite of the CBC one, with a very similar headline and body of text. PiPa News did not respond to an emailed request for comment. Kerry Kelly, from CBC’s public affairs department, said in an email that “We are aware of an increase in outlets and individuals using CBC News articles without proper licensing or attribution, and are working to curb this trend through media monitoring, takedown requests for individual sites, and connecting with social media platforms when appropriate.”




A SCREENSHOT OF WATCHER.GURU'S WEBSITE ON THURSDAY.​




A SCREENSHOT OF EXAMINER.COM'S WEBSITE ON THURSDAY.​

A Google spokesperson said the company focuses on the quality of the content, and not how it was created. Their statement read: “Our focus when ranking content is on the quality of the content, rather than how it was produced. Automatically-generated content produced primarily for ranking purposes is considered spam, and we take action as appropriate under our policies.” Google reiterated that websites are automatically considered for Google News, and that it can take time for the system to identify new websites. The company added that its Google News ranking systems aim to reward original content that demonstrates things such as expertise and trustworthiness.

With that in mind, after 404 Media approached Google for comment, Penny found that the WatchdogWire and Examiner.com results had apparently been removed from search results for the “midjourney” query and another for “stable diffusion.” Google did not respond when asked multiple times to confirm whether it took any action.

404 Media remains outside of news topics results for the beats we cover.









 

bnew


Google Responds To Claims Of Google News Boosting Garbage AI Content​

Jan 19, 2024 - 7:51 am by Barry Schwartz

Filed Under Google Search Engine Optimization



Danny Sullivan, the Google Search Liaison, responded to the article from 404 Media titled Google News Is Boosting Garbage AI-Generated Articles. In short, Sullivan said that 404 Media's examples filtered Google News results by date rather than by relevancy (relevancy is the default) and that Google can always do better.

First, go read the article if you can get through the paywall. :smile:

Then here is Danny Sullivan's response, which he posted on X, Mastodon and on Bluesky:

Let me summarize before I quote:

(1) Google News is not boosting AI content

(2) The examples in the article show that the author used filters and special searches to showcase the AI content above the default settings in Google News.

(3) He said the focus is on quality: it is not about how content is produced but whether the content is high quality or not.

(4) Google is not perfect, Google will do better.

(5) 404 Media doesn't rank because it is a new site and Google News needs time to trust it, but Google will look for ways to improve this process for new news sites entering the market (we've been asking for this for years).

(6) 404 Media uses a paywall or subscription process for its content and thus should use the paywalled structured data so Google can understand the content and rank it (a sketch of that markup follows Sullivan's reply below).

Now here is what Danny Sullivan posted:

Jason, I’d like to clarify Google News is not somehow “boosting” AI content to the top of search results. This isn't the case. I also appreciate the frustration of a new publication like yours wanting to quickly appear in Google News. I do. It’s something I hope we’ll improve on. Here’s more on both.

Google News, like Google Search, indexes content from across the web. However, appearing in our index isn’t a guarantee content will rank well. In today’s story, the screenshots of AI content supposedly being boosted involve overriding our default ranking to manually force it higher in the results.

It’s possible to use the News tab on Google Search to show news-related content sorted by date, as opposed to by relevance, as done with the screenshots. Doing this is expressly asking our systems to ignore the regular relevance ranking they do and simply show the latest content in descending order.

As for AI content, as we’ve said before, our focus is on quality of content, not production. This shouldn’t be misinterpreted as if we’ve granted a free pass to churning out lots of low quality content. It isn't & doing so is against our policies. More here.

No automated systems are perfect, and this isn’t to say that our default ranking systems would never show sub-par content. We are also always working to improve them to ensure that quality, original journalism is successful...

As for your content appearing in Google News, it can take time for our systems to recognize & surface material from new pubs. The systems tend to want to see some period of news publishing over time. That said, we’ll look to see if we can find ways to improve the process generally to do a better job.

I’d also encourage your publication (or any pub) that has paywall or gated content to provide access to our crawler, so we can fully understand the work you’re doing. This can be done in a way that does not allow readers to bypass registration. More here.
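
For context on the "paywalled structured data" mentioned in point (6) above: Google documents schema.org markup that lets a paywalled article declare which part of the page is gated, so the gated content is not mistaken for cloaking. Below is a minimal sketch of that kind of markup; the headline and CSS selector are placeholders, not 404 Media's actual markup.

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "NewsArticle",
  "headline": "Example paywalled article",
  "isAccessibleForFree": false,
  "hasPart": {
    "@type": "WebPageElement",
    "isAccessibleForFree": false,
    "cssSelector": ".paywalled-content"
  }
}
</script>
```

Here ".paywalled-content" would be whatever CSS class wraps the subscriber-only portion of the article body on the publisher's pages.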

Forum discussion at X, Mastodon and on Bluesky.
 

bnew


A ‘Shocking’ Amount of the Web Is Already AI-Translated Trash, Scientists Determine​

Researchers warn that most of the text we view online has been poorly translated into one or more languages—usually by a machine.


By Jules Roscoe

January 17, 2024, 12:57pm
IMAGE: DELMAINE DONSON VIA GETTY IMAGES


A “shocking” amount of the internet is machine-translated garbage, particularly in languages spoken in Africa and the Global South, a new study has found.

Researchers at the Amazon Web Services AI lab found that over half of the sentences on the web have been translated into two or more languages, often with increasingly worse quality due to poor machine translation (MT), which they said raised “serious concerns” about the training of large language models.

“We actually got interested in this topic because several colleagues who work in MT and are native speakers of low resource languages noted that much of the internet in their native language appeared to be MT generated,” Mehak Dhaliwal, a former applied science intern at AWS and current PhD student at the University of California, Santa Barbara, told Motherboard. “So the insight really came from the low-resource language speakers, and we did the study to understand the issue better and see how widespread it was.”

“With that said, everyone should be cognizant that content they view on the web may have been generated by a machine,” Dhaliwal added.

The study, which was submitted to the pre-print server arXiv last Thursday, generated a corpus of 6.38 billion sentences scraped from the web. It looked at patterns of multi-way parallelism, which describes sets of sentences that are direct translations of one another in three or more languages. It found that most of the internet is translated, as 57.1 percent of the sentences in the corpus were multi-way parallel in at least three languages.

Like all machine learning efforts, machine translation is impacted by human bias, and skews toward languages spoken in the Western world and the Global North. Because of this, the quality of the translations varies wildly, with “low-resource” languages from places like Africa having insufficient training data to produce accurate text.

“In general, we observed that most languages tend to have parallel data in the highest-resource languages,” Dhaliwal told Motherboard in an email. “Sentences are more likely to have translations in French than a low resource language, simply by virtue of there being much more data in French than a low resource language.”

High-resource languages, like English or French, tended to have an average parallelism of 4, meaning that sentences had translational equivalents in three other languages. Low-resource languages, like the African languages Wolof or Xhosa, had an average parallelism of 8.6. Additionally, lower-resource languages tended to have much worse translations.
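
To make the two metrics concrete, here is a small illustrative Python sketch (an assumption about how such numbers could be computed, not the AWS team's actual code): sentences are grouped into clusters of mutual translations, the multi-way parallel share is the fraction of sentences whose cluster spans three or more languages, and a language's average parallelism is the average number of languages spanned by the clusters its sentences belong to.

```python
# Illustrative sketch only -- not the study's pipeline. Input is assumed to be
# already clustered into translation sets: (cluster_id, language_code) pairs.
from collections import defaultdict

records = [
    (0, "en"), (0, "fr"), (0, "wo"), (0, "xh"),  # one sentence found in 4 languages
    (1, "en"), (1, "de"), (1, "fr"),             # a 3-way parallel set
    (2, "en"), (2, "es"),                        # a 2-way parallel pair
    (3, "en"),                                   # no translations found
]

langs_in_cluster = defaultdict(set)
for cid, lang in records:
    langs_in_cluster[cid].add(lang)

# Share of sentences that are multi-way parallel (3+ languages);
# the paper reports 57.1 percent for its 6.38-billion-sentence corpus.
multiway = sum(1 for cid, _ in records if len(langs_in_cluster[cid]) >= 3)
print(f"multi-way parallel share: {multiway / len(records):.1%}")

# Average parallelism per language: mean number of languages spanned by the
# clusters containing that language's sentences (the article cites ~4 for
# high-resource languages and ~8.6 for low-resource ones like Wolof or Xhosa).
cluster_sizes_by_lang = defaultdict(list)
for cid, lang in records:
    cluster_sizes_by_lang[lang].append(len(langs_in_cluster[cid]))
for lang, sizes in sorted(cluster_sizes_by_lang.items()):
    print(f"{lang}: average parallelism {sum(sizes) / len(sizes):.2f}")
```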

“We find that highly multi-way parallel translations are significantly lower quality than 2-way parallel translation,” the researchers state in the paper. “The more languages a sentence has been translated into, the lower quality the translations are, suggesting a higher prevalence of machine translation.”

In highly multi-way parallel languages, the study also found a selection bias toward shorter, “more predictable” sentences of between 5 and 10 words. Because of how short the sentences were, researchers found it difficult to characterize their quality. However, “searching the web for the sentences was enlightening,” the study stated. “The vast majority came from articles that we characterized as low quality, requiring little or no expertise or advance effort to create, on topics like being taken more seriously at work, being careful about your choices, six tips for new boat owners, deciding to be happy, etc.”

The researchers argued that the selection bias toward short sentences from low-quality articles was due to “low quality content (likely produced to generate ad revenue) being translated via MT en masse into many lower resource languages (again likely for the purpose of generating ad revenue). It also suggests that such data originates in English and is translated into other languages.”

This means that a large portion of the internet in lower-resource languages is poorly machine-translated, which poses questions for the development of large language models in those languages, the researchers said.

“Modern AI is enabled by huge amounts of training data, typically several hundred billion tokens to a few trillion tokens,” the study states. “Training at this scale is only possible with web-scraped data. Our findings raise numerous concerns for multilingual model builders: Fluency (especially across sentences) and accuracy are lower for MT data, which could produce less fluent models with more hallucinations, and the selection bias indicates the data may be of lower quality, even before considering MT errors.”
 

bnew


Here lies the internet, murdered by generative AI​


Corruption everywhere, even in YouTube's kids content​


ERIK HOEL

FEB 27, 2024



https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https://substack-post-media.s3.amazonaws.com/public/images/bb062015-02ee-4351-8c26-edae4b11f369_1181x1299.jpeg


Art for The Intrinsic Perspective is by Alexander Naughton


The amount of AI-generated content is beginning to overwhelm the internet. Or maybe a better term is pollute. Pollute its searches, its pages, its feeds, everywhere you look. I’ve been predicting that generative AI would have pernicious effects on our culture since 2019, but now everyone can feel it. Back then I called it the coming “semantic apocalypse.” Well, the semantic apocalypse is here, and you’re being affected by it, even if you don’t know it. A minor personal example: last year I published a nonfiction book, The World Behind the World, and now on Amazon I find this.



https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https://substack-post-media.s3.amazonaws.com/public/images/a2dddd14-7390-43f6-9131-f6b51cc1d869_1233x688.png

What, exactly, are these “workbooks” for my book? AI pollution. Synthetic trash heaps floating in the online ocean. The authors aren’t real people; some a$$hole just fed the manuscript into an AI and didn’t check when it spit out nonsensical summaries. But it doesn’t matter, does it? A poor sod will click on the $9.99 purchase one day, and that’s all that’s needed for this scam to be profitable since the process is now entirely automatable and costs only a few cents. Pretty much all published authors are affected by similar scams, or will be soon.

Now that generative AI has dropped the cost of producing bullshyt to near zero, we see clearly the future of the internet: a garbage dump. Google search? The results often lead with fake AI-generated images amid the real things. Post on Twitter? Get replies from bots selling porn. But that’s just the obvious stuff. Look closely at the replies to any trending tweet and you’ll find dozens of AI-written summaries in response, cheery Wikipedia-style repeats of the original post, all just to farm engagement. AI models on Instagram accumulate hundreds of thousands of subscribers and people openly shill their services for creating them. AI musicians fill up YouTube and Spotify. Scientific papers are being AI-generated. AI images mix into historical research. And this isn’t even mentioning the personal impact: from now on, every single woman who is a public figure will have to deal with the fact that deepfake porn of her is likely to be made. That’s insane.


And rather than this being pure skullduggery, people and institutions are willing to embrace low-quality AI-generated content, trying to shift the Overton window to make things like this acceptable:



https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https://substack-post-media.s3.amazonaws.com/public/images/c7948aa9-8240-4003-b78a-54044965a40d_438x314.png



That’s not hardball capitalism. That’s polluting our culture for your own minor profit. It’s not morally legitimate for the exact same reasons that polluting a river for a competitive edge is not legitimate. Yet name-brand media outlets are embracing generative AI just like SEO-spammers are, for the same reasons.

E.g., investigative work at Futurism caught Sports Illustrated red-handed using AI-generated articles written by fake writers. Meet Drew Ortiz.



https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https://substack-post-media.s3.amazonaws.com/public/images/89eb2a3d-fdec-4349-afc3-1227f15840a9_1200x421.jpeg

He doesn’t exist. That face is an AI-generated portrait, which was previously listed for sale on a website. As Futurism describes:


Ortiz isn't the only AI-generated author published by Sports Illustrated, according to a person involved with the creation of the content…

"At the bottom [of the page] there would be a photo of a person and some fake description of them like, 'oh, John lives in Houston, Texas. He loves yard games and hanging out with his dog, Sam.' Stuff like that," they continued. "It's just crazy."

This isn’t what everyone feared, which is AI replacing humans by being better—it’s replacing them because AI is so much cheaper. Sports Illustrated was not producing human-quality content with these methods, but it was still profitable.
 

bnew

{continued}

The AI authors' writing often sounds like it was written by an alien; one Ortiz article, for instance, warns that volleyball "can be a little tricky to get into, especially without an actual ball to practice with."

Sports Illustrated, in a classy move, deleted all the evidence. Drew was replaced by Sora Tanaka, bearing a face also listed for sale on the same website with the description of a “joyful asian young-adult female with long brown hair and brown eyes.”


https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https://substack-post-media.s3.amazonaws.com/public/images/fd80967f-7175-47bc-b755-6463b640bdf4_1200x448.jpeg

Given that even prestigious outlets like The Guardian refuse to put any clear limits on their use of AI, if you notice odd turns of phrase or low-quality articles, the likelihood that they’re written by an AI, or with AI-assistance, is now high.

Sadly, the people affected the most by generative AI are the ones who can’t defend themselves. Because they don’t even know what AI is. Yet we’ve abandoned them to swim in polluted information currents. I’m talking, unfortunately, about toddlers. Because let me introduce you to…

the hell that is AI-generated children’s YouTube content.

YouTube for kids is quickly becoming a stream of synthetic content. Much of it now consists of wooden digital characters interacting in short nonsensical clips without continuity or purpose. Toddlers are forced to sit and watch this runoff because no one is paying attention. And the toddlers themselves can’t discern that characters come and go and that the plots don’t make sense and that it’s all just incoherent dream-slop. The titles don’t match the actual content, and titles are all the parents likely check, because they grew up in a culture where, if a YouTube video said BABY LEARNING VIDEOS and had a million views, it was likely okay. Now, some of the nonsense AI-generated videos aimed at toddlers have tens of millions of views.

Here’s a behind-the-scenes video on a single channel that made 1.2 million dollars via AI-generated “educational content” aimed at toddlers.

As the video says:


These kids, when they watch these kind of videos, they watch them over and over and over again.

They aren’t confessing. They’re bragging. And the particular channel they focus on isn’t even the worst offender—at least that channel’s content mostly matches the subheadings and titles, even if the videos are jerky, strange, off-putting, repetitious, clearly inhuman. Other channels, which are also obviously AI-generated, get worse and worse. Here’s a “kid’s education” channel that is AI-generated (took about one minute to find) with 11.7 million subscribers.

They don’t use proper English, and after quickly going through some shapes like the initial video title promises (albeit doing it in a way that makes you feel like you’re going insane), the rest of the video devolves into randomly-generated rote tasks, eerie interactions, more incorrect grammar, and uncanny musical interludes of songs that serve no purpose but to pad the time. It is the creation of an alien mind.

Here’s an example of the next frontier: completely start-to-finish AI-generated music videos for toddlers. Below is a how-to video for these new techniques. The result? Nightmarish parrots with twisted double-beaks and four mutated eyes singing artificial howls from beyond. Click and behold (or don’t, if you want to sleep tonight).

All around the nation there are toddlers plunked down in front of iPads being subjected to synthetic runoff, deprived of human contact even in the media they consume. There’s no other word but dystopian. Might not actual human-generated cultural content normally contain cognitive micro-nutrients (like cohesive plots and sentences, detailed complexity, reasons for transitions, an overall gestalt, etc) that the human mind actually needs? We’re conducting this experiment live. For the first time in history developing brains are being fed choppy low-grade and cheaply-produced synthetic data created en masse by generative AI, instead of being fed with real human culture. No one knows the effects, and no one appears to care. Especially not the companies, because…

OpenAI has happily allowed pollution.

Why blame them, specifically? Well, first of all, their massive impact—e.g., most of the kids’ videos are built from scripts generated by ChatGPT. And more generally, what AI capabilities are considered okay to deploy has long been a standard set by OpenAI. Despite their supposed safety focus, OpenAI failed to foresee that its creations would thoroughly pollute the internet across all platforms and services. You can see this failure in how they assessed potential negative outcomes in the announcement of GPT-2 on their blog, back in 2019. While they did warn that these models could have serious long-term consequences for the information ecosystem, the specifics they were concerned with were things like:


Generate misleading news articles

Impersonate others online

Automate the production of abusive or faked content to post on social media

Automate the production of spam/phishing content

This may sound kind of in line with what’s happened, but if you read further, it becomes clear that what they meant by “faked content” was mainly malicious actors promoting misinformation, or the same shadowy malicious actors using AI to phish for passwords, etc.

These turned out to be only minor concerns compared to AI’s cultural pollution. OpenAI kept talking about “actors” when they should have been talking about “users.” Because it turns out, all AI-generated content is fake! Or it’s all kind of fake. AI-written websites, now sprouting up like an unstoppable invasive species, don’t necessarily have an intent to mislead; it’s just that AI content is low-effort banalities generated for pennies, so you can SEO spam and do all sorts of manipulative games around search to attract eyeballs and ad revenue.
 

bnew

{continued}

That is, the OpenAI team didn’t stop to think that regular users just generating mounds of AI-generated content on the internet would have negative effects very similar to those of widespread malicious use by intentional bad actors. Because there’s no clear distinction! The fact that OpenAI was both honestly worried about negative effects, and at the same time didn’t predict the enshyttification of the internet they spearheaded, should make us extremely worried they will continue to miss the negative downstream effects of their increasingly intelligent models. They failed to foresee the floating mounds of clickbait garbage, the synthetic info-trash cities, all to collect clicks and eyeballs—even from innocent children who don’t know any better. And they won’t do anything to stop it, because…

AI pollution is a tragedy of the commons.

This term, “tragedy of the commons,” originated in the rising environmentalism of the 20th century, and would lead to many of the regulations that keep our cities free of smog and our rivers clean. Garrett Hardin, an ecologist and biologist, coined it in an article in Science in 1968. The article is still instructively relevant. Hardin wrote:

An implicit and almost universal assumption of discussions published in professional and semipopular scientific journals is that the problem under discussion has a technical solution…

He goes on to discuss several problems for which there are no technical solutions, since rational actors will drive the system toward destruction via competition:

The tragedy of the commons develops in this way. Picture a pasture open to all. It is to be expected that each herdsman will try to keep as many cattle as possible on the commons. Such an arrangement may work reasonably satisfactorily for centuries because tribal wars, poaching, and disease keep the numbers of both man and beast well below the carrying capacity of the land. Finally, however, comes the day of reckoning, that is, the day when the long-desired goal of social stability becomes a reality. At this point, the inherent logic of the commons remorselessly generates tragedy.

One central example of Hardin’s became instrumental to the environmental movement.

… the tragedy of the commons reappears in problems of pollution. Here it is not a question of taking something out of the commons, but of putting something in—sewage, or chemical, radioactive, and heat wastes into water; noxious and dangerous fumes into the air; and distracting and unpleasant advertising signs into the line of sight. The calculations of utility are much the same as before. The rational man finds that his share of the cost of the wastes he discharges into the commons is less than the cost of purifying his wastes before releasing them. Since this is true for everyone, we are locked into a system of "fouling our own nest," so long as we behave only as independent, rational, free-enterprisers.

We are currently fouling our own nests. Since the internet economy runs on eyeballs and clicks, the new ability of anyone, anywhere, to easily generate infinite low-quality content via AI is now remorselessly generating tragedy.

The solution, as Hardin noted, isn’t technical. You can’t detect AI outputs reliably anyway (another initial promise that OpenAI abandoned). The companies won’t self-regulate, given their massive financial incentives. We need the equivalent of a Clean Air Act: a Clean Internet Act. We can’t just sit by and let human culture end up buried.

Luckily we’re on the cusp of all that incredibly futuristic technology promised by AI. Any day now, our GDP will start to rocket forward. In fact, soon we’ll cure all disease, even aging itself, and have robot butlers and Universal Basic Income and high-definition personalized entertainment. Who cares if toddlers had to watch inhuman runoff for a few billion years of viewing-time to make the future happen? It was all worth it. Right? Let’s wait a little bit longer. If we wait just a little longer utopia will surely come.
 