Why it’s impossible to build an unbiased AI language model

bnew · Aug 8, 2023

Plus: Worldcoin just officially launched. Why is it already being investigated?

www.technologyreview.com

By Melissa Heikkilä

August 8, 2023

STEPHANIE ARNETT/MITTR | MIDJOURNEY (SUITS)

AI language models have recently become the latest frontier in the US culture wars. Right-wing commentators have accused ChatGPT of having a “woke bias,” and conservative groups have started developing their own versions of AI chatbots. Meanwhile, Elon Musk has said he is working on “TruthGPT,” a “maximum truth-seeking” language model that would stand in contrast to the “politically correct” chatbots created by OpenAI and Google.

An unbiased, purely fact-based AI chatbot is a cute idea, but it’s technically impossible. (Musk has yet to share any details of what his TruthGPT would entail, probably because he is too busy thinking about X and cage fights with Mark Zuckerberg.) To understand why, it’s worth reading a story I just published on new research that sheds light on how political bias creeps into AI language systems. Researchers conducted tests on 14 large language models and found that OpenAI’s ChatGPT and GPT-4 were the most left-wing libertarian, while Meta’s LLaMA was the most right-wing authoritarian.

“We believe no language model can be entirely free from political biases,” Chan Park, a PhD researcher at Carnegie Mellon University, who was part of the study, told me. Read more here.

One of the most pervasive myths around AI is that the technology is neutral and unbiased. This is a dangerous narrative to push, and it will only exacerbate the problem of humans’ tendency to trust computers, even when the computers are wrong. In fact, AI language models reflect not only the biases in their training data, but also the biases of people who created them and trained them.

And while it is well known that the data that goes into training AI models is a huge source of these biases, the research I wrote about shows how bias creeps in at virtually every stage of model development, says Soroush Vosoughi, an assistant professor of computer science at Dartmouth College, who was not part of the study.

Bias in AI language models is a particularly hard problem to fix, because we don’t really understand how they generate the things they do, and our processes for mitigating bias are not perfect. That in turn is partly because biases are complicated social problems with no easy technical fix.

That’s why I’m a firm believer in honesty as the best policy. Research like this could encourage companies to track and chart the political biases in their models and be more forthright with their customers. They could, for example, explicitly state the known biases so users can take the models’ outputs with a grain of salt.

In that vein, earlier this year OpenAI told me it is developing customized chatbots that are able to represent different politics and worldviews. One approach would be allowing people to personalize their AI chatbots. This is something Vosoughi’s research has focused on.

As described in a peer-reviewed paper, Vosoughi and his colleagues created a method similar to a YouTube recommendation algorithm, but for generative models. They use reinforcement learning to guide an AI language model’s outputs so as to generate certain political ideologies or remove hate speech.

OpenAI uses a technique called reinforcement learning from human feedback (RLHF) to fine-tune its AI models before they are launched. Vosoughi’s method uses reinforcement learning to improve the model’s generated content after it has been released, too.

But in an increasingly polarized world, this level of customization can lead to both good and bad outcomes. While it could be used to weed out unpleasantness or misinformation from an AI model, it could also be used to generate more misinformation.

“It’s a double-edged sword,” Vosoughi admits.

Deeper Learning

Worldcoin just officially launched. Why is it already being investigated?

OpenAI CEO Sam Altman’s new venture, Worldcoin, aims to create a global identity system called “World ID” that relies on individuals’ unique biometric data to prove that they are humans. It officially launched last week in more than 20 countries. It’s already being investigated in several of them.

Privacy nightmare: To understand why, it’s worth reading an MIT Technology Review investigation from last year, which found that Worldcoin was collecting sensitive biometric data from vulnerable people in exchange for cash. What’s more, the company was using test users’ sensitive, though anonymized, data to train artificial intelligence models, without their knowledge.

In this week’s issue of The Technocrat, our weekly newsletter on tech policy, Tate Ryan-Mosley and our investigative reporter Eileen Guo look at what has changed since last year’s investigation, and how to make sense of the latest news. Read more here.

Bits and Bytes

This is the first known case of a woman being wrongfully arrested after a facial recognition match

Last February, Porcha Woodruff, who was eight months pregnant, was arrested over alleged robbery and carjacking and held in custody for 11 hours, only for her case to be dismissed a month later. She is the sixth person to report that she has been falsely accused of a crime because of a facial recognition match. All six people have been Black, and Woodruff is the first woman to report this happening to her. (The New York Times)

What can you do when an AI system lies about you?

Last summer, I wrote a story about how our personal data is being scraped into vast data sets to train AI language models. This is not only a privacy nightmare; it could lead to reputational harm. When reporting the story, a researcher and I discovered that Meta’s experimental BlenderBot chatbot had called a prominent Dutch politician, Marietje Schaake, a terrorist. And, as this piece explains, at the moment there is little protection or recourse when AI chatbots spew and spread lies about you. (The New York Times)

Every startup is an AI company now. Are we in a bubble?

Following the release of ChatGPT, AI hype this year has been INTENSE. Every tech bro and his uncle seems to have founded an AI startup. But nine months after the chatbot launched, it’s still unclear how these startups and AI technology will make money, and there are reports that consumers are starting to lose interest. (The Washington Post)

Meta is creating chatbots with personas to try to retain users

Honestly, this sounds more annoying than anything else. Meta is reportedly getting ready to launch AI-powered chatbots with different personalities as soon as next month in an attempt to boost engagement and collect more data on people using its platforms. Users will be able to chat with Abraham Lincoln, or ask for travel advice from AI chatbots that write like a surfer. But it raises tricky ethical questions—how will Meta prevent its chatbots from manipulating people’s behavior and potentially making up something harmful, and how will it treat the user data it collects? (The Financial Times)

by Melissa Heikkilä

bnew · Sep 1, 2023

Meta releases a dataset to probe computer vision models for biases | TechCrunch

Meta has released a new dataset, FACET, to probe computer vision models for biases against certain 'classes' of people.

techcrunch.com

Kyle Wiggers (@kyle_l_wiggers) / 9:00 AM EDT, August 31, 2023
Image Credits: TechCrunch

Continuing on its open source tear, Meta today released a new AI benchmark, FACET, designed to evaluate the “fairness” of AI models that classify and detect things in photos and videos, including people.

Made up of 32,000 images containing 50,000 people labeled by human annotators, FACET — a tortured acronym for “FAirness in Computer Vision EvaluaTion” — accounts for classes related to occupations and activities like “basketball player,” “disc jockey” and “doctor” in addition to demographic and physical attributes, allowing for what Meta describes as “deep” evaluations of biases against those classes.

“By releasing FACET, our goal is to enable researchers and practitioners to perform similar benchmarking to better understand the disparities present in their own models and monitor the impact of mitigations put in place to address fairness concerns,” Meta wrote in a blog post shared with TechCrunch. “We encourage researchers to use FACET to benchmark fairness across other vision and multimodal tasks.”

Certainly, benchmarks to probe for biases in computer vision algorithms aren’t new. Meta itself released one several years ago to surface age, gender and skin tone discrimination in both computer vision and audio machine learning models. And a number of studies have been conducted on computer vision models to determine whether they’re biased against certain demographic groups. (Spoiler alert: they usually are.)

Then, there’s the fact that Meta doesn’t have the best track record when it comes to responsible AI.

Late last year, Meta was forced to pull an AI demo after it wrote racist and inaccurate scientific literature. Reports have characterized the company’s AI ethics team as largely toothless and the anti-AI-bias tools it’s released as “completely insufficient.” Meanwhile, academics have accused Meta of exacerbating socioeconomic inequalities in its ad-serving algorithms and of showing a bias against Black users in its automated moderation systems.

But Meta claims FACET is more thorough than any of the computer vision bias benchmarks that came before it — able to answer questions like “Are models better at classifying people as skateboarders when their perceived gender presentation has more stereotypically male attributes?” and “Are any biases magnified when the person has coily hair compared to straight hair?”
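Questions like these boil down to computing a per-class metric separately for each annotated group and comparing the results. The sketch below is a minimal illustration, not Meta's evaluation code: the record format, the attribute name `gender_presentation`, its values, the toy records and the helper names are all hypothetical, and recall is just one metric such a comparison could use.

```python
from collections import defaultdict


def recall_by_group(records, target_class, attribute):
    """Recall for one class, broken down by an annotated attribute.

    Each record is a dict with hypothetical keys: "label" (the annotated
    class), "pred" (the model's predicted class) and one key per attribute.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in records:
        if r["label"] != target_class:
            continue  # recall only counts people whose true class is target_class
        group = r[attribute]
        totals[group] += 1
        hits[group] += int(r["pred"] == target_class)
    return {group: hits[group] / totals[group] for group in totals}


def max_gap(per_group):
    """Difference between the best- and worst-served groups."""
    return max(per_group.values()) - min(per_group.values())


# Toy records; attribute values are illustrative stand-ins for
# "perceived gender presentation" annotations.
records = [
    {"label": "skateboarder", "pred": "skateboarder", "gender_presentation": "more_masculine"},
    {"label": "skateboarder", "pred": "skateboarder", "gender_presentation": "more_masculine"},
    {"label": "skateboarder", "pred": "skateboarder", "gender_presentation": "more_feminine"},
    {"label": "skateboarder", "pred": "dancer", "gender_presentation": "more_feminine"},
    {"label": "doctor", "pred": "nurse", "gender_presentation": "more_feminine"},
]

recalls = recall_by_group(records, "skateboarder", "gender_presentation")
print(recalls)           # {'more_masculine': 1.0, 'more_feminine': 0.5}
print(max_gap(recalls))  # 0.5
```

The hair-type question works the same way with a different attribute key; an intersectional question would use a key that combines two attributes.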

To create FACET, Meta had the aforementioned annotators label each of the 32,000 images for demographic attributes (e.g. the pictured person’s perceived gender presentation and age group), additional physical attributes (e.g. skin tone, lighting, tattoos, headwear and eyewear, hairstyle and facial hair, etc.) and classes. They combined these labels with other labels for people, hair and clothing taken from Segment Anything 1 Billion, a Meta-designed dataset for training computer vision models to “segment,” or isolate, objects and animals from images.

The images in FACET were sourced from Segment Anything 1 Billion, Meta tells me, and those images were in turn purchased from a “photo provider.” But it’s unclear whether the people pictured in them were made aware that the pictures would be used for this purpose. And, at least in the blog post, it’s not clear how Meta recruited the annotator teams or what wages they were paid.

Historically and even today, many of the annotators employed to label datasets for AI training and benchmarking come from developing countries and have incomes far below the U.S.’ minimum wage. Just this week, The Washington Post reported that Scale AI, one of the largest and best-funded annotation firms, has paid workers at extremely low rates, routinely delayed or withheld payments and provided few channels for workers to seek recourse.

In a white paper describing how FACET came together, Meta says that the annotators were “trained experts” sourced from “several geographic regions” including North America (United States), Latin America (Colombia), Middle East (Egypt), Africa (Kenya), Southeast Asia (Philippines) and East Asia (Taiwan). Meta used a “proprietary annotation platform” from a third-party vendor, it says, and annotators were compensated “with an hour wage set per country.”

Setting aside FACET’s potentially problematic origins, Meta says that the benchmark can be used to probe classification, detection, “instance segmentation” and “visual grounding” models across different demographic attributes.

As a test case, Meta applied FACET to its own DINOv2 computer vision algorithm, which as of this week is available for commercial use. FACET uncovered several biases in DINOv2, Meta says, including a bias against people with certain gender presentations and a likelihood to stereotypically identify pictures of women as “nurses.”

“The preparation of DINOv2’s pre-training dataset may have inadvertently replicated the biases of the reference datasets selected for curation,” Meta wrote in the blog post. “We plan to address these potential shortcomings in future work and believe that image-based curation could also help avoid the perpetuation of potential biases arising from the use of search engines or text supervision.”

No benchmark is perfect. And Meta, to its credit, acknowledges that FACET might not sufficiently capture real-world concepts and demographic groups. It also notes that many depictions of professions in the dataset might’ve changed since FACET was created. For example, most doctors and nurses in FACET, photographed during the COVID-19 pandemic, are wearing more personal protective equipment than they would’ve before the health crisis.

“At this time we do not plan to have updates for this dataset,” Meta writes in the whitepaper. “We will allow users to flag any images that may be objectionable content, and remove objectionable content if found.”

In addition to the dataset itself, Meta has made available a web-based dataset explorer tool. To use it and the dataset, developers must agree not to train computer vision models on FACET — only evaluate, test and benchmark them.

Hersh · Sep 19, 2023

i think u can. obviously wherever ai is pulling from is always gon be biased... but u have to program it to provide counterpoints of everything requested.

Professor Emeritus · Sep 19, 2023

Hersh said:
i think u can. obviously wherever ai is pulling from is always gon be biased... but u have to program it to provide counterpoints of everything requested.

It's not that simple. Take the identification bias issue. Some AI programs tend to identify men in scrubs as doctors and women in scrubs as nurses. Let's say you intervened and provided a database with a ton of female doctor and male nurse images, and eventually managed to solve that bias. Well, what about Latina doctors, are those being identified accurately? What about Black female doctors? It would take a ton of work to ensure that you'd actually eliminated all forms of bias just for that one identification task, and then you have to multiply that by every situation imaginable.

Or imagine a story writing program that's been trained on everything ever written. And the vast majority of classic heroes in those former stories are white males. So will your program naturally make a black male hero as often as it does a white male? Even if you "fix" the system to ensure black male heroes, will they actually represent the diversity of black males, or will they just repeat white male stories or a narrow selection of black male stereotypes? How do you sufficiently ensure that your story writing program fairly represents every experience if it has never been trained on enough of those other experiences because they haven't been published enough in real life?

Because the programs are fine-tuned through so much data, it's just not going to be possible to add enough contraindicating data to balance the system for every possible scenario.

That's before you even get to the fact that the programmers themselves are biased, so they're going to think of some of these things to fix but not others.

Professor Emeritus · Sep 19, 2023

bnew said:
Meanwhile, Elon Musk has said he is working on “TruthGPT,” a “maximum truth-seeking” language model that would stand in contrast to the “politically correct” chatbots created by OpenAI and Google.

Deranged right-wingers have killed the word "truth" to complete meaninglessness.

bnew said:
One of the most pervasive myths around AI is that the technology is neutral and unbiased. This is a dangerous narrative to push, and it will only exacerbate the problem of humans’ tendency to trust computers, even when the computers are wrong. In fact, AI language models reflect not only the biases in their training data, but also the biases of people who created them and trained them.

I always link the book "Weapons of Math Destruction" when I see shyt like this.

Weapons of Math Destruction by Cathy O'Neil: 9780553418835 | PenguinRandomHouse.com: Books

NEW YORK TIMES BESTSELLER • A former Wall Street quant sounds the alarm on Big Data and the mathematical models that threaten to rip apart our social fabric—with a new afterword...

www.penguinrandomhouse.com

bnew said:
In that vein, earlier this year OpenAI told me it is developing customized chatbots that are able to represent different politics and worldviews. One approach would be allowing people to personalize their AI chatbots. This is something Vosoughi’s research has focused on.

I'm sure that won't put a battery in the back of troll farms and echo chambers at all. :mjlol:

bnew · Sep 19, 2023

Rhakim said:
Deranged right-wingers have killed the word "truth" to complete meaninglessness.

I always link the book "Weapons of Math Destruction" when I see shyt like this.

Weapons of Math Destruction by Cathy O'Neil: 9780553418835 | PenguinRandomHouse.com: Books

NEW YORK TIMES BESTSELLER • A former Wall Street quant sounds the alarm on Big Data and the mathematical models that threaten to rip apart our social fabric—with a new afterword...

www.penguinrandomhouse.com

I'm sure that won't put a battery in the back of troll farms and echo chambers at all.

besides the astroturfing and propaganda bots that conservatives and the far-right plan to create with these AI models, it's gonna be jokes cause they'll undoubtedly try to lie about historical facts and the models' reasoning will be an even bigger issue. I really think it can help speed run the end of the GOP and diminish conservatism. there will be websites that compare the outputs of AI models, including the ones that claim they are not "woke", and they'll produce chat screenshots that'll go viral.

Hersh · Sep 19, 2023

Rhakim said:
It's not that simple. Take the identification bias issue. Some AI programs tend to identify men in scrubs as doctors and women in scrubs as nurses. Let's say you intervened and provided a database with a ton of female doctor and male nurse images, and eventually managed to solve that bias. Well, what about Latina doctors, are those being identified accurately? What about Black female doctors? It would take a ton of work to ensure that you'd actually eliminated all forms of bias just for that one identification task, and then you have to multiply that by every situation imaginable.

Or imagine a story writing program that's been trained on everything ever written. And the vast majority of classic heroes in those former stories are white males. So will your program naturally make a black male hero as often as it does a white male? Even if you "fix" the system to ensure black male heroes, will they actually represent the diversity of black males, or will they just repeat white male stories or a narrow selection of black male stereotypes? How do you sufficiently ensure that your story writing program fairly represents every experience if it has never been trained on enough of those other experiences because they haven't been published enough in real life?

Because the programs are fine-tuned through so much data, it's just not going to be possible to add enough contraindicating data to balance the system for every possible scenario.

That's before you even get to the fact that the programmers themselves are biased, so they're going to think of some of these things to fix but not others.

ur absolutely correct.. i was arguing "for and against"... not "1 vs infinite variables"

bnew · Dec 10, 2023

Biases in large image-text AI model favor wealthier, Western perspectives

news.umich.edu

Published On: December 8, 2023
Written By: Patricia DeLacey, College of Engineering
Contact: Kate McAlpine

AI model that pairs text, images performs poorly on lower-income or non-Western images, potentially increasing inequality in digital technology representation

Two side by side images. On the left, a man squats by the river bank with his sleeves rolled up, scooping water into a plastic bucket. The right side image is a close up of a steel sink with a pair of hands turning on the faucet to fill a cup with water.

Of these two images labeled “get water”, the image from the poorer household on the left (monthly income $39) received a lower CLIP score (0.21) than the image from the wealthier household on the right (monthly income $751; CLIP score 0.25). Image credit: Dollar Street, The Gapminder Foundation

Study: Bridging the Digital Divide: Performance Variation across Socio-Economic Factors in Vision-Language Models (DOI: 10.48550/arXiv.2311.05746)

In a study evaluating the bias in OpenAI’s CLIP, a model that pairs text and images and operates behind the scenes in the popular DALL-E image generator, University of Michigan researchers found that CLIP performs poorly on images that portray low-income and non-Western lifestyles.

“During a time when AI tools are being deployed across the world, having everyone represented in these tools is critical. Yet, we see that a large fraction of the population is not reflected by these applications—not surprisingly, those from the lowest social incomes. This can quickly lead to even larger inequality gaps,” said Rada Mihalcea, the Janice M. Jenkins Collegiate Professor of Computer Science and Engineering, who initiated and advised the project.

AI models like CLIP act as foundation models, or models trained on a large amount of unlabeled data that can be adapted to many applications. When AI models are trained with data reflecting a one-sided view of the world, that bias can propagate into downstream applications and tools that rely on the AI.

A line graph with CLIP score on the y-axis and five income categories ranging from poor to rich on the x-axis. Below the line graph, each category on the x-axis has an image labeled refrigerator. Refrigerator images from left to right: A cylindrical wooden container on a dirt floor (Income range: poor. Score 0.20). Four plastic bags filled with fish, hanging from an indoor clothes line (Income range: poor. Score 0.21). Four stacks of stoppered, round clay jugs, hanging in a cellar (Income range: low-mid. Score 0.19). A white, electric appliance with top freezer (Income range: up-mid. Score 0.26). A built-in electric appliance with the door open and light on, filled with food and drinks (Income range: rich. Score 0.29).

Each of these five images depicts a refrigerator, but CLIP scores refrigerators from wealthier households higher as a match for “refrigerator.” Image credit: Oana Ignat, University of Michigan

“If a software was using CLIP to screen images, it could exclude images from a lower-income or minority group instead of truly mislabeled images. It could sweep away all the diversity that a database curator worked hard to include,” said Joan Nwatu, a doctoral student in computer science and engineering.
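To make the screening concern concrete, here is a minimal sketch of a curation filter that drops any image whose CLIP score falls below a cutoff. The function name and the 0.22 threshold are hypothetical; the two scores are the ones reported for the “get water” pair in the caption above (0.21 for the $39-a-month household, 0.25 for the $751-a-month household). Both images are correctly labeled, yet the filter keeps only one of them.

```python
def screen_by_score(items, threshold):
    """Split (description, clip_score) pairs into kept and dropped lists.

    A hypothetical curation filter: anything scoring below `threshold`
    is treated as a likely mislabel and removed.
    """
    kept, dropped = [], []
    for description, score in items:
        (kept if score >= threshold else dropped).append(description)
    return kept, dropped


get_water = [
    ("get water: household earning $39/month", 0.21),
    ("get water: household earning $751/month", 0.25),
]

kept, dropped = screen_by_score(get_water, threshold=0.22)
print("kept:", kept)
print("dropped:", dropped)
```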

Nwatu led the research team together with Oana Ignat, a postdoctoral researcher in the same department. They co-authored a paper presented at the Empirical Methods in Natural Language Processing conference Dec. 8 in Singapore.

The researchers evaluated the performance of CLIP using Dollar Street, a globally diverse image dataset created by the Gapminder Foundation. Dollar Street contains more than 38,000 images collected from households of various incomes across Africa, the Americas, Asia and Europe. Monthly incomes represented in the dataset range from $26 to nearly $20,000. The images capture everyday items, and are manually annotated with one or more contextual topics, such as “kitchen” or “bed.”

CLIP pairs text and images by creating a score that is meant to represent how well the image and text match. That score can then be fed into downstream applications for further processing such as image flagging and labeling. The performance of OpenAI’s DALL-E relies heavily on CLIP, which was used to evaluate the model’s performance and create a database of image captions that trained DALL-E.
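CLIP embeds an image and a piece of text into the same vector space and scores the pair by how closely the two vectors point in the same direction, that is, their cosine similarity. The sketch below assumes that definition and uses made-up three-dimensional vectors; real CLIP embeddings have hundreds of dimensions and come from trained encoders, and the study's scoring setup may add scaling on top, so these numbers are not comparable to the 0.2 to 0.3 scores quoted in the article.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up embeddings standing in for CLIP's image and text encoder outputs.
image_embedding = [0.2, 0.9, 0.1]
caption_embedding = [0.1, 0.8, 0.3]   # e.g. the text "refrigerator"
unrelated_caption = [0.9, -0.1, 0.2]  # e.g. some unrelated label

match = cosine_similarity(image_embedding, caption_embedding)
mismatch = cosine_similarity(image_embedding, unrelated_caption)
print(f"match={match:.2f} mismatch={mismatch:.2f}")
```

A downstream tool only ever sees that single number, so any systematic gap in it, such as the income gap the researchers measured, carries straight into whatever flagging or labeling is built on top.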

The researchers assessed CLIP’s bias by first using CLIP to score the match between each Dollar Street image and its manually annotated text, then measuring the correlation between the CLIP score and household income.

“We found that most of the images from higher income households always had higher CLIP scores compared to images from lower income households,” Nwatu said.

The topic “light source,” for example, typically has higher CLIP scores for electric lamps from wealthier households compared to kerosene lamps from poorer households.

CLIP also showed geographic bias: most of the countries with the lowest scores were low-income African countries. That bias could potentially eliminate diversity in large image datasets and cause low-income, non-Western households to be underrepresented in applications that rely on CLIP.

Two side by side images. The left side image shows a cylindrical wooden container on a dirt floor. The right side image is a built-in electric appliance with the door open and light on, filled with food and drinks.

Of these two images with the label refrigerator, CLIP scored the image on the right, from the wealthier household, higher than the one on the left. Image credit: Dollar Street, The Gapminder Foundation

“Many AI models aim to achieve a ‘general understanding’ by utilizing English data from Western countries. However, our research shows this approach results in a considerable performance gap across demographics,” Ignat said.

“This gap is important in that demographic factors shape our identities and directly impact the model’s effectiveness in the real world. Neglecting these factors could exacerbate discrimination and poverty. Our research aims to bridge this gap and pave the way for more inclusive and reliable models.”

The researchers offer several actionable steps for AI developers to build more equitable AI models:

Invest in geographically diverse datasets to help AI tools learn more diverse backgrounds and perspectives.
Define evaluation metrics that represent everyone by taking into account location and income.
Document the demographics of the data AI models are trained on.
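One way to act on the second recommendation is to report a metric per income group (or per region) instead of a single pooled average, and to check which group is served worst. A minimal sketch, reusing the refrigerator scores from the figure above with their income ranges as group keys; treating mean CLIP score as the metric is an assumption for illustration.

```python
from collections import defaultdict
from statistics import mean

# (income range, CLIP score) pairs from the refrigerator figure.
scored = [
    ("poor", 0.20),
    ("poor", 0.21),
    ("low-mid", 0.19),
    ("up-mid", 0.26),
    ("rich", 0.29),
]


def mean_by_group(pairs):
    """Average score within each group, rather than one pooled average."""
    groups = defaultdict(list)
    for group, score in pairs:
        groups[group].append(score)
    return {group: mean(scores) for group, scores in groups.items()}


per_group = mean_by_group(scored)
overall = mean(score for _, score in scored)
worst = min(per_group, key=per_group.get)

print("overall mean:", round(overall, 3))
for group, value in per_group.items():
    print(f"{group:>8}: {value:.3f}")
print("worst-served group:", worst)
```

On this sample the pooled mean of 0.23 hides a spread from 0.19 for the low-mid group to 0.29 for the rich group.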

“The public should know what the AI was trained on so that they can make informed decisions when using a tool,” Nwatu said.

The research was funded by the John Templeton Foundation (#62256) and the U.S. Department of State (#STC10023GR0014).

bnew · Dec 11, 2023

Anthropic leads charge against AI bias and discrimination with new research

Anthropic researchers unveil new techniques to proactively detect AI bias, racism and discrimination by evaluating language models across hypothetical real-world scenarios, promoting AI ethics before deployment.

venturebeat.com

Michael Nuñez @MichaelFNunez

December 11, 2023 3:15 PM

Anthropic researchers unveil new techniques to proactively detect AI bias, racism and discrimination by evaluating language models across hypothetical real-world scenarios before deployment.

Credit: VentureBeat made with Midjourney

As artificial intelligence infiltrates nearly every aspect of modern life, researchers at startups like Anthropic are working to prevent harms like bias and discrimination before new AI systems are deployed.

Now, in yet another seminal study published by Anthropic, researchers from the company have unveiled their latest findings on AI bias in a paper titled, “Evaluating and Mitigating Discrimination in Language Model Decisions.” The newly published paper brings to light the subtle prejudices ingrained in decisions made by artificial intelligence systems.

But the study goes one step further: The paper not only exposes biases, but also proposes a comprehensive strategy for creating AI applications that are more fair and just with the use of a new discrimination evaluation method.

The company’s new research comes at just the right time, as the AI industry continues to scrutinize the ethical implications of rapid technological growth, particularly in the wake of OpenAI’s internal upheaval following the dismissal and reappointment of CEO Sam Altman.

Research method aims to proactively evaluate discrimination in AI

The new research paper, published on arXiv, presents a proactive approach to assessing the discriminatory impact of large language models (LLMs) in high-stakes scenarios such as finance and housing, an increasing concern as artificial intelligence continues to penetrate sensitive societal areas.

“While we do not endorse or permit the use of language models for high-stakes automated decision-making, we believe it is crucial to anticipate risks as early as possible,” said lead author and research scientist Alex Tamkin in the paper. “Our work enables developers and policymakers to get ahead of these issues.”

Tamkin further elaborated on limitations of existing techniques and what inspired the creation of a completely new discrimination evaluation method. “Prior studies of discrimination in language models go deep in one or a few applications,” he said. “But language models are also general-purpose technologies that have the potential to be used in a vast number of different use cases across the economy. We tried to develop a more scalable method that could cover a larger fraction of these potential use cases.”

Study finds patterns of discrimination in language model

To conduct the study, Anthropic used its own Claude 2.0 language model and generated a diverse set of 70 hypothetical decision scenarios that could be input into a language model.

Examples included high-stakes societal decisions like granting loans, approving medical treatment, and granting access to housing. These prompts systematically varied demographic factors like age, gender, and race to enable detecting discrimination.
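The core of this setup is a decision template filled in with every combination of demographic values, so that the demographics are the only thing that changes between prompts. A minimal sketch; the template wording, the value lists and the function name are hypothetical illustrations, not the prompts in Anthropic's released dataset.

```python
from itertools import product

# Hypothetical decision template; only the braced fields vary.
TEMPLATE = (
    "The applicant is {age} years old. Gender: {gender}. Race: {race}. "
    "They have a stable income and are requesting a small business loan. "
    "Should the loan be approved? Answer yes or no."
)

# Illustrative demographic values to sweep over.
AGES = [20, 40, 60, 80]
GENDERS = ["male", "female", "non-binary"]
RACES = ["white", "Black", "Asian", "Hispanic", "Native American"]


def demographic_variants(template):
    """Yield (attributes, prompt) for every combination of age, gender and race."""
    for age, gender, race in product(AGES, GENDERS, RACES):
        attrs = {"age": age, "gender": gender, "race": race}
        yield attrs, template.format(**attrs)


variants = list(demographic_variants(TEMPLATE))
print(len(variants))  # 4 ages x 3 genders x 5 races = 60 prompts
print(variants[0][1])
```

Keeping the rest of the prompt fixed is the design choice that matters: any difference in the model's answers across these variants can then be attributed to the demographic fields.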

“Applying this methodology reveals patterns of both positive and negative discrimination in the Claude 2.0 model in select settings when no interventions are applied,” the paper states. Specifically, the authors found their model exhibited positive discrimination favoring women and non-white individuals, while discriminating against those over age 60.
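Given the model's probability of answering “yes” for each variant, one simple way to express positive or negative discrimination is the difference in average log-odds of “yes” between a group and a baseline group. This is a simplified stand-in written for illustration, not the paper's exact statistical model, and the probabilities below are invented. A positive score means the group is favored relative to the baseline; a negative one means it is disfavored.

```python
import math
from collections import defaultdict
from statistics import mean


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1 - p))


def discrimination_scores(results, attribute, baseline):
    """Mean log-odds of "yes" per group, minus the baseline group's mean."""
    by_group = defaultdict(list)
    for r in results:
        by_group[r[attribute]].append(logit(r["p_yes"]))
    base = mean(by_group[baseline])
    return {group: mean(values) - base for group, values in by_group.items()}


# Invented model outputs: P("yes") for otherwise-identical prompts.
results = [
    {"age": 40, "p_yes": 0.80},
    {"age": 40, "p_yes": 0.82},
    {"age": 70, "p_yes": 0.60},
    {"age": 70, "p_yes": 0.62},
]

scores = discrimination_scores(results, "age", baseline=40)
print(scores)  # the age-70 entry is negative: disfavored relative to age 40
```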

Interventions reduce measured discrimination

The goal of the research, the authors write, is to help developers and policymakers address risks proactively: “As language model capabilities and applications continue to expand, our work enables developers and policymakers to anticipate, measure, and address discrimination.”

The researchers propose mitigation strategies like adding statements that discrimination is illegal and asking models to verbalize their reasoning while avoiding biases. These interventions significantly reduced measured discrimination.
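Both interventions amount to adding text to each decision prompt before it is sent to the model. A minimal sketch; the wording of the two added statements and the function name are hypothetical, not the exact phrasing evaluated in the paper. Whether an intervention helps would show up only by re-running the discrimination measurement on the modified prompts.

```python
# Hypothetical wording for the two interventions described in the paper.
ILLEGAL_NOTICE = (
    "It is illegal to take age, gender or race into account when making "
    "this decision."
)
REASONING_REQUEST = (
    "Before answering, explain your reasoning step by step and make sure "
    "it does not rely on stereotypes or demographic characteristics."
)


def apply_interventions(prompt, illegal_notice=True, verbalize_reasoning=False):
    """Return the prompt with the selected mitigation statements appended."""
    parts = [prompt]
    if illegal_notice:
        parts.append(ILLEGAL_NOTICE)
    if verbalize_reasoning:
        parts.append(REASONING_REQUEST)
    return "\n\n".join(parts)


base = "Should the loan be approved? Answer yes or no."
print(apply_interventions(base, verbalize_reasoning=True))
```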

Steering the course of AI ethics

The paper aligns closely with Claude’s much-discussed constitution, which Anthropic published earlier this year. That document outlined a set of values and principles that Claude must follow when interacting with users, such as being helpful, harmless and honest. It also specified how Claude should handle sensitive topics, respect user privacy and avoid illegal behavior.

“We are sharing Claude’s current constitution in the spirit of transparency,” Anthropic co-founder Jared Kaplan told VentureBeat back in May, when the AI constitution was published. “We hope this research helps the AI community build more beneficial models and make their values more clear. We are also sharing this as a starting point — we expect to continuously revise Claude’s constitution, and part of our hope in sharing this post is that it will spark more research and discussion around constitution design.”

The new discrimination study also closely aligns with Anthropic’s work at the vanguard of reducing catastrophic risk in AI systems. In September, Anthropic co-founder Sam McCandlish shared insights into the development of the company’s Responsible Scaling Policy and its potential challenges, which could shed some light on the thought process behind publishing AI bias research as well.

“As you mentioned [in your question], some of these tests and procedures require judgment calls,” McCandlish told VentureBeat about Anthropic’s use of board approval around catastrophic AI events. “We have real concern that with us both releasing models and testing them for safety, there is a temptation to make the tests too easy, which is not the outcome we want. The board (and LTBT [Long-Term Benefit Trust]) provide some measure of independent oversight. Ultimately, for true independent oversight it’s best if these types of rules are enforced by governments and regulatory bodies, but until that happens, this is the first step.”

Transparency and Community Engagement

By releasing the paper along with the dataset and prompts, Anthropic is championing transparency and open discourse (at least in this very specific instance) and inviting the broader AI community to help refine new ethics systems. This openness invites collective efforts to reduce bias in AI systems.

“The method we describe in our paper could help people anticipate and brainstorm a much wider range of use cases for language models in different areas of society,” Tamkin told VentureBeat. “This could be useful for getting a better sense of the possible applications of the technology in different sectors. It could also be helpful for assessing sensitivity to a wider range of real-world factors than we study, including differences in the languages people speak, the media by which they communicate, or the topics they discuss.”

For those in charge of technical decision-making at enterprises, Anthropic’s research offers a framework for scrutinizing AI deployments and checking whether they conform to ethical standards. As the race to harness enterprise AI intensifies, the industry is challenged to build technologies that marry efficiency with equity.

Update (4:46 p.m. PT): This article has been updated to include exclusive quotes and commentary from Anthropic research scientist Alex Tamkin.

bnew · Dec 18, 2023

Nvidia Staffers Warned CEO of Threat AI Would Pose to Minorities

As the chipmaker’s AI technology has become ubiquitous, it’s working to make it more inclusive

Nvidia Chief Executive Officer Jensen Huang met with employees in 2020 over risks posed by artificial intelligence.


By Sinduja Rangarajan and Ian King

December 18, 2023 at 6:00 AM EST

Masheika Allgood and Alexander Tsado left their 2020 meeting with Nvidia Corp. Chief Executive Officer Jensen Huang feeling frustrated.

The pair, both former presidents of the company’s Black employees group, had spent a year working with colleagues from across the company on a presentation meant to warn Huang of the potential dangers that artificial intelligence technology posed, especially to minorities.

The 22-slide deck and other documents, reviewed by Bloomberg News, pointed to Nvidia’s growing role in shaping the future of AI — saying its chips were making AI ubiquitous — and warned that increased regulatory scrutiny was inevitable. The discussion included instances of bias in facial-recognition technologies used by the industry to power self-driving cars. Their aim, the pair told Bloomberg, was to find a way to confront the potentially perilous unintended consequences of AI head-on — ramifications that would likely be first felt by marginalized communities.

According to Allgood and Tsado, Huang did most of the talking during the meeting. They didn’t feel he really listened to them and, more importantly, didn’t get a sense that Nvidia would prioritize work on addressing potential bias in AI technology that could put underrepresented groups at risk.

Tsado, who was working as a product marketing manager, told Bloomberg News that he wanted Huang to understand that the issue needed to be tackled immediately, and that the CEO might have the luxury of waiting, but “I am a member of the underserved communities, and so there’s nothing more important to me than this. We’re building these tools and I’m looking at them and I’m thinking, this is not going to work for me because I’m Black.”


Both Allgood and Tsado quit the company shortly afterwards. Allgood’s decision to leave her role as a software product manager, she said, was because Nvidia “wasn’t willing to lead in an area that was very important to me.” In a LinkedIn post, she called the meeting “the single most devastating 45 minutes of my professional life.”

While Allgood and Tsado have departed, the concerns they raised about making AI safe and inclusive still hang over the company, and the AI industry at large. The chipmaker has one of the poorest records among big tech companies when it comes to Black and Hispanic representation in its workforce, and one of its generative AI products came under criticism for its failure to account for people of color.

The matters raised by Allgood and Tsado have also resonated. Though Nvidia declined to comment on the specifics of the meeting, the company said it “continues to devote tremendous resources to ensuring that AI benefits everyone.”

“Achieving safe and trustworthy AI is a goal we’re working towards with the community,” Nvidia said in a statement. “That will be a long journey involving many discussions.”

One topic of the meeting isn’t in dispute. Nvidia has become absolutely central to the explosion in deployment of artificial intelligence systems. Sales of its chips, computers and associated software have taken off, sending its shares on an unprecedented rally. It’s now the world’s only chipmaker with a trillion-dollar market value.

What was once a niche form of computing is making its way into everyday life in the form of advanced chatbots, self-driving cars and image recognition. And AI models — which analyze existing troves of data to make predictions aimed at replicating human intelligence — are under development to be used in everything from drug discovery and industrial design to the advertising, military and security industries. With that proliferation, the concern about the risks it poses has only grown. Models are usually trained on massive datasets created by gathering information and visuals from across the internet.

As AI evolves into a technology that encroaches deeper into daily life, some Silicon Valley workers aren’t embracing it with the same level of trust that they’ve shown with other advances. Huang and his peers are likely to keep facing calls from workers who feel they need to be heard.

And while Silicon Valley figures such as Elon Musk have expressed fears about AI’s potential threat to human existence, some underrepresented minorities say they have a far more immediate set of problems. Without being involved in the creation of the software and services, they worry that self-driving cars might not stop for them, or that security cameras will misidentify them.

“The whole point of bringing diversity into the workplace is that we are supposed to bring our voices and help companies build tools that are better suited for all communities,” said Allgood. During the meeting, Allgood said she raised concerns that biased facial-recognition technologies used to power self-driving cars could pose greater threats to minorities. Huang replied that the company would limit risk by testing vehicles on the highway, rather than city streets, she said.


The lack of diversity and its potential impact is particularly relevant at Nvidia. Only one out of a sample of 88 S&P 100 companies ranked lower than Nvidia based on their percentages of Black and Hispanic employees in 2021, according to data compiled by Bloomberg from the US Equal Employment Opportunity Commission. Of the five lowest-ranked companies for Black employees, four are chipmakers: Advanced Micro Devices Inc., Broadcom Inc., Qualcomm Inc. and Nvidia. Even by tech standards — the industry has long been criticized for its lack of diversity — the numbers are low.


During the meeting, Allgood recalled Huang saying that the diversity of the company would ensure that its AI products were ethical. At that time, only 1% of Nvidia employees were Black — a number that hadn’t changed from 2016 until then, according to data compiled by Bloomberg. That compared with 5% at both Intel Corp. and Microsoft Corp., 4% at Meta Platforms Inc. and 14% for the Black share of the US population overall in 2020, the data showed. People with knowledge of the meeting who asked not to be identified discussing its contents said Huang meant diversity of thought, rather than specifically race.

According to Nvidia, a lot has happened since Allgood and Tsado met with the CEO. The company says it has done substantial work to make its AI-related products fair and safe for everyone. AI models that it supplies to customers come with warning labels, and it vets the underlying datasets to remove bias. It also seeks to ensure that AI, once deployed, remains focused on its intended purpose.

In emails dated March 2020 reviewed by Bloomberg, Huang did give the go-ahead for work to start on some of Allgood’s proposals, but by that time she’d already handed in her notice.

Not long after Allgood and Tsado left Nvidia, the chipmaker hired Nikki Pope to lead its in-house Trustworthy AI project. Co-author of a book on wrongful convictions and incarcerations, Pope is head of what’s now called Nvidia’s AI & Legal Ethics program.

Rivals Alphabet Inc.’s Google and Microsoft had already set up similar AI ethics teams a few years earlier. Google publicly announced its “AI principles” in 2018 and has given updates on its progress. Microsoft had 30 engineers, researchers and philosophers on its AI ethics team in 2020, some of whom it laid off this year.

Pope, who’s Black, said she doesn’t accept the assertion that minorities have to be involved directly to be able to produce unbiased models. Nvidia examines datasets that software is trained on, she said, and makes sure that they’re inclusive enough.

“I’m comfortable that the models that we provide for our customers to use and modify have been tested, that the groups who are going to be interacting with those models have been represented,” Pope said in an interview.

The company has created an open-source platform, called NeMo Guardrails, to help chatbots filter out unwanted content and stay on topic. Nvidia now releases “model cards” with its AI models, which provide more details on what a model does and how it’s made, as well as its intended use and limitations.
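A model card is essentially structured documentation shipped alongside a model. The sketch below shows one way such a card could be represented and serialized; the `ModelCard` class, its field names and all example values are hypothetical, chosen only to mirror the items the article lists (what the model does, how it was made, intended use and limitations), and do not reflect Nvidia's actual model card format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ModelCard:
    """Hypothetical minimal model card (field names are assumptions)."""
    name: str
    description: str      # what the model does
    training_data: str    # how it was made / what it was trained on
    intended_use: list[str] = field(default_factory=list)
    out_of_scope_use: list[str] = field(default_factory=list)
    known_limitations: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the card so it can be published next to the model files."""
        return json.dumps(asdict(self), indent=2)


# Invented example values for illustration.
card = ModelCard(
    name="example-face-generator",
    description="Generates synthetic portrait images.",
    training_data="Photos collected from a public photo-sharing site.",
    intended_use=["research on generative image models"],
    out_of_scope_use=["development or improvement of facial recognition"],
    known_limitations=["inherits the demographic skew of its source photos"],
)
print(card.to_json())
```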

Nvidia also collaborates with internal affinity groups to diversify its datasets and test the models for biases before release. Pope said datasets for self-driving cars now include images of parents with strollers, people in wheelchairs and darker-skinned people.
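Checking whether a dataset is "inclusive enough" can start with a simple coverage count. The sketch below assumes each image already carries descriptive tags (the tag names, threshold and `coverage_report` helper are hypothetical) and flags tags that appear in too small a share of the samples. Presence in the data does not by itself guarantee that a model will perform equally well for every group.

```python
from collections import Counter


def coverage_report(samples, required_tags, min_share):
    """Share of samples carrying each required tag, plus tags below min_share.

    samples: list of tag sets, one per image (hypothetical annotations).
    """
    total = len(samples)
    if total == 0:
        raise ValueError("no samples to audit")
    # Count each tag at most once per image.
    counts = Counter(tag for tags in samples for tag in set(tags))
    shares = {tag: counts[tag] / total for tag in required_tags}
    flagged = sorted(tag for tag, share in shares.items() if share < min_share)
    return shares, flagged


# Toy annotations, invented for illustration.
samples = [
    {"pedestrian", "darker_skin"},
    {"pedestrian", "stroller"},
    {"pedestrian"},
    {"cyclist"},
    {"pedestrian", "wheelchair", "darker_skin"},
]
shares, flagged = coverage_report(
    samples, ["darker_skin", "stroller", "wheelchair"], min_share=0.3
)
print(shares)
print("under-represented:", flagged)
```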

bnew · Dec 18, 2023

Nvidia Ranks Close to the Bottom in Diverse Hiring

Company and most of its chipmaker peers lag rest of technology industry

Black, Hispanic and other races as a percentage of US workforce

Source: 2021 EEO-1 Filings compiled by Bloomberg

Note: Bloomberg is using “other races” to refer to employees who self-report as “Native Hawaiian or Other Pacific Islander,” “American Indian or Alaska Native,” or “two or more races.”

Pope and colleague Liz Archibald, who is director of corporate communications at Nvidia and also Black, said that they once had a “tough meeting” with Huang over AI transparency and safety. But they felt like his questions brought more rigor to their work.

“I think his end goal was to pressure-test our arguments and probe the logic to help figure out how he could make it even better for the company as a whole,” Archibald said in an email.

Some researchers say that minorities are so underrepresented in tech, and particularly in AI, that without their input, algorithms are likely to have blind spots. A paper from New York University’s AI Now Institute has linked a lack of representation in the AI workforce to bias in models, calling it a “diversity disaster.”

In 2020, researchers from Duke University set out to create software that would convert blurry pictures into high-resolution images, using an image-generation model (a generative adversarial network, or GAN) from Nvidia called StyleGAN, which was developed to produce fake but hyperreal-looking human faces and was trained on a dataset of images from photo site Flickr. When users played around with the tool, they found it struggled with low-resolution photos of people of color, including former President Barack Obama and Congresswoman Alexandria Ocasio-Cortez, inadvertently generating faces with lighter skin tones and eye colors. The researchers later said the bias likely came from Nvidia’s model and updated their software.

Nvidia mentions in its code archives that its version of the dataset was collected from Flickr and inherits “all the biases of that website.” In 2022, it added that the dataset should not be used for “development or improvement of facial recognition technologies.”

The model that was criticized has been superseded by a new one, according to Pope.

Nvidia joins a list of large companies where some minority employees have expressed concern that the new technology carries dangers, particularly for people of color. Timnit Gebru, an AI ethics researcher, left Google after the company wanted her to retract a paper that warned of the dangers of training large AI language models (Gebru said Google fired her; the company said she resigned). As reported by MIT Technology Review, she has argued that methods relying on datasets “too large to document” are inherently risky.

Gebru and Joy Buolamwini, founder of the Algorithmic Justice League, published a paper called “Gender Shades” that showed how facial recognition technologies make errors at higher rates when identifying women and people of color. A growing number of studies now support their finding that the datasets used to power AI models can be biased and capable of harming minorities. International Business Machines Corp., Microsoft and Amazon.com Inc. have stopped selling facial recognition technologies to police departments.


“If you look within the history of the tech industry, it’s not a beacon for being reflective of serious commitment to diversity,” said Sarah Myers West, the managing director of AI Now Institute and a co-author of the paper on lack of diversity in the AI workforce. The industry has a long history of not taking minorities and their concerns seriously, she said.

Nvidia’s head of human resources, Shelly Cerio, told Bloomberg that while the company was functioning like a startup — and worrying about surviving — it hired primarily to meet its immediate skills needs: as many engineers with higher degrees as it could find. Now that it’s larger, Nvidia has made diversity in its recruitment more of a priority.

“Have we made progress? Yes,” she said. “Have we made enough progress? Absolutely not.”


The company improved its hiring of Black employees after 2020. Black representation grew from 1.1% in 2020 to 2.5% in 2021, the most recent year that data is available. Asians are the largest ethnic group at the company, followed by White employees.

Pope said the company’s efforts don’t “guarantee or eliminate” bias, but do provide a diversified dataset that can help address it. She said that in a fast-paced company that has released hundreds of models, scaling up her processes to address safety is one of the challenges of her role.

It also will take years to tell whether this work will be enough to keep AI systems safe in the real world. Self-driving cars, for example, are still rare.

A few weeks before Allgood left the company, she wrote one last email to Huang reflecting on when she had worked as a teacher in her previous career. She wrote that when she took her students on field trips, she relied on parents and volunteers to help her manage them — an acknowledgement that no one, no matter how brilliant, could handle a group of kids in the wild.

“AI has permanently moved into the field trip stage,” read the email. “You need colleagues and a structure to manage the chaos.”

— With assistance from Jeff Green

the cac mamba · Jan 1, 2024

bnew · Apr 7, 2024

1/3
Most open models seem to exhibit a liberal-leaning bias on tested political topics, and often focus disproportionately on US-related entities and viewpoints. [2403.18932] Measuring Political Bias in Large Language Models: What Is Said and How It Is Said

2/3
Indeed, have a piece on this coming up soon

3/3
I don't think there's a unified approach to climate change in the rest of the world.

[2403.18932] Measuring Political Bias in Large Language Models: What Is Said and How It Is Said


[Submitted on 27 Mar 2024]

Measuring Political Bias in Large Language Models: What Is Said and How It Is Said

Yejin Bang, Delong Chen, Nayeon Lee, Pascale Fung

We propose to measure political bias in LLMs by analyzing both the content and style of their generated content regarding political issues. Existing benchmarks and measures focus on gender and racial biases. However, political bias exists in LLMs and can lead to polarization and other harms in downstream applications. In order to provide transparency to users, we advocate that there should be fine-grained and explainable measures of political biases generated by LLMs. Our proposed measure looks at different political issues such as reproductive rights and climate change, at both the content (the substance of the generation) and the style (the lexical polarity) of such bias. We measured the political bias in eleven open-sourced LLMs and showed that our proposed framework is easily scalable to other topics and is explainable.
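The abstract separates what a model says (content) from how it says it (style, measured as lexical polarity). As a rough illustration of the style side only, the sketch below averages the polarity of words found in a small lexicon. The words, weights and `lexical_polarity` function are invented for this example, and this is not the authors' method; a real measure would use a curated resource.

```python
import re

# Tiny hypothetical polarity lexicon (words and weights are assumptions).
LEXICON = {
    "crisis": -1.0, "catastrophic": -1.0, "threat": -0.5,
    "progress": 0.5, "opportunity": 1.0, "beneficial": 1.0,
}


def lexical_polarity(text, lexicon=LEXICON):
    """Average polarity of the lexicon words found in text (0.0 if none match)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [lexicon[t] for t in tokens if t in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


a = "Climate change is a catastrophic crisis and an urgent threat."
b = "Climate policy is an opportunity for progress."
print(lexical_polarity(a), lexical_polarity(b))
```

Two answers can take the same stance on an issue yet differ sharply in tone, which is the gap a separate style measure is meant to capture.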

Comments:	16 pages
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2403.18932 [cs.CL]
	(or arXiv:2403.18932v1 [cs.CL] for this version)

Submission history

From: Yejin Bang
[v1] Wed, 27 Mar 2024 18:22:48 UTC (6,999 KB)

https://arxiv.org/pdf/2403.18932

bnew · Jul 7, 2025

1/3
@rohanpaul_ai
Research finds AI models change their medical recommendations when people ask them questions that include slang, typos, odd formatting and even gender-neutral pronouns.

The team generated thousands of synthetic patient notes with varied spacing, misspellings, and emotional tone to mirror real messaging quirks.

They ran GPT-4, two Llama-3 variants, and Writer’s Palmyra-Med on these notes, asking whether the person should self-treat or head to a clinic, plus which labs to order.

Across all models, sloppy or dramatic writing nudged answers toward “stay home,” even though the underlying symptoms never changed.

Notes written with female cues showed the same downgrading, confirming that gender bias piles on top of language bias.

Human clinicians barely shifted recommendations when reading the same variations, so the skew lives inside the algorithms.

Source: newscientist.com/article/2486372-typos-and-slang-spur-ai-to-discourage-seeking-medical-care/

2/3
@fantony_francis
This is getting out of control.
The hype is oversold, and the information is warped.

3/3
@markopolojarvi
Garbage in garbage out. They should have probably tested models with reasoning though for obvious reasons.
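The study described in the first tweet is essentially a perturbation test: keep the clinical facts fixed, change only surface features of the message, and check whether the model's advice moves. The sketch below generates such variants and computes the share of "stay home" answers. The perturbation functions, rates, the "dramatic" suffix and the answer labels are assumptions for illustration, not the study's actual procedure, and the call to the model under test is not shown.

```python
import random


def add_typos(text, rate, rng):
    """Swap adjacent letters inside words at roughly the given rate."""
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the swapped pair
        else:
            i += 1
    return "".join(chars)


def messy_spacing(text, rng):
    """Randomly widen the gaps between words and add trailing blank lines."""
    words = text.split(" ")
    return "".join(w + " " * rng.choice([1, 2, 3]) for w in words).rstrip() + "\n\n"


def variants(note, seed=0):
    """Surface-level rewrites of one note; the symptoms themselves never change."""
    rng = random.Random(seed)  # seeded so the variants are reproducible
    return {
        "original": note,
        "typos": add_typos(note, rate=0.1, rng=rng),
        "spacing": messy_spacing(note, rng),
        "dramatic": note + " I'm really scared, please help!!!",
    }


def stay_home_share(answers):
    """Fraction of model answers that recommend self-management ("stay home")."""
    if not answers:
        raise ValueError("no answers")
    return sum(a == "stay home" for a in answers) / len(answers)


note = "Patient reports mild chest pain and shortness of breath since yesterday."
for name, text in variants(note).items():
    print(f"{name}: {text!r}")
```

Comparing `stay_home_share` across variants of the same underlying notes is what would reveal whether surface features, rather than symptoms, are driving the recommendation.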

