Improving DEI with balanced data for AI

Think:Act Magazine "It’s time to rethink AI"

May 15, 2024

The push for fairer representation of marginalized communities in AI datasets

by Grace Browne
Artwork by Carsten Gueth

The datasets we feed AI today will likely shape its ideological direction for decades to come. And while the industry has long presumed that these sets would be large enough to represent diversity through the sheer volume of data, researchers and advocates are quickly uncovering evidence that this is not the case. The question now is how a biased society can work toward teaching technology a new definition of neutrality.

Key takeaways from this piece
Bias is getting baked in:

Left unchecked, prejudice in algorithms can work to further entrench systemic racism or sexism.

It starts at the top:

Systemic biases stem from the upper echelons of the tech industry, which skews white and male.

Build your awareness:

The problem with AI is not a future dystopian one, but a present danger built on prejudice that is already in the system.

Timnit Gebru was at the top of the field of ethical AI in 2018. At that time, it was a new area of focus that had slowly emerged to call attention to the fact that artificial intelligence was not simply a jumble of algorithms sitting in the cloud, but a set of systems containing biases that could have unintended consequences. Gebru, originally from Ethiopia, had become a star after publishing a landmark paper that found that facial analysis technology had higher error rates for women with darker skin tones due to unrepresentative training data.

She was headhunted by Google to co-lead its ethical AI team that same year. But her tenure was not a smooth one. In December 2020, Gebru was ousted from Google – Google maintains that she resigned – after being asked to retract a paper calling on technology companies to do more to ensure that AI systems were not exacerbating historical biases, and after sending an email that called attention to her own company's approach to hiring minorities. And her departure was not a quiet one: An open letter expressing solidarity with Gebru was signed by over 1,500 Google employees.

29%: The increase in Black US patients who would receive additional care if an algorithmic bias used to identify and help patients with complex health needs were remedied, according to a 2019 study.

Source: Science

Since its release at the end of 2022, ChatGPT, a chatbot built by the company OpenAI, has exploded in popularity, ushering in a new era of widely used generative AI systems that create content including text, images and video at the touch of a button. "This shift marks the most important technological breakthrough since social media," Time said in early 2023. This new wave of excitement has also put governments and academics such as Gebru on edge: As bigger, more powerful AI steams ahead, will we reach a point where we lose control? Her story has become symbolic of the technology companies' reluctance to address the harms and biases hidden in their algorithms – and the time to reckon with these issues is now.

Today, algorithms govern more of our lives than many realize. Their reach extends from the results you see when you type a query into a search engine to whether a judge hands out a prison sentence. Machine learning technology, once the preserve of complex research papers hidden in journals behind paywalls, has firmly ventured into the real world – and now that the knowledge is out there, there's no way of putting it back. AI systems have proliferated into public and social systems, such as housing, social benefits and policing. And while it was once presumed that the datasets AI was trained on were so large that they would iron out any biases contained within the data, this has increasingly been proven not to be the case. The idea that algorithms may reflect the biases of the humans who train them wasn't really an accepted concept until the 2010s, when more and more researchers, like Gebru, began to sound the alarm. Now, it's widely recognized that technology is not neutral. And left unchecked, biases and prejudice lurking in algorithms can lead to social harms, such as entrenching systemic racism or sexism.

Rashida Richardson

Rashida Richardson is a technology policy expert and researcher into the social and civil rights implications of artificial intelligence.

Rashida Richardson didn't begin her career enmeshed in the field of fair AI. Rather, she was a US lawyer, working on civil rights issues such as housing, school desegregation and criminal justice reform. Then she began noticing that in a lot of these systems, it was increasingly being proposed that algorithms do the dirty work. Companies were approaching the government offering their technology, which the government in turn was viewing as a silver bullet solution to its limited resources. Richardson was inherently skeptical: "How is an algorithm really going to fix something that stems from structural inequality that no one's been able to figure out?" she remembers wondering.

She took a look at some of the companies that were approaching police departments and making bold claims about what their technology could do. Richardson decided to investigate one of the main ways governments were using machine learning: predictive policing systems that use historical crime data to make predictions about where crime is likely to occur in the future, or who is most likely to be involved. In 2019, she co-published a paper that examined 13 jurisdictions in the United States that used these systems. Richardson and her colleagues found that nine of them were training algorithms based on data derived from unlawful police practices, or "dirty data." This included falsifying data to give the impression of falling crime rates or planting drugs on innocent people in order to reach arrest quotas. It meant these systems were at risk of unfairly targeting minorities.

Predictive policing is just one area in which algorithms reveal their biases – and where they could therefore also cause harm. Take health care, for instance. There's been mushrooming interest in bringing AI into medicine to make it quicker, better and cheaper. But many ventures have shown that, if not designed carefully, AI can further fuel racial bias.

In a 2019 paper published in the journal Science, the authors reported that an algorithm widely used in hospitals in the United States was systematically discriminating against Black people. The software program, which was being used to determine who should get access to high-risk health care management programs, was routinely selecting healthier white patients over less healthy Black patients; the algorithm was being employed to manage care for 200 million people every year.

A paper published in 2022 looked at image recognition technologies which claimed they could classify skin cancers as well as human experts. When the researchers looked at the datasets used to train these AI systems, they found a stark paucity of images of darker skin. Most of the datasets contained images that originated from Europe, North America and Oceania exclusively. "These findings highlight the dangers of implementing algorithms for widespread use on broad populations without dataset transparency," the authors concluded.
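
The kind of dataset check described above can be illustrated with a minimal sketch. It assumes each training image carries a metadata record with a hypothetical "region" field; the field name, the toy records and the 20% threshold are illustrative assumptions, not details taken from the 2022 paper.

```python
from collections import Counter


def composition(records, field):
    """Return each value's share of the records for one metadata field.

    Records that lack the field are counted under "unknown".
    """
    counts = Counter(r.get(field, "unknown") for r in records)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def underrepresented(shares, threshold):
    """List the values whose share falls below the chosen threshold."""
    return sorted(v for v, s in shares.items() if s < threshold)


if __name__ == "__main__":
    # Toy metadata, invented purely for illustration.
    records = (
        [{"region": "Europe"}] * 6
        + [{"region": "North America"}] * 3
        + [{"region": "Africa"}] * 1
    )
    shares = composition(records, "region")
    for region, share in sorted(shares.items()):
        print(f"{region}: {share:.0%}")
    print("Below 20%:", underrepresented(shares, 0.20))
```

A count like this only shows what the metadata records; it cannot reveal gaps in datasets whose contents are never published, which is the transparency problem the authors point to.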

Mark Yatskar, an assistant professor at the University of Pennsylvania who studies fairness in machine learning, feels pessimistic about serious change in his industry. Part of the issue, he has learned from his work, is that machine learning scientists rarely think of the end user of their research. But he believes asking researchers to ensure their systems are fair and ethical is not the answer: They're typically not the ones who are deploying them.

Timnit Gebru

Timnit Gebru is a political activist, a computer scientist specializing in algorithmic bias and an advocate for diversity in technology. She is the co-founder of Black in AI and the founder of DAIR.

It's easy to call for more regulation, Yatskar says, but he doesn't think that's the right answer, "in part because there's not a perfect agreement among researchers who think about fairness, even about definitions." Where one researcher sees a fair algorithm, another may find plenty of problems. Full transparency may work better: It would let researchers perform what's called an algorithmic audit, inspecting the inputs, outputs and code of an algorithm to hunt for bias. If the biases they find can't be fixed, that can be communicated in a public statement.
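
One small part of such an audit, comparing an algorithm's outputs across groups, might look like the sketch below. It is a simplified illustration under stated assumptions: the function names and toy data are hypothetical, and a gap in error rates is only one of several competing fairness definitions, which is exactly the disagreement Yatskar describes.

```python
from collections import defaultdict


def error_rates_by_group(examples):
    """Compute the share of wrong predictions for each group.

    `examples` is an iterable of (group, true_label, predicted_label).
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, truth, predicted in examples:
        totals[group] += 1
        if truth != predicted:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


def largest_gap(rates):
    """Difference between the highest and lowest group error rates."""
    return max(rates.values()) - min(rates.values())


if __name__ == "__main__":
    # Toy audit log, invented purely for illustration.
    examples = [
        ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 1),
        ("group_b", 1, 0), ("group_b", 0, 0), ("group_b", 1, 0), ("group_b", 0, 0),
    ]
    rates = error_rates_by_group(examples)
    print(rates)
    print("Largest gap:", largest_gap(rates))
```

An audit of this kind needs access to the system's predictions and to group labels, which is why secrecy around data and models makes it hard to carry out.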

Another roadblock is that the data that algorithms are trained on are kept secret by the private companies that are doing the training. This makes it much more difficult for researchers to analyze them. And one inescapable conclusion is that these systemic biases stem, at least in part, from the upper echelons of the technology industry. Today, the people who make up the AI industry are overwhelmingly white and male. A 2019 report pointed out that 80% of AI professors were men. Women made up only 15% of AI research staff at Facebook; at Google, that number dropped to just 10%. "Such diversity of experience is a fundamental requirement for those who develop AI systems to identify and reduce the harms they produce," the authors wrote.

Richardson, who has served as a technology adviser to the White House as well as the Federal Trade Commission, says there is simply no clear way to regulate these technologies. Within governments, policymakers suffer from a poor understanding of how these technologies work. And the issues that are plaguing these AI systems are more systemic and harder to fix than just making an algorithm "fair." How do you make an unbiased algorithm when people in the real world are still guilty of bias and prejudice? "You can't unlink it from the social aspect – and we just don't know how to deal with those issues," Richardson says. "These are complicated problems that policymakers in society don't like to deal with."

The more we realize that this technology is not devoid of bias, the better. But while awareness of this big problem has swelled in recent years, still no one is quite sure what to do, says Richardson. "Even though there's more urgency, there aren't clear ideas on what to do," she says. "No one wants to be honest about how hard it is to figure out some of these issues."

After leaving Google, Gebru went on to found the Distributed AI Research Institute – or DAIR – a community-driven AI research institute that centers on diverse perspectives. She's also not finished calling attention to the harms of AI. In March 2023, thousands of people, including Elon Musk and Steve Wozniak, signed an open letter that called for a six-month pause on AI development to prevent dystopian threats such as "loss of control of our civilization." Gebru, along with a handful of other AI ethicists, co-authored a counterpoint to the letter. They argued that it failed to call out the current harms that AI causes. "It is indeed time to act," they wrote. "But the focus of our concern should not be imaginary 'powerful digital minds.' Instead, we should focus on the very real and very present exploitative practices of the companies claiming to build them, who are rapidly centralizing power and increasing social inequities."

About the author
Grace Browne
Grace Browne is a freelance journalist who covers science and health. Previously a staff writer at WIRED magazine, her writing has also appeared in outlets such as New Scientist, Undark, BBC Future, and Hakai Magazine, amongst others. For her work, she was shortlisted for Health Journalist of the Year at The Press Awards 2022 and for Best Specialist Writer at the British Society of Magazine Editors Talent Awards 2023. She lives in London.