It has been said that algorithms are “opinions embedded in code.” Few people understand the implications of that better than Abeba Birhane. Born and raised in Bahir Dar, Ethiopia, Birhane moved to Ireland to study: first psychology, then philosophy, then a PhD in cognitive science at University College Dublin.
During her doctorate, she found herself surrounded by software developers and data science students—immersed in the models they were building and the data sets they were using. But she started to realize that no one was really asking questions about what was actually in those data sets.
Artificial intelligence has infiltrated almost every aspect of our lives: It can determine whether you get hired, diagnose you with cancer, or make decisions about whether to release prisoners on parole. AI systems are often trained on gargantuan data sets, usually scraped from the web for cost-effectiveness and ease. But this means AI systems can inherit the biases of the humans who design them, along with any biases present in the data that feeds them. The end result mirrors society, with all the ugliness baked in.
Failing to recognize this risks causing real-world harm. AI has already been accused of underestimating the health needs of Black patients and of making it less likely that people of color will be approved for a mortgage.
Birhane redirected her research toward investigating the data sets that are increasingly shaping our world. She wants to expose their biases and hold the giant corporations that design and profit from them to account. Her work has garnered global recognition. In October 2022, she even got the opportunity to talk about the harms of Big Tech at a meeting with the Dalai Lama.
Often, Birhane only has to scratch the surface of a data set before the problems jump out. In 2020, Birhane and colleague Vinay Prabhu audited two popular data sets. The first was “80 Million Tiny Images,” an MIT data set that’s been cited in hundreds of academic papers and used for more than a decade to teach machine learning systems how to recognize people and objects. It was full of offensive labels, including racist slurs for images of Black people. In the other data set, ImageNet, they found pornographic content, including upskirt images of women, which ostensibly did not require the individuals’ explicit consent because they were scraped from the internet. Two days after the pair published their study, the MIT team apologized and took down the Tiny Images data set.
These problems come from the top. Machine learning research is overwhelmingly male and white, a demographic world away from the diverse communities it purports to help. And Big Tech firms don’t just offer online diversions—they hold enormous amounts of power to shape events in the real world.
Birhane and others have branded this “digital colonialism,” arguing that the power of Big Tech rivals that of the old colonial empires. The harms of that power will not affect us all equally, she argues: As technology is exported to the global south, it carries embedded Western norms and philosophies along with it. It’s sold as a way of helping people in underdeveloped nations, but it’s often imposed on them without consultation, pushing them further into the margins. “Nobody in Silicon Valley stays up worrying about the unbanked Black women in a rural part of Timbuktu,” Birhane says.