Bias and the Myth of the Objective Machine

In the past few years, AI has woven itself into nearly every corner of how we work, communicate, and make decisions. Hiring algorithms screen (and discard) resumes before a human ever reads them. Automated systems flag fraud, recommend medical treatments, and help determine prison sentences. The assumption underlying all of this is that machines, unlike people, are objective. They aren’t. Every layer of a machine learning system (the training data, the success metrics, the team that built it) was shaped by humans. And the data these systems learn from reflects not just human history, but the particular slice of humanity that got to record it. Bias in AI is ultimately a human problem, which means humans can fix it.

In 2015, a software engineer named Jacky Alciné opened Google Photos and discovered that its auto-tagging algorithm had labeled photos of him and his Black friends as “gorillas.” Google’s response wasn’t to actually fix the underlying bias in the model. They just removed “gorilla”, “chimp”, “chimpanzee”, and “monkey” as tags entirely. Years later, those categories were still blocked. That’s neither an effective nor a compassionate solution. It makes you wonder what other accidental, uncomfortable, or problematic patterns will appear later because they were never surfaced during the design and build process.

The things AI is genuinely good at, like pattern recognition, speed, and repetition at scale, are the same things that make its biases so dangerous when left unchecked. As these tools grow in scope and in their ability to mimic human characteristics, those biases grow as well. The more we rely on AI to make consequential decisions, the more those biases have consequences.

Man-made, Man-influenced

AI systems learn from human-generated data. That data carries the full weight of human history: prejudices, gaps in representation, and the tendency to center certain experiences as “normal” and treat everything else as edge cases.

Hiring algorithms have been shown to discriminate based on gender, race, and ethnicity. Speech recognition systems are twice as likely to misinterpret audio from Black speakers as from white speakers. AI-powered academic integrity tools flag non-native English writers as likely cheaters, based on patterns that don’t account for how English fluency develops.

Healthcare is an area where we often assume AI will be a force for pure good. But a systematic review published in JAMA Network Open examined 555 neuroimaging-based AI models for detecting psychiatric disorders and found that 83% of them had a high risk of bias. 83%!

Perhaps most concerningly, AI facial recognition systems from IBM, Microsoft, and Amazon have performed dramatically worse on darker-skinned faces — with error rates as high as 35% for darker-skinned women, compared to less than 1% for lighter-skinned men. The implications are dangerous: when facial recognition is used in law enforcement, hiring, or security, that error rate isn’t a statistic — it’s someone’s life.

The “Average User” Problem, Again

I love the lesson taught by the Air Force’s infamous cockpit problem: they had designed their cockpits for the “average” pilot, only to discover in the 1950s that not a single one of the more than 4,000 pilots they measured actually matched that average. Designing for a fictional “average” really means designing for nobody.

AI has the same problem, but at a far larger scale. These systems are exceptional at identifying patterns in large datasets and finding the trends, the majority, the most common case. But what happens to everyone who falls outside of that center?

A language model trained predominantly on text from white, Western, male sources will reflect those assumptions in its outputs. An image classification model trained on a dataset that underrepresents certain skin tones will perform worse on those skin tones. It may not be intentional, but it is doing exactly what it was designed to do: learn and output based on what it was given, with no emotional discernment. The problem is what humans decided to give it and who was (or more importantly, wasn’t) in the room when those decisions were made.
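One way to make “who wasn’t in the room” visible is to count who is actually in the training data before any model sees it. The snippet below is only a rough sketch of that idea: the group names, the `min_share` threshold of 10%, and the `representation_report` helper are all hypothetical, and real datasets rarely come with clean group labels.

```python
from collections import Counter

def representation_report(labels, expected_groups, min_share=0.10):
    """Return {group: (share, flagged)} for every group we expected to see.

    `expected_groups` is supplied by a human on purpose: a group that is
    completely missing from the data would otherwise never show up.
    `min_share` is an arbitrary illustrative threshold, not a standard.
    """
    counts = Counter(labels)
    total = len(labels)
    report = {}
    for group in expected_groups:
        share = counts[group] / total if total else 0.0
        report[group] = (share, share < min_share)
    return report

# Made-up labels for illustration only.
labels = ["group_a"] * 880 + ["group_b"] * 120
report = representation_report(labels, ["group_a", "group_b", "group_c"])

for group, (share, flagged) in report.items():
    note = "  <-- underrepresented" if flagged else ""
    print(f"{group}: {share:.1%}{note}")
```

Here `group_c` is flagged even though it never appears in the data, which is exactly the kind of gap a simple tally of what is present would miss. Deciding which groups belong in `expected_groups` is still a human judgment.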

This Isn’t Inevitable

There is good news: none of this is technically inevitable. These are design choices, like anything else. We often talk in the software community (or we should; please, talk about this!) about how design choices are never neutral. You’re either helping a user move closer to what they want or need, or causing them some amount of friction or suffering in the process. It’s no different when AI gets involved: we end up with decisions to make about what data to use, how to validate models, who to test with, and who to include in the teams doing that work.

The lack of representation in tech has a direct impact on the technology we build. Had there been a person of color involved at any step in building Google Photos’ auto-tagging feature, the horrifying “gorilla” incident almost certainly wouldn’t have happened. One person’s presence doesn’t magically fix a dataset, but diverse teams ask different questions, catch different failure modes, and push back on assumptions that feel invisible to people for whom the status quo works just fine.

A Checklist

Yes, these models are constantly changing, and it can feel impossible to keep up. So what can we actually do about all this?

  • Diversify the teams building these systems. This isn’t to check a box or proudly claim how inclusive you are — it has a measurable technical impact as well as an ethical one. The people writing the algorithms, curating the training data, and defining the success metrics shape what the system values and what it overlooks.
  • Test with people who aren’t like you. If you’re validating a model and your test group is demographically homogeneous, you don’t actually know how it performs. You know how it performs on people like your test group.
  • Stay skeptical of your own outputs. AI-generated content like code, alt text, recommendations, and classifications can look authoritative while carrying embedded bias. “The model said so” is not real validation. Humans need to stay in the loop, now more than ever, and to challenge their own biases along with the machine’s.
  • Name the bias when you see it. Jacky Alciné tweeted about Google Photos in 2015 and the ensuing publicity forced a response, however inadequate. Calling these things out is important: it holds companies accountable and makes other people aware. We shouldn’t need to rely on users to catch these failures “in prod”, but when they do, it deserves a genuine response and repair.
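The “test with people who aren’t like you” item can be made concrete by never reporting a single accuracy number on its own. Here is a minimal sketch, assuming you have evaluation results tagged with some group label. The `error_rate_by_group` and `overall_error_rate` helpers, the group names, and every number in the sample data are invented for illustration and do not come from any real system.

```python
from collections import defaultdict

def overall_error_rate(records):
    """Fraction of (group, predicted, actual) records where the model was wrong."""
    wrong = sum(1 for _, predicted, actual in records if predicted != actual)
    return wrong / len(records)

def error_rate_by_group(records):
    """Return {group: error rate}, computed separately for each group."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, predicted, actual in records:
        totals[group] += 1
        if predicted != actual:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}

# Made-up results: a large, well-served group and a small, poorly served one.
records = (
    [("group_a", "match", "match")] * 95
    + [("group_a", "no_match", "match")] * 5
    + [("group_b", "match", "match")] * 7
    + [("group_b", "no_match", "match")] * 3
)

overall = overall_error_rate(records)
rates = error_rate_by_group(records)

print(f"overall: {overall:.1%}")
for group, rate in rates.items():
    print(f"{group}: {rate:.1%}")
```

In this toy data the overall error rate looks tolerable because `group_a` dominates the test set, while `group_b` fails far more often. The headline number hides the very people the model serves worst, which is why the breakdown matters more than the average.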

Human skills like empathy, critical thinking, discernment, and compassion are essential to identify and mend the gaps these systems cause. These tools were built by people, and they reflect people. Fixing them will also require people — different ones, and more of them, at every step.
