Ask These Questions When Applying Machine Learning to People

The field of machine learning matured with applications like spam filtering, targeted advertising, self-driving cars, and weather prediction. As machine learning techniques are increasingly used to make predictions about people, there are a few ethics questions we need to be asking ourselves.

1. Will the future really be like the past? Do we want it to be?

The assumption that the future will resemble the past has held true for self-driving cars, the weather, advertisement clicks, items people want to buy, and even who will fall in love with whom. It does not always hold when we're making predictions in social realms that we're simultaneously trying to change, such as education, debt management, and criminal justice. On one hand, past data may contain trends or biases we don't want to reinforce in our predictions. On the other hand, past data may not account for new social interventions.
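
To see how past bias carries forward, here's a minimal sketch, assuming entirely synthetic data and a scikit-learn model (neither is from any real system): a model trained on historical decisions that favored one group learns to favor that group, even for two applicants with identical qualifications.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
group = rng.integers(0, 2, size=n)   # two groups, e.g., two neighborhoods
skill = rng.normal(size=n)           # the quality we wish decisions tracked

# Hypothetical historical decisions that favored group 1 regardless of skill:
past_decision = (skill + 1.5 * group > 1.0).astype(int)

model = LogisticRegression().fit(np.c_[group, skill], past_decision)

# Two applicants with identical skill but different groups:
print(model.predict_proba([[0, 0.5], [1, 0.5]])[:, 1])
# The group-1 applicant scores far higher: yesterday's bias is now a "prediction".
```

The model hasn't discovered anything new about the applicants; it has faithfully reproduced the pattern in the old decisions.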

2. Will more data make my model smarter?

Adding more dimensions (features) to the data while keeping the same number of samples often produces models that perform worse and generalize less well. Adding more samples to the training data, by contrast, tends to increase the likelihood of training a model that is useful for predictions. However, more samples only help if they are representative of the cases you want to make predictions about.
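
Here's a minimal sketch of the first point, assuming scikit-learn and a synthetic dataset: hold the number of samples fixed at 200 and pad the data with more and more noise features, and cross-validated accuracy tends to fall as the dimensionality grows.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_samples = 200

for n_noise in [0, 50, 500, 5000]:
    # 5 informative features, plus n_noise columns of pure noise
    X, y = make_classification(n_samples=n_samples, n_features=5,
                               n_informative=5, n_redundant=0,
                               random_state=0)
    X = np.hstack([X, rng.normal(size=(n_samples, n_noise))])
    score = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
    print(f"{5 + n_noise:5d} features -> mean CV accuracy {score:.2f}")
```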

3. Is this raw data objective? In what ways might it not be?

Raw data that doesn't include anything about people, such as data about space or weather, might be more objective, but even in those realms there's subjectivity in how we collect and interpret it. Raw data generated by people's behavior, such as internet searches, shopping, calling 911, making arrests, grading students, examining patients, and making loan decisions, carries those people's biases.

4. Are we treating our results as analysis or as new facts?

Machine learning takes in large amounts of data to make decisions. In doing so, analog information is converted to real numbers, and real numbers are often converted to a classification or a yes/no decision. This does not increase the amount of information in the world; it analyzes the information we already have.
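
As a toy illustration (the scores and the 0.5 cutoff below are hypothetical), here's that compression in miniature: a rich record becomes one real number, and the number becomes a yes/no decision that erases the difference between a borderline case and an obvious one.

```python
# Hypothetical model scores for two people; not from any real system.
records = [
    {"name": "A", "score": 0.51},  # barely over the cutoff
    {"name": "B", "score": 0.99},  # far over the cutoff
]

THRESHOLD = 0.5
for r in records:
    decision = r["score"] >= THRESHOLD  # real number -> yes/no
    print(r["name"], decision)
# Both print True: the 0.51 vs 0.99 distinction, and everything behind
# those scores, has been analyzed away, not added to.
```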

For some reason, it's easier for us to understand a credit score (which makes no claim to be based on machine learning) as an analysis of many facts about a person's credit history. We're well aware that, for an individual person, a credit score is an analysis that ignores a great deal of information about their life. The same goes for machine learning: a prediction is an analysis of facts we already have, so let's be wary of treating it as a new fact.

5. Is there an acceptable margin of error?

If a targeted advertising application gets a prediction wrong, the only consequence is that a user doesn't click the ad. But if we're making predictions to decide whether to put a person in prison or whether someone receives an organ transplant, what's the acceptable margin of error? This boils down to big ethical questions.
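
A back-of-the-envelope sketch (all the numbers here are hypothetical) shows what a "small" margin of error means at scale for a high-stakes decision:

```python
population = 100_000        # people the model scores
base_rate = 0.01            # 1% truly belong to the positive class
false_positive_rate = 0.05  # the model wrongly flags 5% of negatives

negatives = population * (1 - base_rate)
wrongly_flagged = negatives * false_positive_rate
print(f"{wrongly_flagged:,.0f} of {population:,} people wrongly flagged")
# -> 4,950 people bear the cost of a "5% error rate"
```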

There are solid arguments that the only acceptable margin of error is zero, which would mean we should not be using predictive models to make these decisions at all. Other ethical approaches might accept some margin of error for the greater good. But math can't answer these ethical questions for us.