The Unintended Consequences and Negative Impact of New Machine Learning Applications

Machine learning applications are becoming more powerful and more pervasive, and as a result the risk of unintended consequences increases and must be carefully managed. Recent glitches at major companies demonstrate a failure to anticipate such consequences. In this post I describe a few examples and discuss ways to reduce the unintended consequences associated with new machine learning applications.

It seems that machine learning is taking over the high-tech world. Most consumer applications have a machine learning component, and recent progress in machine learning and scalable infrastructure has led to new use cases that were previously considered strictly experimental. For example, improvements in speech recognition have recently led to a variety of digital assistants (MS Cortana, Amazon Alexa, Apple Siri, Google Now). Another example is new computer vision technology that has enabled automatic photo labeling.

Creating new applications that use machine learning is difficult work, so it is understandable that when engineers and scientists make a technological breakthrough and have an effective product, they rush to market without carefully considering possible unintended consequences. Product managers, on the other hand, more often than not lack the engineering and science background needed to provide the necessary guidance and leadership.

I describe below three recent examples.

  • In May 2015 Flickr released an automatic image tagging capability that mistakenly labeled a black man as an ape, Auschwitz as sports, and Dachau as a jungle gym. After the story broke, Flickr apologized and removed some of the offensive tags. Obviously, the engineers, scientists, and product managers did not foresee these mistakes or the anger that would follow. Instead of celebrating a technological win, Flickr suffered a PR blow.
  • Soon afterwards, Google released a photo labeling tool similar to Flickr’s, which made similar mistakes: black men were tagged as gorillas. Google removed the offensive tags, and Google’s Yonatan Zunger admitted the mistake, saying “Lots of work being done, and lots still to be done. But we’re very much on it”. Amazingly, Google repeated Flickr’s error of not anticipating the impact such mistakes would have on its users and on the public. The result, again, was offending a wide segment of the population rather than scoring a PR win for a technological “success”.
  • A recent Carnegie Mellon University study showed that Google displayed ads in a way that discriminated based on the gender of the user. The study created multiple fake personas, some male and some female, with identical browsing histories. A third-party site then showed Google ads for senior executive positions six times more often to the fake men than to the fake women. The ad matching algorithm may simply have picked up a signal in the data, but the end result is that it discriminated based on gender; in other words, the algorithm picked up on the gender pay gap and helped to perpetuate it. There are likely millions of people today who are affected by such data-based discrimination, but it is very hard to prove without creating fake personas with controlled browsing histories, as the CMU study did (a minimal sketch of this kind of audit follows this list).
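
To make the audit idea concrete, here is a minimal sketch of the kind of statistical check such a study relies on: count how often a given ad is shown to each persona group and test whether the difference in rates could plausibly be chance. The impression counts below are hypothetical, not the study’s actual numbers, and a real audit controls the browsing histories far more carefully than this sketch suggests.

```python
import math

def two_proportion_z_test(shown_a, total_a, shown_b, total_b):
    """Two-sided two-proportion z-test: were ads shown to the two
    persona groups at the same rate?"""
    p_a, p_b = shown_a / total_a, shown_b / total_b
    p_pool = (shown_a + shown_b) / (total_a + total_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / se
    # Normal-approximation p-value: 2 * (1 - Phi(|z|)).
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical impression counts: how often the executive-job ad was shown
# to the male personas vs. the female personas.
z, p = two_proportion_z_test(shown_a=1800, total_a=10000, shown_b=300, total_b=10000)
print(f"z = {z:.1f}, p = {p:.2g}")
```

A large z statistic (and correspondingly tiny p-value) says the difference in ad exposure is systematic rather than noise; it does not, by itself, say where in the ad pipeline the bias enters.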

There are more examples, but the three cases above are sufficient to show a trend: machine learning technology is advancing fast, and neither the engineers and scientists nor the product managers have been able to predict negative unintended consequences and, at least in some cases, slow down new launches.

Some may argue that this is not such a big deal: the algorithms pick up signals in the data, and when the team discovers a “glitch” it fixes it. One problem with this view is that there may be many undetected cases with a large impact on society. The Google ads case is one example, where systematic discrimination was perpetrated against a very large number of Google users for potentially a long time (and perhaps still is). A second problem is that the pace of innovation is high, and it is hard to predict what future ML applications will do and what the possible unintended consequences might be. With self-driving cars, for example, this could be a matter of life or death.

This brings us to the question of what we can do to decrease the chance of such blunders in the future. Completely stifling machine learning innovation is not a desirable solution. There seem to be two possibilities: put the burden of anticipating unintended consequences on the engineers and scientists who develop the applications, or on the product managers who provide guidance and direction.

There are two issues at hand: (a) estimating unintended consequences, and (b) assigning a value or negative impact to them. Engineers and scientists are better at (a), while product managers (and, if needed, company executives) are better at (b). But it probably makes sense for these professionals to refine their answers to (a) and (b) by working together: engineers and scientists need to broaden their perspective and think beyond technology to the impact on the company and on society, and product managers need a better technical understanding of machine learning.
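
As a rough illustration of how (a) and (b) might be combined in practice, here is a minimal sketch of a pre-launch review with entirely made-up categories, weights, and threshold: engineers and scientists estimate how likely each failure mode is, product managers assign how severe it would be, and anything scoring above the threshold gets escalated.

```python
# Made-up scales: likelihood is estimated by engineers/scientists (step a),
# severity by product managers and, if needed, executives (step b).
LIKELIHOOD = {"rare": 1, "possible": 2, "likely": 3}
SEVERITY = {"minor": 1, "serious": 3, "severe": 9}

def review(failure_modes, escalation_threshold=9):
    """Score each anticipated failure mode and return the ones that
    should be escalated before launch."""
    flagged = []
    for name, likelihood, severity in failure_modes:
        score = LIKELIHOOD[likelihood] * SEVERITY[severity]
        if score >= escalation_threshold:
            flagged.append((name, score))
    return sorted(flagged, key=lambda item: -item[1])

# Hypothetical review for an automatic photo-tagging launch.
print(review([
    ("applies offensive labels to people", "possible", "severe"),
    ("mislabels landmarks or objects",     "likely",   "minor"),
]))
```

The point is not the specific numbers but that the likelihood estimate and the impact estimate come from different people and are combined before launch rather than after a PR blow.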

Based on the examples above and others, it is pretty clear that every novel machine learning application should undergo a careful review that estimates the unintended consequences and their corresponding negative impact, escalating to company executives for resolution if needed.