Should You Outsource Important Life Decisions to Algorithms?

photograph of automated fortune teller

When you make an important decision, where do you turn for advice? If you’re like most people, you probably talk to a friend, loved one, or trusted member of your community. Or maybe you want a broader range of possible feedback, so you pose the question to social media (or even the rambunctious horde of Reddit). Or maybe you don’t turn outwards, but instead rely on your own reasoning and instincts. Really important decisions may require that you turn to more than one source, and maybe more than once.

But maybe you’ve been doing it wrong. This is the thesis of the book Don’t Trust Your Gut: Using Data to Get What You Really Want in Life by Seth Stephens-Davidowitz.

He summarizes the main themes in a recent article: the actual best way to make big decisions when it comes to your happiness is to appeal to the numbers.

Specifically, big data: the collected information about the behavior and self-reports of thousands of individuals just like you, analyzed to tell you who to marry, where to live, and how many utils of happiness different acts are meant to induce. As Stephens-Davidowitz states in the opening line of the book: “You can make better life decisions. Big Data can help you.”

Can it?

There are, no doubt, plenty of instances in which looking to the numbers for a better approximation of objectivity can help us make better practical decisions. The modern classic example that Stephens-Davidowitz appeals to is Moneyball, which documents how analytics shifted evaluations of baseball players from gut instinct to data. And maybe one could Moneyball one’s own life, in certain ways: if big data can give you a better chance of making the best kinds of personal decisions, then why not try?

If that all seems too easy, it might be because it is. For instance, Stephens-Davidowitz relies heavily on data from the Mappiness project, a study that pinged app users at random intervals to ask them what they were doing at that moment and how happy they felt doing it.

One activity that ranked fairly low on the list was reading a book, scoring just above sleeping but well below gambling. This is not, I take it, an argument that one ought to read less, sleep even less, and gamble much more.

Partly because there’s more to life than momentary feelings of happiness, and partly because it just seems like terrible advice. It is hard to see exactly how one could base important decisions on this kind of data.

Perhaps, though, the problem lies in the imperfections of our current system of measuring happiness, or any of the numerous problems of algorithmic bias. Maybe if we had better data, or more of it, then we’d be able to generate a better advice-giving algorithm. The problem would then lie not in the concept of basing important decisions on data-backed algorithmic advice, but in its current execution. Again, from Stephens-Davidowitz:

These are the early days of the data revolution in personal decision-making. I am not claiming that we can completely outsource our lifestyle choices to algorithms, though we might get to that point in the future.

So let’s imagine a point in the future where these kinds of algorithms have improved to a point where they will not produce recommendations for all-night gambling. Even then, though, reliance on an impersonal algorithm for personal decisions faces familiar problems, ones that parallel some raised in the history of ethics.

Consider utilitarianism, a moral system that says that one ought to act in ways that maximize the good, whatever we think qualifies as good (for instance, one version holds that the sole or primary good is happiness, so one should act in ways that maximize happiness and/or minimize pain). The view comes in many forms and has remained a popular choice among moral systems. One of its major benefits is that it provides a determinate and straightforward way (at least in principle) of determining which actions one morally ought to perform.

One prominent objection to utilitarianism, however, is that it is deeply impersonal: when it comes to determining which actions are morally required, people are inconsequential, since what’s important is just the overall increase in utility.

That such a theory warrants a kind of robotic slavishness towards calculation produces other unintuitive results, namely that when faced with moral problems one is perhaps better served by a calculator than actual regard for the humanity of those involved.

Philosopher Bernard Williams thus argued that these kinds of moral systems appeal to “one thought too many.” For example, if you were in a situation where you need to decide which of two people to rescue – your spouse or a stranger – one would hope that your motivation for saving your spouse was because it was your spouse, not because it was your spouse and because the utility calculations worked out in the favor of that action. Moral systems like utilitarianism, says Williams, fail to capture what really motivates moral actions.

That’s an unnuanced portrayal of a complex debate, but we can generate parallel concerns for the view that we should outsource personal decision-making to algorithms.

Algorithms using aggregate happiness data don’t care about your choices in the way that, say, a friend, family member, or even your own gut instinct does.

But when making personal decisions we should, one might think, seek out advice from sources that are legitimately concerned about what we find important and meaningful.

To say that one should adhere to such algorithms also seems to run into a version of the “one thought too many” problem. Consider someone who is trying to make an important life decision, say about who they should be in a relationship with, how they should raise a child, what kind of career to pursue, etc. There are lots of different kinds of factors one could appeal to when making these decisions. But even if a personal-decision-making algorithm said your best choice was to, say, date the person who made you laugh and liked you for you, your partner would certainly hope that you had made your decision based on factors that didn’t have to do with algorithms.

This is not to say that one cannot look to data collected about other people’s decisions and habits to try to better inform one’s own. But even if these algorithms were much better than they are now, a basic problem would remain with outsourcing personal decisions to algorithms, one that stems from a disconnect between meaningful life decisions and impersonal aggregates of data.

The Ethics of AI Behavior Manipulation

photograph of server room

Recently, news came from California that police were playing loud, copyrighted music when responding to criminal activity. While officers were investigating a stolen vehicle report, video was taken of them blasting Disney songs, including songs from the movie Toy Story. The reason the police were doing this was to make it easier to take down footage of their activities. If the footage contains copyrighted music, then a streaming service like YouTube will flag it and remove it, or so the reasoning goes.

A case like this presents several ethical problems, but in particular it highlights an issue of how AI can change the way that people behave.

The police were taking advantage of what they knew about the algorithm to manipulate events in their favor. This raises obvious questions: Does the way AI affects our behavior present unique ethical concerns? Should we be worried about how our behavior is adapting to suit an algorithm? When is it wrong to use one’s understanding of an algorithm as leverage for one’s own benefit? And, if there are ethical concerns about algorithms having this effect on our behavior, should they be designed in ways that encourage us to act ethically?

It is already well known that algorithms can affect your behavior by creating addictive impulses. Not long ago, I noted how the attention economy incentivizes companies to make their recommendation algorithms as addictive as possible, but there are other ways in which AI is altering our behavior. Plastic surgeons, for example, have noted a rise in what is being called “Snapchat dysmorphia,” in which patients desperately want to look like their Snapchat filter. The rise of deepfakes is also encouraging manipulation and deception, making it more difficult to tell reality apart from fiction. Recently, philosophers John Symons and Ramón Alvarado have even argued that such technologies undermine our capacity as knowers and diminish our epistemic standing.

Algorithms can also manipulate people’s behavior by creating measurable proxies for otherwise immeasurable concepts. Once the proxy is known, people begin to strategically manipulate the algorithm to their advantage. It’s like knowing in advance what a test will include and then simply teaching to the test. YouTubers chase whatever feature, function, length, or title they believe the algorithm will pick up and turn their video into a viral hit. It’s been reported that music artists like Halsey are frustrated by record labels that want a “fake viral moment on TikTok” before they will release a song.

This is problematic not only because viral TikTok success may be a poor proxy for musical success, but also because the proxies in the video that the algorithm is looking for also may have nothing to do with musical success.

This looks like a clear example of someone adapting their behavior to suit an algorithm for bad reasons. On top of that, the lack of transparency creates a market for those who know more about the algorithm and can manipulate it to take advantage of those that do not.

Should greater attention be paid to how algorithms generated by AI affect the way we behave? Some may argue that these kinds of cases are nothing new. The rise of the internet and new technologies may have changed the means of promotion, but trying anything to drum up publicity is something artists and labels have always done. Arguments about airbrushing and body image also predate the debate about deepfakes. However, if there is one aspect of this issue that appears unique, it is the scale at which algorithms can operate – a scale which dramatically affects their ability to alter the behavior of great swaths of people. As philosopher Thomas Christiano notes (and many others have echoed), “the distinctive character of algorithmic communications is the sheer scale of the data.”

If this is true, and one of the most distinctive aspects of AI’s ability to change our behavior is the scale at which it is capable of operating, do we have an obligation to design algorithms so as to make people act more ethically?

For example, in the book The Ethical Algorithm, the authors present the case of an app that gives directions. When an algorithm is considering which directions to give you, it could try to ensure that your directions are the most efficient for you. However, doing the same for everyone could lead to a great deal of congestion on some roads while other roads are under-used, making for an inefficient use of infrastructure. Alternatively, the algorithm could be designed to coordinate traffic, making for a more efficient overall solution, but at the cost of potentially giving you personally less efficient directions. Should an app cater to your self-interest or the city’s overall best interest?
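The tension between personally efficient and collectively efficient directions can be made concrete with a toy calculation. What follows is only a minimal sketch under invented assumptions, not the model used in The Ethical Algorithm or by any real navigation app: two hypothetical roads, a highway that always takes 60 minutes and a side road whose travel time grows in proportion to the share of drivers using it. The function names and all numbers are illustrative.

```python
# Toy two-road model (all names and numbers are hypothetical illustrations).

HIGHWAY_MINUTES = 60.0  # assumed fixed travel time, regardless of traffic


def side_road_minutes(share_on_side_road: float) -> float:
    """Assumed congestion rule: empty side road takes 0 min, full one takes 60."""
    return 60.0 * share_on_side_road


def average_minutes(share_on_side_road: float) -> float:
    """Average travel time over all drivers for a given split between roads."""
    on_highway = 1.0 - share_on_side_road
    return (on_highway * HIGHWAY_MINUTES
            + share_on_side_road * side_road_minutes(share_on_side_road))


# "Selfish" directions: the side road is never slower than the highway,
# so an app optimizing for each driver alone sends everyone down it.
selfish_share = 1.0

# "Coordinated" directions: search splits in 1% steps for the one that
# minimizes the average travel time across all drivers.
coordinated_share = min(range(101), key=lambda pct: average_minutes(pct / 100)) / 100

print("selfish average:", average_minutes(selfish_share))
print("coordinated share on side road:", coordinated_share)
print("coordinated average:", average_minutes(coordinated_share))
```

In this sketch the coordinated split lowers the average travel time, but only by directing some drivers onto the highway even though the side road would be faster for them individually, which is exactly the trade-off at issue.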

These issues have already led to real-world changes in behavior as people attempt to cheat the algorithm to their benefit. In 2015, there were reports of people reporting false traffic accidents or traffic jams to the app Waze in order to deliberately re-route traffic elsewhere. Cases like this highlight the ethical issues involved. An algorithm can systematically change behavior, and, as in the case of easing congestion, it can attempt to achieve better overall outcomes for a group without everyone having to deliberately coordinate. However, anyone who becomes aware of the system of rules and how they operate will have the opportunity to try to leverage those rules to their advantage, just like the YouTube algorithm expert who knows how to make your next video go viral.

This in turn raises issues about transparency and trust. The fact that it is known that algorithms can be biased and discriminatory weakens trust that people may have in an algorithm. To resolve this, the urge is to make algorithms more transparent. If the algorithm is transparent, then everyone can understand how it works, what it is looking for, and why certain things get recommended. It also prevents those who would otherwise understand or reverse engineer the algorithm from leveraging insider knowledge for their own benefit. However, as Andrew Burt of the Harvard Business Review notes, this introduces a paradox.

The more transparent you make the algorithm, the greater the chances that it can be manipulated and the larger the security risks that you incur.

This trade-off between security, accountability, and manipulation is only going to become more important the more that algorithms are used and the more that they begin to affect people’s behaviors. Some outline of the specific purposes and intentions of an algorithm as it pertains to its potential large-scale effect on human behavior should be a matter of record if there is going to be public trust. Particularly when we look to cases like climate change or even the pandemic, we see the benefit of coordinated action, but there is clearly a growing need to address whether algorithms should be designed to support these collective efforts. There also needs to be greater focus on how proxies are being selected when measuring something and whether those approximations continue to make sense when it’s known that there are deliberate efforts to manipulate them and turn them to an individual’s advantage.

The Insufficiency of Black Box AI

image of black box spotlighted and on pedestal

Google and Imperial College London have collaborated in a trial of an AI system for diagnosing breast cancer. Their most recent results have shown that the AI system can outperform the uncorroborated diagnosis of a single trained doctor and perform on par with pairs of trained diagnosticians. The AI system was a deep learning model, meaning that it works by discovering patterns on its own by being trained on a huge database. In this case the database was thousands of mammogram images. Similar systems are used in the context of law enforcement and the justice system. In these cases the learning database is past police records. Despite the promise of this kind of system, there is a problem: there is not a readily available explanation of what pattern the systems are relying on to reach their conclusions. That is, the AI doesn’t provide reasons for its conclusions and so the experts relying on these systems can’t either.

AI systems that do not provide reasons in support of their conclusions are known as “black box” AI. In contrast to these are so-called “explainable AI” systems. This kind of AI system is under development and likely to be rapidly adopted within the healthcare field. Why is this so? Imagine visiting the doctor and receiving a cancer diagnosis. When you ask the doctor, “Why do you think I have cancer?” they respond only with a blank stare or reply, “I just know.” Would you find this satisfying or reassuring? Probably not, because you have been provided neither reason nor explanation. A diagnosis is not just a conclusion about a patient’s health but also the facts that lead up to that conclusion. There are certain reasons that the doctor might give you that you would reject as reasons that can support a cancer diagnosis.

For example, an AI system designed at Stanford University and being trained to help diagnose tuberculosis used non-medical evidence to generate its conclusions. Rather than just taking into account the images of patients’ lungs, the system used information about the type of X-ray scanning device when generating diagnoses. But why is this a problem? If the information about what type of X-ray machine was used has a strong correlation with whether a patient has tuberculosis, shouldn’t that information be put to use? That is, don’t doctors and patients want to maximize the number of correct diagnoses they make? Imagine your doctor telling you, “I am diagnosing you with tuberculosis because I scanned you with Machine X, and people who are scanned by Machine X are more likely to have tuberculosis.” You would not likely find this a satisfying reason for a diagnosis. So if an AI is making diagnoses based on such facts, this is a cause for concern.

A similar problem is discussed in philosophy of law when considering whether it is acceptable to convict people on the basis of statistical evidence. The thought experiment used to probe this problem involves a prison yard riot. There are 100 prisoners in the yard, and 99 of them riot by attacking the guard. One of the prisoners did not attack the guard, and was not involved in planning the riot. However, there is no way of knowing specifically of each prisoner whether they did, or did not, participate in the riot. All that is known is that 99 of the 100 prisoners participated. The question is whether it is acceptable to convict each prisoner based only on the fact that it is 99% likely that they participated in the riot.

Many who have addressed this problem answer in the negative—it is not appropriate to convict an inmate merely on the basis of statistical evidence. (However, David Papineau has recently argued that it is appropriate to convict on the basis of such strong statistical evidence.) One way to understand why it may be inappropriate to convict on the basis of statistical evidence alone, no matter how strong, is to consider the difference between circumstantial and direct evidence. Direct evidence is any evidence which immediately shows that someone committed a crime. For example, if you see Robert punch Willem in the face you have direct evidence that Robert committed battery (i.e., causing harm through touch that was not consented to). If you had instead walked into the room to see Willem holding his face in pain and Robert angrily rubbing his knuckles, you would only have circumstantial evidence that Robert committed battery. You must infer that battery occurred from what you actually witnessed.

Here’s the same point put another way. Given that you saw Robert punch Willem in the face, there is a 100% chance that Robert battered Willem—hence it is direct evidence. On the other hand, given that you saw Willem holding his face in pain and Robert angrily rubbing his knuckles, there is a 0% – 99% chance that Robert battered Willem. The same applies to any prisoner in the yard during the riot: given that they were in the yard during the riot, there is at best a 99% chance that the prisoner attacked the guard. The fact that a prisoner was in the yard at the time of the riot is a single piece of circumstantial evidence in favor of the conclusion that that prisoner attacked the guard. A single piece of circumstantial evidence is not usually taken to be sufficient to convict someone—further corroborating evidence is required.

The same point could be made about diagnoses. Even if 99% of people examined by Machine X have tuberculosis, simply being examined by Machine X is not a sufficient reason to conclude that someone has tuberculosis. No reasonable doctor would make a diagnosis on such a flimsy basis, and no reasonable court would convict someone on the flimsy basis described in the prison yard riot case above. Black box AI algorithms might not be basing diagnoses or decisions about law enforcement on such a flimsy basis. But because this sort of AI system doesn’t provide its reasons, there is no way to tell what makes its accurate conclusions correct, or its inaccurate conclusions incorrect. Any domain like law or medicine where the reasons that underlie a conclusion are crucially important is a domain in which explainable AI is a necessity, and in which black box AI must not be used.

The Digital Humanities: Overhyped or Misunderstood?

An image of the Yale Beinecke Rare Books Library

A recent series of articles appearing in The Chronicle of Higher Education has reopened the discussion about the nature of the digital humanities. Some scholars argue the digital humanities are a boon to humanistic inquiry and some argue they’re a detriment, but all sides seem to agree it’s worth understanding just what the scope and ambition of the digital humanities is and ought to be.


Judged by Algorithms

The Chinese government announced in October that it is setting up a “social credit” system, designed to determine trustworthiness. Every citizen will be put into a database which uses fiscal and government information – including online purchases – to determine their trustworthiness ranking. Information in the ranking includes everything from traffic tickets to academic degrees to whether women have taken birth control. Citizens currently treat it like a game, comparing their scores to others in attempts to get the highest score out of their social circle. Critics call the move “dystopian,” but this is only the latest algorithm designed to judge people without face-to-face interaction.
