
The Insufficiency of Black Box AI


Google and Imperial College London have collaborated in a trial of an AI system for diagnosing breast cancer. Their most recent results have shown that the AI system can outperform the uncorroborated diagnosis of a single trained doctor and perform on par with pairs of trained diagnosticians. The AI system was a deep learning model, meaning that it works by discovering patterns on its own by being trained on a huge database. In this case the database was thousands of mammogram images. Similar systems are used in the context of law enforcement and the justice system. In these cases the learning database is past police records. Despite the promise of this kind of system, there is a problem: there is not a readily available explanation of what pattern the systems are relying on to reach their conclusions. That is, the AI doesn’t provide reasons for its conclusions and so the experts relying on these systems can’t either.

AI systems that do not provide reasons in support of their conclusions are known as “black box” AI. In contrast to these is so-called “explainable AI.” This kind of AI system is under development and likely to be rapidly adopted within the healthcare field. Why is this so? Imagine visiting the doctor and receiving a cancer diagnosis. When you ask the doctor, “Why do you think I have cancer?” they respond only with a blank stare or say, “I just know.” Would you find this satisfying or reassuring? Probably not, because you have been provided neither reason nor explanation. A diagnosis is not just a conclusion about a patient’s health but also the facts that lead up to that conclusion. And there are certain reasons the doctor might give that you would reject as incapable of supporting a cancer diagnosis.

For example, an AI system designed at Stanford University and trained to help diagnose tuberculosis used non-medical evidence to generate its conclusions. Rather than just taking into account the images of patients’ lungs, the system used information about the type of X-ray scanning device when generating diagnoses. But why is this a problem? If the type of X-ray machine used is strongly correlated with whether a patient has tuberculosis, shouldn’t that information be put to use? That is, don’t doctors and patients want to maximize the number of correct diagnoses? Imagine your doctor telling you, “I am diagnosing you with tuberculosis because I scanned you with Machine X, and people who are scanned by Machine X are more likely to have tuberculosis.” You would not likely find this a satisfying reason for a diagnosis. So if an AI is making diagnoses based on such facts, this is a cause for concern.
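To make the worry concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption: the scanner labels ("portable" and "fixed"), the patient counts, and the function names are invented for this example and are not taken from the Stanford system. It shows how a rule that looks only at the scanner can score well on a dataset while saying nothing about the patient.

```python
from collections import Counter

# Hypothetical records of (scanner_type, has_tb). All counts are invented:
# the portable scanner is imagined to be used mostly on sicker patients.
records = (
    [("portable", True)] * 90 + [("portable", False)] * 10
    + [("fixed", True)] * 10 + [("fixed", False)] * 90
)

def shortcut_rule(scanner_type):
    """Predict TB from the scanner alone, ignoring the patient's lungs."""
    return scanner_type == "portable"

def accuracy(rule, data):
    """Fraction of records where the rule's prediction matches the label."""
    correct = sum(1 for scanner, has_tb in data if rule(scanner) == has_tb)
    return correct / len(data)

def tb_rate_by_scanner(data):
    """Share of TB-positive records for each scanner type."""
    totals, positives = Counter(), Counter()
    for scanner, has_tb in data:
        totals[scanner] += 1
        positives[scanner] += has_tb  # True counts as 1, False as 0
    return {s: positives[s] / totals[s] for s in totals}

rates = tb_rate_by_scanner(records)
shortcut_accuracy = accuracy(shortcut_rule, records)

# The same patient receives opposite verdicts depending only on the machine.
same_patient_verdicts = {s: shortcut_rule(s) for s in ("portable", "fixed")}

print(rates)
print(shortcut_accuracy)
print(same_patient_verdicts)
```

On these made-up numbers the shortcut rule is right nine times out of ten, yet its verdict for any given patient flips when nothing changes except the machine. High accuracy alone, in other words, does not show that a system is relying on medically relevant reasons.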

A similar problem is discussed in philosophy of law when considering whether it is acceptable to convict people on the basis of statistical evidence. The thought experiment used to probe this problem involves a prison yard riot. There are 100 prisoners in the yard, and 99 of them riot by attacking the guard. One of the prisoners did not attack the guard, and was not involved in planning the riot. However, there is no way of knowing, of any particular prisoner, whether they did or did not participate in the riot. All that is known is that 99 of the 100 prisoners participated. The question is whether it is acceptable to convict each prisoner based only on the fact that it is 99% likely that they participated in the riot.

Many who have addressed this problem answer in the negative—it is not appropriate to convict an inmate merely on the basis of statistical evidence. (However, David Papineau has recently argued that it is appropriate to convict on the basis of such strong statistical evidence.) One way to understand why it may be inappropriate to convict on the basis of statistical evidence alone, no matter how strong, is to consider the difference between circumstantial and direct evidence. Direct evidence is any evidence which immediately shows that someone committed a crime. For example, if you see Robert punch Willem in the face you have direct evidence that Robert committed battery (i.e., causing harm through touch that was not consented to). If you had instead walked into the room to see Willem holding his face in pain and Robert angrily rubbing his knuckles, you would only have circumstantial evidence that Robert committed battery. You must infer that battery occurred from what you actually witnessed.

Here’s the same point put another way. Given that you saw Robert punch Willem in the face, there is a 100% chance that Robert battered Willem—hence it is direct evidence. On the other hand, given that you saw Willem holding his face in pain and Robert angrily rubbing his knuckles, there is a 0% – 99% chance that Robert battered Willem. The same applies to any prisoner in the yard during the riot: given that they were in the yard during the riot, there is at best a 99% chance that the prisoner attacked the guard. The fact that a prisoner was in the yard at the time of the riot is a single piece of circumstantial evidence in favor of the conclusion that that prisoner attacked the guard. A single piece of circumstantial evidence is not usually taken to be sufficient to convict someone—further corroborating evidence is required.

The same point could be made about diagnoses. Even if 99% of people examined by Machine X have tuberculosis, simply being examined by Machine X is not a sufficient reason to conclude that someone has tuberculosis. No reasonable doctor would make a diagnosis on such a flimsy basis, and no reasonable court would convict someone on the similarly flimsy basis in the prison yard riot case above. Black box AI algorithms might not be basing diagnoses or decisions about law enforcement on such a flimsy basis. But because this sort of AI system doesn’t provide its reasons, there is no way to tell what makes its accurate conclusions correct, or its inaccurate conclusions incorrect. Any domain like law or medicine where the reasons that underlie a conclusion are crucially important is a domain in which explainable AI is a necessity, and in which black box AI must not be used.

Impeachment Hearings and Changing Your Mind


The news has been dominated recently by the impeachment hearings against Donald Trump, and as has been the case throughout Trump’s presidency, it seems that almost every day there’s a new piece of information that is presented by some outlets as a bombshell revelation, and by others as really no big deal. While the country at this point is mostly split on whether they think that Trump should be impeached, there is still a lot of evidence left to be uncovered in the ongoing hearings. Who knows, then, how Americans will feel once all the evidence has been presented.

Except that we perhaps already have a good idea of how Americans will feel even after all the evidence has been presented, since a recent poll reports that the majority of Americans say that they would not change their minds on their stance towards impeachment, regardless of what new evidence is uncovered. Most Americans, then, seem to be “locked in” to their views.

What should we make of this situation? Are Americans just being stubborn, or irrational? Can they help themselves?

There is one way in which these results are surprising, namely that the survey question asks whether one could imagine any evidence that would change one’s mind. Surely if, say, God came down and decreed that Trump should or should not be impeached then one should be willing to change one’s mind. So when people are considering the kind of evidence that could come out in the hearings, they are perhaps thinking that they will be presented with evidence of a similar kind to what they’ve seen already.

A lack of imagination aside, why would people say that they could not conceive of any evidence that could sway them? One explanation might be found with the way that people tend to interpret evidence presented by those who disagree with them. Let’s say, for example, that I am already very strongly committed to the belief that Trump ought to be impeached. Couldn’t those who are testifying in his defense present some evidence that would convince me otherwise? Perhaps not: if I think that Trump and those who defend him are untrustworthy and unscrupulous then I will interpret whatever they have to say as something that is meant to mislead me. So it really doesn’t matter what kind of evidence comes out, since short of divine intervention all of the evidence that comes out will be such that it supports my belief. And of course my opposition will think in the same way. So no wonder so many of us can’t imagine being swayed.

While this picture is something of an oversimplification, there’s reason to think that people do generally interpret evidence in this way. Writing at Politico, psychologist Peter Coleman describes what he refers to as “selective perception”:

Essentially, the stronger your views are on an issue like Trump’s impeachment, the more likely you are to attend more carefully to information that supports your views and to ignore or disregard information that contradicts them. Consuming more belief-consistent information will, in turn, increase your original support or disapproval for impeachment, which just fortifies your attitudes.

While Coleman recognizes that those who are most steadfast in their views are unlikely to change their minds over the course of the impeachment hearings, there is perhaps still hope for those who are not so locked-in. He describes a “threshold effect”, where people can change their minds suddenly, sometimes even coming to hold a belief that is equally strong but on the opposite side of an issue, once an amount of evidence they possess passes a certain threshold. What could happen, then, is that over the course of the impeachment procedures people may continue to hold their views until the accumulated evidence simply becomes too overwhelming, and they suddenly change their minds.

Whether this is something that will happen given the current state of affairs remains to be seen. What is still odd, though, is that while the kinds of psychological effects that Coleman discusses are ones that describe how we form our beliefs, we certainly don’t think that this is how we should form our beliefs. If these are processes that work in the background, ones that we are subject to but don’t have much control over, then it would be understandable and perhaps (in certain circumstances) even forgivable that we should generally be stubborn when it comes to our political beliefs. But the poll is not simply asking what one’s beliefs are, but what one could even conceivably see oneself believing. Even if it is difficult for us to change our minds about issues that we have such strong views about, surely we should at least aspire to be the kind of people who could conceive of being wrong.

One of the questions that many have asked in response to the poll results is whether the hearings will accomplish anything, given that people seem to have made up their minds already. Coleman’s cautious optimism perhaps gives us reason to think that minds could, in fact, be swayed. At the same time it is worth remembering that being open-minded does not mean that you are necessarily wrong, or that you will not be vindicated as having been right all along. At the end of the day, then, it is difficult not to be pessimistic about the possibility of progress in such a highly polarized climate.

Harvey Weinstein and Addressing Hollywood’s Unacceptable Reality


On October 5, The New York Times released a report detailing various instances of sexual assault perpetrated by Hollywood director and executive Harvey Weinstein against many of his female colleagues. The allegations span a period of 30 years, as Weinstein’s power in the film industry protected him from consequences. “Movies were his private leverage,” the report reads, as Weinstein often offered promotions and bonuses to his female colleagues in exchange for sexual acts, and silenced those who spoke out with payments that ranged between $80,000 and $150,000.


Moral Obligations and Tinfoil Hats: The Ethics of Conspiracy

On March 4th, 2017, Donald Trump claimed, without evidence, that Barack Obama wiretapped the phones at Trump Tower during the presidential election. This is not the first baseless claim that Trump has made about the former president. As the American population is well aware, Trump was one of the most vocal participants in the birther movement. Even after Obama made his birth certificate public, proving that he was born in Hawaii in 1961, Trump said, in an interview with ABC News, “Was it a birth certificate? You tell me. Some people say that was not his birth certificate. Maybe it was, maybe it wasn’t. I’m saying I don’t know. Nobody knows.”
