Pretend you are driving your car on an interstate highway. Now envision yourself getting the car up to 85 mph, then closing your eyes (actually close your eyes now) and keeping them closed for a whole minute. You will want to open your eyes after a few seconds because of the fear of not knowing what lies ahead. Is there a curve? A semi-trailer in front of you? Or is the lane ahead blocked for construction? This thought experiment illustrates, in part, that a major ingredient of fear is uncertainty.
Humans want to know the future, even if that future will happen within the next few minutes.
Evolution has provided us with the need to predict so that we have a better chance of survival. The more we know, the better we can anticipate, and thus survive. The same need is at work when we want to know who is ahead in political polls prior to an election. Fears about the economy, possible international conflict, a rise in taxes, and the continuation of Social Security and health care all play a role in deciding whom we will support on election day. There may be a fear that the party that best represents your values will not be elected to lead the nation, and that fear serves as a motivation to vote your ticket.
The 2020 presidential election is over, but the dust has only just settled. State and national political polling for the presidential election began in earnest as soon as the Democratic nominee was apparent, and continued right up to the night before election day. At the same time many voters were asking the question, “Can we trust the polls this time?” Undoubtedly many were recalling the debacle of the 2016 election polling that predicted a relatively easy win for Hillary Clinton, but when they awoke the next morning, Donald Trump had been elected president. The Pew Research Center suggested that the question should rather be, “Which poll should we trust?” However, I suggest that another question should be considered as well: “Is it ethical to have public presidential election polling at all?”
Many ethical questions arise when we consider this type of public polling: (1) Does the polling sample reflect an accurate picture of the electorate? If it does, the results can be trusted as truthful. If it does not, the polling is skewed and the results unreliable. (2) If the polling is skewed, how will voter behavior be affected? Could citizens be casting their votes based on false information? Research tells us that voters who believe their candidate will win are less likely to vote. How does this affect one’s autonomous choices? (3) If the polling predictions are not accurate, what are the psychological effects on the electorate and candidates that may lead to negative outcomes when the election is over? The benefits and costs of making polling information available to the electorate must be weighed in a high-stakes activity like voting for the president of the United States. It’s imperative that any published polling be scientifically based.
The gold standard of scientific research sampling has long been random sampling, in which everyone has an equal chance of being chosen to participate. When all have an equal probability of being chosen, the sample will generally reflect the population from which it was drawn. The problem then becomes one of reaching enough people to improve the odds that the sample is representative of the population. Technological advances have allowed us to do this, but in so doing they have changed the face of public polling in ways that may compromise its outcomes.
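To make the idea concrete, here is a minimal sketch in Python of what "equal chance of being chosen" means in practice; the voter list and sample size are purely hypothetical, not any pollster's actual frame.

```python
import random

# Purely illustrative: a hypothetical frame of one million registered voters.
population = [f"voter_{i}" for i in range(1_000_000)]

# Simple random sampling: random.sample gives every voter the same
# probability of being selected, which is what lets the sample
# (on average) mirror the population it was drawn from.
sample = random.sample(population, k=1_000)

print(len(sample), sample[:3])
```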
Today, the internet serves as the tool of choice for pollsters. However, not everyone has internet access, so some selection bias will exist in any sample gathered this way. Furthermore, the average American has 1.75 email accounts, which increases the likelihood that the same person receives more than one request for information; duplicate requests reach fewer unique respondents, shrinking the effective sample and increasing the chance of mismeasurement. Some sampling error is always present, and pollsters attempt to keep it small, typically in the range of 2.5-3.5%; this figure is reported as the margin of error (MOE). For example, if a poll has an MOE of 3%, and candidate A is at 48% while candidate B sits at 46%, candidate A does not have a two-point lead, as some media personnel report; statistically, it's a dead heat. But if the MOE is not pointed out at the time of reporting, the electorate receives false information. It is even more important to look at this effect in swing states than in the national picture, because the Electoral College, not the popular vote, actually determines who the next president will be.
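A back-of-the-envelope sketch of the arithmetic behind that example follows. The sample size of 1,000 is an assumption chosen only because it yields roughly a 3-point margin, and the formula is the standard approximation for a single polled proportion, not any particular pollster's method.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a single polled proportion."""
    return z * math.sqrt(p * (1 - p) / n)

# A sample of about 1,000 respondents (an assumed, typical poll size)
# gives roughly the 3-point margin described above.
moe = margin_of_error(0.5, 1_000)   # p = 0.5 is the worst case
print(f"MOE ~ {moe:.1%}")           # about 3.1%

# Candidate A at 48%, candidate B at 46%: the 2-point gap is smaller than
# the MOE, so by the usual rule of thumb the race is a statistical dead heat.
gap = 0.48 - 0.46
print("dead heat" if gap < moe else "clear lead")
```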
To alleviate the selection bias problem nationally and in swing states, many polling companies use opt-in, or non-probability, panels from which to collect data. This method uses ads to attract people to participate in the polling. Since these ads may appeal only to, or even be seen only by, certain demographics, the probability of establishing a representative sample diminishes. Some who join panels by answering these ads do so solely for the modest monetary reward offered for participation. One way pollsters try to address these drawbacks is to build panels of up to 4,000 people chosen according to stratification criteria, so that the sample looks as much like the population as possible given the constraints mentioned here. For example, if the population is 76% Caucasian and 14% African-American, approximately 76% of the non-probability panel would be Caucasian and approximately 14% would be African-American.
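The quota arithmetic is simple enough to sketch. The shares below come from the example just given, with an "Other" category added only so the figures sum to 100%, and the panel size is the 4,000-person upper bound mentioned here; none of this reflects any specific company's procedure.

```python
# Hypothetical population shares for a single demographic (race).
population_shares = {
    "Caucasian": 0.76,
    "African-American": 0.14,
    "Other": 0.10,
}

panel_size = 4_000  # upper bound mentioned in the text

# Quota for each group so the opt-in panel mirrors the population.
quotas = {group: round(share * panel_size)
          for group, share in population_shares.items()}
print(quotas)  # {'Caucasian': 3040, 'African-American': 560, 'Other': 400}
```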
Similar stratification methods would also be used for other demographics; however, how many demographics are accounted for when these decisions are made? The Gallup and New York Times/Siena College polls account for 8 and 10 demographics (weights) respectively, while the Pew Research Center accounts for 12. Statistically, the more weights that are applied, the more accurately the sample represents the population from which it was drawn. Panels formed in this way remain in place for a certain amount of time and are polled repeatedly during the political campaign.
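A simplified sketch of how such weighting works, assuming made-up sample shares: real pollsters apply many weights at once (often through a procedure called raking), but the basic idea is that respondents from under-represented groups count for more in the published numbers.

```python
# Hypothetical shares: the population versus a raw opt-in sample that
# over-represents one group and under-represents the others.
population_shares = {"Caucasian": 0.76, "African-American": 0.14, "Other": 0.10}
sample_shares     = {"Caucasian": 0.82, "African-American": 0.10, "Other": 0.08}

# Each respondent's weight is the ratio of population share to sample share,
# so under-represented groups count for more in the weighted results.
weights = {group: population_shares[group] / sample_shares[group]
           for group in population_shares}
print({group: round(w, 2) for group, w in weights.items()})
# {'Caucasian': 0.93, 'African-American': 1.4, 'Other': 1.25}
```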
Apart from these worries, there are other potential obstacles to consider. Take positive herding, for example: the term social psychologists use for the phenomenon in which positive ratings of an idea or person generate more positive ratings. So, if candidate A continues to amass higher polling numbers, the chance that those polled later will align themselves with candidate A increases. Voters are even more likely to exhibit this behavior if they have not made a prior commitment to any response. This affects the independent voter more than others, yet that is precisely the voter both parties typically must win over in order to win an election. The problem is compounded by the fact that repeated public polling during a presidential campaign may amplify this social phenomenon and skew the polling results, leading unwary independent voters to base their choice of candidate on them.
Are the respondents on these panels answering the same questions as respondents on other panels? Questions posed by different polling organizations are not standardized; that is, not every participant in every poll is answering the same questions. When the data are presented publicly and polls are compared, we may be comparing apples with oranges. If candidate A polls at 48% in poll A and 43% in poll B, we must ask on what issue candidate A is being polled. If candidate A is being polled on likeability in both polls, it must be asked: likeability based on what, candidate A’s stand on the economy, or on immigration, or …? The information becomes amorphous.
How can the potential voter know what the data are measuring without understanding the make-up of the sample, the questions asked, and the polling method used to obtain them? Getting the complete picture is an onerous task even for a statistician, and it certainly cannot be expected of most of the population. We rely on the media, and on the polling companies themselves, to provide that information. While polling companies do publish this information on their websites, most voters do not have the time (or the inclination) to peruse the data, even when they know such websites exist. And even if they made that effort, would most be able to understand the information shared?
The 2020 election has given us a real look at some psychological effects of polling on the American population. Former Vice President Biden was reportedly set for an 8-point margin of victory nationally, according to the New York Times/Siena College final poll the night before the election. As late as 2 a.m. on November 4th, Biden held a slight lead nationally, but it was a dead heat in the swing states, and those states were leaning toward Trump at the time. In the final analysis Biden won those states after the mail-in ballots were counted. National polling did not publish survey results of mail-in voters versus election-day voters, yet the different modes of voting shaped how the outcome unfolded. Psychologically, Trump supporters were primed to believe that ballots were “found and dumped” the day following the election.
It took 11 days for the results to be “finalized,” while the nation was in turmoil. Trump’s campaign used this time to begin questioning the results, and his supporters believed that the election had been stolen; after all, Biden won by only 3.4 points nationally once the mail-in votes were tallied, which further added fuel to the fire. Recounts were called for in swing states and counties where the margin of victory ranged from less than one percentage point to just over one percentage point, but the margins remained essentially unchanged after the recounts were completed.
So what does all this say about the ethics of public polling? As we have seen, the numbers that get reported rest on a number of assumptions, and any model is only as good as the assumptions on which it is based. Are the candidates’ positions (on the economy, immigration, etc.) measured the same way in each poll? Was the positive herding phenomenon a factor in responses? Were media personnel diligent in pointing out MOEs as they reported polling results? In all these cases, one’s voting autonomy can be affected because the data’s veracity is in question.
But the problem isn’t necessarily with our ability to read the data so much as with our choice to circulate that polling data in the first place. Uncertainty produces fear in humans; we often alleviate this fear through prediction, and polls provide that predictive factor. That information, however, provides only a perception of reality.