Search Engines and Data Voids

By Kenneth Boyd

27 Dec 2019

“Stressed young blonde woman at computer desk at home” by VitalikRadko (via depositphotos)

If you’re like me, going home over the holidays means dealing with a host of computer problems from well-meaning but not very tech-savvy family members. While I’m no expert myself, it is nevertheless jarring to see the family computer desktop covered in icons for long-abandoned programs, browser tabs that read “Hotmail” and “how do I log into my Hotmail” side-by-side, and the use of default programs like Edge (or, if the computer is ancient enough, Internet Explorer) and search engines like Bing.

And while it’s perhaps a bit of a pain to fix the same computer problems every year, and annoying to use programs that you’re not used to, there might be more substantial problems afoot. According to a recent study from Stanford’s Internet Observatory, Bing search results “contain an alarming amount of disinformation.” That default search engine that your parents never bothered changing, then, could actually be doing some harm.

While no search engine is perfect, the study suggests that Bing lists known disinformation sites in its top results much more frequently than Google does, including for searches on important issues like vaccine safety, where a search for “vaccines autism” returns “six anti-vax sites in its top 50 results.” It also presents results from known Russian propaganda sites much more frequently than Google, places student-essay writing sites in its top 50 results for some search terms, and is much more likely to “dredge up gratuitous white-supremacist content in response to unrelated queries.” In general, then, Bing will not necessarily present one only with disinformation (it will still return results from trustworthy sites most of the time), but it seems worthwhile to be extra vigilant when using the search engine.

But even if one commits to simply avoiding Bing (at least for the kinds of searches that are most likely to surface disinformation sites), problems can arise when Edge, which uses Bing as its default search engine, is set as the default browser, and when those who are not terribly tech-savvy don’t know how to use a different browser or aren’t aware of the alternatives. After all, a casual user has no particular reason to think that different search engines would return different results, and given that Microsoft is a household name, one might not be inclined to question the kinds of results its search engine provides.

How can we combat these problems? Certainly a good amount of responsibility falls on Microsoft itself to make more of an effort to keep disinformation sites out of its search results. And while we might not want to say that one should never use Bing (Google knows enough about me as it is), there is perhaps some general advice we could give to help make sure that we encounter as little disinformation as possible when searching.

For example, the Internet Observatory report posits that one reason there is so much more disinformation in search results from Bing than from Google is how the two engines deal with “data voids.” The idea is the following: for some search terms, you’re going to get tons of results because there’s tons of information out there, and it’s a lot easier to weed out possible disinformation sites from these kinds of results because so many well-established and trusted sites already exist. But there are also lots of search terms that have very few results, possibly because they are about idiosyncratic topics, or because the search terms are unusual, or just because the thing you’re looking for is brand new. It is these relative voids of data about a term that make results ripe for manipulation by sites looking to spread misinformation.

Michael Golebiewski and danah boyd write that there are five major types of data voids that can be most easily manipulated: breaking news, strategic new terms (e.g. when the term “crisis actor” was introduced by Sandy Hook conspiracy theorists), outdated terms, fragmented concepts (e.g. when the same thing is referred to by different terms, such as “undocumented” and “illegal aliens”), and problematic queries (e.g. when instead of searching for information about the “Holocaust” someone searches for “did the Holocaust happen?”). Since there tends to be comparatively little information about these topics online, those looking to spread disinformation can create sites that exploit these data voids.

Golebiewski and boyd provide an example: in November of 2017, in response to news reports of an active shooting, the term “Sutherland Springs, Texas” became a much more popular search than it had ever been before. However, since there was so little information online about Sutherland Springs prior to the event, it was more difficult for search engines to determine which of the new sites and posts should be sent to the top of the search results and which should be sent to the bottom of the pile. This is the kind of data void that can be exploited by those looking to spread disinformation, especially on search engines like Bing that seem to struggle to distinguish trustworthy sites from untrustworthy ones.

We’ve seen that there is clearly some responsibility on Bing itself to help stem the flow of disinformation, but we perhaps also need to be more vigilant about which sites we trust when searching for the kinds of terms Golebiewski and boyd describe. And, of course, we could try our best to convince those in our lives who are less computer-literate to change some of their browsing habits.
