“We rely so much on Google these days”, says Dr Frank Hopfgartner, Senior Lecturer at the Information School and Investigator on the ‘Do You See What I See’ project. “Google has a search engine market share of over 90% worldwide.”
Undertaken by Dr Hopfgartner with several fellow members of the Cyprus Center for Algorithmic Transparency (CyCAT) - which was profiled in the research magazine Inform II in 2019 (page 29) - this project aimed to discover the differences in the search results that Google provides to users in different parts of the world. Google states that their mission is to “organize the world’s information and make it universally accessible and useful”. This project asked: is that true? Does everyone everywhere have equal access to the same information? And if not, what impact might that have?
“CyCAT for me was very interesting because algorithmic transparency and bias is a very timely topic and one which is receiving a lot of attention”, says Dr Hopfgartner of his own involvement with the Center. “My prior work was based on user profiling and personalisation; I was looking at the methods and now CyCAT is looking at the societal consequences and what we can do about them.”
The COVID-19 pandemic, whilst challenging for all, did create a fortuitous environment for this kind of research. “We didn’t initially think about COVID as a subject for this study”, explains Dr Hopfgartner, “but then we realised that it was the one thing that was on everyone’s minds, which gave us a unique opportunity; it doesn’t happen too often that there’s a single topic that the whole world talks about, apart from maybe the World Cup or the Olympics.”
The project team wanted to compare what Google shows users of its search engine when they search for topics related to COVID-19 from different geographical locations, as well as from the same location but in different languages.
The first issue with gaining any insight from this kind of research is that without working directly with Google, it’s impossible to know how their algorithms work. For this reason, the search engine had to be treated as a “black box”: a system whose internal workings are hidden. The team could put search queries in and analyse the output, drawing conclusions from there.
The second issue was knowing what people actually search for, without access to Google’s log files. The solution lay in crowdsourcing. The team used a crowdsourcing platform to assign two tasks to 400 crowd workers - 100 each from the UK, Italy, Spain and Germany. Each worker’s first task was to imagine creating a photo diary of what happened during the pandemic, a history book for future generations. The second task was to create a similar photo diary, this time of the habits that developed during the pandemic. To complete both tasks, the crowd workers used Google’s image search function, with the researchers then collecting the search terms that were used.
The search terms were split into five categories which emerged as the most common themes: “stay at home”, “personal protection”, “healthcare”, “pandemic general” and “society impact”. The researchers then put these terms into the Google image search themselves and analysed the images that came back by running them through CLARIFAI, an AI tool designed to create tags based on what it ‘sees’ in images (for example, given an image of a beach at a holiday resort, CLARIFAI would return tags like “sand”, “water”, “people”, “outside” etc). Alongside these tags, the team also looked at the URL of each image that was returned in the search and ascertained where the servers that hosted these sites were based - essentially, from which country the images were being shown - to see how many were local to the user and how many were foreign.
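The second half of that analysis - working out how many results are hosted locally versus abroad - can be sketched in a few lines of Python. This is only an illustration of the idea, not the team’s actual code: the function name and the hard-coded host-to-country mapping are assumptions, standing in for the server geolocation the researchers actually performed.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical host-to-country lookup. The study geolocated the servers
# hosting each image; here we approximate that with a fixed mapping.
HOST_COUNTRY = {
    "example.co.uk": "UK",
    "ejemplo.es": "Spain",
    "esempio.it": "Italy",
}

def local_vs_foreign(image_urls, user_country):
    """Count how many image results are hosted in the user's own
    country ("local") versus elsewhere ("foreign")."""
    counts = Counter()
    for url in image_urls:
        host = urlparse(url).netloc
        country = HOST_COUNTRY.get(host, "unknown")
        counts["local" if country == user_country else "foreign"] += 1
    return counts
```

Run against the same result list for users in different countries, a tally like this would show how the local/foreign balance shifts with the searcher’s location.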
“The results from different countries varied significantly”, Dr Hopfgartner says. “For example the results from Spain and Italy were closer than the results from Spain and the UK were.”
Searches conducted in the UK showed almost no results from Spanish, Italian or German servers, but searches conducted in all three other countries did show a fair amount of results from the UK.
People from different countries were clearly seeing different results when searching for the same thing, but there was some overlap. Searching location-specific terms like “lockdown protest London” or “thank you NHS” returned upwards of 90% of the same results wherever you searched from. However, searching more general terms like “COVID 2020”, “COVID social” or “COVID lockdown” showed only 2% overlap between countries.
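An overlap percentage like those above can be computed as a simple set comparison. The sketch below is an assumption about the measure - the article doesn’t specify how the paper compares results, so here two result lists are compared by URL using the size of their intersection over their union:

```python
def result_overlap(results_a, results_b):
    """Percentage overlap between two search-result lists, compared by
    URL. (An illustrative measure; the published paper may define
    overlap differently, e.g. weighting by rank.)"""
    a, b = set(results_a), set(results_b)
    if not a and not b:
        return 0.0
    return 100 * len(a & b) / len(a | b)
```

Identical result lists score 100%, fully disjoint lists score 0%, matching the intuition behind the 90% and 2% figures quoted above.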
Even where there was overlap, there were some surprises. Searching “how to get taste back” gave an 87% overlap between locations, despite there being far from a universal answer to this question in medical science.
The team also looked at how much overlap there was in results split into the broad categories of search terms, as well as how many local vs foreign results were returned in each country. Again, there were wide variations in both cases.
“Given that there are these differences in how search results are shown in different countries, we concluded that this might actually influence how we see the world”, says Dr Hopfgartner. “For example, if a UK Google user is seeing a vast majority of results that are from the UK, they have no idea what is going on in, say, Italy.”
When we rely on one search engine - or one company - to collate our information for us, it’s easy to assume that they do so neutrally and objectively, but that may not always be the case.
“If we look back on this time in the future we may realise that we had no idea what was going on across the world because we were using Google as a filtering lens that narrows our viewpoint”, says Dr Hopfgartner.
There are also concerns about the creation or exacerbation of a so-called ‘digital divide’, where people who regularly use search engines may develop a certain view of the world, whilst those who don’t (or can’t) use them develop a totally different one.
One of the main aims of the CyCAT project as a whole is to combat some of these issues. The team has developed a system that interfaces with search engines and highlights potential biases to users. For example, the tool might tell you that most of your search results come from the UK and give you the option to filter your results to show only sites hosted in Spain. This greater transparency and control over search results is what CyCAT is aiming for, and work on perfecting the tool is ongoing.
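The two behaviours described - surfacing the geographic skew of a result set, and letting the user narrow it to one hosting country - can be sketched as follows. All names and the output format are illustrative assumptions, not CyCAT’s actual interface.

```python
from collections import Counter

def summarise_bias(result_countries):
    """Share of results per hosting country, as a transparency tool
    might surface it (e.g. {"UK": 75.0, "Spain": 25.0})."""
    counts = Counter(result_countries)
    total = sum(counts.values())
    return {c: round(100 * n / total, 1) for c, n in counts.items()}

def filter_to_country(results, countries, keep):
    """Narrow a result list to items hosted in one chosen country.
    `results` and `countries` are parallel lists."""
    return [r for r, c in zip(results, countries) if c == keep]
```

Showing the summary first, then offering the filter, mirrors the transparency-then-control flow the article describes.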
With the ‘Do You See What I See’ project having wrapped up in December 2021, a research paper has now been published detailing the team’s findings. As scrutiny of fairness and transparency in our technology increases, work like this is becoming more and more important, with consequences for the everyday lives of the many of us who use services like Google’s search engine on a daily basis.
“It can be an eye-opener to realise that when we use Google to gather information, they are a gatekeeper”, concludes Dr Hopfgartner. “We need to remember that there may be more information out there that we don’t get if we rely on just one such service.”
- Richard Spencer