Facebook is providing academic researchers with a massive data haul revealing how political ads during last year’s U.S. elections were targeted to people on the platform. However, researchers have been held up by an arduous process to access the data and worry the information is insufficient to provide meaningful analysis of how Facebook’s ad platform was used, and potentially misused, leading up to the election.
“You need to see what’s going on in order to know how to regulate something,” said Orestis Papakyriakopoulos, a Ph.D. at Princeton University’s Center for Information Technology Policy, an interdisciplinary center for research to help inform government policy addressing how digital technologies affect society. Papakyriakopoulos said he has applied to access the new Facebook 2020 political ad data, made available in February, but has yet to receive it.
Through its Facebook Open Research and Transparency platform, or FORT, Facebook has offered approved academic researchers data detailing 1.65 million ads related to political, electoral and social issues served on the site between Aug. 3, 2020, and Nov. 3, 2020, the day of the U.S. presidential election.
It includes a data set showing the types of targeting methods advertisers used, such as location, demographics, interests, connections to other people, or custom targeting to specific lists of people supplied to Facebook by the advertiser. Political advertisers rely heavily on that last tactic to aim ads at specific voters and lists of possible donors.
A post on the Facebook site about the data notes, “Wherever applicable, we indicate whether the targeting options were selected for inclusion or exclusion targeting.”
But details of the targeting are limited, as are other aspects of the data. For example, Facebook has instituted a minimum threshold on the number of impressions, or times a single ad was served, that an ad must reach to be included in the data set. The limit is meant to protect people’s privacy but can have the effect of concealing political ad activity on the platform.
Much like advertisers who have long lamented a lack of transparency for measuring ad exposure and brand safety on social platforms including Facebook, academic researchers say the restrictions Facebook places on how they can tap into the data, and on its level of detail, are par for the course with social media platform data access in general.
In his general experience, Papakyriakopoulos said, access to data from social media platforms for academic research follows a pattern of “conscious obscurity” by design. This frustrates researchers like him who want to understand the nuances of ad campaigns served to people on Facebook, a firm whose business is built on delivering ads based on refined targeting options and that makes a point of promoting what it considers to be the benefits of personalized advertising.
As governments in the U.S. and overseas consider new laws or reforms to old ones to address the negative impacts of social media on society, privacy, children’s mental health and elections, academic researchers want to provide analysis based on hard data to help inform those policies.
“This is just a first step at providing data to study Facebook’s impact but we are constantly evolving our products to meet researchers’ needs while putting privacy first,” said a Facebook spokesperson.
‘Not planning on applying’
The limits on Facebook’s data details are enough to dissuade Laura Edelson, a Ph.D. candidate in computer science at NYU’s Tandon School of Engineering studying online political communication, from even applying to access it. “I don’t have access to FORT and I’m not planning on applying for access right now,” she told Digiday in an email.
The primary reason for her lack of interest is a threshold Facebook set for whether a particular ad is included in its ad targeting data set, which excludes ads with fewer than 100 impressions. The company said the restriction “is one of several steps we have taken to protect users’ privacy.”
The constraint may seem insignificant, but Edelson said it is a big deal. Political campaigns could use Facebook’s minute ad targeting parameters to reach very small groups of people in precise geographic areas to fuel the spread of false information about an issue or candidate or suppress voting among specific groups.
In September 2020, in the run-up to the election, anti-disinformation organization Avaaz found ads on Facebook from political groups on the right and left featuring misleading information. Because of Facebook’s impressions threshold in its new data set, researchers could be prevented from evaluating findings like those or determining whether there were other examples of ad-enabled disinformation.
“Any conclusions drawn would be inherently skewed, and it would be difficult to say anything about ad targeting as a whole,” Edelson told Digiday.
Edelson has run into another roadblock when trying to analyze Facebook’s political ad data. Her team at the NYU Ad Observatory had built a tool to facilitate analysis of Facebook’s political ad information available through the company’s ad library API, a separate data set from what is being made available in the new ad targeting data set.
Last October, as Election Day loomed, Facebook demanded in a letter that the team shut down the project and delete the data associated with it, alleging that it violated the company’s terms of service regarding data scraping. Facebook confirmed that it sent the letter to NYU, but did not provide Digiday with further detail on the status of enforcing its demand regarding the project. After the election, the NYU team working on the project released its own 2020 election data gleaned through its Ad Observatory tool, but the data is no longer available. In discussing her work involving Facebook and its data, Edelson told Protocol the company deserved credit for making “a lot more data available than anyone else.”
A clean room or a walled data garden?
There are other constraints associated with this new round of Facebook data. For example, while an ad might have been delivered to someone within five miles of an exact address or latitude and longitude, Facebook would widen how it reflects those targeting parameters in the data provided to academic researchers, revealing only that the ad was aimed within five miles of the city where that more specific location sits. In other words, a precise target like a numeric latitude-longitude in San Jose, or a place like “Acme Park (+1 mile),” would be reflected in the data merely as “San Jose (+1 mile).” By widening those location parameters shown in the data, details on the precise ways political advertisers aimed their ads could be obscured, limiting the value of the insights the analysis might provide.
And researchers complain of an arduous process that requires them to jump through bureaucratic hoops to get the new 2020 election ad data from Facebook, hoops that aren’t in place when researchers access data via an API or data archive, or through unsanctioned data scraping tools. They’re required to go through a verification process with school administration and sign a data agreement with the company. “It’s a time-consuming process,” said Papakyriakopoulos, noting that he doesn’t expect to get the data anytime in the next month.
Another restriction: Researchers cannot download the data. Instead, they must analyze it within the confines of a web app constructed by Facebook. That, said Papakyriakopoulos, limits the range of analysis he and other researchers can conduct using the data. By downloading the data set, researchers could apply different types of analysis and coding Facebook might not make available in its web interface. After this article was published, Facebook confirmed to Digiday that it does not allow researchers to download the data, but it does let them upload other data sets into its system to further analyze and expand on what they can learn from the information. The company also said researchers can employ any code written in R or Python, popular data science programming languages.
There are obvious concerns if Facebook were to let the data flow places beyond its control, though. Not only could it be shared with people who are not approved to use it, it could potentially be tampered with in a way that reveals information about people’s identities or political campaign secrets Facebook wants to ensure stay hidden.
Facebook argues the amount of information associated with the FORT data and other information it has provided to academic researchers in the past — terabytes and terabytes of the stuff — is far too unwieldy for most to ingest and use on their own local machines. Plus, the company emphasizes that because the research is conducted inside Facebook’s environment, the company provides compute power, the fuel necessary for the analytics engine to operate, for free.
“The Facebook Open Research and Transparency (FORT) platform allows academic researchers to access data in a ‘virtual clean room,’ built with validated privacy and security protections,” said the Facebook spokesperson. Companies including Facebook and Google have established so-called clean room environments to safeguard privacy and security for data sharing between advertisers and publishers, too.
The data limitations are similar to ones placed on another data set Papakyriakopoulos worked with as part of Facebook’s Social Science One initiative, a collaboration with academic researchers and institutions launched in 2018. That project supplied data about URLs shared by people on Facebook, along with data showing the number of people who liked the posts and their demographics, but it was aggregated at a country level. He and a team of researchers at The Technical University of Munich aimed to investigate misinformation spread during Germany’s 2017 elections using the data.
“My team left the project,” he said, noting that the data Facebook provided his research group was not adequate for answering the questions the group put forth in its project proposal. “The project was based on false expectations,” he said. Facebook said its FORT data access effort is an expansion of the Social Science One initiative.
Bad memories and privacy blunders
People in the academic research community trying to understand the impact of social media on elections and democratic institutions believe the public has a right to data that lives on platforms like Facebook. But the companies themselves have any number of concerns when it comes to handing over the keys to their data, from worries about protecting intellectual property to exposing information about the advertisers they rely on to support their businesses. Facebook rarely points to those concerns when it comes to explaining the reasons behind the blockades to data access it erects, however. Instead, the company focuses its justification for obstacles to data access on privacy. “We are committed to providing more transparency into the societal impact of Facebook’s products while protecting people’s privacy,” said the company spokesperson.
The elephant in the room, though, is Cambridge Analytica.
The 2016 political ad targeting scandal centered on Cambridge Analytica, a political consultancy that originally derived the data it used for psychographic ad targeting from Facebook data scraped for academic research. The scandal only reinforced concerns that emerged from previous data blunders involving supposedly anonymized Netflix and AOL data that researchers reidentified, said Eric Goldman, a professor at Santa Clara University School of Law who focuses on tech and internet law. “The internet community got the message that trying to be proactive in helping researchers was only going to backfire,” he said. “If they’re going to get tarred as privacy-invasive, no amount of payoff is worth it.”
The fallout from the Cambridge Analytica debacle, which led the Federal Trade Commission to slap a record-high fine of $5 billion on Facebook over data privacy violations, has “had a huge effect on my work,” said Rebecca Britt, an associate professor at The University of Alabama’s College of Communication and Information Sciences, who studies how people use social media to communicate about health issues. Today, in part because of Facebook’s rules prohibiting data scraping and a lack of a public Facebook API for the type of data she needs, she said, “It is very difficult to work with Facebook and Instagram data,” which she said also affects her fellow researchers in their efforts to combat misinformation.
Ultimately, said Britt of Facebook, “They’re not really a researchers-friendly space.”