[House Hearing, 117 Congress]
[From the U.S. Government Publishing Office]
THE DISINFORMATION BLACK BOX:
RESEARCHING SOCIAL MEDIA DATA
=======================================================================
HEARING
BEFORE THE
SUBCOMMITTEE ON INVESTIGATIONS
AND OVERSIGHT
OF THE
COMMITTEE ON SCIENCE, SPACE,
AND TECHNOLOGY
HOUSE OF REPRESENTATIVES
ONE HUNDRED SEVENTEENTH CONGRESS
FIRST SESSION
__________
SEPTEMBER 28, 2021
__________
Serial No. 117-31
__________
Printed for the use of the Committee on Science, Space, and Technology
Available via the World Wide Web: http://science.house.gov
__________
U.S. GOVERNMENT PUBLISHING OFFICE
45-497PDF WASHINGTON : 2022
-----------------------------------------------------------------------------------
COMMITTEE ON SCIENCE, SPACE, AND TECHNOLOGY
HON. EDDIE BERNICE JOHNSON, Texas, Chairwoman
ZOE LOFGREN, California             FRANK LUCAS, Oklahoma,
SUZANNE BONAMICI, Oregon                Ranking Member
AMI BERA, California                MO BROOKS, Alabama
HALEY STEVENS, Michigan,            BILL POSEY, Florida
    Vice Chair                      RANDY WEBER, Texas
MIKIE SHERRILL, New Jersey          BRIAN BABIN, Texas
JAMAAL BOWMAN, New York             ANTHONY GONZALEZ, Ohio
MELANIE A. STANSBURY, New Mexico    MICHAEL WALTZ, Florida
BRAD SHERMAN, California            JAMES R. BAIRD, Indiana
ED PERLMUTTER, Colorado             DANIEL WEBSTER, Florida
JERRY McNERNEY, California          MIKE GARCIA, California
PAUL TONKO, New York                STEPHANIE I. BICE, Oklahoma
BILL FOSTER, Illinois               YOUNG KIM, California
DONALD NORCROSS, New Jersey         RANDY FEENSTRA, Iowa
DON BEYER, Virginia                 JAKE LaTURNER, Kansas
CHARLIE CRIST, Florida              CARLOS A. GIMENEZ, Florida
SEAN CASTEN, Illinois               JAY OBERNOLTE, California
CONOR LAMB, Pennsylvania            PETER MEIJER, Michigan
DEBORAH ROSS, North Carolina        JAKE ELLZEY, Texas
GWEN MOORE, Wisconsin               VACANCY
DAN KILDEE, Michigan
SUSAN WILD, Pennsylvania
LIZZIE FLETCHER, Texas
------
Subcommittee on Investigations and Oversight
HON. BILL FOSTER, Illinois, Chairman
ED PERLMUTTER, Colorado             JAY OBERNOLTE, California,
AMI BERA, California                    Ranking Member
GWEN MOORE, Wisconsin               VACANCY
SEAN CASTEN, Illinois               VACANCY
C O N T E N T S
September 28, 2021
Hearing Charter
Opening Statements
Statement by Representative Bill Foster, Chairman, Subcommittee
on Investigations and Oversight, Committee on Science, Space,
and Technology, U.S. House of Representatives
    Written Statement
Statement by Representative Jay Obernolte, Ranking Member,
Subcommittee on Investigations and Oversight, Committee on
Science, Space, and Technology, U.S. House of Representatives
    Written Statement
Statement by Representative Eddie Bernice Johnson, Chairwoman,
Committee on Science, Space, and Technology, U.S. House of
Representatives
    Written Statement
Witnesses:
Dr. Alan Mislove, Professor and Interim Dean, Khoury College of
Computer Sciences, Northeastern University
    Oral Statement
    Written Statement
Ms. Laura Edelson, Ph.D. Candidate and Co-Director of
Cybersecurity for Democracy at New York University
    Oral Statement
    Written Statement
Dr. Kevin Leicht, Professor, University of Illinois
Urbana-Champaign Department of Sociology
    Oral Statement
    Written Statement
Discussion
Appendix: Additional Material for the Record
Statement submitted by Representative Bill Foster, Chairman,
Subcommittee on Investigations and Oversight, Committee on
Science, Space, and Technology, U.S. House of Representatives
    Imran Ahmed, Chief Executive Officer, Center for Countering
    Digital Hate
Visuals submitted by Ms. Laura Edelson, Ph.D. Candidate and
Co-Director of Cybersecurity for Democracy at New York University
Letter submitted by Accountable Tech, et al.
    ``Facebook's Stonewalling of Research into its Role in the
    Capitol Insurrection''
THE DISINFORMATION BLACK BOX:
RESEARCHING SOCIAL MEDIA DATA
----------
TUESDAY, SEPTEMBER 28, 2021
House of Representatives,
Subcommittee on Investigations and Oversight,
Committee on Science, Space, and Technology,
Washington, D.C.
The Subcommittee met, pursuant to notice, at 10:02 a.m.,
via Zoom, Hon. Bill Foster [Chairman of the Subcommittee]
presiding.
Chairman Foster. Well, the hearing will now come to order.
Without objection, the Chair is authorized to declare recess at
any time. And, before I deliver my opening remarks, I wanted to
note that today the Committee is meeting virtually. I want to
announce a couple of reminders to the Members about the conduct
of the hearing. First, Members should keep their video feed on
for as long as they are present in the hearing. Members are
responsible for their own microphones. Please also keep your
microphones muted, unless you're speaking. And finally, if
Members have documents that they wish to submit for the record,
please e-mail them to the Committee Clerk, whose e-mail address
was circulated prior to the hearing.
Well, good morning, and welcome to our Members and our
panelists. We've--I especially appreciate your willingness to
have the hearing rescheduled to a time when nothing is
happening in Washington, D.C. and Congress. But thank you all
for joining us for this hearing on researcher access to social
media data. For years experts have been raising the alarm about
how misinformation and disinformation spread unabated on
social media platforms. Long before ``fake news'' was an
epithet aimed at influencing--anything conflicting with
someone's own worldview, it described falsehoods presented
maliciously as fact in order to influence opinions. The problem
of misinformation is not a new one, but social media has fanned
the flames, and it is now difficult to imagine political and
social discourse untouched by its influence.
The damage caused by misinformation reaches far beyond our
phone and computer screens. Lies on social media have spawned
riots and ethnic cleansing, and thousands of deaths around the
world, and lies about the 2020 election inspired thousands to
invade our Capitol on July--on January 6 in an attempt to
disrupt our Constitution, and stop the certification of valid
election results, resulting in five deaths. Lies about the
severity of COVID-19 prevented millions of Americans from
taking the disease seriously, resulting in needless infections
and needless deaths. Vaccine disinformation is discouraging
Americans from receiving safe and effective COVID-19 vaccines,
extending the pandemic, and allowing new variants to
proliferate.
For years we have seen the harmful effects of anti-vaccine
rhetoric, causing the re-emergence of diseases like measles
that had been eliminated by vaccines. In fact, in July the
Surgeon General declared that misinformation on social media
was a public health hazard. Much of this misinformation, in
fact, appears to be generated and amplified by our enemies, who
recognize the damage that it does to our country. It is
therefore imperative that the Science Committee address it as
we would any other threat to public health, by ensuring that we
have the best and brightest minds researching the problem so
that we can base future policy on the best available evidence.
Unfortunately, it's extremely difficult for researchers to
gain sufficient access to social media data. Companies do make
some information public, but it is largely through interfaces
that they control, meaning that researchers can only see what
the companies want them to see, and access can be cut off at any
time. Today we will hear from our witnesses about the research
they are able to conduct in this environment. They will tell us
about the limitations of the existing tools, and what data they
believe can and should be made public so that we can have a
better understanding of how social media users interact with
misinformation, and how that impacts their behavior online and
offline. We will hear about how mis- and disinformation is
delivered to social media users through the black box of the
algorithm, drawing eyes to the sensationalist content that
inspires the most user engagement, regardless of truth.
We on the Science Committee understand that the very real
limitations to full data transparency by social media is a real
problem. Platforms will argue that some information should be
protected as trade secrets, much as the computerized financial
trading firms prize the opacity behind their sometimes abusive
trading algorithms. At the same time, social media users are
entitled to privacy, particularly of personally identifiable
information. However, these concerns cannot be broad excuses to
shield social media companies from a full outside accounting of
how their platforms may be endangering public health and
safety. We simply cannot leave social media unstudied. It is as
influential a force on the social fabric of the 21st century as
any other.
But as it stands, advertisers on these platforms often
enjoy more access to data than academic researchers looking to
assess the impact of promoted posts. I believe that this--in
this hearing we can have a constructive launching point to
explore how the Science Committee can contribute to this
conversation. We must strike a balance between protecting user
privacy and confidential business information, while also
acknowledging that objective, independent research is necessary
to understand how these platforms influence modern society.
We've solved this problem for electronic trading and financial
services. We are solving this problem for academic access to
electronic health records, and we must solve this problem here.
I look forward to hearing from our panelists about how we
can support their important work of shining a light onto the
disinformation black box that is poisoning our public
discourse.
[The prepared statement of Chairman Foster follows:]
Good morning, and welcome to our members and our panelists.
Thank you for joining us for this hearing on researcher access
to social media data. For years, experts have been raising the
alarm about how misinformation and disinformation spread
unabated on social media platforms. Before ``fake news'' was an
epithet, aimed at anything conflicting with someone's
worldview, it described falsehoods presented maliciously as
fact in order to influence opinions. The problem of
misinformation is not a new one, but social media has fanned
the flames, and it is now difficult to imagine political and
social discourse untouched by its influence.
The damage caused by misinformation reaches far beyond our
phone and computer screens. Lies about the 2020 election
inspired thousands to invade the Capitol on January 6 in an
attempt to stop the certification of the election, resulting in
five deaths. Lies about the severity of COVID-19 prevented
millions of Americans from taking the disease seriously,
resulting in needless infections and deaths. Vaccine
disinformation is discouraging Americans from receiving safe
and effective COVID-19 vaccines, extending the pandemic and
allowing new variants to proliferate. For years we have seen
the harmful effects of anti-vaccine rhetoric, causing the re-
emergence of diseases like measles that had been eliminated by
vaccines.
In July, the Surgeon General declared that misinformation
on social media is a public health hazard. It is therefore
imperative that the Science Committee address it as we would
any other threat to public health--by ensuring that we have the
brightest minds researching the problem so we can base future
policy on the best available science.
Unfortunately, it is extremely difficult for researchers to
gain sufficient access to social media data. Companies do make
some information public, but it is largely through interfaces
they control, meaning that researchers can only see what
companies want them to. And access can be cut off at any time.
Today, we will hear from our witnesses about the research they
are able to conduct in this environment. They will tell us
about the limitations of the existing tools, and what data they
believe can and should be made public so we can have a better
understanding of how social media users interact with
misinformation and how that impacts their behavior on- and
offline. We will hear about how mis- and disinformation is
delivered to social media users through the ``black box'' of
the algorithm, drawing eyes to sensationalist content that
inspires user engagement regardless of the truth.
We on the Science Committee understand the very real
limitations to full data transparency by social media
companies. Platforms will argue that some information should be
protected as trade secrets. In addition, social media users are
entitled to privacy, particularly of personally identifiable
information. However, these concerns cannot be broad excuses to
shield social media companies from a full outside accounting of
how their platforms may be endangering public health and
safety. We cannot simply leave social media unstudied. It is as
influential a force on the social fabric of the 21st century as
any other. But as it stands, advertisers on these platforms
often enjoy more access to data than academic researchers
looking to assess the impact of promoted posts. I believe that
this hearing can be a constructive launching point to explore
how the Science Committee can contribute to this conversation.
We must strike a balance between protecting user privacy and
confidential business information, while also acknowledging
that objective, independent research is necessary to understand
how these platforms influence modern society.
I look forward to hearing from our panelists about how we
can support their important work shining a light into the
disinformation black box poisoning our discourse.
I now yield to Ranking Member Obernolte for his opening
statement.
Chairman Foster. And I now yield it to Ranking Member
Obernolte for his opening statement.
Mr. Obernolte. Well thank you very much, Chairman Foster,
and thank you to our witnesses for being here at this very
important hearing, and what will prove, I'm sure, to be a
fascinating hearing on combatting the spread of misinformation
on social media.
We live in an amazing world, a world where we are presented
with a selected, curated newsfeed that only includes the things
that we're personally interested in. And that's informed by
algorithms that companies like social media have come up with
to foster user engagement, and to maximize our interest in the
information that's being provided. But, unfortunately, as the
Chairman pointed out, that's also catalyzed the spread of
misinformation. Combatting that spread is something that has
been a societal problem for hundreds of years now, but it's
exacerbated by the fact that information now spreads so easily,
and that that information is personalized to each one of us. So
the information that I see in the morning is not the same thing
that--the information that other people see in the morning,
and, unfortunately, that can hide and mask the spread of this
misinformation.
And if you want a perfect example of how that can be
problematic, you can look at a hearing that this Subcommittee
held a couple of weeks ago on the origins of COVID. And one of
the--for me, the very surprising outcomes of that Committee
hearing was the fact that, although we had competing theories
about the spread of COVID, that any theory other than natural
zoonotic origin had been rejected early in the crisis as
misinformation, and had been labeled a conspiracy theory, and
that the social media companies had actively suppressed the
spread of that information. And now, with the benefit of
hindsight, and the discovery of new data, we've discovered that
competing theories are not only possible, but indeed plausible,
and in fact, you know, might end up being the successful
theory.
So no one can deny that this effort to combat the spread of
misinformation has severely hampered our ability to identify
the origins of COVID. And it just illustrates how these two
ideas are in tension, right? On one hand, we want to be a
society that honors the exercise of free speech, but that is
fundamentally in tension with the idea that we also have an
obligation to stop the spread of misinformation. So I'm hopeful
that some of our panelists today will be able to talk some more
about where that moral boundary is.
And figuring out, as a society, how to balance those two
competing interests, I think, is critical. Because on the one
hand, as recent events have shown, we all have a vested
interest in trying to figure out how to stop the spread of
misinformation. But on the other hand, history has shown us
repeatedly that if we allow censorship to take the place of
misinformation, that will take us down a very dark path as a
society. So we have to find this middle ground, this balance in
between the two, and I'm confident that we can.
And I'm also confident that the social media companies,
like Facebook, and Instagram, and Twitter, are going to be
critical to helping us solve this problem, because they have
the expert knowledge in the way their algorithms work, they
have the expert knowledge in the way that this information
spreads, and what catalyzes people's interest in news about the
world around them. And so I think, definitely, that they're
going to need a seat at the table. We're going to need to tap
all of our available sources of information, which certainly
includes them, but also includes the independent researchers
we're going to hear from today, and I'm very thankful that
they're out there, gathering this information, to give us a
holistic view of this problem. So I am looking very much
forward to the hearing, and looking forward to asking questions
afterwards. Thank you, Mr. Chairman. I yield back.
[The prepared statement of Mr. Obernolte follows:]
Good morning. Thank you, Chairman Foster, for convening
this hearing. And thanks to our witnesses for appearing before
us today.
Misinformation is not a new phenomenon. Disinformation
campaigns have been used throughout history to spread state
propaganda and influence geopolitics. It is no secret that
misinformation has the ability to change hearts and minds and
influence perceptions. What is new is the impact that modern
advances in information and communications technologies have
had on the ability of misinformation to spread. It is easier
now than ever before to reach global audiences, communicate
instantaneously with friends and family around the world, and
follow every move of politicians, athletes, and Hollywood stars
alike.
The same technologies that facilitate and democratize
global access to information also enable the dissemination of
information at a scale and speed like we have never experienced
before in human history. This has made it more difficult to
determine the accuracy, provenance, and objective truth of the
information we consume. There is more information presented to
individual consumers than ever before, and from myriad
different sources.
The tremendous growth in the popularity of social media
platforms over the past decade has resulted in the consumption
of information that is more personalized than ever before. The
information we read and view online is now perfectly tailored
to each of our own individual preferences, biases, and beliefs.
We each receive an individualized, curated feed of information
every time we visit our social media platform of choice. And it
would not be a stretch to say that, at times, we are each
drinking from our own individual information firehoses.
In this golden age of information, there are many
outstanding questions about how we can assess and ultimately
combat the spread of falsehoods, untruths, ``fake news,'' and
misinformation. I'm pleased that each of the witnesses
testifying before us today has undertaken research to learn
more about how misinformation spreads, and what we can do to
combat it. This is an admirable goal, and we in Congress must
take steps to facilitate further research on this important
topic. But these efforts cannot be undertaken without ensuring
appropriate constraints, limitations, and safeguards are in
place.
The need for data transparency and access is inherently in
tension with the protection of user privacy. We must endeavor
to strike a healthy balance between data transparency on the
one hand, and the protection and preservation of individual
privacy on the other.
We must also respect and protect the intellectual property
rights of the platforms whose data researchers seek to access
and analyze. Social media and technology platforms have
invested significantly in the development of their processes,
technologies, and algorithms, which in many ways is what
distinguishes the user experience of one platform from that of
the others. Each platform is in a race to do it better, faster,
and for less than their competitors. And they rightfully take
great pains to police and protect their trade secrets from
public disclosure. An appropriate balance must be reached
between the intellectual property rights of platforms and the
desire to access and analyze their technologies, processes,
data, and algorithms for the public benefit. I'm not suggesting
that it's an easy balance to strike, but merely asserting that
we must keep this in mind as we work forward.
There is no doubt that misinformation can have harmful and
even deadly real-world consequences. State-sponsored actors
from Russia and China have recently engaged, and continue to
engage, in coordinated disinformation campaigns. From Russia's
efforts to foment discord and chaos around American elections,
to China's efforts to lay blame for COVID-19 at the feet of the
American government, state-sponsored disinformation campaigns
have real consequences.
While social media platforms have rightfully taken steps to
thwart the spread of misinformation, they must also protect
against overcorrection that results in censorship. Competing
hypotheses about the origins of COVID-19 are a compelling
example. For almost a year, the suggestion that COVID-19 could
have originated from anything other than natural zoonosis was
summarily dismissed as conspiracy theory by traditional and
social media alike. However, data now suggests that other
hypotheses are in fact more plausible, and only recently did
mainstream and social media platforms cease to censor these
theories. The censorship of competing explanations has
unquestionably impeded important efforts to investigate the
virus' origins.
Similarly, we must also leave room in our social and
political discourse for parody, satire, and commentary. An
appropriate balance is necessary to ensure that such commentary
is not discouraged or inappropriately discarded as conspiracy
theory or misinformation. Just as misinformation can have real-
world consequences, so too can overcorrection that leads to
censorship of public debate about different ideas.
Combatting misinformation is not an easy endeavor. And the
many researchers looking at how misinformation spreads online
and how to successfully thwart it should be praised for their
efforts. But if we ever expect to truly solve this problem,
then we must recognize that the social media platforms must
have a seat at the table. We cannot expect them to go it alone,
and we should likewise not expect to stop the spread of harmful
misinformation without them.
We must also endeavor to determine how to balance our
societal goal of minimizing the spread of misinformation with
the competing goal of the avoidance of censorship. This balance
is critical because, as history has so often shown, to empower
our media with the unchecked ability to censor would lead our
country down a very dark path.
I look forward to learning more from our witnesses about
how we can work to combat the spread of misinformation on
social media, while simultaneously protecting users' privacy,
platforms' intellectual property, preventing overcorrection,
and preserving public discourse.
Thank you, Chairman Foster, for convening this hearing. And
thanks again to our witnesses for appearing before us today. I
look forward to our discussion.
I yield back the balance of my time.
Chairman Foster. Thank you. And we are honored to have the
Full Committee Chairwoman, Ms. Johnson, with us today. The
Chair now recognizes the Chairwoman for an opening statement.
Chairwoman Johnson. Well, thank you very much, Mr.
Chairman, and let me say good morning, and greet our panelists,
and thank you for holding this hearing. The topic will only
grow in relevance as social media becomes all the more
ingrained in our lives. And worryingly, these issues will
become more dangerous with every topic that becomes hotly
politicized.
Disinformation has been a public health threat for decades.
Experts estimate that 330,000 deaths from AIDS (acquired
immunodeficiency syndrome) in the early 2000s can be
attributed to disinformation about the connection between HIV
(human immunodeficiency virus) and AIDS. The fact of human-
caused climate change, with decades of empirical evidence and
expert consensus behind it, has nevertheless become a subject
of great debate. Monied interests fan the flames of doubt as
oceans rise and forests burn. And now, as we conduct this
hearing virtually due to a surge in COVID-19, conspiracy
theorists and malicious actors spread lies about the severity
of the pandemic. Laymen speculate wildly about the vaccine's
safety, drowning out expert voices. Social media offers fertile
ground for these falsehoods, and unfounded claims that can
spread across the globe in the blink of an eye.
We must not leave the black box of social media
disinformation unexamined. Navigating the difficulties in
extending access to data will not be easy, but failing to do so
will have devastating consequences. This current moment is a
grave example of the stakes at hand. We will not beat the
pandemic without increased vaccine uptake, and every day social
media users are dissuaded from getting the shot after seeing
deeply misinformed posts. People are making decisions for the
health and safety of themselves, their families, and their
communities based on abject falsehoods, and researchers
determined to mitigate the damage are unable to access critical
data on how these lies spread.
I am pleased to join you and others, Chairman Foster, in
welcoming our witnesses today. They are doing important
research into how misinformation circulates on land--online and
impacts our real-world health and safety. I look forward to
your testimony. I yield back.
[The prepared statement of Chairwoman Johnson follows:]
Good afternoon to our panelists, and thank you to Chairman
Foster for holding this hearing. This topic will only grow in
relevance as social media becomes all the more ingrained in our
lives. And worryingly, these issues will become more dangerous
with every topic that becomes hotly politicized.
Disinformation has been a public health threat for decades.
Experts estimate that 330,000 deaths from AIDS in the early
2000s can be attributed to disinformation about the connection
between HIV and AIDS. The fact of human-caused climate change,
with decades of empirical evidence and expert consensus behind
it, has nonetheless become a subject of great debate. Monied
interests fan the flames of doubt as oceans rise and forests
burn. And now, as we conduct this hearing virtually due to a
surge in COVID-19 cases, conspiracy theorists and malicious
actors spread lies about the severity of the pandemic. Laymen
speculate wildly about the vaccine's safety, drowning out
expert voices. Social media offers fertile ground for these
falsehoods, and unfounded claims can spread across the globe in
the blink of an eye.
We must not leave the black box of social media
disinformation unexamined. Navigating the difficulties in
extending access to data will not be easy, but failing to do so
will have devastating consequences. This current moment is a
grave example of the stakes at hand. We will not beat this
pandemic without increased vaccine uptake, and every day,
social media users are dissuaded from getting the shot after
seeing deeply misinformed posts. People are making decisions
for the health and safety of themselves, their families, and
their communities based on abject falsehoods. And researchers
determined to mitigate the damage are unable to access crucial
data on how these lies spread.
I'm pleased to join Chairman Foster in welcoming our
witnesses today. They are doing important research into how
misinformation circulates online and impacts our real-world
health and safety. I look forward to hearing your testimony.
Chairman Foster. Thank you. And if there are Members who
wish to submit additional opening statements, your statements
will be added to the record at this point.
And at this time I'd like to introduce our witnesses. Our
first witness is Dr. Alan Mislove. Dr. Mislove is a Professor,
the Interim Dean at--and Interim Dean at the Khoury College of
Computer Sciences at Northeastern University. His primary field
of interest concerns distributed systems and networks, with a
focus on using social networks to enhance the security,
privacy, and efficiency of newly emerging systems.
Voice. This is----
Chairman Foster. He is also a core faculty member of the
Cybersecurity and Privacy Institute, which forges global
partnerships with experts in industry, government, and
academia.
After Dr. Mislove is Ms. Laura Edelson. Ms. Edelson is a
Ph.D. candidate in Computer Science at NYU's (New York
University's) Tandon School of Engineering. Laura studies
online political communication, and develops methods to
identify inauthentic content and activity. Her research has
informed reporting on social media ad spending in several
national papers, including the New York Times. Prior to
rejoining academia, Ms. Edelson was a software engineer for
Palantir and FactSet, with a focus on applied machine learning
and big data.
Our final witness is Dr. Kevin Leicht. Dr. Leicht is a
Professor and former Head of the Sociology Department at the
University of Illinois Urbana-Champaign, and Director of the
Iowa Social Science Research Center at the University of Iowa.
That's some commute. He previously served as a Program Officer
for the Sociology and Resource Implementations for Data
Intensive Research--the Data Intensive Research Program at the
National Science Foundation. He has written extensively on
issues related to economic development, globalization, and
political sociology.
As our witnesses should know, they each have five minutes
for your spoken testimony. Your written testimony will be
included in the record for the hearing. When you all have
completed your spoken testimony, we will begin with questions.
Each Member will have five minutes to question the panel. And
now we will start with Dr. Mislove. Dr. Mislove provides his
testimony. Proceed.
TESTIMONY OF DR. ALAN MISLOVE,
PROFESSOR AND INTERIM DEAN,
KHOURY COLLEGE OF COMPUTER SCIENCES,
NORTHEASTERN UNIVERSITY
Dr. Mislove. Chairman Foster, Chairwoman Johnson, Ranking
Member Obernolte, and distinguished Members of the
Subcommittee, thank you for the opportunity to appear before
you today. My name is Alan Mislove. I'm a Professor and Interim
Dean at the Khoury College of Computer Sciences at Northeastern
University. My research is on algorithmic auditing. I develop
methodologies that allow me to study large online platforms,
such as those operated by social media companies, to better
understand how they work, how they may be abused, and what
impacts they are having on users. Importantly, I conduct my
research independently, without companies' permission, and
without insider access to data. Put simply, I have no more
access to these platforms than any of you do.
This is a significant challenge. It is difficult to develop
the technologies that enable my work, especially because
companies are resistant to external accountability, and because
the work and legal environment makes such research carry
non-trivial risk. As social media platforms mediate an increasingly
large fraction of online communication, independent research
such as this is critical. Even in the best of worlds,
understanding how these platforms are impacting end users and
society is too big a task for the platforms themselves. Though
much remains to be done, my group and collaborators have been
successful at studying a variety of such platforms, identifying
alarming behaviors, and working with platforms to make
improvements. Thus, I am well-positioned to provide input on
what can currently be measured, and what is needed going
forward to ensure we fully understand the impact that platforms
are having.
So that you can appreciate how we conduct our research, we
typically study platforms using one of two approaches. We can
recruit cohorts of users who agree to donate their data, or we
can run our own experiments on the platforms, for example, by
becoming an advertiser. Unfortunately, both of these approaches
that we have today have significant limitations. Running our
own experiments is often expensive in terms of time and money,
requires significant expertise, and is beyond the capabilities
of many researchers and regulators. Worse, platforms often
actively try to prevent such data collection, have suspended
researchers' accounts, and have threatened litigation for
ethical research in the public interest, with a notable
exception--example being one of my fellow witnesses.
Platforms may say that researchers can rely on aggregated
data that they provide, but this statement is misleading at
best. Social media platforms have been very hesitant to
release any data, and have often only released aggregated
coarse-grain data in the face of scandal and public backlash.
Often, even accessing the data they do release can be
challenging. In many cases, data sets require approval from the
platform to be able to access, and cannot be shared with other
researchers. Moreover, recent events have shown that platforms
cannot be trusted to provide even correct aggregated data. It
was recently revealed that Facebook neglected to include data
from half of the U.S. population in one of the data sets it
provided, calling numerous studies that relied on that data set
into question.
The upshot is that currently no regulations exist that
require platforms to make data available, and platforms are
actively attacking independent researchers' ability to study
their impacts. In effect, researchers are relying on platforms'
goodwill to allow studies to be run at all, a situation that is
becoming less and less tenable as platforms become more
entrenched. Thus, my key message is that researchers need
Congress to enshrine into law requirements for platforms to
make data available. Mandating such transparency requires
nuance, but is both feasible and urgent.
In particular, I want to convey three key considerations
for how to shape such requirements. First, social media
platforms sit inside broader sociotechnical systems, and the
data that regulations require be made available must be
comprehensive enough to recognize the complexity of such
systems. For example, platforms are typically funded via
advertising, and any transparency requirement should cover both
organic and paid content. Second, social media platforms allow
numerous types of content to be exchanged, and one-size-fits-
all approaches to the kind of metadata that must be made
available are unworkable. Instead, the kind of data required to
be released must be tailored to the particular type of content.
Ads, pages, shared URLs (Uniform Resource Locators), and so
forth all have different types of metadata that need to be
shared. Third, transparency over who sees the content is
crucial to understand platforms' impact. While existing data
have focused primarily on the content itself, making aggregate
data available on the demographics of who is being shown the
content is equally important, as it's necessary to be able to
understand the platforms' impact on end users.
In summary, social media platforms do not currently have
the proper incentives to allow research on their platforms, and
have been observed to be actively hostile to important ethical
research that is in the public interest. At the same time that
platforms' power and influence is reaching new heights, our
ability, as independent researchers, to understand the impacts
that they are having is being reduced each day. We need
Congress's help to enable researchers to have sufficient access
to data and social media platforms in order to ensure that the
benefits of these platforms do not come at a cost that is too
high for society to bear. Thank you again, and I look forward
to your questions.
[The prepared statement of Dr. Mislove follows:]
[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]
Chairman Foster. Thank you. And next is Ms. Edelson.
TESTIMONY OF MS. LAURA EDELSON,
PH.D. CANDIDATE AND CO-DIRECTOR
OF CYBERSECURITY FOR DEMOCRACY
AT NEW YORK UNIVERSITY
Ms. Edelson. Good afternoon--good morning, Chairman Foster,
Chairwoman Johnson, Ranking Member Obernolte, and the Members
of the Subcommittee. My name is Laura Edelson. I'm a Ph.D.
candidate in Computer Science at New York University, where I also co-lead the
Cybersecurity for Democracy Project, and I'm a Belfer Fellow
with the Anti-Defamation League. As cybersecurity researchers,
my team and I study systemic vulnerabilities in online
platforms that expose people to misleading and false claims,
from fake COVID cures, to voting disinformation, to investment
scams, primarily on Facebook. Our ultimate goal is to develop
workable solutions to digital mis- and disinformation. Members
of this Committee will understand that in order to do this we
need concrete data, and the ability to engage in rigorous
scientific inquiry of that data. And lack of data is currently
the most serious barrier to the work of misinformation
researchers. Twitter is the only major social media platform
that allows most researchers access to public data on their
platform, albeit at a high financial cost.
In 2016 Facebook got a--bought a company called CrowdTangle
that offered access to public Facebook data, and it still
operates their offering. However, very few researchers are
allowed to access this tool. It's primarily offered as a
business intelligence product. Most other platforms, including
YouTube and TikTok, simply offer nothing. In the face of this
black box, some researchers, including my team, Mozilla, and
news outlets like The Markup, have attempted to crowdsource
data about what happens on social media. We have been met in
some cases with outright hostility from the platforms we study.
This summer, after months of legal threats, Facebook cut off my
team's access to their data. We are far from the only research
team that's been stopped. Algorithm Watch in Germany was forced
to shutter their work entirely after Facebook threatened legal
action against them. Many other researchers who would like to
study at this--study Facebook at this point are frozen out.
They simply can't afford a legal battle with one of the most
powerful corporations in the world.
We had used the data we got from Facebook to support the
finding in our most recent study that posts from disinformation
sources got six times more engagement than that of factual
news, to identify security vulnerabilities that we reported to
Facebook, and to monitor Facebook's own public-facing ad
library for political ads. Every day that my team can't access
the data we need to do our work puts us further behind in a
race to find answers. And make no mistake, the harm being
caused by misinformation and hate online is very real.
In 2019 journalist Jeremy Merrill reported that
conservative retirees were targeted with misleading claims in
Facebook ads, and then guided to sites to convince them to
trade in their retirement funds for precious metals with a
company called metals.com. In the summer of 2020, an advertiser
on Facebook called Protect My Vote ran ads discrediting mail-in
balloting that were aimed at African-American voters in the
Upper Midwest. A report from the Anti-Defamation League found
that exposure to videos from extremist or white supremacist
channels on YouTube remains common, with one in 10 study
participants being exposed. And nearly 40 percent of Latinx
respondents said that they'd seen material that makes them
think that the COVID vaccine is not safe or effective,
according to a study earlier this year.
But I know I don't need to remind any of you who
experienced the invasion of the Capitol on January 6 of the
high costs of misinformation to our social fabric. Facebook
particularly is a selective megaphone. Their own internal
research has shown that the way they have built their algorithm
disproportionately promotes misinformation and extreme content.
To study these issues, all researchers need access to much more
data than Facebook or most other platforms provide. Facebook
should strengthen CrowdTangle by adding data about user
platforms, and broaden access to it so that researchers from
all institutions can use it. Other companies, like Google and
TikTok, should make public data about their platforms
accessible to researchers as soon as possible.
Facebook needs to reinstate my team's accounts immediately
so that we can resume our work. And while we hope this will
happen soon, we must also acknowledge the platforms' attempts
at voluntary transparency have failed. It's time for Congress
to act to ensure that researchers and the public have access to
data that we need to protect ourselves from online
misinformation. I believe we will look back and see this moment
in history as a turning point when the costs of disinformation
and hate online became too great to ignore, and we stepped up
and took action.
Previous generations of Americans have taken on public
health crises like cancer or drunk driving, and science has
helped us to meet tough challenges like this, and this helped
us to save lives, and to make the lives we save more enjoyable
and fulfilling. Science can help us now, but only if we provide
researchers the data that we need to study and describe the
problems we face. In closing, I want to thank the Committee for
their attention to these issues, and also for the opportunity
to share my experience and perspective.
[The prepared statement of Ms. Edelson follows:]
[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]
Chairman Foster. Thank you. And after Ms. Edelson is Dr.
Leicht.
TESTIMONY OF DR. KEVIN LEICHT, PROFESSOR,
UNIVERSITY OF ILLINOIS URBANA-CHAMPAIGN
DEPARTMENT OF SOCIOLOGY
Dr. Leicht. Yes, thank you, and thank you for--to the
Committee for inviting me. My name's Kevin Leicht. I'm a
Professor of Sociology at the University of Illinois Champaign-
Urbana, and I have assembled a multi-disciplinary team that is
studying how misinformation spreads through social media
platforms, and what effect labeling has on dampening the spread
of that misinformation. What my group of social scientists,
computer scientists, and journalists, and business professors
has found is that consistent labeling by social media platforms
about COVID-19 severity, transmission, vaccinations, and cures
is somewhat effective at preventing the spread of social--
suspect social media posts. But because of Facebook's
algorithms, and our lack of access to them, we can't really
tell whether reduced sharing of suspect posts is due to
Facebook's algorithm or changes in actual user behavior, and
this unsatisfying outcome is probably why I was invited to
speak to you today.
Though we know quite a bit about how misinformation
spreads, so we know it's not necessarily spread by nefarious
individuals on the dark web, and we know what types of people
are susceptible to consuming this information, we also know
that combatting this information is harder the more
misinformation is repeated, so it becomes harder and harder to
stop. But, as our prior two testifiers have said, social media
platforms keep their data to themselves, and they discuss--do
research internally that is not disclosed. The platforms do
offer places to download such data, but much of the research
happens in lab settings where researchers tightly control what
people see, which is not--which is valuable, but is not really
what happens in the real world of social media consumption.
With the black box algorithms that social media platforms
use, users get vastly different exposures to different types of
informations--different types of information, and we are left
studying what users do with bits of information without knowing
exactly what the stimulus is that's prompting them to share
this misinformation. There are deficiencies in the tools needed
to do this research, the data availability, which has already
been discussed, and there's an overall lack of coordination in
the study of social media information and data collection.
The data availability part, as our prior presenters have
said, is important for independent research. The simple answer
to this problem, when I talk to outsiders, is I say this. We
didn't trust The Tobacco Institute to tell us about the safety
of smoking. We probably shouldn't rely on social media
companies for research on what social media does. That research
needs to be done independently. They have a built-in conflict
of interest with regard to this research, as their purpose is
to draw attention and eyes, and the information that draws
attention and eyes sells advertising. The biggest gap that we
see in doing research is in the data and algorithms, or the
black box the social media companies use to determine what end
users see. And at some level we need access not only to the
data, but to the black box.
There are some things the Federal Government can do, I
think, to help social media researchers, and allow independent
social media research to be done, which I think is vitally
important. The strategy my group thinks of would combine action
by the Federal Government to compel the social media companies
to share data, contributions by the social media providers
themselves, help from private foundations, and help from
Federal science funders. The Federal Government could
require the platforms to provide data to research groups who
are investigating public interest questions about
misinformation incidents, prevalence, and consequences, and
this data sharing could take many forms.
There could be central--we could see the creation, for
example, of central data depositories like we have in
astronomy, for example, or in other social science areas, where
there are depositories that act as a basic infrastructure for
studying social media information, so people don't have to
reinvent the wheel every time they want to study social media
information, collect their own misinformation, deal with--or
deal with the possible legal consequences, and everything else.
And the access to this data could be through some sort of cloud
computing format, with strict human subjects protocols, so many
more researchers would have access, and they wouldn't have to
jump through the hoops that our group has had to jump through
here. And with that, I'll conclude my remarks. Thank you.
[The prepared statement of Dr. Leicht follows:]
[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]
Chairman Foster. Well, thank you. And at this point we will
begin our first round of questions. The Chair will now
recognize himself for five minutes.
I'd like to start out my questions by entering a statement
for the record prepared by the Center for Countering Digital
Hate, which studies how dangerous content spreads online and
harms society at large, whether it be offensive hate speech, or
misinformation aimed to change people's beliefs and behaviors
for the worse. So I'd ask unanimous consent for that statement
to be entered into the record. Hearing none, so ordered. And
now on to my questions.
Dr. Mislove, in your research you have purchased ads on
Facebook, and used the performance metrics to gain insight into
the algorithm that determines who actually sees the ads. You
noted in your testimony that your team has spent over $25,000
running ads. Frankly, it strikes me as odd that researchers are in
the position of having to pay the subject of their study in
order to gain sufficient access to crucial data. But--so,
first, how do the metrics available to you as a paying
advertiser differ from those that are available to researchers
who are not paying for privileged access, or what forced you to
go and decide to spend money here?
Dr. Mislove. So thank you for the question, Chairman
Foster. You're precisely right, we have used the advertising
system as a methodology. The reason is that if you are not
using that, and using the publicly facing data, the most useful
thing is what's called the ad library. Laura alluded to that.
That only gives extremely coarse-grained statistics on active
ads. You can't look back, you get no idea of the breakdown of
who's actually seeing the ad, their gender, and the delivery
location.
When you become an advertiser, you get access to much more
fine-grained data. For every ad you run, Facebook gives you
very detailed information about how much money is being spent
on that ad, the demographic makeup of who is being shown it,
and that is what we use as the basis of understanding the
delivery algorithm itself. In other words, the decisions that
Facebook is making about which users get to see which ads.
Chairman Foster. Yes. You--do you ever worry that Facebook
knows who you are, and might sort of give you a warped view of
the way they treat advertisers?
Dr. Mislove. That's a fantastic question, and we do. We
actually sometimes use multiple accounts, some of which don't
reveal our identity to Facebook, to make sure that our--we're seeing
consistent behavior across those accounts. But yes, we do
worry----
Chairman Foster. You worry about it, OK.
Dr. Mislove. We do worry about it.
Chairman Foster [continuing]. The ridesharing companies a
while ago, you know, got caught doing things like that.
Dr. Mislove. Precisely.
Chairman Foster. Is there an ethical or privacy-related
reason to share more data with paying advertisers than
researchers?
Dr. Mislove. There's no user privacy related reason to do
so. The statistics we get back do not tell us anything about
the actual people who see our ads. Again, it's just fraction of
men, fraction of women. Facebook will claim that there is, and
I think Laura may have some words about that, but that--when
they say that, they're protecting the privacy of advertisers,
not the privacy of end users.
Chairman Foster. OK. Is there an agreed-upon list that's--
about what sort of information, you know, that is available to
advertisers now that should just be--automatically be available
to researchers? Would that be a reasonable, you know, mandate
for social media generally?
Dr. Mislove. I don't know that such a list exists right
now, but it would not be difficult to develop exactly such a
list. There are big--they already release certain metrics, and
I would argue that there are a number of others that we already
have access to in various ways that could become the basis of
such a list.
Chairman Foster. Yes. Dr. Edelson, in your testimony you
mentioned CrowdTangle and Twitter's Firehose API (application
programming interface) being primarily business analytic tools.
Are--so are businesses getting access to information that
researchers aren't when these tools are being, you know,
throttled, or not made available to researchers? And how does
their intent--their design intent, as business analytic tools,
limit their usefulness to researchers?
Ms. Edelson. Thank you for the question. So, in short, yes.
We've found CrowdTangle to be quite a rich tool for studying
user engagement, and it was quite illuminating for that
purpose. But as researchers, you know, we don't just want to be
able to, you know, come to this conclusion, misinformation is
very engaging. We would also like to be able to understand--to
start to be able to understand how we could stop that, how we
could design systems to make misinformation maybe less
engaging. And in order to do that, one of the things that we
really need is impression data. This is something that would be
really, really crucial to actually start getting to solutions,
and it's something that Facebook doesn't make available through
CrowdTangle.
If I could just speak very quickly to the prior question
about ad data? I actually published--I have a pre-print of a
paper that's available that has a--that is a technical standard
for what data could be made available about ads. I'd be happy
to forward that on to you. It's going to be published in the
next couple of months formally.
Chairman Foster. Thank you, I appreciate that. And I'll now
recognize Ranking Member Obernolte for five minutes.
Mr. Obernolte. Thank you very much, and thank you to our
witnesses. It's been a very interesting hearing. Let me start
with Ms. Edelson. You had something in your written testimony
that you didn't have time to bring up in your oral testimony,
in which you made some recommendations about things that can be
done to facilitate access to information, and one of the things
that you proposed was to create a legal safe harbor for
researchers in working with this data. And I wanted to give you
a platform to elaborate a little bit on that, but if you could
also, as you talk about that, if you could talk about whether
or not that legal safe harbor should also apply to the
platforms themselves, since, ostensibly, they would be giving
you that data, which could create liability for them as well?
Ms. Edelson. Thank you for the question. I think it's a
very good one. So the researcher safe harbor proposal that the
Knight First Amendment Institute and I have called for would
provide legal protections to researchers like me who engage in
direct study of platforms by using them. I think that's the
general thrust. There are many excellent researchers who do
really important ethnographic work, I'm thinking particularly
of Joan Donovan out of Harvard, who does really good work
studying militia groups, how they recruit, other extremist
groups like this, and this would provide cover to these
researchers for their work so that--you know, again, within
bounds, that they handle data responsibly, that their work is
overseen by institutional review boards (IRBs), that is within
ethical boundaries.
As to whether platforms themselves would need legal cover,
I think in general I would need to go back and talk to the
lawyers about that. To my knowledge, in general, we're covering
data that is generally accessible, so I actually don't know if
that would be required.
Mr. Obernolte. Interesting. OK. You brought up the
institutional review boards, which is something else I have a
question about, just because when I got my doctorate, you know,
my research was qualitative, and only involved interviews, and
yet my IRB gave me a hard time about that data. I can't imagine
what yours did to you.
I have a question for Dr. Mislove. In your oral testimony
you discussed the fact that running experiments on the platform
is beyond the capabilities of a lot of researchers, and yet
that seems to be the only way that we can get really unbiased
data, because even if we ask the platform owners for data, you
know, we have, you know, a concern that the data that we're
going to get back is biased in some way, just the same way that
if you ask a cigarette manufacturer whether or not tobacco use
was safe, you know, you wouldn't necessarily trust the veracity
of that data.
I have a further concern about this, though, and--as a
computer scientist myself, you know, a lot of these algorithms
are interconnected. You know, you can't create a fake user and,
you know, run some tests about, you know, what liking this
does, or what not liking that does, and to see what kind of--
how the algorithm works without affecting real users' pages,
right? Because all of that data feeds back into it. So we've
kind of got this quantum mechanical situation where the act of
observing the system is influencing the behavior of the system.
As a researcher, how do you combat that?
Dr. Mislove. Those are fantastic questions, thank you for
them. So to address your--sort of the first question about sort
of the ability to study these, we are able to do it, but the
limitation is when we run--when we become an advertiser, we're
only really able to say what happens to our own ads, right? So
we--it's much harder for us to go beyond that and say, OK, this
is the kind of effects we're seeing on other advertisers' ads.
So we really can speak to the algorithm a bit, but we can't
speak to sort of its impacts on users in many cases.
To your second question around sort of the feedback loops,
and these sorts of quantum mechanical effects, as you described
them, that's exactly right, and we think very much about that.
To give you one example, one of the things we worried about is
how much--like, teasing out how much of the effects we see are
due to the users who engage with the content, versus the
algorithms that actually, you know, choose who to deliver it
to, and Kevin had alluded to that in his testimony.
We have come up with a number of cute tricks to be able to
sort of tease those out in many cases, where we can sort of
make sure that ads show up as the same to individual users, so
we know the users can't react any differently, but the
algorithm will see them differently. So there are ways, in
certain cases, we can get around that, but it's something we
take into account every time.
Mr. Obernolte. Very interesting. Well, Mr. Chair, it looks
like the clock is malfunctioning, which I guess is a good thing
for me, but I'm just going to ask one final question, and open
it up to the whole panel to answer. You know, the end goal here,
you know, of the research you're doing, I think, is not only to
understand how misinformation spreads, but to enable us to
reach some kind of societal solution to halting the spread of
misinformation without suppressing free speech.
And I think we can get there, right? We've done that in
other venues. You can't yell fire in a crowded theatre. You
know, that's not--recognized as something that's not
infringement on people's free speech because of its potential
to cause harm. And I think we're going to reach some kind of
standard with that as pertains to online misinformation. And I
think, just like we did there, it's going to revolve around the
intent of the poster of that misinformation. But I'm wondering
if you could weigh in, and anyone would like to, about what you
think that ultimate solution is going to look like.
Dr. Leicht. Can I take a stab at that one? One of the
things my group thinks about in that regard, about how to
balance the relationship between controlling misinformation and
censorship, is to think about simply coming up with more
effective labels. So people can post and basically spread
anything that they want, so the communication itself is not
censored, but one of the things that stops misinformation from
spreading as often is the cognitive interference of actually
labeling this and saying, are you sure you want to spread this or
not? But I actually think even something like that is going to
have to be fairly conservative, so there will be some types of
misinformation that are simply not in the public interest to
control, or necessarily stop the spread of, and others where it's
more vital for, say, the public health, or the public safety.
So that would sort of be my group's way of dealing with this
conundrum.
Mr. Obernolte. Ms. Edelson, go ahead.
Ms. Edelson. So one of my very recent studies, one of our
key findings was that misinformation outperformed factual
content on Facebook. But the real meat of this was this was
true for every partisan category, so far right misinformation
outperforms far right factual content. Far left misinformation
outperforms far left factual content. So I think we all want to
get to a place where misinformation isn't prioritized, it is
not in a fast lane against factual content, and we can do this
without discriminating based on viewpoint, or suppressing--you
know, suppressing certain opinions, or certainly suppressing
facts, as you've spoken to. I think what we need to get to is a
place where engagement--user--you know, user interactions is
not the driving force of what content is promoted.
Mr. Obernolte. Right. OK. Well, thank you very much. It's
been really interesting. I look forward to the rest of the
questions. Despite what's on the clock, I'm sure I'm out of
time, so, Mr. Chair, I'll yield back.
Chairman Foster. Thank you. And I guess, if there's Member
interest, we can certainly entertain a second round of
questions here, because this is--I can't imagine a more
important subject, actually, right now. And so I'll now
recognize my colleague from Illinois, Mr. Casten, for five
minutes.
Mr. Casten. Thank you, Mr. Chairman, and thanks to our
witnesses here. This is really fascinating. The--about three
years ago, relevant that this was before COVID, and I feel
somewhat prescient in an angry way, Mark Zuckerberg testified
before the Financial Services Committee, and I asked him in the
first instance whether they would suppress anti-vaccine
information if it came from Jenny McCarthy's Facebook page, and
then separately whether they would suggest--suppress
information from the American Nazi Party if it came from Art
Jones's Facebook page. Art Jones, at the time, had just won the
Republican nomination to run for Congress in Illinois's 3d
congressional District. His answers were unsatisfactory, and
seemed to suggest that the content of the information was one
question, the speaker was another.
I mention that because the recent Wall Street Journal
reporting that they are, in fact, whitelisting certain high-
profile people suggests that this problem has not been solved.
And I'd like to start just with Ms. Edelson, because it sounds
like you've spent a lot of time thinking about this. Do you see
a disparate approach to information protocols depending on the
speaker in your research as we sit right now? Sorry, I think
you're muted.
Ms. Edelson. There certainly currently exists, you know, as
we all now know, two separate systems on Facebook, where some
speakers are effectively not moderated at all, and then there's
everyone else. I think this is almost entirely backwards,
because what Facebook has set up is a situation where these
speakers who have the widest reach are free to spread, you
know, whatever lies they choose, and it will take a long time
for Facebook to act, and often Facebook won't act at all.
I think that we do--that--you know, this is where I think
there is a difference in how we think about content moderation
versus how we think about content promotion. I think that
speakers that have a bigger audience should have a bigger
responsibility to ensure that the information that the
platforms spread on their behalf to their audiences is factual.
Mr. Casten. Yes. That--I think we're all fond of the
framing that freedom of speech and freedom of reach are two
separate things, and I think sometimes we allow them to amplify
horrible messages that would go away if we just limited it to
freedom of speech.
My next question, I want to start with Mr. Mislove, but I--
if we have time, I'd love all of your thoughts on this. I
totally agree with your idea that we should have this data
shared and available for research. At the same time, there's an
implicit premise behind that that says that the data we provide
on social media platforms does not belong to us, and the
custodian of that data is now the firm that has the data. And
the--I personally have been rather persuaded by Roger McNamee
in his writing, that if we gave--if we essentially made sure
that everybody is the custodian of your own data, and all of
your own metadata, and all of that data was portable, we would
essentially end up with a much healthier social media
environment because the--essentially there wouldn't be this
walled garden, and the conflict of interest where the company
that has information about where you traveled last week, who
you were with, what you bought, had that information to share.
And I realize that's a long list, and gets a little bit
beyond the purview of this Committee, but if we were to wave a
wand tomorrow and change the premise such that everybody owned
their own data, that they could opt into sharing that data, and
the metadata around their data, so that they truly had
portability so that they could still say, I actually find it
useful that this device knows where I am, and where I want to
go, and can have all the automated--if we were to do all that,
does that change the environment that you would have where
essentially we would have to get sort of permission for the
data from the public, rather than from the companies, that we have,
without really questioning, assume that they're the custodians
of the data? So, Mr. Mislove, start with you, because I see
you're nodding your head so vigorously, but I welcome all of
your thoughts on that question.
Dr. Mislove. That's a great question. I'm sure my other
panelists will have similar thoughts. So one is that--what
you're talking about is essentially sort of democratizing the
ownership of data, which there have been a number of proposals
to do in--at least in the computer science research literature.
It--you know, those sorts of things have some technical
challenges, but I think those are solvable. But I think one way
you could move toward that is give users legal rights over the
data to--that these companies already have on them. So, for
example, Facebook allows you to extract your data from the
site, but there's many things they don't provide you. We have
some information on that. And this--if you allowed users to
have the legal right to say, give me all of your data on me,
that would enable many more research studies, because you could
then get users to contribute their data themselves, with
consent and so forth. So the--you know, do--what you're saying,
I think there's a number of different ways to tackle it, but it
would make significant progress toward enabling researchers to
be able to study these systems.
Mr. Casten. And I realize we're out of time, and we may
come back, but I would be curious if that changes, because now
every individual user would have to consent to sharing the data
with you to do their research, as opposed to saying to
Facebook, just give me the data.
Dr. Mislove. Absolutely. It--I mean, in some ways it would
make it more challenging, but at the same time we've done those
sorts of studies. Like, we've recruited users of--you know,
Laura has a whole study where she did exactly that. So there--
you know, there's precedent for doing it, and it's something
we're used to doing.
Mr. Casten. OK. Well, I'm out of time. I yield back. Thank
you.
Chairman Foster. Thank you. I'll now recognize my colleague
from Colorado, Mr. Perlmutter, for five minutes.
Mr. Perlmutter. Thank you, Mr. Chair. A couple comments,
and then some questions. So one, I have to applaud our Ranking
Member, and our Chair, and to the panel, it is the Science
Committee, and between the two of them, they are able to weave
in quantum mechanics, and usually the Theory of Relativity,
into every panel. So--and I just--I want to congratulate the
Ranking Member on getting quantum mechanics into this panel.
So--No. 1.
No. 2, to the Ranking Member--and, you know, I guess the
concern I have, and the general concern that you've raised
about misinformation and censorship, I think in this day and
age I'm very concerned about The Big Lie, about Joseph
Goebbels, and the ability to promote, and promulgate, and
propagate The Big Lie. And--so I'll start with you, Ms.
Edelson. And, you know, obviously the Anti-Defamation League is
something always concerned about the truth. So you said in your
op-ed in the Times, ``In the course of our overall research,
we've been able to demonstrate that extreme, unreliable news
sources get more engagement, user interaction on Facebook, at
the expense of accurate posts and reporting. What's more, our
work shows that the archive of political ads that Facebook
makes available to researchers is missing more than 100,000
ads.'' Can you elaborate on those two sentences, first about--
you know, and you've talked about it a little bit, but how do
you know that this misinformation really is able to spread
farther and faster than accurate stuff? You're muted.
Ms. Edelson. So the way that we know that is we use
Facebook's own tools. We use Facebook's own business
intelligence tools for understanding how content spreads, how
it engages, because that is very much what Facebook wants its
users to do. It wants its users to create content, to create--as
engaging content as possible, because that is Facebook's
business model. It is a user engagement maximization engine,
and then it sells that engagement to advertisers. So we used
those tools to study, you know, what Facebook told us their
users interacted with the most, and that is what we found.
I want to be clear about one thing. I don't think Facebook
chooses to promote--it has not sat down and made the choice, we
will promote misinformation. What it has done is it has chosen
to promote the most engaging content. And when its own internal
research told it that the most engaging content was
misinformation, it was the most polarizing content, it was
hateful content, it didn't do anything about it. It was a
conscious choice not to take steps that would increase the
quality of its information ecosystem, but would also decrease
engagement. And the reason why is ads. Ads are Facebook's
business, and, you know, one of the reasons that that finding,
you know, that finding that there are many, many, many ads and
advertisers who slip through the cracks is that Facebook isn't
willing to make its ad platform more secure, more trustworthy,
because that would make its ad experience worse, and it would
cost it money.
Mr. Perlmutter. So let me just stop you because I've got
all scientists on here, or engineers, except I'm the lawyer,
and at some point it moves from unintentional to intentional.
And--so that would be my argument. And so I want to turn to
Professor Leicht for a second. So--and a number of you brought
up, you know, would you trust the information that you might
get from The Tobacco Institute. And here--so--now, Ms. Edelson
is relying on their tools. I mean, how would you approach this
thing? Would it be any different than she has, to try to figure
out what's going on here? I mean, she's used their own tools to
prove a case against them.
Dr. Leicht. Yes. Well, I would trust her research in part
because the tools are what--the tools are an integral part of
their business model. So if the tools don't work somehow, or
don't promote more engagement, then the company doesn't make as
much money. So unless, through the tools, they are somehow
feeding her false information that's specifically bespoke and
just sent to her, I would be inclined to trust that. But it is
another situation where we are basically trusting them, but on
the other hand, some of what she's getting access to is sort of
behind the wall, or behind the veil, and so--and it's tied to
how they make money, so I tend to trust that.
Mr. Perlmutter. Thank you. I yield back, Mr. Chair.
Chairman Foster. All right. And I guess we now have time
for a second round of questions, so I will recognize myself for
five minutes.
The first question--you know, how do you publish
information here, where the tools that you use are likely to be
altered or abolished underneath, you know, your feet? And so,
you know, scientific reproducibility, it's the touchstone of
everything, seems to be hard to get to. And some of you touched
on that in your testimony. I'm just wondering what--the
conflicts you see there, and reasonable solutions to them. I
think any one of you, just----
Dr. Mislove. Just to clarify, did you mean that--how do we
study this system when it's changing constantly, and it--you
know, our access could be revoked at any moment?
Chairman Foster. Correct. And that the access may not be
granted to someone who wants to reproduce your results.
Dr. Mislove. Um-hum. Yes. No, that--you're absolutely
right, that's a real problem for us. When we act as an
advertiser, we keep logs of everything, so we have--we get
copies of all of the data on our ads, because, like, our
accounts could be shut down at any moment, and as a result,
we'd lose access to our scientific data. But it is challenging
because there are other, you know, features of the platform
that one can only access when one has been in the platform for
a long time, and so we have access to some of those. And that
would mean that other groups would have significant trouble
being able to reproduce our results. That's why I think a more
sustainable solution would be one where the platforms are
required to make data available, so then the other researchers
could analyze that data in a way Professor Leicht talked about,
and reproduce any analysis that comes out.
Ms. Edelson. I just wanted to quickly follow onto Professor
Mislove's testimony, because there are also some really
perverse incentives here. So, for example, Dr. Mislove is the
absolute expert in Facebook--in ad--the Facebook advertiser
view, but my team engaged in a little bit of that once
ourselves. We found a security vulnerability in the Facebook
advertising process. I can't say too much about this,
unfortunately, because it is a security vulnerability, but we
reported it to Facebook, and when we did report that to
Facebook, Facebook terminated our advertiser account, so we
couldn't continue that work. And that's--I know I'm not the
only person that that's happened to.
Chairman Foster. Yes. So--some of your work involved
basically making a Chrome add-on, and so--Facebook had some bad
experience with add-on tools, with Cambridge Analytica and so
on, so I can understand how they're a little bit reticent to
let people make add-ons of various kinds.
Ms. Edelson. Actually, this was totally separate from that.
Chairman Foster. No, I understand it was a different
mechanism, but it's sort of a similar approach, where someone
claims to be doing research, and in fact are--is doing
something much more nefarious. And so I can--you know, they--it
will cost them money to do due diligence on people that claim
that they're doing research. And so it--you know, that's--it's
just one of the many tensions we're under on this. Do you think
the best solution is actually not to have to rely on, you know,
essentially spyware that people opt into on their browser, and
just say--and just provide, under controlled circumstances,
direct access to the huge data base of all user engagement?
Ms. Edelson. I mean, frankly, yes. I think moving toward a
world where platforms do not, you know, do not have the--are
not the final authority on who gets to study them, that's
probably a much healthier environment. I mean, you know,
tobacco companies--I forget who made this analogy earlier, but
tobacco companies don't get to decide who does research on
smoking, and the idea that social media companies get to decide
who studies them is perverse.
Chairman Foster. Yes. Dr. Leicht?
Dr. Leicht. If I could add to that, the way social media
dialog has taken hold in American society, you know, social
media posts, and the sharing of, is really a public record of
our communication with each other, so it's an awful lot like
other forms of public records about communication with each
other that we store in places like the Library of Congress, or
something. So historians, someday, are going to look back at
this era, and they're not going to have a very good perception
of what's going on because they're not going to have any access
to any of the original social media posts that a lot of our
discussions were based on, and that's going to not be a good
situation at all.
Chairman Foster. Yes. Dr. Mislove?
Dr. Mislove. Yes, I'll just add on to that to say that
the--it would--to echo Ms. Edelson's point, that the current
ways that these platforms make data available often allow you
to find the malicious actors on their platforms, for example
the purveyors of misinformation, right? But they don't allow
you to look at the role that the platform itself plays in
amplifying that information. So, specifically, we try to study
the algorithm, and the data made available via the ad library
and other tools don't allow us to tease out what the algorithm
is doing versus the malicious actors. So having a regime where
Congress would require all data to be released to be able to be
studied would allow us to tease out both the malicious actors,
as well as the role of the platform itself.
Chairman Foster. Thank you. And my time is now up. I'll
recognize Representative Obernolte for five minutes.
Mr. Obernolte. Thank you very much, Chairman Foster. So,
you know, for the second round, I'd like to take us, like, up
to 30,000 feet. We've been talking about, you know, the
specific subject matter of this hearing is how do we eliminate
the barriers to data to allow researchers to conduct research
into the way that misinformation spreads on social media,
right? But the big goal here is to try and figure out how to
stop the spread of misinformation, which a lot of people have
raised different examples of how it's been destructive over the
last couple of years. And I have to say, I am not optimistic
about this. I'm a pessimist. Ms. Edelson, you were talking
about the fact that maybe Facebook hasn't--has not deliberately
chosen to provide misinformation, and I know Congressman
Perlmutter was skeptical about that. I'm skeptical too, and I
don't think it's ever going to be reasonable to think that the
data that you're getting voluntarily out of these platforms is
going to be unbiased. I mean, there's too big a commercial
incentive there.
So I'd like to talk about the business model. And let me
also say that, you know, there's been testimony that perhaps
a--some kind of framework around users owning their own
personal data would solve this problem, and I have to tell you
emphatically, I don't think it will. I was--when I was in the
California legislature, I was deeply involved in the drafting
of the California Consumer Privacy Act, so I know a lot about
it, but the problem here is not data, and its connection to
users. The problem is that these companies have a business
model that's based around user engagement. And, you know, they
can't even articulate to you, probably, in some senses, how
that works, because if you're--if it's a machine learning kind
of thing, that's--the goal of--it has the goal of maximizing
user engagement, you know, you might not even know that it's
promoting this information because, you know, we don't get that
kind of information back out of these algorithms. So I'm very
skeptical that this is going to allow us to solve the problem.
And I'm wondering your thoughts on this question. You know,
should we be focusing more about--on the model. You know, this
model where Facebook and Twitter provide you this service for
free, and if you don't know how it's being monetized, if you
don't know what the product is then the product is you, right?
That's what economists say, and that's what it is. They're
selling this user engagement. And the reason why you can't pay
a monthly subscription fee to Facebook to avoid their
advertising is that people would be horrified if they knew what
it would cost you, how much money they're making off of each
user. So how--here's the question to you. How do we avoid that? I
mean, do we outlaw business models like this? Do we need more
transparency? What's the ethical way of dealing with this
issue?
Ms. Edelson. That's a great question, and I think the meat
of what you're asking is how much is this a systemic issue? And
I think the answer is you're right, there is probably an
inherent systemic problem with platforms that--whose business
model is built around maximizing user engagement. I think--you
know, I hear the tobacco company analogy a lot. I think I
personally prefer maybe a pharmaceutical company analogy,
because there are good things that come out of social media
too, but there are certainly a lot of problems that can happen.
You know, social media addiction is a very real thing. I
think that we may be going toward a world where, you know, we
can acknowledge that there are good things about social media,
and there are risks, and there are harms, and some of these
risks and harms are particularly acute for the youngest users.
And I think, in a framework like that, you know, we probably
need some regulation for this industry, in the same way that we
have regulation for pharmaceutical companies, we have
regulation for banks. I think, in Chairman Foster's testimony,
he--you know, he spoke about this analogy as well, and I think
it's an apt one, you know, where we--I think this is something
that's important for society, but we all need much better
auditing and transparency of how these platforms function.
Mr. Obernolte. Thank you. Any other thoughts about whether
or not we need to focus more around maximizing user engagement
as a business model? Dr. Leicht?
Dr. Leicht. I--that--I wanted to say, another way of
attacking the business model, or of making the business model a
little bit more benign, might be to allow more competition for
social media in the first place. So social media is dominated
by a very small number of companies that sort of dominate the
entire landscape, and if there were more competition over users
themselves, and users' attention, then the abuse of the users
could probably be reduced somewhat, or I would think it would
be--at least be possible that would happen. So that might be
one direction to go as well, if the business model itself can't
be directly attacked.
Mr. Obernolte. Sure. I've thought about that too, that
maybe--you know, similar to e-mail. You know, when I send you
an e-mail, you and I don't both have to be on Gmail for you to
read what I'm saying, and so maybe we need to think about
social media a different way. When I post something, maybe it
goes out to everybody, and it's out there in the metaverse,
and, you know, if you choose to look at it on Facebook, that's
your choice. But I don't know that that solves the bigger
problem.
But--I mean, I really think that we, as a society, need to
look at this, and also realize, and this is the reason I'm
pessimistic--realize that, because there is such a strong
commercial incentive, that no matter what we do, it's going to
be an uphill battle. I mean, it's like counterfeit tax stamps
on cigarettes, right? The commercial incentive for doing that
is so strong that no matter how much resource you devote to
enforcement, you're still going to have the problem. And, you
know, I think that's the ethical situation we find ourselves in
with social media. Anyway, my time's expired. I'd love to
continue with the conversation, but thanks, everyone, for being
here, and thanks for the fascinating discussion.
Chairman Foster. And, in fact, it appears as though there
are enough interested Members with questions that I would
entertain a third round, so if you want to get your--get with
your staff and if you're interested and let me know, and we'll
consider that. I will now recognize Representative Casten for
five minutes.
Mr. Casten. Thank you, pleasure to be back. Professor
Leicht, in your testimony you said that the companies have a
conflict of interest with regards to researching and policing
their own content because the goal of social media companies is
attention and engagement, and if extreme content produces that
attention and engagement, that means more profit. We saw
recently that Facebook's own--I think Facebook's own internal
analysis was that the majority of people who join hate groups
on Facebook join at the recommendation of a Facebook algorithm.
Now, I realize I'm going to ask you to speculate a little bit,
but, to the extent that engaging with extreme content drives
engagement on the site, can one reasonably assume that Facebook
and other social media companies, either by individual or
algorithmically, know where the extreme content is, know the
consequences of the extreme content, and are actively
encouraging you to engage with it?
Dr. Leicht. That is certainly possible. I think they--I
think that the truth is, because a lot of the sharing is done
by the algorithm itself, much as Representative Obernolte said,
they probably don't, you know, personally know that this is
happening, but they don't really do anything to stop it. So
they certainly--so in that sense, especially in the extremist
cases, you could be heading toward a--the situation
Representative Perlmutter was talking about, where there's sort
of almost active negligence here.
Mr. Casten. Yes. And I guess, you know, there's a liability
question there, but in a lot of other venues, you know, if I
had a high speed trading fund that was actively profiting from,
you know, that I was anticipating, you know, I don't know,
Russian invasions of Crimea, whether or not I did that or the
algorithm did that, I might be concerned about the reputational
damage that would come from my fund trading on such
information, right? But let me----
Dr. Leicht. Certainly true, yes.
Mr. Casten. Let me then take that to a more specific
question, because that's a general question, but let's be very
specific. A couple weeks ago we recognized the 20th anniversary
of 9/11, and among the things we recognized was the complete
heroes on Flight 93 who, in a largely pre-internet era, on a
plane, within 10 minutes were able to deduce that there was
about to be a terrorist attack on the United States Capitol and
got together to stop it. Is it reasonable to assume that in the
more recent attack on the U.S. Capitol, given how much was
being amplified on Facebook, that a bunch of smart computer
nerds at Facebook had knowledge a priori of what was being
organized? Because those 40 people on 93 figured it out.
Dr. Leicht. I think it's possible. It's also possible that
nobody at Facebook actually bothered to pay attention to what
their algorithms were recommending. So whether there was
deliberate promotion or deliberate--or--a better description
would be, I suppose, benign neglect of what the algorithm was
doing. In either case, there's--there are invidious problems
there, you know, whether----
Mr. Casten. You know, I guess----
Dr. Leicht [continuing]. An actual person was involved or
not.
Mr. Casten. I guess we get into a question--and I see Dr.
Mislove and Ms. Edelson raising their hands, so let me just--
but I do want to make--just make clear that sometimes we get
caught in our own knickers when we say, sure, something is
immoral, but it's not illegal, so it must be OK. For my money,
if I had the capability to anticipate that there was going to
be an attack on the U.S. Capitol and I didn't give a damn,
there has to be some responsibility there. Shame on us if it's
not illegal, but my goodness, don't look the other way. Ms.
Edelson, I know you--I saw you wanting to comment there.
Ms. Edelson. Yes. I'm sorry, this is really--I worked on
Wall Street on 9/11. That's--that was a bad day. That was a
really, really bad day. And I remember the morning of January 6
because I told my team that morning that I thought it was going
to be a bad day, because this is, you know, this is what I live
and breathe. I look at this stuff every day, and it's awful.
I don't know if anyone at Facebook knew it was going to be
a bad day. I don't work there. But one of the things we do know
is that their internal research has been telling them about the
extremist problem for years. They knew that their algorithm was
promoting hateful and extremist content. They knew that there
were fixes. They knew that those fixes might come at the cost
of user engagement, and they chose not to put those fixes into
place. So as to whether anyone knew on January 6, I don't know,
but they knew about the problem, they knew how to fix it, and
they chose not to.
Mr. Casten. Thank you. I yield back, unless the Chair would
like to allow Dr. Mislove to comment.
Chairman Foster. I'll--yes. If you can give a 30 second----
Dr. Mislove. I'll just add on that the fact that--like,
the--your question goes at the heart of this hearing, which is
that we--that--it's a question that we don't know the answer
to, and as researchers, as outsiders, we don't have the ability
to answer. So that--so, essentially, it's really pointing out
exactly why, you know, legislation in this area really is
needed. I will say that what we do know is that when we have
run political ads, we became a political advertiser and ran
that, we do see exactly the echo chambers that you--that could
lead to these sorts of things. When we run ads, they deliver
more right wing messages toward more right wing users, and vice
versa for left wing messages. So we know the algorithm has
these effects, and it's incredibly important that we understand
how those are playing out in the ways that you're alluding to.
Mr. Casten. Thank you. I yield back.
Chairman Foster. Thank you. And I'll recognize Mr.
Perlmutter for five minutes.
Mr. Perlmutter. All right. Well, that exchange was
particularly sobering. Sean, nice questions. I think you
mentioned one thing about reputational damage, and Professor
Leicht, you know, talked about the market control that these
companies have. If you're a monopolist, it's hard to have
reputational damage. I mean, you've got it. You--you're it. It
doesn't matter. There's nobody else to go to. So my question is
much more--kind of baseline, for me. In the introduction, I
don't know if it was Bill that talked about it, or one of the
panelists, talked about sort of the ability to study Twitter
versus the ability to study some of the others, particularly
Facebook. Can somebody explain that to me? That it was
expensive for Twitter, but at least it was possible. So I just
open it to the panelists.
Ms. Edelson. So Twitter has a--what's called the Firehose
API. You can buy access to, you know, all of Twitter--well, a
fraction of it, and there are researchers who do this, but it
is quite expensive to use. There are also some--Members of this
Committee will appreciate the replicability issues that we
face, because there are some issues with data portability, but
this is why Twitter is the best-studied platform. Alan?
Dr. Mislove. And we have historically gotten access to
exactly that Firehose API, which is really useful, and Twitter
deserves credit for making that available. I will note, though,
that it is an incomplete view. It doesn't cover much of the ad
targeting information that we've talked about in this hearing,
it doesn't cover delivery information, and so forth. It really
lets you get a view of a random fraction of the public content
shared on Twitter.
Ms. Edelson. And then CrowdTangle has a view to public
pages and groups on Facebook and Instagram, and there is both a
web portal and an API. That's what folks who ingest large
volumes of data, such as I used to do, use. And then, for
platforms like YouTube, we really don't have anything. There's
just--that really is a black box. TikTok is a black box.
Mr. Perlmutter. OK. Thank you. I yield back, Mr. Chair.
Chairman Foster. Thank you. And it's my----
Mr. Perlmutter. And this----
Chairman Foster [continuing]. Understanding----
Mr. Perlmutter. This has been--I just want to say, this has
been fascinating. I've got to leave, but if we have some follow
up hearing at some point, I think it would be fantastic. So
thanks to the panelists.
Chairman Foster. Thank you. And, let's see, I--it's my
understanding that Representative Obernolte, and potentially
Mr. Casten, are up for another round of questions. Is that--all
right, all right, well, then I think that's a quorum for that,
and we'll proceed. Let's see.
So when you think about, you know, data portability
standards, imagine that you're some startup social media firm.
Putting all of this apparatus on top of you is going to be a
huge operational cost. And so, you know, it seems likely that
we're going to have to make this--OK, until you've got a
million users, or something like that, to have a very light
touch on this. But at some point we're going to have to scale
the mandates here. And--so one way to make that less of a
burden is to actually, from the start, have data portability
and access standards that they can design their software
around, so from the start they can know that when we get big,
our data layout and so on is compatible with that. Is that
something that's been thought about? And just, you know, any of
you can grab onto that.
Ms. Edelson. So----
Chairman Foster. Otherwise there's a danger that we'll just
squeeze everyone but the big players out of the business with a
bunch of burdensome requirements.
Ms. Edelson. So I, along with some other researchers at
Mozilla and with the Wesleyan Media Project, as I mentioned, we
published a technical standard for universal ad transparency.
There's a pre-print that's available right now, I'd be happy to
send it to you. We will be publishing it more formally soon.
When we looked at this issue, what we actually found is that we
think it will be less expensive for platforms to comply with
just general data access than it would be for them to have to
build the large public web portals that companies like Facebook
and, to a lesser extent, Google do provide for ads. Because
just shipping data is not actually that expensive, as long as
there is a standard format that they can comply with.
There's a different question here if we're talking about
other forms of non-ad data, organic data, because the volumes
of data get really, really large. The recommendation--so I--
this is something that I am working on developing a technical
standard for. I think our recommendation will likely not
require an archive. I think the recommendation that we'll be
making in a paper I'm developing is for public access, so we
could come to a place where there is programmatic access to the
same content that is publicly available, and meets some other
thresholds. And that is given, again, to--you know, to
researchers who have registered for our program.
And I think, again, as long as there is a standard in
place, complying wouldn't be terribly expensive. I do think
there is a competitiveness concern, so I do think that probably
there's going to be a minimum size threshold that goes into
place, but I think you are right that the research community
needs to do more here.
Chairman Foster. And when you talk about, you know, sort
of--people's right to have access to their data, one of the big
problems there is that a lot of the data is purchased from
third parties, and so what you're going to have to get to is
sort of an identifier for people, some unique identifier for
people, that they can stand up and say here, you know, this is
Bill Foster, you know, here's my--whatever my identifier is,
and everyone who has passed around data on me will have a duty
to respond to that request. And if they've sold it to someone,
or if you purchased it, you're going to have to maintain sort
of the chain of custody of who sold the data to who, to who, to
who, and keep that identifier around, and keep up a response--
you know, a duty to respond to that sort of request, either for
access to your own data, or deletion of that data.
And has--have people tried to write down such a system? How
that would work, how you'd prevent--how you'd avoid things like
identity fraud, and people stealing your entire data set by
claiming they were you? Has--have people attempted that sort
of--to design systems like that?
Dr. Mislove. I can speak a little bit to this, if it--if
that was to the panel. The--we've actually done a decent amount
of work looking at the data broker industry, which is sort of
where these concerns that you're bringing up are sort of the
most acute. In fact, many of the data brokers have actually
partnered, historically, with social media platforms for the
purposes of ad targeting, so that I could target people on
Facebook using data-broker derived attributes. And so the
upside of all of that is that the--in terms of the unique
identifier, they're--the industry is already doing this. They
need to join the Facebook identifiers with the Experian
identifiers, and, you know, we know that they're able to do it,
even though the information about how exactly they did it is
not public.
But the--in terms of sort of the identity theft, you know,
concerns you raise, that is absolutely a real concern. I will
say that there is a little bit of transparency on the data
broker industry, that, you know, like, there are certain sites
where you can go to see a limited snapshot of your data, and on
those sites they have identity verification procedures in
place. So I'm not concerned that that's not a solvable problem,
that, you know, this has already been solved in other contexts,
and so, if there were regulation in this area, I think that
would--you know, the technical problems wouldn't be the ones
that would come first.
Chairman Foster. Thank you. I will now recognize
Representative Obernolte for five minutes.
Mr. Obernolte. Thank you, Chairman Foster. Dr. Leicht, if I
could ask you about something that was in your written
testimony that I found very interesting? You were talking about
how research indicates that one of the primary catalysts for
the spread of misinformation is our inability as humans to
process an overabundance of information. And so I wonder if you
could elaborate on that for a minute, and then maybe throw out
some possible solutions to that problem?
Dr. Leicht. Yes. So I--well, unfortunately, that's a
problem of the end user. So there's some research that suggests
that a lot of misinformation is spread not necessarily because
a person is intending to spread misinformation, but because
they're bombarded with so much information they're not spending
time to cognitively process what they see, so they just forward
on posts that look interesting or attractive. And that's--you
know, that, I think, is a problem that psychologists have been
talking about for years, not only with social media, but in
other areas where we're just overloaded with information all
the time, and so our ability to process it isn't very good.
One of the solutions to that seems to be to sort of
interrupt the automatic process that seems to go on when we
read social media sites. So one of the promises of labeling is
that--I mean, if you're reading a set of social media posts,
and then you come upon something that is labeled, that actually
jars you out of this tendency to want to immediately share
something and gets you to think about whether you want to share it
or not, and so it actually slows the process down. And that's a
way, then, to get people to think about a specific thing
they're reading, and not necessarily this specific thing as one
of 200 things I'm reading, and they're all the same. So this is
going to be a pervasive problem that is going to be very hard
to deal with, but some forms of labeling may help interrupt the
process so that just automatic sharing, using essentially our
brain stems, is stopped.
Mr. Obernolte. Interesting. So, I mean, what you're talking
about is kind of a supply side solution to the problem, where
social----
Dr. Leicht. Yes.
Mr. Obernolte [continuing]. Media companies would be--you
know, would be interjecting this in a--you know, in a
deliberate effort to combat the spread of misinformation. But
I'm wondering if there might be a demand-side solution. And,
Dr. Mislove, maybe I'll ask you about this. You know, is part
of the solution perhaps increasing our technological literacy?
So, you know, in other words, when--you know, we know that
alcohol addiction is a problem in society, right? So we solved
that problem, you know, to the extent that we have solved it--
we solve it with education, right? If you know you've got
alcoholism that runs in your family because there's--the
genetic component, you know, if you know that alcoholism can
occur, you know, perhaps that you're a little bit more careful
about monitoring how many drinks you take, right?
And so I'm wondering if there's--isn't an educational
component, like we make people aware of this phenomenon, of how
misinformation spreads. You know, we make people aware that
you've got confirmation bias, and so that makes you--when you
read a piece of misinformation that fits right into your
worldview, you're more likely to believe it. You know, and then
that way maybe we encourage people to verify the veracity of
something before they share it. I mean, is there anything to
that, or, you know, or does it have to be a supply side
solution?
Dr. Mislove. I'll--I think it's a great question. I'll
admit it's not my area, so I am truly speculating here, and
I'll defer to some of my other--the other panelists to perhaps
provide a more detailed answer, but I would think so, and I
think--I'll point you to--I know Twitter has recently done a
number of things where, if you go to retweet something, but you
haven't clicked the link, it will ask you, are you sure you
want to do that? Maybe you should read the article first. And
so it seems like those are----
Mr. Obernolte. Maybe you should go to Snopes as well.
Dr. Mislove. Maybe you should go to Snopes as well. So I
think those are inching in the direction of what you're talking
about, but some of my--some of the other witnesses may have a
more detailed answer.
Mr. Obernolte. Sure. Anyone else?
Ms. Edelson. The only thing I'll say is that I suspect some
kind of demand-side solution, as you refer to it, is going to
be necessary, but we don't know what that will look like. It
could come in a wide range of forms, and this is actually one
of the reasons we need data, because we really do need to start
working on solutions, and we need an answer to that question.
Mr. Obernolte. OK. Well, thanks everyone. It's been a
really fascinating hearing, and thank you, Mr. Foster, for
catalyzing this whole discussion. I've really enjoyed it. I
yield back.
Chairman Foster. Thank you. And we'll now recognize
Representative Casten.
Mr. Casten. Thank you, and I echo that this has just been
fascinating, and I'm sorry you didn't have the Full Committee,
and everybody participating, but I'm actually kind of glad
because we've gotten to follow up, and go into a little bit
more depth than we usually do.
Ms. Edelson, shortly after you released your results, which
found that people who rely on Facebook for information have
substantially lower vaccination rates than those who rely on
other sources, Facebook cut off your access to data. I think
your research said that people who rely exclusively on Facebook
for news, 25 percent of them do not intend to get vaccinated.
Now, I understand, and I appreciate in your text--I think
you said Facebook is using privacy as a pretext to squelch
research that it considers inconvenient, and that--I worry
sometimes that that sounds like, well, we don't do some
research, how much does that really matter? With--I realize
we're all math and science nerds here, at least since Mr.
Perlmutter has not been able to continue, but at core this is
an epidemiological question, right? If we know that certain
behaviors increase the rate of spread of a communicable
disease, the rate of contraction of communicable disease, there
are consequences. And we--you do epidemiology right, people
live. You do it wrong, people die. Can you speak at all to the
consequences of your inability to do what is at core
epidemiological research?
Ms. Edelson. So I just want to first say the study you're
referencing, although it certainly aligns with my work, was
done by David Lazer. Excellent work, that I can recommend. But,
yes, I think you're right. Misinformation--I'm willing to say
this. This misinformation is killing people. We have had a safe
and effective vaccine for COVID for a long time now. We're back
over 2,000 deaths a day. Facebook is not the only reason this
is happening, but it's certainly contributing, because of
exactly that study you cite, and that I personally keep in
mind.
Right now there is vaccine misinformation that is
widespread and easily available on Facebook. I know this
because I have colleagues who still do have access to Facebook
who find it and try to report it every day. And it's really,
really hard for those folks, because they do not feel like the
platforms are their allies in this. And, again, this is
something that Facebook's own research has pointed to, and
Facebook has just chosen not to fix.
Mr. Casten. Feel like we're back where we were in the last
line of questioning. They know they are causing harm, and
choosing not to act. I see a lot of head nods. I'm just getting
depressed, so I'm reluctant to ask any more questions. But, Dr.
Leicht, Dr. Mislove, anything you'd like to add there?
Dr. Mislove. Yes. I mean, I'll just very briefly echo
exactly everything Ms. Edelson said, and say that, you know,
essentially what you're trying to get at is, you know, how do
we fix this? And we've talked to this--in this hearing about a
number of, you know, supply side, demand-side, and so forth,
but ultimately I feel like, as a scientist, you know, I need to
be able to diagnose the problem before I can, you know,
understand how to design fixes that will address the problem,
and currently we don't have the tools able to do that. We don't
know the--you know, how much of the role that the platform is
playing, versus the malicious actors that were referred to
earlier.
And so I think, for me, you know, sort of going with the
phrase, you know, sunlight's the best disinfectant, just being
able to understand it can then enable us to develop, you know,
mitigations, regulations, whatever it is that would address the
issues that we're seeing.
Ms. Edelson. Just to follow up with that, if the platforms
wanted to do one thing today to help start to deal with this
problem, reinstating my account, broadening access to
CrowdTangle, would be the most immediate steps they could take,
because there are many researchers who want to find answers.
They want to be part of the solution, and Facebook is just
refusing any help.
Mr. Casten. At the risk of being crass, it would seem to be
the bare minimum to demonstrate that they give a damn. Thank
you all. This has been truly fascinating, and I yield back.
Chairman Foster. Thank you, and, before we bring the
hearing to a close, I just want to also thank our witnesses for
testifying before the Committee today. The record will remain
open for two weeks for additional statements from the Members,
and for any additional questions the Committee may ask of the
witnesses. The witnesses are now formally excused, and the
hearing is now adjourned.
[Whereupon, at 11:35 a.m., the Subcommittee was adjourned.]
Appendix
----------
Additional Material for the Record
[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]