THE DISINFORMATION BLACK BOX: RESEARCHING SOCIAL MEDIA DATA

[House Hearing, 117 Congress]
[From the U.S. Government Publishing Office]


                     THE DISINFORMATION BLACK BOX:
                     RESEARCHING SOCIAL MEDIA DATA

=======================================================================

                                HEARING

                               BEFORE THE

                     SUBCOMMITTEE ON INVESTIGATIONS
                             AND OVERSIGHT

                                 OF THE

                      COMMITTEE ON SCIENCE, SPACE,
                             AND TECHNOLOGY
                        HOUSE OF REPRESENTATIVES

                    ONE HUNDRED SEVENTEENTH CONGRESS

                             FIRST SESSION

                               __________

                           SEPTEMBER 28, 2021

                               __________

                           Serial No. 117-31

                               __________

 Printed for the use of the Committee on Science, Space, and Technology
 


       Available via the World Wide Web: http://science.house.gov
       
                              __________

                    U.S. GOVERNMENT PUBLISHING OFFICE                    
45-497PDF                 WASHINGTON : 2022                     
          
-----------------------------------------------------------------------------------   

              COMMITTEE ON SCIENCE, SPACE, AND TECHNOLOGY

             HON. EDDIE BERNICE JOHNSON, Texas, Chairwoman
ZOE LOFGREN, California              FRANK LUCAS, Oklahoma, 
SUZANNE BONAMICI, Oregon                 Ranking Member
AMI BERA, California                 MO BROOKS, Alabama
HALEY STEVENS, Michigan,             BILL POSEY, Florida
    Vice Chair                       RANDY WEBER, Texas
MIKIE SHERRILL, New Jersey           BRIAN BABIN, Texas
JAMAAL BOWMAN, New York              ANTHONY GONZALEZ, Ohio
MELANIE A. STANSBURY, New Mexico     MICHAEL WALTZ, Florida
BRAD SHERMAN, California             JAMES R. BAIRD, Indiana
ED PERLMUTTER, Colorado              DANIEL WEBSTER, Florida
JERRY McNERNEY, California           MIKE GARCIA, California
PAUL TONKO, New York                 STEPHANIE I. BICE, Oklahoma
BILL FOSTER, Illinois                YOUNG KIM, California
DONALD NORCROSS, New Jersey          RANDY FEENSTRA, Iowa
DON BEYER, Virginia                  JAKE LaTURNER, Kansas
CHARLIE CRIST, Florida               CARLOS A. GIMENEZ, Florida
SEAN CASTEN, Illinois                JAY OBERNOLTE, California
CONOR LAMB, Pennsylvania             PETER MEIJER, Michigan
DEBORAH ROSS, North Carolina         JAKE ELLZEY, Texas
GWEN MOORE, Wisconsin                VACANCY
DAN KILDEE, Michigan
SUSAN WILD, Pennsylvania
LIZZIE FLETCHER, Texas
                                 ------                                

              Subcommittee on Investigations and Oversight

                  HON. BILL FOSTER, Illinois, Chairman
ED PERLMUTTER, Colorado              JAY OBERNOLTE, California,
AMI BERA, California                   Ranking Member
GWEN MOORE, Wisconsin                VACANCY
SEAN CASTEN, Illinois                VACANCY
                         
                         
                         C  O  N  T  E  N  T  S

                           September 28, 2021

Hearing Charter

                           Opening Statements

Statement by Representative Bill Foster, Chairman, Subcommittee 
  on Investigations and Oversight, Committee on Science, Space, 
  and Technology, U.S. House of Representatives
    Written Statement

Statement by Representative Jay Obernolte, Ranking Member, 
  Subcommittee on Investigations and Oversight, Committee on 
  Science, Space, and Technology, U.S. House of Representatives
    Written Statement

Statement by Representative Eddie Bernice Johnson, Chairwoman, 
  Committee on Science, Space, and Technology, U.S. House of 
  Representatives
    Written Statement

                               Witnesses:

Dr. Alan Mislove, Professor and Interim Dean, Khoury College of 
  Computer Sciences, Northeastern University
    Oral Statement
    Written Statement

Ms. Laura Edelson, Ph.D. Candidate and Co-Director of 
  Cybersecurity for Democracy at New York University
    Oral Statement
    Written Statement

Dr. Kevin Leicht, Professor, University of Illinois 
  Urbana-Champaign Department of Sociology
    Oral Statement
    Written Statement

Discussion

              Appendix: Additional Material for the Record

Statement submitted by Representative Bill Foster, Chairman, 
  Subcommittee on Investigations and Oversight, Committee on 
  Science, Space, and Technology, U.S. House of Representatives
    Imran Ahmed, Chief Executive Officer, Center for Countering 
      Digital Hate

Visuals submitted by Ms. Laura Edelson, Ph.D. Candidate and 
  Co-Director of Cybersecurity for Democracy at New York University

Letter submitted by Accountable Tech, et al.
    ``Facebook's Stonewalling of Research into its Role in the 
      Capitol Insurrection''

 
                     THE DISINFORMATION BLACK BOX:
                     RESEARCHING SOCIAL MEDIA DATA

                              ----------                              


                      TUESDAY, SEPTEMBER 28, 2021

                  House of Representatives,
      Subcommittee on Investigations and Oversight,
               Committee on Science, Space, and Technology,
                                                   Washington, D.C.

    The Subcommittee met, pursuant to notice, at 10:02 a.m., 
via Zoom, Hon. Bill Foster [Chairman of the Subcommittee] 
presiding.

    Chairman Foster. Well, the hearing will now come to order. 
Without objection, the Chair is authorized to declare recess at 
any time. And, before I deliver my opening remarks, I wanted to 
note that today the Committee is meeting virtually. I want to 
announce a couple of reminders to the Members about the conduct 
of the hearing. First, Members should keep their video feed on 
for as long as they are present in the hearing. Members are 
responsible for their own microphones. Please also keep your 
microphones muted, unless you're speaking. And finally, if 
Members have documents that they wish to submit for the record, 
please e-mail them to the Committee Clerk, whose e-mail address 
was circulated prior to the hearing.
    Well, good morning, and welcome to our Members and our 
panelists. We've--I especially appreciate your willingness to 
have the hearing rescheduled to a time when nothing is 
happening in Washington, D.C. and Congress. But thank you all 
for joining us for this hearing on researcher access to social 
media data. For years experts have been raising the alarm about 
how misinformation and disinformation spreads unabated on 
social media platforms. Long before ``fake news'' was an 
epithet aimed at influencing--anything conflicting with 
someone's own worldview, it described falsehoods presented 
maliciously as fact in order to influence opinions. The problem 
of misinformation is not a new one, but social media has fanned 
the flames, and it is now difficult to imagine political and 
social discourse untouched by its influence.
    The damage caused by misinformation reaches far beyond our 
phone and computer screens. Lies on social media have spawned 
riots and ethnic cleansing, and thousands of deaths around the 
world, and lies about the 2020 election inspired thousands to 
invade our Capitol on July--on January 6 in an attempt to 
disrupt our Constitution, and stop the certification of valid 
election results, resulting in five deaths. Lies about the 
severity of COVID-19 prevented millions of Americans from 
taking the disease seriously, resulting in needless infections 
and needless deaths. Vaccine disinformation is discouraging 
Americans from receiving safe and effective COVID-19 vaccines, 
extending the pandemic, and allowing new variants to 
proliferate.
    For years we have seen the harmful effects of anti-vaccine 
rhetoric, causing the re-emergence of diseases like measles 
that had been eliminated by vaccines. In fact, in July the 
Surgeon General declared that misinformation on social media 
was a public health hazard. Much of this misinformation, in 
fact, appears to be generated and amplified by our enemies, who 
recognize the damage that it does to our country. It is 
therefore imperative that the Science Committee address it as 
we would any other threat to public health, by ensuring that we 
have the best and brightest minds researching the problem so 
that we can base future policy on the best available evidence.
    Unfortunately, it's extremely difficult for researchers to 
gain sufficient access to social media data. Companies do make 
some information public, but it is largely through interfaces 
that they control, meaning that researchers can only see what 
the companies want them to see, and access can be cut off at any 
time. Today we will hear from our witnesses about the research 
they are able to conduct in this environment. They will tell us 
about the limitations of the existing tools, and what data they 
believe can and should be made public so that we can have a 
better understanding of how social media users interact with 
misinformation, and how that impacts their behavior online and 
offline. We will hear about how mis- and disinformation is 
delivered to social media users through the black box of the 
algorithm, drawing eyes to the sensationalist content that 
inspires the most user engagement, regardless of truth.
    We on the Science Committee understand that the very real 
limitations to full data transparency by social media is a real 
problem. Platforms will argue that some information should be 
protected as trade secrets, much as the computerized financial 
trading firms prize the opacity behind their sometimes abusive 
trading algorithms. At the same time, social media users are 
entitled to privacy, particularly of personally identifiable 
information. However, these concerns cannot be broad excuses to 
shield social media companies from a full outside accounting of 
how their platforms may be endangering public health and 
safety. We simply cannot leave social media unstudied. It is as 
influential a force on the social fabric of the 21st century as 
any other.
    But as it stands, advertisers on these platforms often 
enjoy more access to data than academic researchers looking to 
assess the impact of promoted posts. I believe that this--in 
this hearing we can have a constructive launching point to 
explore how the Science Committee can contribute to this 
conversation. We must strike a balance between protecting user 
privacy and confidential business information, while also 
acknowledging that objective, independent research is necessary 
to understand how these platforms influence modern society. 
We've solved this problem for electronic trading and financial 
services. We are solving this problem for academic access to 
electronic health records, and we must solve this problem here.
    I look forward to hearing from our panelists about how we 
can support their important work of shining a light onto the 
disinformation black box that is poisoning our public 
discourse.
    [The prepared statement of Chairman Foster follows:]

    Good morning, and welcome to our members and our panelists. 
Thank you for joining us for this hearing on researcher access 
to social media data. For years, experts have been raising the 
alarm about how misinformation and disinformation spreads 
unabated on social media platforms. Before ``fake news'' was an 
epithet, aimed at anything conflicting with someone's 
worldview, it described falsehoods presented maliciously as 
fact in order to influence opinions. The problem of 
misinformation is not a new one, but social media has fanned 
the flames, and it is now difficult to imagine political and 
social discourse untouched by its influence.
    The damage caused by misinformation reaches far beyond our 
phone and computer screens. Lies about the 2020 election 
inspired thousands to invade the Capitol on January 6 in an 
attempt to stop the certification of the election, resulting in 
five deaths. Lies about the severity of COVID-19 prevented 
millions of Americans from taking the disease seriously, 
resulting in needless infections and deaths. Vaccine 
disinformation is discouraging Americans from receiving safe 
and effective COVID-19 vaccines, extending the pandemic and 
allowing new variants to proliferate. For years we have seen 
the harmful effects of anti-vaccine rhetoric, causing the re-
emergence of diseases like measles that had been eliminated by 
vaccines.
    In July, the Surgeon General declared that misinformation 
on social media is a public health hazard. It is therefore 
imperative that the Science Committee address it as we would 
any other threat to public health--by ensuring that we have the 
brightest minds researching the problem so we can base future 
policy on the best available science.
    Unfortunately, it is extremely difficult for researchers to 
gain sufficient access to social media data. Companies do make 
some information public, but it is largely through interfaces 
they control, meaning that researchers can only see what 
companies want them to. And access can be cut off at any time. 
Today, we will hear from our witnesses about the research they 
are able to conduct in this environment. They will tell us 
about the limitations of the existing tools, and what data they 
believe can and should be made public so we can have a better 
understanding of how social media users interact with 
misinformation and how that impacts their behavior on- and 
offline. We will hear about how mis- and disinformation is 
delivered to social media users through the ``black box'' of 
the algorithm, drawing eyes to sensationalist content that 
inspires user engagement regardless of the truth.
    We on the Science Committee understand the very real 
limitations to full data transparency by social media 
companies. Platforms will argue that some information should be 
protected as trade secrets. In addition, social media users are 
entitled to privacy, particularly of personally identifiable 
information. However, these concerns cannot be broad excuses to 
shield social media companies from a full outside accounting of 
how their platforms may be endangering public health and 
safety. We cannot simply leave social media unstudied. It is as 
influential a force on the social fabric of the 21st century as 
any other. But as it stands, advertisers on these platforms 
often enjoy more access to data than academic researchers 
looking to assess the impact of promoted posts. I believe that 
this hearing can be a constructive launching point to explore 
how the Science Committee can contribute to this conversation. 
We must strike a balance between protecting user privacy and 
confidential business information, while also acknowledging 
that objective, independent research is necessary to understand 
how these platforms influence modern society.
    I look forward to hearing from our panelists about how we 
can support their important work shining a light into the 
disinformation black box poisoning our discourse.
    I now yield to Ranking Member Obernolte for his opening 
statement.

    Chairman Foster. And I now yield it to Ranking Member 
Obernolte for his opening statement.
    Mr. Obernolte. Well thank you very much, Chairman Foster, 
and thank you to our witnesses for being here at this very 
important hearing, and what will prove, I'm sure, to be a 
fascinating hearing on combatting the spread of misinformation 
on social media.
    We live in an amazing world, a world where we are presented 
with a selected, curated newsfeed that only includes the things 
that we're personally interested in. And that's informed by 
algorithms that companies like social media have come up with 
to foster user engagement, and to maximize our interest in the 
information that's being provided. But, unfortunately, as the 
Chairman pointed out, that's also catalyzed the spread of 
misinformation. Combatting that spread is something that has 
been a societal problem for hundreds of years now, but it's 
exacerbated by the fact that information now spreads so easily, 
and that that information is personalized to each one of us. So 
the information that I see in the morning is not the same thing 
that--the information that other people see in the morning, 
and, unfortunately, that can hide and mask the spread of this 
misinformation.
    And if you want a perfect example of how that can be 
problematic, you can look at a hearing that this Subcommittee 
held a couple of weeks ago on the origins of COVID. And one of 
the--for me, the very surprising outcomes of that Committee 
hearing was the fact that, although we had competing theories 
about the spread of COVID, that any theory other than natural 
zoonotic origin had been rejected early in the crisis as 
misinformation, and had been labeled a conspiracy theory, and 
that the social media companies had actively suppressed the 
spread of that information. And now, with the benefit of 
hindsight, and the discovery of new data, we've discovered that 
competing theories are not only possible, but indeed plausible, 
and in fact, you know, might end up being the successful 
theory.
    So no one can deny that this effort to combat the spread of 
misinformation has severely hampered our ability to identify 
the origins of COVID. And it just illustrates how these two 
ideas are in tension, right? On one hand, we want to be a 
society that honors the exercise of free speech, but that is 
fundamentally in tension with the idea that we also have an 
obligation to stop the spread of misinformation. So I'm hopeful 
that some of our panelists today will be able to talk some more 
about where that moral boundary is.
    And figuring out, as a society, how to balance those two 
competing interests, I think, is critical. Because on the one 
hand, as recent events have shown, we all have a vested 
interest in trying to figure out how to stop the spread of 
misinformation. But on the other hand, history has shown us 
repeatedly that if we allow censorship to take the place of 
misinformation, that will take us down a very dark path as a 
society. So we have to find this middle ground, this balance in 
between the two, and I'm confident that we can.
    And I'm also confident that the social media companies, 
like Facebook, and Instagram, and Twitter, are going to be 
critical to helping us solve this problem, because they have 
the expert knowledge in the way their algorithms work, they 
have the expert knowledge in the way that this information 
spreads, and what catalyzes people's interest in news about the 
world around them. And so I think, definitely, that they're 
going to need a seat at the table. We're going to need to tap 
all of our available sources of information, which certainly 
includes them, but also includes the independent researchers 
we're going to hear from today, and I'm very thankful that 
they're out there, gathering this information, to give us a 
holistic view of this problem. So I am looking very much 
forward to the hearing, and looking forward to asking questions 
afterwards. Thank you, Mr. Chairman. I yield back.
    [The prepared statement of Mr. Obernolte follows:]

    Good morning. Thank you, Chairman Foster, for convening 
this hearing. And thanks to our witnesses for appearing before 
us today.
    Misinformation is not a new phenomenon. Disinformation 
campaigns have been used throughout history to spread state 
propaganda and influence geopolitics. It is no secret that 
misinformation has the ability to change hearts and minds and 
influence perceptions. What is new is the impact that modern 
advances in information and communications technologies have 
had on the ability of misinformation to spread. It is easier 
now than ever before to reach global audiences, communicate 
instantaneously with friends and family around the world, and 
follow every move of politicians, athletes, and Hollywood stars 
alike.
    The same technologies that facilitate and democratize 
global access to information also enable the dissemination of 
information at a scale and speed like we have never experienced 
before in human history. This has made it more difficult to 
determine the accuracy, provenance, and objective truth of the 
information we consume. There is more information presented to 
individual consumers than ever before, and from myriad 
different sources.
    The tremendous growth in the popularity of social media 
platforms over the past decade has resulted in the consumption 
of information that is more personalized than ever before. The 
information we read and view online is now perfectly tailored 
to each of our own individual preferences, biases, and beliefs. 
We each receive an individualized, curated feed of information 
every time we visit our social media platform of choice. And it 
would not be a stretch to say that, at times, we are each 
drinking from our own individual information firehoses.
    In this golden age of information, there are many 
outstanding questions about how we can assess and ultimately 
combat the spread of falsehoods, untruths, ``fake news,'' and 
misinformation. I'm pleased that each of the witnesses 
testifying before us today has undertaken research to learn 
more about how misinformation spreads, and what we can do to 
combat it. This is an admirable goal, and we in Congress must 
take steps to facilitate further research on this important 
topic. But these efforts cannot be undertaken without ensuring 
appropriate constraints, limitations, and safeguards are in 
place.
    The need for data transparency and access is inherently in 
tension with the protection of user privacy. We must endeavor 
to strike a healthy balance between data transparency on the 
one hand, and the protection and preservation of individual 
privacy on the other.
    We must also respect and protect the intellectual property 
rights of the platforms whose data researchers seek to access 
and analyze. Social media and technology platforms have 
invested significantly in the development of their processes, 
technologies, and algorithms, which in many ways is what 
distinguishes the user experience of one platform from that of 
the others. Each platform is in a race to do it better, faster, 
and for less than their competitors. And they rightfully take 
great pains to police and protect their trade secrets from 
public disclosure. An appropriate balance must be reached 
between the intellectual property rights of platforms and the 
desire to access and analyze their technologies, processes, 
data, and algorithms for the public benefit. I'm not suggesting 
that it's an easy balance to strike, but merely asserting that 
we must keep this in mind as we work forward.
    There is no doubt that misinformation can have harmful and 
even deadly real-world consequences. State-sponsored actors 
from Russia and China have recently engaged, and continue to 
engage, in coordinated disinformation campaigns. From Russia's 
efforts to foment discord and chaos around American elections, 
to China's efforts to lay blame for COVID-19 at the feet of the 
American government, state-sponsored disinformation campaigns 
have real consequences.
    While social media platforms have rightfully taken steps to 
thwart the spread of misinformation, they must also protect 
against overcorrection that results in censorship. Competing 
hypotheses about the origins of COVID-19 are a compelling 
example. For almost a year, the suggestion that COVID-19 could 
have originated from anything other than natural zoonosis was 
summarily dismissed as conspiracy theory by traditional and 
social media alike. However, data now suggests that other 
hypotheses are in fact more plausible, and only recently did 
mainstream and social media platforms cease to censor these 
theories. The censorship of competing explanations has 
unquestionably impeded important efforts to investigate the 
virus' origins.
    Similarly, we must also leave room in our social and 
political discourse for parody, satire, and commentary. An 
appropriate balance is necessary to ensure that such commentary 
is not discouraged or inappropriately discarded as conspiracy 
theory or misinformation. Just as misinformation can have real-
world consequences, so too can overcorrection that leads to 
censorship of public debate about different ideas.
    Combatting misinformation is not an easy endeavor. And the 
many researchers looking at how misinformation spreads online 
and how to successfully thwart it should be praised for their 
efforts. But if we ever expect to truly solve this problem, 
then we must recognize that the social media platforms must 
have a seat at the table. We cannot expect them to go it alone, 
and we should likewise not expect to stop the spread of harmful 
misinformation without them.
    We must also endeavor to determine how to balance our 
societal goal of minimizing the spread of misinformation with 
the competing goal of the avoidance of censorship. This balance 
is critical because, as history has so often shown, to empower 
our media with the unchecked ability to censor would lead our 
country down a very dark path.
    I look forward to learning more from our witnesses about 
how we can work to combat the spread of misinformation on 
social media, while simultaneously protecting users' privacy, 
platforms' intellectual property, preventing overcorrection, 
and preserving public discourse.
    Thank you, Chairman Foster, for convening this hearing. And 
thanks again to our witnesses for appearing before us today. I 
look forward to our discussion.
    I yield back the balance of my time.

    Chairman Foster. Thank you. And we are honored to have the 
Full Committee Chairwoman, Ms. Johnson, with us today. The 
Chair now recognizes the Chairwoman for an opening statement.
    Chairwoman Johnson. Well, thank you very much, Mr. 
Chairman, and let me say good morning, and greet our panelists, 
and thank you for holding this hearing. The topic will only 
grow in relevance as social media becomes all the more 
ingrained in our lives. And worryingly, these issues will 
become more dangerous with every topic that becomes hotly 
politicized.
    Disinformation has been a public health threat for decades. 
Experts estimate that 330,000 deaths from AIDS (acquired 
immunodeficiency syndrome) in the early 2000s can be 
attributed to disinformation about the connection between HIV 
(human immunodeficiency virus) and AIDS. The fact of human-
caused climate change, with decades of empirical evidence and 
expert consensus behind it, has nevertheless become a subject 
of great debate. Monied interests fan the flames of doubt as 
oceans rise and forests burn. And now, as we conduct this 
hearing virtually due to a surge in COVID-19, conspiracy 
theorists and malicious actors spread lies about the severity 
of the pandemic. Laymen speculate wildly about the vaccine's 
safety, drowning out expert voices. Social media offers fertile 
ground for these falsehoods, and unfounded claims that can 
spread across the globe in the blink of an eye.
    We must not leave the black box of social media 
disinformation unexamined. Navigating the difficulties in 
extending access to data will not be easy, but failing to do so 
will have devastating consequences. This current moment is a 
grave example of the stakes at hand. We will not beat the 
pandemic without increased vaccine uptake, and every day social 
media users are dissuaded from getting the shot after seeing 
deeply misinformed posts. People are making decisions for the 
health and safety of themselves, their families, and their 
communities based on abject falsehoods, and researchers 
determined to mitigate the damage are unable to access critical 
data on how these lies spread.
    I am pleased to join you and others, Chairman Foster, in 
welcoming our witnesses today. They are doing important 
research into how misinformation circulates on land--online and 
impacts our real-world health and safety. I look forward to 
your testimony. I yield back.
    [The prepared statement of Chairwoman Johnson follows:]

    Good afternoon to our panelists, and thank you to Chairman 
Foster for holding this hearing. This topic will only grow in 
relevance as social media becomes all the more ingrained in our 
lives. And worryingly, these issues will become more dangerous 
with every topic that becomes hotly politicized.
    Disinformation has been a public health threat for decades. 
Experts estimate that 330,000 deaths from AIDS in the early 
2000s can be attributed to disinformation about the connection 
between HIV and AIDS. The fact of human-caused climate change, 
with decades of empirical evidence and expert consensus behind 
it, has nonetheless become a subject of great debate. Monied 
interests fan the flames of doubt as oceans rise and forests 
burn. And now, as we conduct this hearing virtually due to a 
surge in COVID-19 cases, conspiracy theorists and malicious 
actors spread lies about the severity of the pandemic. Laymen 
speculate wildly about the vaccine's safety, drowning out 
expert voices. Social media offers fertile ground for these 
falsehoods, and unfounded claims can spread across the globe in 
the blink of an eye.
    We must not leave the black box of social media 
disinformation unexamined. Navigating the difficulties in 
extending access to data will not be easy, but failing to do so 
will have devastating consequences. This current moment is a 
grave example of the stakes at hand. We will not beat this 
pandemic without increased vaccine uptake, and every day, 
social media users are dissuaded from getting the shot after 
seeing deeply misinformed posts. People are making decisions 
for the health and safety of themselves, their families, and 
their communities based on abject falsehoods. And researchers 
determined to mitigate the damage are unable to access crucial 
data on how these lies spread.
    I'm pleased to join Chairman Foster in welcoming our 
witnesses today. They are doing important research into how 
misinformation circulates online and impacts our real-world 
health and safety. I look forward to hearing your testimony.

    Chairman Foster. Thank you. And if there are Members who 
wish to submit additional opening statements, your statements 
will be added to the record at this point.
    And at this time I'd like to introduce our witnesses. Our 
first witness is Dr. Alan Mislove. Dr. Mislove is a Professor, 
the Interim Dean at--and Interim Dean at the Khoury College of 
Computer Sciences at Northeastern University. His primary field 
of interest concerns distributed systems and networks, with a 
focus on using social networks to enhance the security, 
privacy, and efficiency of newly emerging systems.
    Voice. This is----
    Chairman Foster. He is also a core faculty member of the 
Cybersecurity and Privacy Institute, which forges global 
partnerships with experts in industry, government, and 
academia.
    After Dr. Mislove is Ms. Laura Edelson. Ms. Edelson is a 
Ph.D. candidate in Computer Science at NYU's (New York 
University's) Tandon School of Engineering. Laura studies 
online political communication, and develops methods to 
identify inauthentic content and activity. Her research has 
informed reporting on social media ad spending in several 
national papers, including the New York Times. Prior to 
rejoining academia, Ms. Edelson was a software engineer for 
Palantir and FactSet, with a focus on applied machine learning 
and big data.
    Our final witness is Dr. Kevin Leicht. Dr. Leicht is a 
Professor and former Head of the Sociology Department at the 
University of Illinois Urbana-Champaign, and Director of the 
Iowa Social Science Research Center at the University of Iowa. 
That's some commute. He previously served as a Program Officer 
for the Sociology and Resource Implementations for Data 
Intensive Research--the Data Intensive Research Program at the 
National Science Foundation. He has written extensively on 
issues related to economic development, globalization, and 
political sociology.
    As our witnesses should know, they each have five minutes 
for your spoken testimony. Your written testimony will be 
included in the record for the hearing. When you all have 
completed your spoken testimony, we will begin with questions. 
Each Member will have five minutes to question the panel. And 
now we will start with Dr. Mislove. Dr. Mislove provides his 
testimony. Proceed.

                 TESTIMONY OF DR. ALAN MISLOVE,

                  PROFESSOR AND INTERIM DEAN,

              KHOURY COLLEGE OF COMPUTER SCIENCES,

                    NORTHEASTERN UNIVERSITY

    Dr. Mislove. Chairman Foster, Chairwoman Johnson, Ranking 
Member Obernolte, and distinguished Members of the 
Subcommittee, thank you for the opportunity to appear before 
you today. My name is Alan Mislove. I'm a Professor and Interim 
Dean at the Khoury College of Computer Sciences at Northeastern 
University. My research is on algorithmic auditing. I develop 
methodologies that allow me to study large online platforms, 
such as those operated by social media companies, to better 
understand how they work, how they may be abused, and what 
impacts they are having on users. Importantly, I conduct my 
research independently, without companies' permission, and 
without insider access to data. Put simply, I have no more 
access to these platforms than any of you do.
    This is a significant challenge. It is difficult to develop 
the technologies that enable my work, especially because 
companies are resistant to external accountability, and the 
legal environment makes such research carry non-trivial 
risk. As social media platforms mediate an increasingly 
large fraction of online communication, independent research 
such as this is critical. Even in the best of worlds, 
understanding how these platforms are impacting end users and 
society is too big a task for the platforms themselves. Though 
much remains to be done, my group and collaborators have been 
successful at studying a variety of such platforms, identifying 
alarming behaviors, and working with platforms to make 
improvements. Thus, I am well-positioned to provide input on 
what can currently be measured, and what is needed going 
forward to ensure we fully understand the impact that platforms 
are having.
    So that you can appreciate how we conduct our research, we 
typically study platforms using one of two approaches. We can 
recruit cohorts of users who agree to donate their data, or we 
can run our own experiments on the platforms, for example, by 
becoming an advertiser. Unfortunately, both of these approaches 
that we have today have significant limitations. Running our 
own experiments is often expensive in terms of time and money, 
requires significant expertise, and is beyond the capabilities 
of many researchers and regulators. Worse, platforms often 
actively try to prevent such data collection, have suspended 
researchers' accounts, and have threatened litigation for 
ethical research in the public interest, with a notable 
exception--example being one of my fellow witnesses.
    Platforms may say that researchers can rely on aggregated 
data that they provide, but this statement is misleading at 
best. Social media platforms have been very hesitant to 
release any data, and have often only released aggregated 
coarse-grain data in the face of scandal and public backlash. 
Often, even accessing the data they do release can be 
challenging. In many cases, data sets require approval from the 
platform to be able to access, and cannot be shared with other 
researchers. Moreover, recent events have shown that platforms 
cannot be trusted to provide even correct aggregated data. It 
was recently revealed that Facebook neglected to include data 
from half of the U.S. population in one of the data sets it 
provided, calling numerous studies that relied on that data set 
into question.
    The upshot is that currently no regulations exist that 
require platforms to make data available, and platforms are 
actively attacking independent researchers' ability to study 
their impacts. In effect, researchers are relying on platforms' 
goodwill to allow studies to be run at all, a situation that is 
becoming less and less tenable as platforms become more 
entrenched. Thus, my key message is that researchers need 
Congress to enshrine into law requirements for platforms to 
make data available. Mandating such transparency requires 
nuance, but is both feasible and urgent.
    In particular, I want to convey three key considerations 
for how to shape such requirements. First, social media 
platforms sit inside broader sociotechnical systems, and the 
data that regulations require be made available must be 
comprehensive enough to recognize the complexity of such 
systems. For example, platforms are typically funded via 
advertising, and any transparency requirement should cover both 
organic and paid content. Second, social media platforms allow 
numerous types of content to be exchanged, and one-size-fits-
all approaches to the kind of metadata that must be made 
available are unworkable. Instead, the kind of data required to 
be released must be tailored to the particular type of content. 
Ads, pages, shared URLs (Uniform Resource Locators), and so 
forth all have different types of metadata that need to be 
shared. Third, transparency over who sees the content is 
crucial to understand platforms' impact. While existing data 
have focused primarily on the content itself, making aggregate 
data on the demographics of who is being shown the content is 
equally as important, as it's necessary to be able to 
understand the platforms' impact on end users.
    In summary, social media platforms do not currently have 
the proper incentives to allow research on their platforms, and 
have been observed to be actively hostile to important ethical 
research that is in the public interest. At the same time that 
platforms' power and influence is reaching new heights, our 
ability, as independent researchers, to understand the impacts 
that they are having is being reduced each day. We need 
Congress's help to enable researchers to have sufficient access 
to data and social media platforms in order to ensure that the 
benefits of these platforms do not come at a cost that is too 
high for society to bear. Thank you again, and I look forward 
to your questions.
    [The prepared statement of Dr. Mislove follows:]
    [GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]
    
    Chairman Foster. Thank you. And next is Ms. Edelson.

                TESTIMONY OF MS. LAURA EDELSON,

                PH.D. CANDIDATE AND CO-DIRECTOR

                 OF CYBERSECURITY FOR DEMOCRACY

                     AT NEW YORK UNIVERSITY

    Ms. Edelson. Good afternoon--good morning, Chairman Foster, 
Chairwoman Johnson, Ranking Member Obernolte, and the Members 
of the Subcommittee. My name is Laura Edelson. I'm a Ph.D. 
candidate in Computer Science, where I also co-lead the 
Cybersecurity for Democracy Project, and I'm a Belfer Fellow 
with the Anti-Defamation League. As cybersecurity researchers, 
my team and I study systemic vulnerabilities in online 
platforms that expose people to misleading and false claims, 
from fake COVID cures, to voting disinformation, to investment 
scams, primarily on Facebook. Our ultimate goal is to develop 
workable solutions to digital mis- and disinformation. Members 
of this Committee will understand that in order to do this we 
need concrete data, and the ability to engage in rigorous 
scientific inquiry of that data. And lack of data is currently 
the most serious barrier to the work of misinformation 
researchers. Twitter is the only major social media platform 
that allows most researchers access to public data on their 
platform, albeit at a high financial cost.
    In 2016 Facebook got a--bought a company called CrowdTangle 
that offered access to public Facebook data, and it still 
operates their offering. However, very few researchers are 
allowed to access this tool. It's primarily offered as a 
business intelligence product. Most other platforms, including 
YouTube and TikTok, simply offer nothing. In the face of this 
black box, some researchers, including my team, Mozilla, and 
news outlets like The Markup, have attempted to crowdsource 
data about what happens on social media. We have been met in 
some cases with outright hostility from the platforms we study. 
This summer, after months of legal threats, Facebook cut off my 
team's access to their data. We are far from the only research 
team that's been stopped. Algorithm Watch in Germany was forced 
to shutter their work entirely after Facebook threatened legal 
action against them. Many other researchers who would like to 
study at this--study Facebook at this point are frozen out. 
They simply can't afford a legal battle with one of the most 
powerful corporations in the world.
    We had used the data we got from Facebook to support the 
finding in our most recent study that posts from disinformation 
sources got six times more engagement than that of factual 
news, to identify security vulnerabilities that we reported to 
Facebook, and to monitor Facebook's own public-facing ad 
library for political ads. Every day that my team can't access 
the data we need to do our work puts us further behind in a 
race to find answers. And make no mistake, the harm being 
caused by misinformation and hate online is very real.
    In 2019 journalist Jeremy Merrill reported that 
conservative retirees were targeted with misleading claims in 
Facebook ads, and then guided to sites to convince them to 
trade in their retirement funds for precious metals with a 
company called metals.com. In the summer of 2020, an advertiser 
on Facebook called Protect My Vote ran ads discrediting mail-in 
balloting that were aimed at African-American voters in the 
Upper Midwest. A report from the Anti-Defamation League found 
that exposure to videos from extremist or white supremacist 
channels on YouTube remains common, with one in 10 study 
participants being exposed. And nearly 40 percent of Latinx 
respondents said that they'd seen material that makes them 
think that the COVID vaccine is not safe or effective, 
according to a study earlier this year.
    But I know I don't need to remind any of you who 
experienced the invasion of the Capitol on January 6 of the 
high costs of misinformation to our social fabric. Facebook 
particularly is a selective megaphone. Their own internal 
research has shown that the way they have built their algorithm 
disproportionately promotes misinformation and extreme content. 
To study these issues, all researchers need access to much more 
data than Facebook or most other platforms provide. Facebook 
should strengthen CrowdTangle by adding data about user 
platforms, and broaden access to it so that researchers from 
all institutions can use it. Other companies, like Google and 
TikTok, should make public data about their platforms 
accessible to researchers as soon as possible.
    Facebook needs to reinstate my team's accounts immediately 
so that we can resume our work. And while we hope this will 
happen soon, we must also acknowledge the platform's attempts 
at voluntary transparency have failed. It's time for Congress 
to act to ensure that researchers and the public have access to 
data that we need to protect ourselves from online 
misinformation. I believe we will look back and see this moment 
in history as a turning point when the costs of disinformation 
and hate online became too great to ignore, and we stepped up 
and took action.
    Previous generations of Americans have taken on public 
health crises like cancer or drunk driving, and science has 
helped us to meet tough challenges like this, and this helped 
us to save lives, and to make the lives we save more enjoyable 
and fulfilling. Science can help us now, but only if we provide 
researchers the data that we need to study and describe the 
problems we face. In closing, I want to thank the Committee for 
their attention to these issues, and also for the opportunity 
to share my experience and perspective.
    [The prepared statement of Ms. Edelson follows:]
    [GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]
    
    Chairman Foster. Thank you. And after Ms. Edelson is Dr. 
Leicht.

           TESTIMONY OF DR. KEVIN LEICHT, PROFESSOR,

            UNIVERSITY OF ILLINOIS URBANA-CHAMPAIGN

                    DEPARTMENT OF SOCIOLOGY

    Dr. Leicht. Yes, thank you, and thank you for--to the 
Committee for inviting me. My name's Kevin Leicht. I'm a 
Professor of Sociology at the University of Illinois Champaign-
Urbana, and I have assembled a multi-disciplinary team that is 
studying how misinformation spreads through social media 
platforms, and what effect labeling has on dampening the spread 
of that misinformation. What my group of social scientists, 
computer scientists, and journalists, and business professors 
has found is that consistent labeling by social media platforms 
about COVID-19 severity, transmission, vaccinations, and cures 
is somewhat effective at preventing the spread of social--
suspect social media posts. But because of Facebook's 
algorithms, and their lack of access to them, we can't really 
tell whether reduced sharing of suspect posts is due to 
Facebook's algorithm or changes in actual user behavior, and 
this unsatisfying outcome is probably why I was invited to 
speak to you today.
    Though we know quite a bit about how misinformation 
spreads, so we know it's not necessarily spread by nefarious 
individuals on the dark web, and we know what types of people 
are susceptible to consuming this information, we also know 
that combatting this information is harder the more 
misinformation is repeated, so it becomes harder and harder to 
stop. But, as our prior two testifiers have said, social media 
platforms keep their data to themselves, and they discuss--do 
research internally that is not disclosed. The platforms do 
offer places to download such data, but much of the research 
happens in lab settings where researchers tightly control what 
people see, which is not--which is valuable, but is not really 
what happens in the real world of social media consumption.
    With the black box algorithms that social media platforms 
use, users get vastly different exposures to different types of 
informations--different types of information, and we are left 
studying what users do with bits of information without knowing 
exactly what the stimulus is that's prompting them to share 
this misinformation. There are deficiencies in the tools needed 
to do this research, the data availability, which has already 
been discussed, and there's an overall lack of coordination in 
the study of social media information and data collection.
    The data availability part, as our prior presenters have 
said, is important for independent research. The simple answer 
to this problem, when I talk to outsiders, is I say this. We 
didn't trust The Tobacco Institute to tell us about the safety 
of smoking. We probably shouldn't rely on social media 
companies for research on what social media does. That research 
needs to be done independently. They have a built-in conflict 
of interest with regard to this research, as their purpose is 
to draw attention and eyes, and the information that draws 
attention and eyes sells advertising. The biggest gap that we 
see in doing research is in the data and algorithms, or the 
black box the social media companies use to determine what end 
users see. And at some level we need access not only to the 
data, but to the black box.
    There are some things the Federal Government can do, I 
think, to help social media researchers, and allow independent 
social media research to be done, which I think is vitally 
important. The strategy my group thinks of would combine action 
by the Federal Government to compel the social media companies 
to share data, contributions by the social media providers 
themselves, help from private foundations, and help from 
private Federal science funders. The Federal Government could 
require the platforms to provide data to research groups who 
are investigating public interest questions about 
misinformation incidence, prevalence, and consequences, and 
this data sharing could take many forms.
    There could be central--we could see the creations, for 
example, of central data depositories like we have in 
astronomy, for example, or in other social science areas, where 
there are depositories that act as a basic infrastructure for 
studying social media information, so people don't have to 
reinvent the wheel every time they want to study social media 
information, collect their own misinformation, deal with--or 
deal with the possible legal consequences, and everything else. 
And the access to this data could be through some sort of cloud 
computing format, with strict human subjects protocols, so many 
more researchers would have access, and they wouldn't have to 
jump through the hoops that our group has had to jump through 
here. And with that, I'll conclude my remarks. Thank you.
    [The prepared statement of Dr. Leicht follows:]
    [GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]
    
    Chairman Foster. Well, thank you. And at this point we will 
begin our first round of questions. The Chair will now 
recognize himself for five minutes.
    I'd like to start out my questions by entering a statement 
for the record prepared by the Center for Countering Digital 
Hate, which studies how dangerous content spreads online and 
harms society at large, whether it be offensive hate speech, or 
misinformation aimed to change people's beliefs and behaviors 
for the worse. So I'd ask unanimous consent for that statement 
to be entered into the record. Hearing none, so ordered. And 
now on to my questions.
    Dr. Mislove, in your research you have purchased ads on 
Facebook, and used the performance metrics to gain insight into 
the algorithm that determines who actually sees the ads. You 
noted in your testimony that your team has spent over $25,000 
running ads. Frankly, it strikes me as odd that researchers are in 
the position of having to pay the subject of their study in 
order to gain sufficient access to crucial data. But--so, 
first, how do the metrics available to you as a paying 
advertiser differ from those that are available to researchers 
who are not paying for privileged access, or what forced you to 
go and decide to spend money here?
    Dr. Mislove. So thank you for the question, Chairman 
Foster. You're precisely right, we have used the advertising 
system as a methodology. The reason is that if you are not 
using that, and using the publicly facing data, the most useful 
thing is what's called the ad library. Laura alluded to that. 
That only gives extremely coarse-grained statistics on active 
ads. You can't look back, you get no idea of the breakdown of 
who's actually seeing the ad, their gender, and the delivery 
location.
    When you become an advertiser, you get access to much more 
fine-grained data. For every ad you run, Facebook gives you 
very detailed information about how much money is being spent 
on that ad, the demographic makeup of who is being shown it, 
and that is what we use as the basis of understanding the 
delivery algorithm itself. In other words, the decisions that 
Facebook is making about which users get to see which ads.
    Chairman Foster. Yes. You--do you ever worry that Facebook 
knows who you are, and might sort of give you a warped view of 
the way they treat advertisers?
    Dr. Mislove. That's a fantastic question, and we do. We 
actually sometimes use multiple accounts, some of which don't 
reveal to Facebook, to make sure that our--we're seeing 
consistent behavior across those accounts. But yes, we do 
worry----
    Chairman Foster. You worry about it, OK.
    Dr. Mislove. We do worry about it.
    Chairman Foster [continuing]. The ridesharing companies a 
while ago, you know, got caught doing things like that.
    Dr. Mislove. Precisely.
    Chairman Foster. Is there an ethical or privacy-related 
reason to share more data with paying advertisers than 
researchers?
    Dr. Mislove. There's no user privacy related reason to do 
so. The statistics we get back do not tell us anything about 
the actual people who see our ads. Again, it's just fraction of 
men, fraction of women. Facebook will claim that there is, and 
I think Laura may have some words about that, but that--when 
they say that, they're protecting the privacy of advertisers, 
not the privacy of end users.
    Chairman Foster. OK. Is there an agreed-upon list that's--
about what sort of information, you know, that is available to 
advertisers now that should just be--automatically be available 
to researchers? Would that be a reasonable, you know, mandate 
for social media generally?
    Dr. Mislove. I don't know that such a list exists right 
now, but it would not be difficult to develop exactly such a 
list. There are big--they already release certain metrics, and 
I would argue that there are a number of others that we already 
have access to in various ways that could become the basis of 
such a list.
    Chairman Foster. Yes. Dr. Edelson, in your testimony you 
mentioned CrowdTangle and Twitter's Firehose API (application 
programming interface) being primarily business analytic tools. 
Are--so are businesses getting access to information that 
researchers aren't when these tools are being, you know, 
throttled, or not made available to researchers? And how does 
their intent--their design intent, as business analytic tools, 
limit their usefulness to researchers?
    Ms. Edelson. Thank you for the question. So, in short, yes. 
We've found CrowdTangle to be quite a rich tool for studying 
user engagement, and it was quite illuminating for that 
purpose. But as researchers, you know, we don't just want to be 
able to, you know, come to this conclusion, misinformation is 
very engaging. We would also like to be able to understand--to 
start to be able to understand how we could stop that, how we 
could design systems to make misinformation maybe less 
engaging. And in order to do that, one of the things that we 
really need is impression data. This is something that would be 
really, really crucial to actually start getting to solutions, 
and it's something that Facebook doesn't make available through 
CrowdTangle.
    If I could just speak very quickly to the prior question 
about ad data? I actually published--I have a pre-print of a 
paper that's available that has a--that is a technical standard 
for what data could be made available about ads. I'd be happy 
to forward that on to you. It's going to be published in the 
next couple of months formally.
    Chairman Foster. Thank you, I appreciate that. And I'll now 
recognize Ranking Member Obernolte for five minutes.
    Mr. Obernolte. Thank you very much, and thank you to our 
witnesses. It's been a very interesting hearing. Let me start 
with Ms. Edelson. You had something in your written testimony 
that you didn't have time to bring up in your oral testimony, 
in which you made some recommendations about things that can be 
done to facilitate access to information, and one of the things 
that you proposed was to create a legal safe harbor for 
researchers in working with this data. And I wanted to give you 
a platform to elaborate a little bit on that, but if you could 
also, as you talk about that, if you could talk about whether 
or not that legal safe harbor should also apply to the 
platforms themselves, since, ostensibly, they would be giving 
you that data that create liability for them as well?
    Ms. Edelson. Thank you for the question. I think it's a 
very good one. So the researcher safe harbor proposal that the 
Knight First Amendment Group and I have called for would 
provide legal protections to researchers like me who engage in 
direct study of platforms by using it. I think that's the 
general thrust. There are many excellent researchers who do 
really important ethnographic work, I'm thinking particularly 
of Joan Donovan out of Harvard, who does really good work 
studying militia groups, how they recruit, other extremist 
groups like this, and this would provide cover to these 
researchers for their work so that--you know, again, within 
bounds, that they handle data responsibly, that their work is 
overseen by institutional review boards (IRBs), that is within 
ethical boundaries.
    As to whether platforms themselves would need legal cover, 
I think in general I would need to go back and talk to the 
lawyers about that. To my knowledge, in general, we're covering 
data that is generally accessible, so I actually don't know if 
that would be required.
    Mr. Obernolte. Interesting. OK. You brought up the 
institutional review boards, which is something else I have a 
question about, just because when I got my doctorate, you know, 
my research was qualitative, and only involved interviews, and 
yet my IRB gave me a hard time about that data. I can't imagine 
what yours did to you.
    I have a question for Dr. Mislove. In your oral testimony 
you discussed the fact that running experiments on the platform 
is beyond the capabilities of a lot of researchers, and yet 
that seems to be the only way that we can get really unbiased 
data, because even if we ask the platform owners for data, you 
know, we have, you know, a concern that the data that we're 
going to get back is biased in some way, just the same way that 
if you ask a cigarette manufacturer whether or not tobacco use 
was safe, you know, you wouldn't necessarily trust the veracity 
of that data.
    I have a further concern about this, though, and--as a 
computer scientist myself, you know, a lot of these algorithms 
are interconnected. You know, you can't create a fake user and, 
you know, run some tests about, you know, what liking this 
does, or what not liking that does, and to see what kind of--
how the algorithm works without affecting real users' pages, 
right? Because all of that data feeds back into it. So we've 
kind of got this quantum mechanical situation where the act of 
observing the system is influencing the behavior of the system. 
As a researcher, how do you combat that?
    Dr. Mislove. Those are fantastic questions, thank you for 
them. So to address your--sort of the first question about sort 
of the ability to study these, we are able to do it, but the 
limitation is when we run--when we become an advertiser, we're 
only really able to say what happens to our own ads, right? So 
we--it's much harder for us to go beyond that and say, OK, this 
is the kind of effects we're seeing on other advertisers' ads. 
So we really can speak to the algorithm a bit, but we can't 
speak to sort of its impacts on users in many cases.
    To your second question around sort of the feedback loops, 
and these sorts of quantum mechanical effects, as you described 
them, that's exactly right, and we think very much about that. 
To give you one example, one of the things we worried about is 
how much--like, teasing out how much of the effects we see are 
due to the users who engage with the content, versus the 
algorithms that actually, you know, choose who to deliver it 
to, and Kevin had alluded to that in his testimony.
    We have come up with a number of cute tricks to be able to 
sort of tease those out in many cases, where we can sort of 
make sure that ads show up as the same to individual users, so 
we know the users can't react any differently, but the 
algorithm will see them differently. So there are ways, in 
certain cases, we can get around that, but it's something we 
take into account every time.
    Mr. Obernolte. Very interesting. Well, Mr. Chair, it looks 
like the clock is malfunctioning, which I guess is a good thing 
for me, but I'm just going to ask one final question, and open 
it up to the whole panel to answer. You know, the end goal here, 
you know, of the research you're doing, I think, is not only to 
understand how misinformation spreads, but to enable us to 
reach some kind of societal solution to halting the spread of 
misinformation without suppressing free speech.
    And I think we can get there, right? We've done that in 
other venues. You can't yell fire in a crowded theatre. You 
know, that's not--recognized as something that's not 
infringement on people's free speech because of its potential 
to cause harm. And I think we're going to reach some kind of 
standard with that as pertains to online misinformation. And I 
think, just like we did there, it's going to revolve around the 
intent of the poster of that misinformation. But I'm wondering 
if you could weigh in, and anyone would like to, about what you 
think that ultimate solution is going to look like.
    Dr. Leicht. Can I take a stab at that one? One of the 
things my group thinks about in that regard, about how to 
balance the relationship between controlling misinformation and 
censorship, is to think about simply coming up with more 
effective labels. So people can post and basically spread 
anything that they want, so the communication itself is not 
censored, but one of the things that stops misinformation from 
spreading as often is the cognitive interference of actually 
labeling this and saying, are you sure you want to spread this or 
not? But I actually think even something like that is going to 
have to be fairly conservative, so there will be some types of 
misinformation that are simply not in the public interest to 
control, or necessarily stop the spread of, and others that's 
more vital for, say, the public health, or the public safety. 
So that would sort of be my group's way of dealing with this 
conundrum.
    Mr. Obernolte. Ms. Edelson, go ahead.
    Ms. Edelson. So one of my very recent studies, one of our 
key findings was that misinformation outperformed factual 
content on Facebook. But the real meat of this was this was 
true for every partisan category, so far right misinformation 
outperforms far right factual content. Far left misinformation 
outperforms far left factual content. So I think we all want to 
get to a place where misinformation isn't prioritized, it is 
not in a fast lane against factual content, and we can do this 
without discriminating based on viewpoint, or suppressing--you 
know, suppressing certain opinions, or certainly suppressing 
facts, as you've spoken to. I think what we need to get to is a 
place where engagement--user--you know, user interactions is 
not the driving force of what content is promoted.
    Mr. Obernolte. Right. OK. Well, thank you very much. It's 
been really interesting. I look forward to the rest of the 
questions. Despite what's on the clock, I'm sure I'm out of 
time, so, Mr. Chair, I'll yield back.
    Chairman Foster. Thank you. And I guess, if there's Member 
interest, we can certainly entertain a second round of 
questions here, because this is--I can't imagine a more 
important subject, actually, right now. And so I'll now 
recognize my colleague from Illinois, Mr. Casten, for five 
minutes.
    Mr. Casten. Thank you, Mr. Chairman, and thanks to our 
witnesses here. This is really fascinating. The--about three 
years ago, relevant that this was before COVID, and I feel 
somewhat prescient in an angry way, Mark Zuckerberg testified 
before the Financial Services Committee, and I asked him in the 
first instance whether they would suppress anti-vaccine 
information if it came from Jenny McCarthy's Facebook page, and 
then separately whether they would suggest--suppress 
information from the American Nazi Party if it came from Art 
Jones's Facebook page. Art Jones, at the time, had just won the 
Republican nomination to run for Congress in Illinois's 3d 
Congressional District. His answers were unsatisfactory, and 
seemed to suggest that the content of the information was one 
question, the speaker was another.
    I mention that because the recent Wall Street Journal 
reporting that they are, in fact, whitelisting certain high-
profile people suggests that this problem has not been solved. 
And I'd like to start just with Ms. Edelson, because it sounds 
like you've spent a lot of time thinking about this. Do you see 
a disparate approach to information protocols depending on the 
speaker in your research as we sit right now? Sorry, I think 
you're muted.
    Ms. Edelson. There certainly currently exists, you know, as 
we all now know, two separate systems on Facebook, where some 
speakers are effectively not moderated at all, and then there's 
everyone else. I think this is almost entirely backwards, 
because what Facebook has set up is a situation where these 
speakers who have the widest reach are free to spread, you 
know, whatever lies they choose, and it will take a long time 
for Facebook to act, and often Facebook won't act at all.
    I think that we do--that--you know, this is where I think 
there is a difference in how we think about content moderation 
versus how we think about content promotion. I think that 
speakers that have a bigger audience should have a bigger 
responsibility to ensure that the information that the 
platforms spread on their behalf to their audiences is factual.
    Mr. Casten. Yes. That--I think we're all fond of the 
framing that freedom of speech and freedom of reach are two 
separate things, and I think sometimes we allow them to amplify 
horrible messages that would go away if we just limited it to 
freedom of speech.
    My next question, I want to start with Mr. Mislove, but I--
if we have time, I'd love all of your thoughts on this. I 
totally agree with your idea that we should have this data 
shared and available for research. At the same time, there's an 
implicit premise behind that that says that the data we provide 
on social media platforms does not belong to us, and the 
custodian of that data is now the firm that has the data. And 
the--I personally have been rather persuaded by Roger McNamee 
in his writing, that if we gave--if we essentially made sure 
that everybody is the custodian of your own data, and all of 
your own metadata, and all of that data was portable, we would 
essentially end up with a much healthier social media 
environment because the--essentially there wouldn't be this 
walled garden, and the conflict of interest where the company 
that has information about where you traveled last week, who 
you were with, what you bought, had that information to share.
    And I realize that's a long list, and gets a little bit 
beyond the purview of this Committee, but if we were to wave a 
wand tomorrow and change the premise such that everybody owned 
their own data, that they could opt into sharing that data, and 
the metadata around their data, so that they truly had 
portability so that they could still say, I actually find it 
useful that this device knows where I am, and where I want to 
go, and can have all the automated--if we were to do all that, 
does that change the environment that you would have where 
essentially we would have to get sort of permission for the 
data from the public, rather than from the companies, that we have, 
without really questioning, assume that they're the custodians 
of the data? So, Mr. Mislove, start with you, because I see 
you're nodding your head so vigorously, but I welcome all of 
your thoughts on that question.
    Dr. Mislove. That's a great question. I'm sure my other 
panelists will have similar thoughts. So one is that--what 
you're talking about is essentially sort of democratizing the 
ownership of data, which there have been a number of proposals 
to do in--at least in the computer science research literature. 
It--you know, those sorts of things have some technical 
challenges, but I think those are solvable. But I think one way 
you could move toward that is give users legal rights over the 
data to--that these companies already have on them. So, for 
example, Facebook allows you to extract your data from the 
site, but there's many things they don't provide you. We have 
some information on that. And this--if you allowed users to 
have the legal right to say, give me all of your data on me, 
that would enable many more research studies, because you could 
then get users to contribute their data themselves, with 
consent and so forth. So the--you know, do--what you're saying, 
I think there's a number of different ways to tackle it, but it 
would make significant progress toward enabling researchers to 
be able to study these systems.
    Mr. Casten. And I realize we're out of time, and we may 
come back, but I would be curious if that changes, because now 
every individual user would have to consent to sharing the data 
with you to do their research, as opposed to saying to 
Facebook, just give me the data.
    Dr. Mislove. Absolutely. It--I mean, in some ways it would 
make it more challenging, but at the same time we've done those 
sorts of studies. Like, we've recruited users of--you know, 
Laura has a whole study where she did exactly that. So there--
you know, there's precedent for doing it, and it's something 
we're used to doing.
    Mr. Casten. OK. Well, I'm out of time. I yield back. Thank 
you.
    Chairman Foster. Thank you. I'll now recognize my colleague 
from Colorado, Mr. Perlmutter, for five minutes.
    Mr. Perlmutter. Thank you, Mr. Chair. A couple comments, 
and then some questions. So one, I have to applaud our Ranking 
Member, and our Chair, and to the panel, it is the Science 
Committee, and between the two of them, they are able to weave 
in quantum mechanics, and usually the Theory of Relativity, 
into every panel. So--and I just--I want to congratulate the 
Ranking Member on getting quantum mechanics into this panel. 
So--No. 1.
    No. 2, to the Ranking Member--and, you know, I guess the 
concern I have, and the general concern that you've raised 
about misinformation and censorship, I think in this day and 
age I'm very concerned about The Big Lie, about Joseph 
Goebbels, and the ability to promote, and promulgate, and 
propagate The Big Lie. And--so I'll start with you, Ms. 
Edelson. And, you know, obviously the Anti-Defamation League is 
something always concerned about the truth. So you said in your 
op-ed in the Times, ``In the course of our overall research, 
we've been able to demonstrate that extreme, unreliable news 
sources get more engagement, user interaction on Facebook, at 
the expense of accurate posts and reporting. What's more, our 
work shows that the archive of political ads that Facebook 
makes available to researchers is missing more than 100,000 
ads.'' Can you elaborate on those two sentences, first about--
you know, and you've talked about it a little bit, but how do 
you know that this misinformation really is able to spread 
farther and faster than accurate stuff? You're muted.
    Ms. Edelson. So the way that we know that is we use 
Facebook's own tools. We use Facebook's own business 
intelligence tools for understanding how content spreads, how 
it engages, because that is very much what Facebook wants its 
users to do. It wants its users to create content, to create--as 
engaging content as possible, because that is Facebook's 
business model. It is a user engagement maximization engine, 
and then it sells that engagement to advertisers. So we used 
those tools to study, you know, what Facebook told us their 
users interacted with the most, and that is what we found.
    I want to be clear about one thing. I don't think Facebook 
chooses to promote--it has not sat down and made the choice, we 
will promote misinformation. What it has done is it has chosen 
to promote the most engaging content. And when its own internal 
research told it that the most engaging content was 
misinformation, it was the most polarizing content, it was 
hateful content, it didn't do anything about it. It was a 
conscious choice not to take steps that would increase the 
quality of its information ecosystem, but would also decrease 
engagement. And the reason why is ads. Ads are Facebook's 
business, and, you know, one of the reasons that that finding, 
you know, that finding that there are many, many, many ads and 
advertisers who slip through the cracks is that Facebook isn't 
willing to make its ad platform more secure, more trustworthy, 
because that would make its ad experience worse, and it would 
cost it money.
    Mr. Perlmutter. So let me just stop you because I've got 
all scientists on here, or engineers, except I'm the lawyer, 
and at some point it moves from unintentional to intentional. 
And--so that would be my argument. And so I want to turn to 
Professor Leicht for a second. So--and a number of you brought 
up, you know, would you trust the information that you might 
get from The Tobacco Institute. And here--so--now, Ms. Edelson 
is relying on their tools. I mean, how would you approach this 
thing? Would it be any different than she has, to try to figure 
out what's going on here? I mean, she's used their own tools to 
prove a case against them.
    Dr. Leicht. Yes. Well, I would trust her research in part 
because the tools are what--the tools are an integral part of 
their business model. So if the tools don't work somehow, or 
don't promote more engagement, then the company doesn't make as 
much money. So unless, through the tools, they are somehow 
feeding her false information that's specifically bespoke and 
just sent to her, I would be inclined to trust that. But it is 
another situation where we are basically trusting them, but on 
the other hand, some of what she's getting access to is sort of 
behind the wall, or behind the veil, and so--and it's tied to 
how they make money, so I tend to trust that.
    Mr. Perlmutter. Thank you. I yield back, Mr. Chair.
    Chairman Foster. All right. And I guess we now have time 
for a second round of questions, so I will recognize myself for 
five minutes.
    The first question--you know, how do you publish 
information here, where the tools that you use are likely to be 
altered or abolished underneath, you know, your feet? And so, 
you know, scientific reproducibility, it's the touchstone of 
everything, seems to be hard to get to. And some of you touched 
on that in your testimony. I'm just wondering what--the 
conflicts you see there, and reasonable solutions to them. I 
think any one of you, just----
    Dr. Mislove. Just to clarify, did you mean that--how do we 
study this system when it's changing constantly, and it--you 
know, our access could be revoked at any moment?
    Chairman Foster. Correct. And that the access may not be 
granted to someone who wants to reproduce your results.
    Dr. Mislove. Um-hum. Yes. No, that--you're absolutely 
right, that's a real problem for us. When we act as an 
advertiser, we keep logs of everything, so we have--we get 
copies of all of the data on our ads, because, like, our 
accounts could be shut down at any moment, and as a result, 
we'd lose access to our scientific data. But it is challenging 
because there are other, you know, features of the platform 
that one can only access when one has been in the platform for 
a long time, and so we have access to some of those. And that 
would mean that other groups would have significant trouble 
being able to reproduce our results. That's why I think a more 
sustainable solution would be one where the platforms are 
required to make data available, so then the other researchers 
could analyze that data in a way Professor Leicht talked about, 
and reproduce any analysis that comes out.
    Ms. Edelson. I just wanted to quickly follow onto Professor 
Mislove's testimony, because there are also some really 
perverse incentives here. So, for example, Dr. Mislove is the 
absolute expert in Facebook--in ad--the Facebook advertiser 
view, but my team engaged in a little bit of that once 
ourselves. We found a security vulnerability in the Facebook 
advertising process. I can't say too much about this, 
unfortunately, because it is a security vulnerability, but we 
reported it to Facebook, and when we did report that to 
Facebook, Facebook terminated our advertiser account, so we 
couldn't continue that work. And that's--I know I'm not the 
only person that that's happened to.
    Chairman Foster. Yes. So--some of your work involved 
basically making a Chrome add-on, and so--Facebook had some bad 
experience with add-on tools, with Cambridge Analytica and so 
on, so I can understand how they're a little bit reticent to 
let people make add-ons of various kinds.
    Ms. Edelson. Actually, this was totally separate from that.
    Chairman Foster. No, I understand it was a different 
mechanism, but it's sort of a similar approach, where someone 
claims to be doing research, and in fact are--is doing 
something much more nefarious. And so I can--you know, they--it 
will cost them money to do due diligence on people that claim 
that they're doing research. And so it--you know, that's--it's 
just one of the many tensions we're under on this. Do you think 
the best solution is actually not to have to rely on, you know, 
essentially spyware that people opt into on their browser, and 
just say--and just provide, under controlled circumstances, 
direct access to the huge database of all user engagement?
    Ms. Edelson. I mean, frankly, yes. I think moving toward a 
world where platforms do not, you know, do not have the--are 
not the final authority on who gets to study them, that's 
probably a much healthier environment. I mean, you know, 
tobacco companies--I forget who made this analogy earlier, but 
tobacco companies don't get to decide who does research on 
smoking, and the idea that social media companies get to decide 
who studies them is perverse.
    Chairman Foster. Yes. Dr. Leicht?
    Dr. Leicht. If I could add to that, the way social media 
dialog is taken hold of in American society, you know, social 
media posts, and the sharing of, is really a public record of 
our communication with each other, so it's an awful lot like 
other forms of public records about communication with each 
other that we store in places like the Library of Congress, or 
something. So historians, someday, are going to look back at 
this era, and they're not going to have a very good perception 
of what's going on because they're not going to have any access 
to any of the original social media posts that a lot of our 
discussions were based on, and that's going to not be a good 
situation at all.
    Chairman Foster. Yes. Dr. Mislove?
    Dr. Mislove. Yes, I'll just add on to that to say that 
the--it would--to echo Ms. Edelson's point, that the current 
ways that these platforms make data available often allow you 
to find the malicious actors on their platforms, for example 
the purveyors of misinformation, right? But they don't allow 
you to look at the role that the platform itself plays in 
amplifying that information. So, specifically, we try to study 
the algorithm, and the data made available via the ad library 
and other tools don't allow us to tease out what the algorithm 
is doing versus the malicious actors. So having a regime where 
Congress would require all data to be released to be able to be 
studied would allow us to tease out both the malicious actors, 
as well as the role of the platform itself.
    Chairman Foster. Thank you. And my time is now up. I'll 
recognize Representative Obernolte for five minutes.
    Mr. Obernolte. Thank you very much, Chairman Foster. So, 
you know, for the second round, I'd like to take us, like, up 
to 30,000 feet. We've been talking about, you know, the 
specific subject matter of this hearing is how do we eliminate 
the barriers to data to allow researchers to conduct research 
into the way that misinformation spreads on social media, 
right? But the big goal here is to try and figure out how to 
stop the spread of misinformation, which a lot of people have 
raised different examples of how it's been destructive over the 
last couple of years. And I have to say, I am not optimistic 
about this. I'm a pessimist. Ms. Edelson, you were talking 
about the fact that maybe Facebook hasn't--has not deliberately 
chosen to provide misinformation, and I know Congressman 
Perlmutter was skeptical about that. I'm skeptical too, and I 
don't think it's ever going to be reasonable to think that the 
data that you're getting voluntarily out of these platforms is 
going to be unbiased. I mean, there's too big a commercial 
incentive there.
    So I'd like to talk about the business model. And let me 
also say that, you know, there's been testimony that perhaps 
a--some kind of framework around users owning their own 
personal data would solve this problem, and I have to tell you 
emphatically, I don't think it will. I was--when I was in the 
California legislature, I was deeply involved in the drafting 
of the California Consumer Privacy Act, so I know a lot about 
it, but the problem here is not data, and its connection to 
users. The problem is that these companies have a business 
model that's based around user engagement. And, you know, they 
can't even articulate to you, probably, in some senses, how 
that works, because if you're--if it's a machine learning kind 
of thing, that's--the goal of--it has the goal of maximizing 
user engagement, you know, you might not even know that it's 
promoting this information because, you know, we don't get that 
kind of information back out of these algorithms. So I'm very 
skeptical that this is going to allow us to solve the problem.
    And I'm wondering your thoughts on this question. You know, 
should we be focusing more about--on the model. You know, this 
model where Facebook and Twitter provide you this service for 
free, and if you don't know how it's being monetized, if you 
don't know what the product is then the product is you, right? 
That's what economists say, and that's what it is. They're 
selling this user engagement. And the reason why you can't pay 
a monthly subscription fee to Facebook to avoid their 
advertising is that people would be horrified if they knew what 
it would cost you, how much money they're making off of each 
user. So how--here's the question to you. How do we avoid that? I 
mean, do we outlaw business models like this? Do we need more 
transparency? What's the ethical way of dealing with this 
issue?
    Ms. Edelson. That's a great question, and I think the meat 
of what you're asking is how much is this a systemic issue? And 
I think the answer is you're right, there is probably an 
inherent systemic problem with platforms that--whose business 
model is built around maximizing user engagement. I think--you 
know, I hear the tobacco company analogy a lot. I think I 
personally prefer maybe a pharmaceutical company analogy, 
because there are good things that come out of social media 
too, but there are certainly a lot of problems that can happen.
    You know, social media addiction is a very real thing. I 
think that we may be going toward a world where, you know, we 
can acknowledge that there are good things about social media, 
and there are risks, and there are harms, and some of these 
risks and harms are particularly acute for the youngest users. 
And I think, in a framework like that, you know, we probably 
need some regulation for this industry, in the same way that we 
have regulation for pharmaceutical companies, we have 
regulation for banks. I think, in Chairman Foster's testimony, 
he--you know, he spoke about this analogy as well, and I think 
it's an apt one, you know, where we--I think this is something 
that's important for society, but we all need much better 
auditing and transparency of how these platforms function.
    Mr. Obernolte. Thank you. Any other thoughts about whether 
or not we need to focus more around maximizing user engagement 
as a business model? Dr. Leicht?
    Dr. Leicht. I--that--I wanted to say, another way of 
attacking the business model, or of making the business model a 
little bit more benign, might be to allow more competition for 
social media in the first place. So social media is dominated 
by a very small number of companies that sort of dominate the 
entire landscape, and if there were more competition over users 
themselves, and users' attention, then the abuse of the users 
could probably be reduced somewhat, or I would think it would 
be--at least be possible that would happen. So that might be 
one direction to go as well, if the business model itself can't 
be directly attacked.
    Mr. Obernolte. Sure. I've thought about that too, that 
maybe--you know, similar to e-mail. You know, when I send you 
an e-mail, you and I don't both have to be on Gmail for you to 
read what I'm saying, and so maybe we need to think about 
social media a different way. When I post something, maybe it 
goes out to everybody, and it's out there in the metaverse, 
and, you know, if you choose to look at it on Facebook, that's 
your choice. But I don't know that that solves the bigger 
problem.
    But--I mean, I really think that we, as a society, need to 
look at this, and also realize, and this is the reason I'm 
pessimistic--realize that, because there is such a strong 
commercial incentive, that no matter what we do, it's going to 
be an uphill battle. I mean, it's like counterfeit tax stamps 
on cigarettes, right? The commercial incentive for doing that 
is so strong that no matter how much resource you devote to 
enforcement, you're still going to have the problem. And, you 
know, I think that's the ethical situation we find ourselves in 
with social media. Anyway, my time's expired. I'd love to 
continue with the conversation, but thanks, everyone, for being 
here, and thanks for the fascinating discussion.
    Chairman Foster. And, in fact, it appears as though there 
are enough interested Members with questions that I would 
entertain a third round, so if you want to get your--get with 
your staff and if you're interested and let me know, and we'll 
consider that. I will now recognize Representative Casten for 
five minutes.
    Mr. Casten. Thank you, pleasure to be back. Professor 
Leicht, in your testimony you said that the companies have a 
conflict of interest with regards to researching and policing 
their own content because the goal of social media companies is 
attention and engagement, and if extreme content produces that 
attention and engagement, that means more profit. We saw 
recently that Facebook's own--I think Facebook's own internal 
analysis was that the majority of people who join hate groups 
on Facebook join at the recommendation of a Facebook algorithm. 
Now, I realize I'm going to ask you to speculate a little bit, 
but, to the extent that engaging with extreme content drives 
engagement on the site, can one reasonably assume that Facebook 
and other social media companies, either by individual or 
algorithmically, know where the extreme content is, know the 
consequences of the extreme content, and are actively 
encouraging you to engage with it?
    Dr. Leicht. That is certainly possible. I think they--I 
think that the truth is, because a lot of the sharing is done 
by the algorithm itself, much as Representative Obernolte said, 
they probably don't, you know, personally know that this is 
happening, but they don't really do anything to stop it. So 
they certainly--so in that sense, especially in the extremist 
cases, you could be heading toward a--the situation 
Representative Perlmutter was talking about, where there's sort 
of almost active negligence here.
    Mr. Casten. Yes. And I guess, you know, there's a liability 
question there, but in a lot of other venues, you know, if I 
had a high speed trading fund that was actively profiting from, 
you know, that I was anticipating, you know, I don't know, 
Russian invasions of Crimea, whether or not I did that or the 
algorithm did that, I might be concerned about the reputational 
damage that would come from my fund trading on such 
information, right? But let me----
    Dr. Leicht. Certainly true, yes.
    Mr. Casten. Let me then take that to a more specific 
question, because that's a general question, but let's be very 
specific. A couple weeks ago we recognized the 20th anniversary 
of 9/11, and among the things we recognized was the complete 
heroes on Flight 93 who, in a largely pre-internet era, on a 
plane, within 10 minutes were able to deduce that there was 
about to be a terrorist attack on the United States Capitol and 
got together to stop it. Is it reasonable to assume that in the 
more recent attack on the U.S. Capitol, given how much was 
being amplified on Facebook, that a bunch of smart computer 
nerds at Facebook had knowledge a priori of what was being 
organized? Because those 40 people on 93 figured it out.
    Dr. Leicht. I think it's possible. It's also possible that 
nobody at Facebook actually bothered to pay attention to what 
their algorithms were recommending. So whether there was 
deliberate promotion or deliberate--or--a better description 
would be, I suppose, benign neglect of what the algorithm was 
doing. In either case, there's--there are invidious problems 
there, you know, whether----
    Mr. Casten. You know, I guess----
    Dr. Leicht [continuing]. An actual person was involved or 
not.
    Mr. Casten. I guess we get into a question--and I see Dr. 
Mislove and Ms. Edelson raising their hands, so let me just--
but I do want to make--just make clear that sometimes we get 
caught in our own knickers when we say, sure, something is 
immoral, but it's not illegal, so it must be OK. For my money, 
if I had the capability to anticipate that there was going to 
be an attack on the U.S. Capitol and I didn't give a damn, 
there has to be some responsibility there. Shame on us if it's 
not illegal, but my goodness, don't look the other way. Ms. 
Edelson, I know you--I saw you wanting to comment there.
    Ms. Edelson. Yes. I'm sorry, this is really--I worked on 
Wall Street on 9/11. That's--that was a bad day. That was a 
really, really bad day. And I remember the morning of January 6 
because I told my team that morning that I thought it was going 
to be a bad day, because this is, you know, this is what I live 
and breathe. I look at this stuff every day, and it's awful.
    I don't know if anyone at Facebook knew it was going to be 
a bad day. I don't work there. But one of the things we do know 
is that their internal research has been telling them about the 
extremist problem for years. They knew that their algorithm was 
promoting hateful and extremist content. They knew that there 
were fixes. They knew that those fixes might come at the cost 
of user engagement, and they chose not to put those fixes into 
place. So as to whether anyone knew on January 6, I don't know, 
but they knew about the problem, they knew how to fix it, and 
they chose not to.
    Mr. Casten. Thank you. I yield back, unless the Chair would 
like to allow Dr. Mislove to comment.
    Chairman Foster. I'll--yes. If you can give a 30 second----
    Dr. Mislove. I'll just add on that the fact that--like, 
the--your question goes at the heart of this hearing, which is 
that we--that--it's a question that we don't know the answer 
to, and as researchers, as outsiders, we don't have the ability 
to answer. So that--so, essentially, it's really pointing out 
exactly why, you know, legislation in this area really is 
needed. I will say that what we do know is that when we have 
run political ads, we became a political advertiser and ran 
that, we do see exactly the echo chambers that you--that could 
lead to these sorts of things. When we run ads, they deliver 
more right wing messages toward more right wing users, and vice 
versa for left wing messages. So we know the algorithm has 
these effects, and it's incredibly important that we understand 
how those are playing out in the ways that you're alluding to.
    Mr. Casten. Thank you. I yield back.
    Chairman Foster. Thank you. And I'll recognize Mr. 
Perlmutter for five minutes.
    Mr. Perlmutter. All right. Well, that exchange was 
particularly sobering. Sean, nice questions. I think you 
mentioned one thing about reputational damage, and Professor 
Leicht, you know, talked about the market control that these 
companies have. If you're a monopolist, it's hard to have 
reputational damage. I mean, you've got it. You--you're it. It 
doesn't matter. There's nobody else to go to. So my question is 
much more--kind of baseline, for me. In the introduction, I 
don't know if it was Bill that talked about it, or one of the 
panelists, talked about sort of the ability to study Twitter 
versus the ability to study some of the others, particularly 
Facebook. Can somebody explain that to me? That it was 
expensive for Twitter, but at least it was possible. So I just 
open it to the panelists.
    Ms. Edelson. So Twitter has a--what's called the Firehose 
API. You can buy access to, you know, all of Twitter--well, a 
fraction of it, and there are researchers who do this, but it 
is quite expensive to use. There are also some--Members of this 
Committee will appreciate the replicability issues that we 
face, because there are some issues with data portability, but 
this is why Twitter is the best studied platform. Alan?
    Dr. Mislove. And we have historically gotten access to 
exactly that Firehose API, which is really useful, and Twitter 
deserves credit for making that available. I will note, though, 
that it is an incomplete view. It doesn't cover many of the ad 
targeting information that we've talked about in this hearing, 
it doesn't cover delivery information, and so forth. It really 
lets you get a view of a random fraction of the public content 
shared on Twitter.
    Ms. Edelson. And then CrowdTangle has a view to public 
pages and groups on Facebook and Instagram, and there is both a 
web portal and an API. That's what folks who ingest large 
volumes of data, such as I used to do, use. And then, for 
platforms like YouTube, we really don't have anything. There's 
just--that really is a black box. TikTok is a black box.
    Mr. Perlmutter. OK. Thank you. I yield back, Mr. Chair.
    Chairman Foster. Thank you. And it's my----
    Mr. Perlmutter. And this----
    Chairman Foster [continuing]. Understanding----
    Mr. Perlmutter. This has been--I just want to say, this has 
been fascinating. I've got to leave, but if we have some follow 
up hearing at some point, I think it would be fantastic. So 
thanks to the panelists.
    Chairman Foster. Thank you. And, let's see, I--it's my 
understanding that Representative Obernolte, and potentially 
Mr. Casten, are up for another round of questions. Is that--all 
right, all right, well, then I think that's a quorum for that, 
and we'll proceed. Let's see.
    So when you think about, you know, data portability 
standards, imagine that you're some startup social media firm. 
Putting all of this apparatus on top of you is going to be a 
huge operational cost. And so, you know, it seems likely that 
we're going to have to make this--OK, until you've got a 
million users, or something like that, to have a very light 
touch on this. But at some point we're going to have to scale 
the mandates here. And--so one way to make that less of a 
burden is to actually, from the start, have data portability 
and access standards that they can design their software 
around, so from the start they can know that when we get big, 
our data layout and so on is compatible with that. Is that 
something that's been thought about? And just, you know, any of 
you can grab onto that.
    Ms. Edelson. So----
    Chairman Foster. Otherwise there's a danger that we'll just 
squeeze everyone but the big players out of the business with a 
bunch of burdensome requirements.
    Ms. Edelson. So I, along with some other researchers at 
Mozilla and with the Wesleyan Media Project, as I mentioned, we 
published a technical standard for universal ad transparency. 
There's a pre-print that's available right now, I'd be happy to 
send it to you. We will be publishing it more formally soon. 
When we looked at this issue, what we actually found is that we 
think it will be less expensive for platforms to comply with 
just general data access than it would be for them to have to 
build the large public web portals that companies like Facebook 
and, to a lesser extent, Google do provide for ads. Because 
just shipping data is not actually that expensive, as long as 
there is a standard format that they can comply with.
    There's a different question here if we're talking about 
other forms of non-ad data, organic data, because the volumes 
of data get really, really large. The recommendation--so I--
this is something that I am working on developing a technical 
standard for. I think our recommendation will likely not 
require an archive. I think the recommendation that we'll be 
making in a paper I'm developing is for public access, so we 
could come to a place where there is programmatic access to the 
same content that is publicly available, and meet some other 
thresholds. And that is given, again, to--you know, to 
researchers who have registered for our program.
    And I think, again, as long as there is a standard in 
place, complying wouldn't be terribly expensive. I do think 
there is a competitiveness concern, so I do think that probably 
there's going to be a minimum size threshold that goes into 
place, but I think you are right that the research community 
needs to do more here.
    Chairman Foster. And when you talk about, you know, sort 
of--people's right to have access to their data, one of the big 
problems there is that a lot of the data is purchased from 
third parties, and so what you're going to have to get to is 
sort of an identifier for people, some unique identifier for 
people, that they can stand up and say here, you know, this is 
Bill Foster, you know, here's my--whatever my identifier is, 
and everyone who has passed around data on me will have a duty 
to respond to that request. And if they've sold it to someone, 
or if you purchased it, you're going to have to maintain sort 
of the chain of custody of who sold the data to who, to who, to 
who, and keep that identifier around, and keep up a response--
you know, a duty to respond to that sort of request, either for 
access to your own data, or deletion of that data.
    And has--have people tried to write down such a system? How 
that would work, how you'd prevent--how you'd avoid things like 
identity fraud, and people stealing your entire data set by 
claiming they were you? Has--have people attempted that sort 
of--to design systems like that?
    Dr. Mislove. I can speak a little bit to this, if it--if 
that was to the panel. The--we've actually done a decent amount 
of work looking at the data broker industry, which is sort of 
where these concerns that you're bringing up are sort of the 
most acute. In fact, many of the data brokers have actually 
partnered, historically, with social media platforms for the 
purposes of ad targeting, so that I could target people on 
Facebook using data-broker derived attributes. And so the 
upside of all of that is that the--in terms of the unique 
identifier, they're--the industry is already doing this. They 
need to join the Facebook identifiers with the Experian 
identifiers, and, you know, we know that they're able to do it, 
even though the information about how exactly they did it isn't 
public.
    But the--in terms of sort of the identity theft, you know, 
concerns you raise, that is absolutely a real concern. I will 
say that there is a little bit of transparency on the data 
broker industry, that, you know, like, there are certain sites 
where you can go to see a limited snapshot of your data, and on 
those sites they have identity verification procedures in 
place. So I'm not concerned that that's not a solvable problem, 
that, you know, this has already been solved in other contexts, 
and so, if there were regulation in this area, I think that 
would--you know, the technical problems wouldn't be the ones 
that would come first.
    Chairman Foster. Thank you. I will now recognize 
Representative Obernolte for five minutes.
    Mr. Obernolte. Thank you, Chairman Foster. Dr. Leicht, if I 
could ask you about something that was in your written 
testimony that I found very interesting? You were talking about 
how research indicates that one of the primary catalysts for 
the spread of misinformation is our inability as humans to 
process an overabundance of information. And so I wonder if you 
could elaborate on that for a minute, and then maybe throw out 
some possible solutions to that problem?
    Dr. Leicht. Yes. So I--well, unfortunately, that's a 
problem of the end user. So there's some research that suggests 
that a lot of misinformation is spread not necessarily because 
a person is intending to spread misinformation, but because 
they're bombarded with so much information they're not spending 
time to cognitively process what they see, so they just forward 
on posts that look interesting or attractive. And that's--you 
know, that, I think, is a problem that psychologists have been 
talking about for years, not only with social media, but in 
other areas where we're just overloaded with information all 
the time, and so our ability to process it isn't very good.
    One of the solutions to that seems to be to sort of 
interrupt the automatic process that seems to go on when we 
read social media sites. So one of the promises of labeling is 
that--I mean, if you're reading a set of social media posts, 
and then you come upon something that is labeled, that actually 
jars you out of this tendency to want to immediately share 
something and gets you to think about whether you want to share it 
or not, and so it actually slows the process down. And that's a 
way, then, to get people to think about a specific thing 
they're reading, and not necessarily this specific thing as one 
of 200 things I'm reading, and they're all the same. So this is 
going to be a pervasive problem that is going to be very hard 
to deal with, but some forms of labeling may help interrupt the 
process so that just automatic sharing, using essentially our 
brain stems, is stopped.
    Mr. Obernolte. Interesting. So, I mean, what you're talking 
about is kind of a supply side solution to the problem, where 
social----
    Dr. Leicht. Yes.
    Mr. Obernolte [continuing]. Media companies would be--you 
know, would be interjecting this in a--you know, in a 
deliberate effort to combat the spread of misinformation. But 
I'm wondering if there might be a demand-side solution. And, 
Dr. Mislove, maybe I'll ask you about this. You know, is part 
of the solution perhaps increasing our technological literacy? 
So, you know, in other words, when--you know, we know that 
alcohol addiction is a problem in society, right? So we solved 
that problem, you know, to the extent that we have solved it--
we solve it with education, right? If you know you've got 
alcoholism that runs in your family because they're--the 
genetic component, you know, if you know that alcoholism can 
occur, you know, perhaps that you're a little bit more careful 
about monitoring how many drinks you take, right?
    And so I'm wondering if there's--isn't an educational 
component, like we make people aware of this phenomenon, of how 
misinformation spreads. You know, we make people aware that 
you've got confirmation bias, and so that makes you--when you 
read a piece of misinformation that fits right into your 
worldview, you're more likely to believe it. You know, and then 
that way maybe we encourage people to verify the veracity of 
something before they share it. I mean, is there anything to 
that, or, you know, or does it have to be a supply side 
solution?
    Dr. Mislove. I'll--I think it's a great question. I'll 
admit it's not my area, so I am truly speculating here, and 
I'll defer to some of my other--the other panelists to perhaps 
provide a more detailed answer, but I would think so, and I 
think--I'll point you to--I know Twitter has recently done a 
number of things where, if you go to retweet something, but you 
haven't clicked the link, it will ask you, are you sure you 
want to do that? Maybe you should read the article first. And 
so it seems like those are----
    Mr. Obernolte. Maybe you should go to Snopes as well.
    Dr. Mislove. Maybe you should go to Snopes as well. So I 
think those are inching in the direction of what you're talking 
about, but some of my--some of the other witnesses may have a 
more detailed answer.
    Mr. Obernolte. Sure. Anyone else?
    Ms. Edelson. The only thing I'll say is that I suspect some 
kind of demand-side solution, as you refer to it, is going to 
be necessary, but we don't know what that will look like. It 
could come in a wide range of forms, and this is actually one 
of the reasons we need data, because we really do need to start 
working on solutions, and we need an answer to that question.
    Mr. Obernolte. OK. Well, thanks everyone. It's been a 
really fascinating hearing, and thank you, Mr. Foster, for 
catalyzing this whole discussion. I've really enjoyed it. I 
yield back.
    Chairman Foster. Thank you. And we'll now recognize 
Representative Casten.
    Mr. Casten. Thank you, and I echo that this has just been 
fascinating, and I'm sorry you didn't have the Full Committee, 
and everybody participating, but I'm actually kind of glad 
because we've gotten to follow up, and go into a little bit 
more depth than we usually do.
    Ms. Edelson, shortly after you released your results, which 
found that people who rely on Facebook for information have 
substantially lower vaccination rates than those who rely on 
other sources, Facebook cut off your access to data. I think 
your research said that people who rely exclusively on Facebook 
for news, 25 percent of them do not intend to get vaccinated.
    Now, I understand, and I appreciate in your text--I think 
you said Facebook is using privacy as a pretext to squelch 
research that it considers inconvenient, and that--I worry 
sometimes that that sounds like, well, we don't do some 
research, how much does that really matter? With--I realize 
we're all math and science nerds here, at least since Mr. 
Perlmutter has not been able to continue, but at core this is 
an epidemiological question, right? If we know that certain 
behaviors increase the rate of spread of a communicable 
disease, the rate of contraction of communicable disease, there 
are consequences. And we--you do epidemiology right, people 
live. You do it wrong, people die. Can you speak at all to the 
consequences of your inability to do what is at core 
epidemiological research?
    Ms. Edelson. So I just want to first say the study you're 
referencing, although it certainly aligns with my work, was 
done by David Lazer. Excellent work, that I can recommend. But, 
yes, I think you're right. Misinformation--I'm willing to say 
this. This misinformation is killing people. We have had a safe 
and effective vaccine for COVID for a long time now. We're back 
over 2,000 deaths a day. Facebook is not the only reason this 
is happening, but it's certainly contributing, because of 
exactly that study you cite, and that I personally keep in 
mind.
    Right now there is vaccine misinformation that is 
widespread and easily available on Facebook. I know this 
because I have colleagues who still do have access to Facebook 
who find it and try to report it every day. And it's really, 
really hard for those folks, because they do not feel like the 
platforms are their allies in this. And, again, this is 
something that Facebook's own research has pointed to, and 
Facebook has just chosen not to fix.
    Mr. Casten. Feel like we're back where we were in the last 
line of questioning. They know they are causing harm, and 
choosing not to act. I see a lot of head nods. I'm just getting 
depressed, so I'm reluctant to ask any more questions. But, Dr. 
Leicht, Dr. Mislove, anything you'd like to add there?
    Dr. Mislove. Yes. I mean, I'll just very briefly echo 
exactly everything Ms. Edelson said, and say that, you know, 
essentially what you're trying to get at is, you know, how do 
we fix this? And we've talked to this--in this hearing about a 
number of, you know, supply side, demand-side, and so forth, 
but ultimately I feel like, as a scientist, you know, I need to 
be able to diagnose the problem before I can, you know, 
understand how to design fixes that will address the problem, 
and currently we don't have the tools able to do that. We don't 
know the--you know, how much of the role that the platform is 
playing, versus the malicious actors that were referred to 
earlier.
    And so I think, for me, you know, sort of going with the 
phrase, you know, sunlight's the best disinfectant, just being 
able to understand it can then enable us to develop, you know, 
mitigations, regulations, whatever it is that would address the 
issues that we're seeing.
    Ms. Edelson. Just to follow up with that, if the platforms 
wanted to do one thing today to help start to deal with this 
problem, reinstating my account, broadening access to 
CrowdTangle, would be the most immediate steps they could take, 
because there are many researchers who want to find answers. 
They want to be part of the solution, and Facebook is just 
refusing any help.
    Mr. Casten. At the risk of being crass, it would seem to be 
the bare minimum to demonstrate that they give a damn. Thank 
you all. This has been truly fascinating, and I yield back.
    Chairman Foster. Thank you, and, before we bring the 
hearing to a close, I just want to also thank our witnesses for 
testifying before the Committee today. The record will remain 
open for two weeks for additional statements from the Members, 
and for any additional questions the Committee may ask of the 
witnesses. The witnesses are now formally excused, and the 
hearing is now adjourned.
    [Whereupon, at 11:35 a.m., the Subcommittee was adjourned.]

                                Appendix

                              ----------                              


                   Additional Material for the Record

[GRAPHIC(S) NOT AVAILABLE IN TIFF FORMAT]

                                 [all]