[Senate Hearing 119-202]
[From the U.S. Government Publishing Office]
S. Hrg. 119-202
TOO BIG TO PROSECUTE?: EXAMINING THE AI
INDUSTRY'S MASS INGESTION OF
COPYRIGHTED WORKS FOR AI TRAINING
=======================================================================
HEARING
before the
SUBCOMMITTEE ON CRIME AND COUNTERTERRORISM
OF THE
COMMITTEE ON THE JUDICIARY
UNITED STATES SENATE
ONE HUNDRED NINETEENTH CONGRESS
FIRST SESSION
__________
JULY 16, 2025
__________
Serial No. J-119-30
__________
Printed for the use of the Committee on the Judiciary
www.judiciary.senate.gov
www.govinfo.gov
U.S. GOVERNMENT PUBLISHING OFFICE
61-891 WASHINGTON : 2026
COMMITTEE ON THE JUDICIARY
CHARLES E. GRASSLEY, Iowa, Chairman
LINDSEY O. GRAHAM, South Carolina    RICHARD J. DURBIN, Illinois,
JOHN CORNYN, Texas                     Ranking Member
MICHAEL S. LEE, Utah                 SHELDON WHITEHOUSE, Rhode Island
TED CRUZ, Texas                      AMY KLOBUCHAR, Minnesota
JOSH HAWLEY, Missouri                CHRISTOPHER A. COONS, Delaware
THOM TILLIS, North Carolina          RICHARD BLUMENTHAL, Connecticut
JOHN KENNEDY, Louisiana              MAZIE K. HIRONO, Hawaii
MARSHA BLACKBURN, Tennessee          CORY A. BOOKER, New Jersey
ERIC SCHMITT, Missouri               ALEX PADILLA, California
KATIE BOYD BRITT, Alabama            PETER WELCH, Vermont
ASHLEY MOODY, Florida                ADAM B. SCHIFF, California
Kolan Davis, Chief Counsel and Staff Director
Joe Zogby, Democratic Chief Counsel and Staff Director
Subcommittee on Crime and Counterterrorism
JOSH HAWLEY, Missouri, Chair
LINDSEY O. GRAHAM, South Carolina    RICHARD J. DURBIN, Illinois,
JOHN CORNYN, Texas                     Ranking Member
TED CRUZ, Texas                      AMY KLOBUCHAR, Minnesota
MARSHA BLACKBURN, Tennessee          CHRISTOPHER A. COONS, Delaware
KATIE BOYD BRITT, Alabama            RICHARD BLUMENTHAL, Connecticut
                                     CORY A. BOOKER, New Jersey
Stephen Andrews, Republican Chief Counsel
Saurabh Sanghvi, Democratic Chief Counsel
C O N T E N T S
----------
OPENING STATEMENTS
Hawley, Hon. Josh
Durbin, Hon. Richard J.
WITNESSES
Baldacci, David
    Prepared statement
    Responses to written questions
Lee, Edward
    Prepared statement
    Responses to written questions
Pritt, Maxwell
    Prepared statement
    Responses to written questions
Smith, Michael
    Prepared statement
Viswanathan, Bhamati
    Prepared statement
APPENDIX
Items submitted for the record
TOO BIG TO PROSECUTE?: EXAMINING THE
AI INDUSTRY'S MASS INGESTION OF
COPYRIGHTED WORKS FOR AI TRAINING
----------
WEDNESDAY, JULY 16, 2025
United States Senate,
Subcommittee on Crime and Counterterrorism,
Committee on the Judiciary,
Washington, DC.
The Subcommittee met, pursuant to notice, at 12:03 p.m., in
Room 226, Dirksen Senate Office Building, Hon. Josh Hawley,
Chair of the Subcommittee, presiding.
Present: Senators Hawley [presiding], Durbin and Welch.
OPENING STATEMENT OF HON. JOSH HAWLEY,
A U.S. SENATOR FROM THE STATE OF MISSOURI
Chair Hawley. Welcome, everyone, to the hearing today,
which is entitled ``Too Big to Prosecute?: Examining the AI
Industry's Mass Ingestion of Copyrighted Works for AI
Training.'' This is the third hearing of the Senate Judiciary
Committee's Subcommittee on Crime and Counterterrorism, which I
am delighted to work on with my colleague, Ranking Member
Durbin.
I want to say a special thank you to the witnesses for
being here. Many of you, I think all of you, traveled in order
to be here today. Thanks to everybody for accommodating our
change in time. The Senate floor is going to be tied up here
later today, and thus, no Committee business is happening, so
thanks, all of you, for being here and for accommodating us.
I am going to make just a few opening remarks. Senator
Durbin will do the same. Then we will swear in the witnesses
and be off to the races.
Let me just start by saying that today's hearing is about
the largest intellectual property theft in American history.
For all of the talk about artificial intelligence and
innovation and the future that comes out of Silicon Valley,
here is the truth that nobody wants to admit. AI companies are
training their models on stolen material, period. That is just
the fact of the matter. And we are not talking about these
companies simply scouring the internet for what is publicly
available. We are talking about piracy. We are talking about
theft. For years, AI companies have stolen massive amounts of
copyrighted material from illegal online repositories.
Now, the FBI and the Department of Homeland Security
regularly prosecute individuals who engage in exactly the same
kind of behavior using platforms like LimeWire or Napster in
the old days, using a process called torrenting. But have these
Big Tech companies been prosecuted? No, of course not. They are
getting off scot-free. And this hearing will show us that Meta
and Anthropic and other AI companies are willfully using these
illegal networks, these torrenting networks as they are called,
to steal vast swaths of copyrighted materials.
The amount of material that we are talking about is
absolutely mind-boggling. We are talking about every book and
every academic article ever written. Let me say that again,
every book and every article ever written, billions of pages of
copyrighted works, enough to fill 22 libraries the size of the
Library of Congress. Think about that, 22 Libraries of
Congress full of works. That is how much has been stolen.
And this theft was not some innocent mistake. They knew
exactly what they were doing. They pirated these materials
willfully. As the idea of pirating copyrighted works percolated
through Meta, to take one example, employee after employee
warned management that what they were doing was illegal. One
Meta employee told management that, and I quote now, ``This is
not trivial.'' And he shared an article asking, ``What is the
probability of getting arrested for using torrents''--illegal
downloads--``in the United States?''
Another Meta employee shared a different article saying
that downloading from illegal repositories would ``open Meta up
to legal ramifications.'' That is a nice way of saying that
what they were doing was exactly, totally, 100 percent barred
by copyright law.
Did Meta management listen? No. They bulldozed straight
ahead. We will see evidence today that Mark Zuckerberg himself
approved the decision to use these pirated materials. And then
the best part, Meta management tried to hide it. They tried to
hide the fact that they were engaged in the illegal download of
pirated works, and not just the illegal download, but the
illegal distribution of these same works. They tried to hide it
by using non-company servers. They went so far as to train
their AI model--get this. Meta trained its AI model to lie to
users about what data it had been trained on. I mean, you talk
about an inception-level-worthy deception, training the AI
model to lie about what its own sources were. This isn't just
aggressive business tactics. This is criminal conduct.
And I just want to point out, Meta's conduct is not an
exception. This is the rule when it comes to what is happening
right now in the AI space among these mega companies. Big Tech
operates on the model of do whatever you want and count on the
lobbyists and the lawyers to fix it later. They don't care
about the rule of law. They don't care about America. They
don't care about freedom. They certainly don't care about
working people. They care about power and they care about
money. And every time they say things like, we can't let China
beat us, let me just translate that for you. Every time they
say that, oh, we can't let China beat us, what they are really
saying is, give us truckloads of cash and let us steal
everything from you and make billions of dollars on it. That is
the translation. We are going to see that in the testimony and
the evidence today.
Here is the bottom line. We have got to do something to
protect the people of this country. I am all for innovation,
but not at the price of illegality. I am all for innovation,
but not at the price of destroying the intellectual property of
the average man and woman in this country. We have laws for a
reason. Those laws ought to be enforced, and Big Tech should
not be above the law. Enough is enough. It is time to enforce
the law, and that is what this hearing today is about.
Now, I will turn it over to Ranking Member Durbin.
OPENING STATEMENT OF HON. RICHARD J. DURBIN,
A U.S. SENATOR FROM THE STATE OF ILLINOIS
Senator Durbin. Thanks, Mr. Chairman.
The way AI interacts with intellectual property rights,
particularly copyrights, is a critical topic we can't overlook.
America's creative industries, including software, music,
movies, literature, collectively contribute over $1 trillion to
our economy each year, employing millions of people. While AI
can be an incredible tool that unlocks further creativity,
writers, artists, musicians, and others are rightfully
concerned about what technology would mean to them personally.
Should AI companies be able to use their materials freely as
``fair use'' or should they receive compensation when their
works are used to train AI models?
I want to tell you, chapter one, how I discovered
intellectual property. I was an attorney in Springfield,
Illinois, and in a rash moment decided to buy a restaurant. So
I joined a few friends and bought a restaurant, and we had live
music. And I got a phone call one day from a fellow who said, I
just was out at your restaurant. I said, great, did you have a
good time? Couldn't have been better. Saturday night, the music
was terrific. And I said, well, I am glad you had a good time.
And he said, you played 10 BMI tunes and six ASCAP tunes. I
said, no, I didn't, I didn't play any tunes. He said, well, the
way the law is written, you are responsible for the fact that
copyright material was used by you to make a profit at your
restaurant. I said, tell it to the judge. He said, no, before
you say that, call your friend over in Jacksonville, Illinois,
a few miles away and ask him about a similar experience. And
his reaction was the same as yours. I called my friend who
said, ask him how much money he needs each month for ASCAP and
BMI, and we started paying it. That was my first course in
intellectual property. I hold onto it to this day.
So how can creators compete with AI products that generate
content at the push of a button, especially when the content
might mimic or even produce their own work? These are just a
few of the questions that we are going to consider in this
hearing as we try to find the right balance between promoting
technological innovation, protecting the work of our Nation's
creators, and continuing to incentivize creativity in years to
come.
We must recognize that AI innovation and protection of
intellectual property rights are not mutually exclusive. That
is why it is troubling, as I listened carefully to the
Chairman, to hear stories about steps Big Tech companies are
taking to train their AI models on copyright materials without
compensation to the creators of these works. For example,
rather than license authors' works, companies like Meta and
Anthropic have obtained copyright materials from sites that
host pirated copies of the authors' books and writings.
Anthropic pirated over 7 million books from shadow libraries.
As Anthropic's CEO put it, Anthropic had many places from which
it could have purchased, but it preferred to steal them to
avoid ``legal practice business slug,'' whatever that means.
While Anthropic later became not so gung-ho about training
their LLM on pirated books for legal reasons, it kept the
pirated copies that it had already downloaded anyway. I don't
get that.
As a judge in the Meta case recently put it, ``Companies
have been unable to resist the temptation to feed copyright-
protected materials into their models without getting
permission from the copyright holders or paying them for the
right to use their works for this purpose.''
This hearing is going to be interesting. Thanks, Mr.
Chairman.
Chair Hawley. Thank you very much to the Ranking Member.
It is the practice of the Judiciary Committee and all of
its Subcommittees to swear in witnesses before they testify, so
could I ask you to stand up, raise your right hand, and repeat
after me.
[Witnesses are sworn in.]
Chair Hawley. Very good. We will now proceed to opening
statements. We will give 5 minutes to each witness. I will just
say a brief word of introduction before each witness. We will
just go straight down the table here down the dais. We will
start with Mr. Max Pritt. Mr. Pritt is a partner at Boies
Schiller, and he represents authors in a civil copyright
infringement suit against Meta, among other matters.
Mr. Pritt, the floor is yours.
STATEMENT OF MAXWELL PRITT, PARTNER, BOIES SCHILLER FLEXNER
LLP, SAN FRANCISCO, CALIFORNIA
Mr. Pritt. Chairman Hawley, Ranking Member Durbin, thank
you for the invitation and opportunity to testify today. The
Art of the Deal by Donald Trump, Hillbilly Elegy by J.D. Vance,
Theodore Roosevelt: Preacher of Righteousness by Josh Hawley,
these are just a handful of the many, many millions of
copyrighted books and publications that some of the world's
largest and wealthiest corporations--Meta, OpenAI, Anthropic,
and others--knowingly and intentionally pirated from illicit
online marketplaces for financial gain and to seek a
competitive advantage in AI.
Today, this Committee begins to investigate and shine a
light on what is likely the largest infringement of American
intellectual property by U.S. companies in our Nation's
history. As tech companies scrambled to release generative AI
models and to catch up with OpenAI's ChatGPT, many of them
turned to illicit online repositories to take tens of millions,
if not hundreds of millions, of books and scholarly
publications and articles for free instead of buying them or
licensing them from copyright owners. By pirating these works,
AI companies have built a multibillion-dollar industry that is
projected to be a trillion-dollar industry in the next few
years without paying a single cent to the authors whose works
power their products or the publishers responsible for
introducing those works to the public here and abroad.
Take Meta, for example. From the early days of its
generative AI program, Meta concluded that training its models
using books and articles would help their performance. But
instead of buying or licensing these works from copyright
owners, Meta decided to take them from notorious online
marketplaces of stolen copyrighted works, including some of the
same ones targeted by the Department of Justice and the FBI for
criminal copyright infringement. And Meta didn't just download
books from these illegal repositories. It used the same kind of
peer-to-peer file-sharing networks that powered Napster. In
other words, Meta also made copies and sent them to other
pirates.
In total, Meta pirated well over 200 terabytes, terabytes
of pirated books and articles, a size comparable to the entire
printed collection of the Library of Congress 20 times over, or
the equivalent of a stack of many billions of pages of text.
Meta's piracy included many millions of works, including at
least 12 books authored by Members of this very Subcommittee
and every U.S. President and Vice President in the 21st
century. Meta also made and sent copies of over 40 terabytes of
pirated works to others.
In doing so, Meta has helped to revive online piracy by
propping up the foreign criminal syndicates that run these
illicit marketplaces to violate U.S. copyrights around the
globe. As Anna's Archive, the largest illicit online
marketplace of stolen literature in the world today, says on
its own website, ``Shadow libraries were dying. Then came AI.''
Meta is not alone, and it was not the first U.S. company to
engage in rampant domestic piracy for its own commercial
purposes. Pending lawsuits against OpenAI and Anthropic
revealed that both companies also pirated millions of
copyrighted works. And the decisions to engage in this mass
domestic piracy were made at the highest levels. Company
documents that are now public show, for example, the decision
to pirate instead of license was approved by Meta's co-founder
and CEO, Mark Zuckerberg, himself.
This decision to engage in mass piracy was made, even
though key employees knew that doing so was both illegal and
unethical. One Meta researcher argued that using pirated
material should be beyond our ethical threshold. Another called
Meta an accomplice to piracy. Yet another warned that if the
media got wind of the company's use of pirated data, it could
undermine Meta's negotiating position with regulators, the very
people in this room and across the hall, in the White House,
and in State houses across the country. And when asked if he
cared whether Meta protects human creativity rather than
exploits it, Meta's head of AI partnerships testified, he does
not care.
AI companies now seek a pass for this unprecedented piracy
by invoking a limited exception to copyright infringement
called fair use, which Congress codified in the Copyright Act
of 1976. They also argue they can't compete with China if they
can't infringe every American's copyright. Nonsense. Our tech
companies employ the best and brightest minds in the world, and
they are the wealthiest corporations in the world. It is not
credible for these companies to argue they can invest hundreds
of billions of dollars into hiring talent and building data
centers to power their commercial AI products and models, but
they can't pay a single cent to copyright owners. There is no
carveout in the Copyright Act for AI companies to engage in
mass digital piracy.
I am grateful to Chairman Hawley, Ranking Member Durbin,
and this Subcommittee for your attention to the issue. I look
forward to your questions.
[The prepared statement of Mr. Pritt appears as a
submission for the record.]
Chair Hawley. Thank you very much.
Next up is Professor Mike Smith. Professor Smith is
professor of information technology and marketing at Carnegie
Mellon University. He has written extensively on piracy and its
effects on innovation. Professor Smith.
STATEMENT OF MICHAEL SMITH, PROFESSOR OF INFORMATION TECHNOLOGY
AND MARKETING, CARNEGIE MELLON UNIVERSITY, PITTSBURGH,
PENNSYLVANIA
Professor Smith. Chairman Hawley, Ranking Member Durbin, I
am very honored and thankful for the opportunity to testify
today on this important issue. My testimony today is informed
by 25 years of empirical research into the impact of new
technologies on the markets--on the creative markets and my
experience serving on a roundtable of 10 economists convened by
the U.S. Copyright Office to study the implications of
generative AI on copyright policy.
My research into piracy started in the early 2000's when
digital piracy was a relatively new problem for the creative
industries. During that period, many in the tech community
argued that piracy was fair use because it would not harm legal
sales, was unlikely to harm creativity, and any legislative
efforts to curtail piracy would not only be ineffective, but
would also stifle innovation.
My empirical research over the past 25 years has studied
these questions. In 2020, my colleagues and I surveyed over 40
papers published in peer-reviewed academic journals as part of
a piracy landscape study we wrote for the U.S. Patent and
Trademark Office. Our report drew three broad conclusions.
First, the peer-reviewed academic literature shows that
digital piracy does harm creators by reducing their ability to
make money from their creative efforts.
Second, the peer-reviewed academic literature shows that
digital piracy does harm society by reducing the economic
incentives for investment in creative output.
Third, the peer-reviewed academic literature shows that
copyright enforcement has been effective in reversing these
harms while also allowing businesses and legal online
distribution platforms to thrive and innovate.
Today, we're hearing many of the same arguments we heard in
the early days of the internet. Allowing generative AI
companies to use pirated content to train their models is fair
use because it won't harm legal sales, won't harm creativity,
and any enforcement efforts to curtail the use of pirated
material for training will not only be ineffective, but will
also stifle innovation.
My response to those arguments is that while the time has
changed, the underlying economic principles are the same today
as they were in 2000. And by applying those principles, I think
we can draw many of the same conclusions.
First, the use of pirated content to train generative AI
models will harm sales for creators. Allowing generative AI
companies to train their models with pirated content is likely
to harm markets for creators by damaging the original markets
for their work, by damaging licensing markets for those works,
and by creating perverse incentives for bad actors to add new
copyrighted content to pirate networks, in essence, allowing
generative AI companies to launder licensable content through
piracy.
Second, the use of pirated content to train generative AI
models will harm society by reducing economic incentives for
creators. This conclusion is similar to the early piracy
research: Economic incentives drive creative output. But there
is a new and unique indignity to our current situation. When
piracy is used to train generative AI models, we're not only
stealing from creators, we're then using the theft of their
content to create tools that can flood the market with machine-
generated output, which in turn will replace many of those
creators, particularly emerging artists.
And third, as in the early days of piracy, I believe that
enforcing copyright law in the context of generative AI
training can be effective at reversing these harms and can
create a world where both the creative industries and the
technology industries are able to thrive. If the Napster and
Grokster decisions had gone the other way in the early 2000's,
it is hard to imagine that Spotify and Netflix would exist
today, and that would be to the detriment of consumers, the
creative community, and the technology community.
I think today we have a similar opportunity to create a
win-win-win for society, creators, and tech firms by making it
clear that piracy is wrong and that a vibrant technology
economy depends on a vibrant creative economy. We found a way
to make licensed streaming and sales channels work for
consumers, copyright owners, and platforms in the early 2000's.
We must do the same for generative AI today.
Generative AI has the potential to benefit industry and
society in many ways, but achieving that potential will require
a more robust and transparent partnership between technology
firms and the creative industries. On our current path, we risk
killing the goose--or in this case, the authors, musicians,
coders, and filmmakers--who laid the golden eggs that are key
to the present and future value of generative AI output.
I thank you and look forward to your questions.
[The prepared statement of Professor Smith appears as a
submission for the record.]
Chair Hawley. Thank you very much.
Next up is Professor Bhamati Viswanathan. Did I get that
right, Professor? Am I close?
Professor Viswanathan. Perfect. Thank you.
Chair Hawley. Okay.
Professor Viswanathan. Perfect.
Chair Hawley. Professor Viswanathan is a professor of law
at New England Law School, and she is an expert in AI and
copyright. Thank you for being here. The floor is yours,
Professor.
STATEMENT OF BHAMATI VISWANATHAN, PROFESSOR OF LAW, NEW ENGLAND
LAW SCHOOL, BOSTON, MASSACHUSETTS
Professor Viswanathan. Chairman Hawley, senior Ranking
Member Durbin, and Members of this Subcommittee, thank you so
much. I am honored to testify today on a subject that I feel
passionate about.
I feel that Senator Hawley did an excellent job of laying
the table for us. I would like to drill down on what he's
presented us with so far and help us walk through this.
So first, it's an interesting moment that we're at.
Generative AI is a promising set of technologies, and I think
we can all agree that they're beneficial. However, the training
that they're engaging in is deeply problematic and troubling,
and courts don't know what to do about this yet. They haven't
reached a consensus on what should obviously be done about the
training of AI on pirated works. So I would like to give us a
call for action and a solution as I talk us through this.
First, we know that what pirate websites are doing is
illegal. How do we know that? Multiple actions have been
brought against pirate sites, and in every case, the pirate
websites or repositories have lost. The FBI, the Department of
Homeland Security have gone after pirate websites and tried to
shut them down. Now, of course, we all know this can be like
whack-a-mole, right? They shut down, they come back up again.
But the point is it's well-established that what they're doing
is illegal, and that makes sense.
If you and I stole books from the library or from a
bookstore and said, I need to train, I need to learn, I need to
develop my mind, we wouldn't argue that this is fair use. We'd
say you can't steal the materials even for a good cause. That's
not even what's happening here. The AI generative companies are
going to pirate websites, stealing the materials that have
already been stolen. It is a crime compounding a crime. How is
this fair?
Say you want to go drag racing, an illegal activity. I tell
you, hey, there's a shop down the street that sells stolen cars
for cheap. Go buy a car and you can drag race. You go, great,
that helps me be able to afford what I want to do. You buy a
stolen car, you drag race, you win. Do you now get to say, hey,
it's okay that I stole that--I bought that stolen car. It's
okay that I engaged in illegal activity. Neither activity is
legal, and one is compounding the other, and that's what's
happening here. It's simple. It's a crime compounding a crime.
And it's not a victimless crime. As Professor Smith showed
us, there are real victims here, the loss of authors'
livelihoods. Mr. Baldacci will be eloquent on this topic, but
as an author myself, I feel the same. The loss of my livelihood
not only hurts, but it affects what I have spent my life
training to do.
It contravenes copyright law's basic incentive structure. I
don't just teach copyright, I teach constitutional law as well.
This is enshrined in the United States Constitution. The
Intellectual Property Clause is one of the things that makes
this country not just great, but robust, powerful, economically
hugely successful. Over $1 trillion in revenues from the
creative content industries, this is truly at risk right now,
this entire incentive structure that was brilliantly thought of
by our Founders.
It has negative incentives. If you know that you can go to
a pirate website and steal things, why would you ever pay for
anything again? The generative AI companies have shown us the
way to massive theft, not just by themselves, but by others as
well. It depreciates the quality and the quantity of works out
there. The tradeoff of copyright law is you, the copyright
author, take the risk, and the market rewards you with rewards
if the marketplace likes what you've done. There is no
incentive structure anymore. That's been undermined by what's
happening now.
And there's a solution. The solution is licensing. It
already exists, the licensing of works, the fair compensation
of creators. These are all things that actually exist now. We
don't even need new legislation in some ways. We might want
that as well someday, but right now we have a solution. Enforce
good, standard, accepted, acknowledged licensing practices.
None of this is to say that we're against innovation. We
all believe in innovation. We believe that generative AI has
potential. But you cannot compromise the livelihood of
creators. You cannot compromise our trove of creative activity
and our entire world of art and culture and the things that we
have done that make us most human and that enrich us the most--
you cannot compromise those simply by saying we need new
technologies to flourish. What we need is for new technologies
to flourish fairly, sustainably, in ways that make sense to us
and that have already been provided for by our Constitution, by
the U.S. copyright law, by intellectual property law itself.
It is critical that Congress recognize that this is the
tradeoff that matters for the livelihoods of everyone whose
lives right now and well-being are at risk.
Thank you so much.
[The prepared statement of Professor Viswanathan appears as
a submission for the record.]
Chair Hawley. Thank you very much, Professor.
Next is Mr. David Baldacci. Mr. Baldacci is one of the
best-selling authors in America. I don't know how many books he
has had as the number one New York Times bestseller. I bet he
knows. Maybe he will tell us. I have read his books. I am
delighted to have him here today. He is going to tell us about
AI's impact on authors. Welcome, Mr. Baldacci.
STATEMENT OF DAVID BALDACCI,
BESTSELLING AUTHOR, RICHMOND, VIRGINIA
Mr. Baldacci. Thank you. It's a lot, number one, best-
selling.
[Laughter.]
Mr. Baldacci. I'll leave it at that.
Chairman Hawley, Ranking Member Durbin, Members of the
Subcommittee, 119 years ago, Mark Twain traveled to D.C. and
appeared before a Congressional Committee to advocate on behalf
of copyright--stronger copyright laws. He was the most pirated
author of his day. I'm pirated all over the world as well. I
get why that upset him. He thought creative arts was the
lifeblood of this country, and I agree with him. That was the
first time at that hearing that he wore his signature white
suit publicly, and he did so because he thought it represented
purity of thought and spirit. I don't own a white suit, and
even if I did, I don't think my wife would have let me wear it
today, so you just get blue.
Twain once said that ``Travel is fatal to prejudice,''
meaning if you meet people where they live, you find out
they're just like you. I had no chance to leave the segregated
world of Richmond, Virginia, when I was growing up, but I
visited the library every week, and I like to think that, through
books, I traveled the world without a plane ticket or a
passport. And born from my love of reading came my desire to be
a writer.
I worked away for decades, getting rejected over and
over, but I kept going, honing my craft, remaining disciplined,
taking the rejections head on, and using them as motivation,
and finally I was successful. And after 60 novels under my
belt, I work just as hard as I ever have. It's the American
way. You work hard, you play fair, you stay the course, and
you'll make it.
I truly believed that until my son asked ChatGPT to write a
plot that read like a David Baldacci novel. In about 5 seconds,
3 pages came out that had elements of pretty much every book
I'd ever written, including plot lines, twists, character
names, narrative, the works. That's when I found out that the
AI community had taken most of my novels without permission and
fed them into their machine learning system. I truly felt like
someone had backed up a truck to my imagination and stolen
everything I'd ever created.
I'm aware of the argument that what AI did to me and other
writers is no different than an aspiring writer reading other
books and learning how to use them in original ways. I can tell
you from personal experience that is flatly wrong.
I was once such an aspiring writer. My favorite novelist in
college was John Irving. I read everything that Irving wrote.
None of my novels read remotely like a John Irving novel. Why?
Well, unlike AI, I can't remember every line that Irving wrote,
every detail about his characters and his plots. The fact is,
also unlike AI, I read other writers not to copy them or steal
from them but because I love their stories. I appreciate their
talent. It's motivated me to up my game.
What AI does is take what writers produce as an incredibly
valuable shortcut. It's like super fuel to teach software
programs what they need to know. And I have learned that these
trillion-dollar companies didn't even buy my books. They got
them off a website that has pirated works. They complained that
it would be far too difficult to license the works from
individual creators, so apparently, it was more efficient to
steal it. Trillion-dollar companies with battalions of lawyers
did not have the resources to do things lawfully.
I was once a trial lawyer. If I had made that argument in
court, I would either have been laughed out of the courtroom or
held in contempt by the judge and rightly so. If AI companies
only needed words, they could have fed every dictionary in the
world into their machine learning, but that was not nearly good
enough because it would mean decades of additional work and
hundreds of billions of dollars of additional investment. What
they needed was complete, well-crafted, living, breathing
stories with characters that seemed real, plots that made
sense, dialog that appeared genuine, humanity on the page. In
sum, they needed us and our craft that we learned with the
sweat of our brows and the flexing of our imaginations.
And these companies have swooped in, stolen that labor in
order to make enormous profits. But we, the writers, the true
source of all of this, will receive nothing. AI will allow
anyone, with no effort at all, to order up a novel written in
the vein of an established writer. And that book can be sold
saying that it reads just like a David Baldacci novel. Yes, it
does read like my novels because it is my novel. It is my
imagination.
People complain about cheap imported goods hurting American
workers. Well, we have cheap books being created by American
technology flooding the market. That will mean lower profits
for publishers and less money to spend on new emerging writers.
Trust me, that hurts all of us.
Online vendors now require the author to disclose if a book
was not human-created. It's getting to the point where they
will have to limit the number of books that someone can publish
on a weekly or even daily basis. This is insane.
Source code and elements of algorithms are also protected
by copyright. I would hazard to bet that if I stole any of the
AI community's source codes or algorithms and then tried to
profit off them, they would unleash a tsunami of lawsuits
against me. However, if, as AI contends, fair use is actually
my entire body of work, there is no more copyright protection
for anyone. I'm sure AI believes that their IP should be fully
protected against interlopers, and I agree with them. Thus, I
am deeply disappointed they don't feel the same about people
like me.
The AI community apparently is entitled to steal our
work product despite it being copyrighted because what they're
doing is so transformational. Well, let me tell you, billions
of people have been transformed by books. Many significant
events in human history and in this country had seminal authors
and their works at the head of the pack. We didn't
truly emerge from the Dark Ages until the invention of the
printing press when books became widely available. Books also
teach empathy, making the world a kinder, gentler, more
meaningful place.
I'm only one man, but books transformed my life, propelling
me to a far better existence. I am sure there are aspects of AI
that will also transform the world, but if you want to bet on
which side is more transformational for all of us, I will bet
on books every single time.
Thank you.
[The prepared statement of Mr. Baldacci appears as a
submission for the record.]
Chair Hawley. Thank you very much, Mr. Baldacci, very well
said.
Next up and finally is Professor Edward Lee. Professor Lee
is professor of law at Santa Clara University School of Law,
where he has written extensively about the intersection of AI
and copyright law.
Thank you for being here, Professor Lee.
STATEMENT OF EDWARD LEE, PROFESSOR OF LAW, SANTA CLARA
UNIVERSITY SCHOOL OF LAW, SANTA CLARA, CALIFORNIA
Professor Lee. Chair Hawley, Ranking Member Durbin, and
other Members of the Subcommittee, thank you for this
opportunity to testify. I am a professor of law at Santa Clara
University School of Law. I'm also a book author and a
photographer, and my personal experience informs my scholarship
and understanding of the importance of copyright to authors and
artists across the country.
In my testimony, I will discuss whether using copyrighted
works to train AI models is a fair use, giving particular
attention to the two recent decisions by Judges Alsup and
Chhabria in cases filed by book authors against Anthropic and
Meta. This novel question of law, which has important
implications for U.S. national interest, has sparked sharp
disagreements among parties, stakeholders, and now Federal
judges. As Judge Bibas noted in an earlier non-generative AI
case, this question of law is difficult.
In my opening remarks, I would like to stress three points.
First, I believe Judges Alsup and Chhabria correctly concluded
that the use of copies to--the use of copies of works to train
an AI model serves a highly transformative purpose in
developing a new technology under factor one of fair use.
During training, an AI model is exposed to vast training
materials, typically many millions of works. Through a process
called deep learning, the model identifies the statistical
relationships among words and within subparts of words, thereby
enabling the model to conduct numerous functions, including
research, translation, delivery of medical advice, generation
of content, and so forth.
As Judge Chhabria concluded in his opinion, ``The purpose
of Meta's copying was to train its large language models, which
are innovative tools that can be used to generate diverse texts
and perform a wide range of functions.'' And as Judge Alsup
recognized, ``The technology at issue was among the most
transformative many of us will see in our lifetimes.''
Now, the history of AI development strongly supports this
conclusion. It is important to understand why AI researchers at
universities began training AI models on large datasets. This
practice originated not at AI companies, not at Big Tech, but
at universities where AI researchers discovered a key insight.
Scaling or using larger and more diverse datasets actually
worked in developing and improving AI models, an achievement
that escaped researchers for many years. This seminal
breakthrough, which took decades to figure out, has propelled
the advances of AI that we are witnessing today.
Second, while I agree with the ultimate findings of fair
use in both cases, it's important to remember that fair use is
fact-specific and decided on a case-by-case basis. In some
cases, a transformative purpose in AI training might be
outweighed by the other factors. For example, an AI model that
routinely produces outputs that are infringing, such as
regurgitations, might not be a fair use even in the training of
the model due to insufficient guardrails on the model.
Critically, in the cases against Anthropic and Meta, the
judges concluded the plaintiffs did not show the models had
produced any infringing outputs of the plaintiffs' works. And
that can be appealed, but that is the findings of both judges.
My final point is the need for caution, caution by the
courts, caution by Congress, and the States. I believe it's
important to weigh the United States' interest in AI
innovation. President Trump has issued an executive order
making U.S. development and global leadership in AI a national
priority. China has its own priority and a plan of surpassing
the United States and becoming the world leader in AI by 2030.
The United States' national priority in AI counsels caution.
Indeed, in Google v. Oracle, another technology fair use
case of national importance, the U.S. Supreme Court itself
cautioned, ``Given the rapidly changing technological,
economic, and business-related circumstances, we believe we
should not answer more than is necessary to resolve the
parties' dispute.'' Judges Alsup and Chhabria departed from
this approach in some controversial parts of their opinions
that were just dicta. I disagree with Judge Alsup's suggestion
on pirated books and Judge Chhabria's suggestion on copyright
dilution, as more fully elaborated in my written statement.
At this juncture, I think the best approach is for Congress
to wait and see how other district courts, the courts of
appeals, and potentially the U.S. Supreme Court resolve these
difficult issues, including access to pirated shadow libraries
in the many pending copyright lawsuits across the country.
Thank you, Senator.
[The prepared statement of Professor Lee appears as a
submission for the record.]
Chair Hawley. Thank you very much, Professor. Thanks for
being here. Thanks again to all of our witnesses.
We are going to now have 7-minute rounds of questioning,
and we will see if we can fit in maybe a couple of rounds, just
depending on the time that we have. I will start, and then we
will go to the Ranking Member and any other Members who arrive
in that time.
Professor Viswanathan, let me just start with you, if I
could, and let's see if we can just drill down on some of the
specifics here. Mr. Baldacci mentioned in his opening statement
that AI could just feed dictionaries into their platforms in
order to train them. They don't do that. They prefer published
works, fully formed works. Why is that? Can you give us an
insight into that?
Professor Viswanathan. That's absolutely right. They learn
syntax, structure. They learn how we learn language, right?
When you learn language, you just don't learn words. You don't
memorize words. You don't memorize notes when you learn music.
You learn structure and syntax. And the point that Professor
Lee is making is correct. They need large datasets. More is
better to learn predictive language models. However, more is
not everything. It's not pirated works.
Chair Hawley. So let me just ask this. You said that they
are not buying the books. They are not buying Mr. Baldacci's
book or anybody's book who is sitting up here, anybody in the
audience. They are getting them. They are stealing them. They
are pirating them from somewhere. If they are not buying the
books, they are not stealing them out of libraries, where are
they getting them?
Professor Viswanathan. These large repositories of
materials that are available online, there are many. Some are
licit, some are not licit. The pirate websites in particular
are not licit. So if you need a lot of material, you go out and
you scoop up all that material that you can find, but you don't
go to pirate websites to get that material if what you want to
do is legal. None of these works are licensed. None of these
works are licensed. No author has been compensated to date.
Chair Hawley. They go to these--let's call them shadow
libraries--to get the works illegally. By the time they go to
the shadow library, the works there are already stolen, right?
They have already stolen Mr. Baldacci's book, Professor Lee's
book, everybody's, your books. They have stolen them. When they
go to the shadow library, how do they get them? I mean, how
does the AI company then take possession of the particular
work?
Professor Viswanathan. There's a process called torrenting,
and I will not trouble you all with the details of torrenting,
but essentially huge amounts of data streamed to you and you
get them. At the same time, you can send them out. That's
called seeding. You can send them out at the same time.
Uploading and downloading exists at the same time. This is a
peer-to-peer process. So not only are you taking in these
pirated materials, you are also distributing them. The
violation of copyright law exists at the reproduction of these
works, at the making available of them by the pirate libraries,
the dissemination of them, and your dissemination, gen AI
company, of them as well.
Chair Hawley. So they are both taking the works and
distributing them as well in this thing called, kind of like
Napster, this thing that you call torrenting. Let me ask you
this. I mean, is torrenting legal? That is not legal, is it?
Professor Viswanathan. Torrenting can be legal, but in
this case, it is not. And in this particular case, this is
benefiting the--now I agree with Judge Alsup who said, if
you're taking it from pirate libraries, no way. That is not
acceptable, right? Part of what we're seeing here, Judge
Chhabria said, well, it's not helping the pirate websites.
Well, yes, it is. The pirate websites, there's one in
particular called Anna's Archive. They actually put on their
website, hey, gen AI companies, come train on us. We'll do some
data swaps. Or, you know what, you can make us a donation too.
This is directly helping the pirate websites thrive, flourish,
proliferate.
Chair Hawley. Let me ask you this. Have there been, to your
knowledge, any criminal enforcements against these torrenting
platforms?
Professor Viswanathan. Yes, there have been attempts to.
Again, it's like a game of whack-a-mole. You get one, you knock
it down, it pops up again in some jurisdiction that you don't
have control over.
Chair Hawley. What is the key to criminal enforcement? You
know, civil versus criminal in this context, when do we have a
criminal case against torrenting? What is the key to that?
Professor Viswanathan. Okay. This is a really important
point. What's criminal here? Criminal copyright liability has
two prongs to it. Prong one is you have to do it willfully, and
prong two is you have to do it for commercial advantage or
gain. We clearly know that prong two is met. This is for
commercial advantage or gain. I don't think Meta is doing this
out of the goodness of its heart. Prong one, willful means you
need to know that what you are doing is illegal. There's lots
and lots of evidence now, particularly from the Kadrey v. Meta
case, that shows that they knew this was illegal. They even had
to ask all the way up the chain of command to Mark Zuckerberg
and say, hey, is this okay? And he said, yes, it's okay.
So not only did he do it knowing it was illegal, he did it
knowingly, he did it willfully, intentionally, and whether or
not he knew which statute made it illegal doesn't matter. For this
to be willful, you have to know that what you're doing is
wrong, and this meets that prong. So this is, in fact,
amounting to what you might call criminal copyright liability.
Chair Hawley. Mr. Pritt, let me just ask you about this,
about the willful aspect, and let's talk about Meta in
particular, since Professor Viswanathan just mentioned Meta.
They are one of the biggest monopolists in the world and one of
the biggest AI companies now in the world, if not the biggest.
So let's just talk about them for a second. Meta uses torrents
to acquire pirated data for its Llama model, is that right?
Mr. Pritt. Correct.
Chair Hawley. How much data would you estimate that Meta
has torrented? It is illegally downloaded and also then shared
in this peer-to-peer scheme.
Mr. Pritt. It has pirated well over 200 terabytes of
copyrighted material from multiple--I don't call them shadow
libraries because they're not libraries--but illicit criminal
enterprises.
Chair Hawley. And how much has it paid the copyright
holders for these works that it has used, to your knowledge?
Mr. Pritt. Nothing.
Chair Hawley. Nothing, zero. So billions of works, billions
of books like Mr. Baldacci's, zero payment. If Meta were to
pay, do you have any idea what the cost might be? I mean, to
your knowledge and your discovery, did they ever explore
paying? I mean, is there any sense of how much this might have
cost them?
Mr. Pritt. Early on, they explored licensing. They assigned
two individuals part-time to attempt to license, and they
decided it would take too long, for example, and that's when
they turned to piracy. At the time, they had--public documents
show that certainly tens of millions, if not hundreds of
millions, had been contemplated for licensing at that time.
Chair Hawley. Okay. So let's just think about this.
Hundreds of millions of dollars, that is the value, maybe sort
of the base, the bare value of the works that they have used,
like the works that you all have written on this panel,
hundreds of millions, and they paid zero of that.
So let's just drill down a little further. Did Meta know
what they were doing was wrong? Do you, Mr. Pritt, believe in
the evidence you have seen that there is any evidence to
suggest that Meta's employees knew what they were doing is
illegal?
Mr. Pritt. I think the documents that have become public
clearly show that.
Chair Hawley. Let's just look at a few of these documents.
I am going to show you a few things, and I will ask you to help
me interpret them to make sure that we get them right. Let's
start here with a Meta employee, a Meta engineer working on
their AI project, Eleonora Presani. She says, ``I don't think
we should use pirated material.'' This is in a chat with other
Meta employees. ``I don't think we should use pirated material.
I really need to draw a line there.'' She goes on, ``I feel
that using pirated material should be beyond our ethical
threshold. Sci-Hub, ResearchGate, LibGen are basically like
Pirate Bay or something like that. They are distributing
content that is protected by copyright, and they are infringing
it.'' How do you read this, Mr. Pritt? Does this look like
knowledge to you?
[Poster is displayed.]
Mr. Pritt. That's certainly what we've argued in the case.
Chair Hawley. Let's look at another Meta employee. Here is
Nisha Deo in the same chat. She replies and said, ``It's the
piracy (and us knowing and being accomplices) that's the
issue.'' This is a Meta engineer working on their AI project.
``It's the piracy (and us knowing and being accomplices) that's
the issue.''
[Poster is displayed.]
Let's look at another one. Here is the response that
another Meta engineer in the same chat gave. ``Well, we want to
buy books and be nice, open people here. But, however, to make
it happen and not letting the bad guys win''--that's the
beat-China argument--``we need to make a case--fast--and cut some
corners here and there.'' ``We need to cut some corners here
and there.'' Mr. Pritt, what are we looking at here? I mean, is
this knowledge of illegal activity?
[Poster is displayed.]
Mr. Pritt. When they refer to bad guys, I think they're
actually referring to OpenAI and other AI competitors.
[Laughter.]
Mr. Pritt. But yes, this is certainly one of the many
documents that show that they knew these were pirated websites
that contained copyrighted materials, and they were taking them
for free.
Chair Hawley. So here we have it in black and white. Don't
believe me. Read the evidence. These are Meta's own engineers,
Meta's own employees saying, they know what they are doing is
ethically wrong, illegal, likely to subject them to legal
liability, and they are doing it anyway because they need the
money.
There is a lot more here. We will come back to this. I want
to give Senator Durbin a chance to ask questions. Senator
Durbin.
Senator Durbin. Thanks, Mr. Chairman.
I want to start my questions with Mr. Baldacci. A number
of authors have shared with the public the process they go
through to write a book. I believe John Irving in The Imaginary
Girlfriend did that. I think John McPhee has done that in the
past. Stephen King has done that. Give us a kind of an insight,
now that you have published successfully in volume, what the
process is in writing a novel.
Mr. Baldacci. Well, you know, one, you have to sort of be
in love with words and storytelling because that is sort of the
essence of what you're trying to create. You draw upon personal
experiences, your own curiosities, people you've met along the
way, things that have happened to you, places you've traveled
to, humanistic experiences that a software platform really
can't replicate. And if it ever manages to do it, I would like
another planet to live on, quite frankly.
And for me, it was 20 years of hard work learning the craft
before I ever was published at all. I started writing short
stories and wrote them for 15 years when I was in college and
law school and tried to get them published and was not
successful. But it's a craft that you build over time. And you
have a lot of frustration, a lot of dips and valleys. Good
times happen, bad times happen, rejections happen. You learn
from them, you keep going. And at the end of the day,
hopefully, you get good enough to where someone who has the
ability to make your career happen will read your material and
respond to it, and you can then maybe hopefully write for a
living. And that's what happened to me after a long period of
incubation.
You never really see a lot of young writers--you know,
you're not going to see a lot of teenage writers making it big
because writing is about life, and you have to have something
to sort of write about. And it takes a long time. And that is
why I felt when my son brought this up where every single one
of my books was presented to me in an outline in like 3
seconds, it really felt like I had been robbed. Everything
that I had worked on my entire adult life was now in the
possession of someone else, someone I didn't even know, who
could then use it to write their own books that are actually my
books. I mean, that's not supposed to happen in this country.
And that's what was so enraging to me that I--I license my
work all over the world. I license it for different foreign
publishers. I license my work for television and movies and all
types of endeavors. And I am open to any offer. If someone
comes to me and wants to license my work, I will listen to
them. If we can negotiate something that's agreeable to both
parties, I will do it, and they can use my work for the
parameters that are in the licensing agreement, and life can go
on and people can be happy.
But the uncertainty of like stealing stuff from pirated
sites operated in Russia just so you can gain an advantage and
you don't really care about what happens to the likes of me and
other writers coming up--I make a lot of money from my
publisher, and my publisher has used that money to take risks
on new writers coming up they ordinarily would not have been
able to take a risk on. So when you hurt established writers
like me, you hurt all the other writers coming behind us.
Senator Durbin. So when you are in the creative process of
writing novels and other things, are you policing against
plagiarism?
Mr. Baldacci. I get--I am pirated a lot, but I never worry
about that because my ideas are my ideas. And I--nobody has the
sort of mindset and the experiences that I have, nor do I have
the mindset and experiences of other people. It is very
individualized. I never worry about that I'm going to
inadvertently take something away from another writer because
my stories are my own.
And that's why a software platform, the only thing they can
do is take from what has already been created. They can't
create anything really on their own. They take my mishmash and
put it all together and throw it out the other end, but it
still looks like my stuff because it is my stuff.
Senator Durbin. Professor Lee, if I understand part of your
argument here, you were suggesting that this is the age of
innovation. Deep learning deserves special treatment. We've
been through this argument in Congress before. Section 230 is a
good illustration of that. We decided this fledgling industry
called the internet just may not have a future, better be
careful, so we exempted them from liability. Is that what you
are suggesting?
Professor Lee. Not at all, Senator. My position is that we
should pay heed to the existing Supreme Court precedent on
fair use, which repeatedly states that fair use is a flexible
doctrine decided on a case-by-case manner. And there is a way
for authors to prove market harm based on a taking or the
copying of protected elements of their works.
Judge Alsup said, if the authors show that there is market
harm based on an output of this model, you could bring another
case. And that's exactly, I think, the approach to strike the
correct--as you mentioned earlier at the opening remarks--to
strike the right balance between protecting copyrighted works
and authors and protecting innovation. Even Justice Story in
Emerson v. Davies recognized that not everything in a book is
protected by copyright. Authors build on the past books to
write new books, and that fuels creation.
And here, the line that Judge Chhabria and Alsup drew in
terms of non-infringing output--or excuse me, just Judge
Alsup--there is no copyright claim in the production of
non-infringing works.
Senator Durbin. I am sorry to interrupt you, but I only
have a minute left. It looks to me like you are shifting the
burden to the author of the creative work when there is an
assertion of fair use here. So Meta or others can virtually
steal this creative product of Mr. Baldacci and others, and
then he has the responsibility of proving that there has been
an economic loss to him as a result of it?
Professor Lee. Not at all, Senator. The judges explained in
their opinions that the--yes, the initial burden for fair use
is on the defendant, but the defendants in both cases provided
evidence that there was no output of infringing works. And the
question then becomes, will the plaintiffs present contrary
evidence? And neither judge found evidence of outputs that had
substantially similar copies of the plaintiffs' works. So the
entire----
Senator Durbin. So, ultimately, the thievery, if you want
to use that word, of the creative work is for the economic
benefit of those who are creating the AI, is it not?
Professor Lee. Not necessarily. I think if the plaintiffs
are able to prove cognizable market harm from the copying of
their copyrighted expression, then the fair use argument is
likely to fail for their training.
Senator Durbin. I am coming at it from a different angle. I
am talking to you about why do we have AI? Why are we
interested in AI? Clearly, it is a commercial purpose, is it
not?
Professor Lee. Oh, entirely. For the AI companies, yes.
Senator Durbin. For the companies. So that they are
ultimately the winners in this approach that you are taking. We
assume we are in the world of new innovation here, and there is
a use of someone else's creative work. The burden is on them to
prove that they have lost money because of that piracy. But the
ultimate winner in this is going to be the AI because if they
escape this responsibility, they can use Mr. Baldacci's product
and make money off of it.
Professor Lee. Yes, if the training is considered a fair
use, the direct benefit would be to the AI companies. I grant
that. But in terms of the larger national interest, it redounds
to the benefit of the United States. If we have a priority in
AI development, and if we are in a competition or arms race
with China, winning the AI race by United States companies
benefits the United States, in my view.
Senator Durbin. And Mr. Baldacci should be prepared to pay
the price for that, right?
Professor Lee. Well, I would suggest that if it is so easy
to generate copies of Mr. Baldacci's novels or any other
author's, that should go in the complaint in these lawsuits. And
some of the lawsuits do allege infringing outputs. So those are
yet to be resolved. But my ultimate position is that we should
not throw out the window the established Supreme Court
precedent on how to apply fair use. It is case-by-case,
flexible, and it balances the interests of both sides in terms
of copyright, as well as innovation.
Senator Durbin. Thank you.
Chair Hawley. I just want to follow up on this line of
questioning, Professor Lee. When you say that it would be to
the benefit of the United States, isn't Mr. Baldacci a citizen
of the United States?
Professor Lee. Entirely. I'm not saying that Mr. Baldacci
does not benefit from the copyright. There is another----
Chair Hawley. But let's take a different author, Professor
Viswanathan. She is a citizen of the United States?
Professor Lee. Yes.
Chair Hawley. So I am just struggling to understand, when
you say that the mass theft of their works will benefit the
United States ultimately, you are saying that the mass theft
and potential impoverishment of American citizens ultimately
redounds to the good of America?
Professor Lee. Not at all, Senator.
Chair Hawley. I think you are being a little too imprecise,
right? What you mean to say is it may benefit American
corporations. It may impoverish American citizens, but it will
benefit American corporations.
Professor Lee. Well, Senator, there is a balance to be
struck and the courts----
Chair Hawley. Well, indeed, but you are waving the magic
wand that this will benefit the United States, said we are in
an arms race with China. I am just trying to drill down on your
assertion. I think what you are really saying is that the
enrichment of certain multinational corporations that are
incidentally based in the United States taking the works and
personal property of American citizens is a good thing. That is
a little bit less clear to me.
Professor Lee. Well, the way that I view the national
interest, as stated by President Trump's executive order, is
that there is a national priority in maintaining the United
States' dominance and leadership globally in AI. And I would
defer to the view of the AI czar, David Sacks, who said if
there is no pathway to fair use in AI training, we will lose
the race with China.
Chair Hawley. Well, you think that we should allow an
unelected AI czar to decide what the rights of American
citizens are?
Professor Lee. No, not at all. This is going through the
courts. I would let the courts decide all of these disputes.
And there are presently 44 lawsuits around the country, so this
is not a time for Congress to intervene in terms of deciding
these very difficult questions.
Chair Hawley. It just sounds strange to me to say that the
United States, as a nation, is going to benefit from the mass
violations of its citizens' rights. I thought what made us a
nation was our common citizenship, the things that we agree on
together, the rights that we hold in common. And your argument
seems to be it is fine to violate those rights en masse if it
redounds to the benefit of the Nation. I think what you are
really saying is to the benefit of certain people in the Nation
and their immediate interests.
Let me ask you about something else you said, fair use.
Professor Lee. Can I respond?
Chair Hawley. Well, just a second. I have limited time
here. Fair use, you said, is a flexible doctrine. It is an
equitable doctrine. And these companies aren't exactly coming
to this with clean hands, are they? They are coming to claim
fair use after they have stolen Mr. Baldacci's work. They
didn't take it from the library. They didn't license it. They
didn't buy it. They went to a pirated illegal site and took it.
And now they are coming and claiming the cover of equity. That
seems kind of strange, doesn't it? Is that how equitable law
works?
Professor Lee. That is the very question, the initial
acquisition, whether that was justified as fair use. And the
two judges disagreed on how to treat that initial acquisition
from the shadow libraries. So I think it would be incorrect for
us to assume that it is necessarily a violation. And the
Supreme Court in Google v. Oracle had an opportunity to discuss
or require considerations of bad faith in the fair use
analysis, and it rejected that opportunity and even cited Judge
Leval's very influential fair use article saying that fair use
is not limited to the well-behaved.
Chair Hawley. Okay.
Professor Lee. Now----
Chair Hawley. We appreciate you being here, and thank you.
You are making these arguments very gamely. That is helpful, I
think, to have this debate. But I just want to point out that
there is a lot of hand-waving going on here. Every time we get
down to the nub of the question, can these giant corporations
take the copyrighted work of individual citizens, we get
distracted with, well, it is for the good of the country, maybe
it is not so bad, we have an arms race on, there is an AI czar.
Actually, I don't think it is that complicated. I think it is
pretty simple. I think in America, we have rights. Those rights
are what protect us. These rights are being violated. And if we
are going to succeed as a nation and uphold our principles as a
nation, we better darn well enforce the individual rights on
which the nation is founded. I mean, it is just a thought.
Senator Welch, am I catching you off guard?
Senator Welch. I was kind of enjoying it.
[Laughter.]
Chair Hawley. Well, you are welcome to ask questions if you
would like.
Senator Welch. I would hate to step on anyone, but
especially a colleague Senator and the Chair of the Committee,
you know, mid-expression of righteous outrage and indignation
with which I am aligned, so thank you very much. Thank you. And
I appreciate you calling this hearing because this is
incredibly important.
You know, Senator Blackburn and I have a bill which is
called the TRAIN Act, and it is trying to address this question
of artistic content being used. And, you know, we have got a
celebrated author here, and it would protect you. But what I
appreciate about you being here, Mr. Baldacci, is there is a
lot of folks who are aspiring to be David Baldacci. There are a
lot of artists aspiring to be a Taylor Swift. And it is the
folks who have made it that are in a position to advocate. And
it is not, I don't think, going to benefit you, but it is going
to benefit artists who have so much to contribute even though
they are not yet discovered.
And, you know, this is the reality, and this is where I
think the Chairman is really right. The AI companies need
content, so they don't care where it comes from. It is just a
voracious, insatiable appetite. And they are going to go into
copyrighted material. We just know that. And to suggest they
won't I think is naive. And the question and the burden here is
that is going into copyrighted material. And the artist has the
right to have that copyright respected.
The burden is that how do you know they used it? That is
the whole point of the TRAIN Act where if there is copyright
infringement, a reasonable assertion of that and suspicion of
it is going to require disclosure on the part of the AI
platform.
So I wanted to ask a little bit about that. And I will
start with you, Mr. Baldacci. Do you have any suspicion that
some of your works have been used to train AI systems?
Mr. Baldacci. I have been told and I have been shown a
database, and it's part of the--part of a class-action lawsuit
against the AI community. And I think they've conceded that
they've taken at least 44 of my novels and fed them into their
large language models.
Senator Welch. I mean, that is astonishing. Literally, you
have got 44----
Mr. Baldacci. Well, at least they didn't take them all, so
that was nice.
[Laughter.]
Senator Welch. Just wait.
[Laughter.]
Mr. Baldacci. I know.
Senator Welch. And so you don't know for sure, and the only
way you are going to find out is hopefully through this
class-action litigation that you are part of.
Mr. Baldacci. Well, I certainly learned that when my son
put in ChatGPT that ChatGPT was intimately familiar with my
entire body of work because it was able to throw out, you know,
plotlines that took from many of my novels, so someone had to
feed my novels into ChatGPT. Otherwise, it could not have
created that response.
Senator Welch. And we just can't allow that. You know, that
is just really wrong. Thank you. So we are in agreement here
that we need some reforms here to protect the artist.
Mr. Smith, you know, music, it is the same situation. And,
you know, our music industry is so important. Using the word
industry is wrong. Music is so important. It really helps
people get a sense of who they are, it helps people connect,
and it is across political divisions. That is what is one of
the inspiring things about the incredible contributions that
musicians provide to our society. And can you just explain what
the dangers are of allowing AI models to freely train off
copyrighted works?
Professor Smith. Sure. There are multiple dangers. What we
have seen in the early piracy research is that Article I,
Section 8, Clause 8 is actually a really good idea. Giving
artists incentives to create actually yields more creation. And
when artists' incomes are lowered through piracy, they have
lower incentives to create. I think we see the same thing here,
both directly by participating in these pirated networks, the
generative AI companies are making it easier for other people
to steal. But then indirectly, they're also making it harder
for licenses to be signed. Mr. Baldacci talks about signing
licenses, but when you sign a license with a generative AI
company, you're signing with a gun held to your head because
they can say, either sign what I'm offering or I'm going to go
steal it instead.
Senator Welch. Well, that is the adhesion contract that
good lawyers like Senator Hawley still remember from law school
days. No, but explain that a little bit more because, you know,
this is where I think all of us have some real appreciation for
young artists. They have a vision that there is something
inside them that they can express and that it will make a
difference to people who hear it or people who read it. And
they start out against their parents' will most of the time,
right, because it is not an income-producing, promising career,
and a lot of them don't succeed, commercial success. But they
actually are contributing in a local community to a sense that
helps develop our culture and helps create respect for the
creative process and helps create respect that there are other
things than the career path that some of us up here have
followed where you can make a real contribution and a
meaningful contribution.
So this is the concern I have about how this AI and the
grabbing is going to make it tougher for those folks against
great odds to keep at it. So maybe you could just, from your
experience, talk a little bit about how it would adversely
impact any chance they have of being able to pay their bills at
the end of the month while they are trying to create
inspirational music for the benefit of all of us.
Professor Smith. Yes, I deeply share that concern, Senator,
and it's based on peer-reviewed academic research showing that
creative output goes down when piracy is allowed to flourish. I
worry that the future David Baldaccis of the world won't get
through that hump, and we won't get to appreciate their
creative output if we allow piracy to continue to be used to
train these generative AI models.
Senator Welch. Well, thank you. My time is just about up,
but I just want to express my gratitude to each of the
witnesses. I didn't have a chance to speak with you, but I
think this is an extraordinarily important issue.
I yield back.
Chair Hawley. Thank you, Senator Welch. Senator Durbin.
Senator Durbin. Mr. Pritt, you represent plaintiffs in a
lawsuit against Meta that alleges copyright infringement of the
plaintiffs' authors' works. Do you have any idea how much Meta
as a company is valued?
Mr. Pritt. That's a good question. Many trillions, I
believe.
Senator Durbin. Did Meta compensate any of the copyright
owners in your case for the use of their works?
Mr. Pritt. No, but Meta did spend money on contributing its
processing power to pirate from illicit websites and also to
pay Amazon to host pirated data.
Senator Durbin. Which, of course, did not inure to the
benefit of your plaintiffs.
Mr. Pritt. Certainly not.
Senator Durbin. How does the downloading and uploading of
pirated copyright material impact the analysis of whether a
copyright infringement could meet the mens rea requirement or
willfulness necessary for criminal infringement?
Mr. Pritt. I would let the professors answer that question.
Certainly as to willfulness in the civil copyright context, as
the documents Senator Hawley showed, I think the answer is
clear, that the piracy committed by Meta was knowing and
intentional.
Senator Durbin. Anyone else want to comment on that? Mr.
Lee, Dr. Lee?
Professor Lee. Yes, thank you, Senator. The standard of
willfulness for criminal copyright infringement requires
knowledge that it is illegal to engage in that particular
copying. Now, I don't want to relitigate what Judge Chhabria
has already ruled on, but he was given all of this evidence
that was submitted by Mr. Pritt and his colleagues. He saw the
comments by engineers, but he also saw comments and analysis by
lawyers of Meta advising them on whether this was permitted or
not under fair use law. And Judge Chhabria made a
determination. The crime fraud exception simply didn't apply.
And I'm not privy to all of the analysis that Judge
Chhabria made, but I'm assuming it was based on the question
not being resolved, the legal question of whether accessing or
copying from a pirated website to serve a highly transformative
purpose is the very question raised in the lawsuit. There was
no prior precedent that has so held that it is piracy or
illegal, let alone criminal infringement, to do that. And that
is the very question that Judge Chhabria ruled on. And to
assume that it is piracy is begging the question--with all due
respect, it is begging the question that the courts are the
appropriate determiners of.
And that can be appealed, you know, and I am sure it will
be appealed, but here the question of whether acquiring for a
putative fair use purpose is unlawful, Judge Chhabria ruled it
was not. It was for the fair use purpose of developing the AI
model. I believe that is supported by the text of Section 107.
Senator Durbin. So Professor Viswanathan, would you like to
comment on that?
Professor Viswanathan. I would, thank you so much. The very
fact that we're talking about this kind of behavior as to
whether or not it's criminal, right, the very fact that we're
here talking about willful, knowing, intentional, massive scale
training on pirated materials. Let's just step back for a
moment from the question of whether it comes under criminal
copyright infringement. Does it come under fair use at all? Is
this what fair use was developed to be? Fair use, for those of
you who don't take my copyright class, sorry about that, fair
use is an affirmative defense. Yes, I infringed, but I did it
for a good reason, a societally beneficial reason.
All right. Maybe creating a world's repository of
generative AI companies is that, but it doesn't seem to me that
it squares with the other things that we think of as fair use.
What's well-established fair use? Education, criticism,
commentary, First Amendment purposes that we consider valuable
and necessary and that are done in good faith. I educate in
good faith. I don't want to have to clear all those copyrights
to educate. Okay, great, we allow you to do that.
That is not what's going on here. I don't want to
relitigate the cases, Professor Lee, but Judge Chhabria was
clearly distressed by this. And when he raised the possibility,
as you rightly say, in dicta, that market dilution might be
what's happening, he's saying, look, exactly what the Senator
was talking about, flooding--what Mr. Baldacci was talking
about, flooding the market with subpar works that substitute
for the original works. This is not what fair use was intended
to achieve or to facilitate.
And the very fact that these companies are arguing we're in
good faith, we're doing fair use purposes, to me, this
shouldn't even be a defense that they're allowed to raise. But
okay, they will raise it, and it will be litigated. But boy, it
just does not seem consonant with what fair use was ever meant
to do.
Senator Durbin. Thank you. Thank you, Mr. Chairman.
Chair Hawley. Mr. Pritt, if I could just ask you another
question or two about some of the evidence. We talked about
Meta engineers saying that they realized what they were doing
was crossing an ethical line, that they felt they shouldn't be
doing it, but they had to cut some corners. Let me just ask
you, did Meta ever try to hide what it was doing? Did it try to
hide the fact that it was pirating these works?
Mr. Pritt. What the documents show is that in 2024, when
Meta began to use Anna's Archive, it decided intentionally to
not use its own servers and instead to go through Amazon Web
Services in order to ensure that the seeding, the sharing of
pirated works would not be traced back to Meta's own IP.
Chair Hawley. It doesn't sound to me like a company and
executives that think what they are doing is above board. It
sounds like a company that thinks that what they are doing is
probably illegal in some manner, but they want to go on doing
it anyway.
Let me just show you a couple of documents, help us
understand what we are seeing here. These are more Meta
engineers now, again, working on AI. We have got the first one,
Nikolay, who says, ``not sure we can use Meta's IPs to load
through torrents pirate content, haha.''
[Poster is displayed.]
[Laughter.]
Chair Hawley. I emphasize, these are their documents. I
mean, for all of Professor Lee's--and again, I appreciate
Professor Lee making these arguments, but for all of Professor
Lee's comments that we are not sure if it is really pirated or
not, they thought so. This is Meta. Meta thought so. The next
employee, ``I'm curious to start looking at some samples, but I
feel like we should get some clarity on what's allowed and
how,'' smiling emoji. Nikolay again, ``haha, yes, I think
torrenting from a corporate laptop doesn't feel right.''
[Poster is displayed.]
I mean, what are we looking at here, Mr. Pritt? I mean, is
this an attempt to be above board and forthcoming, and, you
know, they think everything's fine?
Mr. Pritt. I think that is a very difficult conclusion to
draw from these documents. And with all due respect to
Professor Lee, as I am still litigating the case against Meta
on behalf of a group of authors, Judge Chhabria in that case
specifically declined to decide whether Meta's piracy, what it
has engaged in, in terms of the downloading, the making
available, the making additional copies, and then sending those
copies, over 40 terabytes of data, to other individuals, is in
fact fair use. And no court, including the Supreme Court, has
ever held that rank piracy is somehow fair use. And instead,
the Supreme Court case law, still the law of the land, says
that fair use presupposes good faith and fair dealing. I will
leave it to you whether or not you think any of these documents
shows good faith and fair dealing.
Chair Hawley. Well, let's just look at one other document
and ask ourselves if this looks like good faith and fair
dealing. More Meta employees, more AI engineers. ``Frank, can
you clarify why we can't use Facebook infra''--internal--``for
this again?'' Frank Zhang replies, ``avoiding risk of tracing
back the seeder from a Facebook server.'' And he clarifies,
``avoiding risk of tracing back the seeder/downloader are from
Facebook servers.'' So here we have Meta employees saying they
know they are pirating, they think it is ethically wrong, they
think it is illegal, and they are actively avoiding trying to
create a paper trail. They are trying to hide it. I mean, that
doesn't sound like fair use to me. Does it sound like fair use
to you, Professor Lee? I mean, do you think this is fair use?
[Poster is displayed.]
Professor Lee. I would just say I agree with Judge
Chhabria's approach. The distribution claim is still alive in
the case, and this aspect of the torrenting may well be
infringement and not fair use.
Chair Hawley. I will just say this. If this isn't
infringement, Congress needs to do something. I mean, if the
answer is that the biggest corporation in the world worth
trillions of dollars can come take an individual author's work
like Mr. Baldacci, lie about it, hide it, profit off of it, and
there is nothing our law does about that, we need to change the
law. And if nothing else comes out of this hearing today, I
hope that is it. And I hope that this is motivation to this
body that we need to be paying attention to what is going on
here.
Mr. Baldacci, you said you would rather live on a different
planet if there was AI that could write your books. I am sure
that that will never happen. They will never write your books.
I want to live on a different planet if this can go on and it
is perfectly legal. We have got to do something about this.
[Applause.]
Chair Hawley. Let me just ask you, Mr. Pritt, finally, what
about Mark Zuckerberg in all of this? I mean, do we think that
Zuckerberg knew about this, approved this? I mean, what does
the evidence suggest?
Mr. Pritt. Certainly, the documents that have become public
in the case explain that the decision whether or not to use
Library Genesis, which is a notorious illicit marketplace, for
example, for actual training as opposed to exploration was
escalated to Mark Zuckerberg.
Chair Hawley. I think the judge said something to this
effect--let's just look here if we have got it--that in fact,
Zuckerberg was asked about it. There it is. In the spring of
2023, after failing to acquire licenses and following
escalation up to Zuckerberg, Meta decided to just use the works
acquired from a torrenting platform as training data. So they
just did it anyway. They just, yes, you know, do it anyway.
Forget it. Don't pay Mr. Baldacci. Don't pay anybody. It costs
too much. A lot cheaper to take it for free and then make
billions of dollars off of it.
[Poster is displayed.]
Listen, I will just conclude with this. I want to thank all
the witnesses for their testimony. And Senator Welch, if you
have more questions, or Senator Durbin, I am happy to let you
ask those.
For my part, I just want to say, I think that this is a
moral issue as much as anything else. I think this is an issue
about who are we going to be as a country? Are we going to be a
country, as it is written into our Constitution, where we
protect the rights of our citizens? It is part of what makes us
Americans. And we welcome the creative genius of people like
Mr. Baldacci and the marvelous diversity of imagination and
viewpoints and perspectives that has come to characterize our
country. Are we going to protect that? Are we going to allow a
few mega corporations to vacuum it all up, digest it, and make
billions of dollars in profits, maybe trillions, and pay nobody
for it? That is not America. That is not our country. It never
has been.
Listen, I am all for the free market. I am glad Mark
Zuckerberg can make his billions. That is fine. But not by
running over people like Mr. Baldacci or anybody else or any
young author who is trying to get a start or any other person,
creative, noncreative, or just a working guy who puts something
on Facebook. Why should all his stuff get taken? I just think
that is wrong. I think it is morally wrong. I think, frankly,
it is not consonant with our principles as Americans, and I
think we can and should do better than that.
Senator Welch, Senator Durbin?
[No response.]
Chair Hawley. I want to thank again the witnesses for being
here. Thanks to each of you. I know you had to travel far for
this. And thank you again for accommodating our schedule.
Thanks to everyone who has been here today.
And with that, we will stand adjourned.
[Whereupon, at 1:25 p.m., the hearing was adjourned.]
[Additional material submitted for the record follows.]
[GRAPHIC] [TIFF OMITTED] T1891.001 through T1891.070
A P P E N D I X
The following submissions are available at:
https://www.govinfo.gov/content/pkg/CHRG-119shrg61891/pdf/CHRG-119shrg61891-add1.pdf
Submitted by Chair Hawley:
Article III Project, letter...................................... 2
Artificial Intelligence Threatens Ownership of Online Content.... 5
Association of American Publishers (AAP), statement.............. 10
CreativeFuture, letter........................................... 16
Motion Picture Association (MPA), letter......................... 19
News Media Alliance, statement................................... 23
Rumble, statement................................................ 28
Society of Composers & Lyricists (SCL), letter................... 30
Submitted by Ranking Member Durbin:
Center for AI and Digital Policy (CAIDP), statement.............. 32
Copyright Alliance, statement.................................... 38
CreativeFuture, letter........................................... 16
News Media Alliance, statement................................... 23
Society of Composers & Lyricists (SCL), letter................... 30
Submitted by Senator Klobuchar:
Artificial Intelligence Threatens Ownership of Online Content.... 5
Submitted by Senator Coons:
Motion Picture Association (MPA), letter......................... 19