[Congressional Record Volume 153, Number 5 (Wednesday, January 10, 2007)]
[Senate]
[Pages S359-S361]
From the Congressional Record Online through the Government Publishing Office [www.gpo.gov]

      By Mr. FEINGOLD (for himself, Mr. Sununu, Mr. Leahy, and Mr. 
        Akaka):
  S. 236. A bill to require reports to Congress on Federal agency use 
of data mining; to the Committee on the Judiciary.
  Mr. FEINGOLD. Mr. President, I am pleased today to introduce the 
Federal Agency Data Mining Reporting Act of 2007. I want to thank 
Senator Sununu for once again cosponsoring this bill, which we also 
introduced in the last Congress. Senator Sununu has consistently been a 
leader on privacy issues, and I am pleased to work with him on this 
effort. I also want to thank Senators Leahy, Akaka, and Wyden for 
their continuing support of the bill.
  The controversial data analysis technology known as data mining is 
capable of reviewing millions of both public and private records on 
each and every American. The possibility of government law enforcement 
or intelligence agencies fishing for patterns of criminal or terrorist 
activity in these vast quantities of digital data raises serious 
privacy and civil liberties issues--not to mention serious questions 
about the effectiveness of these types of searches. But four years 
after Congress first learned about and defunded the Defense 
Department's program called Total Information Awareness, there is still 
much Congress does not know about the Federal Government's work on data 
mining.
  We have made some progress. We know from reviews conducted by the 
Government Accountability Office that as of May 2004 there were nearly 
200 Federal data mining programs, more than 100 of which relied 
on personal information and 29 of which were for the purpose of 
investigating terrorists or criminals. And we have learned a few more 
details on five of those programs from a follow-up report that GAO 
issued in August 2005. We also have a brief report from the DHS 
Inspector General published in August 2006, and as a result of my 
amendment to the DHS appropriations bill we have a July 2006 report 
from the Privacy Office at the Department of Homeland Security that 
provides some interesting policy suggestions relating to data mining.
  But this information has come to us haphazardly, and lacks detail 
about the precise nature of the data mining programs being utilized or 
developed, their efficacy, and the consequences Americans could face as 
a result. Furthermore, much of the reporting thus far has focused on 
the Department of Homeland Security. It also appears there has been 
little if any government-wide consideration of privacy policies for 
these types of programs. Indeed, public debate on government data 
mining has been generated more by press stories than as a result of 
congressional oversight.
  My bill would require all Federal agencies to report to Congress 
within 180 days and every year thereafter on data mining programs 
developed or used to find a pattern or anomaly indicating terrorist or 
other criminal activity on the part of individuals, and how these 
programs implicate the civil liberties and privacy of all Americans. If 
necessary, specific information in the various reports could be 
classified.
  This is information we need to have. Congress should not be learning 
the details about data mining programs after millions of dollars are 
spent testing or using data mining against unsuspecting Americans. The 
possibility of unchecked, secret use of data mining technology 
threatens one of the most important values that we are fighting for in 
the war against terrorism--freedom.
  Data mining could rely on a combination of intelligence data and 
personal information like individuals' traffic violations, credit card 
purchases, travel records, medical records, and virtually any 
information contained in commercial or public databases. Congress must 
conduct oversight to make sure that all government agencies engaged in 
fighting terrorism and other criminal enterprises--not just the 
Department of Homeland Security, but also the Department of Justice, 
the Department of Defense and others--use these types of sensitive 
personal information effectively and appropriately.
  Let me clarify what this bill does not do. It does not have any 
effect on the government's use of commercial data to conduct 
individualized searches on people who are already suspects, nor does it 
require that the government report on these types of searches. It does 
not end funding for any program, determine the rules for use of data 
mining technology, or threaten any ongoing investigation that might use 
data mining technology.
  My bill would simply provide Congress with information about the 
nature of the technology and the data that will be used. The Federal 
Agency Data Mining Reporting Act would require all government agencies 
to assess the efficacy of the data mining technology they are using or 
developing--that is, whether the technology can deliver on the promises 
of each program. In addition, my bill would make sure that Congress 
knows whether the Federal agencies using data mining technology have 
considered and developed policies or guidelines to protect the privacy 
and due process rights of individuals, such as privacy technologies and 
redress procedures. With complete information about the current data 
mining plans and practices of the Federal Government, Congress will be 
able to conduct a thorough review of the costs and benefits of the 
practice of data mining on a program-by-program basis and make 
considered judgments about whether programs should go forward. Congress 
will also be able to evaluate whether new privacy rules are necessary.
  In addition, Congress must look closely at the government's 
activities because data mining is unproven in this area. Some argue 
that data mining can help locate potential terrorists before they 
strike. But we do not, today, have evidence that pattern-based data 
mining will prevent terrorism. In fact, some technology experts have 
warned that this type of data mining is not the right approach for the 
terrorism problem. Just last month, the Cato Institute released a 
report--coauthored by a scientist specializing in data analytics and an 
information privacy expert--concluding that ``[t]he only thing 
predictable about predictive data mining for terrorism is that it would 
be consistently wrong.''
  Some commercial uses of data mining have been successful, but have 
arisen in a very different context than counterterrorism efforts. For 
example, the financial world has successfully used data mining to 
identify people committing fraud because it has data on literally 
millions, if not billions, of historical financial transactions. And 
the banks and credit card companies know, in large part, which of those 
past transactions have turned out to be fraudulent. So when they apply 
sophisticated statistical algorithms to that massive amount of 
historical data, they are able to make a pretty good guess about what a 
fraudulent transaction might look like in the future.
  We do not have that kind of historical data about terrorists and 
sleeper cells. We have just a handful of individuals whose past actions 
can be analyzed, which makes it virtually impossible to apply the kind 
of advanced statistical analysis required to use data mining in this 
way. That raises serious questions about whether data mining will ever 
be able to locate an actual terrorist. Before the government starts 
reviewing personal information about every man, woman and child in this 
country, we should learn what data mining can and can't do--and what 
limits and protections are needed if data mining programs do go 
forward.
  We must also bear in mind that there will inevitably be errors in the 
underlying data. Everyone knows people who have had errors on their 
credit reports--and that is the one area of commercial data where the 
law already imposes strict accuracy requirements. Other types of 
commercial data are likely to be even more inaccurate. Even if the 
technology itself were effective, I am very concerned that innocent 
people could be ensnared because of mistakes in the data that make them 
look suspicious. The recent rise in identity theft, which creates even 
more data accuracy problems, makes it even more important that we 
address this issue.
  I also want to touch on one issue that has proved difficult in many 
debates about data mining: how to define the term. What is data mining? 
From policy debates to government reports, many people have wrestled 
with this question. While it can be defined more broadly, for the 
purpose of this reporting requirement, data mining is limited to the 
process of attempting to predict future events or actions by 
discovering or locating patterns or anomalies in data. Beyond that, 
because the reporting requirement in this bill seeks 
information on those data mining programs most likely to threaten the 
privacy and civil liberties of Americans, I have limited the definition 
in a couple of other ways. First, the bill's core definition of data 
mining covers a query, search or other analysis of one or more 
electronic databases conducted to ``discover or locate a predictive 
pattern or anomaly indicative of terrorist or criminal activity on the 
part of any individual or individuals.'' Data mining has a number of applications 
at various government agencies outside the context of terrorism and 
other criminal investigations, but I have limited the definition for 
purposes of this legislation in order to get reports on the programs 
most likely to raise privacy concerns. For example, the May 2004 GAO 
report identified a number of government data mining programs whose 
goals are managing resources efficiently or identifying fraud, waste 
and abuse in government programs, and that do not rely on personally 
identifiable information. I am not seeking reports on programs like 
these.
  Second, as I alluded to earlier, the definition explicitly excludes 
queries that retrieve information from a database by using 
information--such as an address, passport number or license plate number--
that is associated with a particular individual or individuals. This 
type of query is a traditional investigative technique. Although 
government agencies must be careful in their use of commercial 
databases, simply querying a ChoicePoint database for information about 
someone who is already a suspect is not data mining.
  Most Americans believe that their private lives should remain 
private. Data mining programs run the risk of intruding into the lives 
of individuals who have nothing to do with terrorism or other criminal 
activity and understandably do not want their credit reports, shopping 
habits and doctor visits to become a part of a gigantic computerized 
search engine operating without any controls or oversight, and without 
much promise of locating terrorists. As the Cato report put it, ``[t]he 
possible benefits of predictive data mining for finding planning or 
preparation for terrorism are minimal. The financial costs, wasted 
effort, and threats to privacy and civil liberties are potentially 
vast.''
  At a minimum, the administration should be required to report to 
Congress about the various data mining programs now underway or being 
studied, and the impact those programs may have on our privacy and 
civil liberties, so that Congress can determine whether any benefits of 
this practice come at too high a price to our privacy and personal 
liberties. As Senator Wyden and I have told the Director of National 
Intelligence, we must have a public discussion about the efficacy and 
privacy implications of data mining. We wrote a letter to him on 
November 15, 2006, that included the following:

       [W]e believe there needs to be a public discussion before 
     the implementation of any government data mining program that 
     would rely on domestic commercial data and other information 
     about Americans. There are serious questions about whether 
     pattern analysis of such data can effectively identify 
     terrorists, given the relative lack of historical data about 
     terrorist activities. And as the furor over the Total 
     Information Awareness program demonstrated, the American 
     public has serious--and legitimate--concerns about the 
     privacy ramifications of programs designed to fish for 
     patterns of criminal or terrorist activity in vast quantities 
     of digital data, collected by other entities for entirely 
     different reasons. Pattern analysis runs the risk of 
     generating a large number of false positives, meaning that 
     innocent Americans could become the subject of investigation. 
     Before we go down that path, it is critical that we have a 
     public discussion about the efficacy and privacy implications 
     of this technology. And, if we decide that data mining is 
     effective enough to warrant spending taxpayer dollars on it, 
     we should establish strong privacy protections to protect 
     innocent people from being the subject of government 
     suspicion.
       Of course, the Intelligence Community should be taking 
     advantage of new technologies in its critical responsibility 
     to protect our country from terrorists, and much of its work 
     must remain classified to protect national security. But we 
     can have a public debate about what privacy rules should 
     constrain data mining programs deployed domestically, without 
     revealing sensitive information like the precise algorithms 
     that the government has developed.

  This bill is the first step in this process--a way for Congress and, 
to the degree appropriate, the public to finally understand what is 
going on behind the closed doors of the executive branch so that we can 
start to have a policy discussion about data mining that is long 
overdue. I urge my colleagues to support this bill. All it asks for is 
information to which Congress and the American people are entitled.
  Mr. President, I ask unanimous consent that the text of this bill be 
printed in the Record.
  There being no objection, the text of the bill was ordered to be 
printed in the Record, as follows:

                                 S. 236

       Be it enacted by the Senate and House of Representatives of 
     the United States of America in Congress assembled,

     SECTION 1. SHORT TITLE.

       This Act may be cited as the ``Federal Agency Data Mining 
     Reporting Act of 2007''.

     SEC. 2. DEFINITIONS.

       In this Act:
       (1) Data mining.--The term ``data mining'' means a query, 
     search, or other analysis of 1 or more electronic databases, 
     where--
       (A) a department or agency of the Federal Government, or a 
     non-Federal entity acting on behalf of the Federal 
     Government, is conducting the query, search, or other 
     analysis to discover or locate a predictive pattern or 
     anomaly indicative of terrorist or criminal activity on the 
     part of any individual or individuals; and
       (B) the query, search, or other analysis does not use 
     personal identifiers of a specific individual, or inputs 
     associated with a specific individual or group of 
     individuals, to retrieve information from the database or 
     databases.
       (2) Database.--The term ``database'' does not include 
     telephone directories, news reporting, information publicly 
     available to any member of the public without payment of a 
     fee, or databases of judicial and administrative opinions.

     SEC. 3. REPORTS ON DATA MINING ACTIVITIES BY FEDERAL 
                   AGENCIES.

       (a) Requirement for Report.--The head of each department or 
     agency of the Federal Government that is engaged in any 
     activity to use or develop data mining shall submit a report 
     to Congress on all such activities of the department or 
     agency under the jurisdiction of that official. The report 
     shall be made available to the public, except for a 
     classified annex described in subsection (b)(8).
       (b) Content of Report.--Each report submitted under 
     subsection (a) shall include, for each activity to use or 
     develop data mining, the following information:
       (1) A thorough description of the data mining activity, its 
     goals, and, where appropriate, the target dates for the 
     deployment of the data mining activity.
       (2) A thorough description of the data mining technology 
     that is being used or will be used, including the basis for 
     determining whether a particular pattern or anomaly is 
     indicative of terrorist or criminal activity.
       (3) A thorough description of the data sources that are 
     being or will be used.
       (4) An assessment of the efficacy or likely efficacy of the 
     data mining activity in providing accurate information 
     consistent with and valuable to the stated goals and plans 
     for the use or development of the data mining activity.
       (5) An assessment of the impact or likely impact of the 
     implementation of the data mining activity on the privacy and 
     civil liberties of individuals, including a thorough 
     description of the actions that are being taken or will be 
     taken with regard to the property, privacy, or other rights 
     or privileges of any individual or individuals as a result of 
     the implementation of the data mining activity.
       (6) A list and analysis of the laws and regulations that 
     govern the information being or to be collected, reviewed, 
     gathered, analyzed, or used with the data mining activity.
       (7) A thorough discussion of the policies, procedures, and 
     guidelines that are in place or that are to be developed and 
     applied in the use of such technology for data mining in 
     order to--
       (A) protect the privacy and due process rights of 
     individuals, such as redress procedures; and
       (B) ensure that only accurate information is collected, 
     reviewed, gathered, analyzed, or used.
       (8) Any necessary classified information in an annex that 
     shall be available, as appropriate, to the Committee on 
     Homeland Security and Governmental Affairs, the Committee on 
     the Judiciary, the Select Committee on Intelligence, and the 
     Committee on Appropriations of the Senate and the Committee 
     on Homeland Security, the Committee on the Judiciary, the 
     Permanent Select Committee on Intelligence, and the Committee 
     on Appropriations of the House of Representatives.
       (c) Time for Report.--Each report required under subsection 
     (a) shall be--
       (1) submitted not later than 180 days after the date of 
     enactment of this Act; and
       (2) updated not less frequently than annually thereafter, 
     to include any activity to use or develop data mining engaged 
     in after the date of the prior report submitted under 
     subsection (a).

  Mr. LEAHY. Mr. President, I am pleased today to join with Senators 
Feingold, Sununu and others to introduce the Federal Agency Data Mining 
Reporting Act of 2007. This important privacy legislation would begin 
to restore key checks and balances by requiring Federal agencies to 
report to Congress on their data mining programs and activities. We 
joined together to introduce a similar bill last Congress. Regrettably, 
it received no attention. This year, I intend to make sure that we do a 
better job of considering checks and balances and of striking the 
proper balance between protecting Americans' privacy rights and 
fighting smarter and more effectively against security threats.
  In recent years, the Federal Government's use of data mining 
technology has exploded. According to a May 2004 report by the General 
Accounting Office, there are at least 199 different government data 
mining programs operating or planned throughout the Federal Government, 
with at least 52 different Federal agencies currently using data mining 
technology. And, more and more, these data mining programs are being 
used with little or no notice to ordinary citizens, or to Congress.
  Advances in technologies make data banks and data mining more 
powerful and more useful than at any other time in our history. These 
can be useful tools in our national security arsenal, but we should use 
them appropriately so that they can be most effective. A mistake can 
cost Americans their jobs and wreak havoc in their lives and 
reputations that can take years to repair. Without adequate safeguards, 
oversight and checks and balances, these powerful technologies also 
become an invitation to government abuse. The government must take 
steps to ensure that it is properly using this technology. Too often, 
government data mining programs lack adequate safeguards to protect the 
privacy rights and civil liberties of ordinary Americans, whose data is 
collected and analyzed by these programs. Without these safeguards, 
government data mining programs are prone to produce inaccurate results 
and are ripe for abuse, error and unintended consequences.
  This legislation takes an important first step in addressing these 
concerns by pulling back the curtain on how this Administration is 
using this technology. It does not by its terms prohibit the use of 
this technology, but rather provides an oversight mechanism to begin to 
ensure it is being used appropriately and effectively. This bill would 
require Federal agencies to report to Congress about their data mining 
programs. The legislation provides a much-needed check by requiring 
Federal agencies to disclose the steps that they are taking to protect 
the privacy and due process rights of American citizens when they use 
these programs.
  We need checks and balances to keep government databases from being 
misused against the American people. That is what the Constitution and 
our laws should provide. We in Congress must make sure that when our 
government uses technology to detect and deter illegal activity, it 
does so in a manner that also protects our most basic rights and 
liberties. This bill advances this important goal, and I urge all 
Senators to support this important privacy legislation.
                                 ______