[House Hearing, 108 Congress]
[From the U.S. Government Publishing Office]




       DATA MINING: CURRENT APPLICATIONS AND FUTURE POSSIBILITIES

=======================================================================

                                HEARING

                               before the

                SUBCOMMITTEE ON TECHNOLOGY, INFORMATION
                POLICY, INTERGOVERNMENTAL RELATIONS AND
                               THE CENSUS

                                 of the

                              COMMITTEE ON
                           GOVERNMENT REFORM

                        HOUSE OF REPRESENTATIVES

                      ONE HUNDRED EIGHTH CONGRESS

                             FIRST SESSION

                               __________

                             MARCH 25, 2003

                               __________

                           Serial No. 108-11

                               __________

       Printed for the use of the Committee on Government Reform


  Available via the World Wide Web: http://www.gpo.gov/congress/house
                      http://www.house.gov/reform

                                 ______

87-229              U.S. GOVERNMENT PRINTING OFFICE
                            WASHINGTON : 2003
____________________________________________________________________________
For Sale by the Superintendent of Documents, U.S. Government Printing Office
Internet: bookstore.gpo.gov  Phone: toll free (866) 512-1800; (202) 512-1800
Fax: (202) 512-2250 Mail: Stop SSOP, Washington, DC 20402-0001

                     COMMITTEE ON GOVERNMENT REFORM

                     TOM DAVIS, Virginia, Chairman
DAN BURTON, Indiana                  HENRY A. WAXMAN, California
CHRISTOPHER SHAYS, Connecticut       TOM LANTOS, California
ILEANA ROS-LEHTINEN, Florida         MAJOR R. OWENS, New York
JOHN M. McHUGH, New York             EDOLPHUS TOWNS, New York
JOHN L. MICA, Florida                PAUL E. KANJORSKI, Pennsylvania
MARK E. SOUDER, Indiana              CAROLYN B. MALONEY, New York
STEVEN C. LaTOURETTE, Ohio           ELIJAH E. CUMMINGS, Maryland
DOUG OSE, California                 DENNIS J. KUCINICH, Ohio
RON LEWIS, Kentucky                  DANNY K. DAVIS, Illinois
JO ANN DAVIS, Virginia               JOHN F. TIERNEY, Massachusetts
TODD RUSSELL PLATTS, Pennsylvania    WM. LACY CLAY, Missouri
CHRIS CANNON, Utah                   DIANE E. WATSON, California
ADAM H. PUTNAM, Florida              STEPHEN F. LYNCH, Massachusetts
EDWARD L. SCHROCK, Virginia          CHRIS VAN HOLLEN, Maryland
JOHN J. DUNCAN, Jr., Tennessee       LINDA T. SANCHEZ, California
JOHN SULLIVAN, Oklahoma              C.A. ``DUTCH'' RUPPERSBERGER, 
NATHAN DEAL, Georgia                     Maryland
CANDICE S. MILLER, Michigan          ELEANOR HOLMES NORTON, District of 
TIM MURPHY, Pennsylvania                 Columbia
MICHAEL R. TURNER, Ohio              JIM COOPER, Tennessee
JOHN R. CARTER, Texas                CHRIS BELL, Texas
WILLIAM J. JANKLOW, South Dakota                 ------
MARSHA BLACKBURN, Tennessee          BERNARD SANDERS, Vermont 
                                         (Independent)

                       Peter Sirh, Staff Director
                 Melissa Wojciak, Deputy Staff Director
              Randy Kaplan, Senior Counsel/Parliamentarian
                       Teresa Austin, Chief Clerk
              Philip M. Schiliro, Minority Staff Director

   Subcommittee on Technology, Information Policy, Intergovernmental 
                        Relations and the Census

                   ADAM H. PUTNAM, Florida, Chairman
CANDICE S. MILLER, Michigan          WM. LACY CLAY, Missouri
DOUG OSE, California                 DIANE E. WATSON, California
TIM MURPHY, Pennsylvania             STEPHEN F. LYNCH, Massachusetts
MICHAEL R. TURNER, Ohio

                               Ex Officio

TOM DAVIS, Virginia                  HENRY A. WAXMAN, California
                        Bob Dix, Staff Director
                 Chip Walker, Professional Staff Member
                 Lori Martin, Professional Staff Member
                      Ursula Wojciechowski, Clerk
           David McMillen, Minority Professional Staff Member


                            C O N T E N T S

                              ----------                              
Hearing held on March 25, 2003
Statement of:
    Dockery, State Senator Paula, majority whip, Florida State 
      Senate
    Forman, Mark A., Associate Director, Information Technology 
      and Electronic Government, Office of Management and Budget
    Kutz, Gregory, Director, Financial Management and Assurance, 
      U.S. General Accounting Office
    Louie, Jen Que, president, Nautilus Systems, Inc.
    Rosen, Jeffrey, George Washington University Law School, 
      legal affairs editor of the New Republic
Letters, statements, etc., submitted for the record by:
    Clay, Hon. Wm. Lacy, a Representative in Congress from the 
      State of Missouri, prepared statement of
    Dockery, State Senator Paula, majority whip, Florida State 
      Senate, prepared statement of
    Forman, Mark A., Associate Director, Information Technology 
      and Electronic Government, Office of Management and Budget, 
      prepared statement of
    Kutz, Gregory, Director, Financial Management and Assurance, 
      U.S. General Accounting Office, prepared statement of
    Louie, Jen Que, president, Nautilus Systems, Inc., prepared 
      statement of
    Putnam, Hon. Adam H., a Representative in Congress from the 
      State of Florida, prepared statement of
    Rosen, Jeffrey, George Washington University Law School, 
      legal affairs editor of the New Republic, prepared 
      statement of

 
       DATA MINING: CURRENT APPLICATIONS AND FUTURE POSSIBILITIES

                              ----------                              


                        TUESDAY, MARCH 25, 2003

                  House of Representatives,
   Subcommittee on Technology, Information Policy, 
        Intergovernmental Relations and the Census,
                            Committee on Government Reform,
                                                    Washington, DC.
    The subcommittee met, pursuant to notice, at 9:30 a.m., in 
room 2154, Rayburn House Office Building, Hon. Adam Putnam 
(chairman of the subcommittee) presiding.
    Present: Representatives Putnam, Miller, Turner, and Clay.
    Staff present: Bob Dix, staff director; John Hambel, senior 
counsel; Chip Walker and Lori Martin, professional staff 
members; Ursula Wojciechowski, clerk; David McMillen, minority 
professional staff member; Jean Gosa, minority clerk; and 
Earley Green, minority chief clerk.
    Mr. Putnam. A quorum being present, the Subcommittee on 
Technology, Information Policy, Intergovernmental Relations and 
the Census will come to order.
    Good morning and welcome to the first in a planned series 
of hearings addressing the important subject of data mining 
technology or ``factual data analysis,'' as some might refer to 
it.
    Before we get into my opening statement, considering the 
events of the world today and the enormous pressures that this 
Congress and our President are under, I would ask that we pause 
for a moment of silence.
    [Moment of silence.]
    Mr. Putnam. Thank you.
    There are a number of proven uses for this data mining 
technology which has played a prominent role in many arenas, 
public and private, for years. This morning we will work to 
define the technology itself and examine the parameters of its 
application. There is no secret that some have expressed 
concerns about the role of data mining, particularly in the 
context of privacy intrusions. We will attempt to explore the 
manner in which this technology will continue to be a valuable 
tool in a variety of governmental uses, not just those of 
national security, while also acknowledging the public interest 
in protecting the privacy of personal information. Data mining 
is a technology that facilitates the ability to sort through 
large amounts of information through data base exploration, 
extract specific information in accordance with defined 
criteria, and identify patterns of interest to its user.
    As I understand the technology, the user has the ability to 
tailor a data mining program to a particular purpose by 
selecting a number of different data bases to search and 
setting the criteria for that search. Data mining technology 
has been utilized successfully for many years in both public 
and private sectors to identify and analyze data that might 
otherwise be overlooked or inaccessible. Examples of the 
variety of commercial or governmental uses associated with data 
mining software would include businesses being able to develop 
a targeted marketing campaign in an effort to identify 
prospective customers; government agencies expanding 
opportunities to track down tax evaders; detection of Medicaid 
or Medicare fraud; and corporations using this tool to estimate 
spending and revenue more accurately, just to name a few.
    For example, a mortgage refinancing lender may seek to 
determine potential candidates for their services by attempting 
to identify mortgage holders who have lived in their homes for 
a certain period of time in a particular geographic location 
with a market value range of property at a certain level in 
order to target a special refinancing rate offer. As you can 
imagine, this type of technology is invaluable to a number of 
institutions. Because it is such a vast and evolving field, the 
subcommittee is very interested in exploring the uses and 
effects of this technology in subsequent followup hearings to 
address more particular applications.
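    [Illustrative sketch, not part of the hearing record: the 
refinancing example above amounts to filtering records against 
user-defined criteria. The minimal Python below assumes a 
hypothetical MortgageRecord layout, hypothetical field names, 
and made-up sample values; it is one possible reading of the 
example, not a description of any lender's actual system.]

```python
from dataclasses import dataclass


@dataclass
class MortgageRecord:
    # Hypothetical record layout, invented for this illustration.
    holder: str
    years_in_home: float
    region: str
    market_value: int


def select_refinance_candidates(records, min_years, region, value_range):
    """Return records that meet every user-defined search criterion."""
    low, high = value_range
    return [
        r for r in records
        if r.years_in_home >= min_years
        and r.region == region
        and low <= r.market_value <= high
    ]


# Made-up sample data.
records = [
    MortgageRecord("A", 6.0, "Region 1", 180_000),
    MortgageRecord("B", 1.5, "Region 1", 175_000),   # too recent
    MortgageRecord("C", 8.0, "Region 2", 190_000),   # wrong region
    MortgageRecord("D", 10.0, "Region 1", 450_000),  # outside value range
]

candidates = select_refinance_candidates(
    records, min_years=5, region="Region 1", value_range=(150_000, 250_000)
)
print([r.holder for r in candidates])
```

    [Changing the arguments changes the search without touching 
the data, which is the sense in which the user "tailors" the 
program to a particular purpose.]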
    While data mining may have many legitimate and worthwhile 
uses, we must always be vigilant of any potential encroachment 
on the privacy of the American public. We have great 
responsibilities as elected officials. We must protect the 
American ideals of life, liberty, and freedom. At times these 
ideals would seem to come into conflict with one another, and 
it's our job to ensure that we do all we can to protect the 
public while maintaining the faith entrusted to us by the 
Founding Fathers to protect the right of the people to privacy 
and freedom. Ben Franklin once said, ``Those who would give up 
freedom for security deserve neither.''
    I would like to welcome the following witnesses who are 
offering their expert testimony before us today: The Honorable 
Paula Dockery, Florida State Senator; Dr. Jen Que Louie, 
president of Nautilus Systems, Inc.; Mark Forman, Associate 
Director of Information Technology and Electronic Government, 
Office of Management and Budget, our Nation's CIO; Gregory 
Kutz, Director of Financial Management and Assurance, General 
Accounting Office; and Jeffrey Rosen, associate professor of 
the George Washington University Law School, legal affairs 
editor of the New Republic. Mr. Armey was unable to be with us 
today.
    Interest in expanding the use of this technology at the 
Federal level of government has become more widespread as we 
look to use modern technology to improve intergovernmental 
communications and national security. From our oversight 
perspective as the subcommittee, we have a special interest in 
learning the pros and cons to data mining technology as well as 
how its use could be or is being expanded at the Federal level.
    We appreciate the participation of today's witnesses as 
they provide tremendous information to the subcommittee on this 
important topic, and we thank you again for taking the time out 
of your busy schedules. Today's hearing can be viewed live via 
WebCast by going to reform.house.gov and clicking on the link 
under ``Live Committee Broadcast.''
    As we await the ranking member from Missouri, I want to 
recognize our vice chair, Candice Miller from Michigan, for her 
opening statement. Gentlelady from Michigan.
    [The prepared statement of Hon. Adam H. Putnam follows:]

    [GRAPHIC] [TIFF OMITTED] T7229.001
    
    [GRAPHIC] [TIFF OMITTED] T7229.002
    
    Mrs. Miller. Thank you, Mr. Chairman.
    I want to thank the witnesses for coming today, and Mr. 
Forman, good to see you again. I'm sure this committee will be 
seeing certainly a lot of you.
    As I mentioned at the last committee hearing, I am so 
particularly interested in the subjects, and this data mining 
is a fascinating one. I had been the Secretary of State in 
Michigan where not only did I have the elections there with all 
the registered voters, I also did the motor vehicle 
administrative kinds of things. We had a big data base in our 
State with everybody who had a boat, a snowmobile, and a 
trailer and a car and a truck and everything, and there was 
always a lot of consternation about what was government doing 
with this information; who had the information; for what 
purposes. If you wanted to get licensed in Michigan, you had to 
give me certain amounts of information. But what was government 
doing with it and what was the citizens' expectation of what we 
would do with all of that data?
    There was a time when our State--and I know many States 
still do this--sold the information. It is a huge revenue 
source, of course. But I don't think citizens are normally 
expecting that the government will be selling their personal 
and private information. And so there is a consternation about 
who can access the information, how will it be massaged, how 
will it be utilized, and certainly on the part of the citizens, 
invasion of personal privacy by ``Big Brother,'' by government.
    As we march down the information highway, sometimes there 
is a slippery slope there that I think all of us in government 
at the Federal level, the State level, the county level, anyone 
that has interaction with these various data, that we always 
keep that uppermost in our mind about invasion of personal 
privacy.
    With that being said, the technology is certainly out there 
and it can be utilized to make huge advances in society, and 
there are so many things in every layer of government that 
could be done so much better if we were able to use the 
technology properly. So I am very pleased to see you all today. 
Thank you for coming. I certainly look forward to hearing your 
testimony this morning. Thank you.
    Mr. Putnam. I thank the gentlelady. She brings tremendous 
experience from her days as Secretary of State and work in 
bringing that office into the Information Age.
    We are joined by a former mayor, the gentleman from Ohio, 
Mr. Turner. For your opening statement you are recognized.
    Mr. Turner. Thank you, Mr. Chairman. I am particularly 
interested in this area. NCR is located in Dayton, OH, which is 
a leading technology company in this issue of data mining for 
the private sector. And recently they hosted a forum on the 
issue of data mining applications, taking them from the private 
sector and applying them to government issues. And it was an 
interesting discussion because they began in telling us that 
Wal-Mart, at the end of the day, can tell us how many socks 
they have sold; but we are not necessarily able to tell 
ourselves, in reference to foreign visitors, how many visas 
have expired today and who they are.
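    [Illustrative sketch, not part of the hearing record: the 
question posed above, how many visas expired today and whose 
they are, is a simple lookup once the records sit in one place. 
The Python below assumes a hypothetical mapping from made-up 
holder identifiers to expiration dates; it does not describe 
any actual government system.]

```python
from datetime import date

# Hypothetical visa records: holder identifier -> expiration date.
visas = {
    "visitor-001": date(2003, 3, 25),
    "visitor-002": date(2003, 6, 30),
    "visitor-003": date(2003, 3, 25),
    "visitor-004": date(2002, 12, 1),
}


def expired_on(visa_records, day):
    """Return the sorted holders whose visas expire exactly on `day`."""
    return sorted(
        holder for holder, expiry in visa_records.items() if expiry == day
    )


today = date(2003, 3, 25)  # fixed date so the example is reproducible
expired_today = expired_on(visas, today)
print(len(expired_today), expired_today)
```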
    So the possible applications of data mining on very simple 
tasks that clearly do not violate issues of privacy is a wide 
open field which we need to pursue vigorously.
    Also the issue that was fascinating to me in their 
discussion is how you look at the process of data mining, not 
looking first at what data that you have, but looking at what 
questions do you want answered, and that the issue of 
technology is there. The issue of the application of technology 
is demonstrated in the private sector; the issue before us in 
government is to begin the process of asking what questions do 
we need to know answers to and then turning to the experts in 
data mining that have applied it in the private sector to 
assist us so we can have those answers in the public sector.
    Thank you.
    Mr. Putnam. I thank the gentleman.
    We will now take the testimony from the witnesses. Each has 
been very gracious to prepare written testimony which will be 
included in the record of this hearing. And I have asked each 
of you to summarize your presentation into 5 minutes, if you 
could, to leave ample time for questions and answers. Witnesses 
will notice that there is a timer with a light on the witness 
table. Green light means you begin your remarks, the yellow 
light means it's time to wrap up, and the red light means that 
we hit the ejection seat.
    In order to be sensitive to everyone's time schedule, we 
ask that you cooperate with us in our time schedule. As is the 
policy of the Committee on Government Reform, all witnesses 
will be sworn in. So I'll ask you to rise, please, and raise 
your right hands.
    [Witnesses sworn.]
    Mr. Putnam. All witnesses responded in the affirmative. 
Thank you.
    I would like to introduce our witnesses first and then call 
on them for their testimony, followed by questions. We begin 
our panel with an old colleague of mine and a very dear friend 
from Florida, State Senator Paula Dockery. Florida is one of 
the States where data mining techniques have been used in 
several areas, and quite successfully. Senator Dockery's 
experience will lend a very helpful perspective to us today. 
She serves as majority whip in the Senate as well as chairman 
of the Committee on Homeland Security and Seaports. Senator 
Dockery, welcome to the committee and we look forward to your 
testimony, please.

   STATEMENT OF STATE SENATOR PAULA DOCKERY, MAJORITY WHIP, 
                      FLORIDA STATE SENATE

    Ms. Dockery. Thank you, Mr. Chairman, and good morning, Mr. 
Chairman and members of the committee. Thank you very much for 
the opportunity to be here today not only to share with you 
what we think we are doing right in the State of Florida, but 
also to be part of this distinguished panel and to learn from 
the experts to my left. I apologize in advance. I'm going to be 
reading so I can make my time limit, and I'm going to probably 
have to read pretty fast because I timed it at 7 minutes. But I 
would like to get started with that.
    The issue of enhanced information sharing by our law 
enforcement and public safety professionals is at the forefront 
in our war against terrorism in our efforts to keep America 
safe. Florida, I believe, has taken a strong leadership role in 
this effort, one that can serve as a model for other States. 
This model and its reliance on data mining is the focus of our 
discussion today.
    Florida uses the term ``factual data analysis'' to describe 
this information processing system. This process includes the 
collection of information from multiple sources. Once this 
information is processed, analyzed, and evaluated, the 
resulting product represents the intelligence needed to assist 
law enforcement. Intelligence can then be used in a 
proactive and preventive approach to detect criminal patterns, 
crime trends, modus operandi, financial criminal activity and 
criminal organizations.
    Data collection is much different today than in years past. 
The number of data bases and the information contained there is 
immense, as is the ability to effectively and efficiently 
analyze available data in a timely manner. The results can be 
overwhelming. Factual data analysis plays a crucial role in 
filtering the vast quantity of information by separating the 
significant data from the insignificant data. Some individuals 
and groups voice concern for perceived loss of privacy and a 
perceived attempt to foster the examination of private 
information.
    Florida's law enforcement efforts are aimed at utilizing 
only that specific data which law enforcement already has a 
legal right to use, while doing so in a proficient, 
professional, and expeditious manner. Many safeguards have been 
implemented to ensure appropriate use of information. These 
include user name and password protection, user training, 
agency user agreements, system audits, quality control reviews 
and established purge criteria.
    Florida's criminal intelligence systems are operated in 
compliance with standards established by 28 Code of Federal 
Regulations, Part 23. This regulation was written to protect 
the privacy rights of individuals and to encourage and expedite 
the exchange of criminal intelligence information between and 
among law enforcement agencies. The regulation provides 
operational guidance for law enforcement agencies in five 
primary areas.
    Prior to the September 11th attacks, Florida utilized 
factual data analysis on criminal investigations through the 
Financial Crime Analysis Center at the Florida Department of 
Law Enforcement. The Center integrates and analyzes financial 
data in partnership with local and Federal criminal justice 
agencies to identify and combat financial crimes.
    The Center has developed a ``data warehouse'' which 
contains information from various sources already available to 
law enforcement. As part of the analytical process, the Center 
utilizes specialized software to identify anomalies associated 
with financial transactions. Analytical personnel and 
investigators then examine the results to determine if the 
information is related to a crime. The software currently used 
by law enforcement agencies provides a graphical representation 
of suspicious activity identified by financial services 
companies. This method ensures that the user does not see 
individual records, only the result, a safeguard that we 
believe is very important.
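    [Illustrative sketch, not part of the testimony: the 
statement does not say how the Center's software identifies 
anomalies. One common, simple technique is to flag values that 
sit far from the median, measured in units of the median 
absolute deviation. The Python below assumes that technique, a 
hypothetical threshold of 5, and made-up transaction amounts; 
it returns only the positions of flagged items, which a 
reviewer would then examine.]

```python
from statistics import median


def flag_anomalies(amounts, threshold=5.0):
    """Return indices of amounts far from the median.

    Distance is measured in units of the median absolute deviation
    (MAD). The threshold is an assumed tuning parameter.
    """
    center = median(amounts)
    mad = median([abs(a - center) for a in amounts])
    if mad == 0:
        return []  # no spread to measure against; flag nothing
    return [
        i for i, a in enumerate(amounts)
        if abs(a - center) / mad > threshold
    ]


# Made-up transaction amounts; one is far larger than the rest.
amounts = [120, 95, 110, 105, 9800, 100]
flagged = flag_anomalies(amounts)
print(flagged)
```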
    The pattern of behavior is a key element of the decision 
process of whether to investigate further. Users of this system 
are trained to identify behaviors of known criminal activity 
during all stages of money laundering. It is important to note 
that by FDLE guidelines, reasonable suspicion is necessary 
before initiating an investigation.
    When reasonable suspicion is developed, analyzed data are 
supplied to local, State, and Federal law enforcement agencies as 
well as to other States for possible investigation. This 
proactive approach results in increased teamwork amongst law 
enforcement entities as well as a force multiplier effect for 
the investigative process. FDLE agents regularly travel to 
other States to investigate common targets.
    Arizona and Florida are known as the two most effective 
States in conducting these types of proactive investigations.
    After the September 11th attacks, FDLE integrated this 
process and applied it toward the fight against terrorism. FDLE 
employed the assistance of public corporations that have access 
to civil data records. In certain domestic security related 
situations, FDLE has contracted with nationally recognized 
public search businesses to analyze the records based on 
criteria supplied by law enforcement. After the data is 
processed, the results are provided to law enforcement for 
further review. To ensure that the results are as indicative as 
possible, a mathematical analysis is used and includes as many 
as 14 criteria, producing a probability score for criminal 
behavior. Prior to additional investigation or dissemination, 
intelligence analysts and investigators examine only the 
results with the highest scores. This information can be used 
to identify, locate, target and monitor terrorists and other 
criminals. This ability is essential if future terrorist events 
are to be prevented.
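    [Illustrative sketch, not part of the testimony: the 
statement describes combining as many as 14 criteria into a 
probability score and reviewing only the highest-scoring 
results, but gives neither the criteria nor the formula. The 
Python below assumes a logistic combination of weighted yes/no 
criteria; the three criteria, their weights, the bias term, and 
the record names are all hypothetical and stand in for whatever 
the actual analysis uses.]

```python
import math


def probability_score(flags, weights, bias):
    """Combine yes/no criteria into a score between 0 and 1.

    Uses a logistic function of the summed weights of the criteria
    that are met. This formula is an assumption for illustration.
    """
    z = bias + sum(w for met, w in zip(flags, weights) if met)
    return 1.0 / (1.0 + math.exp(-z))


def highest_scoring(records, weights, bias, top_n):
    """Return the top_n (record id, rounded score) pairs, best first."""
    scored = [
        (probability_score(flags, weights, bias), record_id)
        for record_id, flags in records.items()
    ]
    scored.sort(reverse=True)
    return [(record_id, round(score, 3)) for score, record_id in scored[:top_n]]


# Hypothetical weights for three unnamed criteria, and a hypothetical bias.
weights = [1.0, 0.5, 2.0]
bias = -2.0

# Made-up records: which of the three criteria each one meets.
records = {
    "record-1": [True, True, True],
    "record-2": [False, True, False],
    "record-3": [True, False, True],
    "record-4": [False, False, False],
}

top = highest_scoring(records, weights, bias, top_n=2)
print(top)
```

    [Only the entries in top would go to an analyst for review; 
the remaining records are never surfaced.]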
    Florida has partnered with a vendor, Seisint Technologies, 
to provide the data analysis tools using both public and 
private data. Over several years, Seisint Technologies has 
acquired technology and data from multiple sources useful to law 
enforcement. Following the terrorist attacks of September 11th, 
Seisint focused on helping local, State, and Federal law 
enforcement agencies locate and track individuals who might be 
a threat to the United States. As a result of their partnership 
with Florida law enforcement, a customized investigative tool 
was developed. This system has already proven useful in that a 
review of the known information, intelligence, and reported 
activities of the 19 hijackers associated with the terrorist 
events of September 11th identified several common and 
associated variables. This system has proven useful in Florida, 
but the need for timely sharing and exchange of information 
nationwide remains a critical need.
    Mr. Putnam. Thank you, Senator Dockery.
    [The prepared statement of Ms. Dockery follows:]

    [GRAPHIC] [TIFF OMITTED] T7229.003
    
    [GRAPHIC] [TIFF OMITTED] T7229.004
    
    [GRAPHIC] [TIFF OMITTED] T7229.005
    
    [GRAPHIC] [TIFF OMITTED] T7229.006
    
    [GRAPHIC] [TIFF OMITTED] T7229.007
    
    Mr. Putnam. I would like to introduce our next witness, Dr. 
Jen Que Louie. He has spent over 25 years working with data 
analysis systems, specifically with large data base systems, 
data warehousing and data mining. Some of his projects include 
designing, developing, and refining military logistics and C3I 
capability models for the Department of Defense. He has 
designed and implemented medical system diagnostic and analysis 
programs, knowledge- and rules-based business systems, work 
flow process and analysis systems, image management storage and 
retrieval systems, and emergency management information 
systems. Dr. Louie is president of Nautilus Systems, which is 
located in Fairfax, VA. We look forward to your testimony. 
Welcome to the subcommittee.

 STATEMENT OF JEN QUE LOUIE, PRESIDENT, NAUTILUS SYSTEMS, INC.

    Dr. Louie. Good morning, Mr. Chairman and distinguished 
members of the subcommittee. Thank you for the opportunity to 
testify today on data mining: current applications and future 
possibilities. Other than my prepared statement, this is a 
quick summarization of data mining in general.
    It is difficult to come up with a universal definition for 
data mining. One consistent focus of data mining has been 
basically that it is an analytic process with an ultimate goal 
of prediction. You are looking to find something that is going 
to be actionable, that is going to get you somewhere. In a 
nutshell, data mining is an extraction of knowledge or 
information from data. And at first glance, this may not seem 
like a very powerful utility, but unlike mere data, knowledge 
leads to incisive decisions and previously unknown 
relationships that could have a bearing on your decision 
process.
    Data mining, unfortunately, like artificial intelligence of 
the early eighties, is getting a lot of media hype and we will 
call it slightly exaggerated benefits or feasibility of it. And 
what I usually tell my clients is the first fallacy is data 
mining tools. Data mining is a process. It is not a specific 
tool, and the process will generally raise more questions than 
it does produce answers. And while data mining does have the 
ability to uncover patterns that can be remarkable, it still 
requires a human with skills, analytical skills, to interpret 
the meaning of what patterns you are looking at.
    And my usual example is a Dilbert cartoon where the 
marketing person is telling the CEO, ``Our product is always 
seen with people who have flu-like symptoms.'' And the product 
development team is the reason they have flu-like symptoms; it 
is because they are taking the product. So how you interpret 
the data, how you apply it is an important part of how you 
apply data mining.
    Data mining is sometimes advertised and portrayed as being 
an autonomous process; that once you have these rules that you 
don't require analysts, and that is another fallacy. Another 
fallacy is that it will pay for itself very rapidly. While 
there is sometimes, we will call it articles, portraying very 
high returns for the investment in data mining, those are not 
very common. And yes, you can achieve a lot of return on your 
investment with data mining. Credit card fraud is one. Tax 
evasion is another. Money laundering. There are several tools 
that are out in the market that require a lot of extensive 
capabilities. Our company has worked with FinCEN on clearing a 
lot of their caseloads. Those, I would say, are great paybacks 
for the amount of money invested in those areas.
    Data mining also sometimes raises the question about 
missing data. Sometimes the data that's missing is more 
interesting than the data that is there, and that provides some 
other insights. Meeting your data mining expectations, planning 
is the single most important step in any data mining effort. 
You have to know and understand what the consumers of your 
information product need and basically deliver it. Once you 
determine what that is, the next thing in your investment in 
your data mining effort is the environment that you run it in. 
It should be what we call the best you can get, the fastest you 
can get, the most storage you can get, and always allow 
yourself plenty of time to review and analyze the data and look 
at all the facets that are there in order to determine that you 
are delivering the right message, and it is actionable in the 
direction that user needs that information to be.
    So, my quick summation: Data analysis is concerned with the 
discovery and examination of patterns and associations found 
with data. There are various ways to achieve this objective, 
but all share the same fundamental notion that patterns 
examined are present in the data. Also remember that what is 
not in data can be just as interesting in certain situations, 
and more useful to know.
    Data mining is a process that involves multiple analytical 
tools, methodologies driven by the needs of the information 
product's consumer. The quality of information is directly 
proportional to the trustworthiness and quality of that data. 
The confidence of the prediction is dependent upon the data 
mining practitioner's subject matter expertise and insight to 
deliver actionable results. The data mining process is highly 
computational, takes time; therefore, planning the approach and 
selection of tools is influenced by the needs of the consumer. 
Thank you.
    Mr. Putnam. Thank you very much, Dr. Louie.
    [The prepared statement of Dr. Louie follows:]

    [GRAPHIC] [TIFF OMITTED] T7229.008
    
    [GRAPHIC] [TIFF OMITTED] T7229.009
    
    [GRAPHIC] [TIFF OMITTED] T7229.010
    
    [GRAPHIC] [TIFF OMITTED] T7229.011
    
    [GRAPHIC] [TIFF OMITTED] T7229.012
    
    [GRAPHIC] [TIFF OMITTED] T7229.013
    
    Mr. Putnam. Our next witness is Mark Forman. He serves as 
Associate Director for Information Technology and E-Government 
for the Office of Management and Budget, a position he has held 
since June 2001. He is effectively in charge of information 
technology oversight for the entire Federal Government. And 
his--he has a background in the private sector from Unisys and 
IBM as well as work at the Senate Governmental Affairs 
Committee staff. He is an invaluable resource on all of our IT 
issues, and we believe his insight from the Federal perspective 
will be enlightening to us as well. So with that, Mr. Forman, 
you are recognized.

 STATEMENT OF MARK A. FORMAN, ASSOCIATE DIRECTOR, INFORMATION 
TECHNOLOGY AND ELECTRONIC GOVERNMENT, OFFICE OF MANAGEMENT AND 
                             BUDGET

    Mr. Forman. Thank you, Mr. Chairman, and members of the 
subcommittee. Thank you for the opportunity to appear and to 
discuss the administration's views on data mining. And I also 
want to thank you for taking a very rational, well-balanced 
approach in exploring data mining issues and opportunities. 
While there are many definitions of data mining, the 
committee's definition is generally accepted and we believe 
helpful in defining the issues and its challenges.
    I would like to start by talking about private sector uses, 
how we are using it in the Federal Government, and then the 
challenges and opportunities. The private sector uses data 
mining to make sense of a wide breadth of data. Some examples 
are customer relationship management. Applied to customer 
relationship management, data mining is used to analyze 
disparate customer data and provide insights into customer 
needs and wants. Companies that use data mining shorten 
response time to market changes, which allows for better 
alignment of their products with the customer needs. They do 
this to increase revenue performance and allocate investment to 
products that meet customer demand effectively.
    Fraud detection. Companies use software that provides 
comprehensive transaction-level financial reporting and 
analysis to support automatic fraud detection and proactive 
alerting.
    Retail analysis and supply chain analysis. Companies such 
as Wal-Mart are broadly recognized for analyzing sales trends. 
Retail analysis and supply chain analysis can be used to 
predict the effectiveness of promotions, decide which products 
to stock in each store, and help managers understand cost and 
revenue trends in order to adjust pricing and promotion in 
anticipation of changes in marketplace conditions.
    Medical analysis and diagnostics. The health care industry 
uses analysis to predict the effectiveness of surgical 
procedures, medical tests and medications. High-risk segments 
of the population can be identified and targeted for proactive 
treatment. The result is improved quality of life for patients 
and reduced stress on hospitals and insurance providers, using 
such activities as proactive approaches to healing. I have many 
more examples of the commercial use of data mining, and I think 
it is fair to say that all of them deal with how fast we can 
understand what customers need. The Federal Government would be 
well advanced to be able to respond more quickly to what our 
citizens need.
    So I will turn now to the government applications of data 
mining and go through some of the examples and more of the 
effects, both the way we deal with the citizens and how we 
manage the government.
    The Federal Government analyzes data that has been 
collected from the public for several purposes, including 
determining the eligibility of applicants for Federal benefits, 
detecting potential instances of fraud, waste, and abuse in 
Federal programs and for law enforcement activities. Some of 
this analysis is facilitated by data mining.
    So let us talk through a few of the examples. First, 
financial management. Poor management practices create 
opportunities for a wide range of fraud and abuse in the use of 
government travel and purchase cards. Several agency inspector 
general investigations have used data mining-type tools to 
document inappropriate purchases and misuse of cards. OMB is 
taking and will continue to take substantive affirmative steps 
to ensure agencies improve their internal control systems to 
monitor expenditures appropriately.
    Human resource management. One of the 24 E-Government 
initiatives, which we call the Enterprise H.R. Integration, and 
which is managed by the Office of Personnel Management, is 
leading the effort to provide a governmentwide data warehouse 
of H.R. information to minimize the workload as employees move 
from one department to another. A key component of this is the 
E-Clearance project. OPM and its partner agencies on the E-
Clearance project are using data mining to more quickly access 
information which speeds up the overall security clearance 
investigation process.
    Reducing erroneous payments and fraud detection. Data 
analysis accomplished by the matching of electronic data bases 
between government agencies has been an important and 
successful tool for identifying improper payments under Federal 
benefit and loan programs, as well as detecting potential 
instances of fraud, waste, and abuse in the Federal programs. 
As highlighted in the President's 2004 budget, agencies are now 
required to report the extent of erroneous payments made in the 
major benefit programs. Through the President's Management 
Agenda Initiative for improving financial performance, we are 
getting a handle on the problem of erroneous payments. 
Furthermore, the administration has proposed several pieces of 
legislation regarding the administration's authority to share 
data that will greatly improve efforts to reduce erroneous 
payments.
    Policy analysis. The quality of policy decisions is a 
function of our ability to correctly analyze enormous amounts 
of data that describe a problem faced by modern society. For 
example, the Department of Education mines data from a variety 
of student financial aid systems, permitting professionals to 
analyze Federal education programs quickly and easily without 
the time, expense, and burden on citizens.
    Law enforcement and homeland security. Federal agencies 
have found data mining techniques to be an important tool for 
assisting law enforcement in combating terrorism. For example, 
the Department of Homeland Security's Bureau of Customs and 
Border Protection operates the Automated Commercial 
Environment, a system which utilizes a series of data mining 
tools to strengthen border security efforts.
    Benefits and pitfalls. While the use of data mining to 
access timely data and to identify relationships that were 
previously unknown is a powerful tool for identifying errors, 
fraud, threats, etc., the application of such techniques to 
personal information raises serious questions about privacy and 
how it should be protected. In my written statement I focused 
on two areas. First, the data analysis must be consistent with 
law. We monitor that with business cases. Second, the Federal 
Information Security Management Act further requires protection 
of the data under security processes and techniques. Mr. 
Chairman, thank you.
    Mr. Putnam. Thank you very much.
    [The prepared statement of Mr. Forman follows:]

    [GRAPHIC] [TIFF OMITTED] T7229.014
    
    [GRAPHIC] [TIFF OMITTED] T7229.015
    
    [GRAPHIC] [TIFF OMITTED] T7229.016
    
    [GRAPHIC] [TIFF OMITTED] T7229.017
    
    [GRAPHIC] [TIFF OMITTED] T7229.018
    
    [GRAPHIC] [TIFF OMITTED] T7229.019
    
    Mr. Putnam. For insight from a Federal agency that uses 
data pattern analysis, we have Gregory Kutz, Director of 
Financial Management and Assurance at the General Accounting 
Office. As a Director in the Financial Management and Assurance 
Team, Mr. Kutz is responsible for financial management issues 
relating to the Department of Defense, NASA, the State 
Department, and AID. He has also been recently involved in 
preparation of reports issued by GAO and testimony relating to 
credit card fraud and abuse at DOD, financial and operational 
management issues at the IRS, financial condition and cost 
recovery practices of the Department of Energy's Power 
Marketing Administration, the Tennessee Valley Authority, and 
AMTRAK.
    You have been very busy. We look forward to your testimony.

 STATEMENT OF GREGORY KUTZ, DIRECTOR, FINANCIAL MANAGEMENT AND 
           ASSURANCE, U.S. GENERAL ACCOUNTING OFFICE

    Mr. Kutz. Thank you, Mr. Chairman, and members of the 
subcommittee. I'm here to talk about our use of data mining in 
audits of Federal programs. To date we have used data mining 
primarily as an integral part of our audits of credit card 
programs.
    My testimony has two parts: First, the use of data mining 
in our audits and investigations; and second, future uses of 
data mining and related challenges.
    First, our strategy is to use data mining to put a face on 
issues of breakdowns in internal controls. It allows us to go 
beyond simply saying that a program is vulnerable. For example, 
data mining allowed us to report that government credit cards 
were used for escort services, women's lingerie, prostitution, 
gambling, cruises, and Los Angeles Lakers tickets.
    Our data mining has helped us to identify specific 
instances of fraud, waste, and abuse. The posterboard shows 
several examples of government travel card abuse that we 
identified through data mining, including the purchase of a 
used car from Budget Rental Car; adult entertainment charges, 
including gentlemen's clubs; Internet and casino gambling, 
including an individual who charged $14,000 to pay for his 
blackjack gambling habit; and reimbursed travel money used to 
pay for closing costs on a home purchase. For each of these 
examples, we used various data mining inquiries to identify the 
transactions and completed the case with auditor and 
investigator followup.
    The second posterboard is an excerpt from a government 
purchase card statement. As you can see, somebody went on a 
Christmas shopping spree. This bill, which includes nearly 
$12,000 of fraudulent charges, was identified using data 
mining. We identified these fraudulent transactions because of 
the suspicious vendors and because of the timing of the 
transactions. We used these findings in conjunction with 
systematic internal control testing to make recommendations to 
Federal agencies to develop effective systems and controls that 
provide reasonable assurance that fraud, waste, and abuse are 
minimized.
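    [Editor's illustration: the screening Mr. Kutz describes, 
flagging charges by suspicious vendor and by timing, can be 
sketched roughly as below. This is a minimal sketch, not GAO's 
actual method. The category list, field names, review window, 
and sample records are all hypothetical.]

```python
from datetime import date

# Hypothetical merchant categories an auditor might treat as suspicious.
SUSPICIOUS_CATEGORIES = {"casino", "adult entertainment", "jewelry"}


def flag_transactions(transactions, window_start, window_end):
    """Return transactions that match a suspicious vendor category
    or fall inside the review window, with the reasons attached."""
    flagged = []
    for tx in transactions:
        reasons = []
        if tx["category"] in SUSPICIOUS_CATEGORIES:
            reasons.append("vendor")
        if window_start <= tx["date"] <= window_end:
            reasons.append("timing")
        if reasons:
            flagged.append({**tx, "reasons": reasons})
    return flagged


# Made-up sample records, for illustration only.
sample = [
    {"vendor": "Office Supply Co", "category": "office supplies",
     "date": date(2001, 6, 4), "amount": 120.00},
    {"vendor": "Lucky Star Casino", "category": "casino",
     "date": date(2001, 7, 9), "amount": 500.00},
    {"vendor": "Toy Store", "category": "toys",
     "date": date(2001, 12, 20), "amount": 340.00},
]

# Assumed review window: the weeks before Christmas.
flagged = flag_transactions(sample, date(2001, 12, 1), date(2001, 12, 25))
for tx in flagged:
    print(tx["vendor"], tx["reasons"])
```

    [A flag here is only a lead. As the testimony notes, each 
case was completed with auditor and investigator followup.]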
    An important element of our success with data mining is the 
synergy of auditors and investigators working together. Our 
auditors have expertise in financial systems, data 
manipulation, and evaluating internal control systems. Our 
investigators bring a much different perspective. For example, 
Special Agent Ryan, who is with me today, has several decades 
of experience working on financial crimes for the Secret 
Service. Investigators and auditors work together to assess 
system vulnerabilities and develop our data mining strategies.
    Moving on to my second point, our data mining work for the 
Congress is expanding. Currently, we have a number of audits 
underway that use data mining, including nine that I am 
directly responsible for. Some examples of our expanded data 
mining audits include DOD vendor payments, Army military pay 
systems, HUD housing programs and Department of Energy national 
laboratories. As we move forward, challenges will include data 
reliability and security issues.
    For the credit card work to date, we have used commercial 
bank data bases to do our data mining, which we found to be 
highly reliable. However, as we move beyond the credit cards, 
one major challenge is the poor quality of Federal Government 
data bases. In most cases, data base quality issues can be 
overcome, but they result in less productive data mining and a 
greater cost to our work.
    Data security and privacy protection is another challenge. 
For example, in handling large data bases of credit card 
transactions, we developed strict protocols to protect this 
sensitive data. We were especially concerned with protecting 
credit card account numbers and individuals' Social Security 
numbers. Data security issues must be addressed before 
embarking on audits involving data mining.
    In summary, data mining is a powerful tool that has 
increased our ability to effectively audit Federal programs. We 
are just beginning to make full use of data mining strategies. 
With the right mix of technology, human capital expertise, and 
data security measures, we believe that data mining will 
continue to improve our audit and investigative work for the 
Congress. Mr. Chairman, that ends my statement.
    Mr. Putnam. Thank you Mr. Kutz. And I want to thank all the 
witnesses for being so gracious and complying with our time 
limitations.
    [The prepared statement of Mr. Kutz follows:]

    [GRAPHIC] [TIFF OMITTED] T7229.020
    
    [GRAPHIC] [TIFF OMITTED] T7229.021
    
    [GRAPHIC] [TIFF OMITTED] T7229.022
    
    [GRAPHIC] [TIFF OMITTED] T7229.023
    
    [GRAPHIC] [TIFF OMITTED] T7229.024
    
    [GRAPHIC] [TIFF OMITTED] T7229.025
    
    [GRAPHIC] [TIFF OMITTED] T7229.026
    
    [GRAPHIC] [TIFF OMITTED] T7229.027
    
    [GRAPHIC] [TIFF OMITTED] T7229.028
    
    [GRAPHIC] [TIFF OMITTED] T7229.029
    
    [GRAPHIC] [TIFF OMITTED] T7229.030
    
    [GRAPHIC] [TIFF OMITTED] T7229.031
    
    [GRAPHIC] [TIFF OMITTED] T7229.032
    
    [GRAPHIC] [TIFF OMITTED] T7229.033
    
    [GRAPHIC] [TIFF OMITTED] T7229.034
    
    [GRAPHIC] [TIFF OMITTED] T7229.035
    
    [GRAPHIC] [TIFF OMITTED] T7229.036
    
    [GRAPHIC] [TIFF OMITTED] T7229.037
    
    [GRAPHIC] [TIFF OMITTED] T7229.038
    
    [GRAPHIC] [TIFF OMITTED] T7229.039
    
    [GRAPHIC] [TIFF OMITTED] T7229.040
    
    Mr. Putnam. Our final witness is Jeffrey Rosen, a law 
professor at George Washington University Law School. Mr. 
Rosen's area of expertise is in privacy and technology issues. 
He has written dozens of articles on the subject as well as a 
book. His 
testimony will be valuable as we look to the legal and ethical 
questions surrounding the use of data mining technology. 
Welcome.

 STATEMENT OF JEFFREY ROSEN, GEORGE WASHINGTON UNIVERSITY LAW 
        SCHOOL, LEGAL AFFAIRS EDITOR OF THE NEW REPUBLIC

    Mr. Rosen. Thank you, Mr. Chairman, and members of the 
subcommittee. It is an honor to be here. I am delighted that 
you are holding this hearing because the effort to strike a 
balance between privacy and security is a bipartisan issue and 
I am delighted that you are informing yourself about the 
complicated legal and technological choices that you face as 
these technologies are implemented.
    My thesis this morning is simple: It's possible through law 
and technology to design data mining systems that strike better 
rather than worse balances between privacy and security. But 
there is no guarantee that the executive branch will demand 
them or the technologists will provide them on their own. You 
therefore, ladies and gentlemen of the Congress, have a special 
responsibility to provide legal and technological oversight to 
ensure that the technologies are developed and deployed in ways 
that strike a good rather than a bad balance between privacy 
and security.
    Let me give you an example of the kind of design choice 
that I have in mind. And I want to focus just for the sake of 
argument on the Total Information Awareness Program that 
Congress has recently decided, at least for the foreseeable 
future, to block. Total Information Awareness provides a model 
for the kind of mass dataveillance that we have been discussing 
this morning and is being proposed in other contexts. Now, just 
a question of definition, ``mass dataveillance'' refers to the 
suspicionless surveillance of large groups of people. And that 
is different from personal dataveillance of the kind that 
Senator Dockery described which involves targeted surveillance 
of individuals who have been identified in advance as being 
unusually suspicious. Mass dataveillance poses special dangers. 
In some ways it poses some of the same dangers of the general 
warrants that the framers of the fourth amendment to the 
Constitution were especially concerned about prohibiting.
    When the government engages in mass dataveillance without 
individualized suspicion, there is a danger of unlimited 
discretion, as the government searches through masses of 
personal information and searches for suspicious activity without 
specifying in advance the people, places, or things it expects 
to find. Both general warrants and mass dataveillance run the 
risk of allowing fishing expeditions in which the government is 
trolling for crimes rather than particular criminals, violating 
the privacy of millions of innocent people in the hope of 
finding a handful of unknown and unidentified terrorists. At 
the same time there is an important question of effectiveness.
    And I want you to think pragmatically about these 
technologies. Will they work in the national security arena? 
Unlike terrorism, credit card fraud of the kind that Mr. Kutz 
described is a form of systematic, repetitive, and predictable 
behavior that fits a consistent profile identified by millions 
of transactions. There is no special reason to believe that 
terrorists in the future will resemble those in the past. By 
trying to pick 11 out of 300 
million people out of a computer profile, you may be looking 
for a needle in a haystack, but the shape and the color of the 
needle keep changing and, as a result, the profiles may produce 
great numbers of false positives: those people wrongly 
identified as terrorists.
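    [Editor's illustration: the false-positive concern can be 
made concrete with simple arithmetic. The population of 300 
million and the 11 targets come from the testimony. The 
detection rate and false-positive rate below are assumed 
purely for illustration and do not describe any real system.]

```python
def screening_outcomes(population, true_targets, sensitivity,
                       false_positive_rate):
    """Expected counts when a profile screens an entire population.

    sensitivity: assumed fraction of real targets the profile flags.
    false_positive_rate: assumed fraction of innocent people flagged.
    """
    true_pos = true_targets * sensitivity
    false_pos = (population - true_targets) * false_positive_rate
    # Share of flagged people who are real targets.
    precision = true_pos / (true_pos + false_pos)
    return true_pos, false_pos, precision


true_pos, false_pos, precision = screening_outcomes(
    population=300_000_000,     # figure used in the testimony
    true_targets=11,            # figure used in the testimony
    sensitivity=0.99,           # assumption: profile catches 99% of targets
    false_positive_rate=0.001,  # assumption: 0.1% of innocents flagged
)
print(f"expected true hits:   {true_pos:.1f}")
print(f"expected false hits:  {false_pos:,.0f}")
print(f"share of flags that are real: {precision:.6f}")
```

    [Even with these generous assumed rates, the profile would 
flag roughly 300,000 innocent people for about 11 real 
targets, because the targets are so rare in the population.]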
    I want you to think about the privacy issues and the 
effectiveness issues. Does the technology that works in a 
credit card arena make sense to apply in the national security 
arena? Assuming that these technologies will be deployed in 
different spheres, I urge you to recognize that they can be 
designed in better or worse ways. The Total Information 
Awareness Office itself recognized this and proposed technology 
that it called ``selective revelation,'' which proposed to 
minimize personally identifiable information while allowing 
data mining and analysis on a large scale. The insight of 
selective revelation is useful and may provide models for ways 
privacy and liberty could be protected at the same time.
    The Total Information Awareness Office had a project called 
Genisys that was exploring ways of separating identifying 
information from personal transactions and only allowing the 
link to be recreated when there is legal authority to do so. 
This might allow, for example, the Centers for Disease Control 
to have access to medical information while other groups do 
not.
    Using this model of selective revelation, Congress could 
think about creating laws and technology that separate 
identifying information from the data itself.
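    [Editor's illustration: one way to picture separating 
identifying information from transaction data is sketched 
below. This is not the Total Information Awareness Office's 
design. The class, function, and field names are hypothetical, 
and the "authorization" dictionary merely stands in for 
whatever legal authority an oversight body would grant.]

```python
import secrets


class PseudonymVault:
    """Holds the only link between random pseudonyms and identities."""

    def __init__(self):
        self._identity_by_pseudonym = {}
        self._pseudonym_by_identity = {}

    def pseudonym_for(self, identity):
        """Return a stable random token for an identity."""
        if identity not in self._pseudonym_by_identity:
            token = secrets.token_hex(8)
            self._pseudonym_by_identity[identity] = token
            self._identity_by_pseudonym[token] = identity
        return self._pseudonym_by_identity[identity]

    def reveal(self, pseudonym, authorization):
        """Re-create the link only when authorization is approved."""
        if not authorization.get("approved"):
            raise PermissionError("no legal authority to re-identify")
        return self._identity_by_pseudonym[pseudonym]


def strip_identities(transactions, vault):
    """Replace names with pseudonyms so analysts see patterns, not people."""
    return [
        {"subject": vault.pseudonym_for(tx["name"]), "event": tx["event"]}
        for tx in transactions
    ]


vault = PseudonymVault()
records = strip_identities(
    [
        {"name": "Alice Example", "event": "flight booking"},
        {"name": "Alice Example", "event": "car rental"},
    ],
    vault,
)
# Analysts can still see that both events share one subject.
print(records[0]["subject"] == records[1]["subject"])
```

    [In this sketch analysts work only with the pseudonymous 
records, while the vault would sit with a separate custodian 
who releases a name only on an approved request.]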
    And Mr. Forman talked about the searches being consistent 
with current law. My strong belief is that current law is not 
adequate to the kind of complicated regulation that faces us, 
and you need to think creatively about rising to this new 
challenge by 
developing new oversight bodies and new technologies to ensure 
the protection of privacy. But just hypothetically we could 
imagine what those regulations would look like. Congress could 
create a special oversight court with the authority to decide 
when identifying data obtained during mass dataveillance may be 
connected to transactional information. After intelligence 
analysts have identified a series of transactions that they 
think might be evidence of a terrorist plan or suggest that a 
particular individual is unusually suspicious, they could 
petition the oversight body for authorization to identify the 
individuals concerned. In deciding whether or not to grant the 
request, Congress could direct the court to satisfy itself that 
the crime for which the evidence has been presented is a 
serious threat of force or violence rather than a low-level or 
trivial crime, and that the evidence suggests a link between 
the suspects and terrorists. If the court granted the order, 
then the analyst could link the identifying information and 
they could share the information with State and local bodies 
and so forth.
    And there are other needs for regulation. You might have to 
create standards for citizen oversight. Citizens should be 
able to correct their data if it's incorrect or misused. And 
fair information practices would give citizens the right to 
know the information that the government has collected. So, you 
see the general model. The search is anonymous unless there is 
cause to believe that a particular individual is suspicious, 
and then there is oversight to make sure that the individuals 
are identified in connection with serious crimes. Merely to 
describe the complexity of this regulation is to raise 
legitimate questions about whether Congress is ready to adopt 
it.
    But Congress has met its oversight responsibilities in the 
past. The most important checks on poorly designed technologies 
of surveillance since September 11th have come from Congress 
ranging from the decision to block Total Information Awareness 
in its current form to the insistence on creating oversight 
mechanisms for the Carnivore e-mail program. I urge Congress to 
accept the task of learning about the design choices inherent 
in these technologies. You have it in your power to strike a 
balance between liberty and security, and all you need now is 
the will. Thank you very much.
    Mr. Putnam. Thank you Mr. Rosen.
    [The prepared statement of Mr. Rosen follows:]

    [GRAPHIC] [TIFF OMITTED] T7229.041
    
    [GRAPHIC] [TIFF OMITTED] T7229.042
    
    [GRAPHIC] [TIFF OMITTED] T7229.043
    
    [GRAPHIC] [TIFF OMITTED] T7229.044
    
    Mr. Putnam. I certainly believe our witnesses have set the 
table and created an environment for some outstanding dialog.
    The gentlelady from Michigan has another appointment so I 
will recognize her to lead off with our questions.
    Mrs. Miller. Thank you, Mr. Chairman. I think my question 
is for Mr. Kutz.
    As I heard you talk about some of the various audits that 
your agency is currently engaged in, you talked about nine 
different audits that you are getting involved with, Energy 
labs and DOD, etc., and certainly the testimony you gave about 
the credit card fraud is startling. It is sickening. Those are 
the kinds of things I think make people crazy about what is 
happening at the Federal level. But you know, last week the 
Congress had a very exhaustive debate about a budget resolution 
and there was a lot of talk about waste, fraud, and abuse and 
the kinds of problems in large numbers numerically that we 
could get at to look at some reduction in our budgeting 
process.
    And I heard a lot of conversation last week--and I don't 
know if this is one of your nine universes or not--but in the 
area of Social Security, that there is as much as 10 percent of 
the Social Security payments that are going to people who are 
either deceased or for some reason do not qualify. And I don't 
know if that is an area that you are auditing in your universe 
there; and, if so, what kind of numbers are we talking about 
and how would you do a construct to do the data mining? Do you 
have any idea of how you might begin to proceed to take a look 
at that type of waste, fraud, and abuse?
    Mr. Kutz. Social Security is not one that we have on our 
plate right now. We typically do our work at the request of 
various Members of Congress or committees or subcommittees, and 
that is not one we have been asked to do at this point.
    Some of the ways you can use the technology for that, for 
example, have been used by the Inspector General to look for 
people who are receiving benefits that are over 90 or 100 years 
old, and those are potential indicators of a family that might 
be keeping the checks and didn't report the death to Social 
Security and therefore received improper payments.
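    [Editor's illustration: the age-based query Mr. Kutz 
mentions might look roughly like the sketch below. It is not 
the Inspector General's actual query. The record layout, the 
100-year threshold parameter, and the sample data are 
hypothetical.]

```python
from datetime import date


def age_on(birth_date, as_of):
    """Whole years of age on a given date."""
    years = as_of.year - birth_date.year
    # Subtract one if the birthday has not yet occurred this year.
    if (as_of.month, as_of.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def flag_by_age(beneficiaries, as_of, min_age=100):
    """Return active beneficiaries at or above min_age for manual review."""
    return [
        b for b in beneficiaries
        if age_on(b["birth_date"], as_of) >= min_age
    ]


# Made-up sample records, for illustration only.
as_of = date(2003, 3, 25)
beneficiaries = [
    {"id": "A", "birth_date": date(1901, 1, 15)},
    {"id": "B", "birth_date": date(1940, 5, 2)},
    {"id": "C", "birth_date": date(1903, 3, 26)},
]
flagged_ids = [b["id"] for b in flag_by_age(beneficiaries, as_of)]
print(flagged_ids)
```

    [Advanced age is only a potential indicator, as the 
testimony says, so a match would prompt verification rather 
than prove an improper payment.]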
    There are certainly lots of different queries and methods 
you could use. And I believe the Inspector General has done a 
lot of that, and I believe it has been used extensively there.
    Also for Medicare, there has been extensive use of data 
mining technologies to find fraud, waste, and abuse and also to 
project the amount. Annually, the various agencies project how 
much is going out the door in improper payments and, as you 
know, there are tens of billions of dollars. And we are talking 
about real money here, which is why we need good internal 
control systems to minimize this waste, fraud, and abuse.
    Mr. Forman. If I may, let me point out two projects in 
particular. One is 1 of the 24 E-Government Initiatives that is 
called the E-Vital Project. And so much of this is tied to, for 
example, the Social Security Administration getting timely 
notification when a person has passed on. That is explicitly 
the target of the E-Vital Project that continues to have good 
traction in the States that have been moving the death records 
and other medical records on-line. It is a slow process. And as 
you may recall, Michigan may have been one of the States. The 
State has charged the agency to provide that information to 
them. So there is some negotiation, because the cost should be 
reduced when we put that in place as a computer system.
    The other project is called PARIS, the Public Assistance 
Reporting Information System, and that is a joint Federal/State 
information network that was set up explicitly to allow for 
data matching and mining on interagency-related benefits 
programs. So that would cover things like Supplemental Security 
Income, the TANF program, Medicaid, Food Stamps, and Veterans 
Affairs Program.
    Mrs. Miller. In regards to the Social Security link that 
the States have as they interact with the Federal Government, 
isn't it true now--because I think every State is required to 
solicit the Social Security number of every licensed driver--
that is something new in the last several years, and all of the 
States are required to link to the Social Security 
Administration because of that? Has that been helpful in 
information sharing?
    Mr. Forman. You know, to be quite honest, I think 
ultimately, while there is a requirement to share information, 
the reality is a big chunk of the benefit here in terms of 
identifying people who are getting Social Security income but 
have passed on comes back to the ability of States to share 
information on the death certificates in a timely manner. And 
some of the States and local county offices where that 
information initially starts just haven't been electrified yet.
    Mrs. Miller. My experience had been with the Social 
Security link that we had in Michigan--I know some of the other 
States were mentioning this as well--there was no way to verify 
the Social Security number, so someone could give you any 
digits that they wanted to. There was no way for the States to 
verify that the Social Security number was in fact a valid 
Social Security number. That is a problem, I think.
    Mr. Forman. There has been some progress made on that, and 
I know we looked at this a month ago when we did a review. I 
would ask, if it is OK with the chairman, that we get back to 
you on the Social Security Administration progress on that.
    Mr. Putnam. We have been joined by the big Chair, the 
chairman of the full committee. Mr. Davis, do you have any 
comments or questions?
    Mr. Davis. I will be very brief. I think data mining is 
critical. If you go back 100 years, a visionary at the start of 
the 20th century might have said, what is going to guide the 
economy in the 20th century? The visionary might have said, 
oil. And in fact, it was your entrepreneurs and your 
visionaries who figured out how you get the oil, identified 
where the oil was, how you get it out of the ground, how you 
refine it, how you get it to markets, who dominated much of the 
economic activity of the 20th century.
    Here we are at the start of the 21st. What would a 
visionary say now? Really, the oil today is information. How 
would we get that information and get it out of the ground, so 
to speak; how do we refine it; how do we distribute it; what 
uses does it have? And it is those entrepreneurs that are going 
to in large part be the economic wunderkinds of the 21st 
century. Had we had the EPA and all of the regulations on oil 
in 1900, this stuff would still be in the ground. We never 
would have gotten it out.
    My theory is we need to be slow about coming in and 
overregulating. You let the marketplace and let the public and 
let the industry come up with its own protocols before the 
government comes in and starts imposing a regulatory and taxing 
regime that could stifle the growth and the potential for this. 
That is kind of the way I look at it. Certainly there is going 
to be a role for government down the way, and maybe in ways we 
don't even envision today, because I think we are just at the 
very beginning of a whole revolution. But that is kind of the 
way I have looked at it.
    And I don't know if you have any reaction. Mark Forman has 
been working with us on a number of issues. I don't know if 
anyone wants to react to that or disagree. Obviously, the 
professor is here and has his own view.
    Mr. Rosen. I guess I would just urge the chairman to ask 
whether the kind of data mining that is appropriate in the 
private sphere can be brought into the national security arena. 
Much of the history of our privacy laws for the past 50 years 
has been based on the idea that completely unregulated 
information sharing is not consistent with the values of the 
Constitution or of American citizens. We don't want every low-
level information officer in the field to know that I had a 
youthful indiscretion or I am late in my child support payments 
before I go onto an airplane, or that I am late on my credit 
card or maybe I have some IRS issues against me.
    Complete transparency of information, total unregulated 
use, which is what many Silicon Valley people are urging, 
wouldn't be consistent with the value of the fourth amendment. 
It wouldn't be consistent with current privacy laws which 
prohibit information sharing without good cause, and it also--and I 
want to urge the chairman to think about it--would it be 
effective? Is there any reason to believe that centralizing all 
of our public and private data bases and allowing for a risk 
prediction to be made would identify terrorists?
    It is not like credit card fraud. Credit card fraud is 
something you have 10 million examples of, and it takes 
predictable patterns. People who steal credit cards test them 
at service stations and then buy clothes at a mall. And because 
it happens so often, you can use the technology to predict 
credit card fraud.
    We have no reason to believe that the next terrorist attack 
is going to take the form of people who lived in Florida and 
went to flight schools. It could take many forms. I respect 
your libertarian instincts and the desire to use this 
technology as effectively as possible. I just would say that if 
you, the Congress, don't stand up for Constitutional values 
to ensure inefficiencies as well as centralization, I don't 
think the technologists of the executive branch will either.
    Mr. Davis. Most of this information has been public. It has 
just never been able to get collated and so rapidly deployed 
and disseminated. That's what scares people. Something that in 
the old days could have taken 10 private detectives 6 months 
going through records to find, you can now get like that.
    And as you spoke of in your testimony, it is a balance 
issue; and I don't know what that right balance is, but I am on 
the go-slow side rather than the overregulation side. We know, 
for example, that the terrorists on September 11th--the 
information that was out there between flight schools and 
arrests and Immigration. Had we been able to collate that 
information and get it in one place, we could have prevented it 
from happening.
    And some of you view this as an infringement on privacy, 
but I don't know what you say to the victims and the families 
of over 3,000 people that died that day. I don't know what the 
right balance is, and I agree, and that is why we need to hear 
from you and keep you at the table as we work our way through 
this brand-new territory. And that is why we appreciate you 
being here.
    And I am not sure we have that right balance today. And I 
am not sure, given the technologies that we have today, that we 
can even start writing rules, because who knows what 
technologies will be deployed and invented tomorrow that we may 
not be able to have any idea what their application could be? 
And I appreciate everybody's input and I appreciate you holding 
this very important hearing.
    Mr. Putnam. I believe the Senator had a response.
    Ms. Dockery. Thank you, Mr. Chairman, and I just wanted to 
comment that I agree very much with the Congressman, 
Congressman Davis, and to comment to the professor, we in 
Florida believe that the factual data analysis that we are 
using now is appropriate for tracking down terrorists, and we 
also believe that it led to the arrest recently of--a national 
news story you may have heard about of a professor at 
University of South Florida. And that was done through 
collection of information that was all part of our public 
records in the State of Florida that showed some connections.
    So we think that this is a valuable tool and we think we 
have shown in Florida its criminal possibilities. I will say 
that in Florida, we have one of the most open record laws in 
the country. We call it ``Government in the Sunshine,'' and it 
is kind of interesting that the people in Florida just in the 
past election voted a Constitutional amendment to require that 
anytime we provide an exception to the open records law, it 
would now require a two-thirds vote of both the House and the 
Senate to make that exemption. The open public records law 
actually helps law enforcement in Florida by making more and 
more records available for us to use in our factual data 
analysis.
    So to that extent I wholeheartedly support Congressman 
Davis's comments and would tell you that we probably need some 
regulation to prevent us from going overboard and to protect 
the fourth amendment rights, but we should err on the side of 
allowing the technologies to prove themselves out before we 
overregulate an industry that is just beginning.
    Mr. Putnam. For the professor and anyone else who would 
like to respond, how would you compare data mining technology 
to the emerging technology of DNA as a law enforcement tool 25 
years ago?
    Mr. Rosen. I think DNA offers greater security benefits and 
fewer privacy threats for this reason. DNA is usually used in 
the kind of focused investigation of the kind that Senator 
Dockery was just suggesting: You have a clue and you can plug 
it into a data base and it can be used to exonerate or 
inculpate. And as long as there are restrictions on the use of 
DNA for secondary purposes, the government can't turn it over 
to insurance companies to deny me a job or make predictions 
about my future health, I don't have privacy concerns about it.
    Data mining, by contrast, of the kind that Roger Clarke 
calls ``mass dataveillance'' rather than ``personal 
dataveillance,'' poses very different privacy issues. And I 
want to distinguish the two, because Senator Dockery just 
talked about how useful it is once you know something about an 
individual. This USF professor, you can plug him into a data 
base and draw connections. That is the same thing that was done 
with the sniper. When you have the tip in Alabama and plug it 
into the data bases and establish connections, that is useful 
and that doesn't raise grave privacy concerns because the 
individual has been identified in advance as suspicious.
    My concern is the kind of mass dataveillance, not only the 
total information awareness level, but the profiling systems 
that are being proposed at airports. And the reason I am 
concerned about them, this is the surveillance of the data of 
millions of innocent citizens. And it's not just a little bit 
of data. If the projects go forward, there are credit card 
records, phone calls, tax records, all public and private data; 
mass risk predictions based on this that could be used to 
prosecute people not for terrorism--which I'm all for--but for 
very low-level crimes.
    It is that kind of fishing expedition--it is the example of 
an unconstitutional search. At the time of the fourth 
amendment, what the framers were most concerned about was 
breaking into everyone's house looking for enemies of the 
government, reading their private diaries, looking at innocent 
information, in the course of seeing whether or not they were a 
critic of the king, and then arresting them for whatever you 
found in their house. That was a general search and it was 
unconstitutional because it exposed a lot of innocent 
information while looking at guilty information. That is what 
mass dataveillance does. And that's why, without Constitutional 
restrictions, I don't see how we could deny that there are 
privacy concerns.
    Mr. Putnam. In a recent New York Times article, a Dr. Gilman 
Louie, CEO of In-Q-Tel, outlined in a recent speech two different 
approaches, one which he identified as the data mining approach 
which results in what he calls watch lists and what he 
indicated was too blunt an instrument; the second being data 
analysis which begins with some type of investigative lead and 
then uses software to scan for links between a person under 
investigation and known terrorists. I presume that is an 
approach you are advocating?
    Mr. Rosen. I like that approach and I respect Mr. Louie, 
who is sensitive to these issues, and he is distinguishing 
between focused data mining based on individualized suspicion 
and mass dataveillance.
    And the same model interestingly has been taken by the 
Foreign Intelligence Surveillance Court. Just yesterday the 
Supreme Court decided not to review that decision of the 
Foreign Intelligence Surveillance Court that said we don't have 
to worry about broad surveillance of people who have been 
identified in advance as agents of foreign powers because we 
suspect that they're bad guys. And if we then find that they're 
guilty of lower level crimes it's good to get them off the 
streets because we're pretty sure that they're suspicious. 
That's different, said the Foreign Intelligence Surveillance 
Court, from using this mass dataveillance to look at everyone 
without any cause to suspect them and going after them for 
lower level crimes.
    So I'm glad that Mr. Louie, who is at the forefront of the 
government's effort to merge technologies that have been 
developed in the private sector and apply them in the national 
security area, is sensitive to that distinction, too.
    Mr. Putnam. Let me direct that to our witness, Dr. Louie, 
who is not the person I was just quoting. You indicated in your 
testimony that data mining is a process, not a tool. Please 
elaborate on that in the context of Mr. Rosen's comments.
    Dr. Louie. Data mining goes--some of the focus that I keep 
hearing is the emphasis going back to patterns. Data mining 
deals with patterns, but I think the term ``patterns'' needs to 
be expanded a little bit to understand in terms of other ways 
of interpreting a pattern. A pattern can also be a series of 
events. A led to B, B led to C, and on down the line. If we are 
planning a--we'll call it a filtering mechanism to look at 
everybody, you have to establish some parameters of saying if 
we are looking for people who buy large quantities of potassium 
nitrate fertilizer and they are not in agriculture or 
landscaping and the like, maybe that should raise a flag. But 
all it does is just put up a flag, says this is of interest. 
And then if other events or other ties go back to it, then that 
should, we'll call it, raise a level of suspicion that maybe 
forwards it to somebody else to review. I think that's the way, 
we will call it, data mining in general can be applied in terms 
of looking for potential terrorists, whether it be something 
like Oklahoma City or something like September 11th.
    In terms of September 11th here we have another potentially 
interesting, we will call it, information exchange of 
Immigration's data base or when they applied for visas was, 
we'll call it, a little bit more broader in their perception of 
how they looked at the information coming in for, let's say, 
applications of visas. We have, we'll call it, the linguistic 
issue of how do you spell the name, what are the variations of 
the name, variations being, let's say, diminutive form of the 
name or a, we'll call it, a common substitution, Robert for 
Bob, John for Jack, you know, and down the line. If we had a 
way to compare that and also previous visas, abbreviations of 
the names, transposing of the name that would have identified, 
had these people come through our visa process before, where 
did they go, did that raise any suspicions.
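    [Editorial illustration, not part of the testimony. The name 
comparison described above (diminutives, common substitutions, 
transposed names) can be sketched roughly as follows. The 
nickname table and the function names are hypothetical, chosen 
only for this example; a working system would need a much larger 
curated list and would also have to handle spelling variants, 
which this sketch does not attempt.]

```python
# Hypothetical nickname table for illustration only.
NICKNAMES = {"bob": "robert", "jack": "john", "bill": "william"}


def normalize(name):
    """Lowercase the name, map diminutives to a canonical form, and
    sort the parts so that transposed names compare equal."""
    parts = name.lower().replace(",", " ").replace(".", " ").split()
    canonical = [NICKNAMES.get(part, part) for part in parts]
    return tuple(sorted(canonical))


def possible_match(name_a, name_b):
    """Return True when two names reduce to the same canonical form.
    A match is only a flag for an analyst, not an identification."""
    return normalize(name_a) == normalize(name_b)
```

    [For example, possible_match("Bob Smith", "Smith, Robert") 
compares equal under these assumptions, while two unrelated 
names do not.]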
    That's the way I see data mining being applied in terms of 
broad, we'll call it, filtering of information. Not tracking 
somebody necessarily, but raising, we'll call it, levels of 
questionable flags or activities that may lead to something. 
That way you are not tracking an individual, you're just 
tracking recent events. If that event tracks out and says all 
these events lead up to a suspicious activity, then we can go 
back and say, OK, where did all these names come in or what is 
the relationship of that. And that's up for the analysts. It's 
the same way we track money laundering, we track bank accounts. 
The banks are required to report any transaction of $10,000 or 
greater. So if I deposit $9,999 it's not going to trip the 
flag. But if, let's say, at the bank level they consolidate the 
end of the day receipts and they see that account exceeded that 
$10,000 maybe it should just raise a flag and make FINCEN aware 
that there was a transaction, didn't meet the criteria but it's 
just something maybe to watch. Either the bank watches it or 
FINCEN watches it.
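    [Editorial illustration, not part of the testimony. The 
end-of-day consolidation described above can be sketched as 
follows. The $10,000 figure comes from the testimony; the record 
layout (account, day, amount) and the function name are 
assumptions made for this example.]

```python
from collections import defaultdict

REPORT_THRESHOLD = 10_000  # dollars; the reporting figure cited in the testimony


def flag_accounts(deposits):
    """deposits: iterable of (account_id, day, amount) tuples.

    Returns the sorted (account_id, day) pairs where no single deposit
    reached the threshold but the day's consolidated total did.
    """
    totals = defaultdict(float)
    largest = defaultdict(float)
    for account, day, amount in deposits:
        key = (account, day)
        totals[key] += amount
        largest[key] = max(largest[key], amount)
    return sorted(
        key for key in totals
        if largest[key] < REPORT_THRESHOLD <= totals[key]
    )
```

    [Under these assumptions, an account that receives $9,999 and 
then $500 on the same day is flagged for someone to watch, while 
a single $12,000 deposit is not, because that transaction would 
already be reported on its own.]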
    But that's the way I see you apply data mining. And in 
terms of--I believe that was Gilman Louie from In-Q-Tel.
    Mr. Putnam. Yes.
    Dr. Louie. I agree with his prospect and the way he 
outlines the way we should look at it. Data mining is an inert 
tool. You can take very thin slices and basically create a 
sandwich of a nice depth in order to act upon. And that's where 
we use the term ``actionable information.'' And one slice of 
information in itself, it may be totally insignificant and of 
no value. But it's the cumulative process of all the 
associations associated with that data point that become 
interesting. And you don't have to store it. You just have to 
essentially flag it. And when we have enough flags that trip, 
we'll call it, your suspicion level, then you look at it. You 
don't necessarily take an action on it, but evaluate it. And 
that's where the human aspect or the analysts and subject 
matter experts in that area can say this does look suspicious 
or this should be maybe questioned.
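    [Editorial illustration, not part of the testimony. The idea 
that one flag alone is insignificant, and that a case goes to a 
human analyst only when enough flags accumulate, can be sketched 
as follows. The threshold value, the flag names and the function 
name are hypothetical.]

```python
from collections import Counter

SUSPICION_LEVEL = 3  # hypothetical number of distinct flags before review


def needs_review(flags, level=SUSPICION_LEVEL):
    """flags: iterable of (subject_key, flag_name) pairs.

    Counts distinct flags per subject and returns, in sorted order,
    the subjects whose count reaches the level. The result is a list
    for a human analyst to evaluate, not a decision to act.
    """
    distinct = set(flags)  # the same flag raised twice counts once
    counts = Counter(subject for subject, _ in distinct)
    return sorted(s for s, n in counts.items() if n >= level)
```

    [Only the flags themselves are held here, not the underlying 
records, which loosely mirrors the remark that the data need not 
be stored, only flagged.]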
    Mr. Putnam. Mr. Forman.
    Mr. Forman. I think it's incredibly important to keep in 
mind that data mining is a productivity tool. Yes, it's part of 
a process, but at the end of the day our decision has to be is 
that a process that we want to have that is a more productive 
process. And that's, I think, one of the big differences to 
understand about the Total Information Awareness Initiative. 
That's an R&D project. That is not a Federal IT program. And 
when it hits the stage where somebody says, geez, we ought to 
buy something, it falls into the process by which we put out 
the standards associated with the business case. Are we going 
to get any productivity out of it?
    I have always kept in mind early in my years when I did a 
lot of data analysis and operations research this notion of 
garbage in, garbage out that Dr. Louie raised. I am very, very 
mindful, especially in this area of homeland security, where we 
have got dozens of data bases, merely hooking them together and 
applying an algorithm is not going to make the data there any 
better. Even so, merely allowing those islands of automation to 
exist and the business processes that run off of those islands of 
automation aren't going to give us any greater homeland 
security. The core and the issue here is to find out do we have 
a better way, as we see in Florida, for the investigators to do 
their work. And are we happy that this is appropriate, given 
the Privacy Act, given the other laws that cover that. And 
there is a policy decision to be made there. That now is 
clearly required to be addressed in the business case process 
under the E-Government Act, and under OMB guidance we are 
updating it to comply with that.
    Mr. Putnam. Anyone else wish to comment on that? With 
regard to the private sector, is there an industry standard out 
there that is being used to guard privacy and security of the 
information in the data mining process? Solely in the private 
sector. Is there a single industry standard?
    Dr. Louie. There are no unified business industry 
guidelines as far as, we'll call it, protecting the privacy of 
the data. I think that most of our clients have relied on us to 
devise a, we'll call it, a privacy statement of how we are 
going to handle data, how we are going to handle the physical 
storage as well as dissemination of the information and how--
who will actually get to see and touch it. That's something 
that we have devised as being the consultants or the 
practitioners to different companies. But there are no formal 
guidelines. We have adapted the, we'll call it, guidelines as 
specified by the Society of Competitive Intelligence 
Professionals in terms of saying, OK, this is how we will 
handle the data. This is how we will ensure our clients' 
privacy and we will try to abide by that as a form of ethics.
    Mr. Forman. I would say from the standpoint of what we have 
seen, there are two standards that have existed over the last 
couple of years. Opt in and opt out. And I know we have looked 
an awful lot at those standards to see what would be 
appropriate for the Federal Government. Opt out being a company 
tells you you have got this data: If you want to continue with 
this on-line service or continue as a customer with us, we are 
going to show the data unless you tell us not to. And opt in is 
essentially like we see with the little cards at the Giant 
grocery store chains. If you get this card you get a lot of 
discounts; in return you give us information about your buying 
habits. And those discounts give you better products and so 
forth. And so, how the data is used and how the option is 
available to the consumer, I think they still have a couple of 
common standards that have been around for a couple of years.
    Mr. Putnam. Mr. Rosen.
    Mr. Rosen. But opt in and opt out wouldn't begin to be 
adequate to the challenge of the regulation you're thinking 
about now because much of this is data that you can't opt out 
of sharing. It's data such as credit card purchases that goes 
automatically to warehouses like TRW or telephone calls that go 
to the telephone company and that the court has held are not 
legally protected because of the circular reasoning that you 
voluntarily turned the information over for one purpose and 
can't withhold it for another. So I'd gather the kind of 
regulations that you want to be thinking about are the 
patchwork of laws that do currently regulate information 
sharing in the private sector, such as the Fair Credit 
Reporting Act that would prohibit the kind of personally 
identifiable financial information that can be shared. As I 
understand several of the data mining proposals, such as the 
Total Information Awareness Program, in its original form there 
was a suggestion that those laws should be relaxed and that the 
government should have access to data that's currently 
restricted by law, such as personally identifiable credit card 
information that can't ordinarily be shared and the records of 
international telephone calls that are regulated by other 
statutes. So I wouldn't--with respect to the effort of using 
private sector regulations as a model to guide you in the new 
world that you face in Federal data mining, I don't think that 
a simple opt in standard which is based on this voluntariness 
notion would begin to do the trick. And that's why I think at 
some point you may down the line have to think about 
comprehensive reform at the level of the Privacy Act, which has 
proved inadequate for regulating the kind of things we are 
talking about now.
    Mr. Putnam. Speaking now about the public sector, what 
level of information sharing is currently allowable by law 
within and between all government agencies without a special or 
a specific warrant or request for that information? In other 
words, how much information sharing is there between HUD, VA, 
HHS, INS now from a technical potential and from a legal 
potential?
    Mr. Forman. There's very little information sharing. This 
issue came up about a year ago with the concept of a program 
that was called gov.net, and there was a fear for cyber 
security purposes that we had to protect the sharing of 
information between agencies, and we found out there was 
virtually no sharing of information between agencies. There 
generally, it gets back to this issue that each agency built 
its own data base, its own data store, if you want to use the 
parlance of today's hearing, to support its own mission. And 
the question is, when can you look across the agencies, when is 
there a need? Going back several years, two decades almost in 
the scientific community, there was sharing probably most 
extensive as it relates to what we now call geospatial 
information or geographic information systems. There are 
generally requirements associated to that that we handle via 
the computer security rules and models and the business case 
practices. Where we have seen a ramp-up of sharing between 
agencies has been in the data management area that I've alluded 
to in my testimony, and that happens to be with these major 
Welfare programs and it is generally by the PARIS Project. 
There's been explicit congressional authorization, literally 
laws authorizing that. We have asked for some additional legal 
authorities or additional data sharing, a creation of the 
matching data base that has current job data, but even that is 
only updated quarterly. We probably could do better than that.
    Mr. Putnam. So would a successful data mining or factual 
data analysis project that was attempting to identify a 
particular profile of a terrorist, for example, would they be 
able to access any and all Federal Governmental data bases 
without a specific change in the law? Or would they be able to 
do that as a result of the law's silence on the topic? First 
part of the question. The second part of the question is, as a 
technical matter, could it actually be done?
    Dr. Louie. On the technical side I say we could do that. We 
have for several government agencies, but the technical side of 
making it happen is not really the problem. The problem is the 
quality and trustworthiness of the information that's in those 
data bases, is I would say poor to--you know, it is amazing 
that they can conduct business.
    Mr. Putnam. Senator Dockery.
    Ms. Dockery. Thank you, Mr. Chairman. In Florida we require 
reasonable suspicion to be developed before we use factual data 
analysis, and then we abide by the standards established in 28 
Code of Federal Regulations. To answer your question about 
sharing intelligence information, Florida deals well with 
sharing information with other States. In fact, there's a pilot 
project, the Multistate Antiterrorism Information Exchange, 
called MATRIX, which is going to consist of 13 States in this 
pilot project. Our problem has been to share information with 
the Federal Government, both in terms of us willingly giving 
you information and you not being able to receive it and us 
trying to receive information from the Federal Government.
    One case in point, Florida has 16 million residents, but 60 
million tourists. We have a lot of people moving through the 
State and it would be very helpful to us if we could access the 
visa data base, particularly if we could have access to anyone 
who may be in Florida who has overstayed their visa and that 
could lead to a lot of useful information in making these 
connections. We do not keep dossiers on individuals. We look 
for linkages based on reasonable suspicion in assorted events 
and then we look for those linkages. Then just as soon as we 
see them they're gone. So it is not a matter of starting a file 
on an individual. It's looking at an activity and trying to 
find who had some access to something involved within that 
activity. But it would be very helpful to us and to other 
States if there was a better cooperation of sharing 
information.
    We have now linked almost everything in Florida together so 
we can access various agencies' data, but we cannot access 
anything from the Federal Government nor can they for us 
because the information that the State has is in their possession. 
But we are willing to share it. We just don't have the 
technology to do so.
    Mr. Putnam. Mr. Forman.
    Mr. Forman. From a legal perspective, I believe there's a 
pretty broad coverage, let me refer to three laws in 
particular, the Privacy Act of 1974, the Computer Matching and 
Privacy Protection Act of 1988 and the E-Government Act of 
2002, all of which lay out the principles and the areas that 
must be addressed, ultimately leading up to what we would look 
for in the business case of privacy impact assessment. There is 
a policy decision that will have to be made. There's guidance 
from both OMB and the National Institute for Standards and 
Technology on that for Federal information systems to ensure 
appropriate protections of personal information. I think it's 
fair to review some of those cases and how that's being done. 
But the legal framework exists. This does not have to be built 
from the ground up, per se.
    I guess I'm more concerned about this on the technology 
side. These data bases were largely poorly crafted to start 
with. The business processes generally are nonexistent and when 
we try to share information which have different embedded rules 
in the data bases into a data warehouse and mine that data, I 
keep in the back of my head garbage in, garbage out, because I 
think that's the reality that we'll be forever patching 
together in the Federal arena. I believe that this at the end 
of the day is not so much a technology issue as we know. The 
technology exists. It's been used in many governments, 
including the U.S. Government, for years. The question comes 
down to can we figure out what's the right business process and 
who should be in charge or how we want to oversee that, pulling 
that information together and the person who says I've got a 
terrorist threat. The best framework for that so far as it 
links to terrorism is the Department of Homeland Security Act.
    Mr. Putnam. Mr. Rosen, do you have a comment?
    Mr. Rosen. It's an interesting question whether there are 
meaningful legal regulations on the sharing of data in the case 
of individualized suspicion. The Privacy Act has a broad law 
enforcement exception and a national security exception, so I'd 
imagine that when we're talking about personal dataveillance, 
focused on suspicious individuals, there wouldn't be meaningful 
legal restrictions on sharing. Mass dataveillance is a 
different question. And I think that the people who have 
analyzed this are divided about whether dataveillance along the 
total information awareness model would violate the Privacy 
Act. It's not clear whether the information that is being 
accessed would count as a system of records according to the 
Privacy Act, and the mere phrase itself shows how outdated that 
1970's idea, which presumes that information is stored in 
different file cabinets, is for regulating data sharing in the 
21st century. So--and then there's also the case that much of 
this data is already held in the private sector and law 
enforcement has a long history of piggybacking on the grand 
data warehouses like TRW, and so forth, in order to get 
information that it couldn't get on its own.
    All this is to say that if you're in any way concerned 
about restrictions on information sharing, as I hope that you 
will be to the degree that the PATRIOT Act and the homeland 
security bill create new provisions for information sharing in 
the interest of national security, you're going to have to 
think about this issue afresh and try to craft sensible 
regulations for these new technologies.
    Mr. Putnam. Do you presume then that under the current law, 
particularly the Privacy Act, that authorization of personal 
information that can be held by the IRS, for example, under the 
current law would not be eligible to be transferred to Homeland 
Security or INS or a different agency?
    Mr. Rosen. As I understand it. I'm not an expert on the 
IRS. The IRS has a series of complicated regulations that have 
ensured that it especially doesn't lightly share information 
with law enforcement. So both by practice and regulation, I am 
not sure that there'd be easy access to that data. But the 
mere--but you're right to focus on precisely that question and 
then extrapolate from there to other sensitive information that 
you might not want to be shared without cause, and then you 
will get a sense of the degree of the challenge that you face.
    Mr. Putnam. Well, Chairman Davis pointed out something that 
in many of these cases data mining is the collation of 
previously existing, perhaps even public data bases and 
collections of information and that the amalgamation of that 
data is what allows you to get a more useful outcome than the 
time and effort and energy involved in searching each one 
discretely. The blowup over TIA, characterizing it, I think, 
has been over this presumption of the next step of data 
collection between public and private and even into the more 
personal side of things in terms of habits and patterns based 
on purchases or travel destinations and things like that. But 
is there anything--is there any effort currently underway other 
than what had been a research and development project? Is there 
any active program in the Federal Government that is doing that 
type of surveillance or data mining?
    Mr. Rosen. I understand that the CAPPS II program, which is 
Computer Assisted Passenger Profiling Act--I think I have got 
the acronym right--is based on very much of a TIA model and is 
also trying to collate information which is already in the 
public sphere and make risk predictions for particular 
passengers at airports. So that's why I think the TIA model is 
one that you will have to think about hard, and I think that 
the chairman's notion that all this information is already in 
the private domain and therefore is not of concern and can be 
analyzed perhaps misses the fact that once the analysis becomes 
granular there is a difference between having me watched on the 
street when I walk from door to door by a cop or a neighbor and 
the government planting a camera on my back that follows me 
from door to door and records each of my activities throughout 
the day. That reality, the fact that a level of intrusiveness 
is inconsistent with the values of a free society is one that 
our law is not well set up to deal with. The Supreme Court's 
test for invasion of privacy, as you know, Congressman, says 
the question is whether there is a subjective expectation of 
privacy that society is prepared to accept as reasonable, and as 
the invasions become more invasive, people's expectations are 
lowered, with a lowering of Constitutional protections. So I 
would resist the chairman's notion that as long as the 
information is out there, that any degree of collation and 
technical analysis is fair game because there is a point at 
which as you have said when very intimate personal information 
becomes available to the government on a massive scale that's 
quite different from some reporter going down to the courthouse 
and rummaging through a couple of paper records 50 years ago.
    Mr. Putnam. Mr. Forman.
    Mr. Forman. Well, in preparation for this hearing, I did a 
run on our major IT investments of the Federal Government. I 
did actually two runs, to identify all the data mining and then 
to identify all the data warehouses because why do a data 
warehouse if you're not going to mine the data. And zero 
projects showed up. So I didn't believe that we don't have 
anything going on with regard to this. So I used a data mining 
tool, the search engine on first.gov and got well over 1,000 
hits. There's an awful lot of activity going on. Now the 
question that seems to me comes down to is do we have anything 
going on as an official IT investment that relates to kind of 
these random searches. And I'm not aware of any that Dr. Rosen 
is so concerned about. It doesn't mean that it's not out there. 
I really need to go back and dig deeper. I just have not found 
any yet. On the other hand, is there--are there some data 
mining applications that are similar to that and I think, yeah, 
you'd have to say that the credit card fraud is very similar. 
You know the pattern. Same thing on Medicare, Medicaid, 
mischarging. We know that we should be spending, for example, a 
certain amount for a certain type of procedure. If we see a 
company that is routinely overcharging us, we know that it's 
not an error, it's a systematic overcharging. And so that's a 
very similar type issue and I think in the areas of government 
accounts payables, where we know some tolerances and we can use 
data mining to identify people who are overcharging or 
fraudulently charging us. You do see that and that has gone 
through the privacy impact assessment reviews generally.
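    [Editorial illustration, not part of the testimony. The 
overcharging check described above, where an expected amount is 
known for each type of procedure and a routine pattern is 
distinguished from a one-off error, can be sketched as follows. 
The tolerance, the share of claims and the minimum claim count 
are hypothetical values, as are the record layout and the 
function name.]

```python
def routine_overchargers(claims, expected, tolerance=0.10,
                         min_share=0.8, min_claims=5):
    """claims: iterable of (provider, procedure, billed_amount).
    expected: dict mapping procedure to its expected amount.

    Returns, sorted, the providers with at least min_claims claims
    where at least min_share of them exceed the expected amount by
    more than the tolerance. A single high bill is treated as a
    possible error and is not flagged.
    """
    total = {}
    over = {}
    for provider, procedure, billed in claims:
        total[provider] = total.get(provider, 0) + 1
        if billed > expected[procedure] * (1 + tolerance):
            over[provider] = over.get(provider, 0) + 1
    return sorted(
        provider for provider, n in total.items()
        if n >= min_claims and over.get(provider, 0) / n >= min_share
    )
```

    [The known tolerance plays the same role as the known pattern 
in credit card fraud: it is what makes the flag meaningful.]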
    Mr. Putnam. Senator Dockery, hasn't the State of Florida 
for some time used a data analysis, data sharing, data mining 
type technology to compare and even correlate employment 
records with child support payments to develop a list of folks 
who are behind in that and whether or not they are cheating the 
system?
    Ms. Dockery. Yes, that's one of many areas that Florida has 
used the technology. Also, in smuggling rings, money 
laundering, child molestations, so we--after September 11th it 
was the technology was already there and it was just a matter 
of adapting it to now apply it to homeland security.
    Mr. Putnam. So there's a history of civil uses as well as 
the criminal uses, at least in the State of Florida.
    Ms. Dockery. Exactly.
    Mr. Putnam. We have been joined by our ranking member, 
gentleman from Missouri, Mr. Clay, and I'd ask unanimous 
consent that he be able to enter his statement into the record. 
And without objection, show it done, and now recognize him for 
his statement and questions.
    Mr. Clay. Thank you very much, Mr. Chairman. Let me say, 
for Mr. Rosen, the Transportation Security Administration plans 
to use data mining to develop terrorist profiling for anyone 
who flies. And if Congress goes along with this proposal, what 
safeguard should be established at the same time to assure 
public rights similar to those provided in the Privacy Act? Let 
me also say that--do you believe that airlines are now using 
profiles when you go to the kiosk to get your boarding pass, 
and you put your card through the kiosk, don't you think that 
they examine some of your recent credit activity now and is 
profiling occurring now by the airlines?
    Mr. Rosen. I do, Congressman. As I understand CAPPS I, or 
the computer assisted profiling system that's now in use, it 
does indeed analyze publicly available information from the 
private and public sector and make risk predictions that can 
lead people to be taken aside for different searches. As I 
understand, CAPPS II would only increase this profiling by 
adding information to the data base. It's difficult to answer 
your question adequately, because the Transportation Security 
Administration is not forthcoming about exactly what 
information it's analyzing and how it's using it, and I think a 
crucial part of your oversight role should be to ensure that 
the data in the data base is transparent, not the algorithms. 
The transportation authority says, well, we can't tell you what 
algorithms we're using or the terrorists can beat the system. 
What Congress needs to know is not what the algorithms are, but 
is this data that the Federal Government is entitled to 
analyze.
    So when you think about how to regulate this new system, 
and this will be a pressing concern, even more so than Total 
Information Awareness because that's been tabled for the 
moment, think about transparency, accountability. Citizens 
should be able to correct errors in their data base. We have 
heard a lot this morning about the poor quality of the data. 
Imagine being stopped repeatedly on the basis of inaccurate 
information and having no remedy, not even being told why 
you've been stopped. The application of fair information 
practices to the transportation arena is something that 
Congress urgently needs to think about because the Privacy Act 
in its incarnation is not adequate to the task.
    So I think that this should be a good model for you as you 
think about regulation.
    Mr. Clay. Thank you very much.
    Mr. Forman, along those same lines, airline security has 
had a troubled history of racial profiling, even before the 
attack on the World Trade Towers. During the 1991 Gulf War 
individuals with Middle Eastern names were forced off their 
flights despite the fact they were American citizens. Last year 
the ACLU testified before Congress of dozens of such incidents, 
individuals discriminated against in airports or on airplanes 
based on race and heritage. The same people who oversaw the 
private contractors who provided discriminatory security are 
now designing new systems. What is OMB doing to prevent racial 
profiling from continuing in air transportation?
    Mr. Forman. Well, let me put this into the context of the 
CAPPS II program. The CAPPS II program was not approved by OMB 
to proceed at the pace that they seem to want to proceed. I 
have a huge spotlight on that project right now. They're late 
in getting back to me the information that they need to 
proceed. So the issues that we're talking about, the issues 
that concern me essentially, CAPPS II could quickly become the 
80th watchlist. And I have to take a step back in my job and 
say, what value added do we get by yet another island of 
automation coming up with something farther away from something 
that's going to give us the productivity and effectiveness 
we're looking for. You know, the argument that I have heard in 
favor of CAPPS and CAPPS II essentially went back to the 
question of do you want this random? Because my father, my 
grandmother was pulled out of line. And it just didn't seem to 
make sense. So there has to be something better. And I think, 
and I allude to this in my testimony in the customs arena, in 
the package movement, we seem to figure out this risk paradigm. 
Now, I think that's what we are looking for. We're clearly not 
looking for a racial profiling. We are looking for a risk 
profiling. And there the data that I'm asking for, it's got to 
be in the business case, would give us both the technical 
programmatic reviews as well as the policy review. We don't 
have it yet.
    Mr. Clay. In this process you're looking for random, random 
profiling and not racial profiling or heritage?
    Mr. Forman. We are looking for risk based--.
    Mr. Clay. Risk based.
    Mr. Forman. Reduction. So not random profiling.
    Mr. Clay. So the 9-year-old little girl that goes through, 
you may not want to search her, through TSA. You may not want 
to search her?
    Mr. Forman. As a random selection, that would be correct.
    Mr. Clay. Or the 85-year-old grandmother?
    Mr. Forman. As a random selection, that would be correct. 
We are looking for clear documentation that they have actually 
figured out an approach that's going to improve the 
productivity. You know, we can spend hundreds of millions of 
dollars on a terrific IT system with very pretty screens or 
very fruitful data mining techniques. But at the end of the 
day, if it somehow does not lower the risk, to me, I would have 
to say that is not a good IT investment for the Federal 
Government and would recommend against that.
    Mr. Clay. OK. All right. Thank you.
    Mr. Kutz, does data mining need individual identities in 
order to detect patterns of unusual activity? And can the 
government develop profiles of unusual activity and then 
follow up on the specifics with appropriate oversight?
    Mr. Kutz. Again, what--most of what we have done so far 
relates to credit card data bases, but we have gone beyond that 
certainly for the credit card data bases and these were 
government credit cards, ones issued by the--on behalf of the 
Federal Government to use for government purposes. We did have 
that information to basically analyze and put together patterns 
of activity, etc. But we have also gone beyond, I was going to 
mention an example last year. We testified before 
Representative Shays on the JSLIST suit, which is the current 
chem-bio suits that are being used in the Middle East. And what 
we identified there was that they were excessing and selling 
those goods on the Internet at the same time they were buying 
them. And so in that instance, we tried to identify who was 
buying these suits and whether or not they might be using them 
for something that would be against the government. So we try 
to identify, where it is appropriate, individual identities to 
follow up for investigative purposes.
    Mr. Clay. Let me ask you a followup on the question I asked 
Mr. Rosen. What exactly do the airlines look for when we go to 
the kiosk and put our credit card through? What kind of 
financial activity are they looking at? Just out of curiosity.
    Mr. Kutz. I couldn't answer that question.
    Mr. Clay. You don't know. Does anyone on the panel know 
what they're looking at? I mean, is it one purchasing one-way 
tickets or what exactly.
    Mr. Rosen. We know from criminal procedure cases that 
there's certainly public information that they look for, one-
way tickets, certain points of origin passengers and the 
addresses and phone numbers that you check in with and the 
people that you also are traveling with, and neural network 
analysis can be done on that information. But we are assuming that 
they're respecting legal limitations on, for example, looking 
at personally identifiable phone calls or personally 
identifiable credit card information. But finding out the 
precise answer to that, I know there are groups like some of 
the privacy groups in town have Freedom of Information Act 
requests to find out exactly what information is being used and 
they haven't found the TSA terribly forthcoming, as I 
understand it.
    Mr. Clay. Do you think they also look at recent purchases 
in retail outlets?
    Mr. Rosen. As I understand it, they would be restricted 
from doing that by the Fair Credit Reporting Act, but you 
need a closer parsing of the statute than I can give you for 
that.
    Mr. Clay. OK. Thank you very much.
    [The prepared statement of Hon. Wm. Lacy Clay follows:]

    [GRAPHIC] [TIFF OMITTED] T7229.045
    
    [GRAPHIC] [TIFF OMITTED] T7229.046
    
    [GRAPHIC] [TIFF OMITTED] T7229.047
    
    Mr. Putnam. The gentleman raises an interesting point. 
Immediately after September 11th I was pulled every single time 
I flew because I was not in a frequent flier program, we bought 
our tickets at the last minute because of the Congressional 
schedule and it was always one-way. And so I got the body 
cavity search just about every time I flew. And it's terribly 
frustrating and it begs some better type of profiling, 
particularly based on risk. And while some Members of Congress 
can be shady characters at times, hopefully we wouldn't fit the 
risk profile.
    Mr. Clay. Hopefully we wouldn't get stopped as often.
    Mr. Putnam. Well, hopefully, at least not quite as often. 
Every time got a little old.
    But let's get back to the people component of this, because 
I think everyone has agreed that at the end of the day, no 
matter what type of process there is and no matter what type of 
information or data is out there, at the end of the day it is 
going to require some analysis by a human being. And everyone 
in general has seemed to stress the need for quality data as 
well as those high quality analytical skills in the personnel.
    Can you expand on that a little bit and talk about where we 
are in terms of our human capital and the role that they play 
in obtaining acceptable results through this process?
    Mr. Forman. I think there are some very, very good examples 
of the training and culture change that has to take place here. 
When you move from a paper based--technically we call knowledge 
management environment--to an on-line you're going to use 
different interfaces. To do--to have that tool kit, if you 
will, generally, people have to become computer literate and 
willing to use computers. And that's where we see, especially 
in the law enforcement arena, a cultural, maybe generational 
change that we are working through. Certainly you'll see that 
at the FBI if you look at their use of the TRILOGY program and 
the culture of change that the Director is bringing. From my 
perspective, in the business case itself I look at that. I look 
to see are we investing in training and process reengineering, 
change management projects. And when I see generally data 
mining or tools that use these knowledge management systems and 
support systems tools without any training, that is a flag to 
us that this should go on the high risk list. Unfortunately, 
that has been the pattern of government. Somebody in the 
technology side invests in these tools and then they get ready 
to deploy and they find out culturally or from an education 
standpoint people don't want to use them. And as in the case of 
the INS, then we go on a binge of buying training services. So 
I'd say right now, training or the education part has been an 
afterthought and it's one that needs a lot more attention and 
funding from the up-front. We are trying to put that discipline 
in the process.
    Mr. Kutz. Mr. Chairman, I would add to that the software 
that we had to do the data mining that we have done in the 
fraud, waste and abuse type applications is fantastic. 
It's flexible. We certainly train our people, etc. But the real 
element that makes it work is the people and the continuous 
learning that goes on with even using that software and the 
various programs. So we've kind of got a process where as we 
look at a system and a program, we understand the program, 
understand the controls, understand the vulnerabilities, and we 
use that too as a feedback into the actual data mining 
strategy, combining auditors and investigators again.
    I mentioned Mr. Ryan, who's with me today, who worked for 
the Secret Service doing money laundering and credit card 
crimes for decades. People with that kind of experience 
teaching younger people some of the things that they know 
really provides a great atmosphere for learning and developing 
all those human capital skills.
    Mr. Putnam. Have you an estimate of the savings that have 
been derived from that type of data sharing initiative?
    Mr. Kutz. From the data mining with respect to the fraud, 
waste and abuse?
    Mr. Putnam. From the financial management side, yes.
    Mr. Kutz. If you go back to the improper payments reporting 
that's gone on in Federal Government for years, I think that 
areas like Medicare have shown large decreases in estimated 
improper payments, and that's I think in part due to the data 
mining that's gone on there. Another program that's had a great 
deal of oversight in that area is the earned income tax credit, 
which had estimates of as much as $8 billion of improper or 
fraudulent type payments over the years. So there's certainly 
been savings. I don't think it's been quantified necessarily, 
but the focus of data mining and the focus on improper payments 
going out the door has led to better controls in the government 
and probably saved billions of dollars.
    Mr. Putnam. Senator Dockery.
    Ms. Dockery. Thank you, Mr. Chairman. You bring up a good 
point and one that piggybacks on to Congressman Davis. The 
information that we are using in tracking criminal activity and 
potential terrorist events takes into consideration what used 
to be information in various locations. By putting that all 
together, it cuts the time down from weeks or months to a 
matter of minutes. Once that information has identified a risk, 
that's when the investigations begin. So it still comes down to 
our human investigators, but instead of spending all their time 
digging through paper to find out where to start, they now have 
a starting point and spend their time more wisely looking at 
those individuals who have come up as a potential risk. So it 
does involve a lot of training. We do--the success of what we 
do with that information lies within our law enforcement, but 
this allows them to spend their time in the investigation and 
not in trying to put together a pattern.
    Mr. Putnam. How reliable is that data? How often is it 
maintained? How often is it upgraded? And we have certainly 
learned in our experience with the election that sometimes our 
data bases are a little old with respect to eligible voters and 
convicted felons and things like that. How good a job does the 
State do in maintaining that data base that they depend on?
    Ms. Dockery. Well, I am not an expert in that area, but I 
would say that we do have systems put in place to purge 
information. We have systems put into place to check 
information. And the sharing of the information allows us to 
hear from other sources in the law enforcement community that 
some information may be suspect. So I think our information is 
good. Keep in mind that when it lists people with risk factors, 
that doesn't point to that person as being guilty of anything. 
It points to that person as coming up as maybe a place to start 
the investigation.
    Mr. Putnam. Mr. Forman, you had referred to geospatial 
information earlier in your testimony. In my understanding that 
is 1 of the 24 E-Government initiatives, and that would involve 
an overlay of information from a variety of sources with regard 
to identifying the geography of data. In essence, you overlay 
the census data with USGS data and we can look at, you know, 
where the population threats are to sensitive estuaries or any 
of a million combinations of things by combining all the data 
that's collected and stacking it in a meaningful way to derive 
answers about what's going on. Isn't that data mining?
    Mr. Forman. Yeah. That very definitely will have to require 
data mining. There are two approaches to leveraging the 
redundant data sources. One is the concept of buy once and use 
many. We are definitely proceeding with that. But then where do 
you put that data? Is it some is maintained at National Weather 
Service, for example, or NOAA and some is maintained at the 
U.S. Geological Survey, some is maintained at Environmental 
Protection Agency? That kind of peer-to-peer computing model is 
the emerging concept of a virtual data warehouse in which case 
probably at that program office you would have the metadata 
description of where do I go to find this data, what is the 
standard, and access that. Regardless of whether it is a 
physical data warehouse or this virtual data warehouse to get 
access to that data, to make sense of it, data mining 
techniques will be used. They have been used, you know, for 
example, probably the best example today, if you go to the 
Census Web site, American Fact Finder, you can find out 
supposedly, I haven't done this, but the theory was you could 
find out how many kids of soccer age for second grade soccer 
teams, second and third grade soccer teams are in your tract, 
you know, in your soccer league area. That wouldn't tell you by 
house, but that would tell you maybe by block or by 
subdivision.
    Mr. Putnam. The opportunities for the beneficial use strike 
me as endless. When you compare weather patterns with farm 
payments, with crop insurance, perils and things like that, 
then maybe we start raising the risk premiums for that area or 
maybe we adjust our farm payments so we don't let people plant 
in that area until El Nino clears up. I mean the opportunities 
are endless to derive information. The Federal Government 
spends a fortune collecting information and the fact that it is 
for the large part underutilized is distressing from a taxpayer 
perspective.
    Mr. Rosen, you mentioned earlier that perhaps we should 
consider the creation of a special court to consider these 
types of requests for specific searches, I believe.
    Mr. Rosen. I did. And, Congressman, I would distinguish the 
need for a special court when we are talking about the mass 
dataveillance of personally identifiable data with the kind of 
syndromic surveillance that you and Mr. Forman have just been 
talking about. This is indeed a wonderful resource, and there 
are no privacy issues when you're making general statements 
about weather patterns or census information that's not 
personally identifiable or the Centers for Disease Control 
using data mining to figure out when people are checking in in 
one area with an epidemic or, to give another example that I am 
very impressed by, the city of Chicago using data mining to 
figure out when crime patterns correspond with particular 
weather patterns and sports events and then they can deploy the 
cops to that area of town when there is a particular game on 
and that's really hot and then they can stop crime. These are 
wonderful things that don't raise any privacy issues at all. 
That's very different though from, and again if the jargon 
isn't helpful let's come up with another term, but mass 
dataveillance, suspicionless searches at airports, the Total 
Information Awareness model, this is something that needs 
regulations.
    So my message has been this stuff isn't all good or all bad 
and the technology isn't evil, just be especially attuned to 
the privacy dangers of suspicionless searches that allow 
personal information to be collected in ways that are not 
currently available. And for that I think you do need--it 
doesn't have to be a special court. You could have a 
magistrate. You could have a congressional oversight body. 
There are all sorts of ways to do it. But you have to separate 
the model as the data is traceable but not identifiable. You 
can do those sort of general predictions and risk profiles that 
Mr. Forman is talking about, but you can't actually identify me 
as the person who's been buying fertilizer unless it really 
looks like I'm a terrorist because I've done some other things 
that are suspicious, too.
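
    [Editorial illustration, not part of the hearing record. The 
"traceable but not identifiable" model Mr. Rosen describes can be 
sketched, under assumptions, as keyed pseudonymization: analysts see 
stable tokens that link one person's records together, while the 
mapping back to a name is held separately and released only on 
authorization. The key, names, records and the reidentify function 
below are all hypothetical; this is a minimal sketch, not a 
description of any system discussed in the testimony.]

```python
import hashlib
import hmac

# Hypothetical secret held by an oversight body, not by analysts.
SECRET_KEY = b"held-by-oversight-body"


def pseudonym(identifier, key=SECRET_KEY):
    """Return a stable token for an identifier using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


# Invented purchase records: (name, item).
purchases = [
    ("Jane Doe", "fertilizer"),
    ("Jane Doe", "diesel fuel"),
    ("John Roe", "fertilizer"),
]

# What the analyst sees: records are linkable by token, but carry no names.
analyst_view = [(pseudonym(name), item) for name, item in purchases]

# Lookup table kept by the custodian, separate from the analyst's data.
_custodian_lookup = {pseudonym(name): name for name, _ in purchases}


def reidentify(token, authorized):
    """Reveal the name behind a token only when authorization was granted."""
    if not authorized:
        raise PermissionError("re-identification requires authorization")
    return _custodian_lookup[token]


if __name__ == "__main__":
    for token, item in analyst_view:
        print(token, item)
```

    [In this sketch the authorization flag stands in for whatever 
review a court, magistrate or oversight body would perform before a 
token is tied back to a person.]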
    Mr. Putnam. Well, I would remind you and the rest of the 
panel and the audience that on May 6th we will convene our next 
oversight hearing on this topic, specifically to address TIA, 
CAPPS II and some other similar programs.
    With that, I will yield back to the gentleman from Missouri 
for any questions.
    Mr. Clay. Thank you, Mr. Chairman. Senator Dockery, I'd be 
interested to know what Florida does to protect individual 
rights. Does an individual have a right to know what 
information about them is included in the data analyzed in the 
factual data analysis? Does the individual have a right to 
correct the information in those data bases that is wrong? And 
what happens if an individual is singled out because of 
incorrect information in one of these data bases? Can you kind 
of expound on that for me?
    Ms. Dockery. Yes. Thank you. All the information that is in 
the data bases is part of Florida's open public records. So 
any individual is at any time able to check out those records 
and to clarify any misinformation on those records. We don't 
keep particular files on any individuals. We look for events, 
and risk factors may make somebody come up. Then it goes to a 
human being, an investigator to investigate that and they may 
find that just because the individual was identified as being--
fitting that risk profile that person was nowhere near the 
event. So there are a lot of safeguards built in. And of 
course, we abide by the Federal Code that I mentioned earlier.
    Mr. Clay. So the safeguards are there and they're helpful 
and people can followup and correct them?
    Ms. Dockery. Yes.
    Mr. Clay. That sounds like a pretty foolproof system. Thank 
you.
    Mr. Kutz, what would you recommend Congress do to stop the 
racial profiling that is going on in today's airline security? 
Do you have any recommendations?
    Mr. Kutz. No, that's not an area that I deal with so I 
can't comment on that.
    Mr. Clay. OK. Well, let me also ask you, you recently did 
some work for Congress where you identified several people 
getting treatment at veterans hospitals who were listed as 
deceased on Social Security records. With further 
investigation, you showed that the problem was errors in the 
Social Security records. Now, if TSA had those Social Security 
records in their data base, those people would be stopped from 
flying and they would have no way of knowing why or correcting 
the incorrect information. Would you agree that any system used 
by TSA has to allow for the public to know what information is 
being used to rate them and what other safeguards should be in 
place?
    Mr. Kutz. Your question gets back to the issue I think Mr. 
Forman talked about, about data quality in the Federal 
Government, and we did indeed find, and this was from military 
treatment facilities, we had compared people who were served at 
some military treatment facilities with a Social Security death 
file and there were some hits that came out of people that 
appeared to be dead that were not really dead. And so there 
were errors in the Social Security death file, and that 
certainly raises issues about what that file is used for. That 
file is certainly shared with others. It's sold to others. And 
the Social Security Inspector General has reported other 
examples of errors with that.
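
    [Editorial illustration, not part of the hearing record. The 
kind of comparison Mr. Kutz describes, matching treatment records 
against a death file, can be sketched as a simple join on a shared 
identifier. All records, field names and the flag_for_review function 
below are invented for illustration; a hit marks a record for human 
review and does not establish which of the two files is wrong.]

```python
# Invented treatment records; dates are ISO strings (YYYY-MM-DD).
treatment_records = [
    {"ssn": "000-00-0001", "name": "A. Example", "last_visit": "2002-11-03"},
    {"ssn": "000-00-0002", "name": "B. Example", "last_visit": "2002-12-15"},
    {"ssn": "000-00-0003", "name": "C. Example", "last_visit": "2001-05-20"},
]

# Invented death file: identifier -> recorded date of death.
death_file = {
    "000-00-0002": "2001-06-30",
    "000-00-0003": "2002-01-10",
}


def flag_for_review(treatments, deaths):
    """Return treatment records dated after the recorded date of death."""
    hits = []
    for record in treatments:
        recorded_death = deaths.get(record["ssn"])
        # ISO-formatted dates compare correctly as plain strings.
        if recorded_death is not None and record["last_visit"] > recorded_death:
            hits.append({**record, "recorded_death": recorded_death})
    return hits


hits = flag_for_review(treatment_records, death_file)

if __name__ == "__main__":
    for hit in hits:
        print(hit["ssn"], hit["last_visit"], "after", hit["recorded_death"])
```

    [Here only the second record is flagged: its visit postdates the 
recorded death, so either the treatment was improper or, as in the 
cases described in the testimony, the death record is in error.]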
    So this issue of Federal Government data base reliability 
is a major challenge here in all applications of data mining 
going forward. And I had some experiences I was going to share 
with you on the IRS, where I used to be responsible for the IRS 
financial audit, and we found lots of instances there where, with 
the errors in the system, there were people who were being pursued 
and having taxes collected from them but didn't owe any taxes. 
At the same time we were issuing lots of refunds to people who 
weren't due refunds.
    So, again you've got lots of issues with data quality and I 
would say that the Federal Government is decades behind the 
private sector in that area. I got to go to Bentonville, AR 
within the last year to visit the Wal-Mart headquarters and it 
was quite fascinating to see the technology that they use in 
their inventory supply chain management, and when I compare 
that to where the Federal Government is with its inventory 
management again it's just decades behind. And they were able 
to tell us at Wal-Mart headquarters how many tubes of 
toothpaste there were at the Fairfax Wal-Mart here in 1 minute. 
And not only that, but how many they had actually stocked in 
the last week, how many had been bought in the last week, just 
tremendous technology, whereas again in the Federal Government 
I'll go back to the JSLIST, the chem-bio suits used by our 
troops. Once those left the defense warehouses into the 
military services, complete visibility was lost and we were 
unable to determine where these chem-bio suits were, some from 
prior years that had been defective through a fraud scheme by a 
private sector company.
    Mr. Clay. You do make recommendations to the different 
agencies how to correct the errors that you all find?
    Mr. Kutz. Right. That's the value of data mining. It helps 
us to make valuable recommendations to Federal agencies to 
improve their control systems, etc., to try to minimize the 
risk of these things happening that I've just described.
    Mr. Clay. What was your recommendation to the Social 
Security Administration?
    Mr. Kutz. We didn't make any recommendations to them 
because the Inspector General had already made recommendations 
to them, and they are working to clean up that data base.
    Mr. Clay. I see. Thank you very much.
    Mr. Forman, would you support legislation that prohibited 
the TSA from using any system that used profiles based on race, 
religion, national origin, gender, sexual orientation or 
proxies for those characteristics?
    Mr. Forman. I forever remember my time on the Hill and a 
good staffer on detail from GAO who has been a staffer to this 
committee before, the devil's in the details. I'd have to see 
the specifics.
    Mr. Clay. See the specifics. OK. Thank you very much. And 
thank you, Mr. Chairman.
    Mr. Putnam. Thank you, Mr. Clay. And Mr. Kutz, when Mr. 
Forman gets done with the Federal Government, Bentonville, AR 
is going to be sending executives up here to tour the Federal 
Government to see how efficient we are. Isn't that right?
    Mr. Forman. Absolutely.
    Mr. Putnam. I want to thank the witnesses for their 
outstanding testimony and for the questions of the 
subcommittee. We will be focusing very, very directly on this 
topic throughout the 108th Congress. Our next hearing on the 
topic is May 6th to look at some of the specific issues that 
have been raised. But this is very clearly on my radar screen 
and something that we will continue to monitor very closely. It 
is an important issue. It holds the promise of tremendous 
potential benefits to our taxpayers in eliminating waste, fraud 
and abuse and bringing better financial management practice, 
and frankly it raises some red flags in terms of protecting 
those very same taxpayers' privacy and personal information. So 
we will do what we can to determine where that fine line is and 
attempt to walk it.
    So I understand Mr. Rosen has to be out to teach his class, 
but do any of you have one last question that you wish we had 
asked you that you want to answer?
    Senator Dockery.
    Ms. Dockery. It's not a question. But, Mr. Chairman, if I 
could just take this minute since I don't have the opportunity 
to speak to a congressional committee every day, I want to 
thank you on behalf of the States for what you do in Congress, 
to send money down to the States to allow us to do the job of 
protecting the residents in our State against any threat to our 
homeland security, and I would ask that in the future when 
moneys are coming down from the Federal Government, the more 
flexibility you could give us in spending those moneys and if 
you could have those moneys go through the State rather than 
directly to the local governments so that we can have a better 
feel for what's coming down and avoid duplication of effort. 
But thank you for all that you do for us and thank you for 
letting me participate today.
    Mr. Putnam. Thank you, Senator.
    Dr. Louie.
    Dr. Louie. Yeah. This is on-line data collection. The point 
about individual data elements are not necessarily very 
important in themselves, but you should also look at how this 
data is used as if it were classified material. Individual 
elements in themselves are not necessarily important. It's the 
combination of multiple elements that make it an interesting 
issue as far as questionable invasion of privacy or whether it 
raises flags about how that data is being used in the case of 
are we really profiling or are we looking at a risk assessment. 
Should we look at race and national origin? Probably yes. In 
themselves they are not necessarily the most important item, 
but in combination with other data elements they may raise a 
level of risk, and it needs to be considered in that manner. It 
needs to be viewed not as an individual component, but the sum 
of all the components looked at in terms of evaluating whether 
this information is something that warrants looking into or not 
looking into.
    So does it make it actionable? That's the way you need to 
look at the collection of data, not the individual elements 
necessarily.
    Thank you for the opportunity.
    Mr. Putnam. My pleasure. Thank you. Anyone else?
    Mr. Kutz. Yeah, I would just say I appreciate you inviting 
us to the hearing today. Since we work for Congress, we 
certainly believe data mining is a tool that's going to be able 
to help us better serve you and to do better audits and 
investigations on your behalf. So I appreciate that.
    Mr. Putnam. Thank you. Mr. Rosen. Mr. Forman. We appreciate 
your efforts. I'm reminded that in the event there are 
additional questions the record will remain open for 2 weeks 
for submitted answers. And with that, the meeting is adjourned.
    [Whereupon, at 11:30 a.m., the subcommittee was adjourned.]
    [Additional information submitted for the hearing record 
follows:]

[GRAPHIC] [TIFF OMITTED] T7229.048

[GRAPHIC] [TIFF OMITTED] T7229.049

[GRAPHIC] [TIFF OMITTED] T7229.050

[GRAPHIC] [TIFF OMITTED] T7229.051

[GRAPHIC] [TIFF OMITTED] T7229.052

[GRAPHIC] [TIFF OMITTED] T7229.053

[GRAPHIC] [TIFF OMITTED] T7229.054

[GRAPHIC] [TIFF OMITTED] T7229.055

[GRAPHIC] [TIFF OMITTED] T7229.056

[GRAPHIC] [TIFF OMITTED] T7229.057

[GRAPHIC] [TIFF OMITTED] T7229.058

[GRAPHIC] [TIFF OMITTED] T7229.059

[GRAPHIC] [TIFF OMITTED] T7229.060

[GRAPHIC] [TIFF OMITTED] T7229.061

