[House Hearing, 108 Congress]
[From the U.S. Government Publishing Office]
DATA MINING: CURRENT APPLICATIONS AND FUTURE POSSIBILITIES
=======================================================================
HEARING
before the
SUBCOMMITTEE ON TECHNOLOGY, INFORMATION
POLICY, INTERGOVERNMENTAL RELATIONS AND
THE CENSUS
of the
COMMITTEE ON
GOVERNMENT REFORM
HOUSE OF REPRESENTATIVES
ONE HUNDRED EIGHTH CONGRESS
FIRST SESSION
__________
MARCH 25, 2003
__________
Serial No. 108-11
__________
Printed for the use of the Committee on Government Reform
Available via the World Wide Web: http://www.gpo.gov/congress/house
http://www.house.gov/reform
______
87-229 U.S. GOVERNMENT PRINTING OFFICE
WASHINGTON : 2003
____________________________________________________________________________
For Sale by the Superintendent of Documents, U.S. Government Printing Office
Internet: bookstore.gpo.gov Phone: toll free (866) 512-1800; (202) 512-1800
Fax: (202) 512-2250 Mail: Stop SSOP, Washington, DC 20402-0001
COMMITTEE ON GOVERNMENT REFORM
TOM DAVIS, Virginia, Chairman
DAN BURTON, Indiana HENRY A. WAXMAN, California
CHRISTOPHER SHAYS, Connecticut TOM LANTOS, California
ILEANA ROS-LEHTINEN, Florida MAJOR R. OWENS, New York
JOHN M. McHUGH, New York EDOLPHUS TOWNS, New York
JOHN L. MICA, Florida PAUL E. KANJORSKI, Pennsylvania
MARK E. SOUDER, Indiana CAROLYN B. MALONEY, New York
STEVEN C. LaTOURETTE, Ohio ELIJAH E. CUMMINGS, Maryland
DOUG OSE, California DENNIS J. KUCINICH, Ohio
RON LEWIS, Kentucky DANNY K. DAVIS, Illinois
JO ANN DAVIS, Virginia JOHN F. TIERNEY, Massachusetts
TODD RUSSELL PLATTS, Pennsylvania WM. LACY CLAY, Missouri
CHRIS CANNON, Utah DIANE E. WATSON, California
ADAM H. PUTNAM, Florida STEPHEN F. LYNCH, Massachusetts
EDWARD L. SCHROCK, Virginia CHRIS VAN HOLLEN, Maryland
JOHN J. DUNCAN, Jr., Tennessee LINDA T. SANCHEZ, California
JOHN SULLIVAN, Oklahoma C.A. ``DUTCH'' RUPPERSBERGER, Maryland
NATHAN DEAL, Georgia ELEANOR HOLMES NORTON, District of Columbia
CANDICE S. MILLER, Michigan JIM COOPER, Tennessee
TIM MURPHY, Pennsylvania CHRIS BELL, Texas
MICHAEL R. TURNER, Ohio
JOHN R. CARTER, Texas
WILLIAM J. JANKLOW, South Dakota ------
MARSHA BLACKBURN, Tennessee BERNARD SANDERS, Vermont (Independent)
Peter Sirh, Staff Director
Melissa Wojciak, Deputy Staff Director
Randy Kaplan, Senior Counsel/Parliamentarian
Teresa Austin, Chief Clerk
Philip M. Schiliro, Minority Staff Director
Subcommittee on Technology, Information Policy, Intergovernmental
Relations and the Census
ADAM H. PUTNAM, Florida, Chairman
CANDICE S. MILLER, Michigan WM. LACY CLAY, Missouri
DOUG OSE, California DIANE E. WATSON, California
TIM MURPHY, Pennsylvania STEPHEN F. LYNCH, Massachusetts
MICHAEL R. TURNER, Ohio
Ex Officio
TOM DAVIS, Virginia HENRY A. WAXMAN, California
Bob Dix, Staff Director
Chip Walker, Professional Staff Member
Lori Martin, Professional Staff Member
Ursula Wojciechowski, Clerk
David McMillen, Minority Professional Staff Member
C O N T E N T S
----------
Hearing held on March 25, 2003
Statement of:
Dockery, State Senator Paula, majority whip, Florida State Senate
Forman, Mark A., Associate Director, Information Technology and Electronic Government, Office of Management and Budget
Kutz, Gregory, Director, Financial Management and Assurance, U.S. General Accounting Office
Louie, Jen Que, president, Nautilus Systems, Inc.
Rosen, Jeffrey, George Washington University Law School, legal affairs editor of the New Republic
Letters, statements, etc., submitted for the record by:
Clay, Hon. Wm. Lacy, a Representative in Congress from the State of Missouri, prepared statement of
Dockery, State Senator Paula, majority whip, Florida State Senate, prepared statement of
Forman, Mark A., Associate Director, Information Technology and Electronic Government, Office of Management and Budget, prepared statement of
Kutz, Gregory, Director, Financial Management and Assurance, U.S. General Accounting Office, prepared statement of
Louie, Jen Que, president, Nautilus Systems, Inc., prepared statement of
Putnam, Hon. Adam H., a Representative in Congress from the State of Florida, prepared statement of
Rosen, Jeffrey, George Washington University Law School, legal affairs editor of the New Republic, prepared statement of
DATA MINING: CURRENT APPLICATIONS AND FUTURE POSSIBILITIES
----------
TUESDAY, MARCH 25, 2003
House of Representatives,
Subcommittee on Technology, Information Policy,
Intergovernmental Relations and the Census,
Committee on Government Reform,
Washington, DC.
The subcommittee met, pursuant to notice, at 9:30 a.m., in
room 2154, Rayburn House Office Building, Hon. Adam Putnam
(chairman of the subcommittee) presiding.
Present: Representatives Putnam, Miller, Turner, and Clay.
Staff present: Bob Dix, staff director; John Hambel, senior
counsel; Chip Walker and Lori Martin, professional staff
members; Ursula Wojciechowski, clerk; David McMillen, minority
professional staff member; Jean Gosa, minority clerk; and
Earley Green, minority chief clerk.
Mr. Putnam. A quorum being present, the Subcommittee on
Technology, Information Policy, Intergovernmental Relations and
the Census will come to order.
Good morning and welcome to the first in a planned series
of hearings addressing the important subject of data mining
technology or ``factual data analysis,'' as some might refer to
it.
Before we get into my opening statement, considering the
events of the world today and the enormous pressures that this
Congress and our President are under, I would ask that we pause
for a moment of silence.
[Moment of silence.]
Mr. Putnam. Thank you.
There are a number of proven uses for this data mining
technology, which has played a prominent role in many arenas,
public and private, for years. This morning we will work to
define the technology itself and examine the parameters of its
application. It is no secret that some have expressed
concerns about the role of data mining, particularly in the
context of privacy intrusions. We will attempt to explore the
manner in which this technology will continue to be a valuable
tool in a variety of governmental uses, not just those of
national security, while also acknowledging the public interest
in protecting the privacy of personal information. Data mining
is a technology that facilitates the ability to sort through
large amounts of information through data base exploration,
extract specific information in accordance with defined
criteria, and identify patterns of interest to its user.
As I understand the technology, the user has the ability to
tailor a data mining program to a particular purpose by
selecting a number of different data bases to search and
setting the criteria for that search. Data mining technology
has been utilized successfully for many years in both public
and private sectors to identify and analyze data that might
otherwise be overlooked or inaccessible. Examples of the
variety of commercial or governmental uses associated with data
mining software would include businesses being able to develop
a targeted marketing campaign in an effort to identify
prospective customers; government agencies expanding
opportunities to track down tax evaders; detection of Medicaid
or Medicare fraud; and corporations using this tool to estimate
spending and revenue more accurately, just to name a few.
For example, a mortgage refinancing lender may seek to
determine potential candidates for their services by attempting
to identify mortgage holders who have lived in their homes for
a certain period of time in a particular geographic location
and whose property's market value falls within a certain range, in
order to target a special refinancing rate offer. As you can
imagine, this type of technology is invaluable to a number of
institutions. Because it is such a vast and evolving field, the
subcommittee is very interested in exploring the uses and
effects of this technology in subsequent followup hearings to
address more particular applications.
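The mortgage refinancing example describes data mining as applying user-selected criteria to a data base. As an illustration only, the sketch below applies such criteria to a small in-memory list; the record layout, field names, ZIP codes, and thresholds are hypothetical and do not come from the testimony.

```python
from dataclasses import dataclass


@dataclass
class MortgageHolder:
    # Hypothetical record layout; field names are illustrative only.
    name: str
    years_in_home: int
    zip_code: str
    market_value: int  # estimated property value in dollars


def refinance_candidates(records, min_years, zip_codes, value_range):
    """Return holders that satisfy every user-defined criterion."""
    low, high = value_range
    return [
        r for r in records
        if r.years_in_home >= min_years          # lived there long enough
        and r.zip_code in zip_codes              # in the target area
        and low <= r.market_value <= high        # value within the range
    ]


# Hypothetical sample data.
records = [
    MortgageHolder("A. Smith", 12, "33801", 180_000),
    MortgageHolder("B. Jones", 3, "33801", 210_000),
    MortgageHolder("C. Lee", 9, "32301", 150_000),
    MortgageHolder("D. Diaz", 15, "33803", 450_000),
]

matches = refinance_candidates(records, min_years=7,
                               zip_codes={"33801", "33803"},
                               value_range=(100_000, 300_000))
print([m.name for m in matches])  # ['A. Smith']
```

Changing the data bases searched or the criteria passed in retargets the same query, which is the tailoring the chairman describes.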
While data mining may have many legitimate and worthwhile
uses, we must always be vigilant against any potential encroachment
on the privacy of the American public. We have great
responsibilities as elected officials. We must protect the
American ideals of life, liberty, and freedom. At times these
ideals would seem to come into conflict with one another, and
it's our job to ensure that we do all we can to protect the
public while maintaining the faith entrusted to us by the
Founding Fathers to protect the right of the people to privacy
and freedom. Ben Franklin once said, ``Those who would give up
freedom for security deserve neither.''
I would like to welcome the following witnesses who are
offering their expert testimony before us today: The Honorable
Paula Dockery, Florida State Senator; Dr. Jen Que Louie,
president of Nautilus Systems, Inc.; Mark Forman, Associate
Director of Information Technology and Electronic Government,
Office of Management and Budget, our Nation's CIO; Gregory
Kutz, Director of Financial Management and Assurance, General
Accounting Office; and Jeffrey Rosen, associate professor at
the George Washington University Law School, legal affairs
editor of the New Republic. Mr. Armey was unable to be with us
today.
Interest in expanding the use of this technology at the
Federal level of government has become more widespread as we
look to use modern technology to improve intergovernmental
communications and national security. From our oversight
perspective as the subcommittee, we have a special interest in
learning the pros and cons of data mining technology as well as
how its use could be or is being expanded at the Federal level.
We appreciate the participation of today's witnesses as
they provide tremendous information to the subcommittee on this
important topic, and we thank you again for taking the time out
of your busy schedules. Today's hearing can be viewed live via
WebCast by going to reform.house.gov and clicking on the link
under ``Live Committee Broadcast.''
As we await the ranking member from Missouri, I want to
recognize our vice chair, Candice Miller from Michigan, for her
opening statement. Gentlelady from Michigan.
[The prepared statement of Hon. Adam H. Putnam follows:]
[GRAPHIC] [TIFF OMITTED] T7229.001
[GRAPHIC] [TIFF OMITTED] T7229.002
Mrs. Miller. Thank you, Mr. Chairman.
I want to thank the witnesses for coming today, and Mr.
Forman, good to see you again. I'm sure this committee will
certainly be seeing a lot of you.
As I mentioned at the last committee hearing, I am
particularly interested in these subjects, and data mining
is a fascinating one. I had been the Secretary of State in
Michigan where not only did I have the elections there with all
the registered voters, I also did the motor vehicle
administrative kinds of things. We had a big data base in our
State with everybody who had a boat, a snowmobile, and a
trailer and a car and a truck and everything, and there was
always a lot of consternation about what government was doing
with this information; who had the information; for what
purposes. If you wanted to get licensed in Michigan, you had to
give me certain amounts of information. But what was government
doing with it and what was the citizens' expectation of what we
would do with all of that data?
There was a time when our State--and I know many States
still do this--sold the information. It is a huge revenue
source, of course. But I don't think citizens normally
expect that the government will be selling their personal
and private information. And so there is consternation about
who can access the information, how will it be massaged, how
will it be utilized, and certainly on the part of the citizens,
invasion of personal privacy by ``Big Brother,'' by government.
As we march down the information highway, there is
sometimes a slippery slope, and I think all of us in government
at the Federal level, the State level, the county level, anyone
who has interaction with these various data, must always keep
invasion of personal privacy uppermost in our minds.
With that being said, the technology is certainly out there
and it can be utilized to make huge advances in society, and
there are so many things in every layer of government that
could be done so much better if we were able to use the
technology properly. So I am very pleased to see you all today.
Thank you for coming. I certainly look forward to hearing your
testimony this morning. Thank you.
Mr. Putnam. I thank the gentlelady. She brings tremendous
experience from her days as Secretary of State and work in
bringing that office into the Information Age.
We are joined by a former mayor, the gentleman from Ohio,
Mr. Turner. For your opening statement you are recognized.
Mr. Turner. Thank you, Mr. Chairman. I am particularly
interested in this area. NCR, which is located in Dayton, OH, is
a leading technology company in data mining for
the private sector. Recently they hosted a forum on the
issue of data mining applications, taking them from the private
sector and applying them to government issues. And it was an
interesting discussion because they began by telling us that
Wal-Mart, at the end of the day, can tell us how many socks
they have sold; but we are not necessarily able to tell
ourselves, in reference to foreign visitors, how many visas
have expired today and who they are.
So the possible applications of data mining to very simple
tasks that clearly do not raise issues of privacy are a
wide-open field that we need to pursue vigorously.
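Mr. Turner's visa question is the kind of simple, criteria-driven query he has in mind. A minimal sketch, assuming a hypothetical list of visa records that carry only an identifier and an expiration date (no such data set is described in the hearing):

```python
from datetime import date

# Hypothetical visa records: (record_id, visa_expiration_date).
visa_records = [
    ("V-001", date(2003, 3, 25)),
    ("V-002", date(2003, 6, 1)),
    ("V-003", date(2003, 3, 25)),
    ("V-004", date(2002, 12, 31)),
]


def expired_on(records, day):
    """IDs of visas whose expiration date falls exactly on `day`."""
    return [rid for rid, expires in records if expires == day]


def expired_by(records, day):
    """IDs of visas that expired on or before `day`."""
    return [rid for rid, expires in records if expires <= day]


today = date(2003, 3, 25)  # the hearing date, used as a fixed example
print(len(expired_on(visa_records, today)), expired_on(visa_records, today))
# 2 ['V-001', 'V-003']
```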
What was also fascinating to me in their discussion was how
they approach the process of data mining: not looking first at
what data you have, but at what questions you want answered.
The technology is there, and its application has been
demonstrated in the private sector. The issue before us in
government is to begin the process of asking what questions we
need answered and then turning to the experts in data mining
who have applied it in the private sector to assist us so we can
have those answers in the public sector.
Thank you.
Mr. Putnam. I thank the gentleman.
We will now take the testimony from the witnesses. Each has
very graciously prepared written testimony, which will be
included in the record of this hearing. And I have asked each
of you to summarize your presentation into 5 minutes, if you
could, to leave ample time for questions and answers. Witnesses
will notice that there is a timer with a light on the witness
table. Green light means you begin your remarks, the yellow
light means it's time to wrap up, and the red light means that
we hit the ejection seat.
In order to be sensitive to everyone's time schedule, we
ask that you cooperate with us in our time schedule. As is the
policy of the Committee on Government Reform, all witnesses
will be sworn in. So I'll ask you to rise, please, and raise
your right hands.
[Witnesses sworn.]
Mr. Putnam. All witnesses responded in the affirmative.
Thank you.
I would like to introduce our witnesses first and then call
on them for their testimony, followed by questions. We begin
our panel with an old colleague of mine and a very dear friend
from Florida, State Senator Paula Dockery. Florida is one of
the States where data mining techniques have been used in
several areas, and quite successfully. Senator Dockery's
experience will lend a very helpful perspective to us today.
She serves as majority whip in the Senate as well as chairman
of the Committee on Homeland Security and Seaports. Senator
Dockery, welcome to the committee and we look forward to your
testimony, please.
STATEMENT OF STATE SENATOR PAULA DOCKERY, MAJORITY WHIP,
FLORIDA STATE SENATE
Ms. Dockery. Thank you, Mr. Chairman, and good morning, Mr.
Chairman and members of the committee. Thank you very much for
the opportunity to be here today not only to share with you
what we think we are doing right in the State of Florida, but
also to be part of this distinguished panel and to learn from
the experts to my left. I apologize in advance. I'm going to be
reading so I can make my time limit, and I'm going to probably
have to read pretty fast because I timed it at 7 minutes. But I
would like to get started with that.
The issue of enhanced information sharing by our law
enforcement and public safety professionals is at the forefront
in our war against terrorism and in our efforts to keep America
safe. Florida, I believe, has taken a strong leadership role in
this effort, one that can serve as a model for other States.
This model and its reliance on data mining is the focus of our
discussion today.
Florida uses the term ``factual data analysis'' to describe
this information processing system. This process includes the
collection of information from multiple sources. Once this
information is processed, analyzed, and evaluated, the
resulting product represents the intelligence needed to assist
law enforcement. Intelligence can then be used in a
proactive and preventive approach to detect criminal patterns,
crime trends, modus operandi, financial criminal activity and
criminal organizations.
Data collection is much different today than in years past.
The number of data bases and the information contained therein is
immense, as is the ability to effectively and efficiently
analyze available data in a timely manner. The results can be
overwhelming. Factual data analysis plays a crucial role in
filtering the vast quantity of information by separating the
significant data from the insignificant data. Some individuals
and groups voice concern about a perceived loss of privacy and a
perceived attempt to foster the examination of private
information.
Florida's law enforcement efforts are aimed at utilizing
only that specific data which law enforcement already has a
legal right to use, while doing so in a proficient,
professional, and expeditious manner. Many safeguards have been
implemented to ensure appropriate use of information. These
include user name and password protection, user training,
agency user agreements, system audits, quality control reviews
and established purge criteria.
Florida's criminal intelligence systems are operated in
compliance with standards established by 28 Code of Federal
Regulations, Part 23. This regulation was written to protect
the privacy rights of individuals and to encourage and expedite
the exchange of criminal intelligence information between and
among law enforcement agencies. The regulation provides
operational guidance for law enforcement agencies in five
primary areas.
Prior to the September 11th attacks, Florida utilized
factual data analysis on criminal investigations through the
Financial Crime Analysis Center at the Florida Department of
Law Enforcement. The Center integrates and analyzes financial
data in partnership with local and Federal criminal justice
agencies to identify and combat financial crimes.
The Center has developed a ``data warehouse'' which
contains information from various sources already available to
law enforcement. As part of the analytical process, the Center
utilizes specialized software to identify anomalies associated
with financial transactions. Analytical personnel and
investigators then examine the results to determine if the
information is related to a crime. The software currently used
by law enforcement agencies provides a graphical representation
of suspicious activity identified by financial services
companies. This method ensures that the user does not see
individual records, only the result, a safeguard that we
believe is very important.
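The testimony does not say which technique the Center's software uses to identify anomalies. Purely as an illustration, the sketch below substitutes one common statistical approach, a modified z-score based on the median absolute deviation, applied to a hypothetical list of transaction amounts; the 3.5 cutoff is a conventional choice, not one cited in the hearing. In keeping with the safeguard described above, the example reports only a summary count rather than the records themselves.

```python
from statistics import median


def flag_anomalies(amounts, threshold=3.5):
    """Return indices of amounts whose modified z-score exceeds threshold.

    Uses the median absolute deviation (MAD), which is less distorted by
    the very outliers it is trying to find than a mean/std-dev z-score.
    """
    med = median(amounts)
    mad = median(abs(a - med) for a in amounts)
    if mad == 0:
        return []  # no spread to measure deviations against
    return [i for i, a in enumerate(amounts)
            if 0.6745 * abs(a - med) / mad > threshold]


# Hypothetical transaction amounts in dollars.
amounts = [100, 105, 95, 110, 90, 102, 98, 5000]
flagged = flag_anomalies(amounts)
print(f"{len(flagged)} of {len(amounts)} transactions flagged")
# 1 of 8 transactions flagged
```

A flag here is only a starting point; as the Senator notes, analysts and investigators still decide whether anything is related to a crime.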
The pattern of behavior is a key element of the decision
process of whether to investigate further. Users of this system
are trained to identify behaviors of known criminal activity
during all stages of money laundering. It is important to note
that by FDLE guidelines, reasonable suspicion is necessary
before initiating an investigation.
When reasonable suspicion is developed, analyzed data are
supplied to local, State, and Federal law enforcement agencies as
well as to other States for possible investigation. This
proactive approach results in increased teamwork among law
enforcement entities as well as a force multiplier effect for
the investigative process. FDLE agents regularly travel to
other States to investigate common targets.
Arizona and Florida are known as the two most effective
States in conducting these types of proactive investigations.
After the September 11th attacks, FDLE integrated this
process and applied it toward the fight against terrorism. FDLE
employed the assistance of public corporations that have access
to civil data records. In certain domestic security related
situations, FDLE has contracted with nationally recognized
public search businesses to analyze the records based on
criteria supplied by law enforcement. After the data is
processed, the results are provided to law enforcement for
further review. To ensure that the results are as indicative as
possible, a mathematical analysis is used and includes as many
as 14 criteria, producing a probability score for criminal
behavior. Prior to additional investigation or dissemination,
intelligence analysts and investigators examine only the
results with the highest scores. This information can be used
to identify, locate, target and monitor terrorists and other
criminals. This ability is essential if future terrorist events
are to be prevented.
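Senator Dockery describes a score built from as many as 14 criteria, with analysts reviewing only the highest-scoring results, but does not describe the formula. The sketch below assumes a logistic (weighted-sum) model over four binary criteria with invented weights; it shows only the score-then-rank step and is not the FDLE or Seisint method.

```python
import math

# Hypothetical weights for four binary criteria. The testimony mentions up
# to 14 criteria but not what they are or how they are combined; the
# logistic form here is an assumption made for illustration.
WEIGHTS = [0.8, 1.2, -0.5, 0.3]
BIAS = -2.0


def probability_score(flags):
    """Map a list of 0/1 criterion flags to a score between 0 and 1."""
    if len(flags) != len(WEIGHTS):
        raise ValueError("expected one flag per criterion")
    z = BIAS + sum(w * f for w, f in zip(WEIGHTS, flags))
    return 1.0 / (1.0 + math.exp(-z))


def top_k(records, k):
    """Return the IDs of the k highest-scoring records for analyst review."""
    scored = [(probability_score(flags), rid) for rid, flags in records.items()]
    scored.sort(reverse=True)
    return [rid for _, rid in scored[:k]]


# Hypothetical records: ID -> criterion flags.
records = {
    "r1": [1, 1, 0, 1],  # z = 0.3
    "r2": [0, 0, 1, 0],  # z = -2.5
    "r3": [1, 0, 0, 0],  # z = -1.2
}
print(top_k(records, 2))  # ['r1', 'r3']
```

Restricting review to the top of the ranking mirrors the practice described in the testimony of examining only the highest scores before any further investigation.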
Florida has partnered with a vendor, Seisint Technologies,
to provide the data analysis tools using both public and
private data. Over several years, Seisint Technologies has
acquired technology and data from multiple sources useful to law
enforcement. Following the terrorist attacks of September 11th,
Seisint focused on helping local, State, and Federal law
enforcement agencies locate and track individuals who might be
a threat to the United States. As a result of their partnership
with Florida law enforcement, a customized investigative tool
was developed. This system has already proven useful in that a
review of the known information, intelligence, and reported
activities of the 19 hijackers associated with the terrorist
events of September 11th identified several common and
associated variables. This system has proven useful in Florida,
but the need for timely sharing and exchange of information
nationwide remains a critical need.
Mr. Putnam. Thank you Senator Dockery.
[The prepared statement of Ms. Dockery follows:]
[GRAPHIC] [TIFF OMITTED] T7229.003
[GRAPHIC] [TIFF OMITTED] T7229.004
[GRAPHIC] [TIFF OMITTED] T7229.005
[GRAPHIC] [TIFF OMITTED] T7229.006
[GRAPHIC] [TIFF OMITTED] T7229.007
Mr. Putnam. I would like to introduce our next witness, Dr.
Jen Que Louie. He has spent over 25 years working with data
analysis systems, specifically with large data base systems,
data warehousing and data mining. Some of his projects include
designing, developing, and refining military logistics and C3I
capability models for the Department of Defense. He has
designed and implemented medical system diagnostic and analysis
programs, knowledge- and rules-based business systems, work
flow process and analysis systems, image management storage and
retrieval systems, and emergency management information
systems. Dr. Louie is president of Nautilus Systems, which is
located in Fairfax, VA. We look forward to your testimony.
Welcome to the subcommittee.
STATEMENT OF JEN QUE LOUIE, PRESIDENT, NAUTILUS SYSTEMS, INC.
Dr. Louie. Good morning, Mr. Chairman and distinguished
members of the subcommittee. Thank you for the opportunity to
testify today on data mining current applications and future
possibilities. In addition to my prepared statement, this is a
quick summary of data mining in general.
It is difficult to come up with a universal definition for
data mining. One consistent focus of data mining has been
basically that it is an analytic process with an ultimate goal
of prediction. You are looking to find something that is going
to be actionable, that is going to get you somewhere. In a
nutshell, data mining is an extraction of knowledge or
information from data. And at first glance, this may not seem
like a very powerful utility, but unlike mere data, knowledge
leads to incisive decisions and previously unknown
relationships that could have a bearing on your decision
process.
Data mining, unfortunately, like artificial intelligence of
the early eighties, is getting a lot of media hype and what we
will call slightly exaggerated claims about its benefits and
feasibility. And what I usually tell my clients is that the first
fallacy concerns data mining tools. Data mining is a process. It
is not a specific tool, and the process will generally raise more
questions than it answers. And while data mining does have the
ability to uncover patterns that can be remarkable, it still
requires a human with skills, analytical skills, to interpret
the meaning of what patterns you are looking at.
And my usual example is a Dilbert cartoon where the
marketing person is telling the CEO, ``Our product is always
seen with people who have flu-like symptoms.'' And the product
development team says the reason they have flu-like symptoms
is that they are taking the product. So how you interpret
the data, how you apply it is an important part of how you
apply data mining.
Data mining is sometimes advertised and portrayed as being
an autonomous process; that once you have these rules that you
don't require analysts, and that is another fallacy. Another
fallacy is that it will pay for itself very rapidly. While
there are sometimes articles, we will call them, portraying very
high returns on the investment in data mining, those are not
very common. And yes, you can achieve a lot of return on your
investment with data mining. Credit card fraud is one. Tax
evasion is another. Money laundering. There are several tools
that are out in the market that require a lot of extensive
capabilities. Our company has worked with FinCEN on clearing a
lot of their caseloads. Those, I would say, are great paybacks
for the amount of money invested in those areas.
Data mining also sometimes raises the question about
missing data. Sometimes the data that's missing is more
interesting than the data that is there, and that provides some
other insights. To meet your data mining expectations, planning
is the single most important step in any data mining effort.
You have to know and understand what the consumers of your
information product need and basically deliver it. Once you
determine what that is, the next thing in your investment in
your data mining effort is the environment that you run it in.
It should be what we call the best you can get, the fastest you
can get, the most storage you can get, and always allow
yourself plenty of time to review and analyze the data and look
at all the facets that are there in order to determine that you
are delivering the right message, and it is actionable in the
direction that the user needs that information to be.
So, my quick summation: Data analysis is concerned with the
discovery and examination of patterns and associations found
within data. There are various ways to achieve this objective,
but all share the same fundamental notion that the patterns
examined are present in the data. Also remember that what is
not in the data can be just as interesting in certain situations,
and more useful to know.
Data mining is a process that involves multiple analytical
tools and methodologies, driven by the needs of the information
product's consumer. The quality of information is directly
proportional to the trustworthiness and quality of that data.
The confidence of the prediction is dependent upon the data
mining practitioner's subject matter expertise and insight to
deliver actionable results. The data mining process is highly
computational and takes time; therefore, planning the approach and
selection of tools is influenced by the needs of the consumer.
Thank you.
Mr. Putnam. Thank you very much, Dr. Louie.
[The prepared statement of Dr. Louie follows:]
[GRAPHIC] [TIFF OMITTED] T7229.008
[GRAPHIC] [TIFF OMITTED] T7229.009
[GRAPHIC] [TIFF OMITTED] T7229.010
[GRAPHIC] [TIFF OMITTED] T7229.011
[GRAPHIC] [TIFF OMITTED] T7229.012
[GRAPHIC] [TIFF OMITTED] T7229.013
Mr. Putnam. Our next witness is Mark Forman. He serves as
Associate Director for Information Technology and E-Government
for the Office of Management and Budget, a position he has held
since June 2001. He is effectively in charge of information
technology oversight for the entire Federal Government. He
has a background in the private sector at Unisys and
IBM as well as work at the Senate Governmental Affairs
Committee staff. He is an invaluable resource on all of our IT
issues, and we believe his insight from the Federal perspective
will be enlightening to us as well. So with that, Mr. Forman,
you are recognized.
STATEMENT OF MARK A. FORMAN, ASSOCIATE DIRECTOR, INFORMATION
TECHNOLOGY AND ELECTRONIC GOVERNMENT, OFFICE OF MANAGEMENT AND
BUDGET
Mr. Forman. Thank you, Mr. Chairman, and members of the
subcommittee. Thank you for the opportunity to appear and to
discuss the administration's views on data mining. And I also
want to thank you for taking a very rational, well-balanced
approach in exploring data mining issues and opportunities.
While there are many definitions of data mining, the
committee's definition is generally accepted and we believe
helpful in defining the issues and its challenges.
I would like to start by talking about private sector uses,
how we are using data mining in the Federal Government, and then
the challenges and opportunities. The private sector uses data
mining to make sense of a wide breadth of data. One example
is customer relationship management. Applied to customer
relationship management, data mining is used to analyze
disparate customer data and provide insights into customer
needs and wants. Companies that use data mining shorten
response time to market changes, which allows for better
alignment of their products with the customer needs. They do
this to increase revenue performance and allocate investment to
products that meet customer demand effectively.
Fraud detection. Companies use software that provides
comprehensive transaction-level financial reporting and
analysis to support automatic fraud detection and proactive
alerting.
Retail analysis and supply chain analysis. Companies such
as Wal-Mart are broadly recognized for analyzing sales trends.
Retail analysis and supply chain analysis can be used to
predict the effectiveness of promotions, decide which products
to stock in each store, and help managers understand cost and
revenue trends in order to adjust pricing and promotion in
anticipation of changes in marketplace conditions.
Medical analysis and diagnostics. The health care industry
uses analysis to predict the effectiveness of surgical
procedures, medical tests and medications. High-risk segments
of the population can be identified and targeted for proactive
treatment. The result is improved quality of life for patients
and reduced stress on hospitals and insurance providers through
such activities as proactive approaches to healing. I have many
more examples of the commercial use of data mining, and I think it
is fair to say that all of them deal with how fast we can
understand what customers need. The Federal Government
would be well served by being able to respond more quickly to
what our citizens need.
So I will turn now to the government applications of data
mining and go through some of the examples and their effects,
both on the way we deal with citizens and on how we manage the
government.
The Federal Government analyzes data that has been
collected from the public for several purposes, including
determining the eligibility of applicants for Federal benefits,
detecting potential instances of fraud, waste, and abuse in
Federal programs and for law enforcement activities. Some of
this analysis is facilitated by data mining.
So let us talk through a few of the examples. First,
financial management. Poor management practices create
opportunities for a wide range of fraud and abuse in the use of
government travel and purchase cards. Several agency inspector
general investigations have used data mining-type tools to
document inappropriate purchases and misuse of cards. OMB is
taking and will continue to take substantive affirmative steps
to ensure agencies improve their internal control systems to
monitor expenditures appropriately.
Human resource management. One of the 24 E-Government
initiatives, Enterprise H.R. Integration, which is managed by
the Office of Personnel Management, is leading the effort to
provide a governmentwide data warehouse of H.R. information to
minimize the workload as employees move from one department to
another. A key component of this is the E-Clearance project. OPM
and its partner agencies on the E-Clearance project are using
data mining to access information more quickly, which speeds up
the overall security clearance investigation process.
Reducing erroneous payments and fraud detection. Data
analysis accomplished by the matching of electronic data bases
between government agencies has been an important and
successful tool for identifying improper payments under Federal
benefit and loan programs, as well as detecting potential
instances of fraud, waste, and abuse in the Federal programs.
As highlighted in the President's 2004 budget, agencies are now
required to report the extent of erroneous payments made in the
major benefit programs. Through the President's Management
Agenda Initiative for improving financial performance, we are
getting a handle on the problem of erroneous payments.
Furthermore, the administration has proposed several pieces of
legislation regarding the administration's authority to share
data that will greatly improve efforts to reduce erroneous
payments.
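A minimal sketch of the kind of cross-agency matching described here, assuming two hypothetical extracts: benefit payments keyed by a shared identifier, and a death-record file from another agency. The field names, identifier format, and the single rule used (flag any payment dated after a recorded death) are illustrative assumptions, not any agency's actual system; a match is only a lead for follow-up.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Payment:
    person_id: str  # hypothetical shared identifier used to link the two files
    paid_on: date
    amount: float


def flag_post_death_payments(payments, death_dates):
    """Return payments issued after the recorded date of death.

    death_dates maps person_id -> date of death, e.g. built from a
    hypothetical extract of another agency's death records.
    """
    flagged = []
    for p in payments:
        died = death_dates.get(p.person_id)
        # Only people present in the death file can match; others pass through.
        if died is not None and p.paid_on > died:
            flagged.append(p)
    return flagged


payments = [
    Payment("A1", date(2003, 1, 3), 500.0),
    Payment("A1", date(2003, 2, 3), 500.0),
    Payment("B2", date(2003, 2, 3), 700.0),
]
deaths = {"A1": date(2003, 1, 15)}
suspect = flag_post_death_payments(payments, deaths)
```

The January payment to "A1" predates the recorded death and is not flagged; only the February payment is.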
Policy analysis. The quality of policy decisions is a
function of our ability to correctly analyze enormous amounts
of data that describe the problems faced by modern society. For
example, the Department of Education mines data from a variety
of student financial aid systems, permitting professionals to
analyze Federal education programs quickly and easily without
the time, expense, and burden on citizens.
Law enforcement and homeland security. Federal agencies
have found data mining techniques to be an important tool for
assisting law enforcement in combating terrorism. For example,
the Department of Homeland Security's Bureau of Customs and
Border Protection operates the Automated Commercial Environment,
which uses a series of data mining tools to strengthen border
security efforts.
Benefits and pitfalls. While the ability of data mining to
access timely data and to identify relationships that were
previously unknown makes it a powerful tool for identifying errors,
fraud, threats, etc., the application of such techniques to
personal information raises serious questions about privacy and
how it should be protected. In my written statement I focused
on two areas. First, the data analysis must be consistent with
law. We monitor that with business cases. Second, the Federal
Information Security Management Act further requires protection
of the data under security processes and techniques. Mr.
Chairman, thank you.
Mr. Putnam. Thank you very much.
[The prepared statement of Mr. Forman follows:]
[GRAPHIC] [TIFF OMITTED] T7229.014
[GRAPHIC] [TIFF OMITTED] T7229.015
[GRAPHIC] [TIFF OMITTED] T7229.016
[GRAPHIC] [TIFF OMITTED] T7229.017
[GRAPHIC] [TIFF OMITTED] T7229.018
[GRAPHIC] [TIFF OMITTED] T7229.019
Mr. Putnam. For insight from a Federal agency that uses
data pattern analysis, we have Gregory Kutz, Director of
Financial Management and Assurance at the General Accounting
Office. As a Director in the Financial Management and Assurance
Team, Mr. Kutz is responsible for financial management issues
relating to the Department of Defense, NASA, the State
Department, and AID. He has also recently been involved in
preparing reports and testimony issued by GAO relating to
credit card fraud and abuse at DOD, financial and operational
management issues at the IRS, and the financial condition and
cost recovery practices of the Department of Energy's Power
Marketing Administration, the Tennessee Valley Authority, and
AMTRAK.
You have been very busy. We look forward to your testimony.
STATEMENT OF GREGORY KUTZ, DIRECTOR, FINANCIAL MANAGEMENT AND
ASSURANCE, U.S. GENERAL ACCOUNTING OFFICE
Mr. Kutz. Thank you, Mr. Chairman, and members of the
subcommittee. I'm here to talk about our use of data mining in
audits of Federal programs. To date we have used data mining
primarily as an integral part of our audits of credit card
programs.
My testimony has two parts: First, the use of data mining
in our audits and investigations; and second, future uses of
data mining and related challenges.
First, our strategy is to use data mining to put a face on
breakdowns in internal controls. It allows us to go
beyond simply saying that a program is vulnerable. For example,
data mining allowed us to report that government credit cards
were used for escort services, women's lingerie, prostitution,
gambling, cruises, and Los Angeles Lakers tickets.
Our data mining has helped us to identify specific
instances of fraud, waste, and abuse. The posterboard shows
several examples of government travel card abuse that we
identified through data mining, including the purchase of a
used car from Budget Rental Car; adult entertainment charges,
including gentlemen's clubs; Internet and casino gambling,
including an individual who charged $14,000 to pay for his
blackjack gambling habit; and reimbursed travel money that was
used to pay closing costs on a home purchase. For each of these
examples, we used various data mining inquiries to identify the
transactions and completed the case with auditor and
investigator followup.
The second posterboard is an excerpt from a government
purchase card statement. As you can see, somebody went on a
Christmas shopping spree. This bill, which includes nearly
$12,000 of fraudulent charges, was identified using data
mining. We identified these fraudulent transactions because of
the suspicious vendors and because of the timing of the
transactions. We used these findings in conjunction with
systematic internal control testing to make recommendations to
Federal agencies to develop effective systems and controls that
provide reasonable assurance that fraud, waste, and abuse are
minimized.
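The two signals Mr. Kutz names, suspicious vendors and transaction timing, can be expressed as simple screening rules. The sketch below assumes a hypothetical list of vendor types and holiday dates; GAO's actual queries are not described in this testimony, and every flag still needs the auditor and investigator follow-up he describes.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical rule inputs, chosen for illustration only.
SUSPICIOUS_VENDOR_TYPES = {"jewelry", "toy store", "department store"}
HOLIDAYS = {date(2002, 12, 25)}


@dataclass(frozen=True)
class CardTxn:
    txn_id: str
    vendor_type: str
    posted_on: date
    amount: float


def reasons_to_review(txn):
    """List the rules a transaction trips (an empty list means no flag)."""
    reasons = []
    if txn.vendor_type in SUSPICIOUS_VENDOR_TYPES:
        reasons.append("suspicious vendor")
    # date.weekday(): Monday is 0, so 5 and 6 are Saturday and Sunday.
    if txn.posted_on.weekday() >= 5 or txn.posted_on in HOLIDAYS:
        reasons.append("weekend/holiday timing")
    return reasons


def select_for_followup(txns):
    """Map each flagged transaction ID to the reasons it was flagged."""
    selected = {}
    for txn in txns:
        reasons = reasons_to_review(txn)
        if reasons:
            selected[txn.txn_id] = reasons
    return selected


txns = [
    CardTxn("T1", "office supply", date(2002, 12, 18), 45.0),  # a Wednesday
    CardTxn("T2", "jewelry", date(2002, 12, 21), 1200.0),      # a Saturday
    CardTxn("T3", "office supply", date(2002, 12, 25), 80.0),  # a holiday
]
flags = select_for_followup(txns)
```

Keeping the reasons alongside each flag lets the follow-up team see why a transaction was selected rather than receiving a bare list of IDs.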
An important element of our success with data mining is the
synergy of auditors and investigators working together. Our
auditors have expertise in financial systems, data
manipulation, and evaluating internal control systems. Our
investigators bring a much different perspective. For example,
Special Agent Ryan, who is with me today, has several decades
of experience working on financial crimes for the Secret
Service. Investigators and auditors work together to assess
system vulnerabilities and develop our data mining strategies.
Moving on to my second point, our data mining work for the
Congress is expanding. Currently, we have a number of audits
underway that use data mining, including nine that I am
directly responsible for. Some examples of our expanded data
mining audits include DOD vendor payments, Army military pay
systems, HUD housing programs and Department of Energy national
laboratories. As we move forward, challenges will include data
reliability and security issues.
For the credit card work to date, we have used commercial
bank data bases to do our data mining, which we found to be
highly reliable. However, as we move beyond the credit cards,
one major challenge is the poor quality of Federal Government
data bases. In most cases, data base quality issues can be
overcome, but they result in less productive data mining and a
greater cost to our work.
Data security and privacy protection is another challenge.
For example, in handling large data bases of credit card
transactions, we developed strict protocols to protect this
sensitive data. We were especially concerned with protecting
credit card account numbers and individuals' Social Security
numbers. Data security issues must be addressed before
embarking on audits involving data mining.
In summary, data mining is a powerful tool that has
increased our ability to effectively audit Federal programs. We
are just beginning to make full use of data mining strategies.
With the right mix of technology, human capital expertise, and
data security measures, we believe that data mining will
continue to improve our audit and investigative work for the
Congress. Mr. Chairman, that ends my statement.
Mr. Putnam. Thank you, Mr. Kutz. And I want to thank all the
witnesses for being so gracious and complying with our time
limitations.
[The prepared statement of Mr. Kutz follows:]
[GRAPHIC] [TIFF OMITTED] T7229.020
[GRAPHIC] [TIFF OMITTED] T7229.021
[GRAPHIC] [TIFF OMITTED] T7229.022
[GRAPHIC] [TIFF OMITTED] T7229.023
[GRAPHIC] [TIFF OMITTED] T7229.024
[GRAPHIC] [TIFF OMITTED] T7229.025
[GRAPHIC] [TIFF OMITTED] T7229.026
[GRAPHIC] [TIFF OMITTED] T7229.027
[GRAPHIC] [TIFF OMITTED] T7229.028
[GRAPHIC] [TIFF OMITTED] T7229.029
[GRAPHIC] [TIFF OMITTED] T7229.030
[GRAPHIC] [TIFF OMITTED] T7229.031
[GRAPHIC] [TIFF OMITTED] T7229.032
[GRAPHIC] [TIFF OMITTED] T7229.033
[GRAPHIC] [TIFF OMITTED] T7229.034
[GRAPHIC] [TIFF OMITTED] T7229.035
[GRAPHIC] [TIFF OMITTED] T7229.036
[GRAPHIC] [TIFF OMITTED] T7229.037
[GRAPHIC] [TIFF OMITTED] T7229.038
[GRAPHIC] [TIFF OMITTED] T7229.039
[GRAPHIC] [TIFF OMITTED] T7229.040
Mr. Putnam. Our final witness is Jeffrey Rosen, a law
professor at George Washington University Law School. Mr. Rosen's area of
expertise is in privacy and technology issues. He has written
dozens of articles on the subject as well as a book. His
testimony will be valuable as we look to the legal and ethical
questions surrounding the use of data mining technology.
Welcome.
STATEMENT OF JEFFREY ROSEN, GEORGE WASHINGTON UNIVERSITY LAW
SCHOOL, LEGAL AFFAIRS EDITOR OF THE NEW REPUBLIC
Mr. Rosen. Thank you, Mr. Chairman, and members of the
subcommittee. It is an honor to be here. I am delighted that
you are holding this hearing because the effort to strike a
balance between privacy and security is a bipartisan issue and
I am delighted that you are informing yourselves about the
complicated legal and technological choices that you face as
these technologies are implemented.
My thesis this morning is simple: It's possible through law
and technology to design data mining systems that strike better
rather than worse balances between privacy and security. But
there is no guarantee that the executive branch will demand
them or the technologists will provide them on their own. You
therefore, ladies and gentlemen of the Congress, have a special
responsibility to provide legal and technological oversight to
ensure that the technologies are developed and deployed in ways
that strike a good rather than a bad balance between privacy
and security.
Let me give you an example of the kind of design choice
that I have in mind. And I want to focus just for the sake of
argument on the Total Information Awareness Program that
Congress has recently decided, at least for the foreseeable
future, to block. Total information awareness provides a model
for the kind of mass dataveillance that we have been discussing
this morning and is being proposed in other contexts. Now, as a
matter of definition, ``mass dataveillance'' refers to the
suspicionless surveillance of large groups of people. And that
is different from personal dataveillance of the kind that
Senator Dockery described which involves targeted surveillance
of individuals who have been identified in advance as being
unusually suspicious. Mass dataveillance poses special dangers.
In some ways it poses some of the same dangers of the general
warrants that the framers of the fourth amendment to the
Constitution were especially concerned about prohibiting.
When the government engages in mass dataveillance without
individualized suspicion, there is a danger of unlimited
discretion, as the government searches through masses of
personal information and searches for suspicious activity without
specifying in advance the people, places, or things it expects
to find. Both general warrants and mass dataveillance run the
risk of allowing fishing expeditions in which the government is
trolling for crimes rather than particular criminals, violating
the privacy of millions of innocent people in the hope of
finding a handful of unknown and unidentified terrorists. At
the same time there is an important question of effectiveness.
And I want you to think pragmatically about these
technologies. Will they work in the national security arena?
Credit card fraud of the kind that Mr. Kutz described is a
form of systematic, repetitive, and predictable behavior that
fits a consistent profile identified by millions of
transactions. Terrorism is different: there is no special reason
to believe that terrorists in the future will resemble those in
the past. By trying to pick 11 people out of 300 million with a
computer profile, you may be looking
for a needle in a haystack, but the shape and the color of the
needle keep changing and, as a result, the profiles may produce
great numbers of false positives: those people wrongly
identified as terrorists.
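Professor Rosen's concern about false positives is a base-rate effect, and a short calculation makes it concrete. The population (300 million) and number of targets (11) come from his statement; the 99 percent detection rate and 99.9 percent specificity are hypothetical figures chosen only for illustration, not measured properties of any real profiling system.

```python
def expected_screening_outcomes(population, targets, sensitivity, specificity):
    """Expected true hits, false alarms, and precision of a screening profile."""
    innocent = population - targets
    true_hits = targets * sensitivity              # targets correctly flagged
    false_alarms = innocent * (1 - specificity)    # innocent people wrongly flagged
    precision = true_hits / (true_hits + false_alarms)  # share of flags that are real
    return true_hits, false_alarms, precision


hits, false_alarms, precision = expected_screening_outcomes(
    population=300_000_000, targets=11, sensitivity=0.99, specificity=0.999
)
print(f"{hits:.1f} true hits, {false_alarms:,.0f} false alarms, precision {precision:.5%}")
```

Even under these generous assumed rates, the profile flags roughly 300,000 innocent people for about 11 genuine hits, so well under one flag in ten thousand points at a real target.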
I want you to think about the privacy issues and the
effectiveness issues. Does the technology that works in a
credit card arena make sense to apply in the national security
arena? Assuming that these technologies will be deployed in
different spheres, I urge you to recognize that they can be
designed in better or worse ways. The Total Information
Awareness Office itself recognized this and proposed technology
that it called ``selective revelation,'' which proposed to
minimize personally identifiable information while allowing
data mining and analysis on a large scale. The insight of
selective revelation is useful and may provide models for ways
privacy and liberty could be protected at the same time.
The Total Information Awareness Office had a project called
Genisys that was exploring ways of separating identifying
information from personal transactions and only allowing the
link to be recreated when there is legal authority to do so.
This might allow, for example, the Centers for Disease Control
to have access to medical information while other groups do
not.
Using this model of selective revelation, Congress could
think about creating laws and technology that separate
identifying information from the data itself.
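One way to picture the separation Professor Rosen describes is keyed pseudonymization: analysts work with stable tokens instead of names, and only a custodian holding the secret key and the token directory can re-identify a record, and only when an authorization is presented. This is a minimal sketch under that assumption; it is not the design of the Total Information Awareness project he mentions, and the `IdentityCustodian` class and `court_order_granted` flag are hypothetical stand-ins for the oversight-court order he proposes.

```python
import hashlib
import hmac
import secrets


class IdentityCustodian:
    """Hypothetical custodian holding the only link from tokens back to identities."""

    def __init__(self):
        self._key = secrets.token_bytes(32)  # secret key never shared with analysts
        self._directory = {}

    def pseudonymize(self, identity):
        # Same identity -> same token, so analysts can still link transactions.
        token = hmac.new(self._key, identity.encode("utf-8"), hashlib.sha256).hexdigest()
        self._directory[token] = identity
        return token

    def reveal(self, token, court_order_granted):
        # Re-identification only with (hypothetical) legal authorization.
        if not court_order_granted:
            raise PermissionError("re-identification requires legal authorization")
        return self._directory[token]


def strip_identities(records, custodian):
    """Analyst-facing copies of records: the 'name' field becomes a token."""
    out = []
    for rec in records:
        copy = {k: v for k, v in rec.items() if k != "name"}
        copy["person"] = custodian.pseudonymize(rec["name"])
        out.append(copy)
    return out


custodian = IdentityCustodian()
records = [
    {"name": "Alice Example", "txn": "wire transfer", "amount": 900},
    {"name": "Alice Example", "txn": "flight booking", "amount": 450},
]
analyst_view = strip_identities(records, custodian)
```

Because the token is deterministic for a given key, analysts can see that two transactions belong to the same person without learning who that person is. A real system would also need to protect the key, log every reveal, and consider whether the remaining fields could themselves identify someone.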
Mr. Forman talked about ensuring that searches are consistent
with current law. My strong belief is that current law is not
adequate to the kind of complicated regulation that faces us, and you need
to think creatively about rising to this new challenge by
developing new oversight bodies and new technologies to ensure
the protection of privacy. But just hypothetically we could
imagine what those regulations would look like. Congress could
create a special oversight court with the authority to decide
when identifying data obtained during mass dataveillance may be
connected to transactional information. After intelligence
analysts have identified a series of transactions that they
think might be evidence of a terrorist plan or suggest that a
particular individual is unusually suspicious, they could
petition the oversight body for authorization to identify the
individuals concerned. In deciding whether or not to grant the
request, Congress could direct the court to satisfy itself that
the crime for which the evidence has been presented is a
serious threat of force or violence rather than a low-level or
trivial crime, and that the evidence suggests a link between
the suspects and terrorists. If the court granted the order,
then the analyst could link the identifying information and
they could share the information with State and local bodies
and so forth.
And there are other needs for regulation. You might have to
create standards for citizen oversight. Citizens should be
able to correct their data if it's incorrect or misused. And
fair information practices would give citizens the right to
know the information that the government has collected. So, you
see the general model. The search is anonymous unless there is
cause to believe that a particular individual is suspicious,
and then there is oversight to make sure that the individuals
are identified in connection with serious crimes. Merely to
describe the complexity of this regulation is to raise
legitimate questions about whether Congress is ready to adopt
it.
But Congress has met its oversight responsibilities in the
past. The most important checks on poorly designed technologies
of surveillance since September 11th have come from Congress
ranging from the decision to block total information awareness
in its current form to the insistence on creating oversight
mechanisms for the Carnivore e-mail program. I urge Congress to
accept the task of learning about the design choices inherent
in these technologies. You have it in your power to strike a
balance between liberty and security, and all you need now is
the will. Thank you very much.
Mr. Putnam. Thank you, Mr. Rosen.
[The prepared statement of Mr. Rosen follows:]
[GRAPHIC] [TIFF OMITTED] T7229.041
[GRAPHIC] [TIFF OMITTED] T7229.042
[GRAPHIC] [TIFF OMITTED] T7229.043
[GRAPHIC] [TIFF OMITTED] T7229.044
Mr. Putnam. I certainly believe our witnesses have set the
table and created an environment for some outstanding dialog.
The gentlelady from Michigan has another appointment so I
will recognize her to lead off with our questions.
Mrs. Miller. Thank you, Mr. Chairman. I think my question
is for Mr. Kutz.
As I heard you talk about some of the various audits that
your agency is currently engaged in, you talked about nine
different audits that you are getting involved with, Energy
labs and DOD, etc., and certainly the testimony you gave about
the credit card fraud is startling. It is sickening. Those are
the kinds of things I think make people crazy about what is
happening at the Federal level. But you know, last week the
Congress had a very exhaustive debate about a budget resolution
and there was a lot of talk about waste, fraud, and abuse and
the kinds of large-dollar problems that we could get at to
achieve some reduction in our budgeting process.
And I heard a lot of conversation last week--and I don't
know if this is one of your nine universes or not--but in the
area of Social Security, that there is as much as 10 percent of
the Social Security payments that are going to people who are
either deceased or for some reason do not qualify. And I don't
know if that is an area that you are auditing in your universe
there; and, if so, what kind of numbers are we talking about
and how would you construct the data mining? Do you
have any idea of how you might begin to proceed to take a look
at that type of waste, fraud, and abuse?
Mr. Kutz. Social Security is not one that we have on our
plate right now. We typically do our work at the request of
various Members of Congress or committees or subcommittees, and
that is not one we have been asked to do at this point.
For example, the Inspector General has used the technology
to look for people receiving benefits who are over 90 or 100
years old. Those are potential indicators that a family might
be keeping the checks and didn't report the death to Social
Security, and therefore received improper payments.
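The Inspector General screen Mr. Kutz mentions, flagging beneficiaries older than 90 or 100, is essentially an age threshold applied to the benefit rolls. A minimal sketch, assuming a hypothetical mapping from beneficiary ID to date of birth; a flag only marks a record for verification, since many people legitimately live past 100.

```python
from datetime import date


def age_on(birth, as_of):
    """Age in whole years on as_of (subtract 1 if the birthday hasn't come yet)."""
    before_birthday = (as_of.month, as_of.day) < (birth.month, birth.day)
    return as_of.year - birth.year - int(before_birthday)


def beneficiaries_at_or_over(birth_dates, as_of, threshold=100):
    """IDs of beneficiaries whose age on as_of is at least threshold."""
    return [bid for bid, birth in birth_dates.items() if age_on(birth, as_of) >= threshold]


birth_dates = {  # hypothetical beneficiary IDs and birth dates
    "X1": date(1900, 1, 1),
    "X2": date(1930, 6, 1),
    "X3": date(1903, 4, 1),
}
hearing_day = date(2003, 3, 25)
over_100 = beneficiaries_at_or_over(birth_dates, hearing_day)
over_90 = beneficiaries_at_or_over(birth_dates, hearing_day, threshold=90)
```

"X3" is still 99 on the reference date because the April birthday has not yet arrived, so it appears only in the 90-year screen.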
There are certainly lots of different queries and methods
you could use. And I believe the Inspector General has done a
lot of that, and I believe it has been used extensively there.
Also for Medicare, there has been extensive use of data
mining technologies to find fraud, waste, and abuse and also to
project the amount. Annually, the various agencies project how
much is going out the door in improper payments and, as you
know, it amounts to tens of billions of dollars. And we are talking
about real money here, which is why we need good internal
control systems to minimize this waste, fraud, and abuse.
Mr. Forman. If I may, let me point out two projects in
particular. One is one of the 24 E-Government Initiatives,
called the E-Vital Project. So much of this is tied to, for
example, the Social Security Administration getting timely
notification when a person has passed on. That is explicitly
the target of the E-Vital Project, which continues to gain
traction in the States that have been moving death records and
other medical records on-line. It is a slow process. And as you
may recall, Michigan may have been one of the States that has
charged the agency for providing that information. So there is
some negotiation, because the cost should come down once that
is put in place as a computer system.
The other project is called PARIS, the Public Assistance
Reporting Information System, and that is a joint Federal/State
information network that was set up explicitly to allow for
data matching and mining across interagency benefit programs.
So that would cover things like Supplemental Security Income,
the TANF program, Medicaid, Food Stamps, and Veterans Affairs
programs.
Mrs. Miller. With regard to the Social Security link that
the States have as they interact with the Federal Government,
isn't it true now--because I think every State is required to
solicit the Social Security number of every licensed driver--
that is something new in the last several years, and all of the
States are required to link to the Social Security
Administration because of that? Has that been helpful in
information sharing?
Mr. Forman. You know, to be quite honest, I think
ultimately, while there is a requirement to share information,
the reality is a big chunk of the benefit here in terms of
identifying people who are getting Social Security income but
have passed on comes back to the ability of States to share
information on the death certificates in a timely manner. And
some of the States and local county offices where that
information initially starts just haven't been computerized yet.
Mrs. Miller. My experience had been with the Social
Security link that we had in Michigan--I know some of the other
States were mentioning this as well--there was no way to verify
the Social Security number, so someone could give you any
digits that they wanted to. There was no way for the States to
verify that the Social Security number was in fact a valid
Social Security number. That is a problem, I think.
Mr. Forman. There has been some progress made on that, and
I know we looked at this a month ago when we did a review. I
would ask, if it is OK with the chairman, that we get back to
you on the Social Security Administration progress on that.
Mr. Putnam. We have been joined by the big Chair, the
chairman of the full committee. Mr. Davis, do you have any
comments or questions?
Mr. Davis. I will be very brief. I think data mining is
critical. If you go back 100 years, a visionary at the start of
the 20th century might have said, what is going to guide the
economy in the 20th century? The visionary might have said,
oil. And in fact, it was your entrepreneurs and your
visionaries who figured out how you get the oil: where the oil
was, how you get it out of the ground, how you refine it, and
how you get it to markets. They dominated much of the economic
activity of the 20th century.
Here we are at the start of the 21st. What would a
visionary say now? Really, the oil today is information. How
would we get that information and get it out of the ground, so
to speak; how do we refine it; how do we distribute it; what
uses does it have? And it is those entrepreneurs that are going
to in large part be the economic wunderkinds of the 21st
century. Had we had the EPA and all of the regulations on oil
in 1900, this stuff would still be in the ground. We never
would have gotten it out.
My theory is we need to be slow about coming in and
overregulating. Let the marketplace, the public, and the
industry come up with their own protocols before the
government comes in and starts imposing a regulatory and taxing
regime that could stifle the growth and the potential for this.
That is kind of the way I look at it. Certainly there is going
to be a role for government down the way, and maybe in ways we
don't even envision today, because I think we are just at the
very beginning of a whole revolution. But that is kind of the
way I have looked at it.
And I don't know if you have any reaction. Mark Forman has
been working with us on a number of issues. I don't know if
anyone wants to react with that or disagree. Obviously, the
professor is here and has his own view.
Mr. Rosen. I guess I would just urge the chairman to ask
whether the kind of data mining that is appropriate in the
private sphere can be brought into the national security arena.
Much of the history of our privacy laws for the past 50 years
has been based on the idea that completely unregulated
information sharing is not consistent with the values of the
Constitution or of American citizens. We don't want every low-
level information officer in the field to know that I had a
youthful indiscretion or I am late in my child support payments
before I go onto an airplane, or that I am late on my credit
card or maybe I have some IRS issues against me.
Complete transparency of information, total unregulated
use, which is what many Silicon Valley people are urging,
wouldn't be consistent with the value of the fourth amendment.
It wouldn't be consistent with current privacy laws which
prohibit information sharing without good cause, and it also--and I
want to urge the chairman to think about it--would it be
effective? Is there any reason to believe that centralizing all
of our public and private data bases and allowing for a risk
prediction to be made would identify terrorists?
It is not like credit card fraud. Credit card fraud is
something you have 10 million examples of, and it takes
predictable patterns. People who steal credit cards test them
at service stations and then buy clothes at a mall. And because
it happens so often, you can use the technology to predict
credit card fraud.
We have no reason to believe that the next terrorist attack
is going to take the form of people who lived in Florida and
went to flight schools. It could take many forms. I respect
your libertarian instincts and the desire to use this
technology as effectively as possible. I just would say that if
you, the Congress, don't stand up for Constitutional values
to ensure inefficiencies as well as centralization, I don't
think the technologists of the executive branch will either.
Mr. Davis. Most of this information has been public. It has
just never been able to be collated and so rapidly deployed
and disseminated. That's what scares people. Something that in
the old days could have taken 10 private detectives 6 months of
going through records to find, you can now get like that.
And as you spoke of in your testimony, it is a balance
issue; and I don't know what that right balance is, but I am on
the go-slow side rather than the overregulation side. We know,
for example, that with the terrorists of September 11th, the
information was out there, between flight schools and
arrests and Immigration. Had we been able to collate that
information and get it in one place, we could have prevented it
from happening.
And some of you view this as an infringement on privacy,
but I don't know what you say to the victims and the families
of over 3,000 people that died that day. I don't know what the
right balance is, and I agree, and that is why we need to hear
from you and keep you at the table as we work our way through
this brand-new territory. And that is why we appreciate you
being here.
And I am not sure we have that right balance today. And I
am not sure, given the technologies that we have today, that we
can even start writing rules, because who knows what
technologies will be invented and deployed tomorrow whose
applications we cannot even imagine today?
And I appreciate everybody's input and I appreciate you holding
this very important hearing.
Mr. Putnam. I believe the Senator had a response.
Ms. Dockery. Thank you, Mr. Chairman, and I just wanted to
comment that I agree very much with the Congressman,
Congressman Davis, and to comment to the professor, we in
Florida believe that the factual data analysis that we are
using now is appropriate for tracking down terrorists, and we
also believe that it led to the arrest recently of--a national
news story you may have heard about--a professor at the
University of South Florida. And that was done through
collection of information that was all part of our public
records in the State of Florida that showed some connections.
So we think that this is a valuable tool and we think we
have shown in Florida its value in criminal investigations. I will say
that in Florida, we have one of the most open record laws in
the country. We call it ``Government in the Sunshine,'' and it
is kind of interesting that the people in Florida just in the
past election voted for a Constitutional amendment to require that
anytime we provide an exception to the open records law, it
would now require a two-thirds vote of both the House and the
Senate to make that exemption. The open public records law
actually helps law enforcement in Florida by making more and
more records available for us to use in our factual data
analysis.
So to that extent I wholeheartedly support Congressman
Davis's comments and would tell you that we probably need some
regulation to prevent us from going overboard and to protect
the fourth amendment rights, but we should err on the side of
allowing the technologies to prove themselves out before we
overregulate an industry that is just beginning.
Mr. Putnam. For the professor and anyone else who would
like to respond, how would you compare data mining technology
to the emerging technology of DNA as a law enforcement tool 25
years ago?
Mr. Rosen. I think DNA offers greater security benefits and
fewer privacy threats for this reason. DNA is usually used in
the kind of focused investigation that Senator Dockery was just
suggesting: You have a clue and you can plug it into a data base
and it can be used to exonerate or inculpate. And as long as
there are restrictions on the use of DNA for secondary purposes,
so that the government can't turn it over to insurance companies
to deny me a job or make predictions about my future health, I
don't have privacy concerns about it.
Data mining, by contrast, of the kind that Roger Clarke
calls ``mass dataveillance'' rather than ``personal
dataveillance,'' poses very different privacy issues. And I
want to distinguish the two, because Senator Dockery just
talked about how useful it is once you know something about an
individual. This USF professor, you can plug him into a data
base and draw connections. That is the same thing that was done
with the sniper. When you have the tip in Alabama and plug it
into the data bases and establish connections, that is useful
and that doesn't raise grave privacy concerns because the
individual has been identified in advance as suspicious.
My concern is the kind of mass dataveillance, not only the
total information awareness level, but the profiling systems
that are being proposed at airports. The reason I am
concerned about them is that this is the surveillance of the data of
millions of innocent citizens. And it's not just a little bit
of data. If the projects go forward, there are credit card
records, phone calls, tax records, all public and private data;
mass risk predictions based on this that could be used to
prosecute people not for terrorism--which I'm all for--but for
very low-level crimes.
It is that kind of fishing expedition--it is the example of
an unconstitutional search. At the time of the fourth
amendment, what the framers were most concerned about was
breaking into everyone's house looking for enemies of the
government, reading their private diaries, looking at innocent
information, in the course of seeing whether or not they were a
critic of the king, and then arresting them for whatever you
found in their house. That was a general search and it was
unconstitutional because it exposed a lot of innocent
information while looking at guilty information. That is what
mass dataveillance does. And that's why, without Constitutional
restrictions, I don't see how we could deny that there are
privacy concerns.
Mr. Putnam. According to a recent New York Times article, Dr. Gilman
Louie, CEO of In-Q-Tel, outlined in a speech two different
approaches, one which he identified as the data mining approach
which results in what he calls watch lists and what he
indicated was too blunt an instrument; the second being data
analysis which begins with some type of investigative lead and
then uses software to scan for links between a person under
investigation and known terrorists. I presume that is an
approach you are advocating?
Mr. Rosen. I like that approach and I respect Mr. Louie,
who is sensitive to these issues, and he is distinguishing
between focused data mining based on individualized suspicion
and mass dataveillance.
And the same model interestingly has been taken by the
Foreign Intelligence Surveillance Court. Just yesterday the
Supreme Court decided not to review that decision of the
Foreign Intelligence Surveillance Court that said we don't have
to worry about broad surveillance of people who have been
identified in advance as agents of foreign powers because we
suspect that they're bad guys. And if we then find that they're
guilty of lower level crimes it's good to get them off the
streets because we're pretty sure that they're suspicious.
That's different, said the Foreign Intelligence Surveillance
Court, from using this mass dataveillance to look at everyone
without any cause to suspect them and going after them for
lower level crimes.
So I'm glad that Mr. Louie, who is at the forefront of the
government's effort to merge technologies that have been
developed in the private sector and apply them in the national
security area, is sensitive to that distinction, too.
Mr. Putnam. Let me direct that to our witness, Dr. Louie,
who is not the person I was just quoting. You indicated in your
testimony that data mining is a process, not a tool. Please
elaborate on that in the context of Mr. Rosen's comments.
Dr. Louie. Data mining goes--some of the focus that I keep
hearing is the emphasis going back to patterns. Data mining
deals with patterns, but I think the term ``patterns'' needs to
be expanded a little bit to understand in terms of other ways
of interpreting a pattern. A pattern can also be a series of
events. A led to B, B led to C, and on down the line. If we are
planning a--we'll call it a filtering mechanism to look at
everybody, you have to establish some parameters of saying if
we are looking for people who buy large quantities of potassium
nitrate fertilizer and they are not in agriculture or
landscaping and the like, maybe that should raise a flag. But
all it does is just put up a flag, says this is of interest.
And then if other events or other ties go back to it, then that
should, we'll call it, raise a level of suspicion that maybe
forwards it to somebody else to review. I think that's the way,
we will call it, data mining in general can be applied in terms
of looking for potential terrorists, whether it be something
like Oklahoma City or something like September 11th.
In terms of September 11th here we have another potentially
interesting, we will call it, information exchange of
Immigration's data base or when they applied for visas was,
we'll call it, a little bit broader in their perception of
how they looked at the information coming in for, let's say,
applications of visas. We have, we'll call it, the linguistic
issue of how do you spell the name, what are the variations of
the name, variations being, let's say, diminutive form of the
name or a, we'll call it, a common substitution, Robert for
Bob, John for Jack, you know, and down the line. If we had a
way to compare that and also previous visas, abbreviations of
the names, transposing of the name that would have identified,
had these people come through our visa process before, where
did they go, did that raise any suspicions.
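A minimal Python sketch of the name comparison described above, for illustration only. Everything in it is an assumption: the `NICKNAMES` table is a hypothetical stand-in for a real list of name variants, and `canonical_tokens` and `possible_prior_visas` are hypothetical names. It handles common substitutions (Robert for Bob, John for Jack) and transposed name order, but not abbreviations or phonetic spelling variants.

```python
# Hypothetical nickname table; a real system would need a far larger list.
NICKNAMES = {
    "bob": "robert",
    "rob": "robert",
    "jack": "john",
    "johnny": "john",
}


def canonical_tokens(full_name: str) -> frozenset[str]:
    """Lower-case a name, drop punctuation, and map known nicknames to one form.

    Using a set of tokens makes the comparison insensitive to transposed
    name order, e.g. "Smith, Robert" versus "Bob Smith".
    """
    cleaned = "".join(ch if ch.isalpha() else " " for ch in full_name.lower())
    return frozenset(NICKNAMES.get(token, token) for token in cleaned.split())


def possible_prior_visas(applicant: str, prior_records: list[str]) -> list[str]:
    """Return earlier visa records whose canonical name matches the applicant's."""
    target = canonical_tokens(applicant)
    return [record for record in prior_records if canonical_tokens(record) == target]


prior = ["Smith, Robert", "Jack Jones", "Roberta Smith"]
print(possible_prior_visas("Bob Smith", prior))  # ['Smith, Robert']
```

A match here only says that a prior record may belong to the same person; as the testimony notes, whether it raises any suspicion is left to an analyst.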
That's the way I see data mining being applied in terms of
broad, we'll call it, filtering of information. Not tracking
somebody necessarily, but raising, we'll call it, levels of
questionable flags or activities that may lead to something.
That way you are not tracking an individual, you're just
tracking recent events. If that event tracks out and says all
these events lead up to a suspicious activity, then we can go
back and say, OK, where did all these names come in or what is
the relationship of that. And that's up for the analysts. It's
the same way we track money laundering, we track bank accounts.
The banks are required to report any transaction of $10,000 or
greater. So if I deposit $9,999 it's not going to trip the
flag. But if, let's say, at the bank level they consolidate the
end of the day receipts and they see that account exceeded that
$10,000 maybe it should just raise a flag and make FINCEN aware
that there was a transaction, didn't meet the criteria but it's
just something maybe to watch. Either the bank watches it or
FINCEN watches it.
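The end-of-day consolidation Dr. Louie describes can be sketched in Python as follows, assuming deposits arrive as (account, day, whole-dollar amount) tuples. The $10,000 threshold and the "$10,000 or greater" comparison follow the testimony as stated, not the text of the Bank Secrecy Act regulations, and `daily_flags` is a hypothetical name.

```python
from collections import defaultdict
from datetime import date

# Threshold as stated in the testimony ("$10,000 or greater"); illustrative only.
REPORT_THRESHOLD = 10_000


def daily_flags(deposits: list[tuple[str, date, int]]) -> dict[tuple[str, date], str]:
    """Consolidate each account's deposits by day and classify the result.

    "report": at least one single deposit met the threshold.
    "watch":  no single deposit did, but the end-of-day total did.
    Account-days whose total stays below the threshold are not returned.
    """
    by_day: dict[tuple[str, date], list[int]] = defaultdict(list)
    for account, day, amount in deposits:
        by_day[(account, day)].append(amount)

    flags: dict[tuple[str, date], str] = {}
    for key, amounts in by_day.items():
        if max(amounts) >= REPORT_THRESHOLD:
            flags[key] = "report"
        elif sum(amounts) >= REPORT_THRESHOLD:
            flags[key] = "watch"
    return flags


day = date(2003, 3, 25)
deposits = [
    ("A-1", day, 9_999),   # just under the threshold...
    ("A-1", day, 500),     # ...but the day's total is not
    ("B-2", day, 9_999),   # a single sub-threshold deposit
    ("C-3", day, 12_000),  # a single deposit over the threshold
]
flags = daily_flags(deposits)
# A-1 is flagged "watch", C-3 "report", and B-2 is not flagged at all.
```

The "watch" flag mirrors the testimony's point: the consolidated total does not itself prove anything, it only marks the account as something to watch.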
But that's the way I see you apply data mining. And in
terms of--I believe that was Gilman Louie from In-Q-Tel.
Mr. Putnam. Yes.
Dr. Louie. I agree with his perspective and the way he
outlines the way we should look at it. Data mining is an inert
tool. You can take very thin slices and basically create a
sandwich of a nice depth in order to act upon. And that's where
we use the term ``actionable information.'' And one slice of
information in itself, it may be totally insignificant and of
no value. But it's the cumulative process of all the
associations associated with that data point that become
interesting. And you don't have to store it. You just have to
essentially flag it. And when we have enough flags that trip,
we'll call it, your suspicion level, then you look at it. You
don't necessarily take an action on it, but evaluate it. And
that's where the human aspect or the analysts and subject
matter experts in that area can say this does look suspicious
or this should be maybe questioned.
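The cumulative-flag process can be sketched in Python as follows. The fertilizer rule, the quantity cutoff, the exempt occupations, and the review threshold are hypothetical values chosen only for illustration. The sketch keeps a per-entity flag count rather than storing the events, and its output is a list for an analyst to evaluate, not an action.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    entity: str        # hypothetical identifier for a buyer
    kind: str          # e.g. "fertilizer_purchase"
    quantity: float    # hypothetical unit, e.g. pounds
    occupation: str    # buyer's stated line of business


# Hypothetical rule parameters, chosen only for illustration.
BULK_QUANTITY = 1_000.0
EXEMPT_OCCUPATIONS = {"agriculture", "landscaping"}
REVIEW_THRESHOLD = 3  # flags needed before an analyst evaluates the entity


def raises_flag(event: Event) -> bool:
    """A flag only marks an event as "of interest"; it is not an accusation."""
    return (
        event.kind == "fertilizer_purchase"
        and event.quantity >= BULK_QUANTITY
        and event.occupation not in EXEMPT_OCCUPATIONS
    )


def review_queue(events: list[Event]) -> list[str]:
    """Tally flags per entity and return those that reach the review threshold.

    Only the per-entity counts are kept; the events themselves are not stored.
    """
    flag_counts: Counter[str] = Counter()
    for event in events:
        if raises_flag(event):
            flag_counts[event.entity] += 1
    return sorted(e for e, n in flag_counts.items() if n >= REVIEW_THRESHOLD)


events = (
    [Event("buyer-1", "fertilizer_purchase", 2_000.0, "retail")] * 3
    + [Event("buyer-2", "fertilizer_purchase", 5_000.0, "agriculture")] * 5
    + [Event("buyer-3", "fertilizer_purchase", 1_500.0, "retail")] * 2
)
print(review_queue(events))  # ['buyer-1']
```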
Mr. Putnam. Mr. Forman.
Mr. Forman. I think it's incredibly important to keep in
mind that data mining is a productivity tool. Yes, it's part of
a process, but at the end of the day our decision has to be whether
that is a process that we want to have, one that is a more productive
process. And that's, I think, one of the big differences to
understand about the Total Information Awareness Initiative.
That's an R&D project. That is not a Federal IT program. And
when it hits the stage where somebody says, geez, we ought to
buy something, it falls into the process by which we put out
the standards associated with the business case. Are we going
to get any productivity out of it?
I have always kept in mind early in my years when I did a
lot of data analysis and operations research this notion of
garbage in, garbage out that Dr. Louie raised. I am very, very
mindful, especially in this area of homeland security, where we
have got dozens of data bases, merely hooking them together and
applying an algorithm is not going to make the data there any
better. Even so, merely allowing those islands of automation to
exist, and the business processes that run off of those islands of
automation, aren't going to give us any greater homeland
security. The core issue here is to find out whether we have
a better way, as we see in Florida, for the investigators to do
their work. And are we happy that this is appropriate, given
the Privacy Act, given the other laws that cover that. And
there is a policy decision to be made there. That now is
clearly required to be addressed in the business case process
under the E-Government Act, and under OMB guidance we are
updating it to comply with that.
Mr. Putnam. Anyone else wish to comment on that? With
regard to the private sector, is there an industry standard out
there that is being used to guard privacy and security of the
information in the data mining process? Solely in the private
sector. Is there a single industry standard?
Dr. Louie. There are no unified business industry
guidelines as far as, we'll call it, protecting the privacy of
the data. I think that most of our clients have relied on us to
devise a, we'll call it, a privacy statement of how we are
going to handle data, how we are going to handle the physical
storage as well as dissemination of the information and how--
who will actually get to see and touch it. That's something
that we have devised as being the consultants or the
practitioners to different companies. But there are no formal
guidelines. We have adapted the, we'll call it, guidelines as
specified by the Society of Competitive Intelligence
Professionals in terms of saying, OK, this is how we will
handle the data. This is how we will ensure our clients'
privacy and we will try to abide by that as a form of ethics.
Mr. Forman. I would say from the standpoint of what we have
seen, there are two standards that have existed over the last
couple of years. Opt in and opt out. And I know we have looked
an awful lot at those standards to see what would be
appropriate for the Federal Government. Opt out being a company
tells you you have got this data: If you want to continue with
this on-line service or continue as a customer with us, we are
going to share the data unless you tell us not to. And opt in is
essentially like we see with the little cards at the Giant
grocery store chains. If you get this card you get a lot of
discounts; in return you give us information about your buying
habits. And those discounts give you better products and so
forth. And so, how the data is used and how the option is
available to the consumer, I think they still have a couple of
common standards that have been around for a couple of years.
Mr. Putnam. Mr. Rosen.
Mr. Rosen. But opt in and opt out wouldn't begin to be
adequate to the challenge of the regulation you're thinking
about now because much of this is data that you can't opt out
of sharing. It's data such as credit card purchases that goes
automatically to warehouses like TRW or telephone calls that go
to the telephone company and that the court has held are not
legally protected because of the circular reasoning that you
voluntarily turned the information over for one purpose and
can't withhold it for another. So I'd gather the kind of
regulations that you want to be thinking about are the
patchwork of laws that do currently regulate information
sharing in the private sector, such as the Fair Credit
Reporting Act, which limits the kinds of personally
identifiable financial information that can be shared. As I
understand several of the data mining proposals, such as the
Total Information Awareness Program, in its original form there
was a suggestion that those laws should be relaxed and that the
government should have access to data that's currently
restricted by law, such as personally identifiable credit card
information that cannot ordinarily be shared and the records of
international telephone calls that are regulated by other
statutes. So I wouldn't--with respect to the effort of using
private sector regulations as a model to guide you in the new
world that you face in Federal data mining, I don't think that
a simple opt in standard which is based on this voluntariness
notion would begin to do the trick. And that's why I think at
some point down the line you may have to think about
comprehensive reform at the level of the Privacy Act, which has
proved inadequate for regulating the kind of things we are
talking about now.
Mr. Putnam. Speaking now about the public sector, what
level of information sharing is currently allowable by law
within and between all government agencies without a special or
a specific warrant or request for that information? In other
words, how much information sharing is there between HUD, VA,
HHS, INS now from a technical potential and from a legal
potential.
Mr. Forman. There's very little information sharing. This
issue came up about a year ago with the concept of a program
that was called gov.net, and there was a fear for cyber
security purposes that we had to protect the sharing of
information between agencies, and we found out there was
virtually no sharing of information between agencies.
Generally, it gets back to this issue that each agency built
its own data base, its own data store, if you want to use the
parlance of today's hearing, to support its own mission. And
the question is, when can you look across the agencies, when is
there a need? Going back almost two decades, the most extensive
sharing has probably been in the scientific community, as it
relates to what we now call geospatial
information or geographic information systems. There are
generally requirements associated with that, which we handle via
the computer security rules and models and the business case
practices. Where we have seen a ramp-up of sharing between
agencies has been in the data management area that I've alluded
to in my testimony, and that happens to be with these major
welfare programs, and it is generally through the PARIS Project.
There's been explicit congressional authorization, literally
laws authorizing that. We have asked for some additional legal
authorities or additional data sharing, a creation of the
matching data base that has current job data, but even that is
only updated quarterly. We probably could do better than that.
Mr. Putnam. So would a successful data mining or factual
data analysis project that was attempting to identify a
particular profile of a terrorist, for example, be
able to access any and all Federal Governmental data bases
without a specific change in the law? Or would they be able to
do that as a result of the law's silence on the topic? First
part of the question. The second part of the question is, as a
technical matter, could it actually be done?
Dr. Louie. On the technical side I say we could do that. We
have for several government agencies, but the technical side of
making it happen is not really the problem. The problem is the
quality and trustworthiness of the information that's in those
data bases, which is, I would say, poor to--you know, it is amazing
that they can conduct business.
Mr. Putnam. Senator Dockery.
Ms. Dockery. Thank you, Mr. Chairman. In Florida we require
reasonable suspicion to be developed before we use factual data
analysis, and then we abide by the standards established in 28
Code of Federal Regulations. To answer your question about
sharing intelligence information, Florida deals well with
sharing information with other States. In fact, there's a pilot
project, the Multistate Antiterrorism Information Exchange,
called MATRIX, which is going to consist of 13 States in this
pilot project. Our problem has been to share information with
the Federal Government, both in terms of us willingly giving
you information and you not being able to receive it and us
trying to receive information from the Federal Government.
One case in point, Florida has 16 million residents, but 60
million tourists. We have a lot of people moving through the
State and it would be very helpful to us if we could access the
visa data base, particularly if we could have access to anyone
who may be in Florida who has overstayed their visa and that
could lead to a lot of useful information in making these
connections. We do not keep dossiers on individuals. We look
for linkages based on reasonable suspicion in assorted events
and then we look for those linkages. Then just as soon as we
see them they're gone. So it is not a matter of starting a file
on an individual. It's looking at an activity and trying to
find who had some access to something involved within that
activity. But it would be very helpful to us and to other
States if there was a better cooperation of sharing
information.
We have now linked almost everything in Florida together so
we can access various agencies' data, but we cannot access
anything from the Federal Government nor can they for us
because the information that the State has is in its possession.
But we are willing to share it. We just don't have the
technology to do so.
Mr. Putnam. Mr. Forman.
Mr. Forman. From a legal perspective, I believe there's a
pretty broad coverage. Let me refer to three laws in
particular: the Privacy Act of 1974, the Computer Matching and
Privacy Protection Act of 1988 and the E-Government Act of
2002, all of which lay out the principles and the areas that
must be addressed, ultimately leading up to what we would look
for in the business case of privacy impact assessment. There is
a policy decision that will have to be made. There's guidance
from both OMB and the National Institute of Standards and
Technology on that for Federal information systems to ensure
appropriate protections of personal information. I think it's
fair to review some of those cases and how that's being done.
But the legal framework exists. This does not have to be built
from the ground up, per se.
I guess I'm more concerned about this on the technology
side. These data bases were largely poorly crafted to start
with. The business processes generally are nonexistent and when
we try to pull information from data bases that have different
embedded rules into a data warehouse and mine that data, I
keep in the back of my head garbage in, garbage out, because I
think that's the reality that we'll be forever patching
together in the Federal arena. I believe that this, at the end
of the day, is not so much a technology issue. As we know, the
technology exists. It's been used in many governments,
including the U.S. Government, for years. The question comes
down to can we figure out what's the right business process and
who should be in charge or how we want to oversee that, pulling
that information together and the person who says I've got a
terrorist threat. The best framework for that so far as it
links to terrorism is the Department of Homeland Security Act.
Mr. Putnam. Mr. Rosen, do you have a comment?
Mr. Rosen. It's an interesting question whether there are
meaningful legal regulations on the sharing of data in the case
of individualized suspicion. The Privacy Act has a broad law
enforcement exception and a national security exception, so I'd
imagine that when we're talking about personal dataveillance,
focused on suspicious individuals, there wouldn't be meaningful
legal restrictions on sharing. Mass dataveillance is a
different question. And I think that the people who have
analyzed this are divided about whether dataveillance along the
total information awareness model would violate the Privacy
Act. It's not clear whether the information that is being
accessed would count as a system of records according to the
Privacy Act, and the mere phrase itself shows how outdated that
1970s idea, which presumes that information is stored in
different file cabinets, is for regulating data sharing in the
21st century. So--and then there's also the case that much of
this data is already held in the private sector and law
enforcement has a long history of piggybacking on the grand
data warehouses like TRW, and so forth, in order to get
information that it couldn't get on its own.
All this is to say that if you're in any way concerned
about restrictions on information sharing, as I hope that you
will be to the degree that the PATRIOT Act and the homeland
security bill create new provisions for information sharing in
the interest of national security, you're going to have to
think about this issue afresh and try to craft sensible
regulations for these new technologies.
Mr. Putnam. Do you presume then that under the current law,
particularly the Privacy Act, that personal information the IRS
is authorized to hold, for example, under the
current law would not be eligible to be transferred to Homeland
Security or INS or a different agency?
Mr. Rosen. As I understand it. I'm not an expert on the
IRS. The IRS has a series of complicated regulations that have
ensured that it especially doesn't lightly share information
with law enforcement. So both by practice and regulation, I am
not sure that there'd be easy access to that data. But the
mere--but you're right to focus on precisely that question and
then extrapolate from there to other sensitive information that
you might not want to be shared without cause, and then you
will get a sense of the degree of the challenge that you face.
Mr. Putnam. Well, Chairman Davis pointed out that
in many of these cases data mining is the collation of
previously existing, perhaps even public data bases and
collections of information and that the amalgamation of that
data is what allows you to get a more useful outcome than the
time and effort and energy involved in searching each one
discretely. The blowup over TIA, characterizing it, I think,
has been over this presumption of the next step of data
collection between public and private and even into the more
personal side of things in terms of habits and patterns based
on purchases or travel destinations and things like that. But
is there anything--is there any effort currently underway other
than what had been a research and development project? Is there
any active program in the Federal Government that is doing that
type of surveillance or data mining?
Mr. Rosen. I understand that the CAPPS II program, which is
Computer Assisted Passenger Profiling Act--I think I have got
the acronym right--is based on very much of a TIA model and is
also trying to collate information which is already in the
public's sphere and make risk predictions for particular
passengers at airports. So that's why I think the TIA model is
one that you will have to think about hard, and I think that
the chairman's notion that all this information is already in
the private domain and therefore is not of concern and can be
analyzed perhaps misses the fact that once the analysis becomes
granular there is a difference between having me watched on the
street when I walk from door to door by a cop or a neighbor and
the government planting a camera on my back that follows me
from door to door and records each of my activities throughout
the day. That reality, the fact that a level of intrusiveness
is inconsistent with the values of a free society is one that
our law is not well set up to deal with. The Supreme Court's
test for invasion of privacy, as you know, Congressman, says
the question is whether there is a subjective expectation of privacy that
society is prepared to accept as reasonable, and as the
invasions become more invasive, people's expectations are
lowered, and Constitutional protections are lowered with them. So I
would resist the chairman's notion that as long as the
information is out there, that any degree of collation and
technical analysis is fair game because there is a point at
which as you have said when very intimate personal information
becomes available to the government on a massive scale that's
quite different from some reporter going down to the courthouse
and rummaging through a couple of paper records 50 years ago.
Mr. Putnam. Mr. Forman.
Mr. Forman. Well, in preparation for this hearing, I did a
run on our major IT investments of the Federal Government. I
did actually two runs, to identify all the data mining and then
to identify all the data warehouses because why do a data
warehouse if you're not going to mine the data. And zero
projects showed up. I didn't believe that we don't have
anything going on with regard to this. So I used a data mining
tool, the search engine on first.gov and got well over 1,000
hits. There's an awful lot of activity going on. Now the
question that seems to me comes down to is do we have anything
going on as an official IT investment that relates to kind of
these random searches. And I'm not aware of any of the kind Dr. Rosen
is so concerned about. It doesn't mean that it's not out there.
I really need to go back and dig deeper. I just have not found
any yet. On the other hand, is there--are there some data
mining applications that are similar to that and I think, yeah,
you'd have to say that the credit card fraud is very similar.
You know the pattern. Same thing on Medicare, Medicaid,
mischarging. We know that we should be spending, for example, a
certain amount for a certain type of procedure. If we see a
company that is routinely overcharging us, we know that it's
not an error, it's a systematic overcharging. And so that's a
very similar type issue and I think in the areas of government
accounts payable, where we know some tolerances and we can use
data mining to identify people who are overcharging or
fraudulently charging us. You do see that and that has gone
through the privacy impact assessment reviews generally.
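A minimal Python sketch of the tolerance check Mr. Forman describes, which separates a vendor who routinely bills above the expected cost from one-off errors. The expected-cost table, the 10 percent tolerance, and the minimum claim count and share are hypothetical parameters, not figures from the testimony.

```python
from collections import defaultdict


def systematic_overchargers(
    claims: list[tuple[str, str, float]],
    expected_cost: dict[str, float],
    tolerance: float = 0.10,
    min_claims: int = 5,
    min_share: float = 0.8,
) -> list[str]:
    """Return vendors that routinely bill above the expected cost.

    A claim counts as an overcharge when it exceeds the expected cost for its
    procedure by more than `tolerance`. A vendor is returned only if it has at
    least `min_claims` claims and at least `min_share` of them are overcharges,
    so an occasional error does not trigger a flag.
    """
    total: dict[str, int] = defaultdict(int)
    over: dict[str, int] = defaultdict(int)
    for vendor, procedure, amount in claims:
        total[vendor] += 1
        if amount > expected_cost[procedure] * (1 + tolerance):
            over[vendor] += 1
    return sorted(
        vendor
        for vendor, count in total.items()
        if count >= min_claims and over[vendor] / count >= min_share
    )


expected = {"x-ray": 100.0}  # hypothetical expected cost per procedure
claims = (
    [("vendor-A", "x-ray", 150.0)] * 5      # overcharges on every claim
    + [("vendor-B", "x-ray", 150.0)]        # one overcharge...
    + [("vendor-B", "x-ray", 100.0)] * 4    # ...among otherwise normal bills
)
print(systematic_overchargers(claims, expected))  # ['vendor-A']
```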
Mr. Putnam. Senator Dockery, hasn't the State of Florida
for some time used a data analysis, data sharing, data mining
type technology to compare and even correlate employment
records with child support payments to develop a list of folks
who are behind in that and whether or not they are cheating the
system?
Ms. Dockery. Yes, that's one of many areas that Florida has
used the technology. Also, in smuggling rings, money
laundering, child molestations, so we--after September 11th,
the technology was already there and it was just a matter
of adapting it to now apply it to homeland security.
Mr. Putnam. So there's a history of civil uses as well as
the criminal uses, at least in the State of Florida.
Ms. Dockery. Exactly.
Mr. Putnam. We have been joined by our ranking member,
gentleman from Missouri, Mr. Clay, and I'd ask unanimous
consent that he be able to enter his statement into the record.
And without objection, show it done, and now recognize him for
his statement and questions.
Mr. Clay. Thank you very much, Mr. Chairman. Let me say,
for Mr. Rosen, the Transportation Security Administration plans
to use data mining to develop terrorist profiling for anyone
who flies. And if Congress goes along with this proposal, what
safeguard should be established at the same time to assure
public rights similar to those provided in the Privacy Act? Let
me also say that--do you believe that airlines are now using
profiles when you go to the kiosk to get your boarding pass,
and you put your card through the kiosk, don't you think that
they examine some of your recent credit activity now and is
profiling occurring now by the airlines?
Mr. Rosen. I do, Congressman. As I understand CAPPS I, or
the computer assisted profiling system that's now in use, it
does indeed analyze publicly available information from the
private and public sector and make risk predictions that can
lead people to be taken aside for different searches. As I
understand, CAPPS II would only increase this profiling by
adding information to the data base. It's difficult to answer
your question adequately, because the Transportation Security
Administration is not forthcoming about exactly what
information it's analyzing and how it's using it, and I think a
crucial part of your oversight role should be to ensure that
the data in the data base is transparent, not the algorithms.
The transportation authority says, well, we can't tell you what
algorithms we're using or the terrorists can beat the system.
What Congress needs to know is not what the algorithms are, but
is this data that the Federal Government is entitled to
analyze.
So when you think about how to regulate this new system,
and this will be a pressing concern, even more so than total
information awareness because that's been tabled for the
moment, think about transparency, accountability. Citizens
should be able to correct errors in their data base. We have
heard a lot this morning about the poor quality of the data.
Imagine being stopped repeatedly on the basis of inaccurate
information and having no remedy, not even being told why
you've been stopped. The application of fair information
practices to the transportation arena is something that
Congress urgently needs to think about because the Privacy Act
in its current incarnation is not adequate to the task.
So I think that this should be a good model for you as you
think about regulation.
Mr. Clay. Thank you very much.
Mr. Forman, along those same lines, airline security has
had a troubled history of racial profiling, even before the
attack on the World Trade Towers. During the 1991 Gulf war
individuals with Middle Eastern names were forced off their
flights despite the fact they were American citizens. Last year
the ACLU testified before Congress of dozens of such incidents,
individuals discriminated against in airports or on airplanes
based on race and heritage. The same people who oversaw the
private contractors who provided discriminatory security are
now designing new systems. What is OMB doing to prevent racial
profiling from continuing in air transportation?
Mr. Forman. Well, let me put this into the context of the
CAPPS II program. The CAPPS II program was not approved by OMB
to proceed at the pace that they seem to want to proceed. I
have a huge spotlight on that project right now. They're late
in getting back to me the information that they need to
proceed. So the issues that we're talking about, the issues
that concern me essentially, CAPPS II could quickly become the
80th watchlist. And I have to take a step back in my job and
say, what value added do we get by yet another island of
automation coming up with something farther away from something
that's going to give us the productivity and effectiveness
we're looking for. You know, the argument that I have heard in
favor of CAPPS and CAPPS II essentially went back to the
question of do you want this random? Because my father, my
grandmother was pulled out of line. And it just didn't seem to
make sense. So there has to be something better. And I think,
and I allude to this in my testimony in the customs arena, in
the package movement, we seem to have figured out this risk paradigm.
Now, I think that's what we are looking for. We're clearly not
looking for a racial profiling. We are looking for a risk
profiling. And there the data that I'm asking for, it's got to
be in the business case, would give us both the technical
programmatic reviews as well as the policy review. We don't
have it yet.
Mr. Clay. In this process you're looking for random, random
profiling and not racial profiling or heritage?
Mr. Forman. We are looking for risk based--.
Mr. Clay. Risk based.
Mr. Forman. Reduction. So not random profiling.
Mr. Clay. So the 9-year-old little girl that goes through,
you may not want to search her, through TSA. You may not want
to search her?
Mr. Forman. As a random selection, that would be correct.
Mr. Clay. Or the 85-year-old grandmother?
Mr. Forman. As a random selection, that would be correct.
We are looking for clear documentation that they have actually
figured out an approach that's going to improve the
productivity. You know, we can spend hundreds of millions of
dollars on a terrific IT system with very pretty screens or
very fruitful data mining techniques. But at the end of the
day, if it somehow does not lower the risk, to me, I would have
to say that is not a good IT investment for the Federal
Government and would recommend against that.
Mr. Clay. OK. All right. Thank you.
Mr. Kutz, does data mining need individual identities in
order to detect patterns of unusual activity? And can the
government develop profiles of unusual activity and then
followup on the specifics with appropriate oversight?
Mr. Kutz. Again, what--most of what we have done so far
relates to credit card data bases, but we have gone beyond that
certainly for the credit card data bases and these were
government credit cards, ones issued by the--on behalf of the
Federal Government to use for government purposes. We did have
that information to basically analyze and put together patterns
of activity, etc. But we have also gone beyond, I was going to
mention an example last year. We testified before
Representative Shays on the JS List suit, which is the current
chem-bio suits that are being used in the Middle East. And what
we identified there was that they were excessing and selling
those goods on the Internet at the same time they were buying
them. And so in that instance, we tried to identify who was
buying these suits and whether or not they might be using them
for something that would be against the government. So we try
to identify, where it is appropriate, individual identities to
followup for investigative purposes.
Mr. Clay. Let me ask you a followup on the question I asked
Mr. Rosen. What exactly do the airlines look for when we go to
the kiosk and put our credit card through? What kind of
financial activity are they looking at? Just out of curiosity.
Mr. Kutz. I couldn't answer that question.
Mr. Clay. You don't know. Does anyone on the panel know
what they're looking at? I mean, is it someone purchasing one-way
tickets, or what exactly?
Mr. Rosen. We know from criminal procedure cases that
there's certainly public information that they look for, one-
way tickets, certain points of origin passengers and the
addresses and phone numbers that you check in with and the
people that you also are traveling with, and information neural
network analysis can be done on that. But we are assuming that
they're respecting legal limitations on, for example, looking
at personally identifiable phone calls or personally
identifiable credit card information. But finding out the
precise answer to that, I know there are groups like some of
the privacy groups in town have Freedom of Information Act
requests to find out exactly what information is being used and
they haven't found the TSA terribly forthcoming, as I
understand it.
Mr. Clay. Do you think they also look at recent purchases
in retail outlets?
Mr. Rosen. As I understand it, they would be restricted
from doing that by the Fair Credit Reporting Act, but you
need a closer parsing of the statute than I can give you for
that.
Mr. Clay. OK. Thank you very much.
[The prepared statement of Hon. Wm. Lacy Clay follows:]
[GRAPHIC] [TIFF OMITTED] T7229.045 through T7229.047
Mr. Putnam. The gentleman raises an interesting point.
Immediately after September 11th I was pulled every single time
I flew because I was not in a frequent flier program, we bought
our tickets at the last minute because of the Congressional
schedule and it was always one-way. And so I got the body
cavity search just about every time I flew. And it's terribly
frustrating and it begs some better type of profiling,
particularly based on risk. And while some Members of Congress
can be shady characters at times, hopefully we wouldn't fit the
risk profile.
Mr. Clay. Hopefully we wouldn't get stopped as often.
Mr. Putnam. Well, hopefully, at least not quite as often.
Every time got a little old.
But let's get back to the people component of this, because
I think everyone has agreed that at the end of the day, no
matter what type of process there is and no matter what type of
information or data is out there, at the end of the day it is
going to require some analysis by a human being. And everyone
in general has seemed to stress the need for quality data as
well as those high quality analytical skills in the personnel.
Can you expand on that a little bit and talk about where we
are in terms of our human capital and the role that they play
in obtaining acceptable results through this process?
Mr. Forman. I think there are some very, very good examples
of the training and culture change that has to take place here.
When you move from a paper based--technically we call knowledge
management environment--to an on-line you're going to use
different interfaces. To do--to have that tool kit, if you
will, generally, people have to become computer literate and
willing to use computers. And that's where we see, especially
in the law enforcement arena, a cultural, maybe generational
change that we are working through. Certainly you'll see that
at the FBI if you look at their use of the TRILOGY program and
the culture of change that the Director is bringing. From my
perspective, in the business case itself I look at that. I look
to see are we investing in training and process reengineering,
change management projects. And when I see generally data
mining or tools that use these knowledge management systems and
support systems tools without any training, that is a flag to
us that this should go on the high risk list. Unfortunately,
that has been the pattern of government. Somebody in the
technology side invests in these tools and then they get ready
to deploy and they find out culturally or from an education
standpoint people don't want to use them. And as in the case of
the INS, then we go on a binge of buying training services. So
I'd say right now, training or the education part has been an
afterthought and it's one that needs a lot more attention and
funding from the up-front. We are trying to put that discipline
in the process.
Mr. Kutz. Mr. Chairman, I would add to that the software
that we had to do the data mining that we have done in the
fraud, waste and abuse type applications which is fantastic.
It's flexible. We certainly train our people, etc. But the real
element that makes it work is the people and the continuous
learning that goes on with even using that software and the
various programs. So we've kind of got a process where as we
look at a system and a program, we understand the program,
understand the controls, understand the vulnerabilities, and we
use that too as a feedback into the actual data mining
strategy, combining auditors and investigators again.
I mentioned Mr. Ryan, who's with me today, who worked for
the Secret Service doing money laundering and credit card
crimes for decades. People with that kind of experience
teaching younger people some of the things that they know
really provides a great atmosphere for learning and developing
all those human capital skills.
Mr. Putnam. Have you an estimate of the savings that have
been derived from that type of data sharing initiative?
Mr. Kutz. From the data mining with respect to the fraud,
waste and abuse?
Mr. Putnam. From the financial management side, yes.
Mr. Kutz. If you go back to the improper payments reporting
that's gone on in Federal Government for years, I think that
areas like Medicare have shown large decreases in estimated
improper payments, and that's I think in part due to the data
mining that's gone on there. Another program that's had a great
deal of oversight in that area is the earned income tax credit,
which had estimates of as much as $8 billion of improper or
fraudulent type payments over the years. So there's certainly
been savings. I don't think it's been quantified necessarily,
but the focus of data mining and the focus on improper payments
going out the door has led to better controls in the government
and probably saved billions of dollars.
Mr. Putnam. Senator Dockery.
Ms. Dockery. Thank you, Mr. Chairman. You bring up a good
point and one that piggybacks on to Congressman Davis. The
information that we are using in tracking criminal activity and
potential terrorist events takes into consideration what used
to be information in various locations. By putting that all
together, it cuts the time down from weeks or months to a
matter of minutes. Once that information has identified a risk,
that's when the investigations begin. So it still comes down to
our human investigators, but instead of spending all their time
digging through paper to find out where to start, they now have
a starting point and spend their time more wisely looking at
those individuals who have come up as a potential risk. So it
does involve a lot of training. We do--the success of what we
do with that information lies within our law enforcement, but
this allows them to spend their time in the investigation and
not in trying to put together a pattern.
Mr. Putnam. How reliable is that data? How often is it
maintained? How often is it upgraded? And we have certainly
learned in our experience with the election that sometimes our
data bases are a little old with respect to eligible voters and
convicted felons and things like that. How good a job does the
State do in maintaining that data base that they depend on?
Ms. Dockery. Well, I am not an expert in that area, but I
would say that we do have systems put in place to purge
information. We have systems put into place to check
information. And the sharing of the information allows us to
hear from other sources in the law enforcement community that
some information may be suspect. So I think our information is
good. Keep in mind that when it lists people with risk factors,
that doesn't point to that person as being guilty of anything.
It points to that person as coming up as maybe a place to start
the investigation.
Mr. Putnam. Mr. Forman, you had referred to geospatial
information earlier in your testimony. In my understanding that
is 1 of the 24 E-Government initiatives, and that would involve
an overlay of information from a variety of sources with regard
to identifying the geography of data. In essence, you overlay
the census data with USGS data and we can look at, you know,
where the population threats are to sensitive estuaries or any
of a million combinations of things by combining all the data
that's collected and stacking it in a meaningful way to derive
answers about what's going on. Isn't that data mining?
Mr. Forman. Yeah. That very definitely will have to require
data mining. There are two approaches to leveraging the
redundant data sources. One is the concept of buy once and use
many. We are definitely proceeding with that. But then where do
you put that data? Is it some is maintained at National Weather
Service, for example, or NOAA and some is maintained at the
U.S. Geological Survey, some is maintained at Environmental
Protection Agency? That kind of peer-to-peer computing model is
the emerging concept of a virtual data warehouse in which case
probably at that program office you would have the metadata
description of where do I go to find this data, what is the
standard, and access that. Regardless of whether it is a
physical data warehouse or this virtual data warehouse to get
access to that data, to make sense of it, data mining
techniques will be used. They have been used, you know, for
example, probably the best example today, if you go to the
Census Web site, American Fact Finder, you can find out
supposedly, I haven't done this, but the theory was you could
find out how many kids of soccer age for second grade soccer
teams, second and third grade soccer teams are in your tract,
you know, in your soccer league area. That wouldn't tell you by
house, but that would tell you maybe by block or by
subdivision.
Mr. Putnam. The opportunities for the beneficial use strike
me as endless. When you compare weather patterns with farm
payments, with crop insurance, perils and things like that,
then maybe we start raising the risk premiums for that area or
maybe we adjust our farm payments so we don't let people plant
in that area until El Nino clears up. I mean the opportunities
are endless to derive information. The Federal Government
spends a fortune collecting information and the fact that it is
for the large part underutilized is distressing from a taxpayer
perspective.
Mr. Rosen, you mentioned earlier that perhaps we should
consider the creation of a special court to consider these
types of requests for specific searches, I believe.
Mr. Rosen. I did. And, Congressman, I would distinguish the
need for a special court when we are talking about the mass
dataveillance of personally identifiable data with the kind of
syndromic surveillance that you and Mr. Forman have just been
talking about. This is indeed a wonderful resource, and there
are no privacy issues when you're making general statements
about weather patterns or census information that's not
personally identifiable or the Centers for Disease Control
using data mining to figure out when people are checking in in
one area with an epidemic or, to give another example that I am
very impressed by, the city of Chicago using data mining to
figure out when crime patterns correspond with particular
weather patterns and sports events and then they can deploy the
cops to that area of town when there is a particular game on
and that's really hot and then they can stop crime. These are
wonderful things that don't raise any privacy issues at all.
That's very different though from, and again if the jargon
isn't helpful let's come up with another term, but mass
dataveillance, suspicionless searches at airports, the total
information awareness model, this is something that needs
regulations.
So my message has been this stuff isn't all good or all bad
and the technology isn't evil, just be especially attuned to
the privacy dangers of suspicionless searches that allow
personal information to be collected in ways that are not
currently available. And for that I think you do need--it
doesn't have to be a special court. You could have a
magistrate. You could have a congressional oversight body.
There are all sorts of ways to do it. But you have to separate
the model as the data is traceable but not identifiable. You
can do those sort of general predictions and risk profiles that
Mr. Forman is talking about, but you can't actually identify me
as the person who's been buying fertilizer unless it really
looks like I'm a terrorist because I've done some other things
that are suspicious, too.
Mr. Putnam. Well, I would remind you and the rest of the
panel and the audience that on May 6th we will convene our next
oversight hearing on this topic, specifically to address TIA,
CAPPS II and some other similar programs.
With that, I will yield back to the gentleman from Missouri
for any questions.
Mr. Clay. Thank you, Mr. Chairman. Senator Dockery, I'd be
interested to know what Florida does to protect individual
rights. Does an individual have a right to know what
information about them is included in the data analyzed in the
factual data analysis? Does the individual have a right to
correct the information in those data bases that is wrong? And
what happens if an individual is singled out because of
incorrect information in one of these data bases? Can you kind
of expound on that for me?
Ms. Dockery. Yes. Thank you. All the information that is in
the data bases are part of Florida's open public records. So
any individual is at any time able to check out those records
and to clarify any misinformation on those records. We don't
keep particular files on any individuals. We look for events,
and risk factors may make somebody come up. Then it goes to a
human being, an investigator to investigate that and they may
find that just because the individual was identified as being--
fitting that risk profile, that person was nowhere near the
event. So there are a lot of safeguards built in. And of
course, we abide by the Federal Code that I mentioned earlier.
Mr. Clay. So the safeguards are there and they're helpful
and people can followup and correct them?
Ms. Dockery. Yes.
Mr. Clay. That sounds like a pretty foolproof system. Thank
you.
Mr. Kutz, what would you recommend Congress do to stop the
racial profiling that is going on in today's airline security?
Do you have any recommendations?
Mr. Kutz. No, that's not an area that I deal with so I
can't comment on that.
Mr. Clay. OK. Well, let me also ask you, you recently did
some work for Congress where you identified several people
getting treatment at veterans hospitals who were listed as
deceased on Social Security records. With further
investigation, you showed that the problem was errors in the
Social Security records. Now, if TSA had those Social Security
records in their data base, those people would be stopped from
flying and they would have no way of knowing why or correcting
the incorrect information. Would you agree that any system used
by TSA has to allow for the public to know what information is
being used to rate them and what other safeguards should be in
place?
Mr. Kutz. Your question gets back to the issue I think Mr.
Forman talked about, about data quality in the Federal
Government, and we did indeed find, and this was from military
treatment facilities, we had compared people who were served at
some military treatment facilities with a Social Security death
file and there were some hits that came out of people that
appeared to be dead that were not really dead. And so there
were errors in the Social Security death file, and that
certainly raises issues about what that file is used for. That
file is certainly shared with others. It's sold to others. And
the Social Security Inspector General has reported other
examples of errors with that.
So this issue of Federal Government data base reliability
is a major challenge here in all applications of data mining
going forward. And I had some experiences I was going to share
with you on the IRS, where I used to be responsible for the IRS
financial audit, and we found lots of instances there with the
errors in the system there were people who were being pursued
and having taxes collected from them but didn't owe any taxes.
At the same time we were issuing lots of refunds to people who
weren't due refunds.
So, again you've got lots of issues with data quality and I
would say that the Federal Government is decades behind the
private sector in that area. I got to go to Bentonville, AR
within the last year to visit the Wal-Mart headquarters and it
was quite fascinating to see the technology that they use in
their inventory supply chain management, and when I compare
that to where the Federal Government is with its inventory
management again it's just decades behind. And they were able
to tell us at Wal-Mart headquarters how many tubes of
toothpaste there were at the Fairfax Wal-Mart here in 1 minute.
And not only that, but how many they had actually stocked in
the last week, how many had been bought in the last week, just
tremendous technology, whereas again in the Federal Government
I'll go back to the JS List, the chem-bio suits used by our
troops. Once those left the defense warehouses into the
military services, complete visibility was lost and we were
unable to determine where these chem-bio suits were, some from
prior years that had been defective through a fraud scheme by a
private sector company.
Mr. Clay. You do make recommendations to the different
agencies how to correct the errors that you all find?
Mr. Kutz. Right. That's the value of data mining. It helps
us to make valuable recommendations to Federal agencies to
improve their control systems, etc., to try to minimize the
risk of these things happening that I've just described.
Mr. Clay. What was your recommendation to the Social
Security Administration?
Mr. Kutz. We didn't make any recommendations to them
because the Inspector General had already made recommendations
to them, and they are working to clean up that data base.
Mr. Clay. I see. Thank you very much.
Mr. Forman, would you support legislation that prohibited
the TSA from using any system that used profiles based on race,
religion, national origin, gender, sexual orientation or
proxies for those characteristics?
Mr. Forman. I forever remember my time on the Hill and a
good staffer on detail from GAO who has been a staffer to this
committee before, the devil's in the details. I'd have to see
the specifics.
Mr. Clay. See the specifics. OK. Thank you very much. And
thank you, Mr. Chairman.
Mr. Putnam. Thank you, Mr. Clay. And Mr. Kutz, when Mr.
Forman gets done with the Federal Government, Bentonville, AR
is going to be sending executives up here to tour the Federal
Government to see how efficient we are. Isn't that right?
Mr. Forman. Absolutely.
Mr. Putnam. I want to thank the witnesses for their
outstanding testimony and for the questions of the
subcommittee. We will be focusing very, very directly on this
topic throughout the 108th Congress. Our next hearing on the
topic is May 6th to look at some of the specific issues that
have been raised. But this is very clearly on my radar screen
and something that we will continue to monitor very closely. It
is an important issue. It holds the promise of tremendous
potential benefits to our taxpayers in eliminating waste, fraud
and abuse and bringing better financial management practice,
and frankly it raises some red flags in terms of protecting
those very same taxpayers' privacy and personal information. So
we will do what we can to determine where that fine line is and
attempt to walk it.
So I understand Mr. Rosen has to be out to teach his class,
but do any of you have one last question that you wish we had
asked you that you want to answer?
Senator Dockery.
Ms. Dockery. It's not a question. But, Mr. Chairman, if I
could just take this minute since I don't have the opportunity
to speak to a congressional committee every day, I want to
thank you on behalf of the States for what you do in Congress,
to send money down to the States to allow us to do the job of
protecting the residents in our State against any threat to our
homeland security, and I would ask that in the future when
moneys are coming down from the Federal Government, the more
flexibility you could give us in spending those moneys and if
you could have those moneys go through the State rather than
directly to the local governments so that we can have a better
feel for what's coming down and avoid duplication of effort.
But thank you for all that you do for us and thank you for
letting me participate today.
Mr. Putnam. Thank you, Senator.
Dr. Louie.
Dr. Louie. Yeah. This is on-line data collection. The point
about individual data elements are not necessarily very
important in themselves, but you should also look at how this
data is used as if it were classified material. Individual
elements in themselves are not necessarily important. It's the
combination of multiple elements that make it an interesting
issue as far as questionable invasion of privacy or whether it
raises flags about how that data is being used in the case of
are we really profiling or are we looking at a risk assessment.
Should we look at race and national origin? Probably yes. In
themselves they are not necessarily the most important item,
but in combination with other data elements they may raise a
level of risk, and it needs to be considered in that manner. It
needs to be viewed not as an individual component, but the sum
of all the components looked at in terms of evaluating whether
this information is something that warrants looking into or not
looking into.
So does it make it actionable? That's the way you need to
look at the collection of data, not the individual elements
necessarily.
Thank you for the opportunity.
Mr. Putnam. My pleasure. Thank you. Anyone else?
Mr. Kutz. Yeah, I would just say I appreciate you inviting
us to the hearing today. Since we work for Congress, we
certainly believe data mining is a tool that's going to be able
to help us better serve you and to do better audits and
investigations on your behalf. So I appreciate that.
Mr. Putnam. Thank you. Mr. Rosen. Mr. Forman. We appreciate
your efforts. I'm reminded that in the event there are
additional questions the record will remain open for 2 weeks
for submitted answers. And with that, the meeting is adjourned.
[Whereupon, at 11:30 a.m., the subcommittee was adjourned.]
[Additional information submitted for the hearing record
follows:]
[GRAPHIC] [TIFF OMITTED] T7229.048 through T7229.061