b"<html>\n<title> - DATA MINING: CURRENT APPLICATIONS AND FUTURE POSSIBILITIES</title>\n<body><pre>[House Hearing, 108 Congress]\n[From the U.S. Government Printing Office]\n\n\n\n\n       DATA MINING: CURRENT APPLICATIONS AND FUTURE POSSIBILITIES\n\n=======================================================================\n\n                                HEARING\n\n                               before the\n\n                SUBCOMMITTEE ON TECHNOLOGY, INFORMATION\n                POLICY, INTERGOVERNMENTAL RELATIONS AND\n                               THE CENSUS\n\n                                 of the\n\n                              COMMITTEE ON\n                           GOVERNMENT REFORM\n\n                        HOUSE OF REPRESENTATIVES\n\n                      ONE HUNDRED EIGHTH CONGRESS\n\n                             FIRST SESSION\n\n                               __________\n\n                             MARCH 25, 2003\n\n                               __________\n\n                           Serial No. 108-11\n\n                               __________\n\n       Printed for the use of the Committee on Government Reform\n\n\n  Available via the World Wide Web: http://www.gpo.gov/congress/house\n                      http://www.house.gov/reform\n\n                                 ______\n\n87-229              U.S. GOVERNMENT PRINTING OFFICE\n                            WASHINGTON : 2003\n____________________________________________________________________________\nFor Sale by the Superintendent of Documents, U.S. Government Printing Office\nInternet: bookstore.gpr.gov  Phone: toll free (866) 512-1800; (202) 512-1800  \nFax: (202) 512-2250 Mail: Stop SSOP, Washington, DC 20402-0001\n\n                     COMMITTEE ON GOVERNMENT REFORM\n\n                     TOM DAVIS, Virginia, Chairman\nDAN BURTON, Indiana                  HENRY A. 
WAXMAN, California\nCHRISTOPHER SHAYS, Connecticut       TOM LANTOS, California\nILEANA ROS-LEHTINEN, Florida         MAJOR R. OWENS, New York\nJOHN M. McHUGH, New York             EDOLPHUS TOWNS, New York\nJOHN L. MICA, Florida                PAUL E. KANJORSKI, Pennsylvania\nMARK E. SOUDER, Indiana              CAROLYN B. MALONEY, New York\nSTEVEN C. LaTOURETTE, Ohio           ELIJAH E. CUMMINGS, Maryland\nDOUG OSE, California                 DENNIS J. KUCINICH, Ohio\nRON LEWIS, Kentucky                  DANNY K. DAVIS, Illinois\nJO ANN DAVIS, Virginia               JOHN F. TIERNEY, Massachusetts\nTODD RUSSELL PLATTS, Pennsylvania    WM. LACY CLAY, Missouri\nCHRIS CANNON, Utah                   DIANE E. WATSON, California\nADAM H. PUTNAM, Florida              STEPHEN F. LYNCH, Massachusetts\nEDWARD L. SCHROCK, Virginia          CHRIS VAN HOLLEN, Maryland\nJOHN J. DUNCAN, Jr., Tennessee       LINDA T. SANCHEZ, California\nJOHN SULLIVAN, Oklahoma              C.A. ``DUTCH'' RUPPERSBERGER, \nNATHAN DEAL, Georgia                     Maryland\nCANDICE S. MILLER, Michigan          ELEANOR HOLMES NORTON, District of \nTIM MURPHY, Pennsylvania                 Columbia\nMICHAEL R. TURNER, Ohio              JIM COOPER, Tennessee\nJOHN R. CARTER, Texas                CHRIS BELL, Texas\nWILLIAM J. JANKLOW, South Dakota                 ------\nMARSHA BLACKBURN, Tennessee          BERNARD SANDERS, Vermont \n                                         (Independent)\n\n                       Peter Sirh, Staff Director\n                 Melissa Wojciak, Deputy Staff Director\n              Randy Kaplan, Senior Counsel/Parliamentarian\n                       Teresa Austin, Chief Clerk\n              Philip M. Schiliro, Minority Staff Director\n\n   Subcommittee on Technology, Information Policy, Intergovernmental \n                        Relations and the Census\n\n                   ADAM H. PUTNAM, Florida, Chairman\nCANDICE S. MILLER, Michigan          WM. 
LACY CLAY, Missouri\nDOUG OSE, California                 DIANE E. WATSON, California\nTIM MURPHY, Pennsylvania             STEPHEN F. LYNCH, Massachusetts\nMICHAEL R. TURNER, Ohio\n\n                               Ex Officio\n\nTOM DAVIS, Virginia                  HENRY A. WAXMAN, California\n                        Bob Dix, Staff Director\n                 Chip Walker, Professional Staff Member\n                 Lori Martin, Professional Staff Member\n                      Ursula Wojciechowski, Clerk\n           David McMillen, Minority Professional Staff Member\n\n\n                            C O N T E N T S\n\n                              ----------                              \n                                                                   Page\nHearing held on March 25, 2003...................................     1\nStatement of:\n    Dockery, State Senator Paula, majority whip, Florida State \n      Senate.....................................................     7\n    Forman, Mark A., Associate Director, Information Technology \n      and Electronic Government, Office of Management and Budget.    23\n    Kutz, Gregory, Director, Financial Management and Assurance, \n      U.S. General Accounting Office.............................    32\n    Louie, Jen Que, president, Nautilus Systems, Inc.............    15\n    Rosen, Jeffrey, George Washington University Law School, \n      legal affairs editor of the New Republic...................    55\nLetters, statements, etc., submitted for the record by:\n    Clay, Hon. Wm. Lacy, a Representative in Congress from the \n      State of Missouri, prepared statement of...................    77\n    Dockery, State Senator Paula, majority whip, Florida State \n      Senate, prepared statement of..............................    
10\n    Forman, Mark A., Associate Director, Information Technology \n      and Electronic Government, Office of Management and Budget, \n      prepared statement of......................................    26\n    Kutz, Gregory, Director, Financial Management and Assurance, \n      U.S. General Accounting Office, prepared statement of......    34\n    Louie, Jen Que, president, Nautilus Systems, Inc., prepared \n      statement of...............................................    17\n    Putnam, Hon. Adam H., a Representative in Congress from the \n      State of Florida, prepared statement of....................     4\n    Rosen, Jeffrey, George Washington University Law School, \n      legal affairs editor of the New Republic, prepared \n      statement of...............................................    58\n\n \n       DATA MINING: CURRENT APPLICATIONS AND FUTURE POSSIBILITIES\n\n                              ----------                              \n\n\n                        TUESDAY, MARCH 25, 2003\n\n                  House of Representatives,\n   Subcommittee on Technology, Information Policy, \n        Intergovernmental Relations and the Census,\n                            Committee on Government Reform,\n                                                    Washington, DC.\n    The subcommittee met, pursuant to notice, at 9:30 a.m., in \nroom 2154, Rayburn House Office Building, Hon. Adam Putnam \n(chairman of the subcommittee) presiding.\n    Present: Representatives Putnam, Miller, Turner, and Clay.\n    Staff present: Bob Dix, staff director; John Hambel, senior \ncounsel; Chip Walker and Lori Martin, professional staff \nmembers; Ursula Wojciechowski, clerk; David McMillen, minority \nprofessional staff member; Jean Gosa, minority clerk; and \nEarley Green, minority chief clerk.\n    Mr. Putnam. 
A quorum being present, the Subcommittee on \nTechnology, Information Policy, Intergovernmental Relations and \nthe Census will come to order.\n    Good morning and welcome to the first in a planned series \nof hearings addressing the important subject of data mining \ntechnology or ``factual data analysis,'' as some might refer to \nit.\n    Before we get into my opening statement, considering the \nevents of the world today and the enormous pressures that this \nCongress and our President are under, I would ask that we pause \nfor a moment of silence.\n    [Moment of silence.]\n    Mr. Putnam. Thank you.\n    There are a number of proven uses for this data mining \ntechnology which has played a prominent role in many arenas, \npublic and private, for years. This morning we will work to \ndefine the technology itself and examine the parameters of its \napplication. There is no secret that some have expressed \nconcerns about the role of data mining, particularly in the \ncontext of privacy intrusions. We will attempt to explore the \nmanner in which this technology will continue to be a valuable \ntool in a variety of governmental uses, not just those of \nnational security, while also acknowledging the public interest \nin protecting the privacy of personal information. Data mining \nis a technology that facilitates the ability to sort through \nlarge amounts of information through data base exploration, \nextract specific information in accordance with defined \ncriteria, and identify patterns of interest to its user.\n    As I understand the technology, the user has the ability to \ntailor a data mining program to a particular purpose by \nselecting a number of different data bases to search and \nsetting the criteria for that search. Data mining technology \nhas been utilized successfully for many years in both public \nand private sectors to identify and analyze data that might \notherwise be overlooked or inaccessible. 
Examples of the \nvariety of commercial or governmental uses associated with data \nmining software would include businesses being able to develop \na targeted marketing campaign in an effort to identify \nprospective customers; government agencies expanding \nopportunities to track down tax evaders; detection of Medicaid \nor Medicare fraud; and corporations using this tool to estimate \nspending in revenue more accurately, just to name a few.\n    For example, a mortgage refinancing lender may seek to \ndetermine potential candidates for their services by attempting \nto identify mortgage holders who have lived in their homes for \na certain period of time in a particular geographic location \nwith a market value range of property at a certain level in \norder to target a special refinancing rate offer. As you can \nimagine, this type of technology is invaluable to a number of \ninstitutions. Because it is such a vast and evolving field, the \nsubcommittee is very interested in exploring the uses and \neffects of this technology in subsequent followup hearings to \naddress more particular applications.\n    While data mining may have many legitimate and worthwhile \nuses, we must always be vigilant of any potential encroachment \non the privacy of the American public. We have great \nresponsibilities as elected officials. We must protect the \nAmerican ideals of life, liberty, and freedom. At times these \nideals would seem to come into conflict with one another, and \nit's our job to ensure that we do all we can to protect the \npublic while maintaining the faith entrusted to us by the \nFounding Fathers to protect the right of the people to privacy \nand freedom. Ben Franklin once said, ``Those who would give up \nfreedom for security deserve neither.''\n    I would like to welcome the following witnesses who are \noffering their expert testimony before us today: The Honorable \nPaula Dockery, Florida State Senator; Dr. 
Jen Que Louie, \npresident of Nautilus Systems, Inc.; Mark Forman, Associate \nDirector of Information Technology and Electronic Government, \nOffice of Management and Budget, our Nation's CIO; Gregory \nKutz, Director of Financial Management and Assurance, General \nAccounting Office; and Jeffrey Rosen, associate professor of \nthe George Washington University Law School, legal affairs \neditor of the New Republic. Mr. Armey was unable to be with us \ntoday.\n    Interest in expanding the use of this technology at the \nFederal level of government has become more widespread as we \nlook to use modern technology to improve intergovernmental \ncommunications and national security. From our oversight \nperspective as the subcommittee, we have a special interest in \nlearning the pros and cons to data mining technology as well as \nhow its use could be or is being expanded at the Federal level.\n    We appreciate the participation of today's witnesses as \nthey provide tremendous information to the subcommittee on this \nimportant topic, and we thank you again for taking the time out \nof your busy schedules. Today's hearing can be viewed live via \nWebCast by going to reform.house.gov and clicking on the link \nunder ``Live Committee Broadcast.''\n    As we await the ranking member from Missouri, I want to \nrecognize our vice chair, Candice Miller from Michigan, for her \nopening statement. Gentlelady from Michigan.\n    [The prepared statement of Hon. Adam H. Putnam follows:]\n\n    [GRAPHIC] [TIFF OMITTED] T7229.001\n    \n    [GRAPHIC] [TIFF OMITTED] T7229.002\n    \n    Mrs. Miller. Thank you, Mr. Chairman.\n    I want to thank the witnesses for coming today, and Mr. \nForman, good to see you again. I'm sure this committee will be \nseeing certainly a lot of you.\n    As I mentioned at the last committee hearing, I am so \nparticularly interested in the subjects, and this data mining \nis a fascinating one. 
I had been the Secretary of State in \nMichigan where not only did I have the elections there with all \nthe registered voters, I also did the motor vehicle \nadministrative kinds of things. We had a big data base in our \nState with everybody who had a boat, a snowmobile, and a \ntrailer and a car and a truck and everything, and there was \nalways a lot of consternation about what was government doing \nwith this information; who had the information; for what \npurposes. If you wanted to get licensed in Michigan, you had to \ngive me certain amounts of information. But what was government \ndoing with it and what was the citizens' expectation of what we \nwould do with all of that data?\n    There was a time when our State--and I know many States \nstill do this--sell the information. It is a huge revenue \nsource, of course. But I don't think citizens are normally \nexpecting that the government will be selling their personal \nand private information. And so there is a consternation about \nwho can access the information, how will it be massaged, how \nwill it be utilized, and certainly on the part of the citizens, \ninvasion of personal privacy by ``Big Brother,'' by government.\n    As we march down the information highway, sometimes there \nis a slippery slope there that I think all of us in government \nat the Federal level, the State level, the county level, anyone \nthat has interaction with these various data, that we always \nkeep that uppermost in our mind about invasion of personal \nprivacy.\n    With that being said, the technology is certainly out there \nand it can be utilized to make huge advances in society, and \nthere are so many things in every layer of government that \ncould be done so much better if we were able to use the \ntechnology properly. So I am very pleased to see you all today. \nThank you for coming. I certainly look forward to hearing your \ntestimony this morning. Thank you.\n    Mr. Putnam. I thank the gentlelady. 
She brings tremendous \nexperience from her days as Secretary of State and work in \nbringing that office into the Information Age.\n    We are joined by a former mayor, the gentleman from Ohio, \nMr. Turner. For your opening statement you are recognized.\n    Mr. Turner. Thank you, Mr. Chairman. I am particularly \ninterested in this area. NCR is located in Dayton, OH, which is \na leading technology company in this issue of data mining for \nthe private sector. And recently they hosted a forum on the \nissue of data mining applications, taking them from the private \nsector and applying them to government issues. And it was an \ninteresting discussion because they began in telling us that \nWal-Mart, at the end of the day, can tell us how many socks \nthey have sold; but we are not necessarily able to tell \nourselves, in reference to foreign visitors, how many visas \nhave expired today and who they are.\n    So the possible applications of data mining on very simple \ntasks that clearly do not violate issues of privacy is a wide \nopen field which we need to pursue vigorously.\n    Also the issue that was fascinating to me in their \ndiscussion is how you look at the process of data mining, not \nlooking first at what data that you have, but looking at what \nquestions do you want answered, and that the issue of \ntechnology is there. The issue of the application of technology \nis demonstrated in the private sector; the issue before us in \ngovernment is to begin the process of asking what questions do \nwe need to know answers to and then turning to the experts in \ndata mining that have applied it in the private sector to \nassist us so we can have those answers in the public sector.\n    Thank you.\n    Mr. Putnam. I thank the gentleman.\n    We will now take the testimony from the witnesses. Each has \nbeen very gracious to prepare written testimony which will be \nincluded in the record of this hearing. 
And I have asked each \nof you to summarize your presentation into 5 minutes, if you \ncould, to leave ample time for questions and answers. Witnesses \nwill notice that there is a timer with a light on the witness \ntable. Green light means you begin your remarks, the yellow \nlight means it's time to wrap up, and the red light means that \nwe hit the ejection seat.\n    In order to be sensitive to everyone's time schedule, we \nask that you cooperate with us in our time schedule. As is the \npolicy of the Committee on Government Reform, all witnesses \nwill be sworn in. So I'll ask you to rise, please, and raise \nyour right hands.\n    [Witnesses sworn.]\n    Mr. Putnam. All witnesses responded in the affirmative. \nThank you.\n    I would like to introduce our witnesses first and then call \non them for their testimony, followed by questions. We begin \nour panel with an old colleague of mine and a very dear friend \nfrom Florida, State Senator Paula Dockery. Florida is one of \nthe States where data mining techniques have been used in \nseveral areas, and quite successfully. Senator Dockery's \nexperience will lend a very helpful perspective to us today. \nShe serves as majority whip in the Senate as well as chairman \nof the Committee on Homeland Security and Seaports. Senator \nDockery, welcome to the committee and we look forward to your \ntestimony, please.\n\n   STATEMENT OF STATE SENATOR PAULA DOCKERY, MAJORITY WHIP, \n                      FLORIDA STATE SENATE\n\n    Ms. Dockery. Thank you, Mr. Chairman, and good morning, Mr. \nChairman and members of the committee. Thank you very much for \nthe opportunity to be here today not only to share with you \nwhat we think we are doing right in the State of Florida, but \nalso to be part of this distinguished panel and to learn from \nthe experts to my left. I apologize in advance. 
I'm going to be \nreading so I can make my time limit, and I'm going to probably \nhave to read pretty fast because I timed it at 7 minutes. But I \nwould like to get started with that.\n    The issue of enhanced information sharing by our law \nenforcement and public safety professionals is at the forefront \nin our war against terrorism in our efforts to keep America \nsafe. Florida, I believe, has taken a strong leadership role in \nthis effort, one that can serve as a model for other States. \nThis model and its reliance on data mining is the focus of our \ndiscussion today.\n    Florida uses the term ``factual data analysis'' to describe \nthis information processing system. This process includes the \ncollection of information from multiple sources. Once this \ninformation is processed, analyzed, and evaluated, the \nresulting product represents the intelligence needed to assist \nlaw enforcement. Intelligence can then be used in a \nproactive and preventive approach to detect criminal patterns, \ncrime trends, modus operandi, financial criminal activity and \ncriminal organizations.\n    Data collection is much different today than in years past. \nThe number of data bases and the information contained there is \nimmense, as is the ability to effectively and efficiently \nanalyze available data in a timely manner. The results can be \noverwhelming. Factual data analysis plays a crucial role in \nfiltering the vast quantity of information by separating the \nsignificant data from the insignificant data. Some individuals \nand groups voice concern for perceived loss of privacy and a \nperceived attempt to foster the examination of private \ninformation.\n    Florida's law enforcement efforts are aimed at utilizing \nonly that specific data which law enforcement already has a \nlegal right to use, while doing so in a proficient, \nprofessional, and expeditious manner. Many safeguards have been \nimplemented to ensure appropriate use of information. 
These \ninclude user name and password protection, user training, \nagency user agreements, system audits, quality control reviews \nand established purge criteria.\n    Florida's criminal intelligence systems are operated in \ncompliance with standards established by 28 Code of Federal \nRegulations, Part 23. This regulation was written to protect \nthe privacy rights of individuals and to encourage and expedite \nthe exchange of criminal intelligence information between and \namong law enforcement agencies. The regulation provides \noperational guidance for law enforcement agencies in five \nprimary areas.\n    Prior to the September 11th attacks, Florida utilized \nfactual data analysis on criminal investigations through the \nFinancial Crime Analysis Center at the Florida Department of \nLaw Enforcement. The Center integrates and analyzes financial \ndata in partnership with local and Federal criminal justice \nagencies to identify and combat financial crimes.\n    The Center has developed a ``data warehouse'' which \ncontains information from various sources already available to \nlaw enforcement. As part of the analytical process, the Center \nutilizes specialized software to identify anomalies associated \nwith financial transactions. Analytical personnel and \ninvestigators then examine the results to determine if the \ninformation is related to a crime. The software currently used \nby law enforcement agencies provides a graphical representation \nof suspicious activity identified by financial services \ncompanies. This method ensures that the user does not see \nindividual records, only the result, a safeguard that we \nbelieve is very important.\n    The pattern of behavior is a key element of the decision \nprocess of whether to investigate further. Users of this system \nare trained to identify behaviors of known criminal activity \nduring all stages of money laundering. 
It is important to note \nthat by FDLE guidelines, reasonable suspicion is necessary \nbefore initiating an investigation.\n    When reasonable suspicion is developed, analyzed data are \nsupplied to local, State, and Federal law enforcement agencies as \nwell as to other States for possible investigation. This \nproactive approach results in increased teamwork amongst law \nenforcement entities as well as a force multiplier effect for \nthe investigative process. FDLE agents regularly travel to \nother States to investigate common targets.\n    Arizona and Florida are known as the two most effective \nStates in conducting these types of proactive investigations.\n    After the September 11th attacks, FDLE integrated this \nprocess and applied it toward the fight against terrorism. FDLE \nemployed the assistance of public corporations that have access \nto civil data records. In certain domestic security related \nsituations, FDLE has contracted with nationally recognized \npublic search businesses to analyze the records based on \ncriteria supplied by law enforcement. After the data is \nprocessed, the results are provided to law enforcement for \nfurther review. To ensure that the results are as indicative as \npossible, a mathematical analysis is used and includes as many \nas 14 criteria, producing a probability score for criminal \nbehavior. Prior to additional investigation or dissemination, \nintelligence analysts and investigators examine only the \nresults with the highest scores. This information can be used \nto identify, locate, target and monitor terrorists and other \ncriminals. This ability is essential if future terrorist events \nare to be prevented.\n    Florida has partnered with a vendor, Seisint Technologies, \nto provide the data analysis tools using both public and \nprivate data. Over several years, Seisint Technologies has \nacquired technology and data from multiple sources useful to law \nenforcement. 
Following the terrorist attacks of September 11th, \nSeisint focused on helping local, State, and Federal law \nenforcement agencies locate and track individuals who might be \na threat to the United States. As a result of their partnership \nwith Florida law enforcement, a customized investigative tool \nwas developed. This system has already proven useful in that a \nreview of the known information, intelligence, and reported \nactivities of the 19 hijackers associated with the terrorist \nevents of September 11th identified several common and \nassociated variables. This system has proven useful in Florida, \nbut the need for timely sharing and exchange of information \nnationwide remains a critical need.\n    Mr. Putnam. Thank you, Senator Dockery.\n    [The prepared statement of Ms. Dockery follows:]\n\n    [GRAPHIC] [TIFF OMITTED] T7229.003\n    \n    [GRAPHIC] [TIFF OMITTED] T7229.004\n    \n    [GRAPHIC] [TIFF OMITTED] T7229.005\n    \n    [GRAPHIC] [TIFF OMITTED] T7229.006\n    \n    [GRAPHIC] [TIFF OMITTED] T7229.007\n    \n    Mr. Putnam. I would like to introduce our next witness, Dr. \nJen Que Louie. He has spent over 25 years working with data \nanalysis systems, specifically with large data base systems, \ndata warehousing and data mining. Some of his projects include \ndesigning, developing, and refining military logistics and C3I \ncapability models for the Department of Defense. He has \ndesigned and implemented medical system diagnostic and analysis \nprograms, knowledge- and rules-based business systems, work \nflow process and analysis systems, image management storage and \nretrieval systems, and emergency management information \nsystems. Dr. Louie is president of Nautilus Systems, which is \nlocated in Fairfax, VA. We look forward to your testimony. \nWelcome to the subcommittee.\n\n STATEMENT OF JEN QUE LOUIE, PRESIDENT, NAUTILUS SYSTEMS, INC.\n\n    Dr. Louie. Good morning, Mr. Chairman and distinguished \nmembers of the subcommittee. 
Thank you for the opportunity to \ntestify today on data mining current applications and future \npossibilities. Other than my prepared statement, this is a \nquick summarization of data mining in general.\n    It is difficult to come up with a universal definition for \ndata mining. One consistent focus of data mining has been \nbasically that it is an analytic process with an ultimate goal \nof prediction. You are looking to find something that is going \nto be actionable, that is going to get you somewhere. In a \nnutshell, data mining is an extraction of knowledge or \ninformation from data. And at first glance, this may not seem \nlike a very powerful utility, but unlike mere data, knowledge \nleads to incisive decisions and previously unknown \nrelationships that could have a bearing on your decision \nprocess.\n    Data mining, unfortunately, like artificial intelligence of \nthe early eighties, is getting a lot of media hype and we will \ncall it slightly exaggerated benefits or feasibility of it. And \nwhat I usually tell my clients is the first fallacy is data \nmining tools. Data mining is a process. It is not a specific \ntool, and the process will generally raise more questions than \nit does produce answers. And while data mining does have the \nability to uncover patterns that can be remarkable, it still \nrequires a human with skills, analytical skills, to interpret \nthe meaning of what patterns you are looking at.\n    And my usual examples are a Dilbert cartoon where the \nmarketing person is telling the CEO, ``Our product is always \nseen with people who have flu-like symptoms.'' And the product \ndevelopment team says the reason they have flu-like symptoms \nis because they are taking the product. 
So how you interpret \nthe data, how you apply it is an important part of how you \napply data mining.\n    Data mining is sometimes advertised and portrayed as being \nan autonomous process; that once you have these rules that you \ndon't require analysts, and that is another fallacy. Another \nfallacy is that it will pay for itself very rapidly. While \nthere is sometimes, we will call it articles, portraying very \nhigh returns for the investment in data mining, those are not \nvery common. And yes, you can achieve a lot of return on your \ninvestment with data mining. Credit card fraud is one. Tax \nevasion is another. Money laundering. There are several tools \nthat are out in the market that require a lot of extensive \ncapabilities. Our company has worked with FinCEN on clearing a \nlot of their caseloads. Those, I would say, are great paybacks \nfor the amount of money invested in those areas.\n    Data mining also sometimes raises the question about \nmissing data. Sometimes the data that's missing is more \ninteresting than the data that is there, and that provides some \nother insights. Meeting your data mining expectations, planning \nis the single most important step in any data mining effort. \nYou have to know and understand what the consumers of your \ninformation product need and basically deliver it. Once you \ndetermine what that is, the next thing in your investment in \nyour data mining effort is the environment that you run it in. \nIt should be what we call the best you can get, the fastest you \ncan get, the most storage you can get, and always allow \nyourself plenty of time to review and analyze the data and look \nat all the facets that are there in order to determine that you \nare delivering the right message, and it is actionable in the \ndirection that user needs that information to be.\n    So, my quick summation: Data analysis is concerned with the \ndiscovery and examination of patterns and associations found \nwith data. 
There are various ways to achieve this objective, \nbut all share the same fundamental notion that patterns \nexamined are present in the data. Also remember that what is \nnot in data can be just as interesting in certain situations, \nand more useful to know.\n    Data mining is a process that involves multiple analytical \ntools, methodologies driven by the needs of the information \nproduct's consumer. The quality of information is directly \nproportional to the trustworthiness and quality of that data. \nThe confidence of the prediction is dependent upon the data \nmining practitioner's subject matter expertise and insight to \ndeliver actionable results. The data mining process is highly \ncomputational, takes time; therefore, planning the approach and \nselection of tools is influenced by the needs of the consumer. \nThank you.\n    Mr. Putnam. Thank you very much, Dr. Louie.\n    [The prepared statement of Dr. Louie follows:]\n\n    [GRAPHIC] [TIFF OMITTED] T7229.008\n    \n    [GRAPHIC] [TIFF OMITTED] T7229.009\n    \n    [GRAPHIC] [TIFF OMITTED] T7229.010\n    \n    [GRAPHIC] [TIFF OMITTED] T7229.011\n    \n    [GRAPHIC] [TIFF OMITTED] T7229.012\n    \n    [GRAPHIC] [TIFF OMITTED] T7229.013\n    \n    Mr. Putnam. Our next witness is Mark Forman. He served as \nAssociate Director for Information Technology in E-Government \nfor the Office of Management and Budget, a position he has held \nsince June 2001. He is effectively in charge of information \ntechnology oversight for the entire Federal Government. And \nhis--he has a background in the private sector from Unisys and \nIBM as well as work at the Senate Governmental Affairs \nCommittee staff. He is an invaluable resource on all of our IT \nissues, and we believe his insight from the Federal perspective \nwill be enlightening to us as well. So with that, Mr. Forman, \nyou are recognized.\n\n STATEMENT OF MARK A. 
FORMAN, ASSOCIATE DIRECTOR, INFORMATION \nTECHNOLOGY AND ELECTRONIC GOVERNMENT, OFFICE OF MANAGEMENT AND \n                             BUDGET\n\n    Mr. Forman. Thank you, Mr. Chairman, and members of the \nsubcommittee. Thank you for the opportunity to appear and to \ndiscuss the administration's views on data mining. And I also \nwant to thank you for taking a very rational, well-balanced \napproach in exploring data mining issues and opportunities. \nWhile there are many definitions of data mining, the \ncommittee's definition is generally accepted and we believe \nhelpful in defining the issues and its challenges.\n    I would like to start by talking about private sector uses, \nhow we are using it in the Federal Government, and then the \nchallenges and opportunities. The private sector uses data \nmining to make sense of a wide breadth of data. Some examples \nare customer relationship management. Applied to customer \nrelationship management, data mining is used to analyze \ndisparate customer data and provide insights into customer \nneeds and wants. Companies that use data mining shorten \nresponse time to market changes, which allows for better \nalignment of their products with the customer needs. They do \nthis to increase revenue performance and allocate investment to \nproducts that meet customer demand effectively.\n    Fraud detection. Companies use software that provides \ncomprehensive transaction-level financial reporting and \nanalysis to support automatic fraud detection and proactive \nalerting.\n    Retail analysis and supply chain analysis. Companies such \nas Wal-Mart are broadly recognized for analyzing sales trends. 
\nRetail analysis and supply chain analysis can be used to \npredict the effectiveness of promotions, decide which products \nto stock in each store, and help managers understand cost and \nrevenue trends in order to adjust pricing and promotion in \nanticipation of changes in marketplace conditions.\n    Medical analysis and diagnostics. The health care industry \nuses analysis to predict the effectiveness of surgical \nprocedures, medical tests and medications. High-risk segments \nof the population can be identified and targeted for proactive \ntreatment. The result is improved quality of life for patients, \nreduced stress on hospitals and insurance providers using such \nactivities as proactive approaches to healing, I think it is \nfair to say, and I have many more examples of the commercial \nuse of data mining. All of them deal with how fast we can \nunderstand what customers need, and the Federal Government \nwould be well advanced to be able to respond more quickly to \nwhat our citizens need.\n    So I will turn now to the government applications of data \nmining and go through some of the examples and more of the \neffects, both the way we deal with the citizens and how we \nmanage the government.\n    The Federal Government analyzes data that has been \ncollected from the public for several purposes, including \ndetermining the eligibility of applicants for Federal benefits, \ndetecting potential instances of fraud, waste, and abuse in \nFederal programs and for law enforcement activities. Some of \nthis analysis is facilitated by data mining.\n    So let us talk through a few of the examples. First, \nfinancial management. Poor management practices create \nopportunities for a wide range of fraud and abuse in the use of \ngovernment travel and purchase cards. Several agency inspector \ngeneral investigations have used data mining-type tools to \ndocument inappropriate purchases and misuse of cards. 
OMB is \ntaking and will continue to take substantive affirmative steps \nto ensure agencies improve their internal control systems to \nmonitor expenditures appropriately.\n    Human resource management. One of the 24 E-Government \ninitiatives, which we call the Enterprise H.R. Integration, and \nwhich is managed by the Office of Personnel Management, is \nleading the effort to provide a governmentwide data warehouse \nof H.R. information to minimize the workload as employees move \nfrom one department to another. A key component of this is the \nE-Clearance project. OPM and its partner agencies on the E-\nClearance project are using data mining to more quickly access \ninformation which speeds up the overall security clearance \ninvestigation process.\n    Reducing erroneous payments and fraud detection. Data \nanalysis accomplished by the matching of electronic data bases \nbetween government agencies has been an important and \nsuccessful tool for identifying improper payments under Federal \nbenefit and loan programs, as well as detecting potential \ninstances of fraud, waste, and abuse in the Federal programs. \nAs highlighted in the President's 2004 budget, agencies are now \nrequired to report the extent of erroneous payments made in the \nmajor benefit programs. Through the President's Management \nAgenda Initiative for improving financial performance, we are \ngetting a handle on the problem of erroneous payments. \nFurthermore, the administration has proposed several pieces of \nlegislation regarding the administration's authority to share \ndata that will greatly improve efforts to reduce erroneous payments.\n    Policy analysis. The quality of policy decisions is a \nfunction of our ability to correctly analyze enormous amounts \nof data that describe a problem faced by modern society. 
For \nexample, the Department of Education mines data from a variety \nof student financial aid systems, permitting professionals to \nanalyze Federal education programs quickly and easily without \nthe time expense and burden on citizens.\n    Law enforcement and homeland security. Federal agencies \nhave found data mining techniques to be an important tool for \nassisting law enforcement in combating terrorism. For example, \nthe Department of Homeland Security's Bureau \nof Customs and Border Protection operates the Automated \nCommercial Environment, which utilizes a series of data mining \ntools to strengthen border security efforts.\n    Benefits and pitfalls. While the use of data mining to \naccess timely data and to identify relationships that were \npreviously unknown is a powerful tool for identifying errors, \nfraud, threats, etc., the application of such techniques to \npersonal information raises serious questions about privacy and \nhow it should be protected. In my written statement I focused \non two areas. First, the data analysis must be consistent with \nlaw. We monitor that with business cases. Second, the Federal \nInformation Security Management Act further requires protection \nof the data under security processes and techniques. Mr. \nChairman, thank you.\n    Mr. Putnam. Thank you very much.\n    [The prepared statement of Mr. Forman follows:]\n\n    [GRAPHIC] [TIFF OMITTED] T7229.014\n    \n    [GRAPHIC] [TIFF OMITTED] T7229.015\n    \n    [GRAPHIC] [TIFF OMITTED] T7229.016\n    \n    [GRAPHIC] [TIFF OMITTED] T7229.017\n    \n    [GRAPHIC] [TIFF OMITTED] T7229.018\n    \n    [GRAPHIC] [TIFF OMITTED] T7229.019\n    \n    Mr. Putnam. For insight from a Federal agency that uses \ndata pattern analysis, we have Gregory Kutz, Director of \nFinancial Management and Assurance at the General Accounting \nOffice. As a Director in the Financial Management Assurance \nTeam, Mr. 
Kutz is responsible for financial management issues \nrelating to the Department of Defense, NASA, the State \nDepartment, and AID. He has also been recently involved in \npreparation of reports issued by GAO and testimony relating to \ncredit card fraud and abuse at DOD, financial and operational \nmanagement issues at the IRS, financial condition and cost \nrecovery practices of the Department of Energy's Power \nMarketing Administration, the Tennessee Valley Authority, and \nAMTRAK.\n    You have been very busy. We look forward to your testimony.\n\n STATEMENT OF GREGORY KUTZ, DIRECTOR, FINANCIAL MANAGEMENT AND \n           ASSURANCE, U.S. GENERAL ACCOUNTING OFFICE\n\n    Mr. Kutz. Thank you, Mr. Chairman, and members of the \nsubcommittee. I'm here to talk about our use of data mining in \naudits of Federal programs. To date we have used data mining \nprimarily as an integral part of our audits of credit card \nprograms.\n    My testimony has two parts: First, the use of data mining \nin our audits and investigations; and second, future uses of \ndata mining and related challenges.\n    First, our strategy is to use data mining to put a face on \nissues of breakdowns in internal controls. It allows us to go \nbeyond simply saying that a program is vulnerable. For example, \ndata mining allowed us to report that government credit cards \nwere used for escort services, women's lingerie, prostitution, \ngambling, cruises, and Los Angeles Lakers tickets.\n    Our data mining has helped us to identify specific \ninstances of fraud, waste, and abuse. 
The posterboard shows \nseveral examples of government travel card abuse that we \nidentified through data mining, including the purchase of a \nused car from Budget Rental Car; adult entertainment charges, \nincluding gentlemen's clubs; Internet and casino gambling, \nincluding an individual who charged $14,000 to pay for his \nblackjack gambling habit and reimbursed travel money used to \npay for closing costs on a home purchase. For each of these \nexamples, we used various data mining inquiries to identify the \ntransactions and completed the case with auditor and \ninvestigator followup.\n    The second posterboard is an excerpt from a government \npurchase card statement. As you can see, somebody went on a \nChristmas shopping spree. This bill, which includes nearly \n$12,000 of fraudulent charges, was identified using data \nmining. We identified these fraudulent transactions because of \nthe suspicious vendors and because of the timing of the \ntransactions. We used these findings in conjunction with \nsystematic internal control testing to make recommendations to \nFederal agencies to develop effective systems and controls that \nprovide reasonable assurance that fraud, waste, and abuse are \nminimized.\n    An important element of our success with data mining is the \nsynergy of auditors and investigators working together. Our \nauditors have expertise in financial systems, data \nmanipulation, and evaluating internal control systems. Our \ninvestigators bring a much different perspective. For example, \nSpecial Agent Ryan, who is with me today, has several decades \nof experience working on financial crimes for the Secret \nService. Investigators and auditors work together to assess \nsystem vulnerabilities and develop our data mining strategies.\n    Moving on to my second point, our data mining work for the \nCongress is expanding. 
Currently, we have a number of audits \nunderway that use data mining, including nine that I am \ndirectly responsible for. Some examples of our expanded data \nmining audits include DOD vendor payments, Army military pay \nsystems, HUD housing programs and Department of Energy national \nlaboratories. As we move forward, challenges will include data \nreliability and security issues.\n    For the credit card work to date, we have used commercial \nbank data bases to do our data mining, which we found to be \nhighly reliable. However, as we move beyond the credit cards, \none major challenge is the poor quality of Federal Government \ndata bases. In most cases, data base quality issues can be \novercome, but they result in less productive data mining and a \ngreater cost to our work.\n    Data security and privacy protection is another challenge. \nFor example, in handling large data bases of credit card \ntransactions, we developed strict protocols to protect this \nsensitive data. We were especially concerned with protecting \ncredit card account numbers and individuals' Social Security \nnumbers. Data security issues must be addressed before \nembarking on audits involving data mining.\n    In summary, data mining is a powerful tool that has \nincreased our ability to effectively audit Federal programs. We \nare just beginning to make full use of data mining strategies. \nWith the right mix of technology, human capital expertise, and \ndata security measures, we believe that data mining will \ncontinue to improve our audit and investigative work for the \nCongress. Mr. Chairman, that ends my statement.\n    Mr. Putnam. Thank you Mr. Kutz. And I want to thank all the \nwitnesses for being so gracious and complying with our time \nlimitations.\n    [The prepared statement of Mr. 
Kutz follows:]\n\n    [GRAPHIC] [TIFF OMITTED] T7229.020\n    \n    [GRAPHIC] [TIFF OMITTED] T7229.021\n    \n    [GRAPHIC] [TIFF OMITTED] T7229.022\n    \n    [GRAPHIC] [TIFF OMITTED] T7229.023\n    \n    [GRAPHIC] [TIFF OMITTED] T7229.024\n    \n    [GRAPHIC] [TIFF OMITTED] T7229.025\n    \n    [GRAPHIC] [TIFF OMITTED] T7229.026\n    \n    [GRAPHIC] [TIFF OMITTED] T7229.027\n    \n    [GRAPHIC] [TIFF OMITTED] T7229.028\n    \n    [GRAPHIC] [TIFF OMITTED] T7229.029\n    \n    [GRAPHIC] [TIFF OMITTED] T7229.030\n    \n    [GRAPHIC] [TIFF OMITTED] T7229.031\n    \n    [GRAPHIC] [TIFF OMITTED] T7229.032\n    \n    [GRAPHIC] [TIFF OMITTED] T7229.033\n    \n    [GRAPHIC] [TIFF OMITTED] T7229.034\n    \n    [GRAPHIC] [TIFF OMITTED] T7229.035\n    \n    [GRAPHIC] [TIFF OMITTED] T7229.036\n    \n    [GRAPHIC] [TIFF OMITTED] T7229.037\n    \n    [GRAPHIC] [TIFF OMITTED] T7229.038\n    \n    [GRAPHIC] [TIFF OMITTED] T7229.039\n    \n    [GRAPHIC] [TIFF OMITTED] T7229.040\n    \n    Mr. Putnam. Our final witness is Jeffrey Rosen, a law \nprofessor at George Washington Law School. Mr. Rosen's area of \nexpertise is in privacy and technology issues. He has written \ndozens of articles on the subject as well as a book. His \ntestimony will be valuable as we look to the legal and ethical \nquestions surrounding the use of data mining technology. \nWelcome.\n\n STATEMENT OF JEFFREY ROSEN, GEORGE WASHINGTON UNIVERSITY LAW \n        SCHOOL, LEGAL AFFAIRS EDITOR OF THE NEW REPUBLIC\n\n    Mr. Rosen. Thank you, Mr. Chairman, and members of the \nsubcommittee. It is an honor to be here. 
I am delighted that \nyou are holding this hearing because the effort to strike a \nbalance between privacy and security is a bipartisan issue and \nI am delighted that you are informing yourself about the \ncomplicated legal and technological choices that you face as \nthese technologies are implemented.\n    My thesis this morning is simple: It's possible through law \nand technology to design data mining systems that strike better \nrather than worse balances between privacy and security. But \nthere is no guarantee that the executive branch will demand \nthem or the technologist will provide them on their own. You \ntherefore, ladies and gentlemen of the Congress, have a special \nresponsibility to provide legal and technological oversight to \nensure that the technologies are developed and deployed in ways \nthat strike a good rather than a bad balance between privacy \nand security.\n    Let me give you an example of the kind of design choice \nthat I have in mind. And I want to focus just for the sake of \nargument on the Total Information Awareness Program that \nCongress has recently decided, at least for the foreseeable \nfuture, to block. Total information awareness provides a model \nfor the kind of mass dataveillance that we have been discussing \nthis morning and is being proposed in other contexts. Now, just \na question of definition, ``mass dataveillance'' refers to the \nsuspicionless surveillance of large groups of people. And that \nis different from personal dataveillance of the kind that \nSenator Dockery described which involves targeted surveillance \nof individuals who have been identified in advance as being \nunusually suspicious. Mass dataveillance poses special dangers. 
\nIn some ways it poses some of the same dangers of the general \nwarrants that the framers of the fourth amendment to the \nConstitution were especially concerned about prohibiting.\n    When the government engages in mass dataveillance without \nindividualized suspicion, there is a danger of unlimited \ndiscretion, as the government searches through masses of \npersonal information and searches suspicious activity without \nspecifying in advance the people, places, or things it expects \nto find. Both general warrants and mass dataveillance run the \nrisk of allowing fishing expeditions in which the government is \ntrolling for crimes rather than particular criminals, violating \nthe privacy of millions of innocent people in the hope of \nfinding a handful of unknown and unidentified terrorists. At \nthe same time there is an important question of effectiveness.\n    And I want you to think pragmatically about these \ntechnologies. Will they work in the national security arena? \nUnlike people who commit credit card fraud of the kind that Mr. \nKutz described, credit card fraud is a form of systematic, \nrepetitive, and predictable behavior that fits a consistent \nprofile identified by millions of transactions. There is no \nspecial reason to believe that terrorists in the future will \nresemble those in the past. By trying to pick 11 out of 300 \nmillion people out of a computer profile, you may be looking \nfor a needle in a haystack, but the shape and the color of the \nneedle keep changing and, as a result, the profiles may produce \ngreat numbers of false positives: those people wrongly \nidentified as terrorists.\n    I want you to think about the privacy issues and the \neffectiveness issues. Does the technology that works in a \ncredit card arena make sense to apply in the national security \narena? Assuming that these technologies will be deployed in \ndifferent spheres, I urge you to recognize that they can be \ndesigned in better or worse ways. 
The Total Information \nAwareness Office itself recognized this and proposed technology \nthat it called ``selective revelation,'' which proposed to \nminimize personally identifiable information while allowing \ndata mining and analysis on a large scale. The insight of \nselective revelation is useful and may provide models for ways \nprivacy and liberty could be protected at the same time.\n    The Total Information Awareness Office had a project called \nGenisys that was exploring ways of separating identifying \ninformation from personal transactions and only allowing the \nlink to be recreated when there is legal authority to do so. \nThis might allow, for example, the Centers for Disease Control \nto have access to medical information while other groups do \nnot.\n    Using this model of selective revelation, Congress could \nthink about creating laws and technology that separate \nidentifying information from the data itself.\n    And Mr. Forman talked about the searches being consistent with \ncurrent law. My strong belief is current law is not adequate to \nthe kind of complicated regulation that faces us, and you need \nto think creatively about rising to this new challenge by \ndeveloping new oversight bodies and new technologies to ensure \nthe protection of privacy. But just hypothetically we could \nimagine what those regulations would look like. Congress could \ncreate a special oversight court with the authority to decide \nwhen identifying data obtained during mass dataveillance may be \nconnected to transactional information. After intelligence \nanalysts have identified a series of transactions that they \nthink might be evidence of a terrorist plan or suggest that a \nparticular individual is unusually suspicious, they could \npetition the oversight body for authorization to identify the \nindividuals concerned. 
In deciding whether or not to grant the \nrequest, Congress could direct the court to satisfy itself that \nthe crime for which the evidence has been presented is a \nserious threat of force or violence rather than a low-level or \ntrivial crime, and that the evidence suggests a link between \nthe suspects and terrorists. If the court granted the order, \nthen the analyst could link the identifying information and \nthey could share the information with State and local bodies \nand so forth.\n    And there are other needs for regulation. You might have to \ncreate standards for citizen oversights. Citizens should be \nable to correct their data if it's incorrect or misused. And \nfair information practices would give citizens the right to \nknow the information that the government has collected. So, you \nsee the general model. The search is anonymous unless there is \ncause to believe that a particular individual is suspicious, \nand then there is oversight to make sure that the individuals \nare identified in connection with serious crimes. Merely to \ndescribe the complexity of this regulation is to raise \nlegitimate questions about whether Congress is ready to adopt \nthem.\n    But Congress has met its oversight responsibilities in the \npast. The most important checks on poorly designed technologies \nof surveillance since September 11th have come from Congress \nranging from the decision to block total information awareness \nin its current form to the insistence on creating oversight \nmechanisms for the Carnivore e-mail program. I urge Congress to \naccept the task of learning about the design choices inherent \nin these technologies. You have it in your power to strike a \nbalance between liberty and security, and all you need now is \nthe will. Thank you very much.\n    Mr. Putnam. Thank you Mr. Rosen.\n    [The prepared statement of Mr. 
Rosen follows:]\n\n    [GRAPHIC] [TIFF OMITTED] T7229.041\n    \n    [GRAPHIC] [TIFF OMITTED] T7229.042\n    \n    [GRAPHIC] [TIFF OMITTED] T7229.043\n    \n    [GRAPHIC] [TIFF OMITTED] T7229.044\n    \n    Mr. Putnam. I certainly believe our witnesses have set the \ntable and created an environment for some outstanding dialog.\n    The gentlelady from Michigan has another appointment so I \nwill recognize her to lead off with our questions.\n    Mrs. Miller. Thank you, Mr. Chairman. I think my question \nis for Mr. Kutz.\n    As I heard you talk about some of the various audits that \nyour agency is currently engaged in, you talked about nine \ndifferent audits that you are getting involved with, Energy \nlabs and DOD, etc., and certainly the testimony you gave about \nthe credit card fraud is startling. It is sickening. Those are \nthe kinds of things I think make people crazy about what is \nhappening at the Federal level. But you know, last week the \nCongress had a very exhaustive debate about a budget resolution \nand there was a lot of talk about waste, fraud, and abuse and \nthe kinds of problems in large numbers numerically that we \ncould get at to look at some reduction in our budgeting \nprocess.\n    And I heard a lot of conversation last week--and I don't \nknow if this is one of your nine universes or not--but in the \narea of Social Security, that there is as much as 10 percent of \nthe Social Security payments that are going to people who are \neither deceased or for some reason do not qualify. And I don't \nknow if that is an area that you are auditing in your universe \nthere; and, if so, what kind of numbers are we talking about \nand how would you do a construct to do the data mining? Do you \nhave any idea of how you might begin to proceed to take a look \nat that type of waste, fraud, and abuse?\n    Mr. Kutz. Social Security is not one that we have on our \nplate right now. 
We typically do our work at the request of \nvarious Members of Congress or committees or subcommittees, and \nthat is not one we have been asked to do at this point.\n    Some of the ways you can use the technology for that, for \nexample, have been used by the Inspector General to look for \npeople who are receiving benefits that are over 90 or 100 years \nold, and those are potential indicators of a family that might \nbe keeping the checks and didn't report the death to Social \nSecurity and therefore received improper payments.\n    There are certainly lots of different queries and methods \nyou could use. And I believe the Inspector General has done a \nlot of that, and I believe it has been used extensively there.\n    Also for Medicare, there has been extensive use of data \nmining technologies to find fraud, waste, and abuse and also to \nproject the amount. Annually, the various agencies project how \nmuch is going out the door in improper payments and, as you \nknow, there are tens of billions of dollars. And we are talking \nabout real money here, which is why we need good internal \ncontrol systems to minimize this waste, fraud, and abuse.\n    Mr. Forman. If I may, let me point out two projects in \nparticular. One is 1 of the 24 E-Government Initiatives that is \ncalled the E-Vital Project. And so much of this is tied to, for \nexample, the Social Security Administration getting timely \nnotification when a person has passed on. That is explicitly \nthe target of the E-Vital Project that continues to have good \ntraction in the States that have been moving the death records \nand other medical records on-line. It is a slow process. And as \nyou may recall, Michigan may have been one of the States. The \nState has charged the agency to provide that information to \nthem. 
So there is some negotiation, because the cost should be \nreduced when we put in place that as a computer system.\n    The other project is called PARIS, the Public Assistance \nReporting Information System, and that is a joint Federal/State \ninformation network that was set up explicitly to allow for \ndata matching and mining on interagency-related benefits \nprogram. So that would cover things like Supplemental Security \nIncome, the TANF program, Medicaid, Food Stamps, and Veterans \nAffairs Program.\n    Mrs. Miller. In regards to the Social Security link that \nthe States have as they interact with the Federal Government, \nisn't it true now--because I think every State is required to \nsolicit the Social Security number of every licensed driver--\nthat is something new in the last several years, and all of the \nStates are required to link to the Social Security \nAdministration because of that? Has that been helpful in \ninformation sharing?\n    Mr. Forman. You know, to be quite honest, I think \nultimately, while there is a requirement to share information, \nthe reality is a big chunk of the benefit here in terms of \nidentifying people who are getting Social Security income but \nhave passed on comes back to the ability of States to share \ninformation on the death certificates in a timely manner. And \nsome of the States and local county offices where that \ninformation initially starts just haven't been electrified yet.\n    Mrs. Miller. My experience had been with the Social \nSecurity link that we had in Michigan--I know some of the other \nStates were mentioning this as well--there was no way to verify \nthe Social Security number, so someone could give you any \ndigits that they wanted to. There was no way for the States to \nverify that the Social Security number was in fact a valid \nSocial Security number. That is a problem, I think.\n    Mr. Forman. 
There has been some progress made on that, and \nI know we looked at this a month ago when we did a review. I \nwould ask, if it is OK with the chairman, that we get back to \nyou on the Social Security Administration progress on that.\n    Mr. Putnam. We have been joined by the big Chair, the \nchairman of the full committee. Mr. Davis, do you have any \ncomments or questions?\n    Mr. Davis. I will be very brief. I think data mining is \ncritical. If you go back 100 years, a visionary at the start of \nthe 20th century might have said, what is going to guide the \neconomy in the 20th century? The visionary might have said, \noil. And in fact, it was your entrepreneurs and your \nvisionaries who figured out how you get the oil, identified \nwhere the oil was, how you get it out of the ground, how you \nrefine it, how you get it to markets, dominated much of the \neconomic activity of the 20th century.\n    Here we are at the start of the 21st. What would a \nvisionary say now? Really, the oil today is information. How \nwould we get that information and get it out of the ground, so \nto speak; how do we refine it; how do we distribute it; what \nuses does it have? And it is those entrepreneurs that are going \nto in large part be the economic wunderkinds of the 21st \ncentury. Had we had the EPA and all of the regulations on oil \nin 1900, this stuff would still be in the ground. We never \nwould have gotten it out.\n    My theory is we need to be slow about it coming in and \noverregulating. You let the marketplace and let the public and \nlet the industry come up with its own protocols before the \ngovernment comes in and starts imposing a regulatory and taxing \nregime that could stifle the growth and the potential for this. \nThat is kind of the way I look at it. Certainly there is going \nto be a role for government down the way, and maybe in ways we \ndon't even envision today, because I think we are just at the \nvery beginning of a whole revolution. 
But that is kind of the \nway I have looked at it.\n    And I don't know if you have any reaction. Mark Forman has \nbeen working with us on a number of issues. I don't know if \nanyone wants to react with that or disagree. Obviously, the \nprofessor is here and has his own view.\n    Mr. Rosen. I guess I would just urge the chairman to ask \nwhether the kind of data mining that is appropriate in the \nprivate sphere can be brought into the national security arena. \nMuch of the history of our privacy laws for the past 50 years \nhas been based on the idea that completely unregulated \ninformation sharing is not consistent with the values of the \nConstitution or of American citizens. We don't want every low-\nlevel information officer in the field to know that I had a \nyouthful indiscretion or I am late in my child support payments \nbefore I go onto an airplane, or that I am late on my credit \ncard or maybe I have some IRS issues against me.\n    Complete transparency of information, total unregulated \nuse, which is what many Silicon Valley people are urging, \nwouldn't be consistent with the value of the fourth amendment. \nIt wouldn't be consistent with current privacy laws which \nprohibit privacy sharing without good cause, and it also--and I \nwant to urge the chairman to think about it--would it be \neffective? Is there any reason to believe that centralizing all \nof our public and private data bases and allowing for a risk \nprediction to be made would identify terrorists?\n    It is not like credit card fraud. Credit card fraud is \nsomething you have 10 million examples of it and it takes \npredictable patterns. People who steal credit cards test them \nat service stations and then buy clothes at a mall. And because \nit happens so often, you can use the technology to predict \ncredit card fraud.\n    We have no reason to believe that the next terrorist attack \nis going to take the place of people who lived in Florida and \nwent to flight schools. 
It could take many forms. I respect \nyour libertarian instincts and the desire to use this \ntechnology as effectively as possible. I just would say that if \nyou, the Congress, don't stand up for Constitutional values \nto ensure inefficiencies as well as centralization, I don't \nthink the technologists of the executive branch will either.\n    Mr. Davis. Most of this information has been public. It has \njust never been able to get collated and so rapidly deployed \nand disseminated. That's what scares people. It is something in \nthe old days that could have taken 10 private detectives 6 \nmonths going through records to find, you can get like that.\n    And as you spoke of in your testimony, it is a balance \nissue; and I don't know what that right balance is, but I am on \nthe go-slow side rather than the overregulation side. We know, \nfor example, that the terrorists on September 11th--the \ninformation that was out there between flight schools and \narrests and Immigration. Had we been able to collate that \ninformation and get it in one place, we could have prevented it \nfrom happening.\n    And some of you view this as an infringement on privacy, \nbut I don't know what you say to the victims and the families \nof over 3,000 people that died that day. I don't know what the \nright balance is, and I agree, and that is why we need to hear \nfrom you and keep you at the table as we work our way through \nthis brand-new territory. And that is why we appreciate you \nbeing here.\n    And I am not sure we have that right balance today. And I \nam not sure, given the technologies that we have today, that we \ncan even start writing rules, because who knows what \ntechnologies will be deployed and invented tomorrow that we may \nnot be able to have any idea what their application could be? \nAnd I appreciate everybody's input and I appreciate you holding \nthis very important hearing.\n    Mr. Putnam. I believe the Senator had a response.\n    Ms. Dockery. 
Thank you, Mr. Chairman, and I just wanted to \ncomment that I agree very much with the Congressman, \nCongressman Davis, and to comment to the professor, we in \nFlorida believe that the factual data analysis that we are \nusing now is appropriate for tracking down terrorists, and we \nalso believe that it led to the arrest recently of--a national \nnews story you may have heard about of a professor at \nUniversity of South Florida. And that was done through \ncollection of information that was all part of our public \nrecords in the State of Florida that showed some connections.\n    So we think that this is a valuable tool and we think we \nhave shown in Florida its criminal possibilities. I will say \nthat in Florida, we have one of the most open record laws in \nthe country. We call it ``Government in the Sunshine,'' and it \nis kind of interesting that the people in Florida just in the \npast election voted a Constitutional amendment to require that \nanytime we provide an exception to the open records law, it \nwould now require a two-thirds vote of both the House and the \nSenate to make that exemption. The open public records law \nactually helps law enforcement in Florida by making more and \nmore records available for us to use in our factual data \nanalysis.\n    So to that extent I wholeheartedly support Congressman \nDavis's comments and would tell you that we probably need some \nregulation to prevent us from going overboard and to protect \nthe fourth amendment rights, but we should err on the side of \nallowing the technologies to prove themselves out before we \noverregulate an industry that is just beginning.\n    Mr. Putnam. For the professor and anyone else who would \nlike to respond, how would you compare data mining technology \nto the emerging technology of DNA as a law enforcement tool 25 \nyears ago?\n    Mr. Rosen. I think DNA offers greater security benefits and \nfewer privacy threats for this reason. 
DNA is usually used in \nthe kind of focused investigation of the kind that Senator \nDockery was just suggesting: You have a clue and you can plug \nit into a data base and it can be used to exonerate or \ninculpate. And as long as there are restrictions on the use of \nDNA for secondary purposes, the government can't turn it over \nto insurance companies to deny me a job or make predictions \nabout my future health, I don't have privacy concerns about it.\n    Data mining, by contrast, of the kind that Roger Clarke \ncalls ``mass dataveillance'' rather than ``personal \ndataveillance,'' poses very different privacy issues. And I \nwant to distinguish the two, because Senator Dockery just \ntalked about how useful it is once you know something about an \nindividual. This USF professor, you can plug him into a data \nbase and draw connections. That is the same thing that was done \nwith the sniper. When you have the tip in Alabama and plug it \ninto the data bases and establish connections, that is useful \nand that doesn't raise grave privacy concerns because the \nindividual has been identified in advance as suspicious.\n    My concern is the kind of mass dataveillance, not only the \ntotal information awareness level, but the profiling systems \nthat are being proposed at airports. And the reason I am \nconcerned about them, this is the surveillance of the data of \nmillions of innocent citizens. And it's just not a little bit \nof data. If the projects go forward, there are credit card \nrecords, phone calls, tax records, all public and private data; \nmass risk predictions based on this that could be used to \nprosecute people not for terrorism--which I'm all for--but for \nvery low-level crimes.\n    It is that kind of fishing expedition--it is the example of \nan unconstitutional search. 
At the time of the fourth \namendment, what the framers were most concerned about was \nbreaking into everyone's house looking for enemies of the \ngovernment, reading their private diaries, looking at innocent \ninformation, in the course of seeing whether or not they were a \ncritic of the king, and then arresting them for whatever you \nfound in their house. That was a general search and it was \nunconstitutional because it exposed a lot of innocent \ninformation while looking at guilty information. That is what \nmass dataveillance does. And that's why, without Constitutional \nrestrictions, I don't see how we could deny that there are \nprivacy concerns.\n    Mr. Putnam. A recent New York Times article, a Dr. Gilman \nLouie, CEO of InQTel, outlined in a recent speech two different \napproaches, one which he identified as the data mining approach \nwhich results in what he calls watch lists and what he \nindicated was too blunt an instrument; the second being data \nanalysis which begins with some type of investigative lead and \nthen uses software to scan for links between a person under \ninvestigation and known terrorists. I presume that is an \napproach you are advocating?\n    Mr. Rosen. I like that approach and I respect Mr. Louie, \nwho is sensitive to these issues, and he is distinguishing \nbetween focused data mining based on individualized suspicion \nand mass dataveillance.\n    And the same model interestingly has been taken by the \nForeign Intelligence Surveillance Court. Just yesterday the \nSupreme Court decided not to review that decision of the \nForeign Intelligence Surveillance Court that said we don't have \nto worry about broad surveillance of people who have been \nidentified in advance as agents of foreign powers because we \nsuspect that they're bad guys. And if we then find that they're \nguilty of lower level crimes it's good to get them off the \nstreets because we're pretty sure that they're suspicious. 
\nThat's different, said the Foreign Intelligence Surveillance \nCourt, from using this mass dataveillance to look at everyone \nwithout any cause to suspect them and going after them for \nlower level crimes.\n    So I'm glad that Mr. Louie, who is at the forefront of the \ngovernment's effort to merge technologies that have been \ndeveloped in the private sector and apply them in the national \nsecurity area, is sensitive to that distinction, too.\n    Mr. Putnam. Let me direct that to our witness, Dr. Louie, \nwho is not the person I was just quoting. You indicated in your \ntestimony that data mining is a process, not a tool. Please \nelaborate on that in the context of Mr. Rosen's comments.\n    Dr. Louie. Data mining goes--some of the focus that I keep \nhearing is the emphasis going back to patterns. Data mining \ndeals with patterns, but I think the term ``patterns'' needs to \nbe expanded a little bit to understand in terms of other ways \nof interpreting a pattern. A pattern can also be a series of \nevents. A led to B, B led to C, and on down the line. If we are \nplanning a--we'll call it a filtering mechanism to look at \neverybody, you have to establish some parameters of saying if \nwe are looking for people who buy large quantities of potassium \nnitrate fertilizer and they are not in agriculture or \nlandscaping and the like, maybe that should raise a flag. But \nall it does is just put up a flag, says this is of interest. \nAnd then if other events or other ties go back to it, then that \nshould, we'll call it, raise a level of suspicion that maybe \nforwards it to somebody else to review. 
I think that's the way, \nwe will call it, data mining in general can be applied in terms \nof looking for potential terrorists, whether it be something \nlike Oklahoma City or something like September 11th.\n    In terms of September 11th here we have another potentially \ninteresting, we will call it, information exchange of \nImmigration's data base or when they applied for visas was, \nwe'll call it, a little bit more broader in their perception of \nhow they looked at the information coming in for, let's say, \napplications of visas. We have, we'll call it, the linguistic \nissue of how do you spell the name, what are the variations of \nthe name, variations being, let's say, diminutive form of the \nname or a, we'll call it, a common substitution, Robert for \nBob, John for Jack, you know, and down the line. If we had a \nway to compare that and also previous visas, abbreviations of \nthe names, transposing of the name that would have identified, \nhad these people come through our visa process before, where \ndid they go, did that raise any suspicions.\n    That's the way I see data mining being applied in terms of \nbroad, we'll call it, filtering of information. Not tracking \nsomebody necessarily, but raising, we'll call it, levels of \nquestionable flags or activities that may lead to something. \nThat way you are not tracking an individual, you're just \ntracking recent events. If that event tracks out and says all \nthese events lead up to a suspicious activity, then we can go \nback and say, OK, where did all these names come in or what is \nthe relationship of that. And that's up for the analysts. It's \nthe same way we track money laundering, we track bank accounts. \nThe banks are required to report any transaction of $10,000 or \ngreater. So if I deposit $9,999 it's not going to trip the \nflag. 
But if, let's say, at the bank level they consolidate the \nend of the day receipts and they see that account exceeded that \n$10,000 maybe it should just raise a flag and make FINCEN aware \nthat there was a transaction, didn't meet the criteria but it's \njust something maybe to watch. Either the bank watches it or \nFINCEN watches it.\n    But that's the way I see you apply data mining. And in \nterms of--I believe that was Gilman Louie from In-Q-Tel.\n    Mr. Putnam. Yes.\n    Dr. Louie. I agree with his prospect and the way he \noutlines the way we should look at it. Data mining is an inert \ntool. You can take very thin slices and basically create a \nsandwich of a nice depth in order to act upon. And that's where \nwe use the term ``actionable information.'' And one slice of \ninformation in itself, it may be totally insignificant and of \nno value. But it's the cumulative process of all the \nassociations associated with that data point that become \ninteresting. And you don't have to store it. You just have to \nessentially flag it. And when we have enough flags that trip, \nwe'll call it, your suspicion level, then you look at it. You \ndon't necessarily take an action on it, but evaluate it. And \nthat's where the human aspect or the analysts and subject \nmatter experts in that area can say this does look suspicious \nor this should be maybe questioned.\n    Mr. Putnam. Mr. Forman.\n    Mr. Forman. I think it's incredibly important to keep in \nmind that data mining is a productivity tool. Yes, it's part of \na process, but at the end of the day our decision has to be is \nthat a process that we want to have that is a more productive \nprocess. And that's, I think, one of the big differences to \nunderstand about the Total Information Awareness Initiative. \nThat's an R&D project. That is not a Federal IT program. 
And \nwhen it hits the stage where somebody says, geez, we ought to \nbuy something, it falls into the process by which we put out \nthe standards associated with the business case. Are we going \nto get any productivity out of it?\n    I have always kept in mind early in my years when I did a \nlot of data analysis and operations research this notion of \ngarbage in, garbage out that Dr. Louie raised. I am very, very \nmindful, especially in this area of homeland security, where we \nhave got dozens of data bases, merely hooking them together and \napplying an algorithm is not going to make the data there any \nbetter. Even so, merely allowing those islands of automation to \nexist and the business process that run off of those islands of \nautomation aren't going to give us any greater homeland \nsecurity. The core and the issue here is to find out do we have \na better way, as we see in Florida, for the investigators to do \ntheir work. And are we happy that this is appropriate, given \nthe Privacy Act, given the other laws that cover that. And \nthere is a policy decision to be made there. That now is \nclearly required to be addressed in the business case process \nunder the E-Government Act, and under OMB guidance we are \nupdating it to comply with that.\n    Mr. Putnam. Anyone else wish to comment on that? With \nregard to the private sector, is there an industry standard out \nthere that is being used to guard privacy and security of the \ninformation in the data mining process? Solely in the private \nsector. Is there a single industry standard?\n    Dr. Louie. There are no unified business industry \nguidelines as far as, we'll call it, protecting the privacy of \nthe data. 
I think that most of our clients have relied on us to \ndevise a, we'll call it, a privacy statement of how we are \ngoing to handle data, how we are going to handle the physical \nstorage as well as dissemination of the information and how--\nwho will actually get to see and touch it. That's something \nthat we have devised as being the consultants or the \npractitioners to different companies. But there are no formal \nguidelines. We have adapted the, we'll call it, guidelines as \nspecified by the Society of Competitive Intelligence \nProfessionals in terms of saying, OK, this is how we will \nhandle the data. This is how we will ensure our clients' \nprivacy and we will try to abide by that as a form of ethics.\n    Mr. Forman. I would say from the standpoint of what we have \nseen, there are two standards that have existed over the last \ncouple of years. Opt in and opt out. And I know we have looked \nan awful lot at those standards to see what would be \nappropriate for the Federal Government. Opt out being a company \ntells you you have got this data: If you want to continue with \nthis on-line service or continue as a customer with us, we are \ngoing to show the data unless you tell us not to. And opt in is \nessentially like we see with the little cards at the Giant \ngrocery store chains. If you get this card you get a lot of \ndiscounts; in return you give us information about your buying \nhabits. And those discounts give you better products and so \nforth. And so, how the data is used and how the option is \navailable to the consumer, I think they still have a couple of \ncommon standards that have been around for a couple of years.\n    Mr. Putnam. Mr. Rosen.\n    Mr. Rosen. But opt in and opt out wouldn't begin to be \nadequate to the challenge of the regulation you're thinking \nabout now because much of this is data that you can't opt out \nof sharing. 
It's data such as credit card purchases that goes \nautomatically to warehouses like TRW or telephone calls that go \nto the telephone company and that the court has held are not \nlegally protected because of the circular reasoning that you \nvoluntarily turned the information over for one purpose and \ncan't withhold it for another. So I'd gather the kind of \nregulations that you want to be thinking about are the \npatchwork of laws that do currently regulate information \nsharing in the private sector, such as the Fair Credit \nReporting Act that would prohibit the kind of personally \nidentifiable financial information that can be shared. As I \nunderstand several of the data mining proposals, such as the \nTotal Information Awareness Program, in its original form there \nwas a suggestion that those laws should be relaxed and that the \ngovernment should have access to data that's currently \nrestricted by law, such as personally identifiable credit card \ninformation that can ordinarily be shared and the records of \ninternational telephone calls that are regulated by other \nstatutes. So I wouldn't--with respect to the effort of using \nprivate sector regulations as a model to guide you in the new \nworld that you face in Federal data mining, I don't think that \na simple opt in standard which is based on this voluntariness \nnotion would begin to do the trick. And that's why I think at \nsome point you may down the line have to think about \ncomprehensive reform at the level of the Privacy Act, which has \nproved inadequate for regulating the kind of things we are \ntalking about now.\n    Mr. Putnam. Speaking now about the public sector, what \nlevel of information sharing is currently allowable by law \nwithin and between all government agencies without a special or \na specific warrant or request for that information? 
In other \nwords, how much information sharing is there between HUD, VA, \nHHS, INS now from a technical potential and from a legal \npotential.\n    Mr. Forman. There's very little information sharing. This \nissue came up about a year ago with the concept after program \nthat was called gov.net, and there was a fear for cyber \nsecurity purposes that we had to protect the sharing of \ninformation between agencies, and we found out there was \nvirtually no sharing of information between agencies. There \ngenerally, it gets back to this issue that each agency built \nits own data base, its own data store, if you want to use the \nparlance of today's hearing, to support its own mission. And \nthe question is, when can you look across the agencies, when is \nthere a need? Going back several years, two decades almost in \nthe scientific community, there was sharing probably most \nextensive as it relates to what we now call geospatial \ninformation or geographic information systems. There are \ngenerally requirements associated to that that we handle via \nthe computer security rules and models and the business case \npractices. Where we have seen a ramp-up of sharing between \nagencies has been in the data management area that I've alluded \nto in my testimony, and that happens to be with these major \nWelfare programs and it is generally by the PARIS Project. \nThere's been explicit congressional authorization, literally \nlaws authorizing that. We have asked for some additional legal \nauthorities or additional data sharing, a creation of the \nmatching data base that has current job data, but even that is \nonly updated quarterly. We probably could do better than that.\n    Mr. Putnam. So would a successful data mining or factual \ndata analysis project that was attempting to identify a \nparticular profile of a terrorist, for example, would they be \nable to access any and all Federal Governmental data bases \nwithout a specific change in the law? 
Or would they be able to \ndo that as a result of the law's silence on the topic? First \npart of the question. The second part of the question is, as a \ntechnical matter, could it actually be done?\n    Dr. Louie. On the technical side I say we could do that. We \nhave for several government agencies, but the technical side of \nmaking it happen is not really the problem. The problem is the \nquality and trustworthiness of the information that's in those \ndata bases, is I would say poor to--you know, it is amazing \nthat they can conduct business.\n    Mr. Putnam. Senator Dockery.\n    Ms. Dockery. Thank you, Mr. Chairman. In Florida we require \nreasonable suspicion to be developed before we use factual data \nanalysis, and then we abide by the standards established in 28 \nCode of Federal Regulations. To answer your question about \nsharing intelligence information, Florida deals well with \nsharing information with other States. In fact, there's a pilot \nproject, the Multistate Antiterrorism Information Exchange, \ncalled MATRIX, which is going to consist of 13 States in this \npilot project. Our problem has been to share information with \nthe Federal Government, both in terms of us willingly giving \nyou information and you not being able to receive it and us \ntrying to receive information from the Federal Government.\n    One case in point, Florida has 16 million residents, but 60 \nmillion tourists. We have a lot of people moving through the \nState and it would be very helpful to us if we could access the \nvisa data base, particularly if we could have access to anyone \nwho may be in Florida who has overstayed their visa and that \ncould lead to a lot of useful information in making these \nconnections. We do not keep dossiers on individuals. We look \nfor linkages based on reasonable suspicion in assorted events \nand then we look for those linkages. Then just as soon as we \nsee them they're gone. 
So it is not a matter of starting a file \non an individual. It's looking at an activity and trying to \nfind who had some access to something involved within that \nactivity. But it would be very helpful to us and to other \nStates if there was a better cooperation of sharing \ninformation.\n    We have now linked almost everything in Florida together so \nwe can access various agencies' data, but we cannot access \nanything from the Federal Government nor can they for us \nbecause the information that the State has is their possession. \nBut we are willing to share it. We just don't have the \ntechnology to do so.\n    Mr. Putnam. Mr. Forman.\n    Mr. Forman. From a legal perspective, I believe there's a \npretty broad coverage, let me refer to three laws in \nparticular, the Privacy Act of 1974, the Computer Matching and \nPrivacy Protection Act of 1988 and the E-Government Act of \n2002, all of which lay out the principles and the areas that \nmust be addressed, ultimately leading up to what we would look \nfor in the business case of privacy impact assessment. There is \na policy decision that will have to be made. There's guidance \nfrom both OMB and the National Institute for Standards and \nTechnology on that for Federal information systems to ensure \nappropriate protections of personal information. I think it's \nfair to review some of those cases and how that's being done. \nBut the legal framework exists. This does not have to be built \nfrom the ground up, per se.\n    I guess I'm more concerned about this on the technology \nside. These data bases were largely poorly crafted to start \nwith. The business processes generally are nonexistent and when \nwe try to share information which have different embedded rules \nin the data bases into a data warehouse and mine that data, I \nkeep in the back of my head garbage in, garbage out, because I \nthink that's the reality that we'll be forever patching \ntogether in the Federal arena. 
I believe that this at the end \nof the day is not so much a technology issue as we know. The \ntechnology exists. It's been used in many governments, \nincluding the U.S. Government, for years. The question comes \ndown to can we figure out what's the right business process and \nwho should be in charge or how we want to oversee that, pulling \nthat information together and the person who says I've got a \nterrorist threat. The best framework for that so far as it \nlinks to terrorism is the Department of Homeland Security Act.\n    Mr. Putnam. Mr. Rosen, do you have a comment?\n    Mr. Rosen. It's an interesting question whether there are \nmeaningful legal regulations on the sharing of data in the case \nof individualized suspicion. The Privacy Act has a broad law \nenforcement exception and a national security exception, so I'd \nimagine that when we're talking about personal dataveillance, \nfocused on suspicious individuals, there wouldn't be meaningful \nlegal restrictions on sharing. Mass dataveillance is a \ndifferent question. And I think that the people who have \nanalyzed this are divided about whether dataveillance along the \ntotal information awareness model would violate the Privacy \nAct. It's not clear whether the information that is being \naccessed would count as a system of records according to the \nPrivacy Act, and the mere phrase itself shows how outdated that \n1970's idea, which presumes that information stored in \ndifferent file cabinets is for regulating data sharing in the \n21st century. 
So--and then there's also the case that much of \nthis data is already held in the private sector and law \nenforcement has a long history of piggybacking on the grand \ndata warehouses like TRW, and so forth, in order to get \ninformation that it couldn't get on its own.\n    All this is to say that if you're in any way concerned \nabout restrictions on information sharing, as I hope that you \nwill be to the degree that the PATRIOT Act and the homeland \nsecurity bill create new provisions for information sharing and \nthe interest of national security, you're going to have to \nthink about this issue afresh and try to craft sensible \nregulations for these new technologies.\n    Mr. Putnam. Do you presume then that under the current law, \nparticularly the Privacy Act, that authorization of personal \ninformation that can be held by the IRS, for example, under the \ncurrent law would not be eligible to be transferred to Homeland \nSecurity or INS or a different agency?\n    Mr. Rosen. As I understand it. I'm not an expert on the \nIRS. The IRS has a series of complicated regulations that have \nensured that it especially doesn't lightly share information \nwith law enforcement. So both by practice and regulation, I am \nnot sure that there'd be easy access to that data. But the \nmere--but you're right to focus on precisely that question and \nthen extrapolate from there to other sensitive information that \nyou might not want to be shared without cause, and then you \nwill get a sense of the degree of the challenge that you face.\n    Mr. Putnam. Well, Chairman Davis pointed out something that \nin many of these cases data mining is the collation of \npreviously existing, perhaps even public data bases and \ncollections of information and that the amalgamation of that \ndata is what allows you to get a more useful outcome than the \ntime and effort and energy involved in searching each one \ndiscretely. 
The blowup over TIA, characterizing it, I think, \nhas been over this presumption of the next step of data \ncollection between public and private and even into the more \npersonal side of things in terms of habits and patterns based \non purchases or travel destinations and things like that. But \nis there anything--is there any effort currently underway other \nthan what had been a research and development project? Is there \nany active program in the Federal Government that is doing that \ntype of surveillance or data mining?\n    Mr. Rosen. I understand that the CAPPS II program, which is \nComputer Assisted Passenger Profiling Act--I think I have got \nthe acronym right--is based on very much of a TIA model and is \nalso trying to collate information which is already in the \npublic's sphere and make risk predictions for particular \npassengers at airports. So that's why I think the TIA model is \none that you will have to think about hard, and I think that \nthe chairman's notion that all this information is already in \nthe private domain and therefore is not of concern and can be \nanalyzed perhaps misses the fact that once the analysis becomes \ngranular there is a difference between having me watched on the \nstreet when I walk from door to door by a cop or a neighbor and \nthe government planting a camera on my back that follows me \nfrom door to door and records each of my activities throughout \nthe day. That reality, the fact that a level of intrusiveness \nis inconsistent with the values of a free society is one that \nour law is not well set up to deal with. The Supreme Court's \ntest for invasion of privacy, as you know, Congressman, says \nthe question there is a subjective expectation of privacy that \nsociety is prepared to accept as reasonable and as the \ninvasions become more invasive people's expectations are \nlowered with a lowering of Constitutional protections. 
So I \nwould resist the chairman's notion that as long as the \ninformation is out there, that any degree of collation and \ntechnical analysis is fair game because there is a point at \nwhich as you have said when very intimate personal information \nbecomes available to the government on a massive scale that's \nquite different from some reporter going down to the courthouse \nand rummaging through a couple of paper records 50 years ago.\n    Mr. Putnam. Mr. Forman.\n    Mr. Forman. Well, in preparation for this hearing, I did a \nrun on our major IT investments of the Federal Government. I \ndid actually two runs, to identify all the data mining and then \nto identify all the data warehouses because why do a data \nwarehouse if you're not going to mine the data. And zero \nprojects showed up. So I didn't believe that. We don't have \nanything go on with regards to this. So I used a data mining \ntool, the search engine on first.gov and got well over 1,000 \nhits. There's an awful lot of activity going on. Now the \nquestion that seems to me comes down to is do we have anything \ngoing on as an official IT investment that relates to kind of \nthese random searches. And I'm not aware of any that Dr. Rosen \nis so concerned about. It doesn't mean that it's not out there. \nI really need to go back and dig deeper. I just have not found \nany yet. On the other hand, is there--are there some data \nmining applications that are similar to that and I think, yeah, \nyou'd have to say that the credit card fraud is very similar. \nYou know the pattern. Same thing on Medicare, Medicaid, \nmischarging. We know that we should be spending, for example, a \ncertain amount for a certain type of procedure. If we see a \ncompany that is routinely overcharging us, we know that it's \nnot an error, it's a systematic overcharging. 
And so that's a \nvery similar type issue and I think in the areas of government \naccounts payables, where we know some tolerances and we can use \ndata mining to identify people who are overcharging or \nfraudulently charging us. You do see that and that has gone \nthrough the privacy impact assessment reviews generally.\n    Mr. Putnam. Senator Dockery, hasn't the State of Florida \nfor some time used a data analysis, data sharing, data mining \ntype technology to compare and even correlate employment \nrecords with child support payments to develop a list of folks \nwho are behind in that and whether or not they are cheating the \nsystem?\n    Ms. Dockery. Yes, that's one of many areas that Florida has \nused the technology. Also, in smuggling rings, money \nlaundering, child molestations, so we--after September 11th it \nwas the technology was already there and it was just a matter \nof adapting it to now apply it to homeland security.\n    Mr. Putnam. So there's a history of civil uses as well as \nthe criminal uses, at least in the State of Florida.\n    Ms. Dockery. Exactly.\n    Mr. Putnam. We have been joined by our ranking member, \ngentleman from Missouri, Mr. Clay, and I'd ask unanimous \nconsent that he be able to enter his statement into the record. \nAnd without objection, show it done, and now recognize him for \nhis statement and questions.\n    Mr. Clay. Thank you very much, Mr. Chairman. Let me say, \nfor Mr. Rosen, the Transportation Security Administration plans \nto use data mining to develop terrorist profiling for anyone \nwho flies. And if Congress goes along with this proposal, what \nsafeguard should be established at the same time to assure \npublic rights similar to those provided in the Privacy Act? 
Let \nme also say that--do you believe that airlines are now using \nprofiles when you go to the kiosk to get your boarding pass, \nand you put your card through the kiosk, don't you think that \nthey examine some of your recent credit activity now and is \nprofiling occurring now by the airlines?\n    Mr. Rosen. I do, Congressman. As I understand CAPPS I, or \nthe computer assisted profiling system that's now in use, it \ndoes indeed analyze publicly available information from the \nprivate and public sector and make risk predictions that can \nlead people to be taken aside for different searches. As I \nunderstand, CAPPS II would only increase this profiling by \nadding information to the data base. It's difficult to answer \nyour question adequately, because the Transportation Security \nAdministration is not forthcoming about exactly what \ninformation it's analyzing and how it's using it, and I think a \ncrucial part of your oversight role should be to ensure that \nthe data in the data base is transparent, not the algorithms. \nThe transportation authority says, well, we can't tell you what \nalgorithms we're using or the terrorists can beat the system. \nWhat Congress needs to know is not what the algorithms are, but \nis this data that the Federal Government is entitled to \nanalyze.\n    So when you think about how to regulate this new system, \nand this will be a pressing concern, even more so than total \ninformation awareness because that's been tabled for the \nmoment, think about transparency, accountability. Citizens \nshould be able to correct errors in their data base. We have \nheard a lot this morning about the poor quality of the data. \nImagine being stopped repeatedly on the basis of inaccurate \ninformation and having no remedy, not even being told why \nyou've been stopped. 
The application of fair information \npractices to the transportation arena is something that \nCongress urgently needs to think about because the Privacy Act \nin its incarnation is not adequate to the task.\n    So I think that this should be a good model for you as you \nthink about regulation.\n    Mr. Clay. Thank you very much.\n    Mr. Forman, along those same lines, airline security has \nhad a troubled history of racial profiling, even before the \nattack on the World Trade Towers. During the 1991 Gulf war \nindividuals with Middle Eastern names were forced off their \nflights despite the fact they were American citizens. Last year \nthe ACLU testified before Congress of dozens of such incidents, \nindividuals discriminated against in airports or on airplanes \nbased on race and heritage. The same people who oversaw the \nprivate contractors who provided discriminatory security are \nnow designing new systems. What is OMB doing to prevent racial \nprofiling from continuing in air transportation?\n    Mr. Forman. Well, let me put this into the context of the \nCAPPS II program. The CAPPS II program was not approved by OMB \nto proceed at the pace that they seem to want to proceed. I \nhave a huge spotlight on that project right now. They're late \nin getting back to me the information that they need to \nproceed. So the issues that we're talking about, the issues \nthat concern me essentially, CAPPS II could quickly become the \n80th watchlist. And I have to take a step back in my job and \nsay, what value added do we get by yet another island of \nautomation coming up with something farther away from something \nthat's going to give us the productivity and effectiveness \nwe're looking for. You know, the argument that I have heard in \nfavor of CAPPS and CAPPS II essentially went back to the \nquestion of do you want this random? Because my father, my \ngrandmother was pulled out of line. And it just didn't seem to \nmake sense. 
So there has to be something better. And I think, \nand I allude to this in my testimony in the customs arena, in \nthe package movement, we seem to figure out this risk paradigm. \nNow, I think that's what we are looking for. We're clearly not \nlooking for a racial profiling. We are looking for a risk \nprofiling. And there the data that I'm asking for, it's got to \nbe in the business case, would give us both the technical \nprogrammatic reviews as well as the policy review. We don't \nhave it yet.\n    Mr. Clay. In this process you're looking for random, random \nprofiling and not racial profiling or heritage?\n    Mr. Forman. We are looking for risk based--.\n    Mr. Clay. Risk based.\n    Mr. Forman. Reduction. So not random profiling.\n    Mr. Clay. So the 9-year-old little girl that goes through, \nyou may not want to search her, through TSA. You may not want \nto search her?\n    Mr. Forman. As a random selection, that would be correct.\n    Mr. Clay. Or the 85-year-old grandmother?\n    Mr. Forman. As a random selection, that would be correct. \nWe are looking for clear documentation that they have actually \nfigured out an approach that's going to improve the \nproductivity. You know, we can spend hundreds of millions of \ndollars on a terrific IT system with very pretty screens or \nvery fruitful data mining techniques. But at the end of the \nday, if it somehow does not lower the risk, to me, I would have \nto say that is not a good IT investment for the Federal \nGovernment and would recommend against that.\n    Mr. Clay. OK. All right. Thank you.\n    Mr. Kutz, does data mining need individual identities in \norder to detect patterns of unusual activity? And can the \ngovernment develop profiles of unusual activity and then \nfollowup on the specifics with appropriate oversight?\n    Mr. Kutz. 
Again, what--most of what we have done so far \nrelates to credit card data bases, but we have gone beyond that \ncertainly for the credit card data bases and these were \ngovernment credit cards, ones issued by the--on behalf of the \nFederal Government to use for government purposes. We did have \nthat information to basically analyze and put together patterns \nof activity, etc. But we have also gone beyond, I was going to \nmention an example last year. We testified before \nRepresentative Shays on the JS List suit, which is the current \nchem-bio suits that are being used in the Middle East. And what \nwe identified there was that they were excessing and selling \nthose goods on the Internet at the same time they were buying \nthem. And so in that instance, we tried to identify who was \nbuying these suits and whether or not they might be using them \nfor something that would be against the government. So we try \nto identify, where it is appropriate, individual identities to \nfollowup for investigative purposes.\n    Mr. Clay. Let me ask you a followup on the question I asked \nMr. Rosen. What exactly do the airlines look for when we go to \nthe kiosk and put our credit card through? What kind of \nfinancial activity are they looking at? Just out of curiosity.\n    Mr. Kutz. I couldn't answer that question.\n    Mr. Clay. You don't know. Does anyone on the panel know \nwhat they're looking at? I mean, is it one purchasing one-way \ntickets or what exactly.\n    Mr. Rosen. We know from criminal procedure cases that \nthere's certainly public information that they look for, one-\nway tickets, certain points of origin passengers and the \naddresses and phone numbers that you check in with and the \npeople that you also are traveling with, and information neural \nnetwork analysis can be done on that. 
But we are assuming that \nthey're respecting legal limitations on, for example, looking \nat personally identifiable phone calls or personally \nidentifiable credit card information. But finding out the \nprecise answer to that, I know there are groups like some of \nthe privacy groups in town have Freedom of Information Act \nrequests to find out exactly what information is being used and \nthey haven't found the TSA terribly forthcoming, as I \nunderstand it.\n    Mr. Clay. Do you think they also look at recent purchases \nin retail outlets?\n    Mr. Rosen. As I understand it, they would be restricted \nfrom doing that by the Fair Credit Reporting Act, but you \nneed a closer parsing of the statute than I can give you for \nthat.\n    Mr. Clay. OK. Thank you very much.\n    [The prepared statement of Hon. Wm. Lacy Clay follows:]\n\n    [GRAPHIC] [TIFF OMITTED] T7229.045\n    \n    [GRAPHIC] [TIFF OMITTED] T7229.046\n    \n    [GRAPHIC] [TIFF OMITTED] T7229.047\n    \n    Mr. Putnam. The gentleman raises an interesting point. \nImmediately after September 11th I was pulled every single time \nI flew because I was not in a frequent flier program, we bought \nour tickets at the last minute because of the Congressional \nschedule and it was always one-way. And so I got the body \ncavity search just about every time I flew. And it's terribly \nfrustrating and it begs some better type of profiling, \nparticularly based on risk. And while some Members of Congress \ncan be shady characters at times, hopefully we wouldn't fit the \nrisk profile.\n    Mr. Clay. Hopefully we wouldn't get stopped as often.\n    Mr. Putnam. Well, hopefully, at least not quite as often. 
\nEvery time got a little old.\n    But let's get back to the people component of this, because \nI think everyone has agreed that at the end of the day, no \nmatter what type of process there is and no matter what type of \ninformation or data is out there, at the end of the day it is \ngoing to require some analysis by a human being. And everyone \nin general has seemed to stress the need for quality data as \nwell as those high quality analytical skills in the personnel.\n    Can you expand on that a little bit and talk about where we \nare in terms of our human capital and the role that they play \nin obtaining acceptable results through this process?\n    Mr. Forman. I think there are some very, very good examples \nof the training and culture change that has to take place here. \nWhen you move from a paper based--technically we call knowledge \nmanagement environment--to an on-line you're going to use \ndifferent interfaces. To do--to have that tool kit, if you \nwill, generally, people have to become computer literate and \nwilling to use computers. And that's where we see, especially \nin the law enforcement arena, a cultural, maybe generational \nchange that we are working through. Certainly you'll see that \nat the FBI if you look at their use of the TRILOGY program and \nthe culture of change that the Director is bringing. From my \nperspective, in the business case itself I look at that. I look \nto see are we investing in training and process reengineering, \nchange management projects. And when I see generally data \nmining or tools that use these knowledge management systems and \nsupport systems tools without any training, that is a flag to \nus that this should go on the high risk list. Unfortunately, \nthat has been the pattern of government. Somebody in the \ntechnology side invests in these tools and then they get ready \nto deploy and they find out culturally or from an education \nstandpoint people don't want to use them. 
And as in the case of \nthe INS, then we go on a binge of buying training services. So \nI'd say right now, training or the education part has been an \nafterthought and it's one that needs a lot more attention and \nfunding from the up-front. We are trying to put that discipline \nin the process.\n    Mr. Kutz. Mr. Chairman, I would add to that the software \nthat we had to do the data mining that we have done in the \nfraud, waste and abuse type applications which is fantastic. \nIt's flexible. We certainly train our people, etc. But the real \nelement that makes it work is the people and the continuous \nlearning that goes on with even using that software and the \nvarious programs. So we've kind of got a process where as we \nlook at a system and a program, we understand the program, \nunderstand the controls, understand the vulnerabilities, and we \nuse that too as a feedback into the actual data mining \nstrategy, combining auditors and investigators again.\n    I mentioned Mr. Ryan, who's with me today, who worked for \nthe Secret Service doing money laundering and credit card \ncrimes for decades. People with that kind of experience \nteaching younger people some of the things that they know \nreally provides a great atmosphere for learning and developing \nall those human capital skills.\n    Mr. Putnam. Have you an estimate of the savings that have \nbeen derived from that type of data sharing initiative?\n    Mr. Kutz. From the data mining with respect to the fraud, \nwaste and abuse?\n    Mr. Putnam. From the financial management side, yes.\n    Mr. Kutz. If you go back to the improper payments reporting \nthat's gone on in Federal Government for years, I think that \nareas like Medicare have shown large decreases in estimated \nimproper payments, and that's I think in part due to the data \nmining that's gone on there. 
Another program that's had a great \ndeal of oversight in that area is the earned income tax credit, \nwhich had estimates of as much as $8 billion of improper or \nfraudulent type payments over the years. So there's certainly \nbeen savings. I don't think it's been quantified necessarily, \nbut the focus of data mining and the focus on improper payments \ngoing out the door has led to better controls in the government \nand probably saved billions of dollars.\n    Mr. Putnam. Senator Dockery.\n    Ms. Dockery. Thank you, Mr. Chairman. You bring up a good \npoint and one that piggybacks on to Congressman Davis. The \ninformation that we are using in tracking criminal activity and \npotential terrorist events takes into consideration what used \nto be information in various locations. By putting that all \ntogether, it cuts the time down from weeks or months to a \nmatter of minutes. Once that information has identified a risk, \nthat's when the investigations begin. So it still comes down to \nour human investigators, but instead of spending all their time \ndigging through paper to find out where to start, they now have \na starting point and spend their time more wisely looking at \nthose individuals who have come up as a potential risk. So it \ndoes involve a lot of training. We do--the success of what we \ndo with that information lies within our law enforcement, but \nthis allows them to spend their time in the investigation and \nnot in trying to put together a pattern.\n    Mr. Putnam. How reliable is that data? How often is it \nmaintained? How often is it upgraded? And we have certainly \nlearned in our experience with the election that sometimes our \ndata bases are a little old with respect to eligible voters and \nconvicted felons and things like that. How good a job does the \nState do in maintaining that data base that they depend on?\n    Ms. Dockery. 
Well, I am not an expert in that area, but I \nwould say that we do have systems put in place to purge \ninformation. We have systems put into place to check \ninformation. And the sharing of the information allows us to \nhear from other sources in the law enforcement community that \nsome information may be suspect. So I think our information is \ngood. Keep in mind that when it lists people with risk factors, \nthat doesn't point to that person as being guilty of anything. \nIt points to that person as coming up as maybe a place to start \nthe investigation.\n    Mr. Putnam. Mr. Forman, you had referred to geospatial \ninformation earlier in your testimony. In my understanding that \nis 1 of the 24 E-Government initiatives, and that would involve \nan overlay of information from a variety of sources with regard \nto identifying the geography of data. In essence, you overlay \nthe census data with USGS data and we can look at, you know, \nwhere the population threats are to sensitive estuaries or any \nof a million combinations of things by combining all the data \nthat's collected and stacking it in a meaningful way to derive \nanswers about what's going on. Isn't that data mining?\n    Mr. Forman. Yeah. That very definitely will have to require \ndata mining. There are two approaches to leveraging the \nredundant data sources. One is the concept of buy once and use \nmany. We are definitely proceeding with that. But then where do \nyou put that data? Is it some is maintained at National Weather \nService, for example, or NOAA and some is maintained at the \nU.S. Geological Survey, some is maintained at Environmental \nProtection Agency? That kind of peer to peer computing model is \nthe emerging concept of a virtual data warehouse in which case \nprobably at that program office you would have the metadata \ndescription of where do I go to find this data, what is the \nstandard, and access that. 
Regardless of whether it is a \nphysical data warehouse or this virtual data warehouse to get \naccess to that data, to make sense of it, data mining \ntechniques will be used. They have been used, you know, for \nexample, probably the best example today, if you go to the \nCensus Web site, American FactFinder, you can find out \nsupposedly, I haven't done this, but the theory was you could \nfind out how many kids of soccer age for second grade soccer \nteams, second and third grade soccer teams are in your tract, \nyou know, in your soccer league area. That wouldn't tell you by \nhouse, but that would tell you maybe by block or by \nsubdivision.\n    Mr. Putnam. The opportunities for the beneficial use strike \nme as endless. When you compare weather patterns with farm \npayments, with crop insurance, perils and things like that, \nthen maybe we start raising the risk premiums for that area or \nmaybe we adjust our farm payments so we don't let people plant \nin that area until El Nino clears up. I mean the opportunities \nare endless to derive information. The Federal Government \nspends a fortune collecting information and the fact that it is \nfor the large part underutilized is distressing from a taxpayer \nperspective.\n    Mr. Rosen, you mentioned earlier that perhaps we should \nconsider the creation of a special court to consider these \ntypes of requests for specific searches, I believe.\n    Mr. Rosen. I did. And, Congressman, I would distinguish the \nneed for a special court when we are talking about the mass \ndataveillance of personally identifiable data with the kind of \nsyndromic surveillance that you and Mr. Forman have just been \ntalking about. 
This is indeed a wonderful resource, and there \nare no privacy issues when you're making general statements \nabout weather patterns or census information that's not \npersonally identifiable or the Centers for Disease Control \nusing data mining to figure out when people are checking in in \none area with an epidemic or, to give another example that I am \nvery impressed by, the city of Chicago using data mining to \nfigure out when crime patterns correspond with particular \nweather patterns and sports events and then they can deploy the \ncops to that area of town when there is a particular game on \nand that's really hot and then they can stop crime. These are \nwonderful things that don't raise any privacy issues at all. \nThat's very different though from, and again if the jargon \nisn't helpful let's come up with another term, but mass \ndataveillance, suspicionless searches at airports, the total \ninformation awareness model, this is something that needs \nregulations.\n    So my message has been this stuff isn't all good or all bad \nand the technology isn't evil, just be especially attuned to \nthe privacy dangers of suspicionless searches that allow \npersonal information to be collected in ways that are not \ncurrently available. And for that I think you do need--it \ndoesn't have to be a special court. You could have a \nmagistrate. You could have a congressional oversight body. \nThere are all sorts of ways to do it. But you have to separate \nthe model as the data is traceable but not identifiable. You \ncan do those sort of general predictions and risk profiles that \nMr. Forman is talking about, but you can't actually identify me \nas the person who's been buying fertilizer unless it really \nlooks like I'm a terrorist because I've done some other things \nthat are suspicious, too.\n    Mr. Putnam. 
Well, I would remind you and the rest of the \npanel and the audience that on May 6th we will convene our next \noversight hearing on this topic, specifically to address TIA, \nCAPPS II and some other similar programs.\n    With that, I will yield back to the gentleman from Missouri \nfor any questions.\n    Mr. Clay. Thank you, Mr. Chairman. Senator Dockery, I'd be \ninterested to know what Florida does to protect individual \nrights. Does an individual have a right to know what \ninformation about them is included in the data analyzed in the \nfactual data analysis? Does the individual have a right to \ncorrect the information in those data bases that is wrong? And \nwhat happens if an individual is singled out because of \nincorrect information in one of these data bases? Can you kind \nof expound on that for me?\n    Ms. Dockery. Yes. Thank you. All the information that is in \nthe data bases are part of Florida's open public records. So \nany individual is at any time able to check out those records \nand to clarify any misinformation on those records. We don't \nkeep particular files on any individuals. We look for events, \nand risk factors may make somebody come up. Then it goes to a \nhuman being, an investigator to investigate that and they may \nfind that just because the individual was identified as being--\nfitting those risk profile that person was nowhere near the \nevent. So there are a lot of safeguards built in. And of \ncourse, we abide by the Federal Code that I mentioned earlier.\n    Mr. Clay. So the safeguards are there and they're helpful \nand people can followup and correct them?\n    Ms. Dockery. Yes.\n    Mr. Clay. That sounds like a pretty foolproof system. Thank \nyou.\n    Mr. Kutz, what would you recommend Congress do to stop the \nracial profiling that is going on in today's airline security? \nDo you have any recommendations?\n    Mr. Kutz. No, that's not an area that I deal with so I \ncan't comment on that.\n    Mr. Clay. OK. 
Well, let me also ask you, you recently did \nsome work for Congress where you identified several people \ngetting treatment at veterans hospitals who were listed as \ndeceased on Social Security records. With further \ninvestigation, you showed that the problem was errors in the \nSocial Security records. Now, if TSA had those Social Security \nrecords in their data base, those people would be stopped from \nflying and they would have no way of knowing why or correcting \nthe incorrect information. Would you agree that any system used \nby TSA has to allow for the public to know what information is \nbeing used to rate them and what other safeguards should be in \nplace?\n    Mr. Kutz. Your question gets back to the issue I think Mr. \nForman talked about, about data quality in the Federal \nGovernment, and we did indeed find, and this was from military \ntreatment facilities, we had compared people who were served at \nsome military treatment facilities with a Social Security death \nfile and there were some hits that came out of people that \nappeared to be dead that were not really dead. And so there \nwere errors in the Social Security death file, and that \ncertainly raises issues about what that file is used for. That \nfile is certainly shared with others. It's sold to others. And \nthe Social Security Inspector General has reported other \nexamples of errors with that.\n    So this issue of Federal Government data base reliability \nis a major challenge here in all applications of data mining \ngoing forward. And I had some experiences I was going to share \nwith you on the IRS, where I used to be responsible for the IRS \nfinancial audit, and we found lots of instances there with the \nerrors in the system there were people who were being pursued \nand having taxes collected from them but didn't owe any taxes. 
\nAt the same time we were issuing lots of refunds to people who \nweren't due refunds.\n    So, again you've got lots of issues with data quality and I \nwould say that the Federal Government is decades behind the \nprivate sector in that area. I got to go to Bentonville, AR \nwithin the last year to visit the Wal-Mart headquarters and it \nwas quite fascinating to see the technology that they use in \ntheir inventory supply chain management, and when I compare \nthat to where the Federal Government is with its inventory \nmanagement again it's just decades behind. And they were able \nto tell us at Wal-Mart headquarters how many tubes of \ntoothpaste there were at the Fairfax Wal-Mart here in 1 minute. \nAnd not only that, but how many they had actually stocked in \nthe last week, how many had been bought in the last week, just \ntremendous technology, whereas again in the Federal Government \nI'll go back to the JS List, the chem-bio suits used by our \ntroops. Once those left the defense warehouses into the \nmilitary services, complete visibility was lost and we were \nunable to determine where these chem-bio suits were, some from \nprior years that had been defective through a fraud scheme by a \nprivate sector company.\n    Mr. Clay. You do make recommendations to the different \nagencies how to correct the errors that you all find?\n    Mr. Kutz. Right. That's the value of data mining. It helps \nus to make valuable recommendations to Federal agencies to \nimprove their control systems, etc., to try to minimize the \nrisk of these things happening that I've just described.\n    Mr. Clay. What was your recommendation to the Social \nSecurity Administration?\n    Mr. Kutz. We didn't make any recommendations to them \nbecause the Inspector General had already made recommendations \nto them, and they are working to clean up that data base.\n    Mr. Clay. I see. Thank you very much.\n    Mr. 
Forman, would you support legislation that prohibited \nthe TSA from using any system that used profiles based on race, \nreligion, national origin, gender, sexual orientation or \nproxies for those characteristics?\n    Mr. Forman. I forever remember my time on the Hill and a \ngood staffer on detail from GAO who has been a staffer to this \ncommittee before, the devil's in the details. I'd have to see \nthe specifics.\n    Mr. Clay. See the specifics. OK. Thank you very much. And \nthank you, Mr. Chairman.\n    Mr. Putnam. Thank you, Mr. Clay. And Mr. Kutz, when Mr. \nForman gets done with the Federal Government, Bentonville, AR \nis going to be sending executives up here to tour the Federal \nGovernment to see how efficient we are. Isn't that right?\n    Mr. Forman. Absolutely.\n    Mr. Putnam. I want to thank the witnesses for their \noutstanding testimony and for the questions of the \nsubcommittee. We will be focusing very, very directly on this \ntopic throughout the 108th Congress. Our next hearing on the \ntopic is May 6th to look at some of the specific issues that \nhave been raised. But this is very clearly on my radar screen \nand something that we will continue to monitor very closely. It \nis an important issue. It holds the promise of tremendous \npotential benefits to our taxpayers in eliminating waste, fraud \nand abuse and bringing better financial management practice, \nand frankly it raises some red flags in terms of protecting \nthose very same taxpayers' privacy and personal information. So \nwe will do what we can to determine where that fine line is and \nattempt to walk it.\n    So I understand Mr. Rosen has to be out to teach his class, \nbut do any of you have one last question that you wish we had \nasked you that you want to answer?\n    Senator Dockery.\n    Ms. Dockery. It's not a question. But, Mr. 
Chairman, if I \ncould just take this minute since I don't have the opportunity \nto speak to a congressional committee every day, I want to \nthank you on behalf of the States for what you do in Congress, \nto send money down to the States to allow us to do the job of \nprotecting the residents in our State against any threat to our \nhomeland security, and I would ask that in the future when \nmoneys are coming down from the Federal Government, the more \nflexibility you could give us in spending those moneys and if \nyou could have those moneys go through the State rather than \ndirectly to the local governments so that we can have a better \nfeel for what's coming down and avoid duplication of effort. \nBut thank you for all that you do for us and thank you for \nletting me participate today.\n    Mr. Putnam. Thank you, Senator.\n    Dr. Louie.\n    Dr. Louie. Yeah. This is on-line data collection. The point \nabout individual data elements are not necessarily very \nimportant in themselves, but you should also look at how this \ndata is used as if it were classified material. Individual \nelements in themselves are not necessarily important. It's the \ncombination of multiple elements that make it an interesting \nissue as far as questionable invasion of privacy or whether it \nraises flags about how that data is being used in the case of \nare we really profiling or are we looking at a risk assessment. \nShould we look at race and national origin? Probably yes. In \nthemselves they are not necessarily the most important item, \nbut in combination with other data elements they may raise a \nlevel of risk, and it needs to be considered in that manner. It \nneeds to be viewed not as an individual component, but the sum \nof all the components looked at in terms of evaluating whether \nthis information is something that warrants looking into or not \nlooking into.\n    So does it make it actionable? 
That's the way you need to \nlook at the collection of data, not the individual elements \nnecessarily.\n    Thank you for the opportunity.\n    Mr. Putnam. My pleasure. Thank you. Anyone else?\n    Mr. Kutz. Yeah, I would just say I appreciate you inviting \nus to the hearing today. Since we work for Congress, we \ncertainly believe data mining is a tool that's going to be able \nto help us better serve you and to do better audits and \ninvestigations on your behalf. So I appreciate that.\n    Mr. Putnam. Thank you. Mr. Rosen. Mr. Forman. We appreciate \nyour efforts. I'm reminded that in the event there are \nadditional questions the record will remain open for 2 weeks \nfor submitted answers. And with that, the meeting is adjourned.\n    [Whereupon, at 11:30 a.m., the subcommittee was adjourned.]\n    [Additional information submitted for the hearing record \nfollows:]\n\n[GRAPHIC] [TIFF OMITTED] T7229.048\n\n[GRAPHIC] [TIFF OMITTED] T7229.049\n\n[GRAPHIC] [TIFF OMITTED] T7229.050\n\n[GRAPHIC] [TIFF OMITTED] T7229.051\n\n[GRAPHIC] [TIFF OMITTED] T7229.052\n\n[GRAPHIC] [TIFF OMITTED] T7229.053\n\n[GRAPHIC] [TIFF OMITTED] T7229.054\n\n[GRAPHIC] [TIFF OMITTED] T7229.055\n\n[GRAPHIC] [TIFF OMITTED] T7229.056\n\n[GRAPHIC] [TIFF OMITTED] T7229.057\n\n[GRAPHIC] [TIFF OMITTED] T7229.058\n\n[GRAPHIC] [TIFF OMITTED] T7229.059\n\n[GRAPHIC] [TIFF OMITTED] T7229.060\n\n[GRAPHIC] [TIFF OMITTED] T7229.061\n\n</pre></body></html>\n"