Data Mining: Federal Efforts Cover a Wide Range of Uses 	 
(04-MAY-04, GAO-04-548).					 
                                                                 
Both the government and the private sector are increasingly using
"data mining"--that is, the application of database technology	 
and techniques (such as statistical analysis and modeling) to	 
uncover hidden patterns and subtle relationships in data and to  
infer rules that allow for the prediction of future results. As  
has been widely reported, many federal data mining efforts	 
involve the use of personal information that is mined from	 
databases maintained by public as well as private sector	 
organizations. GAO was asked to survey data mining systems and	 
activities in federal agencies. Specifically, GAO was asked to	 
identify planned and operational federal data mining efforts and 
describe their characteristics. 				 
-------------------------Indexing Terms------------------------- 
REPORTNUM:   GAO-04-548 					        
    ACCNO:   A09947						        
  TITLE:     Data Mining: Federal Efforts Cover a Wide Range of Uses  
     DATE:   05/04/2004 
  SUBJECT:   Counterterrorism					 
	     Crime prevention					 
	     Data collection					 
	     Federal agencies					 
	     Fraud						 
	     Information technology				 
	     Personnel management				 
	     Planning						 
	     Statistical methods				 
	     Data mining					 
	     Personal information				 

******************************************************************
** This file contains an ASCII representation of the text of a  **
** GAO Product.                                                 **
**                                                              **
** No attempt has been made to display graphic images, although **
** figure captions are reproduced.  Tables are included, but    **
** may not resemble those in the printed version.               **
**                                                              **
** Please see the PDF (Portable Document Format) file, when     **
** available, for a complete electronic file of the printed     **
** document's contents.                                         **
**                                                              **
******************************************************************
GAO-04-548

United States General Accounting Office

GAO Report to the Ranking Minority Member, Subcommittee on Financial
Management, the Budget, and International Security, Committee on
Governmental Affairs, U.S. Senate

May 2004

                                  DATA MINING

                             Federal Efforts Cover
                              a Wide Range of Uses


Highlights of GAO-04-548, a report to the Ranking Minority Member,
Subcommittee on Financial Management, the Budget, and International
Security, Committee on Governmental Affairs, U.S. Senate

Both the government and the private sector are increasingly using "data
mining"--that is, the application of database technology and techniques
(such as statistical analysis and modeling) to uncover hidden patterns and
subtle relationships in data and to infer rules that allow for the
prediction of future results. As has been widely reported, many federal
data mining efforts involve the use of personal information that is mined
from databases maintained by public as well as private sector
organizations.

GAO was asked to survey data mining systems and activities in federal
agencies. Specifically, GAO was asked to identify planned and operational
federal data mining efforts and describe their characteristics.

Federal agencies are using data mining for a variety of purposes, ranging
from improving service or performance to analyzing and detecting terrorist
patterns and activities. Our survey of 128 federal departments and
agencies on their use of data mining shows that 52 agencies are using or
are planning to use data mining. These departments and agencies reported
199 data mining efforts, of which 68 are planned and 131 are operational.
The figure here shows the most common uses of data mining efforts as
described by agencies. Of these uses, the Department of Defense reported
the largest number of efforts aimed at improving service or performance,
managing human resources, and analyzing intelligence and detecting
terrorist activities. The Department of Education reported the largest
number of efforts aimed at detecting fraud, waste, and abuse. The National
Aeronautics and Space Administration reported the largest number of
efforts aimed at analyzing scientific and research information. For
detecting criminal activities or patterns, however, efforts are spread
relatively evenly among the agencies that reported having such efforts.

In addition, out of all 199 data mining efforts identified, 122 used
personal information. For these efforts, the primary purposes were
detecting fraud, waste, and abuse; detecting criminal activities or
patterns; analyzing intelligence and detecting terrorist activities; and
increasing tax compliance.

Agencies also identified efforts to mine data from the private sector and
data from other federal agencies, both of which could include personal
information. Of 54 efforts to mine data from the private sector (such as
credit reports or credit card transactions), 36 involve personal
information. Of 77 efforts to mine data from other federal agencies, 46
involve personal information (including student loan application data,
bank account numbers, credit card information, and taxpayer identification
numbers).

Top Six Purposes of Data Mining Efforts in Departments and Agencies

www.gao.gov/cgi-bin/getrpt?GAO-04-548

To view the full product, including the scope and methodology, click on
the link above. For more information, contact Linda Koontz at (202)
512-6240 or [email protected].

Contents

Letter
  Results in Brief
  Background
  Agencies Identified Numerous Data Mining Efforts with Various Aims
  Summary

Appendixes

Appendix I: Objective, Scope, and Methodology
Appendix II: Surveyed Departments and Agencies
Appendix III: Departments and Agencies Reporting No Data Mining Efforts
Appendix IV: Inventories of Efforts

Tables
  Table 1: Top Six Purposes of Data Mining Efforts in Departments and
           Agencies and Number of Efforts Reported
  Table 2: Department of Agriculture's Inventory of Data Mining Efforts
  Table 3: Department of Commerce's Inventory of Data Mining Efforts
  Table 4: Department of Defense's Inventory of Data Mining Efforts
  Table 5: Department of Education's Inventory of Data Mining Efforts
  Table 6: Department of Energy's Inventory of Data Mining Efforts
  Table 7: Department of Health and Human Services' Inventory of Data
           Mining Efforts
  Table 8: Department of Homeland Security's Inventory of Data Mining
           Efforts
  Table 9: Department of the Interior's Inventory of Data Mining Efforts
  Table 10: Department of Justice's Inventory of Data Mining Efforts
  Table 11: Department of Labor's Inventory of Data Mining Efforts
  Table 12: Department of State's Inventory of Data Mining Efforts
  Table 13: Department of Transportation's Inventory of Data Mining Efforts
  Table 14: Department of the Treasury's Inventory of Data Mining Efforts
  Table 15: Department of Veterans Affairs' Inventory of Data Mining Efforts
  Table 16: Environmental Protection Agency's Inventory of Data Mining
            Efforts
  Table 17: Export-Import Bank of the United States' Inventory of Data
            Mining Efforts
  Table 18: Federal Deposit Insurance Corporation's Inventory of Data
            Mining Efforts
  Table 19: Federal Reserve System's Inventory of Data Mining Efforts
  Table 20: National Aeronautics and Space Administration's Inventory of
            Data Mining Efforts
  Table 21: Nuclear Regulatory Commission's Inventory of Data Mining Efforts
  Table 22: Office of Personnel Management's Inventory of Data Mining
            Efforts
  Table 23: Pension Benefit Guaranty Corporation's Inventory of Data Mining
            Efforts
  Table 24: Railroad Retirement Board's Inventory of Data Mining Efforts
  Table 25: Small Business Administration's Inventory of Data Mining Efforts

Figures
  Figure 1: Top Six Purposes of Data Mining Efforts That Involve Personal
            Information
  Figure 2: Top Six Purposes of Data Mining Efforts That Involve Private
            Sector Data
  Figure 3: Top Six Purposes of Data Mining Efforts That Involve Data from
            Other Federal Agencies

Abbreviations

CARDS    Counterintelligence Analytical Research Data System
CG       Coast Guard
CI-AIMS  Counterintelligence Automated Investigative Management System
DHHS     Department of Health and Human Services
DOD      Department of Defense
DOE      Department of Energy
DOT      Department of Transportation
EFTPS    Electronic Federal Tax Payment System
EOS      Earth Observing System
FARS     Fatality Analysis Reporting System
FDA      Food and Drug Administration
GENESIS  Global Environmental and Earth Science Information System
GSFC     Goddard Space Flight Center
HR       Human Resources
HRSA     Health Resources and Services Administration
MATRIX   Multistate Anti-terrorism Information Exchange System
NASA     National Aeronautics and Space Administration
NVO      National Virtual Observatory
OIG      Office of Inspector General
OLAP     On-line Analytical Processing
RSST     Real Estate Stress Test
SAA      Spectral Analysis Automation
SAS      Safety Automated System
SMARTS   Statistical Management Analysis and Reporting Tool System
SWC      Space Warfare Center
TIMS     Technical Information Management System
TOP      Treasury Offset Program
VA       Veterans Affairs
VHA      Veterans Health Administration
VISN     Veterans Integrated Service Network

This is a work of the U.S. government and is not subject to copyright
protection in the United States. It may be reproduced and distributed in
its entirety without further permission from GAO. However, because this
work may contain copyrighted images or other material, permission from the
copyright holder may be necessary if you wish to reproduce this material
separately.

United States General Accounting Office Washington, D.C. 20548

May 4, 2004

The Honorable Daniel K. Akaka

Ranking Minority Member

Subcommittee on Financial Management, the Budget, and International Security
Committee on Governmental Affairs
United States Senate

Dear Senator Akaka:

Data mining--a technique for extracting knowledge from large volumes of
data--is increasingly being used by government and by the private sector.
As has been widely reported, many federal data mining efforts involve the
use of personal information1 that is mined from public as well as private
sector organizations.

This report responds to your request that we identify and describe
operational and planned data mining systems and activities in federal
agencies. In a follow-up report, we plan to perform an in-depth review of
selected federal data mining efforts.

The term "data mining" has a number of meanings. For purposes of this
work, we define data mining as the application of database technology and
techniques--such as statistical analysis and modeling--to uncover hidden
patterns and subtle relationships in data and to infer rules that allow
for the prediction of future results. We based this definition on the most
commonly used terms found in a survey of the technical literature. In our
initial survey of chief information officers, these officials found the
definition sufficient to identify agency data mining efforts.

1As used in this report, personal information is all information
associated with an individual and includes both identifying information
and nonidentifying information. Identifying information, which can be used
to locate or identify an individual, includes name, aliases, Social
Security number, e-mail address, driver's license number, and
agency-assigned case number. Nonidentifying personal information includes
age, education, finances, criminal history, physical attributes, and
gender.

To address our objective to identify and describe operational and planned
data mining systems and activities in federal agencies, we surveyed chief
information officers or comparable officials at 128 federal departments
and agencies to determine whether the agencies had operational and planned
data mining systems or activities.2 We then conducted telephone interviews
with the reported system managers to obtain information on the
characteristics of the identified data mining efforts. To verify the
information we received, we sent follow-up letters to agencies that
responded as well as to those that did not respond, we asked responsible
officials to verify the information, and we performed random assessments
of the means that these officials used to verify the information.

In addition, we conducted a search of technical literature and periodicals
to develop a comprehensive list of federal government data mining efforts
and then compared these efforts with data mining efforts reported by
federal agencies. If the data mining efforts on our lists were not
reported on the survey, we contacted the appropriate chief information
officers and, with their concurrence, added the efforts.

We performed our work from May 2003 to April 2004 in accordance with
generally accepted government auditing standards. Additional details on
our scope and methodology are provided in appendix I.

Results in Brief

Federal agencies are using data mining for a variety of
purposes, ranging from improving service or performance to analyzing and
detecting terrorist patterns and activities. Our survey of 128 federal
departments and agencies on their use of data mining shows that 52
agencies are using or are planning to use data mining. These departments
and agencies reported 199 data mining efforts, of which 68 were planned
and 131 were operational. The most common uses of data mining efforts were
described by agencies as

o  improving service or performance;

o  detecting fraud, waste, and abuse;

o  analyzing scientific and research information;

o  managing human resources;

o  detecting criminal activities or patterns; and

o  analyzing intelligence and detecting terrorist activities.

2That is, we asked about both systems explicitly dedicated to data mining
and activities using automated tools to "mine" databases that are part of
other systems. In this report, we use the word "efforts" to refer to both
systems and activities, unless otherwise specified.

The Department of Defense reported having the largest number of data
mining efforts aimed at improving service or performance and at managing
human resources. Defense was also the most frequent user of efforts aimed
at analyzing intelligence and detecting terrorist activities, followed by
the Departments of Homeland Security, Justice, and Education.

The Department of Education reported the largest number of efforts aimed
at detecting fraud, waste, and abuse, while the National Aeronautics and
Space Administration targets most of its data mining efforts (21 out of
23) toward analyzing scientific and research information. Data mining
efforts for detecting criminal activities or patterns, however, were
spread relatively evenly among the reporting agencies.

In addition, out of all 199 data mining efforts identified, 122 used
personal information. For these efforts, the primary purposes were
detecting fraud, waste, and abuse; detecting criminal activities or
patterns; analyzing intelligence and detecting terrorist activities; and
increasing tax compliance.

Agencies also identified efforts to mine data from the private sector and
data from other federal agencies, both of which could include personal
information. Of 54 efforts to mine data from the private sector (such as
credit reports or credit card transactions), 36 involve personal
information. Of 77 efforts to mine data from other federal agencies, 46
involve personal information (including student loan application data,
bank account numbers, credit card information, and taxpayer identification
numbers).

Background

Data mining enables corporations and government agencies to
analyze massive volumes of data quickly and relatively inexpensively. The
use of this type of information retrieval has been driven by the
exponential growth in the volumes and availability of information
collected by the public and private sectors, as well as by advances in
computing and data storage capabilities. In response to these trends,
generic data mining tools are increasingly available for--or built
into--major commercial database applications. Today, mining can be
performed on many types of data, including those in structured, textual,
spatial, Web, or multimedia forms. Data mining is becoming a big business;
Forrester Research has estimated that the data mining market is passing
the billion dollar mark.

Although the use and sophistication of data mining have increased in both
the government and the private sector, data mining remains an ambiguous
term. According to some experts, data mining overlaps a wide range of
analytical activities, including data profiling, data warehousing, online
analytical processing, and enterprise analytical applications.3 Some of
the terms used to describe data mining or similar analytical activities
include "factual data analysis" and "predictive analytics." We surveyed
technical literature and developed a definition of data mining based on
the most commonly used terms found in this literature. Based on this
search, we define data mining as the application of database technology
and techniques--such as statistical analysis and modeling--to uncover hidden
patterns and subtle relationships in data and to infer rules that allow
for the prediction of future results. We used this definition in our
initial survey of chief information officers; these officials found the
definition sufficient to identify agency data mining efforts.

Data mining has been used successfully for a number of years in the
private and public sectors in a broad range of applications. In the
private sector, these applications include customer relationship
management, market research, retail and supply chain analysis, medical
analysis and diagnostics, financial analysis, and fraud detection. In the
government, data mining was initially used to detect financial fraud and
abuse. For example, data mining has been an integral part of GAO audits
and investigations of federal government purchase and credit card
programs.4 Data mining and related technologies are also emerging as key
tools in Department of Homeland Security initiatives.

3Lou Agosta, "Data Mining Is Dead--Long Live Predictive Analytics!"
(Forrester Research, Oct. 30, 2003),
http://www.forrester.com/Research/LegacyIT/0,7208,33030,00.html
(downloaded Jan. 26, 2004).

4For more information on the uses of data mining in GAO audits, see U.S.
General Accounting Office, Data Mining: Results and Challenges for
Government Programs, Audits, and Investigations, GAO-03-591T (Washington,
D.C.: Mar. 25, 2003).

    Data Mining Poses Privacy Challenge

Since the terrorist attacks of September 11, 2001, data mining has been
seen increasingly as a useful tool to help detect terrorist threats by
improving the collection and analysis of public and private sector data.
A recent report on information sharing and analysis to address the
challenges of homeland security noted that agencies at all levels of
government are now interested in collecting and mining large amounts of
data from commercial sources.5 The report added that agencies may use such
data not only for investigations of known terrorists, but also to perform
large-scale data analysis and pattern discovery in order to discern
potential terrorist activity by unknown individuals. Such use of data
mining by federal agencies has raised public and congressional concerns
regarding privacy.

One example of a large-scale development effort launched in the wake of
the September 11 attacks is the Multistate Anti-terrorism Information
Exchange System, known as MATRIX. MATRIX, currently used in five states,6
provides the capability to store, analyze, and exchange sensitive
terrorism-related and other criminal intelligence data among agencies
within a state, among states, and between state and federal agencies.
Information in MATRIX databases includes criminal history records,
driver's license data, vehicle registration records, incarceration
records, and digitized photographs. Public awareness of MATRIX and of
similar large-scale data mining or data mining-like projects has led to
concerns about the government's use of data mining to conduct a mass
"dataveillance"7--a surveillance of large groups of people--to sift through
vast amounts of personally identifying data to find individuals who might
fit a terrorist profile.

5Creating a Trusted Information Network for Homeland Security (New York
City: The Markle Foundation, December 2003),
http://www.markletaskforce.org/Report2_Full_Report.pdf (downloaded Mar. 8,
2004).

6Five states are currently participating in the MATRIX pilot project:
Connecticut, Florida, Michigan, Ohio, and Pennsylvania.

7Roger Clarke, "Information Technology and Dataveillance," Communications
of the ACM, vol. 31, issue 5 (New York City: ACM Press, May 1988),
http://www.anu.edu.au/people/Roger.Clarke/DV/CACM88.html (downloaded Mar.
5, 2004). Clarke defines mass dataveillance as the systematic use of
personal data systems in the investigation or monitoring of the actions or
communications of groups of people.

Mining government and private databases containing personal information
creates a range of privacy concerns. Through data mining, agencies can
quickly and efficiently obtain information on individuals or groups by
exploiting large databases containing personal information aggregated from
public and private records. Information can be developed about a specific
individual or about unknown individuals whose behavior or characteristics
fit a specific pattern. Before data aggregation and data mining came into
use, personal information contained in paper records stored at widely
dispersed locations, such as courthouses or other government offices, was
relatively difficult to gather and analyze. As one expert noted, data
mining technologies that provide for easy access and analysis of
aggregated data challenge the concept of privacy protection afforded to
individuals through the inherent inefficiency of government agencies
analyzing paper, rather than aggregated, computer records.8

Privacy concerns about mined or analyzed personal data also include
concerns about the quality and accuracy of the mined data; the use of the
data for other than the original purpose for which the data were collected
without the consent of the individual; the protection of the data against
unauthorized access, modification, or disclosure; and the right of
individuals to know about the collection of personal information, how to
access that information, and how to request a correction of inaccurate
information.9

8K.A. Taipale, "Data Mining and Domestic Security: Connecting the Dots to
Make Sense of Data," The Columbia Science and Technology Law Review, vol.
V, 2003-2004 (New York City: Columbia Law School, 2004),
http://www.stlr.org/cite.cgi?volume=5&article=2 (downloaded Mar. 18,
2004).

9These privacy concerns are reflected in the Fair Information Practices
proposed in 1980 by the Organization for Economic Cooperation and
Development and endorsed by the U.S. Department of Commerce in 1981. These
practices govern collection limitation, purpose specification, use
limitation, data quality, security safeguards, openness, individual
participation, and accountability.

  Agencies Identified Numerous Data Mining Efforts with Various Aims

Of 128 federal departments and agencies surveyed for information on their
planned and operational data mining efforts (listed in app. II), 52
agencies reported 199 data mining efforts, and 69 agencies reported that
they were not engaged in data mining and were not planning such efforts
(listed in app. III). Of the 199 data mining efforts, 68 were planned and
131 were operational. Seven agencies did not respond to our survey.10
Appendix IV lists the 199 data mining efforts reported, along with key
characteristics.
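
The survey counts in the preceding paragraph can be cross-checked with simple arithmetic. The Python sketch below is illustrative only and is not part of GAO's methodology; the variable and function names are assumptions introduced here, and the figures are those stated in this report.

```python
# Cross-check of the survey tallies reported in this section.
# All figures come from the report text; names are illustrative only.

surveyed = 128            # departments and agencies surveyed
reporting_efforts = 52    # agencies reporting data mining efforts
reporting_none = 69       # agencies reporting no current or planned efforts
no_response = 7           # agencies that did not respond

planned = 68              # planned data mining efforts
operational = 131         # operational data mining efforts
total_efforts = 199       # all reported data mining efforts

# Every surveyed agency falls into exactly one of the three groups.
assert reporting_efforts + reporting_none + no_response == surveyed

# Planned plus operational efforts account for all reported efforts.
assert planned + operational == total_efforts


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


response_rate = share(reporting_efforts + reporting_none, surveyed)
operational_share = share(operational, total_efforts)

print(f"Response rate: {response_rate}%")
print(f"Operational share of efforts: {operational_share}%")
```

Both assertions pass, so the three agency groups account for all 128 surveyed agencies and the planned and operational efforts account for all 199 reported efforts.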

Agencies described the most common purposes of data mining efforts as

o  improving service or performance;

o  detecting fraud, waste, and abuse;

o  analyzing scientific and research information;

o  managing human resources;

o  detecting criminal activities or patterns; and

o  analyzing intelligence and detecting terrorist activities.

As shown in table 1, the Department of Defense reported the largest number
of efforts aimed at improving service or performance (with 19 out of 65
reported efforts) and at managing human resources (with 14 out of 17
efforts). Defense was also the most frequent user of efforts aimed at
analyzing intelligence and detecting terrorist activities, with 5 of 14
efforts, followed by the Departments of Homeland Security and Justice,
with 4 and 3 efforts, respectively. The Department of Education has the
largest number of efforts aimed at detecting fraud, waste, and abuse (9
out of 24 efforts reported). The National Aeronautics and Space
Administration accounts for 21 of the 23 identified efforts for analyzing
scientific and research information. Efforts are spread relatively evenly
among the agencies that reported using data mining efforts for detecting
criminal activities or patterns. Table 1 summarizes the top six uses of
data mining efforts among the responding agencies.

10Agencies that did not respond to our survey are (1) the Central
Intelligence Agency; (2) the Corporation for National and Community
Service; (3) the Department of the Army, Department of Defense; (4) the
Equal Employment Opportunity Commission; (5) the National Park Service,
Department of the Interior; (6) the National Security Agency, Department
of Defense; and (7) the Rural Utilities Service, Department of
Agriculture.

Table 1: Top Six Purposes of Data Mining Efforts in Departments and Agencies and
Number of Efforts Reported

Column key:
(1) Improving service or performance
(2) Detecting fraud, waste, and abuse
(3) Analyzing scientific and research information
(4) Managing human resources
(5) Detecting criminal activities or patterns
(6) Analyzing intelligence and detecting terrorist activities

Department or agency                            (1)  (2)  (3)  (4)  (5)  (6)
Department of Agriculture                         8    1
Department of Commerce
Department of Defense                            19    1    1   14    1
Department of Education                           6    9              3
Department of Energy                                                  3
Department of Health and Human Services           4         1
Department of Homeland Security                   5              2    2
Department of the Interior                        1
Department of Justice                             1              1    3
Department of Labor                               3    1
Department of State                                    2
Department of Transportation                           1
Department of the Treasury                        4    1              2
Department of Veterans Affairs                    5    5              1
Environmental Protection Agency                        1
Export-Import Bank of the United States           1
Federal Deposit Insurance Corporation             1
Federal Reserve System                                 1
National Aeronautics and Space Administration     1    1   21
Nuclear Regulatory Commission                     1
Office of Personnel Management                    1
Pension Benefit Guaranty Corporation              2
Railroad Retirement Board                         1
Small Business Administration                     1
Total                                            65   24   23   17   15   14

Note: Individual agency counts for column (6) are not reproduced in this
ASCII version; only the column total is shown.

Source: GAO analysis of agency-provided data.
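
The per-agency counts in table 1 can be checked against the reported column totals. The Python sketch below is a minimal illustration under stated assumptions: the short purpose keys are hypothetical labels introduced here, the assignment of each count to a purpose is inferred from the column alignment of this ASCII table, and only the five purposes with legible per-agency counts are included.

```python
# Check that per-agency counts in table 1 add up to the reported totals.
# Assumption: each count's column is inferred from the table's alignment.
# The purpose keys below are illustrative labels, not GAO terminology.

PURPOSES = ["service", "fraud", "science", "hr", "criminal"]

counts = {
    "Agriculture": {"service": 8, "fraud": 1},
    "Defense": {"service": 19, "fraud": 1, "science": 1, "hr": 14, "criminal": 1},
    "Education": {"service": 6, "fraud": 9, "criminal": 3},
    "Energy": {"criminal": 3},
    "Health and Human Services": {"service": 4, "science": 1},
    "Homeland Security": {"service": 5, "hr": 2, "criminal": 2},
    "Interior": {"service": 1},
    "Justice": {"service": 1, "hr": 1, "criminal": 3},
    "Labor": {"service": 3, "fraud": 1},
    "State": {"fraud": 2},
    "Transportation": {"fraud": 1},
    "Treasury": {"service": 4, "fraud": 1, "criminal": 2},
    "Veterans Affairs": {"service": 5, "fraud": 5, "criminal": 1},
    "Environmental Protection Agency": {"fraud": 1},
    "Export-Import Bank": {"service": 1},
    "Federal Deposit Insurance Corporation": {"service": 1},
    "Federal Reserve System": {"fraud": 1},
    "NASA": {"service": 1, "fraud": 1, "science": 21},
    "Nuclear Regulatory Commission": {"service": 1},
    "Office of Personnel Management": {"service": 1},
    "Pension Benefit Guaranty Corporation": {"service": 2},
    "Railroad Retirement Board": {"service": 1},
    "Small Business Administration": {"service": 1},
}

# Totals as printed in the "Total" row of table 1 for the same five purposes.
reported_totals = {"service": 65, "fraud": 24, "science": 23, "hr": 17, "criminal": 15}

# Sum each purpose across agencies, treating a blank cell as zero.
column_sums = {p: sum(row.get(p, 0) for row in counts.values()) for p in PURPOSES}

assert column_sums == reported_totals
print(column_sums)
```

Under this reading of the columns, each of the five sums matches the corresponding figure in the Total row.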

Some data mining purposes focus on human activities and therefore are
inherently likely to involve personal information; examples of these
purposes are detecting fraud, waste, and abuse; detecting criminal
activities or patterns; managing human resources; and analyzing
intelligence. The following are examples of data mining efforts for each
of these purposes:

o 	Detecting fraud, waste, and abuse. The Veterans Benefits
Administration's C & P Payment Data Analysis effort mines veterans'
compensation and pension data for evidence of fraud.

o 	Detecting criminal activities or patterns. The Department of
Education's Title IV Identity Theft Initiative effort focuses on identity
theft cases involving education loans.

o 	Managing human resources. The U.S. Air Force's Oracle HR (Human
Resources) uses data mining to provide information on promotions, pay
grades, clearances, and other information relevant to human resources
planning.

o 	Analyzing intelligence and detecting terrorist activities. The Defense
Intelligence Agency's Verity K2 Enterprise mines data from the
intelligence community and Internet sources to identify foreign terrorists
or U.S. citizens connected to foreign terrorism activities.

On the other hand, other categories of efforts do not necessarily focus on
human activities or involve personal information, such as many of the
efforts aimed at analyzing scientific and research information. The
National Aeronautics and Space Administration, for example, mines large,
complex earth science data sets to find patterns and relationships to
detect hidden events (the system is called Machine Learning and Data
Mining for Improved Data Understanding of High Dimensional Earth Sensed
Data).

Similarly, many efforts aimed at improving service or performance (the
most frequently cited purpose of data mining efforts) do not involve
personal information. For example, the Department of the Navy's Supply
Management System Multidimensional Cubes system includes a data warehouse
containing data on every ship part that has been ordered since the 1980s,
with multidimensional information on each part. The Navy uses data mining
to calculate failure rates and identify needed improvements; according to
the Navy, this system reduces downtime on ships by improving parts
replacement.

However, some efforts aimed at improving service or performance do involve
personal information. For example, the Veterans Administration's VISN
(Veterans Integrated Service Network) 16 Data Warehouse is mined for a
variety of information, including patient visits, laboratory tests, and
pharmacy records, to provide management with health care system
performance information.

Overall, 122 of the 199 data mining efforts involve personal information.
Figure 1 shows the top six purposes of these efforts, as well as their
distribution.

Figure 1: Top Six Purposes of Data Mining Efforts That Involve Personal
Information

[Bar chart. Purposes: increasing tax compliance; analyzing intelligence
and detecting terrorist activities; detecting criminal activities or
patterns; managing human resources; detecting fraud, waste, and abuse;
improving service or performance (33). Axis: number of data mining
efforts, 0 to 40.]

Source: GAO analysis of agency data.

Of the 199 data mining efforts, 54 use or plan to use data from the
private sector. Of these, 36 involve personal information. The personal
information from the private sector included credit reports and credit
card transaction records. Figure 2 shows the distribution of the top six
purposes of the 54 efforts involving data from the private sector.

Figure 2: Top Six Purposes of Data Mining Efforts That Involve Private
Sector Data

[Bar chart. Purposes: improving safety; detecting criminal activities or
patterns; analyzing scientific and research information; analyzing
intelligence and detecting terrorist activities; detecting fraud, waste,
and abuse; improving service or performance (14). Axis: number of data
mining efforts, 0 to 40.]

Source: GAO analysis of agency data.

Of the 199 data mining efforts, 77 efforts use or plan to use data from
other federal agencies. Of the 77 efforts, 46 involve personal
information. The personal information from other federal agencies included
student loan application data, bank account numbers, credit card
information, and taxpayer identification numbers. Figure 3 shows the top
six uses for the 77 efforts involving data from other federal agencies and
their distribution.

Figure 3: Top Six Purposes of Data Mining Efforts That Involve Data from
Other Federal Agencies

[Bar chart. Purposes: managing human resources; detecting fraud, waste,
and abuse; detecting criminal activities or patterns; analyzing
intelligence and detecting terrorist activities; analyzing scientific and
research information; improving service or performance (20). Axis: number
of data mining efforts, 0 to 40.]

Source: GAO analysis of agency data.
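
The counts reported above can be restated as shares with simple arithmetic. The following is a minimal sketch in Python, using only the figures given in the text; the variable and function names are illustrative and are not part of the survey.

```python
# Counts as stated in the text above.
TOTAL_EFFORTS = 199

counts = {
    "personal information": 122,
    "private sector data": 54,
    "other federal agency data": 77,
}


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Share of all 199 efforts that falls into each group.
shares = {name: share(n, TOTAL_EFFORTS) for name, n in counts.items()}

# Within the private sector and other-agency groups, the text also gives
# the number of efforts that involve personal information.
private_with_personal = share(36, 54)
other_agency_with_personal = share(46, 77)

for name, pct in shares.items():
    print(f"{name}: {pct} percent of {TOTAL_EFFORTS} efforts")
```

Because a single effort can fall into more than one group, these shares are not expected to sum to 100 percent.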

Summary

Driven by advances in computing and data storage capabilities and
by growth in the volumes and availability of information collected by the
public and private sectors, data mining enables government agencies to
analyze massive volumes of data. Our survey shows that data mining is
increasingly being used by government for a variety of purposes, ranging
from improving service or performance to analyzing and detecting terrorist
patterns and activities.

Although this survey provides a broad overview of the emerging uses of
data mining in the federal government, more work is needed to shed light
on the privacy implications of these efforts. In future work, we plan to
examine selected federal data mining efforts and their implications.

As agreed with your office, unless you publicly announce the contents of
the report earlier, we plan no further distribution until 30 days from the
report date. At that time, we will send copies of this report to the
Chairmen and Ranking Minority Members of the House Committee on Government
Reform; Subcommittee on Civil Service and Agency Organization, House
Committee on Government Reform; Select Committee on Homeland Security,
House of Representatives; Senate Committee on Governmental Affairs; and
the Subcommittee on Oversight of Government Management, the
Federal Workforce and the District of Columbia, Senate Committee on
Governmental Affairs. We will also make copies available to others on
request. In addition, this report will be available at no charge on the
GAO Web site at http://www.gao.gov.

If you have any questions concerning this report, please call me at (202)
512-6240 or Mirko J. Dolak, Assistant Director, at (202) 512-6362. We can
also be reached by e-mail at [email protected] and [email protected],
respectively. Key contributors to this report were Camille M. Chaires,
Barbara S. Collier, Orlando O. Copeland, Nancy E. Glover, Stuart M.
Kaufman, Lori D. Martinez, Morgan F. Walts, and Marcia C. Washington.

Sincerely yours,

Linda D. Koontz
Director, Information Management Issues

Appendix I

                       Objective, Scope, and Methodology

Our objective was to identify and describe planned and operational federal
data mining efforts. As a first step in addressing this objective, we
developed a definition of "data mining." Because this expression has a
range of meanings, we surveyed the technical literature to develop a
definition based on the most commonly used terms found in this literature.
We defined data mining as the application of database technology and
techniques--such as statistical analysis and modeling--to uncover hidden
patterns and subtle relationships in data and to infer rules that allow
for the prediction of future results. In our initial survey of chief
information officers, these officials found the definition sufficient to
identify agency data mining efforts.

We then surveyed chief information officers or comparable officials at 128
federal departments and agencies (see app. II) and asked them to identify
whether their agency had operational and planned data mining efforts. We
achieved a 95 percent response rate. Of the 121 agencies that responded,
69 reported that they did not have any data mining efforts (see app. III).
We followed up with these 69 agencies and gave them another opportunity to
report data mining efforts.
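
The response figures in this paragraph are related by simple arithmetic, which the following minimal Python sketch makes explicit. It uses only the numbers stated above; the variable names are illustrative.

```python
# Figures as stated in the methodology text.
surveyed = 128        # departments and agencies surveyed
responded = 121       # agencies that responded
reported_none = 69    # respondents reporting no data mining efforts

# Response rate as a whole percentage, as the report states it.
response_rate = round(100 * responded / surveyed)

# Respondents that reported at least one planned or operational effort.
reporting_efforts = responded - reported_none

# Agencies that did not respond to the survey.
nonrespondents = surveyed - responded

print(f"Response rate: {response_rate} percent")
print(f"Agencies reporting efforts: {reporting_efforts}")
print(f"Nonresponding agencies: {nonrespondents}")
```

The derived count of agencies reporting efforts matches the 52 departments and agencies cited elsewhere in this report.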

To obtain information on the characteristics of the identified operational
or planned data mining efforts, we conducted structured telephone
interviews[1] with the identified system owners or activity managers. The
interviews were designed to obtain detailed information about each data
mining system, including the purpose and size, the use of personal
information, and the use of data from the private sector or other federal
organizations. We pretested the structured interview to ensure relevance
and clarity.

We aggregated these data by agency and sent them back to the chief
information officer, comparable official, or their designee and asked that
they review the characteristics for completeness and accuracy. One of the
52 departments and agencies that reported data mining systems--the
Department of Homeland Security--has not responded to our request to review
the reported data for completeness and accuracy.

[1] In a structured interview, the interviewer asks the same questions of
numerous individuals or individuals representing numerous organizations in
a precise manner, offering each interviewee the same set of possible
responses.

We performed random assessments of the means that these officials used to
verify the information. Based on these assessments, we concluded that the
agencies' verification methods were reasonable and that as a result, we
could rely on the accuracy of the reported data. We also conducted a
search of technical literature and periodicals to develop a list of
federal government data mining efforts and then compared the efforts on
this list with the data mining efforts reported by federal agencies. If
the data mining efforts on our list were not reported on the survey, we
contacted the chief information officer or comparable official to
determine whether that data mining effort should be included in our
survey.

Because this was not a sample survey, there are no sampling errors.
However, the practical difficulties of conducting any survey may introduce
errors, commonly referred to as nonsampling errors. For example,
difficulties in how a particular question is interpreted, in the sources
of information that are available to respondents, or in how the data are
entered into a database or analyzed can introduce unwanted
variability into the survey results. We took steps in the development of
the structured interview, the data collection, and the data analysis to
minimize these nonsampling errors. Among these steps, we pretested the
structured interview instrument, contacted nonresponding agencies as well
as agencies not identifying data mining efforts, and sent the aggregated
data to the agency chief information officer for review.

We conducted our work from May 2003 to April 2004 in accordance with
generally accepted government auditing standards.

Appendix II

                       Surveyed Departments and Agencies

Department of Agriculture

o  Agricultural Marketing Service

o  Agricultural Research Service

o  Animal and Plant Health Inspection Service

o  Cooperative State Research, Education, and Extension Service

o  Farm Service Agency

o  Food and Nutrition Service

o  Food Safety and Inspection Service

o  Foreign Agricultural Service

o  Forest Service

o  National Agricultural Statistics Service

o  Natural Resources Conservation Service

o  Risk Management Agency

o  Rural Utilities Service

Department of Commerce

o  Bureau of the Census

o  Economic Development Administration

o  International Trade Administration

o  National Oceanic and Atmospheric Administration

o  U.S. Patent and Trademark Office

Department of Defense

o  Missile Defense Agency

o  Defense Advanced Research Projects Agency

o  Defense Commissary Agency

o  Defense Contract Audit Agency

o  Defense Contract Management Agency

o  Defense Information Systems Agency

o  Defense Intelligence Agency

o  Defense Legal Services Agency

o  Defense Logistics Agency

o  Defense Security Cooperation Agency

o  Defense Security Service

o  Defense Threat Reduction Agency

o  Department of the Air Force

o  Department of the Army

o  Department of the Navy

o  National Geospatial-Intelligence Agency

o  National Security Agency

o  U.S. Marine Corps

Department of Education

Department of Energy

o  Bonneville Power Administration

o  Southeastern Power Administration

o  Southwestern Power Administration

o  Western Area Power Administration

Department of Health and Human Services

o  Administration for Children and Families

o  Agency for Healthcare Research and Quality

o  Centers for Disease Control and Prevention

o  Centers for Medicare and Medicaid Services

o  Food and Drug Administration

o  Health Resources and Services Administration

o  Indian Health Service

o  National Institutes of Health

o  Program Support Center

Department of Homeland Security

o  Border and Transportation Security Directorate

o  Bureau of Citizenship and Immigration Services

o  Emergency Preparedness and Response Directorate

o  Information Analysis and Infrastructure Protection Directorate

o  Management Directorate

o  Science and Technology Directorate

o  U.S. Coast Guard

o  U.S. Secret Service

Department of Housing and Urban Development

Department of the Interior

o  Bureau of Indian Affairs

o  Bureau of Land Management

o  Bureau of Reclamation

o  Minerals Management Service

o  National Park Service

o  Office of Surface Mining Reclamation and Enforcement

o  U.S. Fish and Wildlife Service

o  U.S. Geological Survey

Department of Justice

o  Bureau of Alcohol, Tobacco, Firearms, and Explosives

o  Drug Enforcement Administration

o  Federal Bureau of Investigation

o  Federal Bureau of Prisons

o  U.S. Marshals Service

Department of Labor

Department of State

Department of Transportation

o  Federal Aviation Administration

o  Federal Highway Administration

o  Federal Motor Carrier Safety Administration

o  Federal Railroad Administration

o  Federal Transit Administration

o  National Highway Traffic Safety Administration

Department of the Treasury

o  Bureau of Engraving and Printing

o  Bureau of the Public Debt

o  Financial Management Service

o  Internal Revenue Service

o  Office of the Comptroller of the Currency

o  Office of Thrift Supervision

o  U.S. Mint

Department of Veterans Affairs

o  Veterans Benefits Administration

o  Veterans Health Administration

Agency for International Development
Central Intelligence Agency
Corporation for National and Community Service

Environmental Protection Agency
Equal Employment Opportunity Commission
Executive Office of the President
Export-Import Bank of the United States
Federal Deposit Insurance Corporation
Federal Energy Regulatory Commission
Federal Reserve System
Federal Retirement Thrift Investment Board
General Services Administration
Legal Services Corporation
National Aeronautics and Space Administration
National Credit Union Administration
National Labor Relations Board
National Science Foundation
Nuclear Regulatory Commission
Office of Management and Budget
Office of Personnel Management
Peace Corps
Pension Benefit Guaranty Corporation
Railroad Retirement Board
Securities and Exchange Commission

Small Business Administration
Smithsonian Institution
Social Security Administration
U.S. Postal Service

Appendix III

Departments and Agencies Reporting No Data Mining Efforts

The following 69 departments and agencies reported that they have no
operational or planned data mining efforts:

Department of Agriculture

o  Agricultural Marketing Service

o  Agricultural Research Service

o  Animal and Plant Health Inspection Service

o  Cooperative State Research, Education, and Extension Service

o  Farm Service Agency

o  Foreign Agricultural Service

o  Forest Service

o  National Agricultural Statistics Service

o  Food Safety and Inspection Service

Department of Commerce

o  Economic Development Administration

o  Bureau of the Census

o  International Trade Administration

o  Department of Commerce Headquarters

o  National Oceanic and Atmospheric Administration

Department of Defense

o  Defense Contract Audit Agency

o  Missile Defense Agency

o  Defense Legal Services Agency

o  Defense Security Service

o  Defense Threat Reduction Agency

o  Defense Logistics Agency

o  Defense Advanced Research Projects Agency

o  Defense Contract Management Agency

o  Defense Security Cooperation Agency

Department of Energy

o  Bonneville Power Administration

o  Southeastern Power Administration

o  Southwestern Power Administration

o  Western Area Power Administration

Department of Health and Human Services

o  Centers for Medicare and Medicaid Services

o  Administration for Children and Families

o  National Institutes of Health

o  Indian Health Service

Department of Homeland Security

o  Science and Technology Directorate

o  Management Directorate

o  Bureau of Citizenship and Immigration Services

o  Department of Homeland Security Headquarters

Department of Housing and Urban Development

Department of the Interior

o  Bureau of Reclamation

o  Bureau of Land Management

o  U.S. Geological Survey

o  Fish and Wildlife Service

o  Office of Surface Mining Reclamation and Enforcement

o  Bureau of Indian Affairs

o  Department of the Interior Headquarters

Department of Justice

o  Bureau of Alcohol, Tobacco, Firearms, and Explosives

Department of Transportation

o  Federal Aviation Administration

o  Federal Transit Administration

o  Federal Railroad Administration

o  Federal Motor Carrier Safety Administration

o  Federal Highway Administration

Department of the Treasury

o  Comptroller of the Currency

o  Bureau of the Public Debt

o  Office of Thrift Supervision

o  Department of the Treasury Headquarters

o  Bureau of Engraving and Printing

Agency for International Development
Executive Office of the President
Federal Energy Regulatory Commission
Federal Retirement Thrift Investment Board
General Services Administration
Legal Services Corporation
National Credit Union Administration
National Labor Relations Board
National Science Foundation
Office of Management and Budget
Peace Corps
Securities and Exchange Commission
Smithsonian Institution
Social Security Administration
U.S. Postal Service

Appendix IV

Inventories of Efforts

The following tables present selected information from our survey of 128
major federal departments and agencies on their use of data mining. The
tables list the purpose of each data mining effort, whether the system is
planned or operational, and whether the system uses personal information,
data from the private sector, or data from other federal agencies. The
survey shows that 52 departments and agencies are using or are planning to
use data mining. These departments and agencies reported 199 data mining
efforts, of which 68 were planned and 131 were operational.
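
One way to hold the fields that the tables report for each effort is sketched below in Python. This is an illustration only, not the structure used in the survey: the `Effort` class and its field names are hypothetical, and the three sample records are copied from the Department of Agriculture inventory in table 2.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Effort:
    """One inventory entry; the field names are illustrative."""
    name: str
    purpose: str
    status: str                  # "Planned" or "Operational"
    personal_info: bool
    private_sector_data: bool
    other_agency_data: bool


# Three sample entries taken from table 2.
efforts = [
    Effort("Travel Data Mart", "Improving service or performance",
           "Planned", True, False, False),
    Effort("Financial Statements Data Warehouse", "Financial management",
           "Operational", False, False, False),
    Effort("CAE", "Detecting fraud, waste, and abuse",
           "Operational", True, True, True),
]

# Tally the sample by status and list entries using personal information.
status_counts = Counter(e.status for e in efforts)
with_personal_info = [e.name for e in efforts if e.personal_info]

# Governmentwide totals stated in the text: planned plus operational.
total_reported = 68 + 131

print(dict(status_counts))
print(with_personal_info)
print(total_reported)
```

Keeping the three yes/no features as separate boolean fields lets the same records be counted by any combination of features, which is how the personal information, private sector, and other agency totals in this report overlap.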

Table 2: Department of Agriculture's Inventory of Data Mining Efforts

Department of Agriculture Headquarters

o  Travel Data Mart
   Description: Will consolidate employee travel information from
   financial and travel systems. Will allow for a governmentwide e-travel
   system and provide the department with information on the financial
   ramifications of its travel.
   Purpose: Improving service or performance
   Status: Planned
   Personal information: Yes; private sector data: No; other agency data: No

o  Financial Statements Data Warehouse
   Description: Is used in the production of consolidated financial
   statements. Provides information for products that are used to satisfy
   external reporting requirements, such as Office of Management and
   Budget and Department of the Treasury requirements.
   Purpose: Financial management
   Status: Operational
   Personal information: No; private sector data: No; other agency data: No

o  Financial Data Warehouse
   Description: Is the department's internal financial management
   reporting system. Data mining is done for ad hoc and on-demand reports.
   Purpose: Financial management
   Status: Operational
   Personal information: Yes; private sector data: No; other agency data: No

Food and Nutrition Service

o  Grantee Monitoring Activities-Southeast Regional Office
   Description: Assists in monitoring the financial status of grant
   holders. Grantees are required to provide expenditure reports, and
   analysis is performed quarterly that matches stated draws to the
   actual draws from the U.S. Treasury.
   Purpose: Improving service or performance
   Status: Operational
   Personal information: Yes; private sector data: No; other agency data: No

o  Grantee Monitoring Activities-Mountain Plains Regional Office
   Description: Assists in monitoring the management and distribution of
   Indian funds for major food benefit programs, such as food stamps, in
   10 grantee states.
   Purpose: Improving service or performance
   Status: Operational
   Personal information: Yes; private sector data: No; other agency data: No

o  Grantee Monitoring Activities-Southwest Regional Office
   Description: Maximizes on-site monitoring efforts by confirming the
   accuracy of grantee accounting. Reduces on-site time, maximizes time
   to complete reviews, and has achieved a 50 percent travel savings.
   Purpose: Improving service or performance
   Status: Operational
   Personal information: Yes; private sector data: No; other agency data: No

o  Grantee Monitoring Activities-Midwest Regional Office
   Description: Will be a reporting system to provide reports and
   automate the audit process. Plans are to acquire data mining tools to
   review and compare budgets, reports, and plans.
   Purpose: Improving service or performance
   Status: Planned
   Personal information: No; private sector data: No; other agency data: Yes

o  Grantee Monitoring Activities-Northeast Regional Office
   Description: Supports on-site reviews of analyses to confirm financial
   report information.
   Purpose: Improving service or performance
   Status: Operational
   Personal information: Yes; private sector data: Yes; other agency data: No

o  Integrated Program Accounting System Data Integrity
   Description: Will create ad-hoc reporting centers to validate
   accounting information.
   Purpose: Improving service or performance
   Status: Planned
   Personal information: No; private sector data: No; other agency data: No

Natural Resources Conservation Service

o  National Resource Inventory Used for Statistical Analysis of Past Soil
   Survey Databases
   Description: Is a trending database that tracks more than 200 resource
   issues such as monitoring erosion. Also processes statistical
   technology.
   Purpose: Improving service or performance
   Status: Operational
   Personal information: No; private sector data: No; other agency data: No

Risk Management Agency

o  CAE
   Description: Is part of a congressionally mandated project to assist
   the Risk Management Agency in controlling fraud, waste, and abuse in
   the Federal Crop Insurance Corporation program.
   Purpose: Detecting fraud, waste, and abuse
   Status: Operational
   Personal information: Yes; private sector data: Yes; other agency data: Yes

Source: Department of Agriculture.

Table 3: Department of Commerce's Inventory of Data Mining Efforts

U.S. Patent and Trademark Office

o  Compensation Projection Model in the Enterprise Data Warehouse
   Description: Generates and makes available compensation projection
   data, both salary and benefits, on current employees and on planned
   hires. It also accounts for planned attritions.
   Purpose: Managing human resources
   Status: Operational
   Personal information: Yes; private sector data: No; other agency data: Yes

Source: Department of Commerce.

Table 4: Department of Defense's Inventory of Data Mining Efforts Features

Organization/
system name Description Purpose Status

Personal information Private sector data Other agency data

          Defense Commissary Agency Defense Information Systems Agency

DeCA Electronic Will be a corporate         Improving  Planned Yes Yes Yes 
                   information                                            
       Records            system for managing service or                  
                                 unstructured                             
Management and  data. It will allow for    performance                 
                   electronic                                             
Archive System   record keeping, document                              
                   management, and automated                              
                       receipt processes.                                 

Corporate Decision Mines data to produce   Improving  Operational No No No 
                            analytical                                     
    Support System/     data on commissary   service or                    
                           operations.                                     
       Commissary     Provides information   performance                   
                      such as what                                         
       Operations     items stores are                                     
                      selling and helps                                    
Management System    determine whether                                  
                           cashiers are                                    
                          being honest.                                    

Enterprise Business Will replace the current   Improving  Planned No No No 
Intelligence System  management information   service or                
                          environment, which     performance               
                               includes                                    
                        operations, reporting,                             
                               billing,                                    
                       statistics, and other                               
                       management                                          
                        information activities.                            

                         (Continued From Previous Page)

                                    Features

Private sector data

Organization/
system name Description Purpose Status

Personal information Other agency data

                         Defense Intelligence Agency

System name:  Insight Smart Discovery
Description:  Will be a data mining knowledge discovery tool to work against
              unstructured text. Will categorize nouns (names, locations,
              events) and present information in images.
Purpose:      Analyzing intelligence and detecting terrorist activities
Status:       Planned
Features:     Personal information: Yes; Private sector data: No;
              Other agency data: Yes

System name:  Verity K2 Enterprise
Description:  Mines data from the intelligence community and Internet
              searches to identify foreign terrorists or U.S. citizens
              connected to foreign terrorism activities.
Purpose:      Analyzing intelligence and detecting terrorist activities
Status:       Operational
Features:     Personal information: Yes; Private sector data: Yes;
              Other agency data: Yes

System name:  PATHFINDER
Description:  Is a data mining tool developed for analysts that provides the
              ability to analyze government and private sector databases
              rapidly. It can compare and search multiple large databases
              quickly.
Purpose:      Analyzing intelligence and detecting terrorist activities
Status:       Operational
Features:     Personal information: Yes; Private sector data: No;
              Other agency data: Yes

System name:  Autonomy
Description:  Is a large search engine tool that is used to search hundreds
              of thousands of word documents. Is used for the organization
              and knowledge discovery of intelligence.
Purpose:      Analyzing intelligence and detecting terrorist activities
Status:       Operational
Features:     Personal information: No; Private sector data: No;
              Other agency data: Yes

                         Department of the Air Force

System name:  ANG Data Warehouse-Guardian
Description:  Will be used to measure military readiness. It incorporates
              information on all disciplines to provide management
              information needed to assess military readiness.
Purpose:      Measuring military readiness
Status:       Planned
Features:     Personal information: Yes; Private sector data: No;
              Other agency data: No

System name:  Integrated Space Warfare Center (SWC) Information System
Description:  Will be an internal database containing information on all
              development/execution activities within the SWC. Will be used
              by all management and analyst personnel to track and align the
              center's activities to warfighter needs, report on execution
              status, financial status, schedule status, and performance
              measurements.
Purpose:      Improving service or performance
Status:       Planned
Features:     Personal information: Yes; Private sector data: No;
              Other agency data: No

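System name:  Safety Automated System (SAS)
Description:  Will query databases to find automation mishaps. Governed by
              Directive 920124 and will allow for the investigation and
              reporting of identified automation mishaps.
Purpose:      Improving safety
Status:       Planned
Features:     Personal information: Yes; Private sector data: No;
              Other agency data: No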

System name:  Enterprise Business System
Description:  Will support strategic planning, assist in building scientific
              and technical budgets for the Air Force, and serve as a launch
              point for all new programs. Research and development case
              files will be maintained for 75 years; the activity indexes,
              catalogs, and tracks these files.
Purpose:      Improving service or performance
Status:       Planned
Features:     Personal information: No; Private sector data: No;
              Other agency data: Yes

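System name:  Genomic and Proteomic Results Analysis
Description:  Analyzes National Institutes of Health's genetic data.
Purpose:      Analyzing scientific and research information
Status:       Operational
Features:     Personal information: No; Private sector data: No;
              Other agency data: Yes

System name:  IG Corporate Information System
Description:  Enhances combat readiness and mission capabilities for Air
              Combat Command units and commanders. It assists in preparing
              for and conducting inspections.
Purpose:      Improving service or performance
Status:       Operational
Features:     Personal information: Yes; Private sector data: No;
              Other agency data: No

System name:  Computer Network Defense System
Description:  Evaluates network activities to create rules for intrusion
              detection system signature sets.
Purpose:      Improving information security
Status:       Operational
Features:     Personal information: No; Private sector data: No;
              Other agency data: No

System name:  FAME
Description:  Will serve as a central repository for Air Force manpower
              information. Will track manpower and unit authorization
              funding.
Purpose:      Managing human resources
Status:       Planned
Features:     Personal information: No; Private sector data: No;
              Other agency data: Yes

System name:  Resource Wizard
Description:  Serves as a manpower tracking system. Tracks positions and
              captures data for specific funding purposes.
Purpose:      Improving service or performance
Status:       Operational
Features:     Personal information: No; Private sector data: No;
              Other agency data: No

System name:  Government Purchase Card
Description:  Is used in overseeing purchases made by Air Force personnel
              with government-provided credit cards.
Purpose:      Detecting fraud, waste, and abuse
Status:       Operational
Features:     Personal information: Yes; Private sector data: Yes;
              Other agency data: No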

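System name:  Ambulatory Data System Queries
Description:  Tracks the initial diagnosis of patients with the results of
              further testing and diagnosis. Allows for early notification
              of diseases and injuries.
Purpose:      Monitoring public health
Status:       Operational
Features:     Personal information: Yes; Private sector data: No;
              Other agency data: No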

System name:  Modus Operandi Database
Description:  Is an investigative tool used to identify and track trends in
              criminal behavior. It links characteristics of crimes and
              provides details on crime scenes and other crime factors.
Purpose:      Detecting criminal activities or patterns
Status:       Operational
Features:     Personal information: Yes; Private sector data: No;
              Other agency data: No

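System name:  Executive Decision Support System
Description:  Takes data from all functional metric balances. Processes
              charts and graphs to identify trends and to make sure goals
              are accomplished.
Purpose:      Improving service or performance
Status:       Operational
Features:     Personal information: No; Private sector data: No;
              Other agency data: No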

System name:  Inspire
Description:  Is a tool that assists in providing a narrative description of
              all research and development that is being conducted within
              the Air Force. Provides cost and milestone information on
              research and development projects.
Purpose:      Performing strategic planning
Status:       Operational
Features:     Personal information: Yes; Private sector data: No;
              Other agency data: Yes

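System name:  Discoverer
Description:  Is used to manage personnel records, including individual
              aliases and histories.
Purpose:      Managing human resources
Status:       Operational
Features:     Personal information: Yes; Private sector data: No;
              Other agency data: No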

System name:  Requirements and Concepts System
Description:  Will serve as a repository for new system projects and system
              requirements. It will be available for consultation for
              information on all project requests and identified
              requirements.
Purpose:      Improving service or performance
Status:       Planned
Features:     Personal information: No; Private sector data: No;
              Other agency data: No

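System name:  Business Objects
Description:  Is a commercial off-the-shelf tool that is used to analyze and
              report on human resources activities.
Purpose:      Managing human resources
Status:       Operational
Features:     Personal information: Yes; Private sector data: No;
              Other agency data: Yes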

System name:  THRMIS
Description:  Uses commercial off-the-shelf software to maintain a data
              warehouse of integrated inventory and manpower data for the
              Total Force: active duty (officer and enlisted), Air Force
              Reserve, Air National Guard, and civilians. Is used to assess
              and analyze the health of the Air Force.
Purpose:      Managing human resources
Status:       Operational
Features:     Personal information: Yes; Private sector data: No;
              Other agency data: No

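System name:  SAS
Description:  Is a Web-enabled personnel data system that gives authorized
              users worldwide the ability to tabulate demographic data on
              recruitment, promotion, and retention.
Purpose:      Managing human resources
Status:       Operational
Features:     Personal information: Yes; Private sector data: No;
              Other agency data: No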

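System name:  Oracle HR
Description:  Is a personnel management system that manages information for
              promotions, pay grades, clearances, and other information
              relevant to human resources.
Purpose:      Managing human resources
Status:       Operational
Features:     Personal information: Yes; Private sector data: No;
              Other agency data: No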

System name:  Health Modeling and Informatics Division Data Mart
Description:  Provides information and decision support to the Air Force
              headquarters' surgeon general for decision making, policy
              development, and resource allocation. It also provides
              performance information and analysis to medical field units in
              support of performance measurement objectives.
Purpose:      Improving service or performance
Status:       Operational
Features:     Personal information: Yes; Private sector data: No;
              Other agency data: No

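System name:  FIRST EDV (BRIO)
Description:  Will deal with Air Force budgets and other components of its
              financial environment. Historical analyses and trend analyses
              will be performed on the budget process.
Purpose:      Improving service or performance
Status:       Planned
Features:     Personal information: No; Private sector data: Yes;
              Other agency data: No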

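System name:  IG World
Description:  Is used to store and track data and requirements, such as
              lodging and augmentee requirements, for the PAC inspector
              general.
Purpose:      Improving service or performance
Status:       Operational
Features:     Personal information: Yes; Private sector data: No;
              Other agency data: No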

                      Department of Defense Headquarters

System name:  Automated Continuing Evaluation System
Description:  Will be used to improve personnel security continuing
              evaluation efforts within Department of Defense (DOD) by
              identifying issues of security concern between the normal
              reinvestigation cycle for those who hold DOD security
              clearances and have signed a consent form that is still in
              effect.
Purpose:      Managing human resources
Status:       Operational
Features:     Personal information: Yes; Private sector data: Yes;
              Other agency data: Yes

                           Department of the Navy

System name:  Human Resource Trend Analysis
Description:  Is used to improve Navy readiness. Data on personnel manning
              levels are mined to ensure that each Navy unit has the correct
              number of training personnel aboard.
Purpose:      Managing human resources
Status:       Operational
Features:     Personal information: No; Private sector data: No;
              Other agency data: No

System name:  U.S. Naval Academy
Description:  Allows for the assessment of academic performance of
              midshipmen. It includes demographic information, information
              on grades, participation in sports, leadership positions, etc.
              It is an extension of the registrar's system and is mined for
              comparisons and trends.
Purpose:      Managing human resources
Status:       Operational
Features:     Personal information: Yes; Private sector data: No;
              Other agency data: No

System name:  Navy Training Master Planning System
Description:  Provides overall Navy training information to assist in
              delivering Navy training in the most efficient manner.
              Pertinent data from multiple databases are consolidated into a
              single database that is mined.
Purpose:      Managing human resources
Status:       Operational
Features:     Personal information: Yes; Private sector data: Yes;
              Other agency data: No

System name:  DHAMS Multidimensional Cubes
Description:  Is a database that contains information on the time and
              attendance of 3,000 mariners across 120 ships. Allows managers
              to look at what people were doing at a particular time and to
              look across the fleet as a whole and compare ship activities.
Purpose:      Improving service or performance
Status:       Operational
Features:     Personal information: No; Private sector data: No;
              Other agency data: No

System name:  National Cargo Tracking Plan Cargo Tracking Division
Description:  Is used to conduct predictive analysis for counterterrorism,
              small weapons of mass destruction proliferation, narcotics,
              alien smuggling, and other high-interest activities involving
              container shipping activity.
Purpose:      Analyzing intelligence and detecting terrorist activities
Status:       Operational
Features:     Personal information: No; Private sector data: Yes;
              Other agency data: No

System name:  Supply Management System Multidimensional Cubes
Description:  Reduces downtime on ships by allowing for the analysis of ship
              parts information. The data warehouse contains data on every
              part that has been ordered since the 1980s, and has
              multidimensional information on each part. Failure rates can
              be calculated and improvements can be identified.
Purpose:      Improving service or performance
Status:       Operational
Features:     Personal information: No; Private sector data: No;
              Other agency data: No

System name:  Type Commanders Readiness Management System
Description:  Is designed to provide a fully integrated environment for
              online analytical processing of readiness indicators. Examples
              of readiness indicators include status of supplies available,
              equipment in operation, health status, and capabilities of the
              crew.
Purpose:      Measuring military readiness
Status:       Operational
Features:     Personal information: No; Private sector data: No;
              Other agency data: Yes

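System name:  FATHOM (APMC-Human Resources)
Description:  Will be an internal program and project tool used to improve
              staffing, recruiting, and managing day-to-day operations.
Purpose:      Managing human resources
Status:       Planned
Features:     Personal information: Yes; Private sector data: No;
              Other agency data: No

System name:  Navy Training Quota Management System
Description:  Is used for planning and forecasting training needs based on
              skill requirements.
Purpose:      Improving service or performance
Status:       Operational
Features:     Personal information: No; Private sector data: No;
              Other agency data: Yes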

                    National Geospatial-Intelligence Agency

System name:  OLAP (On-Line Analytical Processing)
Description:  Will provide aggregations of imagery system performance data
              for management officers and senior source decision makers to
              characterize system performance and contribution to
              intelligence issues of national priority.
Purpose:      Improving service or performance
Status:       Planned
Features:     Personal information: No; Private sector data: No;
              Other agency data: No

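System name:  CITO Data Mining
Description:  Will evaluate and identify imagery system performance trends
              for optimization, monitoring, or reengineering.
Purpose:      Improving service or performance
Status:       Planned
Features:     Personal information: No; Private sector data: No;
              Other agency data: No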

System name:  Information Relevance Prototype
Description:  Will establish an information relevancy prototype to serve as
              a framework for community evaluation of commercial information
              relevance approaches, methods, and technology. The term
              information relevance refers to the ability of users to
              receive or extract, then display and describe, information
              with measurable satisfaction according to their need.
Purpose:      Improving service or performance
Status:       Planned
Features:     Personal information: No; Private sector data: No;
              Other agency data: No

                               U.S. Marine Corps

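System name:  Operational Data Store Enterprise
Description:  Is used for workforce planning.
Purpose:      Managing human resources
Status:       Operational
Features:     Personal information: Yes; Private sector data: No;
              Other agency data: No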

System name:  Global Combat Support Systems-Marine Corps
Description:  Will be a physical implementation of the IT enterprise
              architecture designed to support both improved and enhanced
              marine air/ground task force combat service support functions
              and commander and combatant commander joint task force
              combatant support information requirements. Data mining will
              allow for interoperability with legacy Marine Corps systems
              and allow for a shared data environment.
Purpose:      Improving service or performance
Status:       Planned
Features:     Personal information: No; Private sector data: Yes;
              Other agency data: No

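System name:  Total Force Data Warehouse
Description:  Is a system whose primary purpose is workforce planning and
              workforce policy decision making. It contains current (after
              30 days) and historical workforce data.
Purpose:      Managing human resources
Status:       Operational
Features:     Personal information: Yes; Private sector data: No;
              Other agency data: No

System name:  Marine Corps Recruiting Information Support System
Description:  Is a Web-based information system used for managing assets and
              tracking enlisted and officer accessions into the Marine
              Corps.
Purpose:      Managing human resources
Status:       Operational
Features:     Personal information: Yes; Private sector data: No;
              Other agency data: No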

                         Source: Department of Defense.

  Table 5: Department of Education's Inventory of Data Mining Efforts

Columns: Organization/system name, Description, Purpose, Status, and
Features (Personal information, Private sector data, Other agency data)

System name:  Citizenship of PLUS Loan Borrowers-National Student Loan Data
              Systems
Description:  Looks for issues regarding citizenship among its PLUS loan
              borrowers. Flags records based on selected criteria and
              requests additional information from schools.
Purpose:      Improving service or performance
Status:       Operational
Features:     Personal information: Yes; Private sector data: Yes;
              Other agency data: Yes

Foreign Schools Initiatives National Student Loan Data System/Central
Processing
  Description: Is a proactive investigation effort that looks at whether
    financial aid was granted to individuals attending foreign institutions
    during periods of nonenrollment.
  Purpose: Detecting criminal activities or patterns
  Status: Operational
  Personal information: Yes; Private sector data: No; Other agency data: Yes

Professional Judgment Practices: Title IV Pell Grants, National Student
Loan Data
  Description: Used to determine when professional judgment has been
    exercised for "special" situations where families cannot afford college
    expenses.
  Purpose: Improving service or performance
  Status: Operational
  Personal information: Yes; Private sector data: Yes; Other agency data: Yes

Title IV Applicant-Death Database Match
  Description: Compares Department of Education data with the Social
    Security Administration's death database to detect fraud or criminal
    activity.
  Purpose: Detecting fraud, waste, and abuse
  Status: Operational
  Personal information: Yes; Private sector data: No; Other agency data: Yes

Title IV Loans with No Applications
  Description: Will compare information from the Free Application for
    Federal Student Aid Program with the Federal Family Education Loan
    Program to identify fraud.
  Purpose: Detecting fraud, waste, and abuse
  Status: Planned
  Personal information: Yes; Private sector data: No; Other agency data: No

OIG-Project Strikeback
  Description: Compares Department of Education and Federal Bureau of
    Investigation data for anomalies. Also verifies personal identifiers.
  Purpose: Analyzing intelligence and detecting terrorist activities
  Status: Operational
  Personal information: Yes; Private sector data: No; Other agency data: Yes

Accuracy of U.S. Department of Education Personal Data
  Description: Audits and verifies personal information that is contained
    in the Department of Education's personal data system.
  Purpose: Detecting fraud, waste, and abuse
  Status: Operational
  Personal information: Yes; Private sector data: No; Other agency data: Yes

Impact of Cohort Default Rate Redefinition-National Student Loan Data
System
  Description: Audits data to determine the impact of legislation that
    extended the college loan repayment default period from 180 to 270
    days.
  Purpose: Legislative impact
  Status: Operational
  Personal information: Yes; Private sector data: No; Other agency data: No

CheckFree Software/Purchase Card Program
  Description: Takes monthly billing information from the Bank of America
    to create reports on purchases, purchase quantity, and frequency of
    purchases. Data are mined for instances of fraud or abuse.
  Purpose: Detecting fraud, waste, and abuse
  Status: Operational
  Personal information: Yes; Private sector data: Yes; Other agency data: No

Improper Pell Grant Payment Activity
  Description: Will compare Pell Grants issued with the amounts received
    and look at the eligibility of grant recipients.
  Purpose: Detecting fraud, waste, and abuse
  Status: Planned
  Personal information: Yes; Private sector data: No; Other agency data: No

Title IV Identity Theft Initiative
  Description: Helps identify patterns and trends in identity theft cases
    involving loans for education. Provides an investigative resource for
    victims of identity theft.
  Purpose: Detecting criminal activities or patterns
  Status: Operational
  Personal information: Yes; Private sector data: No; Other agency data: No

Title IV Applicant-Use of Multiple Addresses/Central Processing System
  Description: Reviews addresses listed on Title IV applications to see if
    they are valid. For example, jails or employment addresses are not
    considered valid addresses.
  Purpose: Improving service or performance
  Status: Operational
  Personal information: Yes; Private sector data: No; Other agency data: Yes

Lapsed Funds/Improper Draw of Federal Grant Proceeds
  Description: Identifies funds that remain in the grants and payment
    processing system beyond the time period for allocating the funds.
  Purpose: Improving service or performance
  Status: Operational
  Personal information: No; Private sector data: No; Other agency data: No

Decision Support System with Online Analytical Processing Query
  Description: Will support the department's performance-based initiative.
    Will allow custom queries of schools from state and local databases for
    demographics and test scores.
  Purpose: Improving service or performance
  Status: Planned
  Personal information: No; Private sector data: No; Other agency data: No

Grant Administration and Payment System
  Description: Assists in managing grant activities and aids in detecting
    instances of fraud or abuse in grant activities.
  Purpose: Detecting fraud, waste, and abuse
  Status: Operational
  Personal information: Yes; Private sector data: Yes; Other agency data: Yes

Budget Execution Support
  Description: Uses information in the National Student Loan Data System
    and a sample drawn from it to estimate cohort distributions for
    financial activities related to the Federal Family Education Loan
    Program pursuant to the Credit Reform Act.
  Purpose: Financial management
  Status: Operational
  Personal information: Yes; Private sector data: No; Other agency data: No

Pell Grant Model Assumptions
  Description: Provides estimates on the total cost of the Pell Grant
    program. It uses data from previous years and makes assumptions for
    future years.
  Purpose: Financial management
  Status: Operational
  Personal information: No; Private sector data: No; Other agency data: No

National Student Loan Data System
  Description: Compiles student loan information from the guaranteeing
    agencies. Is used for eligibility tracking and to calculate default
    rates.
  Purpose: Detecting fraud, waste, and abuse
  Status: Operational
  Personal information: Yes; Private sector data: No; Other agency data: Yes

Loan Model Assumptions
  Description: Estimates the cost of loan programs. Also analyzes loan
    default behavior.
  Purpose: Financial management
  Status: Operational
  Personal information: Yes; Private sector data: No; Other agency data: Yes

Office of the Inspector General (OIG) Projects: Tumbleweed/Snowball
  Description: Is part of an OIG investigation to determine potential
    fraud of financial aid grants primarily in New Hampshire.
  Purpose: Detecting criminal activities or patterns
  Status: Operational
  Personal information: Yes; Private sector data: No; Other agency data: Yes

Central Processing System
  Description: Processes applications for student aid. Contains data on
    more than 13 million applications. Data are mined for demographic
    trends.
  Purpose: Detecting fraud, waste, and abuse
  Status: Operational
  Personal information: Yes; Private sector data: No; Other agency data: No

Direct Loan Services System
  Description: Is used to track the life of student direct loans and to
    monitor loan repayments.
  Purpose: Improving service or performance
  Status: Operational
  Personal information: Yes; Private sector data: Yes; Other agency data: Yes

CheckFree Software/Travel Card Program
  Description: Uses monthly billing information from Bank of America to
    create reports on travel expenditures to look for improper use of
    travel cards.
  Purpose: Detecting fraud, waste, and abuse
  Status: Operational
  Personal information: Yes; Private sector data: Yes; Other agency data: No

                        Source: Department of Education.

Table 6: Department of Energy's Inventory of Data Mining Efforts

Columns: Organization/system name; Description; Purpose; Status;
Features (Personal information; Private sector data; Other agency data)

Counterintelligence Automated Investigative Management System (CI-AIMS)
  Description: Is an investigative management system used by Department of
    Energy (DOE) field sites to track investigative cases on individuals or
    countries that threaten DOE assets. Information stored in this database
    is also used to support federal and state law enforcement agencies in
    support of national security.
  Purpose: Detecting criminal activities or patterns
  Status: Operational
  Personal information: Yes; Private sector data: No; Other agency data: No

Autonomy
  Description: Will be used to mine a myriad intelligence-related databases
    within the intelligence community to uncover criminal or terrorist
    activities relating to DOE assets.
  Purpose: Detecting criminal activities or patterns
  Status: Planned
  Personal information: Yes; Private sector data: No; Other agency data: No

Counterintelligence Analytical Research Data System (CARDS)
  Description: Is used to log briefings and debriefings given to DOE
    employees who travel to foreign countries or interact with foreign
    visitors to DOE facilities. Data are mined to identify potential
    threats to DOE assets.
  Purpose: Detecting criminal activities or patterns
  Status: Operational
  Personal information: Yes; Private sector data: No; Other agency data: Yes

                         Source: Department of Energy.

Table 7: Department of Health and Human Services' Inventory of Data Mining
Efforts

Columns: Organization/system name; Description; Purpose; Status;
Features (Personal information; Private sector data; Other agency data)

                   Agency for Healthcare Research and Quality

National Patient Safety Network
  Description: Will contain reports on adverse medical events that are
    filed by hospitals. The planned network's purpose is to take out
    patient personal identifiers and other items that may violate certain
    rules and create a warehouse that can be used by registered and
    unregistered users to evaluate and implement patient safety and quality
    measures. The network will be used to create tools that hospitals can
    use for making quality improvements.
  Purpose: Improving service or performance
  Status: Planned
  Personal information: No; Private sector data: No; Other agency data: No

                Centers for Disease Control and Prevention

BioSense
  Description: Enhances the nation's capability to rapidly detect
    bioterrorism events.
  Purpose: Analyzing intelligence and detecting terrorist activities
  Status: Operational
  Personal information: No; Private sector data: Yes; Other agency data: Yes

           Department of Health and Human Services Headquarters

DHHS Blood Monitoring Program
  Description: Monitors the country's blood supply by keeping an inventory
    on red blood cells and platelets and monitors blood supply shortages,
    the nature of the shortage, and size of the shortages.
  Purpose: Monitoring public health
  Status: Operational
  Personal information: No; Private sector data: Yes; Other agency data: No

                       Food and Drug Administration

Mission Accomplishment and Regulatory Compliance Services System
  Description: Is a comprehensive redesign and reengineering of two core
    mission-critical legacy systems at Food and Drug Administration (FDA)
    that support the regulatory functions that primarily take place in
    FDA's field offices.
  Purpose: Monitoring food or drug safety
  Status: Operational
  Personal information: No; Private sector data: Yes; Other agency data: Yes

Turbo Establishment Inspection Report
  Description: Provides a standardized database of citations of regulations
    and statutes, and helps investigators in preparing reports. It will
    collect data on specific observations uncovered during inspections and
    provide a more uniform format nationwide that will allow for electronic
    searches and statistical analysis to be performed by citation.
  Purpose: Improving safety
  Status: Operational
  Personal information: No; Private sector data: Yes; Other agency data: No

Phonetic Orthographic Computer Analysis
  Description: Is a search engine that provides results indicating how
    similar two drug names are on a phonetic and orthographic basis. Its
    purpose is to help in the safety evaluation of proposed proprietary
    names to reduce drug name confusion after an application is approved by
    the FDA.
  Purpose: Improving safety
  Status: Operational
  Personal information: No; Private sector data: Yes; Other agency data: No

MPRIS Data Warehouse
  Description: Will provide data to support end user ad-hoc query analysis
    and standard reporting needs. It will provide the foundation for a
    central reporting repository that can be used to populate
    business-specific data marts.
  Purpose: Improving service or performance
  Status: Planned
  Personal information: No; Private sector data: No; Other agency data: No

Development and Deployment of Advanced Analytical Tools for Drug Safety
Risk Assessment
  Description: Will develop advanced software tools for quantitative
    analysis of drug safety data. Medical officers and safety evaluators
    will use these advances in software tools.
  Purpose: Analyzing scientific and research information
  Status: Planned
  Personal information: Yes; Private sector data: Yes; Other agency data: Yes

Add data mining capability to CFSAN Adverse Event Reporting System
  Description: Is a comprehensive system for tracking, reviewing, and
    reporting adverse event incidences involving foods, cosmetics, and
    dietary supplements. Integrating and centralizing the system and
    eliminating patchwork systems make information on these adverse events
    available to federal, state, and local governments as well as to
    industry and the public in a more timely and efficient manner.
  Purpose: Monitoring food or drug safety
  Status: Planned
  Personal information: Yes; Private sector data: Yes; Other agency data: Yes

                  Health Resources and Services Administration

HRSA Geospatial Data Warehouse
  Description: Data warehouse that primarily collects programmatic,
    demographic, and statistical data.
  Purpose: Improving service or performance
  Status: Operational
  Personal information: No; Private sector data: Yes; Other agency data: Yes

                             Program Support Center

Employee Assistance Program Analysis
  Description: Uses information from a database of employee assistance
    program case information that does not contain client personal
    identifiers. Data are mined for quality assurance and program
    management information that is used to enhance the quality and cost
    effectiveness of services.
  Purpose: Improving service or performance
  Status: Operational
  Personal information: No; Private sector data: No; Other agency data: No

                Source: Department of Health and Human Services.

Table 8: Department of Homeland Security's Inventory of Data Mining Efforts

Columns: Organization/system name; Description; Purpose; Status;
Features (Personal information; Private sector data; Other agency data)

                 Border and Transportation Security Directorate

Workforce Profile Data Mart
  Description: Contains payroll and personnel data and is mined for
    workforce trends.
  Purpose: Managing human resources
  Status: Operational
  Personal information: Yes; Private sector data: No; Other agency data: Yes

Customs Integrated Personnel Payroll System Data Mart
  Description: Is a Customs data mart contained within Department of
    Homeland Security's workforce profile data mart. Personnel and payroll
    data are mined for workforce trends.
  Purpose: Managing human resources
  Status: Operational
  Personal information: Yes; Private sector data: No; Other agency data: Yes

Internal Affairs Treasury Enforcement Communications System Audit Data Mart
  Description: Assists the Internal Affairs group by mining criminal
    activity data to ascertain how Customs' employees are using the
    Treasury Enforcement System.
  Purpose: Detecting criminal activities or patterns
  Status: Operational
  Personal information: Yes; Private sector data: No; Other agency data: Yes

Operations Management Reports Data Mart
  Description: Assists in managing the operation of all ports of entry for
    incoming carriers, people, and cargo. Helps in making resource (people
    and equipment) allocation and operational improvement decisions.
  Purpose: Improving service or performance
  Status: Operational
  Personal information: No; Private sector data: No; Other agency data: Yes

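System name: Automated Export System Data Mart
Description: Mines data on export trade in the U.S. and produces reports on
  historical shipping and receiving trends.
Purpose: Improving service or performance
Status: Operational
Personal information: No
Private sector data: Yes
Other agency data: Yes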

System name: Seized Property/Forfeitures, Penalties, and Fines Case
  Management Data Mart
Description: Mines data to ensure data quality and review work assignments.
  System has two components: one that processes legal cases like a law
  firm, and a second that serves as property and inventory control by
  tracking property seized.
Purpose: Improving service or performance
Status: Operational
Personal information: Yes
Private sector data: No
Other agency data: No

System name: Incident Data Mart
Description: Will look through incident logs for patterns of events. An
  incident is an event involving a law enforcement or government agency
  for which a log was created (e.g., traffic ticket, drug arrest, or
  firearm possession). The system may look at crimes in a particular
  geographic location, particular types of arrests, or any type of unusual
  activity.
Purpose: Analyzing intelligence and detecting terrorist activities
Status: Planned
Personal information: Yes
Private sector data: Yes
Other agency data: Yes

System name: Case Management Data Mart
Description: Assists in managing law enforcement cases, including Customs
  cases. Reviews case loads, status, and relationships among cases.
Purpose: Analyzing intelligence and detecting terrorist activities
Status: Operational
Personal information: Yes
Private sector data: Yes
Other agency data: Yes

                Emergency Preparedness and Response Directorate

System name: Enterprise Data Warehouse
Description: Will take data from multiple, disparate systems and integrate
  the data into one reporting environment. The objective of the effort is
  to allow for the reduction of data within the agency and to provide an
  enterprise view of information necessary to drive critical business
  processes and decisions. Data on internal human resources, all aspects
  of disaster management, infrastructure, equipment location, etc., will
  be used.
Purpose: Disaster response and recovery
Status: Planned
Personal information: Yes
Private sector data: Yes
Other agency data: Yes

         Information Analysis and Infrastructure Protection Directorate

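System name: Analyst Notebook I2
Description: Correlates events and people to specific information.
Purpose: Analyzing intelligence and detecting terrorist activities
Status: Operational
Personal information: Yes
Private sector data: Yes
Other agency data: No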

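System name: Automatic Message Handling System (Verity)
Description: Automatically takes messages from external agencies and routes
  them to appropriate recipients.
Purpose: Analyzing intelligence and detecting terrorist activities
Status: Planned
Personal information: No
Private sector data: No
Other agency data: Yes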

                                U.S. Coast Guard

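System name: Readiness Management System
Description: Assists in ensuring readiness for all Coast Guard missions.
Purpose: Improving service or performance
Status: Operational
Personal information: Yes
Private sector data: No
Other agency data: No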

System name: CG Info
Description: Provides one-stop shopping for Coast Guard information. It is
  the central location and common interface for the entire Coast Guard to
  gain near real-time access to data from multiple, disparate Coast Guard
  information systems. It provides a single interface for users to view
  mission-critical support data.
Purpose: Improving service or performance
Status: Operational
Personal information: Yes
Private sector data: No
Other agency data: Yes

                              U.S. Secret Service

System name: Criminal Investigation Division Data Mining
Description: Mines data in suspicious activity reports received from banks
  to find commonalities in data to assist in strategically allocating
  resources.
Purpose: Detecting criminal activities or patterns
Status: Operational
Personal information: Yes
Private sector data: No
Other agency data: Yes

                    Source: Department of Homeland Security.

Table 9: Department of the Interior's Inventory of Data Mining Efforts

Columns: Organization/system name; Description; Purpose; Status;
Features (Personal information, Private sector data, Other agency data)

                          Minerals Management Service

System name: Data Mining of the Technical Information Management System
  (TIMS) Database
Description: Is a corporate database for oil and gas leases. The database
  is mined in support of policy development. One area of data mining is
  identification of leases that will be abandoned in the near future. Data
  mining has shown that leases with six or more producing wells in 1 year
  are almost never abandoned in the next year. Another application of data
  mining is the safety of oil and gas operations. For example, data mining
  has shown that accidents have a peak rate on Thursday mornings.
Purpose: Improving service or performance
Status: Operational
Personal information: Yes
Private sector data: Yes
Other agency data: No

                      Source: Department of the Interior.

Table 10: Department of Justice's Inventory of Data Mining Efforts

Columns: Organization/system name; Description; Purpose; Status;
Features (Personal information, Private sector data, Other agency data)

                       Department of Justice Headquarters

System name: Drug/Financial Fusion Center
Description: Will contain data from, and be used by, Organized Crime and
  Drug Enforcement Task Force agencies. The system will permit the
  collection and cross-case analysis of all drug and related financial
  investigative data.
Purpose: Detecting criminal activities or patterns
Status: Planned
Personal information: Yes
Private sector data: Yes
Other agency data: Yes

                        Drug Enforcement Administration

System name: Statistical Management Analysis and Reporting Tool System
  (SMARTS)/SPSS
Description: Is a query analysis and reporting tool that pulls data from
  many systems. It allows for statistical analyses of drug cases and Drug
  Enforcement Administration's statistical reporting.
Purpose: Detecting criminal activities or patterns
Status: Operational
Personal information: Yes
Private sector data: No
Other agency data: Yes

System name: TOLLS
Description: Is a database of telephone calls from court-ordered and
  approved wiretaps and Title III investigations. Information such as
  telephone numbers, time and date of calls, and call duration is
  captured. Data are mined for patterns to give leads in investigations of
  drug trafficking.
Purpose: Detecting criminal activities or patterns
Status: Operational
Personal information: Yes
Private sector data: No
Other agency data: No

                        Federal Bureau of Investigation

System name: Secure Collaborative Operational Prototype
  Environment/Investigative Data Warehouse
Description: Allows the FBI to search multiple data sources through one
  interface to uncover terrorist and criminal activities and
  relationships. Data sources are a combination of structured and
  unstructured text.
Purpose: Analyzing intelligence and detecting terrorist activities
Status: Operational
Personal information: Yes
Private sector data: No
Other agency data: Yes

System name: Foreign Terrorist Tracking Task Force Activity
Description: Supports the Foreign Terrorist Tracking Task Force that seeks
  to prevent foreign terrorists from gaining access to the United States.
  Data from the Department of Homeland Security, Federal Bureau of
  Investigation, and public data sources are put into a data mart and
  mined to determine unlawful entry and to support deportations and
  prosecutions.
Purpose: Analyzing intelligence and detecting terrorist activities
Status: Operational
Personal information: Yes
Private sector data: Yes
Other agency data: Yes

System name: FBI Intelligence Community Data Marts
Description: Is intended to take a subset of approved data from a data
  warehouse and make it available to the intelligence community.
Purpose: Analyzing intelligence and detecting terrorist activities
Status: Planned
Personal information: Yes
Private sector data: No
Other agency data: Yes

                           Federal Bureau of Prisons

System name: Business Information Warehouse
Description: Will be a warehouse designed to provide information on
  manufacturing by Federal Prison Industries, which runs 100 factories in
  various prisons. Data will be mined for information on the manufacturing
  environment (such as information on material on hand, scheduling, and
  the production process) and financial activities.
Purpose: Improving service or performance
Status: Planned
Personal information: No
Private sector data: No
Other agency data: Yes

                             U.S. Marshals Service

System name: USMS Workload Modeling
Description: Will seek to develop a workforce model that will support
  budget formulation, execution, and resource analysis. Will be a planning
  and execution activity that will be used to help determine the quantity
  and location of required resources.
Purpose: Managing human resources
Status: Planned
Personal information: Yes
Private sector data: No
Other agency data: No

                         Source: Department of Justice.

Table 11: Department of Labor's Inventory of Data Mining Efforts

Columns: Organization/system name; Description; Purpose; Status;
Features (Personal information, Private sector data, Other agency data)

System name: Dashboard Display
Description: Provides links to programs throughout the Department of
  Labor's Employment and Training Administration to provide reports or
  information on financial activities.
Purpose: Improving service or performance
Status: Operational
Personal information: Yes
Private sector data: No
Other agency data: No

System name: Enforcement Management System, Case Opening, and Results
  Analysis
Description: Is used to track investigations of violations of Title I and
  other criminal laws pertaining to pension and welfare rights.
Purpose: Improving service or performance
Status: Operational
Personal information: Yes
Private sector data: Yes
Other agency data: No

System name: Employee Retirement Income Security Act Data System
Description: Is used to monitor compliance with Title I of the Employee
  Retirement Income Security Act.
Purpose: Detecting fraud, waste, and abuse
Status: Operational
Personal information: Yes
Private sector data: No
Other agency data: No

System name: Mine Safety and Health Administration Teradata Data Store
Description: Mines data from a data store of information on safety and
  health enforcement and demographic data for mine operations, along with
  miner accidents, injury, and illness data.
Purpose: Improving safety
Status: Operational
Personal information: Yes
Private sector data: No
Other agency data: Yes

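System name: Mathematical Statistics Research Center
Description: Will look at data from economic surveys to compare rates of
  nonresponse for Bureau of Labor Statistics.
Purpose: Improving service or performance
Status: Planned
Personal information: No
Private sector data: No
Other agency data: No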

                          Source: Department of Labor.

Table 12: Department of State's Inventory of Data Mining Efforts

Columns: Organization/system name; Description; Purpose; Status;
Features (Personal information, Private sector data, Other agency data)

System name: Citibank's Ad Hoc Reporting System
Description: Enables purchase card managers to track trends related to the
  usage of credit cards by employees in purchasing supplies and services
  for official use. Purchase card program is worldwide, and spending
  patterns and purchases are monitored for potential misuse or fraud.
Purpose: Detecting fraud, waste, and abuse
Status: Operational
Personal information: Yes
Private sector data: Yes
Other agency data: No

System name: Purchase Card Management System
Description: Will involve the automation of internal workflow processes
  (system is in the early phases of development). Will use internal data
  and bank data to track trends and anomalies in the Department of State's
  worldwide purchase card program.
Purpose: Detecting fraud, waste, and abuse
Status: Planned
Personal information: Yes
Private sector data: Yes
Other agency data: No

                          Source: Department of State.

Table 13: Department of Transportation's Inventory of Data Mining Efforts

Columns: Organization/system name; Description; Purpose; Status;
Features (Personal information, Private sector data, Other agency data)

                   Department of Transportation Headquarters

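System name: DOT IT Security Management System
Description: Will collect information to allow management to assess its IT
  security infrastructure.
Purpose: Detecting fraud, waste, and abuse
Status: Planned
Personal information: Yes
Private sector data: No
Other agency data: No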

                 National Highway Traffic Safety Administration

System name: State Data System
Description: Analyzes, mines, and researches automotive crash data, such as
  statistics from rollovers of SUVs, from 22 states to improve highway
  safety and lessen fatalities. Policies can be set based on the data.
Purpose: Improving safety
Status: Operational
Personal information: No
Private sector data: No
Other agency data: No

System name: Fatality Analysis Reporting System (FARS)
Description: Helps to evaluate the effectiveness of motor vehicle safety
  standards and highway safety programs. Data are collected from all 50
  states, the District of Columbia, and Puerto Rico and are used to
  evaluate and support highway safety.
Purpose: Improving safety
Status: Operational
Personal information: Yes
Private sector data: Yes
Other agency data: Yes

System name: National Automotive Sampling System
Description: Collects and mines information on automotive crashes. System
  is related to the Federal Motor Vehicle Safety Standards that regulate
  vehicle compliance items such as seat belts, air bags, and the stopping
  distance of brakes.
Purpose: Improving safety
Status: Operational
Personal information: Yes
Private sector data: Yes
Other agency data: No

                     Source: Department of Transportation.

Table 14: Department of the Treasury's Inventory of Data Mining Efforts

Columns: Organization/system name; Description; Purpose; Status;
Features (Personal information, Private sector data, Other agency data)

                          Financial Management Service

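System name: Treasury Offset Program (TOP) Cleanup
Description: Mines data to reduce the number of debts listed in TOP.
Purpose: Improving service or performance
Status: Operational
Personal information: Yes
Private sector data: No
Other agency data: Yes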

System name: Electronic Federal Tax Payment System (EFTPS) Marketing
Description: Is a free service offered by the Department of the Treasury
  for individuals and business taxpayers who pay their federal taxes
  electronically. Mining activity tracks enrollment, tax payment history,
  and usage trends.
Purpose: Increasing tax compliance
Status: Operational
Personal information: Yes
Private sector data: No
Other agency data: No

                            Internal Revenue Service

System name: Planning, Analysis, and Decision Support System
Description: Will be a component of the Custodial Accounting Program, which
  is the warehouse that is used to query transactional data and produce
  reports. This activity is meant to improve reporting and use decision
  support tools.
Purpose: Improving service or performance
Status: Planned
Personal information: Yes
Private sector data: No
Other agency data: Yes

System name: Abusive Corporate Tax Shelter Detection Model
Description: Will model characteristics of corporate tax shelters and use
  models to predict corporate tax shelter abuse and to assess compliance
  risk in the corporate taxpayer population.
Purpose: Increasing tax compliance
Status: Planned
Personal information: Yes
Private sector data: Yes
Other agency data: No

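System name: K-1 Link Analysis
Description: Will be used to detect potential tax evasion.
Purpose: Increasing tax compliance
Status: Planned
Personal information: Yes
Private sector data: No
Other agency data: No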

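System name: Research on the Population of Taxpayers Who Receive Earned
  Income Tax Credit
Description: Will be used to research data on taxpayers who receive the
  EITC.
Purpose: Detecting fraud, waste, and abuse
Status: Planned
Personal information: Yes
Private sector data: No
Other agency data: No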

Issue Based Management Will provide access to a       Increasing tax  Planned     No  Yes No
Information System     variety of data sources within compliance
                       IRS. Will assist in research
                       and case work.

Electronic Fraud       Mines data to evaluate and     Improving       Operational Yes No  No
Detection System       rate potentially fraudulent    service or
                       individual tax returns.        performance

Reveal                 Will be used to detect         Detecting       Planned     Yes Yes No
                       financial criminal activity    criminal
                       such as tax evasion.           activities or
                                                      patterns

Oracle Model 22        Takes information from         Increasing tax  Operational Yes No  No
Partnership Return     individual tax returns and     compliance
Scoring System         attempts to replicate
                       judgments made by taxpayers to
                       detect the likelihood of
                       material errors.

SPSS Form 1120-S       Will automate the              Increasing tax  Planned     Yes No  No
Return Scoring System  classification of certain      compliance
                       corporate tax returns.

Oracle Model 33        Will identify noncompliance in Increasing tax  Planned     Yes No  No
Partnership Scoring    partnership returns.           compliance
Model

Compliance Laboratory  Will identify taxpayer         Increasing tax  Planned     Yes Yes Yes
                       noncompliance by looking at    compliance
                       groups of returns.

U.S. Mint

Information Technology Collects information on        Improving       Operational No  No  No
Intrusion Detection    potential intrusions to U.S.   information
System                 Mint systems. Looks for trends security
                       in information reported by
                       sensors to determine if
                       illicit activity has occurred.
                       Minimizes false positives.

E-Commerce Fraud       Attempts to identify and stop  Detecting       Operational Yes Yes Yes
Analysis Activity      fraudulent activity involving  criminal
                       stolen credit cards to order   activities or
                       products over the Internet or  patterns
                       via telephone. Fraud rating
                       identifiers are used to
                       identify areas where fraud has
                       occurred and to determine the
                       likelihood of fraud. Allows
                       for orders to be stopped or
                       for orders over a certain
                       dollar limit to be stopped.

Data Warehouse         Will be an integrated,         Improving       Planned     No  No  No
                       scalable, expandable data      service or
                       warehouse that will support    performance
                       business functions by grouping
                       the data in subject-oriented
                       data marts. Each warehouse
                       data mart will be defined to
                       integrate both internal and
                       external data to provide the
                       necessary information to
                       perform both historical and
                       predictive analysis and
                       support numerous calculations.

                      Source: Department of the Treasury.

Table 15: Department of Veterans Affairs' Inventory of Data Mining Efforts

Features (Yes/No, in order): Personal information, Private sector data,
Other agency data

Organization/
system name            Description                    Purpose         Status      Features

  Department of Veterans Affairs Headquarters Veterans Benefits Administration
                         Veterans Health Administration

Veterans Affairs       Is used to monitor and manage  Detecting       Operational Yes Yes No
Central Incident       intrusion detection and        criminal
Response Center        firewalls. Scripts are written activities or
                       for forensic analysis to go    patterns
                       through data collected from
                       system and network logs.

Purchase Card Data     Will identify patterns in      Detecting       Planned     Yes Yes No
Mining (SAS) Reports   purchase card use to identify  fraud, waste,
                       fraud and misuse and to        and abuse
                       maintain good internal
                       controls.

Travel Card Data       Will be used to look for       Detecting       Planned     Yes Yes No
Mining (SAS) Reports   patterns in the use of travel  fraud, waste,
                       credit cards that indicate     and abuse
                       misuse or fraud and to
                       maintain good internal
                       controls.

Office of Inspector    Analyzes and matches (within   Detecting       Operational Yes No  No
General (OIG)          the guidelines of the law)     fraud, waste,
                       Veterans Affairs (VA) files,   and abuse
                       pertaining to both VA-provided
                       benefits and health care
                       services to detect patterns of
                       waste, fraud, or abuse.

C & P Payment Data     Analyzes compensation and      Detecting       Operational Yes No  Yes
Analysis               pension data to detect fraud,  fraud, waste,
                       waste, and abuse.              and abuse

C & P Large Payment    Serves as an internal control  Detecting       Operational Yes No  No
Verification Process   intended to make sure that     fraud, waste,
                       payments over a certain dollar and abuse
                       threshold are reviewed to
                       detect potential fraud or
                       abuse.

Primary Analysis and   Is used mainly to discover     Improving       Operational No  No  No
Classification         trends, incidents/events, and  safety
                       vulnerabilities that may exist
                       in VA hospitals.

Allocation Resource    Is used in making resource     Improving       Operational Yes No  No
Center Database        allocation decisions based on  service or
                       the analysis of patient        performance
                       workload and cost data.

Veterans Health        Integrates patient, clinical,  Improving       Operational Yes No  No
Administration (VHA)   and financial data to present  service or
Financial and Clinical a unified management           performance
Data Mart              perspective and enable
                       consistent reporting.

Decision Support       Is used to identify patterns   Improving       Operational Yes No  No
System                 of care and patient outcomes   service or
                       linked to resource consumption performance
                       and costs associated with each
                       patient encounter.

Top 50 Standardization Is used to standardize medical Improving       Operational No  Yes No
Listing/Managed        and hospital supplies and      service or
Inventory System       equipment to (1) improve VHA's performance
                       bargaining position when
                       soliciting bids and (2)
                       facilitate the ability to move
                       doctors among hospitals.

VISN 16 Data Warehouse Provides unified view of the   Improving       Operational Yes No  No
                       VISN 16 VA region, composed of service or
                       10 medical centers and 30      performance
                       outpatient clinics. The system
                       gives a view of the enterprise
                       for management purposes. It is
                       mined for a variety of types
                       of information such as patient
                       encounters, lab tests,
                       pharmacy records, etc.

                    Source: Department of Veterans Affairs.

Table 16: Environmental Protection Agency's Inventory of Data Mining Efforts

Features (Yes/No, in order): Personal information, Private sector data,
Other agency data

Organization/
system name            Description                    Purpose         Status      Features

Conceptual Plans to    Will regularly review          Detecting       Planned     Yes No  No
Design an Approach and financial data systems for     fraud, waste,
System to Review       contracts, bank cards, and     and abuse
Financial Data         small purchases and other
                       financial databases for misuse
                       or fraud of Environmental
                       Protection Agency's assets.

Drinking Water Data    Integrates and analyzes        Monitoring      Operational Yes No  Yes
Warehouse              drinking water information     public health
                       from state, regional, and
                       headquarters sources. Includes
                       data on water systems,
                       compliance, sample analytical
                       results, and audit data.

                    Source: Environmental Protection Agency.

Table 17: Export-Import Bank of the United States' Inventory of Data Mining Efforts

Features (Yes/No, in order): Personal information, Private sector data,
Other agency data

Organization/
system name            Description                    Purpose         Status      Features

Integrated Information Is used to generate reports    Improving       Operational Yes No  No
System Data Warehouse  that describe bank lending     service or
Mining for Financial   activities and exposure        performance
Risk Information       trends.

                Source: Export-Import Bank of the United States.

Table 18: Federal Deposit Insurance Corporation's Inventory of Data Mining Efforts

Features (Yes/No, in order): Personal information, Private sector data,
Other agency data

Organization/
system name            Description                    Purpose         Status      Features

Real Estate Stress     Is used to measure real estate Detecting risk  Operational No  No  Yes
Test (RSST)            risk. Bank examiners use data  in financial
                       from the system as part of a   systems
                       pre-examination planning
                       process to assist in
                       identifying risk
                       concentrations.

Determination of       Will support the development   Improving       Planned     Yes No  No
Insured Deposits       of a new system for            service or
                       implementing the deposit       performance
                       insurance claims.

Statistical CAMELS     Is used to rate financial      Detecting risk  Operational No  No  Yes
Offsite Review         institutions' performance and  in financial
                       risk management practices.     systems

Growth Monitoring      Is used to identify financial  Detecting risk  Operational No  No  Yes
System                 institutions that have         in financial
                       experienced significant        systems
                       growth. Serves as an early
                       warning system for detecting
                       financial institutions that
                       might pose financial risk to
                       FDIC.

                 Source: Federal Deposit Insurance Corporation.

Table 19: Federal Reserve System's Inventory of Data Mining Efforts

Features (Yes/No, in order): Personal information, Private sector data,
Other agency data

Organization/
system name            Description                    Purpose         Status      Features

Office of the          Will support audits and        Detecting       Planned     Yes No  No
Inspector General      evaluations. Using ACL,        fraud, waste,
(OIG), Audit Services  queries will be run against    and abuse
                       the board's financial and
                       personnel systems to detect
                       fraud, waste, and abuse, or to
                       provide information supporting
                       any aspect of an OIG project.

                        Source: Federal Reserve System.

Table 20: National Aeronautics and Space Administration's Inventory of Data Mining Efforts

Features (Yes/No, in order): Personal information, Private sector data,
Other agency data

Organization/
system name            Description                    Purpose         Status      Features

Archiving of Web       Will gather metadata on the    Analyzing       Planned     No  No  Yes
Information at         GSFC Web site at NASA to       scientific and
National Aeronautics   preserve NASA legacy           research
and Space              information.                   information
Administration (NASA)
and Goddard Space
Flight Center (GSFC)

My Goddard Search-     Will allow Web mining of       Analyzing       Planned     No  Yes No
Mining of Goddard's    scientific data at Goddard     scientific and
Web environment        Space Center. It is referred   research
                       to as "Google for Goddard."    information

NetContext             Will monitor network traffic   Detecting       Planned     Yes No  No
                       for the purpose of identifying fraud, waste,
                       bandwidth use, fraud, abuse,   and abuse
                       and IT security-related
                       activities.

Geophysics Time Series Will develop a set of          Analyzing       Planned     No  No  Yes
Analysis               algorithms to identify         scientific and
                       patterns within temporal       research
                       activities. The data will be   information
                       trajectories of objects and
                       movement of objects within
                       images.

"Simmarizer"           Uses data mining techniques to Analyzing       Operational No  No  No
(Simulation-Based      extract knowledge from         scientific and
Summary/Discovery of   simulators to understand       research
Knowledge)             conditions and scenarios       information
                       regarding space missions.

Global Environmental   Is used to obtain information  Analyzing       Operational No  No  Yes
and Earth Science      about global climate changes.  scientific and
Information System                                    research
(GENESIS)                                             information

Machine Learning and   Will find patterns and         Analyzing       Planned     No  No  Yes
Data Mining for        relationships in large,        scientific and
Improved Intelligent   complex earth science data     research
Data Understanding of  sets, specifically for rare    information
High Dimensional Earth and small events hidden in
Sensed Data            larger data signals. Will
                       build new capabilities to
                       understand NASA science data.

Distributed Data       Involves using a data mining   Analyzing       Operational No  No  Yes
Mining Techniques for  tool set for space science     scientific and
Object Discovery in    research. Incorporates a small research
the National Virtual   number of targeted data mining information
Observatory (NVO)      techniques in order to address
                       specific NASA space science
                       research programs. In
                       particular, the data mining
                       environment will be used to
                       explore NASA's large space
                       science data collections.
                       These techniques are being
                       applied to astronomical object
                       discovery, identification,
                       classification, and
                       interpretation across large
                       multiple distributed astronomy
                       data collections.

Diamond Eye (System    Analyzes large sets of images  Analyzing       Operational No  Yes Yes
for Mining Images)     looking for specific features. scientific and
                                                      research
                                                      information

Data Mining of 3-D     Will automate the analysis of  Analyzing       Planned     No  No  Yes
Numerical Model        weather model output,          scientific and
Forecast Output and    observation, and satellite     research
Its Application to     data to allow for a better     information
Atmospheric Research   understanding of the science
                       of weather dynamics and to
                       predict future weather events.

Ecological Forecasting Will develop an adaptable      Analyzing       Planned     No  No  Yes
                       system that can be used to     scientific and
                       mine large volumes of          research
                       scientific data, identify      information
                       novel causal relationships in
                       the data about earth system
                       processes, and rapidly
                       incorporate discoveries with
                       biospheric models to generate
                       now-casts and forecasts of
                       biospheric events and
                       conditions.

Distributed Data       Will research changes, trends, Analyzing       Planned     No  No  Yes
Mining for Large NASA  and relationships in Earth     scientific and
Databases (Earth       Observing System (EOS) data.   research
Science Earth          The major feature of this      information
Observing System Data) activity is that it will allow
                       for different data to be mined
                       in parts and then merged. The
                       capability is needed for
                       instances when scientific data
                       are at different locations.
                       Research-quality software will
                       be used to allow for a
                       communication system and
                       run-time environment for
                       applying a collective data
                       analysis approach not bound to
                       any specific platform,
                       learning algorithm, or
                       representation of knowledge.

Discovery of Changes from the Global Carbon Cycle and Climate System
Using Data Mining Activity
  Description: Will detect patterns in scientific data that are
    geospatial and dynamic and represented as raster data (gridded cells
    of surfaces such as the sun's or earth's surfaces). Mining
    capabilities are being developed for future NASA-relevant data and
    science.
  Purpose: Analyzing scientific and research information
  Status: Planned
  Personal information: No
  Private sector data: No
  Other agency data: Yes

"AutoSciProd" (Automatic Generation of Science Products from Large Image
Data Sets)
  Description: Uses statistical and image data to determine and improve
    science products.
  Purpose: Analyzing scientific and research information
  Status: Operational
  Personal information: No
  Private sector data: No
  Other agency data: No

Near Archive Data Mining of Earth Science Data
  Description: Pulls data from an archive of earth science data and
    applies scientists' analyses and algorithms to the data.
  Purpose: Analyzing scientific and research information
  Status: Operational
  Personal information: No
  Private sector data: No
  Other agency data: No

Spectral Analysis Automation (SAA) System
  Description: Will improve the collection, identification, and
    evaluation of spectral data to better meet scientists' requirements.
  Purpose: Analyzing scientific and research information
  Status: Planned
  Personal information: No
  Private sector data: No
  Other agency data: No

Multiple Sensor Image Registration, Image Fusion and Dimension Reduction
Using Wavelets
  Description: Will be used for collaborative preprocessing of data and
    research on wavelets. Will comprise research software that looks at
    different technologies such as image processing and dimensions.
  Purpose: Analyzing scientific and research information
  Status: Planned
  Personal information: No
  Private sector data: No
  Other agency data: No

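GMSEC Event Message Data Mining Task
  Description: Will be used to determine health of and reasons for
    problems with satellite systems.
  Purpose: Analyzing scientific and research information
  Status: Planned
  Personal information: No
  Private sector data: No
  Other agency data: No

Intrusion Detection System
  Description: Looks at all traffic that traverses NASA's networks'
    borders.
  Purpose: Improving information security
  Status: Operational
  Personal information: Yes
  Private sector data: No
  Other agency data: No

AvSP/ASMM Foreign Object Detection Toolset
  Description: Is used with simulations to identify foreign object damage
    indicators for commercial jet engines.
  Purpose: Analyzing scientific and research information
  Status: Operational
  Personal information: No
  Private sector data: Yes
  Other agency data: No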

Mission and Science Measurement and Discovery Systems
  Description: Will be a basic technology research program that will also
    support infusion of resulting technologies into NASA missions.
    Purpose of the program is to solve the research challenge in
    extracting the most scientific knowledge from NASA's space missions
    and data archives.
  Purpose: Analyzing scientific and research information
  Status: Planned
  Personal information: No
  Private sector data: No
  Other agency data: No

StarTool: Solar Active Region Detection
  Description: Is used for recognition of solar activity in sequences of
    multiband solar images.
  Purpose: Analyzing scientific and research information
  Status: Operational
  Personal information: No
  Private sector data: No
  Other agency data: No

"Toogle" (Times-Series Search Engine)
  Description: Searches for time-series data. Is similar to a Google
    search engine.
  Purpose: Improving safety
  Status: Operational
  Personal information: No
  Private sector data: No
  Other agency data: No

Use of Data Mining, Remote Sensing, and Geographic Information Systems
for Wildfire Detection and Prediction
  Description: Will help the National Oceanic and Atmospheric
    Administration automate its fire detection systems and improve the
    accuracy of fire detection systems.
  Purpose: Improving service or performance
  Status: Planned
  Personal information: No
  Private sector data: No
  Other agency data: Yes

Knowledge Discovery and Data Mining Based on Hierarchical Image
Segmentation
  Description: Will mine data using software that has been developed to
    exploit information from a hierarchical image segmentation process.
  Purpose: Analyzing scientific and research information
  Status: Planned
  Personal information: No
  Private sector data: Yes
  Other agency data: Yes

             Source: National Aeronautics and Space Administration.

Table 21: Nuclear Regulatory Commission's Inventory of Data Mining Efforts

Columns: Organization/system name; Description; Purpose; Status;
Features (Personal information, Private sector data, Other agency data)

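Licensee Event Report Data
  Description: Identifies nuclear safety trends and patterns in
    commercial nuclear power events.
  Purpose: Improving safety
  Status: Operational
  Personal information: No
  Private sector data: Yes
  Other agency data: No

Centralized Information Delivery
  Description: Will consolidate and standardize reporting for nuclear
    reactor regulations.
  Purpose: Improving service or performance
  Status: Planned
  Personal information: Yes
  Private sector data: No
  Other agency data: No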

                     Source: Nuclear Regulatory Commission.

Table 22: Office of Personnel Management's Inventory of Data Mining Efforts

Columns: Organization/system name; Description; Purpose; Status;
Features (Personal information, Private sector data, Other agency data)

CRIS Retirement Data Mining Activity
  Description: Mines federal employee benefits data such as information
    on retirement and life insurance to assist in managing federal
    employee eligibilities and entitlements.
  Purpose: Improving service or performance
  Status: Operational
  Personal information: Yes
  Private sector data: No
  Other agency data: Yes

                    Source: Office of Personnel Management.

Table 23: Pension Benefit Guaranty Corporation's Inventory of Data Mining
Efforts

Columns: Organization/system name; Description; Purpose; Status;
Features (Personal information, Private sector data, Other agency data)

Corporate Performance Indicators and Analytics
  Description: Will streamline access to management and operational
    performance measures and permit the correlation of performance and
    output measures.
  Purpose: Improving service or performance
  Status: Planned
  Personal information: No
  Private sector data: No
  Other agency data: Yes

Corporate Policy and Research Department's Forecasting System
  Description: Is a stochastic simulation model that incorporates
    historic equity and interest rates and bankruptcy possibilities to
    forecast scenarios for more than 300 pension plans and their related
    corporate sponsors.
  Purpose: Improving service or performance
  Status: Operational
  Personal information: No
  Private sector data: Yes
  Other agency data: Yes

                 Source: Pension Benefit Guaranty Corporation.

Table 24: Railroad Retirement Board's Inventory of Data Mining Efforts

Columns: Organization/system name; Description; Purpose; Status;
Features (Personal information, Private sector data, Other agency data)

Railroad Retirement Board Data Stores
  Description: Consists of two major databases (payment and entitlement
    history and employment data maintenance) that are mined by actuaries
    to produce annual actuarial reports and for audit support and quality
    control.
  Purpose: Improving service or performance
  Status: Operational
  Personal information: Yes
  Private sector data: No
  Other agency data: Yes

                       Source: Railroad Retirement Board.

Table 25: Small Business Administration's Inventory of Data Mining Efforts

Columns: Organization/system name; Description; Purpose; Status;
Features (Personal information, Private sector data, Other agency data)

Loan Monitoring System
  Description: Helps to identify, measure, and manage the risk of Small
    Business Administration's portfolio. Business credit scores are used
    but individual credit scores are not.
  Purpose: Improving service or performance
  Status: Operational
  Personal information: Yes
  Private sector data: Yes
  Other agency data: No

MONSTER and Econometric Models
  Description: Mines data from database that includes all transactions
    for each loan that affects SBA subsidy costs, to assist in
    determining credit subsidy rates for SBA's various credit programs.
  Purpose: Financial management
  Status: Operational
  Personal information: Yes
  Private sector data: No
  Other agency data: No

                     Source: Small Business Administration.

GAO's Mission

The General Accounting Office, the audit, evaluation and
investigative arm of Congress, exists to support Congress in meeting its
constitutional responsibilities and to help improve the performance and
accountability of the federal government for the American people. GAO
examines the use of public funds; evaluates federal programs and policies;
and provides analyses, recommendations, and other assistance to help
Congress make informed oversight, policy, and funding decisions. GAO's
commitment to good government is reflected in its core values of
accountability, integrity, and reliability.

  Obtaining Copies of GAO Reports and Testimony

The fastest and easiest way to obtain copies of GAO documents at no cost
is through the Internet. GAO's Web site (www.gao.gov) contains abstracts
and fulltext files of current reports and testimony and an expanding
archive of older products. The Web site features a search engine to help
you locate documents using key words and phrases. You can print these
documents in their entirety, including charts and other graphics.

Each day, GAO issues a list of newly released reports, testimony, and
correspondence. GAO posts this list, known as "Today's Reports," on its
Web site daily. The list contains links to the full-text document files.
To have GAO e-mail this list to you every afternoon, go to www.gao.gov and
select "Subscribe to e-mail alerts" under the "Order GAO Products"
heading.

Order by Mail or Phone

The first copy of each printed report is free.
Additional copies are $2 each. A check or money order should be made out
to the Superintendent of Documents. GAO also accepts VISA and Mastercard.
Orders for 100 or more copies mailed to a single address are discounted 25
percent. Orders should be sent to:

U.S. General Accounting Office
441 G Street NW, Room LM
Washington, D.C. 20548

To order by Phone:

Voice: (202) 512-6000
TDD: (202) 512-2537
Fax: (202) 512-6061

  To Report Fraud, Waste, and Abuse in Federal Programs

Contact:

Web site: www.gao.gov/fraudnet/fraudnet.htm
E-mail: [email protected]
Automated answering system: (800) 424-5454 or (202) 512-7470

Public Affairs

Jeff Nelligan, Managing Director, [email protected], (202) 512-4800
U.S. General Accounting Office, 441 G Street NW, Room 7149
Washington, D.C. 20548

*** End of document. ***