Data Mining: Early Attention to Privacy in Developing a Key DHS  
Program Could Reduce Risks (28-FEB-07, GAO-07-293).		 
                                                                 
The government's interest in using technology to detect terrorism
and other threats has led to increased use of data mining. A	 
technique for extracting useful information from large volumes of
data, data mining offers potential benefits but also raises	 
privacy concerns when the data include personal information. GAO 
was asked to review the development by the Department of Homeland
Security (DHS) of a data mining tool known as ADVISE (Analysis,  
Dissemination, Visualization, Insight, and Semantic Enhancement).
Specifically, GAO was asked to determine (1) the tool's planned  
capabilities, uses, and associated benefits and (2) whether	 
potential privacy issues could arise from using it to process	 
personal information and how DHS has addressed any such issues.  
GAO reviewed program documentation and discussed these issues	 
with DHS officials.						 
-------------------------Indexing Terms------------------------- 
REPORTNUM:   GAO-07-293 					        
    ACCNO:   A66350						        
  TITLE:     Data Mining: Early Attention to Privacy in Developing a  
Key DHS Program Could Reduce Risks				 
     DATE:   02/28/2007 
  SUBJECT:   Data collection					 
	     Data integrity					 
	     Data mining					 
	     Government information dissemination		 
	     Homeland security					 
	     Information disclosure				 
	     Internal controls					 
	     Privacy law					 
	     Privacy policies					 
	     Right of privacy					 
	     Risk assessment					 
	     Policy evaluation					 
	     Counterterrorism					 


Report to the Chairman, Committee on Appropriations, House of
Representatives

United States Government Accountability Office

GAO

February 2007

DATA MINING

Early Attention to Privacy in Developing a Key DHS Program Could Reduce
Risks

GAO-07-293

Contents

Letter

Results in Brief
Background
ADVISE Is Intended to Help Identify Patterns of Interest to Homeland
Security Analysts
DHS Has Not Yet Addressed Key Privacy Risks Associated with Expected Uses
of the ADVISE Tool
Conclusions
Recommendations for Executive Action
Agency Comments and Our Evaluation
Appendix I Objectives, Scope, and Methodology
Appendix II Comments from the Department of Homeland Security
Appendix III GAO Contact and Staff Acknowledgments

Table

Table 1: Fair Information Practices

Figures

Figure 1: An Overview of the Data Mining Process
Figure 2: Major Elements and Functions of ADVISE
Figure 3: Typical Semantic Graph

Abbreviations

ADVISE Analysis, Dissemination, Visualization, Insight, and Semantic
Enhancement
DHS Department of Homeland Security
ICAHST Interagency Center for Applied Homeland Security Technology
OECD Organization for Economic Cooperation and Development
OMB Office of Management and Budget
PIA privacy impact assessment

This is a work of the U.S. government and is not subject to copyright
protection in the United States. It may be reproduced and distributed in
its entirety without further permission from GAO. However, because this
work may contain copyrighted images or other material, permission from the
copyright holder may be necessary if you wish to reproduce this material
separately.

United States Government Accountability Office

Washington, DC 20548

February 28, 2007

The Honorable David R. Obey
Chairman, Committee on Appropriations
House of Representatives

Dear Mr. Chairman:

Since the terrorist attacks of September 11, 2001, there has been an
increasing focus on the need to prevent and detect terrorist threats
through technological means. Data mining--a technique for extracting
useful information from large volumes of data--is one type of analysis
that has been used increasingly by the government to help detect terrorist
threats. While data mining offers a number of promising benefits, its use
also raises privacy concerns when the data include personal information.^1

Federal agency use of personal information is governed primarily by the
Privacy Act of 1974 and the E-Government Act of 2002, which prescribe
specific activities that agencies must perform to protect privacy, such as
(1) ensuring that personal information is used only for a specified
purpose, or related purposes, and that it is accurate for those purposes
and (2) conducting assessments of privacy risks associated with
information technology used to process personal information, known as
privacy impact assessments.^2 Agencies that wish to reap the potential
benefits of data mining are faced with the challenge of implementing
adequate privacy controls for the systems that they use to perform these
analyses.

You asked us to review the Department of Homeland Security's (DHS)
development of an analytical tool known as Analysis, Dissemination,
Visualization, Insight, and Semantic Enhancement (ADVISE). Specifically,
we agreed with your staff that our objectives were to determine (1) the
planned capabilities, uses, and associated benefits of the ADVISE tool and
(2) whether potential privacy issues could arise from using ADVISE to
process personal information and how DHS has addressed any such issues.
Our review did not include intelligence applications, such as uses of the
tool by the DHS Office of Intelligence and Analysis.

^1For purposes of this report, the term personal information encompasses
all information associated with an individual, including both identifying
and nonidentifying information. Personally identifying information, which
can be used to locate or identify an individual, includes things such as
names, aliases, and agency-assigned case numbers.

^2A privacy impact assessment is an analysis of how personal information
is collected, stored, shared, and managed in a federal system to ensure
that privacy requirements are addressed.

To address our first objective, we identified and analyzed the ADVISE
tool's planned capabilities, uses, and associated benefits. We reviewed
program documentation, including annual program execution plans, and
interviewed agency officials responsible for managing and implementing the
program. We also interviewed officials at DHS components that have begun
to implement the tool^3 in order to identify their current or planned
uses, the progress of their implementation, and the benefits they hope to
gain.

To address our second objective, we searched for potential privacy
concerns by reviewing relevant reports, including prior GAO reports and
the DHS Privacy Office 2006 report on data mining.^4 We identified and
analyzed actions to comply with the Privacy Act of 1974 and the
E-Government Act of 2002. We also interviewed technical experts within the
DHS Science and Technology Directorate and personnel responsible for
implementing ADVISE at DHS components to assess privacy controls included
in the ADVISE tool, as well as the quality assurance processes for data
analyzed using ADVISE. We performed our work from June 2006 to December
2006 in the Washington, D.C., metropolitan area and Laurel, Maryland. Our
work was performed in accordance with generally accepted government
auditing standards. Our objectives, scope, and methodology are discussed
in more detail in appendix I.

Results in Brief

ADVISE is a data mining tool under development that is intended to
facilitate the analysis of large amounts of data. It is designed to
accommodate both structured data (such as information in a database) and
unstructured data (such as e-mail texts, reports, and news articles) and
to allow an analyst to search for patterns in data, including
relationships among entities (such as people, organizations, and events),
and to produce visual representations of these patterns, referred to as
semantic graphs. Although none are fully operational, DHS's planned uses
of this tool include implementations at four departmental components
(including Immigration and Customs Enforcement and other components).^5
DHS is also considering further deployments of ADVISE. The intended
benefit of the ADVISE tool is to help detect activities that threaten the
United States by facilitating the analysis of large amounts of data that
otherwise would be very difficult to review. DHS is currently in the
process of testing the tool's effectiveness.
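
As an illustration only, the idea of a semantic graph (entities connected by labeled relationships) can be sketched in a few lines of Python. This section does not describe ADVISE's internal data model, so the triple format, the function names `build_graph` and `shared_neighbors`, and all entity names below are hypothetical assumptions, not a description of the tool.

```python
from collections import defaultdict

# Hypothetical entities and relationships, invented purely for illustration.
TRIPLES = [
    ("Person A", "member_of", "Organization X"),
    ("Person B", "member_of", "Organization X"),
    ("Person A", "attended", "Event 1"),
    ("Organization X", "sponsored", "Event 1"),
]


def build_graph(triples):
    """Index (subject, relation, object) triples as an undirected labeled graph."""
    graph = defaultdict(set)
    for subject, relation, obj in triples:
        graph[subject].add((relation, obj))
        graph[obj].add((relation, subject))
    return graph


def shared_neighbors(graph, first, second):
    """Return the entities directly linked to both `first` and `second`."""
    first_links = {entity for _, entity in graph[first]}
    second_links = {entity for _, entity in graph[second]}
    return first_links & second_links


graph = build_graph(TRIPLES)
print(sorted(shared_neighbors(graph, "Person A", "Person B")))  # ['Organization X']
```

A pattern search over such a graph amounts to asking which entities are connected and through what relationships; here the two hypothetical people are linked only through a shared organization.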

^3These DHS components include Immigration and Customs Enforcement and
other components. We also interviewed officials from the Interagency
Center for Applied Homeland Security Technology, who are responsible for
testing the tool's capabilities. ADVISE is also being used by the Office
of Intelligence and Analysis. We did not review that application.

^4DHS, Data Mining Report: DHS Privacy Office Response to House Report
108-774 (July 6, 2006).

Use of the ADVISE tool raises a number of privacy concerns. DHS has added
security controls to the ADVISE tool, including access restrictions,
authentication procedures, and security auditing capability. However, it
has not assessed privacy risks. Privacy risks that could apply to ADVISE
include the potential for erroneous association of individuals with crime
or terrorism, the misidentification of individuals with similar names, and
the use of data that were collected for other purposes. A privacy impact
assessment would determine the specific privacy risks associated with
ADVISE and help officials determine what controls are needed to mitigate
those risks. Although DHS officials are considering conducting a modified
version of such an assessment, the ADVISE tool has not yet been assessed
because department officials believe it is not needed given that the
ADVISE tool itself does not contain personal data. However, the tool's
intended uses include applications involving personal information, and the
E-Government Act, as well as related Office of Management and Budget and
DHS guidance, emphasizes the need to assess privacy risks early in systems
development. Further, if a privacy impact assessment were conducted now
and privacy risks identified, a number of controls exist that could be
built into the tool to mitigate those risks. For example, controls could
be implemented to ensure that personal information is used only for a
specified purpose or compatible purposes, or they could provide the
capability to distinguish among individuals that have similar names (a
process known as disambiguation) to address the risk of misidentification.
Because privacy risks such as these have not been assessed and decisions
about mitigating controls have not been made, DHS faces the likelihood
that ADVISE-based system implementations containing personal information
may require costly and potentially duplicative retrofitting at a later
date to add the needed privacy controls.
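
A minimal sketch of the disambiguation control mentioned above, assuming that a second attribute such as date of birth is available to tell apart individuals with similar names. The `PersonRecord` type, its fields, and the matching rule are hypothetical; the report does not specify how such a control would be built into ADVISE.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PersonRecord:
    name: str
    date_of_birth: date


def normalize(name):
    """Lowercase a name and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.lower().split())


def same_individual(first, second):
    """Treat two records as one person only if name AND date of birth agree."""
    return (
        normalize(first.name) == normalize(second.name)
        and first.date_of_birth == second.date_of_birth
    )


# Hypothetical records, invented for illustration.
a = PersonRecord("Jane  Doe", date(1970, 1, 1))
b = PersonRecord("jane doe", date(1970, 1, 1))
c = PersonRecord("Jane Doe", date(1985, 6, 15))

print(same_individual(a, b))  # True
print(same_individual(a, c))  # False
```

Matching on name alone would merge all three records; requiring a second attribute keeps the third record separate, which is the misidentification risk the control is meant to reduce.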

^5ADVISE is also being used by the Office of Intelligence and Analysis. We
did not review that application.

To ensure that privacy protections are in place before DHS proceeds with
implementations of systems based on ADVISE, we are recommending that the
Secretary of Homeland Security immediately conduct a privacy impact
assessment of the ADVISE tool to identify privacy risks and implement
privacy controls to mitigate those risks.

We obtained oral and written comments on a draft of this report from DHS.
In its comments DHS generally agreed with the content of this report and
described actions initiated to address our recommendations.

Background

As defined in a report that we issued in May 2004,^6 data mining is the
application of database technology and techniques--such as statistical
analysis and modeling--to uncover hidden patterns and subtle relationships
in data and to infer rules that allow for the prediction of future
results. This definition is based on the most commonly used terms found in
a survey of the technical literature.

Data mining has been used successfully for a number of years in the
private and public sectors in a broad range of applications. In the
private sector, these applications include customer relationship
management, market research, retail and supply chain analysis, medical
analysis and diagnostics, financial analysis, and fraud detection. In the
government, data mining has been used to detect financial fraud and abuse.
For example, we used data mining to identify fraud and abuse in expedited
assistance and other disbursements to Hurricane Katrina victims.^7

Although the characteristics of data mining efforts can vary greatly, data
mining generally incorporates three processes: data input, data analysis,
and results output. In data input, data are collected in a central data
"warehouse," validated, and formatted for use in data mining. In the data
analysis phase, data are typically queried to find records that match
topics of interest. The two most common types of queries are pattern-based
queries and subject-based queries:

           o Pattern-based queries search for data elements that match or
           depart from a predetermined pattern (e.g., unusual claim patterns
           in an insurance program).

           o Subject-based queries search for any available information on a
           predetermined subject using a specific identifier. This could be
           personal information such as an individual identifier (e.g., an
           individual's name or Social Security number) or an identifier for
           a specific object or location. For example, the Navy uses
           subject-based data mining to identify trends in the failure rate
           of parts used in its ships.

^6GAO, Data Mining: Federal Efforts Cover a Wide Range of Uses,
GAO-04-548 (Washington, D.C.: May 4, 2004).

^7GAO, Expedited Assistance for Victims of Hurricanes Katrina and Rita:
FEMA's Control Weaknesses Exposed the Government to Significant Fraud and
Abuse, GAO-06-403T (Washington, D.C.: Feb. 13, 2006).

           The data analysis phase can be iterative, with the results of one
           query being used to refine criteria for a subsequent query. The
           output phase can produce results in printed or electronic format.
           These reports can be accessed by agency personnel and can also be
           shared with personnel from other agencies. Figure 1 depicts a
           generic data mining process.
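
The two query types, and the iterative use of one query's results to refine the next, can be sketched as follows. This is an illustration under stated assumptions: the claim records, the field names, and the fixed-threshold rule standing in for a "predetermined pattern" are all hypothetical.

```python
# Hypothetical claim records, invented for illustration.
CLAIMS = [
    {"claim_id": 1, "claimant_id": "C-100", "amount": 250.0},
    {"claim_id": 2, "claimant_id": "C-200", "amount": 300.0},
    {"claim_id": 3, "claimant_id": "C-100", "amount": 9800.0},
    {"claim_id": 4, "claimant_id": "C-300", "amount": 275.0},
]


def pattern_query(records, threshold):
    """Pattern-based: return records that depart from an expected pattern,
    here simply any claim whose amount exceeds a fixed threshold."""
    return [r for r in records if r["amount"] > threshold]


def subject_query(records, claimant_id):
    """Subject-based: return everything held on one predetermined identifier."""
    return [r for r in records if r["claimant_id"] == claimant_id]


# Iterative analysis: the pattern query surfaces a claim, and its identifier
# becomes the criterion for a follow-up subject query.
flagged = pattern_query(CLAIMS, threshold=1000.0)
follow_up = subject_query(CLAIMS, flagged[0]["claimant_id"])

print([r["claim_id"] for r in flagged])    # [3]
print([r["claim_id"] for r in follow_up])  # [1, 3]
```

The follow-up query returns an ordinary claim alongside the unusual one, which shows how a subject-based search gathers all available records on an identifier, not only the ones that triggered attention.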

Figure 1: An Overview of the Data Mining Process

In recent years, data mining has emerged as a prevalent government
mechanism for processing and analyzing large amounts of data. In our May
2004 report, we noted that 52 agencies were using or were planning to use
data mining in 199 cases, of which 68 were planned, and 131 were
operational. Additionally, following the terrorist attacks of September
11, 2001, data mining has been used increasingly as a tool to help detect
terrorist threats through the collection and analysis of public and
private sector data. This may include tracking terrorist activities,
including money transfers and communications, and tracking terrorists
themselves through travel and immigration records. According to an August
2006 DHS Office of Inspector General survey of departmental data mining
initiatives,^8 DHS is using or developing 12 data mining programs, 9 of
which are fully operational and 3 of which are still under development.

One such effort is the ADVISE technology program. Managed by the DHS
Science and Technology Directorate,^9 the ADVISE program is primarily
responsible for (1) continuing to develop the ADVISE data mining tool and
(2) promoting and supporting its implementation throughout DHS. According
to program officials, the program has spent approximately $40 million to
develop the tool since 2003.

To promote the possible implementation of the tool within DHS component
organizations, program officials have made demonstrations (using
unclassified data) to interested officials, highlighting the tool's
planned capabilities and expected benefits. Program officials have
established working relationships with component organizations that are
considering adopting the tool, including detailing staff (typically
contractor-provided) to them to assist in the setup and customization of
their ADVISE implementation and providing training for the analysts who
are to use it.

Program officials project that implementation of the tool at a component
organization should generally consist of six main phases and take
approximately 12 to 18 months to complete. The six phases are as follows:

           o preparing infrastructure and installing hardware and software;

           o modeling information sources and loading data;

           o verifying and validating that loaded data are accurate and
           accessible;

           o training and familiarizing analysts and assisting in the
           development of initial research activities using visualization
           tools;

           o supporting analysts in identifying the best ways to use ADVISE
           for their problems, obtaining data, and developing ideas for
           further improvements; and

           o turning over deployment to the component organizations to
           maintain the system and its associated data feeds.

           The program has also provided initial funding for the setup,
           customization, and pilot testing of implementations within
           components, under the assumption that when an implementation
           achieves operational status, the respective component will take
           over operations and maintenance costs. Program officials estimate
           that the tool's operations and maintenance costs will be
           approximately $100,000 per year, per analyst. The program has also
           offered additional support to components implementing the tool,
           such as helping them develop privacy compliance documentation.
           According to DHS officials, the program has spent $12.15 million
           of its $40 million in support of several pilot projects and test
           implementations throughout the department.
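
The cost figures quoted above can be checked with simple arithmetic. The dollar amounts come from the text; the analyst count of 25 used in the last line is a hypothetical value chosen only to show how the per-analyst estimate scales.

```python
TOTAL_SPENT = 40_000_000          # approximate development spending since 2003
PILOT_SPENT = 12_150_000          # spent on pilot projects and test implementations
COST_PER_ANALYST_YEAR = 100_000   # estimated operations and maintenance cost


def pilot_share(pilot, total):
    """Fraction of total spending that went to pilots and test implementations."""
    return pilot / total


def annual_om_cost(analysts):
    """Estimated yearly operations and maintenance cost for a given analyst count."""
    return analysts * COST_PER_ANALYST_YEAR


print(f"{pilot_share(PILOT_SPENT, TOTAL_SPENT):.1%}")  # 30.4%
print(annual_om_cost(25))  # 2500000 (25 analysts is a hypothetical figure)
```

On these figures, roughly three-tenths of the program's spending supported pilots and test implementations, and a component's recurring cost grows linearly with the number of analysts using the tool.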

           Currently, the department's Interagency Center for Applied
           Homeland Security Technology (ICAHST) group within the Science
           and Technology Directorate is testing the tool's effectiveness,
           adequacy, and cost-effectiveness as a data mining technology.
           ICAHST has completed preliminary testing of basic functionality
           and is currently in the process of testing the system's
           effectiveness, using mock data to test how well ADVISE identifies
           specified patterns of interest.
			  
Privacy Concerns Have Been Raised Regarding Data Mining

           The impact of computer systems on the ability of organizations to
           protect personal information was recognized as early as 1973, when
           a federal advisory committee on automated personal data systems
           observed that "The computer enables organizations to enlarge their
           data processing capacity substantially, while greatly facilitating
           access to recorded data, both within organizations and across
           boundaries that separate them." In addition, the committee
           concluded that "The net effect of computerization is that it is
           becoming much easier for record-keeping systems to affect people
           than for people to affect record-keeping systems."^10

           In May 2004, we reported that mining government and private
           databases containing personal information creates a range of
           privacy concerns.^11 Through data mining, agencies can quickly and
           efficiently obtain information on individuals or groups by
           searching large databases containing personal information
           aggregated from public and private records. Information can be
           developed about a specific individual or a group of individuals
           whose behavior or characteristics fit a specific pattern. The ease
           with which organizations can use automated systems to gather and
           analyze large amounts of previously isolated information raises
           concerns about the impact on personal privacy.

           Further, we reported in August 2005^12 that although agencies
           responsible for certain data mining efforts took many of the key
           steps required by federal law and executive branch guidance for
           the protection of personal information, none followed all key
           procedures. Specifically, while three of the four agencies we
           reviewed had prepared privacy impact assessments
           (PIA)--assessments of privacy risks associated with information
           technology used to process personal information--for their data
           mining systems, none of them had completed a PIA that adequately
           addressed all applicable statutory requirements. We recommended
           that four agencies complete or revise PIAs for their systems to
           fully comply with applicable guidance. As of December 2006, three
           of the four agencies reported that they had taken action to
           complete or revise their PIAs.
			  
Federal Laws and Guidance Define Steps to Protect Privacy of
Personal Information

           Federal law includes a number of separate statutes that provide
           privacy protections for information used for specific purposes or
           maintained by specific types of entities. The major requirements
           for the protection of personal privacy by federal agencies come
           from two laws, the Privacy Act of 1974 and the privacy provisions
           of the E-Government Act of 2002. The Office of Management and
           Budget (OMB) is tasked with providing guidance to agencies on how
           to implement the provisions of both laws and has done so,
           beginning with guidance on the Privacy Act, issued in 1975.

           The Privacy Act places limitations on agencies' collection,
           disclosure, and use of personal information maintained in systems
           of records. The act describes a "record" as any item, collection,
           or grouping of information about an individual that is maintained
           by an agency and contains his or her name or another personal
           identifier. It also defines "system of records" as a group of
           records under the control of any agency from which information is
           retrieved by the name of the individual or by an individual
           identifier. The Privacy Act requires that when agencies establish
           or make changes to a system of records, they must notify the
           public through a "system of records notice": that is, a notice in
           the Federal Register identifying, among other things, the type of
           data collected, the types of individuals about whom information is
           collected, the intended "routine" uses of data, and procedures
           that individuals can use to review and correct personal
           information.^13 In addition, the act requires agencies to publish
           in the Federal Register notice of any new or intended use of the
           information in the system, and provide an opportunity for
           interested persons to submit written data, views, or arguments to
           the agency.

           Several provisions of the act require agencies to define and limit
           themselves to specific predefined purposes. For example, the act
           requires that to the greatest extent practicable, personal
           information should be collected directly from the subject
           individual when it may affect an individual's rights or benefits
           under a federal program. The act also requires that an agency
           inform individuals whom it asks to supply information of (1) the
           authority for soliciting the information and whether disclosure of
           such information is mandatory or voluntary; (2) the principal
           purposes for which the information is intended to be used; (3) the
           routine uses that may be made of the information; and (4) the
           effects on the individual, if any, of not providing the
           information. In addition, the act requires that each agency that
           maintains a system of records store only such information about an
           individual as is relevant and necessary to accomplish a purpose of
           the agency.

           Agencies are allowed to claim exemptions from some of the
           provisions of the Privacy Act if the records are used for certain
           purposes. For example, records compiled for criminal law
           enforcement purposes can be exempt from a number of provisions,
           including (1) the requirement to notify individuals of the
           purposes and uses of the information at the time of collection and
           (2) the requirement to ensure the accuracy, relevance, timeliness,
           and completeness of records. In general, the exemptions for law
           enforcement purposes are intended to prevent the disclosure of
           information collected as part of an ongoing investigation that
           could impair the investigation or allow those under investigation
           to change their behavior or take other actions to escape
           prosecution.

           The E-Government Act of 2002 strives to enhance protection for
           personal information in government information systems or
           information collections by requiring that agencies conduct PIAs.
           As described earlier, a PIA is an analysis of how personal
           information is collected, stored, shared, and managed in a federal
           system. More specifically, according to OMB guidance,^14 a PIA is
           an analysis of how

           ...information is handled: (i) to ensure handling conforms to
           applicable legal, regulatory, and policy requirements regarding
           privacy; (ii) to determine the risks and effects of collecting,
           maintaining, and disseminating information in identifiable form in
           an electronic information system; and (iii) to examine and
           evaluate protections and alternative processes for handling
           information to mitigate potential privacy risks.

           Agencies must conduct PIAs before (1) developing or procuring
           information technology that collects, maintains, or disseminates
           information that is in a personally identifiable form or (2)
           initiating any new data collections involving personal information
           that will be collected, maintained, or disseminated using
           information technology if the same questions are asked of 10 or
           more people. OMB guidance also requires agencies to conduct PIAs
           in two specific types of situations: (1) when, as a result of the
           adoption or alteration of business processes, government databases
           holding information in personally identifiable form are merged,
           centralized, matched with other databases, or otherwise
           significantly manipulated and (2) when agencies work together on
           shared functions involving significant new uses or exchanges of
           information in personally identifiable form.^15
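
The conditions listed above can be read as a checklist. The following sketch encodes them as a simple rule check, purely as a reading aid: the `Activity` type and its field names are hypothetical, and the code simplifies the statutory and OMB language rather than restating it.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    # Field names are hypothetical; each mirrors one condition in the text.
    develops_or_procures_it_with_pii: bool = False
    new_it_collection_with_pii: bool = False
    people_asked_same_questions: int = 0
    databases_merged_or_matched: bool = False
    interagency_shared_function_with_pii: bool = False


def pia_triggers(activity):
    """Return the list of conditions from the text that the activity meets."""
    triggers = []
    if activity.develops_or_procures_it_with_pii:
        triggers.append("developing or procuring IT that handles identifiable information")
    if activity.new_it_collection_with_pii and activity.people_asked_same_questions >= 10:
        triggers.append("new collection asking the same questions of 10 or more people")
    if activity.databases_merged_or_matched:
        triggers.append("databases merged, centralized, matched, or significantly manipulated")
    if activity.interagency_shared_function_with_pii:
        triggers.append("interagency shared function with significant new uses or exchanges")
    return triggers


example = Activity(new_it_collection_with_pii=True, people_asked_same_questions=9)
print(pia_triggers(example))  # []

example.people_asked_same_questions = 10
print(len(pia_triggers(example)))  # 1
```

An empty result means none of the listed conditions is met in this simplified model; any non-empty result indicates that a PIA would be expected before the activity proceeds.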

           DHS has also developed its own guidance^16 requiring PIAs to be
           performed when one of its offices is developing or procuring any
           new technologies or systems, including classified systems, that
           handle or collect personally identifiable information. It also
           requires that PIAs be performed before pilot tests are begun for
           these systems or when significant modifications are made to them.
           Furthermore, DHS has prescribed detailed requirements for PIAs.
           For example, PIAs must describe all uses of the information, and
           whether the system analyzes data in order to identify previously
           unknown patterns or areas of note or concern.
			  
Fair Information Practices

           The Privacy Act of 1974 is largely based on a set of
           internationally recognized principles for protecting the privacy
           and security of personal information known as the Fair Information
           Practices. A U.S. government advisory committee first proposed the
           practices in 1973 to address what it termed a poor level of
           protection afforded to privacy under contemporary law.^17 The
           Organization for Economic Cooperation and Development (OECD)^18
           developed a revised version of the Fair Information Practices in
           1980 that has, with some variation, formed the basis of privacy
           laws and related policies in many countries, including the United
           States, Germany, Sweden, Australia, New Zealand, and the European
           Union.^19 The eight principles of the OECD Fair Information
           Practices are shown in table 1.

Table 1: Fair Information Practices

Principle                Description
Collection limitation    The collection of personal information should be  
                            limited, should be obtained by lawful and fair    
                            means, and, where appropriate, with the knowledge 
                            or consent of the individual.                     
Data quality             Personal information should be relevant to the    
                            purpose for which it is collected, and should be  
                            accurate, complete, and current as needed for     
                            that purpose.                                     
Purpose specification    The purposes for the collection of personal       
                            information should be disclosed before collection 
                            and upon any change to that purpose, and its use  
                            should be limited to those purposes and           
                            compatible purposes.                              
Use limitation           Personal information should not be disclosed or   
                            otherwise used for other than a specified purpose 
                            without consent of the individual or legal        
                            authority.                                        
Security safeguards      Personal information should be protected with     
                            reasonable security safeguards against risks such 
                            as loss or unauthorized access, destruction, use, 
                            modification, or disclosure.                      
Openness                 The public should be informed about privacy       
                            policies and practices, and individuals should    
                            have ready means of learning about the use of     
                            personal information.                             
Individual participation Individuals should have the following rights: to  
                            know about the collection of personal             
                            information, to access that information, to       
                            request correction, and to challenge the denial   
                            of those rights.                                  
Accountability           Individuals controlling the collection or use of  
                            personal information should be accountable for    
                            taking steps to ensure the implementation of      
                            these principles.                                 

           Source: OECD.

           The Fair Information Practices are not precise legal requirements.
           Rather, they provide a framework of principles for balancing the
           need for privacy with other public policy interests, such as
           national security, law enforcement, and administrative efficiency.
           Ways to strike that balance vary among countries and according to
           the type of information under consideration.
			  
			  ADVISE Is Intended to Help Identify Patterns of Interest to
			  Homeland Security Analysts

           ADVISE is a data mining tool under development that is intended to
           facilitate the analysis of large amounts of data. It is designed
           to accommodate both structured data (such as information in a
           database) and unstructured data (such as e-mail texts, reports,
           and news articles) and to allow an analyst to search for patterns
           in data, including relationships among entities (such as people,
           organizations, and events) and to produce visual representations
           of these patterns, referred to as semantic graphs. Although none
           are fully operational, DHS's planned uses of this tool include
           implementations at several departmental components, among them
           Immigration and Customs Enforcement. DHS is
           also considering further deployments of ADVISE. The intended
           benefit of the ADVISE tool is to help detect activities that
           threaten the United States by facilitating the analysis of large
           amounts of data that otherwise would be prohibitively difficult to
           review. DHS is currently in the process of testing the tool's
           effectiveness.
			  
			  The ADVISE Tool Provides Analytical Capabilities Intended to
			  Identify Patterns of Interest to DHS Analysts

           ADVISE provides several capabilities that help to find and track
           relationships in data. These include graphically displaying the
           results of searches and providing automated alerts when predefined
           patterns of interest emerge in the data. The tool consists of
           three main elements--the Information Layer, Knowledge Layer, and
           Application Layer (depicted in fig. 2).

^8DHS Office of Inspector General, Survey of DHS Data Mining Activities
(August 2006).

^9The mission of the Science and Technology Directorate is to act as the
primary research and development arm of DHS, providing federal, state, and
local officials with the technology and capabilities to protect the United
States homeland.

^10U.S. Department of Health, Education, and Welfare, Records, Computers
and the Rights of Citizens: Report of the Secretary's Advisory Committee
on Automated Personal Data Systems (Washington, D.C.: July 1973).

^11GAO-04-548.

^12GAO, Data Mining: Agencies Have Taken Key Steps to Protect Privacy in
Selected Efforts, but Significant Compliance Issues Remain, GAO-05-866
(Washington, D.C.: Aug. 15, 2005).

^13Under the Privacy Act of 1974, the term "routine use" means (with
respect to the disclosure of a record) the use of such a record for a
purpose that is compatible with the purpose for which it was collected. 5
U.S.C. § 552a(a)(7).

^14OMB, Guidance for Implementing the Privacy Provisions of the
E-Government Act of 2002, Memorandum M-03-22 (Washington, D.C.: Sept. 26,
2003).

^15A PIA may not be required for all systems. For example, no assessment
is required when the information collected relates to internal government
operations, when the information has been previously assessed under an
evaluation similar to a PIA, or when privacy issues are unchanged.

^16DHS Privacy Office, PIA Official Guidance (March 2006).

^17U.S. Department of Health, Education, and Welfare, Records, Computers
and the Rights of Citizens: Report of the Secretary's Advisory Committee
on Automated Personal Data Systems (Washington, D.C.: July 1973).

^18OECD, Guidelines on the Protection of Privacy and Transborder Flow of
Personal Data (Sept. 23, 1980). The OECD plays a prominent role in
fostering good governance in the public service and in corporate activity
among its 30 member countries. It produces internationally agreed-upon
instruments, decisions, and recommendations to promote rules in areas
where multilateral agreement is necessary for individual countries to make
progress in the global economy.

^19European Union Data Protection Directive ("Directive 95/46/EC of the
European Parliament and of the Council of 24 October 1995 on the
Protection of Individuals with Regard to the Processing of Personal Data
and the Free Movement of Such Data") (1995).

Figure 2: Major Elements and Functions of ADVISE

  Information Layer

At the Information Layer, disparate data are brought into the tool from
various sources. These data sources can be both structured (such as
computerized databases and watch lists) and unstructured (such as news
feeds and text reports). For structured data, ADVISE contains software
applications that load the data into the Information Layer and format it
to conform to a specific predefined data structure, known as an ontology.
Generally speaking, ontologies define entities (such as a person or
place), attributes (such as name and address), and the relationships among
them.
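
As a rough illustration of what formatting structured data to an ontology
could look like, the sketch below maps one database row onto entities,
attributes, and a relationship. The report does not describe ADVISE's
actual data model, so the class names, the column names, and the sample
record are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """An ontology entity, such as a person or place, with its attributes."""
    entity_type: str
    attributes: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A typed link between two entities."""
    source: Entity
    relation: str
    target: Entity


def load_travel_record(row: dict) -> Relationship:
    """Map one structured row (hypothetical column names) onto the ontology."""
    person = Entity("person", {"name": row["passenger_name"]})
    place = Entity("place", {"name": row["destination"]})
    return Relationship(person, "traveled_to", place)


# Hypothetical input row from a structured source.
fact = load_travel_record(
    {"passenger_name": "Pat Example", "destination": "Lisbon"}
)
```

Every loader would emit the same Entity and Relationship shapes, which is
what would let data from different sources be combined later.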

For unstructured data, ADVISE includes several tools that extract
information about entities and attributes. As with structured data, the
output of these analyses is formatted and structured according to an
ontology. Tagging information as specific entities and attributes is more
difficult with unstructured data, and ADVISE includes tools that allow
analysts to manually identify entities, attributes, and relationships
among them. According to DHS officials, research is continuing on
developing efficient and effective mechanisms for inputting different
forms of unstructured data.

ADVISE can also include information about the data--known as
"metadata"--such as the time period to which the data pertain and whether
the data refer to a U.S. person. ADVISE metadata also include confidence
attributes, ranging from 1 to -1, which represent subjective assessments
of the accuracy of the data. Each data source has a predefined confidence
attribute. Analysts can change the confidence attribute of specific data,
but changes to confidence levels are tracked and linked to the analysts
making the changes.
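
The confidence metadata described above might be modeled along the
following lines. This is a minimal sketch, not the tool's implementation:
the source names, default values, and record layout are assumptions. Only
the range of 1 to -1, the per-source default, and the tracking of changes
by analyst come from the description above.

```python
from dataclasses import dataclass, field

# Hypothetical per-source default confidence values.
SOURCE_DEFAULTS = {"watch_list": 0.8, "news_feed": 0.2}


@dataclass
class Fact:
    statement: str
    source: str
    confidence: float  # subjective score between -1 and 1
    history: list = field(default_factory=list)

    def set_confidence(self, analyst: str, value: float) -> None:
        """Change the confidence and record who changed it, from what, to what."""
        if not -1.0 <= value <= 1.0:
            raise ValueError("confidence must be between -1 and 1")
        self.history.append((analyst, self.confidence, value))
        self.confidence = value


def new_fact(statement: str, source: str) -> Fact:
    """Create a fact that starts at its source's predefined confidence."""
    return Fact(statement, source, SOURCE_DEFAULTS[source])
```

Keeping the old and new values with the analyst's name makes each change
reviewable after the fact.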

  Knowledge Layer

At the Knowledge Layer, facts and relationships from the Information Layer
are consolidated into a large-scale semantic graph and various subgraphs.
Semantic graphing is a data modeling technique that uses a combination of
"nodes," representing specific entities, and connecting lines,
representing the relationships among them. Because they are well-suited to
representing data relationships and linkages, semantic graphs have emerged
as a key technology for consolidating and organizing disparate data.
Figure 3 represents the format that a typical semantic graph could take.
The Knowledge Layer contains the semantic graph of all facts reported
through the Information Layer interface and organized according to the
ontology.

Figure 3: Typical Semantic Graph

The Knowledge Layer also includes the capability to provide automatic
alerts to analysts when patterns of interest (or partial patterns) are
matched by new incoming information.
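
A semantic graph with alerting can be sketched in a few lines. The example
below is an assumption-laden simplification: it stores facts as
(source, relation, target) edges and alerts only on exact single-edge
patterns, whereas the tool is described as also handling partial patterns.
All names are hypothetical.

```python
from collections import defaultdict


class SemanticGraph:
    """Nodes are entity names; edges are (relation, target) pairs per source."""

    def __init__(self):
        self.edges = defaultdict(set)
        self.patterns = []  # watched (relation, target) pairs
        self.alerts = []    # facts that matched a watched pattern

    def watch(self, relation, target):
        """Register a pattern of interest."""
        self.patterns.append((relation, target))

    def add_fact(self, source, relation, target):
        """Add an edge and record an alert if it matches a watched pattern."""
        self.edges[source].add((relation, target))
        if (relation, target) in self.patterns:
            self.alerts.append((source, relation, target))


graph = SemanticGraph()
graph.watch("member_of", "Org X")
graph.add_fact("Person A", "member_of", "Org X")  # matches the pattern
graph.add_fact("Person A", "lives_in", "City Y")  # stored, no alert
```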

  Application Layer

At the Application Layer, analysts are able to interact with the data that
reside in the Knowledge Layer. The Application Layer contains tools that
allow analysts to perform both pattern-based and subject-based queries and
to search for data that match a specific pattern, as well as data that are
connected with a specific entity. For example, analysts could search for
all of the individuals who have traveled to a certain destination within a
given period of time, or they could search for all information connected
with a particular person, place, or organization. The resulting output of
these searches is then graphically displayed via semantic graphs.
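
The difference between the two query types can be shown on a small set of
facts. In this sketch the facts, entity names, and function names are
hypothetical, and the time-period filter mentioned in the example above is
left out for brevity.

```python
# Each fact is a (source, relation, target) triple.
FACTS = [
    ("Person A", "traveled_to", "City Y"),
    ("Person B", "traveled_to", "City Y"),
    ("Person A", "member_of", "Org X"),
    ("Person C", "traveled_to", "City Z"),
]


def pattern_query(facts, relation, target):
    """Pattern-based: find every source matching a (relation, target) pattern."""
    return sorted({s for s, r, t in facts if r == relation and t == target})


def subject_query(facts, entity):
    """Subject-based: find every fact in which a given entity appears."""
    return [f for f in facts if entity in (f[0], f[2])]


travelers = pattern_query(FACTS, "traveled_to", "City Y")
about_a = subject_query(FACTS, "Person A")
```

A pattern-based query starts from a behavior and returns whoever fits it.
A subject-based query starts from a known entity and returns what is
linked to it.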

ADVISE's Application Layer also provides several other capabilities that
allow for the further examination and adjustment of its output. An analyst
can pinpoint nodes on a semantic graph to view and examine additional
information related to them, including the source from which the
information and relationships are derived, the data source's confidence
level, and whether the data pertain to U.S. persons.

The ADVISE Application Layer also provides analysts the ability to monitor
patterns of interest in the data. Science and Technology Directorate staff
work with component staff to define patterns of interest and build an
inventory of automated searches. These patterns are continuously being
monitored in the data, and an alert is provided whenever there is a match.
For example, an analyst could define a pattern of interest as "all
individuals traveling from the United States to the Middle East in the
next 6 months" and have the ADVISE tool provide an alert whenever this
pattern emerges in the data.

ADVISE Is Expected to Benefit DHS by Helping to Detect Potentially Threatening
Activities

The current planned uses of the ADVISE tool include implementations at
several DHS components that are planning to use it in a variety of
homeland security applications to further their respective organizational
missions. Currently none of these implementations is fully operational or
widely accessible to DHS analysts. Rather, they are all still in various
phases of systems development. These applications are expected to use the
tool primarily to help analysts detect threats to the United States, such
as identifying activities and/or individuals that could be associated with
terrorism.

The intended benefit of the ADVISE tool is to consolidate large amounts of
structured and unstructured data and permit their analysis and
visualization. The tool could thus help analysts identify and monitor
patterns of interest that could be further investigated and might
otherwise have been missed.

None of the DHS components have fully implemented the tool in operational
systems and, as discussed earlier, testing of the tool is still under way.
Until such testing is complete and component implementations are fully
operational, the intended benefit remains largely potential.

DHS Has Not Yet Addressed Key Privacy Risks Associated with Expected Uses of the
ADVISE Tool

Use of the ADVISE tool raises a number of privacy concerns. DHS has added
security controls to the ADVISE tool, including access restrictions,
authentication procedures, and security auditing capability. However, it
has not assessed privacy risks. Privacy risks that could apply to ADVISE
include the potential for erroneous association of individuals with crime
or terrorism through data that are not accurate for that purpose, the
misidentification of individuals with similar names, and the use of data
that were collected for other purposes. A PIA would determine the privacy
risks associated with ADVISE and help officials determine what specific
controls are needed to mitigate those risks. Although department officials
believe a PIA is not needed given that the ADVISE tool itself does not
contain personal data, the E-Government Act of 2002 and related federal
guidance require the completion of PIAs from the early stages of
development. Further, if a PIA were conducted and privacy risks
identified, a number of controls exist that could be built into the tool
to mitigate those risks. For example, controls could be implemented to
ensure that personal information is used only for a specified purpose or
compatible purposes, or they could provide the capability to distinguish
among individuals that have similar names (a process known as
disambiguation) to address the risk of misidentification. Because privacy
risks such as these have not been assessed and decisions about mitigating
controls have not been made, DHS faces the likelihood that system
implementations based on the tool may require costly and potentially
duplicative retrofitting at a later date to add the needed controls.

Potential Privacy Concerns Arise with the Use of the ADVISE Tool to Process
Personal Information

Like other data mining applications, the use of the ADVISE tool in
conjunction with personal information raises concerns about a number of
privacy risks that could potentially have an adverse impact on
individuals. As the DHS Privacy Office's July 2006 report on data mining
activities notes, "privacy and civil liberties issues potentially arise in
every phase of the data mining process." ^20

Potential privacy risks can be categorized in relation to the Fair
Information Practices, which, as discussed earlier, form the basis for
privacy laws such as the Privacy Act. For example, the potential for
personal information to be improperly accessed or disclosed relates to the
security safeguards principle, which states that personal information
should be protected against risks such as loss or unauthorized access,
destruction, use, modification, or disclosure. Further, the potential for
individuals to be misidentified or erroneously associated with
inappropriate activities is inconsistent with the data quality principle
that personal data should be accurate, complete, and current, as needed
for a given purpose. Similarly, the risk that information could be used
beyond the scope originally specified is based on the purpose
specification and use limitation principles, which state that, among other
things, personal information should only be collected and used for a
specific purpose and that such use should be limited to the specified
purpose and compatible purposes.

^20DHS, Data Mining Report: DHS Privacy Office Response to House Report
108-774 (July 6, 2006), p. 12.

Like other data mining applications, the ADVISE tool could misidentify or
erroneously associate an individual with undesirable activity such as
fraud, crime, or terrorism--a result known as a false positive. False
positives may be the result of poor data quality, or they could result
from the inability of the system to distinguish among individuals with
similar names. Data quality, the principle that data should be accurate,
current, and complete as needed for a given purpose, could be particularly
difficult to ensure with regard to ADVISE because the tool brings together
multiple, disparate data sources, some of which may be more accurate for
the analytical purpose at hand than others. If data being analyzed by the
tool were never intended for such a purpose or are not accurate for that
purpose, then conclusions drawn from such an analysis would also be
erroneous.

Another privacy risk is the potential for use of the tool to extend beyond
the scope of what it was originally designed to address, a phenomenon
commonly referred to as function or mission "creep." Because it can
facilitate a broad range of potential queries and analyses and aggregate
large quantities of previously isolated pieces of information, ADVISE
could produce aggregated, organized information that organizations could
be tempted to use for purposes beyond that which was originally specified
when the information was collected. The risks associated with mission
creep are relevant to the purpose specification and use limitation
principles.

DHS Has Implemented Security Controls but Has Not Yet Assessed Privacy Risks

To address security, DHS has included several types of controls in ADVISE.
These include authentication procedures, access controls, and security
auditing capability. For example, an analyst must provide a valid user
name and password in order to gain access to the tool. Further, upon
gaining access, only users with appropriate security clearances may view
sensitive data sets. Each service requested by a user--such as issuing a
query or retrieving a document--is checked against the user's credentials
and access authorization before it is provided. In addition, these user
requests and the tool's responses to them are all recorded in an audit
log.
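
The per-request check and audit logging described above could take roughly
the following shape. This sketch assumes numeric clearance levels and
hypothetical data set names. It omits the user name and password step, and
it records only whether each request was allowed rather than the tool's
full response.

```python
from dataclasses import dataclass, field

# Hypothetical minimum clearance level required for each data set.
DATASET_LEVELS = {"public_reports": 0, "sensitive_cases": 2}


@dataclass
class AuditedService:
    audit_log: list = field(default_factory=list)

    def request(self, user: str, clearance: int, action: str, dataset: str) -> str:
        """Check the user's clearance for every request and log the outcome."""
        allowed = clearance >= DATASET_LEVELS[dataset]
        # Log before acting so that denied requests are recorded too.
        self.audit_log.append((user, action, dataset, allowed))
        if not allowed:
            raise PermissionError(f"{user} may not {action} {dataset}")
        return f"{action}:{dataset}"
```

Logging denied requests as well as granted ones is a design choice made
here so that attempted misuse leaves a trace.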

While inclusion of controls such as these is a key step in guarding
against unauthorized access, use, disclosure, or modification, such
controls alone do not address the full range of potential privacy risks.
The need to evaluate such risks early in the development of information
technology is consistently reflected in both law (the E-Government Act of
2002) and related federal guidance. The E-Government Act requires that a
PIA be performed before an agency develops or procures information
technology that collects, maintains, or disseminates information in a
personally identifiable form. Further, both OMB and DHS PIA guidance
emphasize the need to assess privacy risks from the early stages of
development.^21

However, although DHS officials are considering performing a PIA, no PIA
or other privacy risk assessment has yet been conducted. The DHS Privacy
Office^22 instructed the Science and Technology Directorate that a PIA was
not required because the tool alone did not contain personal data.^23
According to the Privacy Office rationale, only specific system
implementations based on ADVISE that contained personal data would likely
require PIAs, and only at the time they first began to use such data.
However, guidance on conducting PIAs makes it clear that they should be
performed at the early stages of development. OMB's PIA guidance requires
PIAs at the IT development stage, stating that they "should address the
impact the system will have on an individual's privacy, specifically
identifying and evaluating potential threats relating to elements
identified [such as the nature, source, and intended uses of the
information] to the extent these elements are known at the initial stages
of development." Regarding ADVISE, the tool's intended uses include
applications containing personal information. Thus the requirement to
conduct a PIA from the early stages of development applies.

^21DHS PIA guidance states that "[t]he purpose of a PIA is to demonstrate
that system owners and developers have consciously incorporated privacy
protections throughout the entire life cycle of a system. This involves
making certain that privacy protections are built into the system from the
start, not after the fact when they can be far more costly or could affect
the viability of the project." In addition, OMB guidance states that
"[a]gencies should commence a PIA when they begin to develop a new or
significantly modified IT system."

^22The DHS Privacy Office was created in response to the Homeland Security
Act of 2002, Pub. L. No. 107-296, § 222, 116 Stat. 2155 (Nov. 25, 2002).
The Privacy Officer is responsible for, among other things, "assuring that
the use of technologies sustain[s], and do[es] not erode privacy
protections relating to the use, collection, and disclosure of personal
information."

^23It is important to note the distinction between the PIA requirement,
based on the E-Government Act, and the requirements of the Privacy Act.
Because the ADVISE tool itself does not contain any data, it is not
considered a system of records for purposes of the Privacy Act and thus is
not subject to the requirements of that law. As ADVISE implementations
move from development to operations, they may lead to the creation or
modification of systems of records, which would require the development of
appropriate privacy notices to be published in the Federal Register and
other actions to protect privacy.

As of November 2006, the ADVISE program office and DHS Privacy Office were
in discussions regarding the possibility of conducting a privacy
assessment similar to a PIA but modified to address the development of a
technological tool. No final decision has yet been made on whether or how
to proceed with a PIA. However, until such an assessment is performed, DHS
cannot be assured that privacy risks have been identified or will be
mitigated for system implementations based on the tool.

Privacy Protection Controls to Mitigate Identified Risks Exist and Could Be
Built into ADVISE

A variety of privacy controls can be built into data mining software
applications, including the ADVISE tool, to help mitigate risks identified
in PIAs and protect the privacy of individuals whose information may be
processed. DHS has recognized the importance of implementing such privacy
protections when data mining applications are being developed.
Specifically, in its July 2006 report, the DHS Privacy Office recommended
instituting controls for data mining activities that go beyond conducting
PIAs and implementing standard security controls. Such measures could be
applied to the development of the ADVISE tool.^24 Among other things, the
DHS Privacy Office recommended that DHS components use data mining tools
principally as investigative tools and not as a means of making automated
decisions regarding individuals.^25 The report also emphasizes that data
mining should produce accurate results and recommends that DHS adopt data
quality standards for data used in data mining. Further, the report
recommends that data mining projects give explicit consideration to using
anonymized data when personally identifiable information is involved.
Although some of the report's recommendations may apply only to
operational data mining activities, many reflect system functionalities
that can be addressed during technology development.

^24The Privacy Office's report states that ADVISE is a "technology" and
not a data mining program. Accordingly, the report's recommendations
ostensibly would not apply to ADVISE. However, the report acknowledges
that uses of ADVISE may constitute data mining, in which case the
recommendations would apply.

^25ADVISE does not provide an automated means for making decisions about
individuals. Rather, it is an analysis tool to aid analysts in identifying
relationships and patterns of interest.

Based on privacy risks identified in a PIA, controls exist that could be
implemented in ADVISE to mitigate those risks. For example, controls could
be implemented to enforce use limitations associated with the purpose
specified when the data were originally collected. Specifically, software
controls could be implemented that require an analyst to specify an
allowable purpose and check that purpose against the specified purposes of
the databases being accessed.
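
One way such a purpose check might work is sketched below. The data set
names and purpose labels are invented for illustration, and the report
does not say how ADVISE would record the purposes attached to each source.

```python
# Hypothetical purposes recorded when each data set was collected.
DATASET_PURPOSES = {
    "visa_applications": {"immigration_enforcement"},
    "cargo_manifests": {"customs_inspection", "counterterrorism"},
}


def check_purpose(stated_purpose, datasets):
    """Return the data sets whose recorded purposes do not cover the stated one."""
    return [d for d in datasets if stated_purpose not in DATASET_PURPOSES[d]]


def run_query(stated_purpose, datasets):
    """Refuse the query unless every data set allows the stated purpose."""
    blocked = check_purpose(stated_purpose, datasets)
    if blocked:
        raise PermissionError(
            f"purpose '{stated_purpose}' not allowed for: {', '.join(blocked)}"
        )
    return f"query allowed for purpose '{stated_purpose}'"
```

A fuller control would also need a rule for compatible purposes, which
this exact-match sketch does not attempt.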

Regarding data quality risks, the ADVISE tool currently does not have the
capability to distinguish among individuals with similar identifying
information, nor does it have a mechanism to assess the accuracy of the
relationships it uncovers. To address the risk of misidentification,
software could be added to the tool to distinguish among individuals that
have similar names, a process known as disambiguation. Disambiguation
tools have been developed for other applications. Additionally, although
the ADVISE tool includes a feature that allows analysts to designate
confidence levels for individual pieces of data, no mechanism has been
developed to assess the confidence of relationships identified by the
tool. While software specifically to determine data quality would be
difficult to develop, other controls exist that could be readily used as
part of a strategy for mitigating this risk. For example, anonymization
could be used to minimize the exposure of personal data, and operational
procedures could be developed to restrict the use of analytical results
containing personal information that could have data quality concerns. To
implement anonymization, the tool would need the software capability to
handle anonymized data or have a built-in data anonymizer. DHS currently
does not have plans to build anonymization into the ADVISE tool.^26
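
To make the two controls concrete, the sketch below shows a very simple
form of each. Both are assumptions for illustration only. The
disambiguation rule requires a second identifier (date of birth) to agree
before two records are treated as the same person. The keyed hash is
strictly pseudonymization, a weaker relative of anonymization, since
anyone holding the key can re-link the values. Field names and the key are
hypothetical.

```python
import hashlib
import hmac


def same_person(a: dict, b: dict) -> bool:
    """Treat two records as one person only if name and date of birth agree."""
    def norm(text: str) -> str:
        return " ".join(text.lower().split())

    return norm(a["name"]) == norm(b["name"]) and a["dob"] == b["dob"]


def pseudonymize(name: str, key: bytes) -> str:
    """Replace a name with a keyed hash so records still link without exposing it."""
    digest = hmac.new(key, name.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


rec1 = {"name": "Pat Example", "dob": "1970-01-01"}
rec2 = {"name": "pat  example", "dob": "1985-06-15"}
KEY = b"hypothetical-secret-key"
```

Here rec1 and rec2 share a name but not a date of birth, so they are not
merged. Matching on name alone would have linked them.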

Until a PIA that identifies the privacy risks of ADVISE is conducted and
privacy controls to mitigate those risks are implemented, DHS faces the
risk that privacy concerns will arise during implementation of systems
based on ADVISE that may be more difficult to address at that stage and
possibly require costly retrofitting.

^26In addition, a feature was to be implemented in January 2007 that would
enforce an internal DHS rule regarding how long information about U.S.
persons can be maintained in intelligence databases. However, because
this control is designed to respond only to the DHS rule--and not to
identified privacy risks--it leaves potential concerns unaddressed about
how personal information is used when it is maintained and processed by
ADVISE.

Conclusions

The ADVISE tool is intended to provide the capability to ingest large
amounts of data from multiple sources and to display relationships that
can be discerned within the data. Although the ADVISE tool has not yet
been fully implemented and its effectiveness is still being evaluated, the
chief intended benefit is to help detect activities threatening to the
United States by facilitating the analysis of large amounts of data.

The ADVISE tool incorporates security controls intended to protect the
information it processes from unauthorized access. However, because ADVISE
is intended to be used in ways that are likely to involve personal data, a
range of potential privacy risks could be involved in its operational use.
Thus, it is important that those risks be assessed--through a PIA--so that
additional controls can be established to mitigate them. However, DHS has
not yet conducted a PIA, despite the fact that the E-Government Act and
related OMB and DHS guidance emphasize the need to assess privacy risks
early in systems development. Although DHS officials stated that they
believe a PIA is not required because the tool alone does not contain
personal data, they also told us they are considering conducting a
modified PIA for the tool. Until a PIA is conducted, little assurance
exists that privacy risks have been rigorously considered and mitigating
controls established. If controls are not addressed now, they may be more
difficult and costly to retrofit at a later stage.

Recommendations for Executive Action

To ensure that privacy protections are in place before DHS proceeds with
implementations of systems based on ADVISE, we recommend that the
Secretary of Homeland Security take the following two actions:

           o immediately conduct a privacy impact assessment of the ADVISE
           tool to identify privacy risks, such as those described in this
           report, and

           o implement privacy controls to mitigate potential privacy risks
           identified in the PIA.
			  
			  Agency Comments and Our Evaluation

           We received oral and written comments on a draft of this report
           from the DHS Departmental GAO/Office of Inspector General Liaison
           Office. (Written comments are reproduced in appendix II.) DHS
           officials generally agreed with the content of this report and
           described actions initiated to address our recommendations. DHS
           also provided technical comments, which have been incorporated in
           the final report as appropriate.

           In its comments DHS emphasized the fact that the ADVISE tool
           itself does not contain personal data and that each deployment of
           the tool will be reviewed through the department's privacy
           compliance process, including, as applicable, development of a PIA
           and a system of records notice. DHS further stated that it is
           currently developing a "Privacy Technology Implementation Guide"
           to be used to conduct a PIA for ADVISE. Although we have not
           reviewed the guide, it appears to be a positive step toward
           developing a PIA process to address technology tools such as
           ADVISE.

           It is not clear from the department's response whether the privacy
           controls identified based on applying the Privacy Technology
           Implementation Guide to ADVISE are to be incorporated into the
           tool itself. We believe that any controls identified by a PIA to
           mitigate privacy risks should be implemented, to the extent
           possible, in the tool itself. Specific development efforts that
           use the tool will then have these integrated controls readily
           available, thus reducing the potential for added costs and
           technical risks. The department also requested that we change the
           wording of our recommendation; however, we have retained the
           wording in our draft report because it clearly emphasizes the need
           to incorporate privacy controls into the ADVISE tool itself.

           As agreed with your office, unless you publicly announce the
           contents of this report earlier, we plan no further distribution
           until 30 days from the report date. At that time, we will send
           copies of this report to the Secretary of Homeland Security and
           interested congressional committees. Copies will be made
           available to others on request. In addition, this report will be
           available at no charge on our Web site at www.gao.gov.

           If you have any questions concerning this report, please call me
           at (202) 512-6240 or send e-mail to [email protected] . Contact
           points for our Offices of Congressional Relations and Public
           Affairs may be found on the last page of this report. Key
           contributors to this report are listed in appendix III.

           Sincerely yours,

           Linda D. Koontz
			  Director, Information Management Issues
			  
			  Appendix I: Objectives, Scope, and Methodology

           Our objectives were to determine the following:

           o the planned capabilities, uses, and associated benefits of the
           Analysis, Dissemination, Visualization, Insight, and Semantic
           Enhancement (ADVISE) tool and

           o whether potential privacy issues could arise from using the
           ADVISE tool to process personal information and how the Department
           of Homeland Security (DHS) has addressed any such issues.

           To address our first objective, we identified and analyzed the
           tool's capabilities, planned uses, and associated benefits. We
           reviewed program documentation, including annual program execution
           plans, and interviewed agency officials responsible for managing
           and implementing the program, including officials from the DHS
           Science and Technology Directorate and the Lawrence Livermore and
           Pacific Northwest National Laboratories. We also viewed a
           demonstration of the tool's semantic graphing capability. In
           addition, we interviewed officials at DHS components to identify
           their current or planned uses of ADVISE, the progress of their
           implementations, and the benefits they hope to gain from using the
           tool. These components included Immigration and Customs
           Enforcement, among others. We also interviewed officials
           from the Interagency Center of Applied Homeland Security
           Technology (ICAHST), who are responsible for testing the tool's
           capabilities, and we visited ICAHST at the Johns Hopkins
           University Applied Physics Laboratory in Laurel, Maryland, to view
           a demonstration of its testing activities. We did not conduct work
           or review implementations of ADVISE at the DHS Office of
           Intelligence and Analysis.

           To address our second objective, we identified potential privacy
           concerns that could arise from using the ADVISE tool by reviewing
           relevant reports, including prior GAO reports and the DHS Privacy
           Office 2006 report on data mining. We identified and analyzed DHS
           actions to comply with the Privacy Act of 1974 and the
           E-Government Act of 2002. We interviewed technical experts within
           the DHS Science and Technology Directorate and personnel
           responsible for implementing ADVISE at DHS components to assess
           privacy controls included in the ADVISE tool. We also interviewed
           officials from the DHS Privacy Office. We performed our work from
           June 2006 to December 2006 in the Washington, D.C., metropolitan
           area. Our work was performed in accordance with generally accepted
           government auditing standards.
			  
			  Appendix II: Comments from the Department of Homeland Security
			  
			  Appendix III: GAO Contact and Staff Acknowledgments
			  
			  GAO Contact

           Linda D. Koontz, (202) 512-6240 or [email protected]
			  
			  Staff Acknowledgments

           In addition to the individual named above, John de Ferrari,
           Assistant Director; Idris Adjerid; Nabajyoti Barkakati; Barbara
           Collier; David Plocher; and Jamie Pressman made key contributions
           to this report.
			  
			  GAO's Mission

           The Government Accountability Office, the audit, evaluation and
           investigative arm of Congress, exists to support Congress in
           meeting its constitutional responsibilities and to help improve
           the performance and accountability of the federal government for
           the American people. GAO examines the use of public funds;
           evaluates federal programs and policies; and provides analyses,
           recommendations, and other assistance to help Congress make
           informed oversight, policy, and funding decisions. GAO's
           commitment to good government is reflected in its core values of
           accountability, integrity, and reliability.
			  
			  Obtaining Copies of GAO Reports and Testimony

           The fastest and easiest way to obtain copies of GAO documents at
           no cost is through GAO's Web site (www.gao.gov). Each
           weekday, GAO posts newly released reports, testimony, and
           correspondence on its Web site. To have GAO e-mail you a list of
           newly posted products every afternoon, go to www.gao.gov and
           select "Subscribe to Updates."
			  
			    Order by Mail or Phone

           The first copy of each printed report is free. Additional copies
           are $2 each. A check or money order should be made out to the
           Superintendent of Documents. GAO also accepts VISA and Mastercard.
           Orders for 100 or more copies mailed to a single address are
           discounted 25 percent. Orders should be sent to:

           U.S. Government Accountability Office
           441 G Street NW, Room LM
           Washington, D.C. 20548

           To order by phone:
           Voice: (202) 512-6000
           TDD: (202) 512-2537
           Fax: (202) 512-6061
			  
			  To Report Fraud, Waste, and Abuse in Federal Programs

           Contact:

           Web site: www.gao.gov/fraudnet/fraudnet.htm
           E-mail: [email protected]
           Automated answering system: (800) 424-5454 or (202) 512-7470
			  
			  Congressional Relations

           Gloria Jarmon, Managing Director, [email protected], (202) 512-4400
           U.S. Government Accountability Office, 441 G Street NW, Room 7125
           Washington, D.C. 20548
			  
			  Public Affairs

           Paul Anderson, Managing Director, [email protected], (202) 512-4800
           U.S. Government Accountability Office, 441 G Street NW, Room 7149
           Washington, D.C. 20548

(310765)

www.gao.gov/cgi-bin/getrpt?GAO-07-293

To view the full product, including the scope
and methodology, click on the link above.

For more information, contact Linda Koontz at (202) 512-6240 or
[email protected].

Highlights of [38]GAO-07-293, a report to the Chairman, Committee on
Appropriations, House of Representatives

February 2007

DATA MINING

Early Attention to Privacy in Developing a Key DHS Program Could Reduce
Risks

The government's interest in using technology to detect terrorism and
other threats has led to increased use of data mining. A technique for
extracting useful information from large volumes of data, data mining
offers potential benefits but also raises privacy concerns when the data
include personal information.

GAO was asked to review the development by the Department of Homeland
Security (DHS) of a data mining tool known as ADVISE (Analysis,
Dissemination, Visualization, Insight, and Semantic Enhancement).
Specifically, GAO was asked to determine (1) the tool's planned
capabilities, uses, and associated benefits and (2) whether potential
privacy issues could arise from using it to process personal information
and how DHS has addressed any such issues. GAO reviewed program
documentation and discussed these issues with DHS officials.

What GAO Recommends

To ensure that privacy protections are in place, GAO is recommending that
the Secretary of Homeland Security immediately conduct a privacy impact
assessment of the ADVISE tool and implement privacy controls, as needed,
to mitigate any identified risks.

DHS generally agreed with the content of this report and described actions
initiated to address GAO's recommendations.

ADVISE is a data mining tool under development intended to help DHS
analyze large amounts of information. It is designed to allow an analyst
to search for patterns in data--such as relationships among people,
organizations, and events--and to produce visual representations of these
patterns, referred to as semantic graphs (see fig.). None of the three
planned DHS implementations of ADVISE that GAO reviewed are fully
operational. (GAO did not review uses of the tool by the DHS Office of
Intelligence and Analysis.) The intended benefit of the ADVISE tool is to
help detect threatening activities by facilitating the analysis of large
amounts of data. DHS is currently in the process of testing the tool's
effectiveness.

Use of the ADVISE tool raises a number of privacy concerns. DHS has added
security controls to the tool; however, it has not assessed privacy risks.
Privacy risks that could apply to ADVISE include the potential for
erroneous association of individuals with crime or terrorism and the
misidentification of individuals with similar names. A privacy impact
assessment would identify specific privacy risks and help officials
determine what controls are needed to mitigate those risks. ADVISE has not
undergone such an assessment because DHS officials believe it is not
needed given that the tool itself does not contain personal data. However,
the tool's intended uses include applications involving personal data, and
the E-Government Act and related guidance emphasize the need to assess
privacy risks early in systems development. Further, if an assessment were
conducted and privacy risks identified, a number of controls could be
built into the tool to mitigate those risks. For example, controls could
be implemented to ensure that personal information is used only for a
specified purpose or compatible purposes, and they could provide the
capability to distinguish among individuals that have similar names to
address the risk of misidentification. Because privacy has not been
assessed and mitigating controls have not been implemented, DHS faces the
risk that ADVISE-based system implementations containing personal
information may require costly and potentially duplicative retrofitting at
a later date to add the needed controls.

Figure: Generic Semantic Graph

References

Visible links
  26. http://www.gao.gov/cgi-bin/getrpt?GAO-04-548
  27. http://www.gao.gov/cgi-bin/getrpt?GAO-06-403T
  28. http://www.gao.gov/cgi-bin/getrpt?GAO-05-866
  38. http://www.gao.gov/cgi-bin/getrpt?GAO-07-293
*** End of document. ***