Estimating the Undocumented Population: A "Grouped Answers"	 
Approach to Surveying Foreign-Born Respondents (29-SEP-06,	 
GAO-06-775).							 
                                                                 
As greater numbers of foreign-born persons enter, live, and work 
in the United States, policymakers need more			 
information--particularly on the undocumented population, its	 
size, characteristics, costs, and contributions. This report	 
reviews the ongoing development of a potential method for	 
obtaining such information: the "grouped answers" approach. In	 
1998, GAO devised the approach and recommended further study. In 
response, the Census Bureau tested respondent acceptance and	 
recently reported results. GAO answers four questions. (1) Is the
grouped answers approach acceptable for use in a national survey 
of the foreign-born? (2) What further research may be needed? (3)
How large a survey is needed? (4) Are any ongoing surveys	 
appropriate for inserting a grouped answers question series (to  
avoid the cost of a new survey)? For this study, GAO consulted an
independent statistician and other experts, performed test	 
calculations, obtained documents, and interviewed officials and  
staff at federal agencies. The Census Bureau and DHS agreed with 
the main findings of this report. HHS agreed that the National	 
Survey on Drug Use and Health is not an appropriate survey for	 
inserting a grouped answers question series.			 
-------------------------Indexing Terms------------------------- 
REPORTNUM:   GAO-06-775 					        
    ACCNO:   A61607						        
  TITLE:     Estimating the Undocumented Population: A "Grouped       
Answers" Approach to Surveying Foreign-Born Respondents 	 
     DATE:   09/29/2006 
  SUBJECT:   Data collection					 
	     Illegal aliens					 
	     Immigration					 
	     Monitoring 					 
	     Population statistics				 
	     Statistical data					 
	     Statistical methods				 
	     Surveys						 
	     Testing						 


     


Report to the Subcommittee on Terrorism, Technology and Homeland Security,
Committee on the Judiciary, U.S. Senate

United States Government Accountability Office

GAO

September 2006

ESTIMATING THE UNDOCUMENTED POPULATION

A "Grouped Answers" Approach to Surveying Foreign-Born Respondents

GAO-06-775

Contents

Letter

Results in Brief
Background
Experts Seem to Accept "Grouped Answers" Questions If Fielded by a Private
Sector Organization
Various Tests Are or May Be Needed
Some 6,000 Foreign-Born Respondents Are Needed for "Reasonably Precise"
Estimates of the Undocumented
The Most Efficient Field Strategy Does Not Seem Feasible
Observations
Agency Comments
Appendix I Scope and Methodology
Appendix II Estimating Characteristics, Costs, and Contributions of the
Undocumented Population
Appendix III A Review of Census Bureau and GAO Reports on the Field Test
of the Grouped Answer Method
Appendix IV A Brief Examination of Responses Observed while Testing an
Indirect Method for Obtaining Sensitive Information
Appendix V The Issue of Informed Consent
Appendix VI A Note on Variances and "Mirror Image" Estimates
Appendix VII Comments from the Department of Commerce
Appendix VIII Comments from the Department of Homeland Security
Appendix IX Comments from the Department of Health and Human Services
Appendix X GAO Contact and Staff Acknowledgments
Bibliography

Tables

Table 1: Approximate Number of Foreign-Born Respondents Needed to Estimate
Percentage Undocumented within 2, 3, or 4 Percentage Points at 90 Percent
Confidence Level, Using Two-Card Grouped Answers Data
Table 2: Approximate Number of Foreign-Born Respondents Needed to Estimate
Percentage Undocumented, within 2, 3, or 4 Percentage Points, at 95
Percent Confidence Level, Using Two-Card Grouped Answers Data
Table 3: Survey Appropriateness: Whether Surveys Meet Criteria Based on
the Grouped Answers Design
Table 4: Survey Appropriateness: Whether Surveys Meet Table 3 (Design
Based) Criteria and Additional Criteria Based on Immigrant Advocates'
Views
Table 5: Experts GAO Consulted on Immigration Issues or Immigration
Studies

Figures

Figure 1: Immigration Status Card 1, Grouped Answers
Figure 2: Immigration Status Card 2
Figure 3: Cards 1 and 2 Compared
Figure 4: SIPP Flash Card
Figure 5: Training Card 1
Figure 6: Training Card 2
Figure 7: Immigration Status Card Tested in GSS

Abbreviations

ACS American Community Survey
BLS Bureau of Labor Statistics
CASI Computer Assisted Self Interview
CPS Current Population Survey
DHS Department of Homeland Security
GSS General Social Survey
HHS Department of Health and Human Services
INS Immigration and Naturalization Service
NAWS National Agricultural Workers Survey
NCHS National Center for Health Statistics
NHIS National Health Interview Survey
NORC National Opinion Research Center
NRC National Research Council
NSDUH National Survey on Drug Use and Health
NSF National Science Foundation
OMB Office of Management and Budget
SAMHSA Substance Abuse and Mental Health Services Administration
SIPP Survey of Income and Program Participation

This is a work of the U.S. government and is not subject to copyright
protection in the United States. It may be reproduced and distributed in
its entirety without further permission from GAO. However, because this
work may contain copyrighted images or other material, permission from the
copyright holder may be necessary if you wish to reproduce this material
separately.

United States Government Accountability Office

Washington, DC 20548

September 29, 2006

The Honorable Jon Kyl
Chairman
The Honorable Dianne Feinstein
Ranking Minority Member
Subcommittee on Terrorism, Technology and Homeland Security
Committee on the Judiciary
United States Senate

As greater numbers of foreign-born persons enter, live, and work in the
United States, policymakers and the general public increasingly place high
priority on issues involving immigrants. Because separate policies, laws,
and programs apply to different immigration statuses, valid and reliable
information is needed for populations defined by immigration status.
However, government statistics generally do not include such information.

The information most difficult to obtain concerns the size,
characteristics, costs, and contributions of the population referred to in
this report as undocumented or currently undocumented.1 Such information
is needed because, for example, large numbers of undocumented persons
arrive each year, and the Census Bureau has realized that information on
the size of the undocumented population would help estimate the size of
the total U.S. population, especially for years between decennial
censuses.2 More generally, information about the undocumented
population, and about changes in that population, can contribute to
policy-related planning and evaluation efforts.

1Our previous reports and those of other government agencies have
sometimes used the terms undocumented, illegal aliens, illegal immigrants,
unauthorized immigrants, and not legally present. We use undocumented
here, because this report concerns a technique for surveying the
foreign-born and an ongoing federally funded survey uses this term as a
response category when asking about legal status. We define undocumented
as foreign-born persons who are illegally present in the United States.
Foreign-born persons (that is, persons not born as U.S. citizens) were
born outside the United States to parents neither of whom was a U.S.
citizen at the time of the birth.

2Most recently, the Census Bureau has stated that among its "enhancement
priorities" to "improve estimates of net international migration" are
efforts to research ways of estimating "international migrants by migrant
status (legal migrants, temporary migrants, quasi-legal migrants,
unauthorized migrants, and emigrants)" with the overall purpose of
producing annual estimates of the U.S. population. ("The U.S. Census
Bureau's Intercensal Population Estimates and Projections Program: Basic
Underlying Principles," paper distributed by the Census Bureau at its
conference on Population Estimates: Meeting User Needs, Alexandria,
Virginia, July 19, 2006.)

As you know, in 1998, we devised an approach to surveying foreign-born
respondents about their immigration status.3 This self-report,
personal-interview approach groups answers so that no respondent is ever
asked whether he, she, or anyone else is undocumented. In fact, no
individual respondent is ever categorized as undocumented. Logically,
however, grouped answers data can provide indirect estimates of the
undocumented population. Generally, grouped answers questions on
immigration status would be asked as part of a larger survey that includes
direct questions on demographic characteristics and employment and might
include questions on school attendance, use of medical facilities, and so
forth; some surveys also ask specific questions that can help estimate
taxes paid. Potentially, combining the answers to such questions with
grouped answers data can provide further information on the
characteristics, costs, and contributions of the undocumented population.

We reported the first results of preliminary tests of the grouped answers
approach, primarily with Hispanic farmworkers, in 1998 and 1999; the
majority of the preliminary test interviews were fielded by Aguirre
International of Burlingame, California.4 We also recommended that the
Immigration and Naturalization Service (INS) and the Census Bureau further
develop and test the method. In response, the Census Bureau contracted for
a test as part of the 2004 General Social Survey (GSS), which is fielded
by the National Opinion Research Center (NORC) at the University of
Chicago, with "core funding" provided by a grant from the National Science
Foundation (NSF).5 The Census Bureau's analysis of the 2004 GSS data
became available in 2006.

3GAO, Immigration Statistics: Information Gaps, Quality Issues Limit
Utility of Federal Data to Policymakers, GAO/GGD-98-164 (Washington, D.C.:
July 31, 1998), and Survey Methodology: An Innovative Technique for
Estimating Sensitive Survey Items, GAO/GGD-00-30 (Washington, D.C.:
November 1999).

4See GAO/GGD-98-164 and GAO/GGD-00-30 .

In this report, we respond to your request that we review the ongoing
development of the grouped answers approach and related issues. We address
four questions: (1) Is the grouped answers approach "acceptable" for use
in a national survey of the foreign-born population?6 (2) What kinds of
further research are or may be needed, based on the results of tests
conducted thus far and expert opinion? (3) How large a survey is needed to
provide "reasonably precise" estimates of the undocumented population,
using grouped answers data? (4) Are there appropriate ongoing surveys in
which the grouped answers question series might eventually be inserted
(thus avoiding the costs of fielding a new survey)?

To answer these questions, we

           o  consulted private sector experts in immigration issues and
           studies, including immigrant advocates, immigration researchers,
           and others;7

           o  consulted an independent statistical expert, Dr. Alan
           Zaslavsky, and other experts in statistics and surveys;8

           o  reanalyzed the data from the 2004 GSS test and subjected both
           our analysis and the Census Bureau's analysis to review by the
           independent statistical expert;

           o  performed test calculations, using specific assumptions; and

           o  identified ongoing surveys that might be candidates for
           piggybacking the grouped answers question series, gathered
           documents on those surveys, and met with officials and staff at
           the federal agencies that conduct or sponsor them.9

5The GSS is a long-standing series of nationally representative
personal-interview self-report surveys, each consisting of a "core"
question series and additional "modules." The funding for fielding the
core question series is provided by a grant from NSF. The modules are
question series added through grants from and contracts with a variety of
sources. The Census Bureau contracted for a grouped answers module in the
2004 GSS. The bulk of the funding for that Census-GSS contract had been
provided to the Census Bureau by the Department of Homeland Security
(DHS). This test of the grouped answers approach was in response to our
earlier recommendation in GAO/GGD-98-164 .

6The acceptability of the grouped answers approach for use in a national
survey is defined here primarily in terms of (1) the responses of
immigrant advocates when the grouped answers approach is explained to them
(that is, objecting versus not objecting to or accepting the method) and
(2) respondents' tendency to pick a box when the grouped answers
immigration status question is posed to them (rather than their refusing
or saying that they "don't know"). The opinions of other experts (for
example, those who have conducted studies of immigrants) are also relevant,
as are interviewer judgments about respondent reactions.

7In all, we consulted over 20 private sector immigration experts (listed
in appendix I, table 5). Because of the importance of immigrant advocates'
views on the issues in surveying immigrants, table 5 identifies the
experts representing immigrant advocate organizations. For purposes of
this report, we define immigrant advocate organizations as those whose
purpose includes representing the immigrants' point of view. More
generally, in reporting the views of the experts we consulted, we
recognize that in some cases other knowledgeable persons might have
differing views.

We also met with other relevant federal agencies.10 Appendix I describes
our methodology and the scope of our work in more detail. We conducted our
work in accordance with generally accepted government auditing standards
between July 2005 and September 2006.

8Alan Zaslavsky is Professor of Statistics, Department of Health Care
Policy, Harvard Medical School, Boston, Massachusetts. We selected Dr.
Zaslavsky because he (1) is independent with respect to the method we
discuss; (2) is a noted statistician who has received many awards, has
advised multiple executive agencies on the design and analysis of
large-scale surveys, and serves on the National Research Council's (NRC)
Committee for National Statistics at the National Academy of Sciences; and
(3) has developed innovative statistical approaches. We also sought the
advice of two other noted statisticians who had advised us in earlier work
on this method (Dr. Fritz Scheuren and Dr. Mary Grace Kovar of NORC at the
University of Chicago) and GAO colleagues with expertise in statistics.

9We talked with four agencies sponsoring or conducting these surveys: the
Census Bureau in the Department of Commerce, the Bureau of Labor
Statistics (BLS) in the Department of Labor, and the National Center for
Health Statistics (NCHS) and the Substance Abuse and Mental Health
Services Administration (SAMHSA) in the Department of Health and Human
Services (HHS). Survey-related staff at these agencies provided
information on the specific surveys. Additionally, we deemed some staff at
these agencies to be experts in statistics and survey research.

10These included the Statistical and Science Policy Branch of the Office
of Information and Regulatory Affairs in the Office of Management and
Budget (OMB), the Employment and Training Administration in the Department
of Labor (DOL), and the Office of Immigration Statistics within the Policy
Directorate and the Research and Evaluation Division, Office of Policy and
Strategy, U.S. Citizenship and Immigration Services in the Department of
Homeland Security (DHS).

                                Results in Brief

Acceptance of the grouped answers approach appears to be high among
immigrant advocates and respondents. The advocates we interviewed
generally accepted the approach, with provisos such as fielding by a
university or other private sector organization, appropriate data
protection (including protections against government misuse), and
high-quality survey procedures. The independent statistician, reviewing
the Census Bureau's analysis and our reanalysis of the 2004 GSS test of
respondent acceptance, concluded that the grouped answers approach is
"generally usable" for surveys interviewing foreign-born respondents in
their homes.11

Based on the results of the GSS test and on consultations and interviews
with varied experts, further work is or may be needed to

           o  Expand knowledge about respondent acceptance. For example, the
           2004 GSS test did not cover persons who are "linguistically
           isolated" in the sense that no member of their household age 14 or
           older speaks English "very well".12

           o  Test the accuracy of responses or respondents' intent to answer
           accurately.13 To date, no tests of response accuracy, or the
           intent to answer accurately, have been conducted, although a
           number of relevant designs can be identified.

Thousands of foreign-born respondents would be needed to obtain
"reasonably precise" grouped answers estimates of the undocumented
population.14 Our calculations and work with statisticians showed that
while many factors are involved and it is not possible to guarantee a
specific level of precision, roughly 6,000 interviews would be likely to
be sufficient to support estimates of the size of the undocumented
population and major subgroups within it (especially high-risk subgroups,
defined by characteristics such as age 18 to 40, recently arrived,
employed15). Quantitative estimates are also possible; for example, major
program costs associated with the undocumented population may also be
estimated, given appropriate program data.

11Our reanalysis differed from the Census Bureau's in that we eliminated
19 GSS cases that we deemed ineligible because, for example, interviewing
took place over the telephone rather than in person, as required by the
grouped answers approach; we found that 6 respondents of more than 200
failed to provide usable, specific answers.

12The GSS allowed bilingual household members to help respondents with
limited English skills. Our earlier testing with farmworkers was conducted
in Spanish, but no testing has covered linguistically isolated
non-Hispanic respondents. About 4 percent of the foreign-born population
both (1) does not speak Spanish and (2) is linguistically isolated (that
is, is part of a household in which no member age 14 or older speaks
English "very well"). Although this may seem a small percentage, it is
possible that non-Hispanic undocumented persons are concentrated in this
group.

13The distinction between accurate responses and the intent to answer
accurately is necessary because some respondents may mistakenly think that
they are, for example, in a legal status.

None of the ongoing, large-scale national surveys we identified appear to
be appropriate for piggybacking the grouped answers question series. One
self-report personal interview survey is fielded by a private sector
organization (under a contract with a Department of Health and Human
Services (HHS) agency); however, that survey focuses on the use of illegal
drugs, and we believe that direct questions on drug use might heighten the
sensitivity of the questions on immigration status. We believe other
ongoing surveys to be inappropriate; for example, one asks other sensitive
questions (on HIV status) and takes respondents' names and Social Security
numbers. Additionally, these surveys are fielded by the Census Bureau
rather than by a private sector organization.

Whether further research or a new survey would be justified depends on
issues such as how policymakers weigh the need for such information
against potential costs.

We received comments on a draft of this report from the Department of
Commerce (Census Bureau), the Department of Homeland Security (DHS), and
the Department of Health and Human Services (HHS). The Census Bureau and
DHS generally agreed with the main findings of the report, and HHS agreed
that the National Survey on Drug Use and Health would not be appropriate
for piggybacking the grouped answers question series. These agencies
also provided other technical comments (see appendices VII, VIII, and IX).

14We define "reasonably precise" as a 90 percent or 95 percent confidence
interval spanning plus or minus 2 to 4 percentage points. A 90 percent or
95 percent confidence interval is the interval within which the parameter
in question would be expected to fall 90 percent or 95 percent of the
time, if the sampling and interval estimation procedures were repeated in
an infinite number of trials.
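
As an illustration of this definition, the following Python sketch computes
an approximate confidence-interval half-width for a two-card grouped answers
estimate. It is not the calculation that GAO or its consultants performed.
It assumes simple random sampling (no design effects), an equal split of
6,000 respondents between the two subsamples, and the hypothetical box
percentages (62 and 33 percent) used in this report's background
discussion. The function and variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def half_width(p1, p2, n1, n2, confidence):
    """Approximate half-width of a confidence interval for p1 - p2.

    p1: proportion of subsample 1 choosing Box B of Card 1
    p2: proportion of subsample 2 choosing Box A of Card 2
    n1, n2: subsample sizes (assumed independent simple random samples)
    confidence: for example, 0.90 or 0.95
    """
    # Two-sided critical value from the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # The two subsamples are independent, so their variances add.
    standard_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return z * standard_error


# Assumption: 6,000 interviews split evenly between the two cards.
hw90 = half_width(0.62, 0.33, 3000, 3000, 0.90)
hw95 = half_width(0.62, 0.33, 3000, 3000, 0.95)
print(f"90 percent level: +/- {100 * hw90:.1f} percentage points")
print(f"95 percent level: +/- {100 * hw95:.1f} percentage points")
```

Under these simplifying assumptions, the half-widths come to about 2.0 and
2.4 percentage points. A real survey with clustering or unequal weighting
would generally yield wider intervals for the same number of interviews.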

15In many cases, the method would not be suitable for low-risk subgroups.
(High-risk and low-risk refer to subgroups with above-average and
below-average percentages of undocumented persons, respectively.)

                                   Background

Grouped Answers Reduce "Question Threat" and Allow Indirect Estimates of the
Undocumented

Survey questions about sensitive topics carry a "threat" for some
respondents, because they fear that a truthful answer could result in some
degree of negative consequence (at a minimum, social disapproval). The
grouped answers approach is designed to reduce this threat when asking
about immigration status.

Three key points about the grouped answers approach are that

           1. no respondent is ever asked whether he or she, or anyone else,
           is undocumented;
           2. two pieces of information are separately provided by two
           subsamples of respondents (completely different people; no one is
           shown both immigration status cards); and
           3. taking the two pieces of information together, like two
           different pieces of a puzzle, allows indirect estimation of the
           undocumented population, but no individual respondent (and no
           piece of data on an individual respondent) is ever categorized as
           undocumented.

We discuss each point in some detail.16

           1. No respondent is ever asked whether he or she is in the
           undocumented category. Unlike questions that ask respondents to
           choose among specific answer categories, the grouped answers
           approach combines answer categories in sets or "boxes," as shown
           in figure 1.

16The grouped answers approach derives from (1) the residual method
described by Henry S. Shryock and Jacob S. Siegel and Associates, The
Methods and Materials of Demography (Washington, D.C.: U.S. Government
Printing Office, 1980), and Robert Warren and Jeffrey S. Passel, "A Count
of the Uncountable: Estimates of Undocumented Aliens Counted in the 1980
Census," Demography, 24:3 (1987): 375-93, and (2) earlier indirect
survey-based techniques, such as "randomized response" (see Stanley
Warner, "A Survey Technique for Eliminating Evasive Answer Bias," Journal
of the American Statistical Association, 60 (1965): 63-69, and Bernard
Greenberg and others, "The Unrelated Questions Randomized Response Model:
Theoretical Framework," Journal of the American Statistical Association,
64 (1969): 520-39).

Figure 1: Immigration Status Card 1, Grouped Answers

Box B includes the sensitive answer category (currently
"undocumented") along with other categories that are nonsensitive.17

Each respondent is asked to "pick the Box" (Box A, Box B, or Box C) that
contains the specific answer category that applies to him or her.
Respondents are told, in effect: If the specific category that applies to
you is in Box B, we don't want to know which one it is, because right now
we are focusing on Box A categories.18

By using the boxes, the interview avoids "zeroing in" on the sensitive
answer. The specific categories shown in the boxes in figure 1 are grouped
so that

           o  one would expect many respondents who are here legally, as well
           as those who are undocumented, to choose Box B,19 and
           o  there is virtually no possibility of anyone deducing which
           specific category within Box B applies to any individual
           respondent.

2. Two pieces of information are provided separately by two subsamples of
respondents (no one is shown both immigration status cards). Respondents
are divided into two subsamples, based on randomization procedures or
rotation (alternation) procedures conducted outside the interview
process. (For example, a rotation procedure might specify that within an
interviewing area, every other household will be designated as subsample 1
or subsample 2.)
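
The rotation example above can be sketched in a few lines of Python. This
is a minimal illustration, not a description of any fielded survey's
procedure: the household identifiers are invented, and the function name
assign_subsamples is hypothetical. The assignment happens before any
interview takes place, so the respondent plays no part in it.

```python
def assign_subsamples(household_ids):
    """Alternate households between subsample 1 and subsample 2.

    Households are taken in their listed order within an interviewing
    area; the first goes to subsample 1 (shown Card 1), the next to
    subsample 2 (shown Card 2), and so on.
    """
    return {
        household: 1 if position % 2 == 0 else 2
        for position, household in enumerate(household_ids)
    }


# Hypothetical household listing for one interviewing area.
households = ["H001", "H002", "H003", "H004", "H005"]
assignment = assign_subsamples(households)
for household, subsample in assignment.items():
    print(f"{household}: subsample {subsample} (Card {subsample})")
```

A randomization procedure would serve the same purpose; rotation simply
guarantees that the two subsamples are nearly equal in size within each
area.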

17Note that Box B in figure 1 uses the term currently "undocumented" (with
quotation marks around undocumented). We believe this wording may help
communicate with undocumented respondents who either (1) had a legal
status in the past (for example, entered with a temporary visa but have
now overstayed and thus lost their legal status) or (2) are likely to
acquire a legal status in the near future (for example, entered illegally
and applied for legal status but have not yet received it). Potentially,
the quotation marks might help communicate with respondents who have some
kind of document (for example, a "matricula card" issued by the Mexican
government) but who do not have a valid legal immigration status that
allows U.S. residence.

18In the test with Hispanic farmworkers, interviewers explained: "Because
we're using the boxes, we WON'T 'zero in' on anything somebody might not
want to tell us."

19In the future, changes in percentages of foreign-born in various statuses
might warrant changes in groupings across the boxes. Additionally, the
specific legal statuses defined by law might change, requiring a change in
the legal statuses shown on the cards.

This "split sample" procedure has been used routinely for many surveys
over the years. As applied to the grouped answers approach, the two
subsamples are shown alternative flash cards. Immigration Status Card 1,
described above, represents one way to group immigration statuses in three
boxes. A second immigration status flash card (Immigration Status Card 2,
shown in figure 2) groups the same statuses differently.

Figure 2: Immigration Status Card 2

The alternative immigration-status cards can be thought of as "mirror
images" in that

           o  the two nonsensitive legal statuses in Box A of Card 1 appear
           in Box B of Card 2 and

           o  the two nonsensitive legal statuses in Box B of Card 1 appear
           in Box A of Card 2.

However, the undocumented status always appears in Box B.

Interviewers ask survey respondents in subsample 1 about immigration
status with respect to Card 1. They ask survey respondents in subsample 2
(completely different persons) about immigration status with respect to
Card 2. Each respondent is shown one and only one immigration-status flash
card. There are no highly unusual or complicated interviewing
procedures.20

Because the two subsamples of respondents are drawn randomly or by
rotation, each subsample represents the foreign-born population and, if
sufficiently large, can provide "reasonably precise" estimates of the
percentages of the foreign-born population in the boxes on one of the
alternative cards.

Incidentally, a respondent picking a box that does not include the
sensitive answer (for example, a respondent picking Box A or Box C in
figure 1) can be asked follow-up questions that pinpoint the specific
answer category that applies to him or her. Thus, direct information is
obtained on all legal immigration statuses. The data on some of the legal
categories can be compared to administrative data to check the
reasonableness of responses. Additionally, these data provide estimates of
legal statuses, which are useful when, for example, policymakers review
legislation on the numbers of foreign-born persons who may be admitted to
this country under specific legal status programs.

3. No individual respondent is ever categorized as undocumented, but
indirect estimates of the undocumented population can be made. Using two
slightly different pieces of information provided by the two different
subsamples allows indirect estimation of the size of the currently
undocumented population by simple subtraction.

The only difference between Box B of Card 1 and Box A of Card 2 is the
inclusion of the currently "undocumented" category in Box B of Card 1.
Figure 3 shows both cards together for easy comparison.

20Unlike some other indirect estimation techniques, the grouped answers
approach does not require unusual stratagems as part of the survey
interview, such as asking respondents to make a secret random selection of
a question.

Figure 3: Cards 1 and 2 Compared

Thus, the percentage of the foreign-born population who are currently
undocumented can be estimated as follows:

           o  Start with the percentage of subsample 1 respondents who report
           that they are in Box B of Card 1 (hypothetical figure: 62 percent
           of subsample 1).
           o  Subtract from this the percentage of subsample 2 who say they
           are in Box A on Card 2 (hypothetical figure: 33 percent of
           subsample 2).

           o  Observe the difference (29 percent, based on the hypothetical
           figures); this represents an estimate of the percentage of the
           foreign-born population who are undocumented.
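
The subtraction above can be sketched in a few lines of Python. This is an illustrative sketch only: the 62 and 33 percent figures are the hypothetical values used in the text, the subsample sizes are invented for illustration, and the standard-error formula is the textbook one for a difference of two independent proportions under simple random sampling, which a real survey with a complex sample design would need to adjust.

```python
import math


def undocumented_share(p_card1_box_b, p_card2_box_a):
    """Two-card estimate: share in Box B of Card 1 minus share in Box A of Card 2."""
    return p_card1_box_b - p_card2_box_a


def difference_std_error(p1, n1, p2, n2):
    """Standard error of p1 - p2 for two independent subsamples.

    Assumes simple random sampling (an assumption, not stated in the report).
    """
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


# Hypothetical percentages from the text, expressed as proportions.
p1 = 0.62  # subsample 1, Box B of Card 1
p2 = 0.33  # subsample 2, Box A of Card 2

# Hypothetical subsample sizes, chosen only to make the example concrete.
n1 = n2 = 3000

estimate = undocumented_share(p1, p2)
std_error = difference_std_error(p1, n1, p2, n2)

print(f"estimated share undocumented: {estimate:.2f}")
print(f"approximate standard error:   {std_error:.4f}")
```

Because the estimate is a difference between two subsample percentages, its sampling error reflects both subsamples, which is one reason sample size matters for this approach.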

Alternatively, a "mirror-image" estimate could be calculated, using Box B
of Card 2 and Box A of Card 1.21

To estimate the numerical size of the undocumented population, a grouped
answers estimate of the percentage of the foreign-born who are
undocumented would be combined with a census figure. For example, the
2000 census counted 31 million foreign-born, and the Census Bureau issued
an updated estimate of 35.7 million for 2005. The procedure would be to
simply multiply the percent undocumented (based on the grouped answers
data and the subtraction procedure) by a census count or an updated
estimate for the year in question.
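
As a rough illustration of this multiplication, the sketch below combines the hypothetical 29 percent figure from the earlier example with the Census Bureau's 2005 estimate of 35.7 million foreign-born. The function name is an assumption made for this example, and the result is not an actual estimate of the undocumented population; it only shows the arithmetic.

```python
def undocumented_count(share_undocumented, foreign_born_total):
    """Scale a grouped answers share up to a population count."""
    return share_undocumented * foreign_born_total


# Hypothetical share from the subtraction example (not a survey result).
share = 0.29

# Census Bureau updated estimate of the foreign-born population for 2005.
foreign_born_2005 = 35_700_000

count = undocumented_count(share, foreign_born_2005)
print(f"illustrative count: {count / 1e6:.1f} million")
```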

These procedures ensure that no respondents-and no data on any specific
respondent-are ever separated out or categorized as undocumented, not even
during the analytic process of making indirect, group-level estimates.

To further ensure reduction of "question threat," the grouped answers
question series begins with flash cards that ask about nonsensitive topics
and familiarize respondents with the 3-box approach. For each
nonsensitive-topic card, interviewers ask the respondent which box applies
to him or her, saying: "If it's Box B, we do not want to know which
specific category applies to you."

In this way, most respondents should understand the grouped answers
approach before seeing the immigration-status card.

21The result of the subtraction would be the same, either way-assuming
that the same percentage of subsample 1 and subsample 2 chose Box C.

To help ensure accurate responses, respondents who choose Box A can be
asked a series of clarifying questions.22 (No follow-up questions are
addressed to anyone choosing Box B.) The questions for Box A respondents
are designed to prompt them to, essentially, reclassify themselves in Box
B, if that is appropriate.23

The grouped answers question series can potentially be applied in a
large-scale general population survey, where the questions on immigration
status would be added for the foreign-born respondents-provided that an
appropriate survey can be identified. If a new survey of the general
foreign-born population were planned, it would involve selecting a general
sample of households and then screening out the households that do not
include one or more foreign-born persons.

Finally, we note that while the initial version of the grouped answers
approach involved three alternative flash cards (and was termed the
"three-card method"), we recently devised the version described here,
which uses two cards rather than three. The two-card method is simpler, is
easier to understand, and provides more precise estimates. All cards are
alike in that they feature three boxes in which specific answer categories
are grouped.

Characteristics, Costs, and Contributions Can Potentially Be Estimated

Generally, grouped answers questions on immigration status would be asked
as part of a larger survey that includes direct questions on demographic
characteristics and employment and might include questions on school
attendance, use of medical facilities, and so forth; some surveys also ask
specific questions that can help estimate taxes paid. Potentially,
combining the answers to such questions with grouped answers data can be
used to provide further information on the characteristics, costs, and
contributions of the undocumented population.

22For example, in the test with Hispanic farmworkers, respondents who
picked Box A and said they were legal permanent residents (they had a
green card) were asked (1) under which program they had applied for a
green card (Family Unity, employer, and so forth), (2) whether they had
received the card (or had applied but not yet received it), (3) how they
received it (in person or by mail), and (4) whether they had then applied
for U.S. citizenship-and if so, whether they had received citizenship.

23If a respondent decides to reclassify himself or herself in Box B, on
the basis of follow-up questions, survey procedures can record only the
Box B classification-and delete the original Box A classification, as well
as any answers to Box A follow-up questions. This prevents retention of
any detailed immigration-status material on respondents in Box B.

For example, the numbers of undocumented persons in major subgroups-such
as demographic or employment status subgroups-can be estimated, provided
that the sample of foreign-born persons interviewed is sufficiently large.

Grouped answers data collected from adult respondents can also be used to
estimate the number of children in various immigration statuses, including
undocumented-provided that an additional question is asked.24
Additionally, when combined with separate quantitative data (for example,
data on program costs per individual), grouped answers data can be used to
estimate quantitative information (such as program costs) for the
undocumented population as a whole-or, again, depending on sample size,
for specific subgroups.

The procedures for deriving these more complex indirect estimates are
described in appendix II. No grouped answers respondent is ever
categorized as undocumented.

Statistical Information Is Needed on the Undocumented Population

The foreign-born population of the United States is large and growing-as
is the undocumented population within it. Congressional policymakers, the
U.S. Commission on Immigration Reform, and the National Research Council's
(NRC) Committee on National Statistics have indicated a need for
statistical information on the undocumented population, including its
size, characteristics, costs, and contributions.

The Census Bureau estimates that as of 2005, foreign-born residents (both
legally present and undocumented) numbered 35.7 million and accounted for
at least one-tenth of all persons residing in each of 15 states and the
District of Columbia.25 These figures represent substantial increases over
the prior 15 years. For example, in 1990 the foreign-born population
totaled fewer than 20 million; only 3 states had a population more than
one-tenth foreign-born. One result is that, as the Department of Labor has
testified, foreign-born workers now constitute almost 15 percent of the
U.S. labor force, and the numbers of such workers are growing.26

24The additional question would ask for the number of foreign-born
children in the household who are in each box of the same immigration
status card that the adult respondent used to report which box he or she
is in. However, this questioning approach has not been tested.

25The 15 states and their percentages of foreign-born residents in 2005
were Arizona, 14.5; California, 27.2; Colorado, 10.1; Connecticut, 12.5;
Florida, 18.5; Hawaii, 17.2; Illinois, 13.6; Maryland, 11.7;
Massachusetts, 14.4; Nevada, 17.4; New Jersey, 19.5; New York, 21.4; Rhode
Island, 12.6; Texas, 15.9; Washington, 12.2. The percentage in the
District of Columbia was 13.1.

A new paper from the Department of Homeland Security (DHS) puts the
"unauthorized" immigrant population at 10.5 million as of January 2005 and
indicates that if recent trends continued, the figure for January 2006
would be 11 million.27 The Pew Hispanic Center's indirect estimate of the
undocumented population as of 2006 is 11.5 million to 12 million. These
estimates represent roughly one-third of the entire foreign-born
population.28 DHS has variously estimated the size of the undocumented
population as of January 2000 as 7 million and 8.5 million.29 Government
and other estimates for 1990 numbered only 3.5 million.30

These various indirect estimates of the undocumented population are based
on the "residual method." Residual estimation (1) starts with a census
count or survey estimate of the number of foreign-born residents who have
not become U.S. citizens and (2) subtracts out estimated numbers of
legally present individuals in various categories, based on administrative
data and assumptions (because censuses and surveys do not ask about legal
status). The remainder, or residual, represents an indirect estimate of
the size of the undocumented population.
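
The two steps of residual estimation can be sketched as follows. Every number and category label in this sketch is a made-up placeholder chosen to show the arithmetic; none comes from DHS, the Pew Hispanic Center, or any other source cited in this report, and the function name is an assumption for illustration.

```python
def residual_estimate(noncitizen_foreign_born, legally_present_by_category):
    """Step 1: start from the noncitizen foreign-born count.
    Step 2: subtract the estimated legally present persons in each category.
    The remainder is the residual (indirect) estimate.
    """
    return noncitizen_foreign_born - sum(legally_present_by_category.values())


# Placeholder inputs only (hypothetical values, not real estimates).
noncitizens = 20_000_000
legally_present = {
    "legal permanent residents": 10_000_000,
    "refugees and asylees": 1_000_000,
    "temporary visa holders": 1_500_000,
}

residual = residual_estimate(noncitizens, legally_present)
print(f"residual estimate: {residual:,}")
```

Because each subtracted figure rests on administrative data and assumptions, any error in those inputs carries directly into the residual.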

To illustrate the role of administrative data and assumptions, residual
estimates draw on counts of the number of new green cards issued each
year. But they also require assumptions to account for emigration and
deaths among those who received green cards in earlier years.

26Statement of Ronald Bird, Chief Economist, Office of the Assistant
Secretary for Policy, U.S. Department of Labor, before the Committee on
the Judiciary, U.S. Senate, July 5, 2006.

27Michael Hoefer, Nancy Rytina, and Christopher Campbell, Estimates of the
Unauthorized Immigrant Population Residing in the United States: January
2005 (Washington, D.C.: Department of Homeland Security, Office of
Immigration Statistics, August 2006).

28Jeffrey S. Passel, "The Size and Characteristics of the Unauthorized
Migrant Population in the U.S.: Estimates Based on the March 2005 Current
Population Survey," Research Report (Washington, D.C.: Pew Hispanic
Center, Mar. 7, 2006).

29The first figure is from U.S. Immigration and Naturalization Service,
Office of Policy and Planning, Estimates of the Unauthorized Immigrant
Population Residing in the United States: 1990 to 2000 (Washington, D.C.:
January 2003); the second is from Hoefer, Rytina, and Campbell.

30While different estimates are based on different definitions of
undocumented, and there are questions about data reliability, it seems
clear that the population of undocumented foreign-born persons is large
and has increased rapidly.

A recent DHS paper providing residual estimates of the undocumented
population includes ranges of estimates based on alternative assumptions
made for two key components.31 For example, "by lowering or raising the
emigration rates 20 percent . . . the estimated unauthorized immigrant
population would range from 10.0 million to 11.0 million."32 The DHS paper
also lists assumptions that were not subjected to alternative
specifications. We believe the DHS paper represents an advance because, up
to now, analysts producing residual estimates have generally not made
public statements regarding the precision of the estimates. (Some critics
have, however, indicated that residual estimates are likely to lack
precision.33)

While the residual approach has been used to profile the undocumented
population on two characteristics-age and country of birth-it is limited
with respect to estimating (1) current geographic location and (2) current
employment and benefit use. The reason is that current characteristics of
legally present persons are not maintained in administrative records;
analysts must therefore rely largely on assumptions.34 In contrast, the
grouped answers method does allow for the possibility of estimating
current characteristics based on current self-reports.

During the mid-1990s, the U.S. Commission on Immigration Reform determined
that better statistical "information on legal status and type of immigrant
[is] crucial" to assessing immigration policy. Indeed, the Commission
called for a variety of improvements in estimates of the costs and
benefits associated with undocumented immigration.35 NRC's Committee on
National Statistics further emphasized the need for better information on
costs, especially state and local costs.36 (If successfully fielded, the
grouped answers method might help provide general information on such
costs-and, potentially, specific information for large states such as
California. Sample size limitations would be likely to prohibit separate
analyses for specific local areas, small states, and states with low
percentages of foreign-born or undocumented.)

31The alternative assumptions were made for levels of (1) American
Community Survey (ACS) undercounting of "unauthorized" immigrants and (2)
emigration from the United States on the part of legal immigrants counted
as having been "admitted" between 1980 and 2004.

32Hoefer, Rytina, and Campbell, p. 6.

33See Kenneth Hill, "Estimates of Legal and Unauthorized Foreign-Born
Population for the United States and Selected States Based on Census
2000," presentation at the U.S. Census Bureau Conference, Immigration
Statistics: Methodology and Data Quality, Alexandria, Virginia, February
13-14, 2006. A similar point was made by Jacob S. Siegel and David A.
Swanson, The Methods and Materials of Demography, 2nd ed. (San Diego,
Calif.: Elsevier Academic Press, 2004), p. 479.

34Administrative records on where legal immigrants live are based on their
residence (or intended residence) at the time when legal permanent
resident status was attained; these records have not been subsequently
updated. There are no administrative records on current activities of
legal permanent residents, such as employment.

Over the years, we have received numerous congressional requests related
to estimating costs associated with the undocumented population.37 Recent
Census Bureau research and conferences reflect the realization that
undocumented immigration is a key component of current population growth
and that there is a resultant need for information on this group.38
Additionally, some of the immigrant advocates we interviewed expressed an
interest in being able to better describe the contributions of the
undocumented population.

Surveys Are a Key Information Source

Various national surveys ask foreign-born respondents to provide
information about themselves and, in some cases, other persons in their
households. While such surveys provide a wealth of information on a wide
variety of areas, including some sensitive topics, national surveys
generally do not ask about current immigration status-with the exception
of a question on U.S. citizenship, which is included in several surveys.

35See U.S. Commission on Immigration Reform, U.S. Immigration Policy:
Restoring Credibility: 1994 Report to Congress (Washington, D.C.: U.S.
Government Printing Office, 1994), pp. 179-86.

36NRC, Committee on National Statistics, Local Fiscal Effects of Illegal
Immigration: Report of a Workshop (Washington, D.C.: National Academy
Press, 1996), p. 1-2.

37See, for example, GAO, Illegal Alien Schoolchildren: Issues in
Estimating State-by-State Costs, GAO-04-733 (Washington, D.C.: June 23,
2004), and Undocumented Aliens: Questions Persist about Their Impact on
Hospitals' Uncompensated Care Costs, GAO-04-472 (Washington, D.C.: May 21,
2004). For a more general discussion, see GAO/GGD-98-164, ch. 2,
"Policy-Related Information Needs."

38Census Bureau staff told us that this research includes J. Gregory
Robinson, "Memorandum for Donna Kostanich," DSSD A.C.E. Revision II
Memorandum Series No. PP-36, U.S. Bureau of the Census, Washington, D.C.,
December 31, 2002.

As we reported earlier, it is believed that direct questions on
immigration status "are very sensitive, and negative reactions to them
could affect the accuracy of responses to other questions on [a]
survey."39 Two surveys that have asked respondents directly about
immigration status for several years are

           o  the National Agricultural Workers Survey (NAWS), an ongoing
           annual cross-sectional self-report survey of farmworkers, fielded
           by Aguirre International, a private sector firm under contract to
           the Department of Labor, since 1988,40 and
           o  the Survey of Income and Program Participation (SIPP), a
           longitudinal panel survey of the general population, conducted by
           the Census Bureau, which has asked immigration status questions
           since 1996.

           Of the two, SIPP is the more relevant, because its immigration
           status questions have been administered to a sample of the general
           foreign-born population.

           SIPP has asked an adult respondent-informant from each household
           to provide information about himself or herself and about others
           in his or her household, including which immigration-status
           category applied to each person when he or she came to this
           country. Answers are facilitated by a flash card that lists major
           legal immigration statuses (see fig. 4).41 A further question asks
           whether each person obtained a green card after arriving in this
           country. The SIPP questions come close to asking about-but do not
           actually allow an estimate of-the number of foreign-born U.S.
           residents who are currently undocumented.42 According to the
           Census Bureau, SIPP is now scheduled to be "reengineered," but the
           full outlines of the revised effort have not been set.

           Figure 4: SIPP Flash Card

           The Grouped Answers Approach Has Been Tested in Surveys Fielded by
           Private Sector Organizations

           In the middle to late 1990s, the grouped answers question series
           was subjected to preliminary development and testing with Hispanic
           respondents, including interviews with farmworkers conducted by
           Aguirre International, under contract to GAO.43 In these tests,
           every respondent picked a box.44 However, these interviews were
           not conducted under conditions of a typical large-scale survey in
           which interviewers initiate contact with respondents in their
           homes.45

           To further test respondents' acceptance of the grouped answers
           approach, the Census Bureau created a question module with 3-box
           flash cards and contracted for it to be added to the 2004 GSS.
           When presenting the survey to respondents, interviewers explained
           that NORC of the University of Chicago fielded the GSS,
           with "core funding" from an NSF grant.46 The Census Bureau's
           question module included cards from the three-card version of the
           grouped answers approach-which features only one immigration
           status category in Box A. The cards used were

                        o  the two training cards shown in figures 5 and 647
                        and

                        o  the immigration status card shown in figure 7.48

           Figure 5: Training Card 1

           Figure 6: Training Card 2

           Figure 7: Immigration Status Card Tested in GSS

           Training card 1 shows different types of houses arranged in three
           boxes. Respondents are asked to indicate the type of house they
           lived in when in their home country-by picking a box. They are
           told that if the answer is in Box B, we don't need to know which
           specific type applies to them, because right now we are focusing
           on Box A.

           Training card 2 shows different modes of transportation, again
           arranged in three boxes. Respondents are asked to indicate the
           mode of transportation they used the most recent time they
           traveled from their home country to the United States-again by
           picking a box. They are again told that if it's in Box B, we don't
           need to know which specific mode applies.

           Additionally, the GSS-Census Bureau module asked interviewers to
           (1) judge respondents' understanding of the 3-box format, (2)
           observe whether respondents objected or "kept silent for a while"
           when presented with the immigration status card, and (3) record
           any comments that respondents made about the cards. As the Census
           Bureau has noted, the module was a partial test because only one
           immigration status card was tested.

           Data and documentation from this field test became available in
           late 2005. A Census Bureau analysis of these data (completed in
           2006 and reproduced in full in appendix IV) indicates that of 237
           foreign-born respondents, 216 (roughly 90 percent) chose a box, 4
           gave other answers, and 17 refused or said "don't know." The
           Census Bureau took this "as an indication that most foreign-born
           who are asked about their migrant status in this format would
           understand the question, know the answer, and answer willingly."
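
The response counts reported above can be tabulated as shares of the 237 foreign-born respondents. The sketch below uses only the counts given in the text; the variable names are assumptions made for this example.

```python
# Counts reported in the Census Bureau analysis of the 2004 GSS test.
responses = {
    "chose a box": 216,
    "gave other answers": 4,
    "refused or said don't know": 17,
}

total = sum(responses.values())  # should equal the 237 respondents reported

# Share of respondents in each outcome category.
shares = {outcome: count / total for outcome, count in responses.items()}

for outcome, share in shares.items():
    print(f"{outcome}: {share:.1%}")
```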

           Further, the Census Bureau paper stated that

                        o  the "overwhelming majority of foreign-born
                        respondents" picked a box on the immigration status
                        card without-according to interviewers-any objection,
                        hesitation, or periods of silence;

                        o  while some interviewers did not give a judgment or
                        were confused about rating respondents'
                        understanding, about 80 percent of respondents were
                        coded as understanding and about 10 percent as not;49
                        and

                        o  some respondents' comments, written in by
                        interviewers, indicated that although the GSS is a
                        "personal interview" survey, telephone interviews had
                        been substituted, in some cases, and this meant that
                        respondents could not see the cards-making the use of
                        the 3-box format difficult.

           The Census Bureau's paper highlighted various limitations of the
           2004 GSS test, including (1) testing only one immigration status
           card, (2) underrepresenting Hispanics, and (3) in some instances
           interviewing over the telephone (instead of in person), so that
           respondents did not see the flash cards.50

           Experts Seem to Accept "Grouped Answers" Questions If Fielded by
           a Private Sector Organization

           The acceptability of the grouped answers approach appears to be
           high, when implemented in surveys fielded by a university or
           private sector organization. Many immigration experts, including
           advocates, accepted the grouped answers approach, although some
           conditioned their acceptance on a quality implementation in a
           survey fielded by a university or other private sector
           organization. An independent statistical expert believed that the
           grouped answers approach would be generally usable with survey
           respondents.

           Keys to Acceptance Are Fielding by a Private Sector Organization,
           Data Protections, and Quality Implementation

           Some of the researchers and advocates we contacted were extremely
           enthusiastic about the potential for new data. No one objected to
           statistical, policy-relevant information being developed on the
           size, characteristics, costs, and contributions of the
           undocumented population. Overall, the immigration experts we
           contacted (listed in appendix I, table 5) accepted the
           grouped-answers question approach-although advocates sometimes
           conditioned their acceptance on, for example, the questions being
           asked in a survey fielded by a university or private sector
           organization-with data protections built in. Many also offered
           suggestions for maximizing cooperation by foreign-born respondents
           or ideas about how advocacy organizations might help.

           Some advocates indicated that a key condition of their support
           would be that (1) the grouped answers question on immigration
           status be asked by a university or private sector organization and
           (2) identifiable data (that is, respondents' answers linked to
           personal identifiers) be maintained by that organization. Two
           advocate organizations specifically stated that they "could not
           endorse," or implied they would not support, the grouped answers
           approach, assuming the data were collected and maintained by, in
           one case, the Census Bureau and, in the other case, the
           government. Many other immigration experts and advocates preferred
           that grouped answers data on immigration status be collected by a
           university or other reputable private sector organization pledged
           to protect the data.

           The immigration advocates said that private sector fielding of a
           grouped answers survey and protection of such data from
           nonstatistical uses that might harm immigrants were key issues
           because

                        o  Some foreign-born persons are from countries with
                        repressive regimes and thus have more fear of (less
                        trust in) government than the typical U.S.-born
                        person.

                        o  Despite current law protecting individual data
                        from disclosure, some persons believe that
                        information collected by a government agency such as
                        the Census Bureau is routinely shared (or that in
                        some circumstances it might be shared) across
                        government agencies. Further, one advocate pointed
                        out that the Congress could change the current law,
                        eliminating that protection. (Although the grouped
                        answers approach does not identify anyone as
                        undocumented, it does provide some information
                        regarding each respondent's immigration status.)

                        o  Extremely large-scale data collections-notably,
                        the American Community Survey (ACS)-can yield
                        estimates for areas small enough that if the data
                        were publicly available, they could be used for
                        nonstatistical, nonpolicy purposes. Some advocates
                        referred to the World War II use of census data to
                        identify the areas where specific numbers of persons
                        of Japanese origin or descent resided. They also
                        pointed out that Census Bureau data on
                        ethnicity-including counts of Arab Americans-are
                        publicly available by zip code. (The Census Bureau,
                        unlike other government agencies and private sector
                        survey organizations, is associated with extremely
                        large-scale data collections, and some persons may
                        not fully differentiate Census Bureau data collection
                        efforts of different sizes.)

                        o  Hostility to or lack of trust in the Census Bureau
                        might result in potentially lower response rates for
                        foreign-born persons, based on the World War II
                        experience of the Japanese or a more recent incident
                        in which Census Bureau staff helped a DHS enforcement
                        unit access publicly available data on ethnicity by
                        zip code.51 DHS stated that it did not use these
                        data and had not requested the information by zip
                        code.52 The Census Bureau clarified its position on
                        providing help to others requesting publicly
                        available data.53

           Various advocates saw the issues listed above as linked to their
           own acceptance, as well as to respondent acceptance, of a survey.
           Linking these issues to respondent acceptance of a survey was, in
           some cases, echoed by other immigration experts we consulted.54
           Some immigrant advocates and other immigration experts counseled
           us that if there were an increase in enforcement efforts in the
           interior of the United States (as opposed to border-crossing
           areas), foreign-born respondents' acceptance of the grouped
           answers questions would be likely to decrease-at least, if the
           questions were asked in a survey fielded by the government.

           One advocate expressly stated a preference for a grouped answers
           survey with funding by a nongovernment entity, such as a
           foundation. With a number of immigrant advocates who objected to
           a government-fielded survey, we discussed the possibility of a
           survey fielded by a private sector organization with government
           funding. In some cases, we specifically referred to one or both of
           the following surveys, which (1) have been conducted for many
           years without inappropriate data disclosures and (2) ask direct
           sensitive questions:

           o  the National Survey on Drug Use and Health (NSDUH), fielded by
           RTI International under a contract from HHS's Substance Abuse and
           Mental Health Services Administration (SAMHSA), and
           o  the National Agricultural Workers Survey (NAWS), fielded by
           Aguirre International, under a contract from the Department of
           Labor.55

           The advocates' response was generally to accept the concept of
           government funding of a university's or private sector survey
           organization's field work, provided that appropriate protections
           of the data were built into the funding agreement.

           GAO's contract with Aguirre International for early testing of the
           grouped answers approach with farmworker respondents specified
           that data on respondents' answers would be "stripped of
           person-identifiers and related information." Additionally, the GSS
           "core funding" grant with NSF and its contractual arrangements
           with sponsors of question modules-such as the grouped-answers
           question insert contracted for by the Census Bureau-do not
           involve the transfer of any data other than publicly available
           data, stripped of identifiers, and limited so as to avoid the
           possibility of "deductive disclosure" with respect to respondent
           identities or local areas.56

           Various advocates said that their acceptance was also contingent
           on factors such as

                        1. high-quality data, including coverage of persons
                        who have limited English proficiency, with special
                        attempts to reach those who are linguistically
                        isolated (that is, members of households in which no
                        one 14 or older speaks English "very well") and to
                        overcome other potential barriers (such as cultural
                        differences);

                        2. appropriate presentation of the survey, including
                        an appropriate explanation of its purpose and how
                        respondents were selected for interview; and

                        3. transparency-that is, keeping the immigrant
                        community informed about or involved in the
                        development and progress of the survey.

           One advocate specifically said that her organization's support
           would be contingent on both (1) the development of more
           information on respondent acceptance within the Asian
           community-particularly among Asians who have limited English
           proficiency or are linguistically isolated-and (2) a survey
           implementation that is planned to adequately communicate with
           Asian respondents, including those who are linguistically isolated
           or have little education.57 Although one-fourth of the 2004 GSS
           test respondents were Asian, the test was conducted in English
           (allowing help from bilingual household members), and no other
           tests have included linguistically isolated Asians.58

           Advocates and Experts Suggest Ways to Maximize Respondent
           Cooperation and Offer Their Assistance

           Advocates and other experts made several suggestions for
           maximizing respondent cooperation with a survey using the grouped
           answers question series-that is, maximizing response rates for
           such a survey as well as maximizing authentic participation.

           Advocates suggested that the survey (1) avoid taking names or
           Social Security numbers,59 (2) hire interviewers who speak the
           respondents' home-country language, (3) let respondents know why
           the questions are being asked and how their households came to be
           selected, (4) conduct public relations efforts, (5) obtain the
           support of opinion leaders, (6) select a survey group from a
           well-known and trusted university to collect the data, and (7) ask
           respondents about their contributions to the American economy
           through, for example, working and paying taxes.

           Additionally, survey experts suggested

                        o  using audio-Computer Assisted Self Interview
                        (audio-CASI),60 
                        o  carefully explaining to respondents how anonymity
                        of response is protected, and
                        o  paying respondents $25 or $30 for participating in
                        the interview.

           Survey experts viewed these elements as key ways of boosting
           response rates or encouraging authentic responses to sensitive
           questions. For example, NAWS, which uses respondent incentives,
           achieves extremely high response rates within cooperating farms-97
           percent in 2002, with a $20 payment to farmworkers selected.

           Some immigrant advocates also offered suggestions for how their
           organizations or other advocates might help the effort to develop
           and field the grouped answers approach, including

                        1. providing contacts at local organizations to help
                        with arrangements for future research,
                        2. developing or reviewing Box A follow-up questions,
                        and
                        3. serving on an advisory board with other
                        representatives from immigrant communities.61

           GSS Data and Independent Statistical Consultant Review Show
           "General Usability" of the Grouped Answers Approach

           As we report above, the Census Bureau's recent analysis of the
           2004 GSS grouped answers data concluded that the "overwhelming
           majority of foreign-born respondents" picked a box without
           objection, hesitation, or silence. The Census Bureau reported,
           more specifically, that roughly 90 percent (216 of 237
           respondents) chose a box, 4 gave other answers, and 17 refused to
           answer or said "don't know."

           Our subsequent analysis excluded 19 of the 237 respondents in the
           Census Bureau analysis because

                        o  4 were not foreign-born (for example, 1 had been
                        born abroad to parents who had, by the time he was
                        born, become naturalized U.S. citizens);
                        o  1 was not classifiable as either foreign-born or
                        not foreign-born (because he did not know whether his
                        parents were born in the United States);
                        o  4 others were known to have been interviewed on
                        the telephone, based on written-in interviewers'
                        comments recorded in the computer file (for example,
                        one wrote that the respondent could not see the cards
                        because the interview was on the telephone); and
                        o  10 others were subsequently found to have been
                        interviewed on the telephone, based on a special GSS
                        hand check of the interview forms for respondents who
                        had refused or said "don't know," which was carried
                        out in response to our request.62

           As a result, in our analysis we found that only 6 personally
           interviewed foreign-born GSS respondents refused or said "don't
           know."63 One of the 6 was an 18-year-old Mexican who told the
           interviewer that he did not know whether or not he was a legal
           immigrant. Additionally, we found that the 4 respondents who gave
           "other answers" had provided usable information (for example, one
           called out that he had a student visa) and thus could be recoded
           into an appropriate box.
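
           The counts reported above can be checked with a few lines of
           arithmetic. The sketch below is illustrative only; the variable
           and function names are ours, and the only inputs are the counts
           stated in the text.

```python
# Counts stated in the text for the 2004 GSS grouped answers test.
census_total = 237        # respondents in the Census Bureau analysis
census_chose_box = 216    # respondents who chose a box
# Exclusions: not foreign-born, unclassifiable, and two groups found
# to have been interviewed by telephone.
excluded = 4 + 1 + 4 + 10
refused_in_person = 6     # refused or said "don't know" among those kept


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


eligible = census_total - excluded
print("Census Bureau box-choice rate (%):", share(census_chose_box, census_total))
print("Respondents kept after exclusions:", eligible)
print("Refused or don't know among them (%):", share(refused_in_person, eligible))
```

           The 218 respondents remaining after the exclusions match the
           "about 218" figure cited in the independent review quoted below.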

           After reviewing the two analyses of the GSS test data-the one that
           the Census Bureau performed and the other we performed-Dr.
           Zaslavsky concluded that

           The test confirms the general usability of the [grouped-answers
           approach] with subjects similar to the target population for its
           potential large-scale use-that is, foreign-born members of the
           general population. Out of about 218 respondents meeting
           eligibility criteria and who were most likely administered the
           cards in person (possibly including a few who had telephone
           interviews but responded without problems), only 9 did not respond
           by checking one of the 3 boxes. Of these, 3 provided verbal
           information that allowed coding of a box, and 6 declined to answer
           the question altogether. Furthermore, several of these [6] raised
           similar difficulties with other 3-box questions on nonsensitive
           topics (type of house where born, mode of transportation to enter
           United States), suggesting that the difficulties with the question
           format were at least in part related to the format and not to the
           particular content of the answers. Thus, indications were that
           there would not be a systematic bias due to respondents whose
           immigration status is more sensitive being unwilling to address
           the 3-box format.

           Dr. Zaslavsky emphasized the importance of minimizing or
           completely avoiding telephone interviews when using the grouped
           answers approach-or, alternatively, providing advance copies of
           the cards to respondents before interviewing over the telephone.64
           (Dr. Zaslavsky's written review is presented in full in appendix
           III.)

           Various Tests Are or May Be Needed

           The findings on respondent acceptance-that is, the GSS test-raised
           some unanswered questions about acceptance that experts said
           should be addressed. Additionally, the experts said that one or
           more tests of response validity are needed to determine whether
           respondents "pick the correct box" versus systematically avoiding
           Box B.

           Questions for Further Research Were Suggested by the GSS Test

           The independent reviewer of the GSS analyses (Dr. Zaslavsky)
           concluded that

           four issues should be addressed in future field tests:

           (a) Equivalent acceptability of all forms of the response card,

           (b) Usability with special populations including those with low
           literacy, the linguistically isolated, and concentrated immigrant
           populations,

           (c) Methods that avoid telephone interviews, or reduce bias and
           nonresponse due to use of the telephone,

           (d) Use of follow-up questions to improve the accuracy of box
           choices.

           As the independent expert explained with respect to point (b), GSS
           undercoverage of the foreign-born population occurred at least in
           part because interviews were conducted only in English, although
           household members could help respondents with limited English.65
           Various colleagues and experts we talked with supported points (a)
           through (d). We further note that points (a) and (c) were covered
           or touched on in the Census Bureau's paper reporting its analysis
           of the 2004 GSS data. In our discussions with Census Bureau staff,
           they also mentioned that further tests of acceptance should
           include (d) follow-up questions for Box A respondents.

           Additionally, some advocates and an immigration researcher
           suggested improving the cards, which might minimize the potential
           for "don't know" or inaccurate answers. A survey expert suggested
           using focus groups to further explore respondent perceptions of
           the cards-and to potentially improve them.66

           Earlier testing covered a key portion of the populations (Hispanic
           farmworkers) cited in (b) above, was conducted in Spanish, and
           included Box A follow-up questions as recommended in (d) above.67
           In those interviews, every respondent picked a box. However,

                        1. No language other than Spanish or English has been
                        used in testing; thus, as one immigrant advocate
                        pointed out, no testing has focused on linguistically
                        isolated Asians (those living in households in which
                        no adult member speaks English).

                        2. The interviews with Hispanic farmworkers were not
                        conducted under typical conditions of a household
                        survey.

                        3. Only one immigration status card was tested with
                        Hispanic farmworkers and in the GSS.

           Therefore, we agree that the acceptance-testing issues the experts
           raised should be considered in assessing the grouped answers
           approach.

           Studies Should Test Whether Respondents Pick the Correct Box

           Several experts told us that tests of respondent accuracy-or at
           least respondents' intent to respond accurately-should be
           conducted. These experts emphasized that grouped answers data
           would not be useful if substantial numbers of respondents were to
           systematically avoid picking Box B (that is, to not pick the box
           with the undocumented category). However, one immigration study
           expert believed that if a response validity study involved lengthy
           delays, fielding a grouped answers survey should proceed in
           advance of a validity study.

           We agree with the experts' position that tests are needed to
           determine whether respondents systematically avoid Box B (even
           after Box A follow-up check questions). Tests of response validity
           would ideally be conducted with the methods of encouraging
           truthful answers that experts mentioned, such as (1) explaining
           why the survey is being conducted, how the respondent was
           selected, and how the anonymity of answers is ensured, and (2)
           using audio-CASI and, if appropriate, paying respondents for
           participating in the interview. And, as the Census Bureau pointed
           out, such a study should include the full grouped answers question
           series, including follow-up questions, and it should test both
           Card 1 and Card 2. Even if small numbers of respondents were to
           respond inaccurately, it would be helpful to estimate this and
           adjust for any resulting bias.

           We discussed various approaches to conducting validity studies
           with immigration experts, including immigrant advocates, and with
           agencies conducting surveys. In reviewing these approaches, we
           found that response validity tests vary according to whether they
           are conducted before, during, or after a survey is fielded.

           Before a large-scale survey is conducted. The grouped answers
           question series could be asked of a special sample of respondents
           for whom the answers are known, in advance, by study investigators
           on an individual-respondent basis. Such knowledge might be based,
           for example, on information that recent applicants for green cards
           have submitted to DHS.68 "Firewalls" could be used to prevent
           survey information from being given to DHS. We discussed this
           approach with DHS; however, experts criticized a DHS-based
           validity study on both methodological and public relations
           grounds.69 An alternative source of data on individuals'
           immigration statuses might avoid these problems, but no
           alternative source has yet been identified.

           Before or as part of a large-scale survey. In either situation
           (that is, in a presurvey study or as part of a survey),
           respondents could be asked if they would be willing to participate
           in special validity-test activities in return for a payment of,
           say, $25 or $30 for each activity. Later, after interviewing had
           been completed in a given location-not as part of the interview
           process-a sample of respondents who chose Box A (that is, those
           who claimed to be here legally) could be asked to

                        o  participate in a focus group in which respondents
                        would discuss how they felt answering the grouped
                        answers questions when the interviewer came to their
                        house and, also, could possibly be asked to fill out
                        a "secret ballot" indicating whether they had
                        answered authentically in the earlier home interview;
                        o  give permission for a record check and provide
                        information that could subsequently be used in a
                        record check (for example, their name, date of birth,
                        and Social Security number) and permission to check
                        these data with the Social Security Administration;70
                        or
                        o  show his or her documentation (for example, green
                        card) to a documents expert.71

           These checks would logically be focused on Box A respondents, for
           most of whom such checks would be less threatening. We believe
           that it is reasonable to assume that most respondents who chose
           Box B picked the correct box. Further, because the survey
           interview states that there are no more questions on immigration
           if the respondent picks Box B, pursuing follow-up validity checks
           might be deemed inappropriate for Box B respondents.72

           After data are collected. With a large-scale survey, it would be
           possible to conduct comparative analyses after the data were
           collected. We provide three examples.73

                        1. Grouped answers estimates of the percentage
                        undocumented could be compared for (a) all
                        foreign-born versus (b) high-risk groups, such as
                        those who arrived in the United States within the
                        past 5 or 10 years. The expectation would be that
                        with valid responses, a higher estimate of the
                        percentage undocumented would be obtained for those
                        who arrived more recently-because, for example,
                        persons who had arrived recently were not here during
                        the amnesty in the late 1980s.74 
                        2. Comparisons could be made of (a) Box A estimates
                        of specific legal statuses and the approximate dates
                        received-notably, the numbers of persons claiming to
                        have received valid green cards in 1990 or more
                        recently-with (b) publicly available DHS reports of
                        the numbers of green cards issued from 1990 to the
                        survey date.75 
                        3. Analysts could compare (a) grouped answers
                        estimates of the number undocumented overall to (b)
                        estimates of total undocumented obtained by the
                        residual method.76

           Wherever possible, Card 1 and Card 2 should be tested separately
           for accuracy of response.

           The advantage of conducting a validity study in advance of a
           survey is that if significant problems surface, adjustments in the
           approach can be made. Or if the problems are substantial and
           cannot be easily corrected-and if the anticipated survey were to
           be fielded mostly or only to collect grouped answers data-then
           that survey could be postponed or canceled. However, the results
           of validity tests conducted during or after a survey can be used
           to interpret the data and, potentially, to adjust estimates if it
           appears that, for example, 5 to 10 percent of undocumented
           respondents had erroneously claimed to be in Box A of Card 1. As
           one expert noted, conducting an advance study does not preclude
           conducting a subsequent study during or after the survey.
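
           The kind of adjustment mentioned above can be sketched under a
           deliberately simple assumption: a validity study has supplied a
           known rate at which undocumented respondents avoid the box that
           contains the undocumented category, and that misreporting is the
           only source of error. Under that assumption the observed share is
           the true share multiplied by one minus the misreporting rate. The
           function name and the observed share of 27 percent are
           hypothetical and are not taken from any survey.

```python
def adjust_for_misreporting(observed_share, misreport_rate):
    """Scale an observed undocumented share up for known under-reporting.

    Assumes (hypothetically) that observed = true * (1 - misreport_rate)
    and that misreport_rate comes from a separate validity study.
    """
    if not 0.0 <= misreport_rate < 1.0:
        raise ValueError("misreport_rate must be at least 0 and below 1")
    return observed_share / (1.0 - misreport_rate)


# Hypothetical observed share, adjusted for the 5 and 10 percent
# misreporting rates used as examples in the text.
for rate in (0.05, 0.10):
    print(rate, round(adjust_for_misreporting(0.27, rate), 3))
```

           A real adjustment would also need to carry the uncertainty of the
           estimated misreporting rate into the final confidence interval,
           which this sketch does not attempt.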

           Some 6,000 Foreign-Born Respondents Are Needed for "Reasonably
           Precise" Estimates of the Undocumented

           Although several factors are involved, and it is not possible to
           guarantee a specific level of precision in advance, we estimate
           that roughly 6,000 foreign-born respondents, or more, would be
           needed for a grouped answers survey.77 As we explain below, this
           is based on (1) a precision requirement (that is, a 95 percent
           confidence interval consisting of plus or minus 3 percentage
           points), (2) assumptions about the sampling design of the survey
           in which the questions are asked, and (3) the assumption that
           approximately 30 percent of the foreign-born population is
           currently undocumented.

           An indirect grouped answers estimate of the undocumented
           population generally requires interviews with more foreign-born
           respondents than a corresponding hypothetical direct estimate
           would-assuming it were possible to ask such questions directly in
           a major national survey. One key reason is that the main sample of
           foreign-born respondents must be divided into two subsamples. Half
           the respondents answer each immigration status card. On this basis
           alone, one would have to double the sample size required for a
           direct estimate based on a question asked of all respondents.
           Further, the estimate of undocumented, which is achieved by
           subtraction, combines two separate estimates, each characterized
           by some degree of uncertainty.78
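
           The effect of estimating by subtraction can be illustrated with a
           short calculation. The sketch below assumes that the two
           subsamples are independent simple random samples, which a real
           area-probability survey would not be, and that the estimate is
           the share of subsample 1 choosing the box that contains the
           undocumented category minus the share of subsample 2 reporting
           the legal statuses grouped with it in that box. The function
           name and all counts are hypothetical.

```python
import math


def difference_estimate(k1, n1, k2, n2, z=1.96):
    """Estimate a proportion by subtraction across two subsamples.

    k1 of n1: subsample 1 respondents choosing the box that groups the
              undocumented category with some legal statuses.
    k2 of n2: subsample 2 respondents reporting those same legal statuses.
    Returns (estimate, half_width) assuming independent simple random
    samples; z=1.96 corresponds to a 95 percent confidence level.
    """
    p1 = k1 / n1
    p2 = k2 / n2
    estimate = p1 - p2
    # Variances of independent estimates add when the estimates are
    # subtracted, so the difference is less precise than either share.
    variance = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
    return estimate, z * math.sqrt(variance)


# Hypothetical counts: 6,000 respondents split evenly between the cards.
est, half = difference_estimate(k1=1890, n1=3000, k2=990, n2=3000)
print(f"estimate = {est:.3f}, half-width = {half:.3f}")

# For comparison: a hypothetical direct question asked of all 6,000.
direct_half = 1.96 * math.sqrt(0.30 * 0.70 / 6000)
print(f"direct half-width = {direct_half:.3f}")
```

           With these made-up inputs the interval for the difference is
           wider than the interval for the direct estimate, which is the
           point the paragraph above makes.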

           Determining the number of respondents required for a "reasonably
           precise" estimate of the percentage of the foreign-born population
           who are undocumented involves three key factors:

                        1. specification of a precision level-that is, choice
                        of a 90 percent or 95 percent confidence level and an
                        interval defined by plus or minus 2, 3, or 4
                        percentage points;
                        2. information on (or assumptions about) the sampling
                        design for the main survey and for subsamples 1 and
                        2; and
                        3. to the extent possible, consideration of the
                        likely distribution of the foreign-born population
                        across immigration status categories, including the
                        various legal categories and the undocumented
                        category.79

           With respect to the first factor involved in determining sample
           size, some agencies-for example, the Census Bureau and the Bureau
           of Labor Statistics (BLS)-use the 90 percent confidence level.
           Other agencies use the 95 percent level.

           With respect to the second factor, the sampling design of a
           large-scale, nationally representative, personal-interview survey
           is based on probabilistic area sampling rather than simple random
           sampling of individuals. This often reduces the precision of
           estimates (relative to simple random sampling).80 The reason is
           that persons selected for interview are clustered in a limited
           number of areas or neighborhoods (and residents of a particular
           neighborhood may tend to be similar). It is possible that the
           design for selecting subsamples 1 and 2 could increase precision;
           however, it is not possible to predict by how much.81
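
           One common way to express the loss of precision from clustering
           is a design effect. The sketch below uses Kish's approximation,
           deff = 1 + (m - 1) * rho, where m is the average number of
           interviews per cluster and rho is the intraclass correlation.
           This is a standard textbook approximation offered for
           illustration; the report does not state that it was used here,
           and the input values are hypothetical.

```python
def design_effect(avg_cluster_size, intraclass_corr):
    """Kish's approximation of the design effect for cluster sampling."""
    return 1.0 + (avg_cluster_size - 1.0) * intraclass_corr


def effective_sample_size(n, deff):
    """Size of a simple random sample that would give the same precision."""
    return n / deff


# Hypothetical inputs: 11 interviews per area, modest similarity
# among residents of the same area.
deff = design_effect(avg_cluster_size=11, intraclass_corr=0.05)
print("design effect:", deff)
print("effective sample size:", effective_sample_size(6000, deff))
```

           A design effect above 1 means that more interviews are needed
           than simple random sampling formulas alone would suggest.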

           With respect to the third factor, existing residual estimates
           point to a fairly even 3-way split between three main
           categories-undocumented, U.S. citizen, and legal permanent
           resident. However, there is some uncertainty associated with these
           estimates, the distribution may vary across subgroups, and the
           percentages may change in the future.82 Therefore, a range of
           distributions is relevant.

           Taking each of these factors into account (to the extent possible)
           and using conservative assumptions, we estimated the approximate
           numbers of respondents required for indirect estimates of the
           undocumented population that are "reasonably precise."

           Table 1 shows required sample sizes for the 90 percent confidence
           level and table 2 for the 95 percent level, each with precision
           at plus or minus 2, 3, and 4 percentage points. In estimating
           these required sample sizes, we made conservative assumptions and
           specified a range of possibilities for the share of the
           foreign-born population in the undocumented category.

           To identify a single, rough figure for the sample size needed for
           reasonably precise estimates, we focused on

                        1. the 95 percent level, which is more certain and,
                        we believe, preferable;

                        2. the 30 percent column, because a current residual
                        estimate of the undocumented population is in this
                        range; and

                        3. the middle row (for plus or minus 3 percentage
                        points), which is a midpoint within the area of
                        "reasonable precision" as defined above.

           With this focus, we estimate that roughly 6,000 or more
           respondents would be required.83 
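
           How the three factors interact can be sketched by solving the
           margin-of-error formula for the total sample size. The sketch
           below is not a reconstruction of the calculations behind tables
           1 and 2. It assumes an even split of the sample between the two
           cards, a simple binomial variance for each card, and a single
           hypothetical design effect; the share of one-third for the legal
           statuses grouped with the undocumented category is taken loosely
           from the "fairly even 3-way split" noted above. All names and
           design effect values are ours.

```python
import math


def required_total_sample(p_undoc, p_companion, margin, z=1.96, deff=1.0):
    """Total respondents needed for a subtraction estimate of p_undoc.

    p_undoc:     assumed undocumented share of the foreign-born.
    p_companion: assumed share in the legal statuses that appear in the
                 same box as the undocumented category.
    margin:      half-width of the confidence interval (0.03 = 3 points).
    z:           1.645 for 90 percent confidence, 1.96 for 95 percent.
    deff:        hypothetical design effect for clustering.
    """
    p1 = p_undoc + p_companion   # box share on the card with undocumented
    p2 = p_companion             # share measured on the other card
    # Each card goes to half the sample, so each variance term is
    # divided by n / 2, which doubles the combined variance per respondent.
    unit_variance = 2.0 * (p1 * (1 - p1) + p2 * (1 - p2))
    return math.ceil(deff * (z / margin) ** 2 * unit_variance)


# 95 percent level, plus or minus 3 points, 30 percent undocumented,
# under three hypothetical design effects.
for deff in (1.0, 1.5, 2.0):
    print(deff, required_total_sample(0.30, 1 / 3, margin=0.03, deff=deff))
```

           The result is sensitive to every input, which is consistent with
           the caution above that a specific level of precision cannot be
           guaranteed in advance.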

39GAO/GGD-98-164, p. 3.

40While NAWS data collections are fielded annually, results are generally
reported every other year. See U.S. Department of Labor, Findings from the
National Agricultural Workers Survey (NAWS) 2000-2002: A Demographic and
Employment Profile of United States Farm Workers. Research Report 9
(Washington, D.C.: March 2005).

41The SIPP flash card has neither an undocumented category nor an "other
status not listed" category. However, persons reported to have an
immigration status not on the SIPP card-which would logically include
undocumented persons as well as a small number of persons in various minor
legal immigration categories-are tallied separately.

42Although NAWS and SIPP have received OMB clearance (under the Paperwork
Reduction Act), and although no special field problems have emerged, it is
difficult to say whether field problems might arise in future. Reasons
include question-threat and related problems depending, in part, on
contextual factors, such as current levels of immigration enforcement in
the nonborder areas of the United States, and the perceived relevance of
the question to the survey.

43The contract specified that Aguirre would provide GAO data on actual
responses that had been "stripped of person-identifiers and related
information."

44Additionally, GAO conducted cognitive interviews focused on testing the
appropriateness of the icons used on the cards (see GAO/GGD-00-30, pp.
44-45). Cognitive interviewing focuses on the mental processes of the
respondent while he or she is answering a survey question. The goals are
to find out what each respondent thinks the question is asking, what the
specific words or phrases (or icons on a card) mean to him or her, and how
he or she formulates an answer. Typically, cognitive interviewing is an
iterative process in which the findings or problems identified in each set
of interviews are used to modify the questions to be tested in the next
set of interviews.

45GAO/GGD-98-164 and GAO/GGD-00-30.

46The GSS consists of a "core" question series and additional "modules."
The funding for fielding the core question series is provided by a grant
from NSF. The modules are question series added through a variety of
grants and contracts.

47An expert reviewer of a draft of this report noted that the housing
types on the training card shown in figure 5 are not all mutually
exclusive; that is, a single family house can be located on a farm.

48These cards were initially subjected to 1997-98 developmental tests 
conducted with more than 100 Hispanic immigrants who were farmworkers or
in other situations such as applying for aid at a legal clinic
specializing in immigration cases-such that a fair number of those
interviewed seemed relatively likely to be undocumented. See GAO/GGD-00-30
and GAO/GGD-98-164.

49The Census Bureau's paper said that field representatives reported that
the remaining respondents were in doubt and may not have understood.

50The Census Bureau's paper also noted that the nonresponse rate for the
GSS overall (that is, averaged across a combination of U.S.-born and
foreign-born persons selected for the sample) was 29.6 percent. (Persons
who are selected for interview but not interviewed may be either
native-born or foreign-born; because they were never asked and never
reported where they were born, a specific response rate for the
foreign-born cannot be calculated.)

51See Samia El-Badry and David A. Swanson, "Providing Special Census
Tabulations to Government Security Agencies in the United States: The Case
of Arab-Americans," paper presented at the 25th International Population
Conference of the International Union for the Scientific Study of
Population, Tours, France, July 18-23, 2005. One advocate was particularly
concerned about the possibility that lower respondent cooperation might
have resulted from these incidents and, if so, might have led to
underrepresentation of these communities in Census Bureau data.
Additionally, one advocate questioned whether local estimates of the
undocumented might, in future, facilitate possible efforts to base
apportionment on population counts that do not include undocumented
residents. We note that most large-scale personal-interview surveys do not
include sufficient numbers of foreign-born respondents to allow indirect
grouped answers estimates of undocumented persons for small geographic
areas, such as zip codes.

52See "U.S. Customs and Border Protection Statement on Census Data,"
Department of Homeland Security, Press Office, Washington, D.C., August
13, 2004.

53Charles Louis Kincannon, Director, "Procedures for Providing Assistance
to Requestors for Special Data Products Known as Special Tabulations and
Extracts," memorandum to Associate Directors, Division Chiefs, Bureau of
the Census, Washington, D.C., August 26, 2004.

54It might be noted that SIPP officials told us that when the Census
Bureau conducted the SIPP survey and asked about immigration status,
interviewers did not experience field problems. However, SIPP asks about
immigration status at the time when respondents came to this country (and
one other question); SIPP stopped short of a specific question on current
undocumented status-and the SIPP data do not allow indirect estimation of
the number who are currently undocumented.

55These two examples involve agencies that are viewed neutrally by the
immigrant advocates we talked with. (Agencies that are viewed negatively
by some immigrant advocates are DHS and the Census Bureau.)

56GSS receives funding for its core questions through a grant from NSF.
GSS interviewers and advance letters told respondents about the NSF
sponsorship. Additionally, respondents were told that one purpose of the
survey was to inform government officials.

57This would mean communication that takes account of cultural as well as
language concerns.

58The 2004 GSS was limited to respondents who either were fluent in
English or were helped by a household member who was fluent in English;
some persons with limited English proficiency are likely to have been
reached. The preliminary testing and development of the grouped answers
approach offered a choice of Spanish or English interviews. However,
linguistically isolated non-Hispanics have not yet been included in any
test.

59Later in this report, we describe potential ways of testing whether
respondents "pick the correct box"-ways that do not require routine
collection of respondent names and Social Security numbers as part of the
main survey.

60CASI, or Computer Assisted Self Interview, means that the respondent
himself or herself uses a laptop to view the questions and flash cards and
to indicate his or her answers. Audio-CASI adds earphones so that
questions and instructions can be spoken to the respondent while he or she
views the questions on the screen. Audio-CASI programming can be completed
in any one of several languages. Experts told us that studies have shown
increased reporting of sensitive items when audio-CASI is used.

61Two advocates mentioned positively the transparency that the Census
Bureau works toward through outreach to immigrant-advocate organizations.
This outreach includes explanation of data collection goals and policies.

62GSS Director Tom Smith graciously arranged for a hand check of
interviews coded refusal or "don't know," thus providing key information
to us in time for this report. (Specific mode-of-interview data for all
2004 GSS respondents will not be available until the end of 2006.) The GSS
Director also said that, overall, about 10 percent of the 2004 GSS
interviews were conducted over the telephone.

63Similar numbers refused or said "don't know" on the two 3-box training
cards. Specifically, 8 respondents refused or said "don't know" on the
housing card, 6 on the transportation card.

64Alternatively, we believe that it might be possible to estimate the bias
incurred by including a small number of telephone interviews in the
analysis (or by eliminating them from the analysis).

65Questions were asked and answers were apparently given in English.

66The pretesting and cognitive testing conducted on the cards so far has
been limited to certain groups of Hispanics. We believe that testing with
other groups, potentially including focus group testing, could be
important before large-scale implementation. It also might be appropriate
to change specific categories and definitions of statuses on the cards,
depending on future changes in laws.

67In fact, a key part of the earlier testing focused on the development of
icons to help respondents with limited literacy.

68NCHS has suggested that some kind of validity test at the individual
level is needed. Interviewing persons whose status is known in advance is
a classic approach.

69One expert scoffed at a validity test limited to persons whose
immigration status is known to DHS. An immigrant advocate pointed to the
issues that arose when the Census Bureau helped DHS obtain publicly
available information on ethnicity by zip code; she indicated that a
public relations problem could result even if only carefully crafted,
carefully protected sharing of information took place.

70One immigrant advocacy organization pointed out that it would be
important in such a study to protect the data so that the agency checking
records (in this instance, the Social Security Administration) could not
discover information about any identifiable respondent. Protective
approaches might include (1) using code numbers and a "third party" model
and (2) adding numerous "fake" cases to the checklist and notifying the
agency that this was being done. (See GAO, Record Linkage and Privacy:
Issues in Creating New Federal Research and Statistical Information.
GAO-01-126SP (Washington, D.C.: April 2001).)

71The ideas for these approaches are an outgrowth of our discussions
concerning NSDUH with SAMHSA. The NSDUH project officer said that as part
of that survey (which is fielded by RTI International in Research Triangle
Park, N.C., under a contract with SAMHSA), a sample of respondents were
offered $25 for a hair sample and $25 for a urine sample. Ninety percent
of those offered the incentive payments provided one or both samples.

72It would be important to craft such a study so that respondents would
not be tempted to distort information in order to receive payment. One
immigrant advocate suggested asking "what other experience federal
agencies have had with paying a select group of respondents to participate
in a validity test" to determine "whether the payment approach is
considered scientifically sound." One way of addressing this concern might
be to offer all or some Box B respondents a "minimal threat" follow-up
opportunity, such as participating in a focus group, which could also be
associated with a payment.

73Other possible comparative analyses might also be useful. DHS suggested
comparisons to results from the Latin American Migration Project and the
New Immigrant Survey.

74This is a version of the standard "known groups" validity test-an
approach that NCHS suggested using if it is not possible to conduct
individual checks.

75An expert in immigration studies suggested this test. As DHS's comments
indicate, such a test would involve adjusting the DHS figures on, for
example, the number of green cards issued in specific years to account for
subsequent return-migration and mortality, as well as taking account of
survey undercoverage. For information on adjustments needed in comparisons
involving green cards, see Nancy F. Rytina, Estimates of the Legal
Permanent Resident Population and Population Eligible to Naturalize in
2004 (Washington, D.C.: Department of Homeland Security, Office of
Immigration Statistics, February, 2006), p. 3, table 2. For an analogous
comparison for U.S. citizenship, see Jeffrey S. Passel, Rebecca L. Clark,
and Michael Fix, "Naturalization and Other Current Issues in U.S.
Immigration: Intersections of Data and Policy," in Proceedings of the
Social Statistics Section of the American Statistical Association: 1997
(Alexandria, Va.: American Statistical Association, 1997).

76This test was suggested by another expert in immigration studies.
Residual estimates are based primarily on comparing (1) administrative
data on the number of legal immigrants with (2) census counts or survey
estimates of the number of foreign-born residents who have not become U.S.
citizens.

77A sample of foreign-born is contained within a general sample of the
household population. As we explain in a later section of this report, an
efficient way to survey the foreign-born is by piggybacking on an
existing, ongoing large-scale survey of the total household population,
which includes foreign-born persons-if an appropriate ongoing survey can
be identified. A higher-cost alternative would be to identify a new sample
of the total household population and screen (by mini-interviews conducted
by telephone or in person or both) for households that contain one or more
foreign-born persons.

78The size of the error associated with a grouped answers estimate
relative to a direct estimate depends on the distribution of immigration
statuses. Assuming that 33.3 percent of foreign-born persons are in the
undocumented category, 33.3 percent are in the set of legal statuses in
Card 1, Box A, and 33.3 percent are in the set in Card 2, Box A, we would
expect the error associated with a grouped answers estimate of the
percentage undocumented to be twice that associated with a corresponding
direct estimate.
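
The factor of two described in this note can be checked with a short
calculation. The following is a minimal sketch, not GAO's own computation:
it assumes simple random sampling, two equal-size subsamples (one per
card), no Box C answers, and the subtraction estimator (the share choosing
Box B on Card 1 minus the share choosing Box A on Card 2). The function
names are hypothetical.

```python
import math

def direct_se(p_undoc, n):
    """Standard error of a direct estimate from n respondents (simple random sampling)."""
    return math.sqrt(p_undoc * (1 - p_undoc) / n)

def grouped_se(p_undoc, p_legal_card1, p_legal_card2, n):
    """Standard error of the two-card estimate when each card goes to n/2 respondents.

    Assumed estimator: (share in Box B, Card 1) - (share in Box A, Card 2).
    """
    if abs(p_undoc + p_legal_card1 + p_legal_card2 - 1) > 1e-9:
        raise ValueError("the three shares must sum to 1")
    box_b_card1 = p_undoc + p_legal_card2  # Card 1, Box B: undocumented plus Card 2's Box A statuses
    box_a_card2 = p_legal_card2            # Card 2, Box A
    half = n / 2
    variance = (box_b_card1 * (1 - box_b_card1)
                + box_a_card2 * (1 - box_a_card2)) / half
    return math.sqrt(variance)

third = 1 / 3
ratio = grouped_se(third, third, third, 6000) / direct_se(third, 6000)
print(round(ratio, 2))
```

Under these assumptions the grouped answers variance is four times the
direct variance, so the standard error is twice as large, regardless of
the sample size used in the comparison.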

79If there is no information on the distribution of immigration status,
then a potentially very large sample size would be estimated, based on a
"worst case scenario" distribution. However, if there is information, this
may allow a given level of precision to be attained with a smaller sample.

80To illustrate how this occurs in practice, referring to the National
Health Interview Survey (NHIS), NCHS told us that an estimate of the
percentage of persons who are foreign-born, 18 to 39 years old, and U.S.
citizens is characterized by a variance that is roughly 1.6 times the
variance that would be associated with a corresponding estimate based on
simple random sampling. (In theory, a complex sampling design could reduce
the variance rather than increasing it.)

81The independent statistical consultant (Dr. Zaslavsky) advised us that
rotating the use of immigration status cards 1 and 2 in every other
household interviewed (balancing the use of alternative cards within areas
or clusters) might increase precision. The logic is that because some
areas are defined by factors such as income and ethnicity-which might be
related to immigration status-rotation would help ensure balance on these
factors.

82For example, it is possible that new immigration laws would allow large
numbers of currently undocumented persons to legalize their status.

83We believe these are reasonable choices but we realize that others might
focus on, for example, more precise estimation (plus or minus 2 percentage
points).

Table 1: Approximate Number of Foreign-Born Respondents Needed to Estimate
Percentage Undocumented within 2, 3, or 4 Percentage Points at 90 Percent
Confidence Level, Using Two-Card Grouped Answers Data

Estimate within 2, 3, or 4   Percent undocumented foreign-born (range of possibilities)
percentage points                  10%      30%      50%      70%      90%
2                               10,700    9,900    8,100    5,500    2,100
3                                4,800    4,400    3,600    2,500      900
4                                2,700    2,500    2,000    1,400      500

Source: GAO analysis.

Note: Estimated numbers of respondents were calculated assuming that (1)
foreign-born persons who are not undocumented are evenly split between the
legal statuses in Box A, Card 1, and Box A, Card 2 (a conservative
assumption in that it maximizes the required number of respondents), (2)
sample selection design for the main survey and for subsamples 1 and 2
increases the variance of an estimate of undocumented by 1.6 (which does
not build in potential reductions in variance that might occur with a
careful design for the selection of subsamples 1 and 2); and (3) for
simplicity, no respondents choose Box C.

Table 2: Approximate Number of Foreign-Born Respondents Needed to Estimate
Percentage Undocumented, within 2, 3, or 4 Percentage Points, at 95
Percent Confidence Level, Using Two-Card Grouped Answers Data

Estimate within 2, 3, or 4   Percent undocumented foreign-born (range of possibilities)
percentage points                  10%      30%      50%      70%      90%
2                               15,200   14,000   11,500    7,800    2,900
3                                6,800    6,200a   5,100    3,500    1,300
4                                3,800    3,500    2,900    2,000      700

Source: GAO analysis.

Note: Estimated numbers of respondents were calculated assuming that (1)
foreign-born persons who are not undocumented are evenly split between the
legal statuses in Box A, Card 1, and Box A, Card 2 (a conservative
assumption in that it maximizes the required number of respondents), (2)
sample selection design for the main survey and for subsamples 1 and 2
increases the variance of an estimate of undocumented by 1.6 (which does
not build in potential reductions in variance that might occur with a
careful design for the selection of subsamples 1 and 2); and (3) for
simplicity, no respondents choose Box C.

aThis is the approximate number of foreign-born respondents needed for an
overall estimate of the percentage undocumented with a confidence interval
of plus or minus 3 percentage points at the (preferred) 95% confidence
level, assuming that 30% of the foreign-born are undocumented.
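
The respondent counts in tables 1 and 2 can be approximated from the
assumptions stated in the table notes. The sketch below is a
reconstruction, not a formula given in this report: it assumes two
equal-size subsamples, the subtraction estimator, an even split of legal
statuses, no Box C answers, and a variance inflation factor of 1.6. Under
those assumptions the variance of the estimate is 1.6 x (1 - p^2) / n,
where p is the undocumented share and n is the total number of
foreign-born respondents. The function name is hypothetical. The results
agree with the published figures to within about 100 to 150 respondents.

```python
from statistics import NormalDist

DESIGN_EFFECT = 1.6  # variance inflation from assumption (2) in the table notes

def respondents_needed(p_undoc, margin, confidence, deff=DESIGN_EFFECT):
    """Approximate total foreign-born respondents across both subsamples."""
    # Two-sided critical value, e.g. about 1.645 for 90% and 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Reconstructed two-card variance under the stated assumptions: (1 - p^2) / n.
    return deff * z ** 2 * (1 - p_undoc ** 2) / margin ** 2

for confidence in (0.90, 0.95):
    for margin in (0.02, 0.03, 0.04):
        row = [round(respondents_needed(p, margin, confidence), -2)
               for p in (0.1, 0.3, 0.5, 0.7, 0.9)]
        print(confidence, margin, row)
```

Because the required number falls as p rises, the sketch also shows why
subgroups with a higher undocumented share need fewer respondents for the
same margin.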

High-risk subgroups-subgroups with higher percentages of undocumented
(such as adults 18 to 44 and persons who arrived in the United States
within the past 10 years)-would require fewer respondents for the same
level of precision, as illustrated in the tables' middle and right
columns. For example, if about 70 percent of a subgroup were undocumented,
a survey with about 3,500 respondents in that subgroup would produce an
estimate of the percentage of the subgroup that is undocumented, correct
to within approximately plus or minus 3 percentage points at the 95
percent confidence level.

Low precision could result for smaller subgroups in which there are
relatively few undocumented persons (for example, 10 percent or less),
particularly if-as assumed in tables 1 and 2-there is an even split of
legally present foreign-born persons across the Box A categories of
immigration status cards 1 and 2.84

The independent statistician we consulted indicated that if more than one
grouped answers survey is conducted, combining data across two or more
surveys could help provide larger numbers of respondents for subgroup
analysis. For example, if a large-scale survey were conducted annually,
analysts could combine 2 or 3 years of data to obtain more precise
estimates. (One caveat is that combining data from multiple survey years
reduces the time-specificity associated with the resulting estimate.)
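
As a rough illustration of the gain from pooling, the sketch below
assumes independent samples of equal size in each year and an underlying
percentage that does not change across the pooled years. Under those
assumptions the margin of error shrinks with the square root of the
number of years combined. The function name and the 3-point starting
margin are illustrative only.

```python
import math

def pooled_margin(single_year_margin, years):
    """Margin of error after pooling equal-size, independent annual samples."""
    return single_year_margin / math.sqrt(years)

for years in (1, 2, 3):
    print(years, round(pooled_margin(3.0, years), 2))
```

If the underlying percentage shifts from year to year, the pooled figure
describes an average over the period rather than any single year, which
is the time-specificity caveat noted above.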

Finally, we note that to estimate the numerical size of the undocumented
population,

           o  A grouped answers estimate of the percentage of the
           foreign-born who are undocumented would be combined with a census
           count of the foreign-born or an updated estimate. For example, the
           2000 census counted 31 million foreign-born persons, and the
           Census Bureau later issued an updated estimate of 35.7 million for
           2005.

           o  The specific procedure would be to multiply the percentage
           undocumented (based on the grouped answers data and the
           subtraction procedure) by a census count or an updated estimate of
           the foreign-born population for the year in question.

The precision of the resulting estimate of the numerical size of the
undocumented population would be affected by (1) the precision of the
grouped answers percentage estimate, which is closely related to sample
size, as described above, and (2) any bias in the census count or updated
estimate of the foreign-born population.85 The precision of the grouped
answers percentage is taken into account by using a percentage range (for
example, the estimate plus or minus 3 percentage points) when multiplying.
Although the amount of bias in a census count or updated estimate is
unknown, we believe that any such bias would have a proportional impact on
the calculated numerical estimate of the undocumented population.86

84However, if the percentage undocumented overall were to sharply
decrease, it might be appropriate to change the groupings on the cards to
mitigate this factor.

85Such bias might arise from problems in accurately covering the
foreign-born population. An additional caveat is that coverage of the
undocumented may be lower than coverage of other foreign-born persons. We
examined coverage issues in GAO/GGD-98-164.

To illustrate the proportional impact, we assume that a census count for
total foreign-born is 5 percent too low. Using that count in the
multiplication process would cause the resulting estimate of the size of
the undocumented population to be 5 percent lower than it should be.87 The
situation is analogous for subgroups.88
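
The multiplication step, and the proportional effect of a biased count,
can be sketched as follows. The 35.7 million figure is the 2005 updated
estimate cited above; the 30 percent estimate with a margin of plus or
minus 3 percentage points is hypothetical, not a finding. The function
name is also hypothetical.

```python
def undocumented_count_range(foreign_born, pct_undocumented, margin_points):
    """Return (low, point, high) counts from a percentage estimate and its margin."""
    point = foreign_born * pct_undocumented / 100
    low = foreign_born * (pct_undocumented - margin_points) / 100
    high = foreign_born * (pct_undocumented + margin_points) / 100
    return low, point, high

# Hypothetical grouped answers result: 30 percent, plus or minus 3 points.
low, point, high = undocumented_count_range(35_700_000, 30, 3)
print(f"{low:,.0f} to {high:,.0f} (point estimate {point:,.0f})")

# If the foreign-born count were 5 percent too low, every figure scales by 0.95.
biased = undocumented_count_range(35_700_000 * 0.95, 30, 3)
print(round(biased[1] / point, 2))
```

The range reflects only sampling error in the percentage; any bias in the
count shifts the low, point, and high figures by the same proportion.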

Overall, it seems clear that reasonably precise grouped answers estimates
of the undocumented population and its characteristics require large-scale
data collection efforts but not impossibly large ones.

The Most Efficient Field Strategy Does Not Seem Feasible

A low-cost field strategy would be to insert the new question series in an
existing, nationally representative, large-scale survey-that is, to pose
the grouped answers questions to the foreign-born respondents already
being interviewed. However, based on our review of ongoing large-scale
surveys, the insertion strategy does not seem feasible. Specifically, we
identified four potentially relevant surveys, but none met both the
criteria based on the grouped answers design and the criteria based on
immigrant advocates' concerns.

86This assumes that the census count or updated estimate is a constant.

87Suppose hypothetically that an updated estimate for some future year
estimates the foreign-born population as 40 million and that a grouped
answers estimate of the percentage of foreign-born who are undocumented is
30 percent. Multiplying 40 million by 30 percent would yield an estimate
of 12 million undocumented (hypothetical data). Further suppose that the
true size of the foreign-born population, in that future year, were
actually 42 million. Multiplying 42 million by 30 percent would yield 12.6
million, a result just 5 percent higher than 12 million.

88In contrast, analysts have pointed to a potentially disproportionate,
magnifying impact of bias in census counts (or error in updated estimates
of the size) of the foreign-born population on residual estimates of the
number who are undocumented. See Kenneth Hill, "Estimates of Legal and
Unauthorized Foreign-Born Population for the United States and Selected
States Based on Census 2000," presentation at the U.S. Census Bureau
Conference, Immigration Statistics: Methodology and Data Quality,
Alexandria, Virginia, February 13-14, 2006. Siegel and Swanson (p. 479)
make a similar point.

The dollar costs associated with inserting a grouped answers module are
difficult to calculate in advance because many factors are involved.
However, to suggest the "ball park" within which the cost of a grouped
answers insert might be categorized, if an insertion were possible, we
present the following two examples.

           o  The GSS test, in which a grouped answers question module was
           inserted, cost approximately $100 per interview (more than 200
           interviews were conducted). On average, the question series took
           3.25 minutes. Logically, per-interview costs are likely to be
           higher in relatively small surveys than in larger surveys with
           thousands of foreign-born respondents.

           o  For the much larger Current Population Survey (CPS), with
           interviews covering native-born and foreign-born persons in more
           than 50,000 households, the Census Bureau and BLS told us that "an
           average 10-minute supplement cost $500,000 in 2005."89 This
           implies $10 per interview at the 50,000 level, but per-interview
           costs might be higher if the question series were applied to only a
           portion of the respondents. Additional costs might apply for flash
           cards and foreign-language interviews. BLS noted that still other
           costs would apply for advance testing and subsequent analyses
           requested by the customer.
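
The per-interview arithmetic in the CPS example can be sketched as
follows. The second figure rests on a strong assumption that is not
stated in this report: that the full supplement cost would still be
incurred if only the roughly 6,000 households with foreign-born members
were asked the question series. It is best read as an illustrative upper
bound. The function and constant names are hypothetical.

```python
def cost_per_interview(total_cost, interviews):
    """Average cost per interview for a supplement of a given total cost."""
    return total_cost / interviews

SUPPLEMENT_COST = 500_000        # "average 10-minute supplement" in 2005
ALL_HOUSEHOLDS = 50_000          # approximate CPS household count
FOREIGN_BORN_HOUSEHOLDS = 6_000  # approximate; households with foreign-born members

print(cost_per_interview(SUPPLEMENT_COST, ALL_HOUSEHOLDS))
# Assumption: total cost is unchanged when only foreign-born households are asked.
print(round(cost_per_interview(SUPPLEMENT_COST, FOREIGN_BORN_HOUSEHOLDS), 2))
```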

A more costly option would be to ask the grouped answers question series
in a follow-back survey of foreign-born respondents identified in
interviewing for an existing survey. (In-person self-report interviews can
cost $400 to $600 each.) More costly still would be the development of a
new, personal-interview survey of a representative sample of the
foreign-born population devoted to migration issues; the main reason is
that there would be additional costs in "screening out" households without
foreign-born persons.

We identified four potentially relevant ongoing large-scale surveys. All
have prerequisites and processes for accepting (or not accepting) new
questions. We also developed six criteria for assessing the
appropriateness of each survey as a potential vehicle for fielding the
grouped answers approach. Three criteria are based on design requirements,
and three are based on the views of immigrant advocates. We found that no
ongoing large-scale survey met all criteria.

89More than 6,000 of these households included one or more foreign-born
persons.

Four Ongoing Large-Scale Data Collections Sometimes Accept Additional Questions

We identified four nationally representative, ongoing large-scale surveys
in which respondents are or could be personally interviewed.90 Three of
these conduct most or all interviews in person:

           1. the Current Population Survey (CPS), sponsored by BLS and the
           Census Bureau and fielded by Census;

           2. the National Health Interview Survey (NHIS), sponsored by the
           National Center for Health Statistics (NCHS) and fielded by the
           Census Bureau; and

           3. the National Survey on Drug Use and Health (NSDUH), sponsored
           by SAMHSA and fielded by RTI International, a private sector
           contractor.

The fourth survey is the American Community Survey (ACS), a much larger
survey fielded by the Census Bureau and using "mixed mode" data
collection. The majority of the data are based on mailed questionnaires or
telephone interviews, with the remaining data based on personal
interviews. In addition, there is one personal-interview follow-back
survey that uses the ACS frame and data to draw its sample.91 Other
follow-back surveys might eventually be possible.

For any of these four surveys, inserting a new question or set of
questions (or fielding a "follow-back" survey based on respondents'
answers in the main survey) requires approvals by the Office of Management
and Budget (OMB), the agencies that sponsor or field the surveys, and in
cases in which data are collected by a private sector organization, the
organization's institutional review board.

The prerequisites for an ongoing survey's accepting new questions
typically include low anticipated item nonresponse, pretesting and pilot
testing (including debriefing of respondents and interviewers) that
indicate a minimum of problems, review by stakeholders to determine
acceptability, and tests that indicate no effect on either survey response
rates or answers to the main survey's existing questions.92 Another
prerequisite would be the expectation of response validity.93

90A fifth survey, SIPP, a large-scale in-person survey, is scheduled to be
"reengineered" to provide an "effective alternative to the current SIPP."
It is anticipated that administrative data will be combined with survey
data, although the exact directions that the revised effort will take are
not yet known. (We defined large-scale as 50,000 or more interviews,
including native-born and foreign-born respondents. The foreign-born
represent about 12 percent of the national population, implying that a
survey of 50,000 U.S. residents could be expected to collect data on
roughly 6,000 foreign-born persons.)
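
The arithmetic behind this definition can be run in both directions. The
sketch below assumes the 12 percent share applies uniformly to the sample
and ignores nonresponse and within-household selection; the function
names are hypothetical. The 6,200 target is the table 2 figure for a
3-point margin at the 95 percent confidence level.

```python
import math

FOREIGN_BORN_SHARE = 0.12  # approximate share of the national population

def expected_foreign_born(total_interviews, share=FOREIGN_BORN_SHARE):
    """Expected number of foreign-born persons in a general-population sample."""
    return total_interviews * share

def total_interviews_needed(foreign_born_target, share=FOREIGN_BORN_SHARE):
    """General-population interviews needed to reach a foreign-born target."""
    return math.ceil(foreign_born_target / share)

print(expected_foreign_born(50_000))
print(total_interviews_needed(6_200))
```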

91This follow-back survey concerns alcohol use and alcoholism; it is
sponsored by the National Institute on Alcohol Abuse and Alcoholism. OMB
told us that, in part because ACS is a new survey, very few other
follow-up efforts, if any, are likely to be approved in the next few
years.

Additionally, multiple agencies mentioned a need for prior "cognitive
interviewing," compatibility with existing items (so that there is no need
to change existing items), and no significant increase in "respondent
burden" (by, for example, substantially lengthening the interview).94

Agencies sponsoring or conducting large-scale surveys varied on the
perceived relevance of immigration to the main topic of their survey. For
example, BLS noted that some of its customers would be interested in data
on immigration status by employment status (among the foreign-born), and
the Census Bureau has indicated the relevance of undocumented immigration
to population estimation. But some other agencies saw little relevance to
the large-scale surveys they sponsored or conducted. Resistance to
including a grouped answers question series might occur where an agency
perceives little or no benefit to its survey or its customers.

Additionally, one agency raised the issue of informed consent, which we
discuss in appendix V.

92For example, with respect to possible impacts on answers to main-survey
questions, SAMHSA (which sponsors the NSDUH) indicated a concern that
asking about immigration status might make respondents less likely to
provide honest answers to questions about illegal behaviors such as drug
use (potentially because of fear of such actions as deportation).

93As we discussed in a previous section, experts told us that it is
important to demonstrate that respondents, especially undocumented
respondents, "pick the correct box"-or at least to demonstrate that they
intend to pick the correct box (rather than avoiding Box B).

94Cognitive interviewing focuses on the mental processes of the respondent
while he or she is answering a survey question. The goals are to find out
what each respondent thinks the question is asking, what the specific
words or phrases (or icons on a card) mean to him or her, and how he or
she formulates an answer. Typically, cognitive interviewing is an
iterative process in which the findings or problems identified in each set
of interviews are used to modify the questions to be tested in the next
set of interviews.

No Ongoing Large-Scale Data Collection Met Our Criteria

Based on the design of the grouped answers approach, as tested to date,
two criteria for an appropriate survey are (1) personal interviews in
which respondents can view the 3-box cards and (2) a self-report format in
which questions ask the respondents about their own status (rather than
asking one adult member of a household to report information on others). A
third criterion is that the host survey not include highly sensitive
direct questions that could affect foreign-born respondents' acceptance of
the grouped answers questions.95 We based these criteria on the results of
the GSS test, our knowledge of the grouped answers approach, and general
logic.

As shown in table 3, one of the surveys we reviewed (the CPS) does not
meet the self-report criterion; that is, it accepts proxy responses. Two
other surveys (the NHIS and NSDUH) do not meet the criterion of an absence
of highly sensitive questions, since they include questions on HIV status
(NHIS) and the use of illegal drugs (NSDUH). Conducting a follow-back
survey based on ACS would meet all three criteria.96

95For example, if a respondent had already admitted engaging in a behavior
related to illegal activity, he or she might be less likely to accurately
answer a question on immigration status. Of course, if future testing were
to indicate that a particular type of sensitive item did not affect
immigration responses, this criterion would be dropped.

96The ACS is a mixed-mode rather than a solely personal-interview survey.
It gathers information on all members of a household based, in some cases,
on a single adult respondent-informant rather than randomly selecting one
or more respondents in each household and asking them to provide
information about themselves. However, one follow-back personal interview
survey has based its sample selection on the ACS frame and its data. We
further note that if a follow-back survey based on the CPS could be
conducted, then-provided that the follow-back was designed for self-report
personal interviews-it would meet the criteria in table 3.

Table 3: Survey Appropriateness: Whether Surveys Meet Criteria Based on
the Grouped Answers Design

                              Three design-based criteria
                              1. Are the     2. Are all         3. Are direct
                              data gathered  respondents        questions not
               Specific       in personal    selected to        highly
Survey type    survey         interviews?    self-report?       sensitive?
Ongoing        Current        YES. Mostly,   No. An adult       YES, not
survey         Population     for in-person  respondent reports highly
               Survey (CPS)   waves; 16% of  on self and        sensitive.b
                              foreign-born   provides proxy     
                              interviewed by responses for      
                              telephone, in  others in his or   
                              the in-person  her household.     
                              waves.a        In-person data for 
                                             6,744 households   
                                             with 1 or more     
                                             foreign-born       
                                             members (2006).    
               National       YES. Mostly;   YES. For some      No. There are 
               Health         17% of         questions, but not direct        
               Interview      foreign-born   all, 4,829         questions on  
               Survey (NHIS)  sample adults  foreign-born       HIV, other    
                              interviewed by adults self-report STDs.c        
                              telephone.     (2004).            
               National       YES. All       YES. 7,364         No. There are 
               Survey on Drug interviewed in foreign-born age   direct
               Use and Health person.        12 and older and   questions on  
               (NSDUH)                       4,934 foreign-born respondent's  
                                             age 18+            use and sale  
                                             self-report        of drugs like 
                                             (2004).            marijuana and 
                                                                cocaine.      
Potential      Potential      YES. A         YES. A follow-back YES, not
follow-back    American       follow-back    could specify      highly
survey         Community      could specify  self-report only.  sensitive.
               Survey (ACS)   personal       (ACS data include
               follow-back    interviews     both self-report
               survey, by the only. (ACS is  data and proxy
               Census         mixed mode,    data in which one
               Bureau-on all  mostly mail.)  member of a        
               or a sample of                household provides 
               all                           responses for      
               foreign-born                  others.)           
               on whom ACS                                      
               data were                                        
               collected                                        

Source: GAO analysis.

aThe CPS includes successive data collections or "waves" to update data
over time, at selected households. In some waves, interviews are conducted
in person; in others, by telephone.

bBased on the core CPS questionnaire. (Different modules or supplements
may be added in particular survey years or CPS waves.)

cHIV refers to human immunodeficiency virus. STDs refers to sexually
transmitted diseases.

The views of immigrant advocates, which were echoed by some other experts,
suggested three additional criteria for a candidate "host" survey:

           1. data collection by a university or private sector organization,

           2. no request for the respondent's name or Social Security number,
           and

           3. protection from possible release of grouped answers survey data
           for small geographic areas (to guard against estimates of the
           undocumented for such areas).

The experts based their views on (1) methodological grounds (foreign-born
respondents would be more likely to cooperate, and to respond truthfully,
if all or some of these criteria were met) and (2) concerns about privacy
protections at the individual or group levels.97 These criteria are
potentially important, in part because the success of a self-report
approach hinges on the cooperation of individual immigrants and, most
likely, also on the support of opinion leaders in immigrant communities.98
With respect to the first criterion above, we note that with the exception
of initial GAO pretests, all tests of the grouped answers approach have
involved data collection by a university or private sector organization.
Without further tests, we do not know whether acceptance would be equally
high in a government-fielded survey.

As shown in table 4, an ACS follow-back would potentially not meet any of
the three criteria based on immigrant advocates' views. Only one survey
(NSDUH) met all three criteria based on immigrant advocates' views-and
because of its sensitive questions on drug use, that survey did not meet
the design-based table 3 criteria.

97With respect to the individual level, Census Bureau staff told us that
they are extremely careful not to disclose information, that such
disclosure is prohibited by law, and that the Census Bureau explains this
to respondents. However, they also said that some respondents erroneously
believe that all government agencies share information with one another or
might do so under certain circumstances.

98We note that the relevance of the criteria in table 4 would likely be
heightened if interior enforcement efforts (that is, those conducted away
from border areas) were to sharply increase.

Table 4: Survey Appropriateness: Whether Surveys Meet Table 3 (Design
Based) Criteria and Additional Criteria Based on Immigrant Advocates'
Views

                                      Three additional criteria based on
                                      immigrant advocates' views
                                                      2. Are
                                                      interviews
                                                      anonymous
                            Meets                     (that is,     3. Is sample
                            all       1. Does a       no names or   too small for
                            table 3   nongovernment   Social        reliable
                            (design   organization    Security      small-area
             Specific       based)    conduct field   numbers are   estimates of
Survey type  survey         criteria  work?           taken)?       undocumented?a
Ongoing      Current        No.       No. The Census  No. Takes     YES.
survey       Population               Bureau conducts names.
             Survey (CPS)             field work.b
             National       No.       No. The Census  No. Takes     YES.
             Health                   Bureau conducts both names
             Interview                field work.c    and Social
             Survey (NHIS)                            Security
                                                      numbers.
             National       No.       YES.            YES.          YES.
             Survey of Drug
             Use and Health
             (NSDUH)
Potential    Potential      YES.      No. Only the    No. Takes     Potentially,
follow-back  American                 Census Bureau   names in the  no. A
             Community                can conduct     initial       follow-back
             Survey (ACS)             field work.     survey, and a might be
             follow-back                              follow-back   extremely
             survey by the                            would be      large. (Also,
             Census                                   based on      small-area
             Bureau-on all                            knowing each  releases are
             or a sample of                           person's      not prohibited
             foreign-born                             identity.     by law or
             on whom data                                           policy.)
             were collected

Source: GAO analysis.

Note: Table 3 criteria are personal interviews; respondent reports on
himself or herself; no highly sensitive direct questions.

aFor this report, we define "small area" as below the county level.

bFor CPS, only the Census Bureau can conduct a follow-back.

cFor NHIS, a follow-back by a private sector organization might be
possible.

In conclusion, we did not find a large-scale survey that would be an
appropriate vehicle for "piggybacking" the grouped answers question
series.

                                  Observations

For more than a decade, the Congress has recognized the need to obtain
reliable information on the immigration status of foreign-born persons
living in the United States-particularly, information on the undocumented
population-to inform decisions about changing immigration law and policy,
evaluate such changes and their effects, and administer relevant federal
programs.

Until now, reliable data on the undocumented population have seemed
impossible to collect. Because of the "question threat" associated with
directly asking about immigration status, the conventional wisdom was that
foreign-born respondents in a large-scale national survey would not accept
such questions-or would not answer them authentically.

Testing So Far Affirms That the Grouped Answers Approach Is Promising

Using the grouped answers approach to ask about immigration status seems
promising because it reduces question threat and is statistically logical.
Additionally, this report has established that

           o  The grouped answers approach is acceptable to most foreign-born
           respondents tested (thus far) in surveys fielded by private sector
           organizations; it is also acceptable-with some conditions, such as
           private sector fielding of the survey-to the immigrant advocates
           and other experts we consulted.
           o  A variety of research designs are available to help check
           whether respondents choose (or intend to choose) the correct box.
           o  The grouped answers approach requires a fairly large number of
           personal interviews with foreign-born persons (we estimate 6,000)
           to achieve reasonably precise indirect estimates of the
           undocumented population overall and within high-risk subgroups.

           However, the most cost-efficient method of fielding a grouped
           answers question series-piggybacking on an existing survey-does
           not seem feasible. Rather, fielding the grouped answers approach
           would require a new survey focused on the foreign-born. This
           raises two new questions about "next steps"-and the answers
           depend, in large part, on policymaker judgments, as described
           below.

           Two New Questions about "Next Steps"

           Question 1: Are the costs of a new survey justified by information
           needs? DHS stated (in its comments on a draft of this report) that
           the "information on immigration status and the characteristics of
           those immigrants potentially available through this method would
           be useful for evaluating immigration programs and policies." The
           Census Bureau has indicated that information on the undocumented
           would help estimate the total population in intercensal years. And
           an expert reviewer emphasized that a new survey of the
           foreign-born would be likely to help estimate the total
           population.99

           Additionally, policymakers might deem a new survey of the
           foreign-born to be desirable for reasons other than obtaining
           grouped answers data. Notably, an immigration expert who reviewed
           a draft of this report pointed out that a survey focused on the
           foreign-born might provide more in-depth, higher-quality data on
           that population than existing surveys that cover both the
           U.S.-born and foreign-born populations. For example, more general
           surveys, such as the ACS and CPS, (1) ask a more limited set of
           migration questions than is possible in a survey focused on the
           foreign-born, (2) are not designed with a primary goal of
           maximizing participation by the foreign-born (for example, are not
           conducted by private sector organizations), and (3) as DHS pointed
           out in comments on a draft of this report, may not be designed to
           cover persons who are only temporarily linked to sampled
           households, because such persons may have arrived only recently in
           the United States and are temporarily staying with relatives.100

           A new survey aimed at obtaining grouped answers data on
           immigration status would require roughly 6,000 (or more) personal,
           self-report interviews with foreign-born adults. Other in-person,
           self-report interviews in large-scale surveys have cost $400 to
           $600 each. A major additional cost would be obtaining a
           representative sample of foreign-born persons; this would likely
           require a much larger survey of the general population in which
           "mini-interviews" would screen for households with one or more
           foreign-born individuals.
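
As a rough illustration of the interview cost range implied by the figures
above, the following Python sketch multiplies the approximate number of
interviews (6,000) by the per-interview cost range ($400 to $600). The
function name is hypothetical. The result covers the main interviews only;
it excludes the screening survey and any other costs, which this report did
not study, so it should not be read as an estimate of total survey cost.

```python
def interview_cost_range(n_interviews, cost_low, cost_high):
    """Return (low, high) total cost in dollars for the main interviews only."""
    return n_interviews * cost_low, n_interviews * cost_high


# Figures from the text: roughly 6,000 interviews at $400 to $600 each.
low, high = interview_cost_range(6_000, 400, 600)
print(f"${low:,} to ${high:,}")  # prints "$2,400,000 to $3,600,000"
```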

           We did not study the likely costs of such a data collection or
           options for reducing costs. However, survey costs can be estimated
           (based on, for example, the experience of survey organizations),
           and policymakers can, in the future, weigh those costs against the
           information need-keeping in mind the results of research on the
           grouped answers approach, to date, and experts' opinions on
           research needed.

           Question 2: What further tests of the grouped answers method, if
           any, should be conducted before planning and fielding a new
           survey? On one hand, advance testing could

                        o  assess response validity (that is, whether
                        respondents pick-or intend to pick-the correct box)
                        before committing funds for a survey and in time to
                        allow adjustments to the question series;

                        o  further delineate respondent acceptance and
                        explore the impact on acceptance of factors such as
                        government funding-or funding by a particular
                        agency-in order to inform decisions about whether or
                        how to conduct a survey;101 and

                        o  as suggested in DHS's comments on a draft of this
                        report, help determine the cost of a full-scale
                        survey.102

           On the other hand, extensive advance testing would likely delay
           the survey-and may not be needed because

                        o  response validity could be assessed-and respondent
                        acceptance could be further delineated-concurrently
                        with or subsequent to the survey rather than in
                        advance,103

                        o  the need for advance testing of response validity
                        would be lessened if policymakers see a need for more
                        or better survey data on the foreign-born additional
                        to the need for grouped answers data on immigration
                        status (see discussion in question 1, above);

                        o  the value of advance testing would be lessened if
                        changes in immigration law and policy occurred
                        between the time of an advance test and the main
                        survey, because such changes could affect the context
                        in which the survey questions are asked and, hence,
                        change the operant levels of acceptance and validity;
                        and

                        o  survey costs can be estimated-albeit more
                        roughly-on the basis of the experience of survey
                        organizations.

           Given the arguments for and against advance testing, it seems
           appropriate for policymakers to weigh them.

           Agency Comments

           We provided a draft of this report to and received comments from
           the Department of Commerce, the Department of Homeland Security,
           and the Department of Health and Human Services (see appendices
           VII, VIII, and IX, respectively). The Office of Management and
           Budget provided only technical comments, and the Department of
           Labor did not comment.

           The Census Bureau agreed with the report's discussion of

                        o  the grouped answers method, including its
                        strengths and limitations;
                        o  the Census Bureau-GSS evaluation, including the
                        conclusions of the independent consultant (Alan
                        Zaslavsky); and
                        o  the need for a "validity study" to determine
                        whether the grouped answers method can "generate
                        accurate estimates" of the undocumented population.

           The Census Bureau also provided technical comments, which we used
           to clarify the report, as appropriate.

           The Department of Homeland Security stated that the kinds of
           information that the grouped answers approach would provide, if
           successfully implemented, would be useful for evaluating
           immigration programs and policies. DHS further called for pilot
           testing by GAO to assess the reliability of data collection and to
           help estimate the costs of an eventual survey.104 As we indicate
           in the "observations" section of this report, two key decisions
           for policymakers concern

                        o  whether to invest in a new survey and
                        o  whether substantial testing is required in advance
                        of planning and fielding a survey.

           We believe that depending on the answers to these questions,
           another issue-one we cannot address in this report-would concern
           identifying the most appropriate agency for conducting or
           overseeing (1) tests of the grouped answers approach and (2) an eventual
           survey of the foreign-born population. However, we believe that
           conducting or overseeing such tests or surveys is a management
           responsibility and, accordingly, is not consistent with GAO's role
           or authorities. DHS made other technical comments which we
           incorporated in the report where appropriate.105

           The Department of Health and Human Services (HHS) agreed that the
           NSDUH would not be an appropriate vehicle for a grouped answers
           question series. Commenting on a draft of this report, HHS said
           that the report should include more information on variance
           calculations and on "mirror-image" estimates.106 Therefore, we (1)
           added a footnote illustrating the variance costs of a grouped
           answers estimate relative to a corresponding direct estimate and
           (2) developed appendix VI, which gives the formula for calculating
           the variance of a grouped answers estimate and discusses "mirror
           image" estimates.

           Additionally, HHS said that interviewers should more accurately
           communicate with respondents when presenting the three-box cards.
           We believe that the text of appendix V on informed consent, based
           on our earlier discussions with privacy experts at the Census
           Bureau, deals with this issue appropriately. As we state in
           appendix V, it would be possible to explain to respondents that
           "there will be other interviews in which other respondents will be
           asked about some of the Box B categories or statuses." Finally,
           HHS made other, technical comments, which we incorporated in the
           report, as appropriate.

           The Office of Management and Budget provided technical comments.
           In addition, our discussions with OMB prompted us to re-order some
           of the points in the "observations" section of the report.

           The Department of Labor informed us that it had no substantive or
           technical comments on the draft of the report.

           We are sending copies of this report to the Director of the Census
           Bureau, Secretary of Homeland Security, Secretary of Health and
           Human Services, Secretary of Labor, Director of the Office of
           Management and Budget, and to others who are interested. We will
           also provide copies to others on request. In addition, the report
           will be available at no charge on GAO's Web site at
           http://www.gao.gov.

           If you or your staff have any questions regarding this report,
           please call me at (202) 512-2700. Contact points for our Offices
           of Congressional Relations and Public Affairs may be found on the
           last page of this report. Other key contributors to this
           assignment were Judith A. Droitcour, Assistant Director, Eric M.
           Larson, and Penny Pickett. Statistical support was provided by Sid
           Schwartz, Mark Ramage, and Anna Maria Ortiz.

           Nancy R. Kingsbury, Managing Director Applied Research and Methods

           Appendix I: Scope and Methodology

           To gain insight into the acceptability of the grouped answers
           approach, we discussed the approach with numerous experts in
           immigration studies and immigration issues, including immigrant
           advocates. Table 5 lists the experts we met with and their
           organizations.

99This expert reviewer told us: "One of the biggest issues surrounding
immigration is the scale of in- and out-migration. The failure to
understand this process is one of the biggest reasons that the population
estimates were so far off at the time of the 2000 census. A survey devoted
to the foreign-born could be especially helpful in ensuring that we have
the best weights [information on population] possible, particularly if the
survey could accurately estimate illegal aliens."

100The ACS defines residence in a household as living there for 2 months
(either completed or ongoing). For a discussion of other quality issues in
the ACS, see Steven A. Camarota and Jeffrey Capizzano, "Assessing the
Quality of Data Collected on the Foreign Born: An Evaluation of the
American Community Survey (ACS): Pilot and Full Study Findings,"
Immigration Studies White Papers, Sabre Systems Inc., April 2004.
http://www.sabresys.com/whitepapers/CIS_whitepaper.pdf (Sept. 6, 2006).

101Potentially, the prospects for private sector funding could be
explored. One question would be whether it is possible to identify a
willing private sector source that is not aligned with a particular
perspective on immigration issues.

102Alternatively, survey costs can be estimated-albeit more roughly-on the
basis of the experience of survey organizations.

103Validity tests conducted concurrently with the survey and follow-on
checks that compare survey results against (adjusted) administrative
information would seem to be appropriate, if a survey is, in fact,
fielded.

104DHS suggested that the pilot testing be conducted within a limited
geographic area.

105For example, DHS pointed to the issue of an existing survey (the
American Community Survey) defining residence in a household as living
there for 2 months (either completed or ongoing). DHS said this would
likely exclude some unauthorized and temporary migrants and indicated
that, if a new survey needs to be conducted, it should be designed to
cover all foreign-born persons residing here.

106A grouped answers estimate of the percentage of the foreign born who
are undocumented can be defined as the percentage of subsample 1 who are
in Box B, Card 1, minus the percentage of subsample 2 who are in Box A,
Card 2. Alternatively, a grouped answers estimate could be defined as the
percentage of subsample 2 who are in Box B, Card 2, minus the percentage
of subsample 1 who are in Box A, Card 1. If both calculations are
performed and two estimates are derived, they might be termed "mirror
image" estimates.
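
The two calculations described in this footnote can be sketched in Python as
follows. This is a minimal illustration, not the formula given in appendix
VI: the function and parameter names are hypothetical, and the percentages
are invented solely to show the arithmetic.

```python
def mirror_image_estimates(pct_b_card1, pct_a_card2, pct_b_card2, pct_a_card1):
    """Return both grouped answers estimates, in percentage points."""
    # First estimate: Box B, Card 1 (subsample 1) minus Box A, Card 2 (subsample 2).
    estimate_1 = pct_b_card1 - pct_a_card2
    # Mirror image: Box B, Card 2 (subsample 2) minus Box A, Card 1 (subsample 1).
    estimate_2 = pct_b_card2 - pct_a_card1
    return estimate_1, estimate_2


# Hypothetical box percentages, chosen only to illustrate the arithmetic.
est_1, est_2 = mirror_image_estimates(
    pct_b_card1=62.0, pct_a_card2=35.0,
    pct_b_card2=64.0, pct_a_card1=36.0,
)
print(est_1, est_2)  # prints "27.0 28.0"
```

Because each subsample is drawn separately, the two estimates would generally
differ somewhat, as they do in this invented example.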

Table 5: Experts GAO Consulted on Immigration Issues or Immigration
Studies

Name and title                         Organization                        
Steven A. Camarota, Director of        Center for Immigration Studies      
Research                               
Robert Deasy, Director, Liaison and    American Immigration Lawyers        
Information                            Associationa                        
                                          
Crystal Williams, Deputy Director      
J. Traci Hong, Director of Immigration Asian American Justice Centera      
Program                                
                                          
Terry M. Ao, Director of Census and    
Voting Programs                        
Guillermina Jasso, Professor of        New York University                 
Sociology                              
Benjamin E. Johnson, Director of       American Immigration Law            
Policy, Immigration Policy Center      Foundationa                         
John L. (Jack) Martin, Director,       Federation for American Immigration 
Special Projects                       Reform                              
                                          
Julie Kirchner, Deputy Director of     
Government Relations                   
Douglas S. Massey, Professor of        Princeton University                
Sociology and Public Affairs           
Mary Rose Oakar, President             American-Arab Anti-Discrimination   
                                          Committeea                          
Thomas A. Albert, Director of          
Government Relations                   
                                          
Leila Laoudji, Deputy Director of      
Legal Advocacy                         
                                          
Kareem W. Shora, Director, Legal       
Department and Policy                  
Demetrios G. Papademetriou, President  Migration Policy Institute          
Jeffrey S. Passel, Senior Research     Pew Hispanic Center                 
Associate                              
Eric Rodriguez, Director, Policy       National Council of La Razaa        
Analysis Center                        
                                          
Michele L. Waslin, Director,           
Immigration Policy Research            
Helen Hatab Samhan, Executive Director Arab American Institute Foundationa 
James J. Zogby, President              Arab American Institutea            
                                          
Rebecca Abou-Chedid, Government        
Relations and Policy Analyst           
                                          
Nidal M. Ibrahim, Executive Director   

Source: GAO.

Note: Other immigration experts we briefly consulted with by telephone or
e-mail or in conversations at an immigration conference included George
Borjas, Professor of Economics and Public Policy, Harvard University;
Georges Lemaitre, Directorate for Employment, Labour, and Social Affairs,
Organisation for Economic Co-operation and Development, Paris, France;
Enrico Marcelli, Assistant Professor of Economics, University of
Massachusetts at Boston; Randall J. Olson, Director, Center for Human
Resource Research, The Ohio State University; and Michael S. Teitelbaum,
Vice President, Alfred P. Sloan Foundation, New York.

aOrganization advocating for immigrants or expressly dedicated to
representing their views. We call such organizations immigrant advocates,
although some may not, for example, lobby for legislation.

To ensure that we identified immigration experts from varied perspectives,
we consulted Demetrios G. Papademetriou, who is among the immigration
experts listed in table 5, and Michael S. Teitelbaum, Vice President of
the Alfred P. Sloan Foundation. With respect to immigrant advocates, we
sought to include advocates who represented (1) immigrants in general,
without respect to ethnicity; (2) Hispanic immigrants, as these are the
largest group of foreign-born residents; (3) Asian American immigrants, as
these are also a large group; and (4) Arab American immigrants, as these
have been the target of interior (that is, nonborder) enforcement efforts
in recent years.

To determine what the 2004 General Social Survey (GSS) test indicated
about the acceptability of grouped answers questions to foreign-born
respondents and its "general usability" in large-scale surveys, we
obtained the Census Bureau's report of its analysis of those data, and we
assessed the reliability of the GSS data through a comparison of answers
to interrelated questions. Then we

           o  submitted the Census Bureau's report of its analysis to Dr.
           Alan Zaslavsky, an independent expert, for review;
           o  developed our own analysis of the GSS data and submitted our
           paper describing that analysis to the same expert;1 and
           o  summarized the expert's conclusions and appended his report and
           the Census Bureau's report (reproduced in appendixes III and IV),
           as well as summarizing our conclusions.2

We used these procedures to ensure independence, given that the GSS test
was based on our earlier recommendation that the Census Bureau and the
Department of Homeland Security (DHS) test the grouped answers approach.3

1The independent review considered the Census Bureau and GAO analyses of
the GSS data in terms of (1) their overall reasonableness and
thoroughness, given the general objective (describing respondents'
acceptance and understanding), (2) key points of difference (if any)
between the two analyses or differences in conclusions, (3) whether the
analyses raised unanswered questions that should be addressed, and (4)
whether the conclusions appeared to be justified. The reviewer was also
free to comment on other aspects of the analyses.

2We believe this report independently addresses respondent acceptability
because we (1) focus on the results of the GSS test (rather than
critiquing the Census Bureau's work), (2) report how the method performed
rather than subjectively assessing its merit, and (3) rely on an
independent expert.

To describe additional research that might be needed, we outlined the
grouped answers approach and reviewed the main conclusions of the GSS test
in meetings with the immigration experts listed in table 5 and with
private sector statisticians.4 Additionally, we discussed the approach
with various federal officials and staff at agencies responsible for
fielding large-scale surveys.5

To assess the precision of indirect estimates, we addressed questions to
Dr. Zaslavsky, developed illustrative tables showing hypothetical
calculations under specified assumptions, and subjected those tables to
review.

To identify and describe candidate surveys for piggybacking the grouped
answers question series, we set minimum criteria for consideration
(nationally representative, mainly or only in-person interviews, and data
on at least 50,000 persons overall, including native-born and
foreign-born). Then we identified surveys that met those criteria,
collected documents concerning the surveys, and interviewed officials and
staff at federal agencies that sponsored or conducted those surveys. We
also talked with experts in immigration about additional key criteria for
selecting an appropriate survey.

The scope of our work had several limitations. We did not attempt to
collect new data from foreign-born respondents in a survey, focus group,
or other format. We did not assess census or survey coverage of the
foreign-born or undocumented populations.6 We did not assess nonresponse
rates among foreign-born or undocumented persons selected for interview.
We did not review alternative methods of obtaining estimates of the
undocumented.

3DHS contributed to the funding of the Census Bureau's contract with the
National Opinion Research Center (NORC) for the insertion of a module
(question series) into the GSS.

4We consulted with Alan Zaslavsky, Fritz Scheuren, and Mary Grace Kovar.

5In our earlier work, we consulted with numerous other private sector
experts on immigration and statistics. For those experts, see
GAO/GGD-00-30, p. 29.

While we consulted a number of private sector experts and sought to
include a range of perspectives, other experts may have other views.
Finally, we do not know to what extent the broad range of persons who
compose immigrant communities share the views of the immigrant advocates
we spoke with.

6In 1998, we recommended that the Commissioner of the Immigration and
Naturalization Service (INS) and the Director of the Census Bureau "devise
a plan of joint research for evaluating the quality of census and survey
data on the foreign-born," based on our discussion of the need to evaluate
coverage and possible methods for doing so (see GAO/GGD-98-164). This
recommendation is still open. In 2002, Census Bureau staff assumed that 15
to 20 percent of the undocumented were not enumerated in the 1990 census
and stated the belief that coverage of this group improved in the 2000
census. (See Joseph Costanzo and others, "Evaluating Components of
International Migration: The Residual Foreign-Born," Population Division
Working Paper 61, U.S. Census Bureau, Washington, D.C., June 2002, p. 22.)
However, the Census Bureau has not quantitatively estimated the coverage
of either the foreign-born population overall or the undocumented
population.

Appendix II: Estimating Characteristics,
Costs, and Contributions of the Undocumented Population

Key Characteristics Can Be Estimated

Logically, grouped answers data can be used to estimate subgroups of the
undocumented population, using the following procedures:

           1. isolate survey data for (a) the subsample 1 respondents who are
           in the desired subgroup, based on a demographic or other question
           asked in the survey (for example, if the survey included a
           question on each respondent's employment, data could be isolated
           for foreign-born who are employed), and (b) subsample 2
           respondents in that subgroup;

           2. calculate (a) the percentage of the subsample 1 subgroup
           respondents who are in each box of immigration status card 1 and
           (b) the percentage of subsample 2 subgroup respondents who are in
           each box of immigration status card 2; and

           3. carry out the subtraction procedure (percentage in Box B, Card
           1, minus percentage in Box A, Card 2), thus estimating the
           percentage of the subgroup who are undocumented.

The resulting percentage can be multiplied by a census count or an updated
estimate of the foreign-born persons who are in the subgroup (for example,
multiply the estimate of the percentage of employed foreign-born who are
undocumented by the census count or updated estimate of the number of
employed foreign-born).
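
The three steps and the final multiplication can be sketched in Python as
follows. This is an illustration under simplifying assumptions, not a
production estimator: the record layout, function names, toy data, and the
subgroup count of 1,000 are all hypothetical, and the percentages are
unweighted, whereas an actual survey would apply sampling weights.

```python
from dataclasses import dataclass


@dataclass
class Respondent:
    subsample: int   # 1 = shown Card 1, 2 = shown Card 2
    box: str         # box chosen: "A", "B", or "C"
    employed: bool   # hypothetical subgroup question


def percent_in_box(respondents, subsample, box):
    """Unweighted percentage of one subsample's respondents who chose a box."""
    group = [r for r in respondents if r.subsample == subsample]
    if not group:
        raise ValueError("no respondents in subsample")
    return 100.0 * sum(r.box == box for r in group) / len(group)


def estimate_subgroup(respondents, in_subgroup, subgroup_count):
    """Apply steps 1-3, then scale the percentage by an outside subgroup count."""
    subgroup = [r for r in respondents if in_subgroup(r)]    # step 1
    pct_b_card1 = percent_in_box(subgroup, 1, "B")           # step 2(a)
    pct_a_card2 = percent_in_box(subgroup, 2, "A")           # step 2(b)
    pct_undocumented = pct_b_card1 - pct_a_card2             # step 3
    return pct_undocumented, subgroup_count * pct_undocumented / 100.0


# Toy data: 10 employed respondents per subsample, plus one who is not employed.
sample = (
    [Respondent(1, "B", True)] * 6 + [Respondent(1, "A", True)] * 4
    + [Respondent(2, "A", True)] * 4 + [Respondent(2, "B", True)] * 6
    + [Respondent(1, "B", False)]
)
pct, count = estimate_subgroup(sample, lambda r: r.employed, 1_000)
print(pct, count)  # prints "20.0 200.0"
```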

These steps can be repeated to indirectly estimate the size of the
undocumented population within various subgroups defined by activity,
demographics, and other characteristics (such as those with or without
health insurance) that are asked about in the survey. Without an extremely
large survey, it would be difficult or impossible to derive reliable
estimates for subgroups with few foreign-born persons or few undocumented
persons. Ongoing surveys conducted annually have sometimes combined 2 or 3
years of data in order to provide more reliable estimates of
low-prevalence groups; however, there is a loss of time-specificity.

Some Program Costs Can Be Estimated

Program cost data are sometimes available on an average per-person basis,
and surveys sometimes ask about benefit use. In such cases, the total
costs of a program associated with a certain group can be estimated.
Program costs associated with the undocumented population might be
estimated by either (1) multiplying the estimated numbers of undocumented
persons receiving benefits by average program costs or (2) performing the
following procedures:

           1. Isolate survey data for all foreign-born subsample 1
           respondents who said they were in Box B of Card 1 and estimate
           each individual respondent's program cost.1 Then aggregate the
           individual costs to estimate the total program cost (potentially,
           millions or billions of dollars) associated with the population of
           foreign-born persons defined by the group of immigration statuses
           in Box B, Card 1.

           2. Isolate data for all foreign-born subsample 2 respondents who
           said they were in Box A of Card 2 and, as above, estimate each
           individual respondent's program costs, aggregating these to
           estimate the total program costs associated with the population of
           foreign-born persons defined by the immigration statuses in Box A,
           Card 2 (again, potentially millions or billions of dollars).

           3. Because the only difference between the immigration statuses in
           Box B, Card 1, and Box A, Card 2, is the inclusion of the
           undocumented status in Box B, Card 1, start with the total program
           cost estimate for all Box B, Card 1, respondents and subtract the
           corresponding cost estimate for Box A, Card 2, respondents.

The result of the subtraction procedure represents an indirect estimate of
program costs associated with the undocumented population. A more precise
cost estimate can be obtained by calculating an additional "mirror image"
cost estimate--this time, starting with costs estimated for respondents in
Box B of Card 2 and subtracting costs associated with respondents in Box A
of Card 1. The two "mirror image" estimates could then be averaged.
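
The cost procedure, including the mirror-image average, might be sketched as
follows. This is a minimal illustration under stated assumptions: each
respondent record is assumed to carry a hypothetical survey weight (the number
of persons represented, scaled so that each subsample represents the whole
foreign-born population) and a hypothetical individual program cost. The field
names and all figures are invented for the example.

```python
def weighted_total(records, box):
    """Sum weight * cost over respondents who chose the given box."""
    return sum(r["weight"] * r["cost"] for r in records if r["box"] == box)


def undocumented_cost(card1_records, card2_records):
    """Return both subtraction estimates of program cost and their average."""
    # Box B, Card 1, minus Box A, Card 2
    first = weighted_total(card1_records, "B") - weighted_total(card2_records, "A")
    # Mirror image: Box B, Card 2, minus Box A, Card 1
    mirror = weighted_total(card2_records, "B") - weighted_total(card1_records, "A")
    return first, mirror, (first + mirror) / 2


# Hypothetical respondent records for the two subsamples.
card1_records = [
    {"box": "B", "weight": 1000, "cost": 500.0},
    {"box": "B", "weight": 1000, "cost": 300.0},
    {"box": "A", "weight": 1000, "cost": 200.0},
]
card2_records = [
    {"box": "A", "weight": 1000, "cost": 400.0},
    {"box": "B", "weight": 1000, "cost": 450.0},
    {"box": "B", "weight": 1000, "cost": 250.0},
]

first, mirror, average = undocumented_cost(card1_records, card2_records)
```

The two subtraction estimates draw on different respondents, so they generally
differ; averaging them uses the cost data from all four boxes.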

The key limitations on such procedures are sample size and the
representation of key subgroups--for example, foreign-born respondents
residing in small states and local areas. Thus, for example, it is
possible that state-level costs associated with undocumented persons might
be estimated with reasonable precision for a large state or city with many
foreign-born persons and a relatively high percentage of undocumented
(potentially, California or New York City) but not for many smaller states
or areas, unless very large samples (or samples focused on selected areas
of interest) were drawn. Further work could explore the ways that complex
analyses could be conducted to help delineate costs.

1Program costs associated with an individual respondent (or
those in very refined subgroups) are sometimes estimated from a
combination of (1) answers to specific questions (such as whether the
person is attending public school in the school district where he or she
lives or how many emergency room visits he or she made) and (2) separately
available information on program costs per individual (for example, the
per-pupil costs of public education in specific school districts or the
per-visit costs of emergency room care).

Contributions Might Be Estimated

Contributions can be conceptualized as contributions to the economy
through work or, potentially, through taxes paid. Such contributions might
be estimated by combining grouped answers data with other survey questions
to estimate relevant subgroups, such as employed undocumented persons. In
complex analyses, these data could potentially be combined with other data
to help estimate taxes paid.

Logically, Estimates Can Be Made of Undocumented Children

Logically, other quantitative estimates might be obtained through
procedures similar to those outlined above for estimating program costs.
For example, the numbers of children in various immigration statuses might
be estimated by asking an adult respondent how many foreign-born children
(or how many foreign-born school-age children) reside in the household and
then--using the 3-box card assigned to the adult respondent--asking how many
of these children are in Box A, Box B, and Box C.2 We note that, thus far,
testing has not asked respondents to report children's immigration status
with the grouped answers approach.

Other Estimates May Be Possible

If subsamples 1 and 2 are sufficiently large, it might also be possible to
estimate the portion of the undocumented population represented by

           o  "overstays" who were legally admitted to this country for a
           specific authorized period of time but remained here after that
           period expired (without a timely application for extension of stay
           or change of status)3 and
           o  currently undocumented persons who are applicants for legal
           status and are waiting for DHS to approve (or disapprove) their
           application.

2Potentially, based on the location of the responding household, state and
local per-pupil school costs could be obtained. Totaling state and local
school costs for foreign-born children in each box would be followed by a
group-level subtraction. In this way, the costs of schooling undocumented
immigrant children could be estimated--nationally and potentially for key
states--without ever categorizing any child as undocumented and without
ever estimating the number of undocumented children in any school
district.

3See GAO, Overstay Tracking: A Key Component of Homeland Security and a
Layered Defense, GAO-04-82 (Washington, D.C.: May 21, 2004).

To estimate overstays would require a separate question on whether the
respondent had entered the country on a temporary visa.4 To estimate
undocumented persons with pending applications would require a separate
question concerning pending applications for any form of legal status
(including, for example, applications for U.S. citizenship as well as
applications for legal permanent resident status and other legal
statuses).

The precision of such estimates would depend on factors such as sample
size, the percentages of foreign-born who came in on temporary visas or
who have pending applications of some kind, and the numbers of
undocumented persons within these groups.

4See Judith A. Droitcour and Eric M. Larson, "An Innovative Technique for
Asking Sensitive Questions: The Three-Card Method," Bulletin de
Methodologie Sociologique, 75 (July 2002): 5-23.

Appendix III: A Review of Census Bureau and GAO Reports on the Field Test
of the Grouped Answer Method 

Appendix IV: A Brief Examination of Responses Observed while Testing an
Indirect Method for Obtaining Sensitive Information 

Appendix V: The Issue of Informed Consent

Appropriately informing each respondent about what information he or she
is being asked to provide is a key issue. On one hand, the grouped answers
approach logically conveys to each respondent exactly what he or she is
being asked to reveal about himself or herself; no one we spoke with
suggested otherwise. On the other hand, the grouped answers question
series does not indicate that the respondent is being asked to participate
in an effort that will result in estimates of all immigration statuses.
Therefore, a statement is needed to convey this information.

Officials and staff at the National Center for Health Statistics (NCHS)
were particularly concerned about this issue and believed that failing to
adequately address informed consent issues could be considered unethical.1

Privacy protection specialists at the Census Bureau said that

           o  An introductory statement before the first immigration-related
           question might be phrased, "The next questions are geared to
           helping us know more about immigration and the role that it plays
           in American life."
           o  When each respondent is shown the 3-box training cards, it
           would be possible to explain to him or her that--while the survey
           does not ask, and does not want to know, the specifics of which
           Box B category applies to him or her--there will be other
           interviews in which other respondents will be asked about some of
           the Box B categories or statuses.2
           o  Just before showing each respondent the immigration status
           card, it should be stated--and, in fact, interviewers stated in the
           test with Hispanic farmworkers--that "Using the boxes allows us to
           obtain the information we need, without asking you to give us
           information that you might not want to." Further: "Because we're
           using the boxes, we WON'T 'zero in' on anything somebody might not
           want to tell us."3
           o  It may also be possible to explain that the study's goal is to
           allow researchers to broadly estimate all categories or statuses
           on the card for the population of immigrants--but to indicate that
           this will be done without ever asking questions that "zero in" on
           something that some respondents might not want to disclose in an
           interview.
           o  Neither the estimation method (that is, the two cards) nor the
           specific policy relevance of immigration-status estimates would
           have to be described to all respondents. However, interviewer
           statements should be provided for responding to respondents who
           have doubts or questions.

1None of the immigration experts we interviewed raised this issue,
however.

2Thus far, testing has included only one immigration status card, so test
interviewers have not told respondents that other respondents will be
providing information on some of the Box B statuses.

3See GAO/GGD-00-30.

Appendix VI: A Note on Variances and "Mirror Image" Estimates

The statistical expression and variance of a grouped answers estimate are
as follows, with the starting point being the percentage or proportion of
subsample 1 who are in Box B, Card 1, and the procedure being to subtract
from this the proportion of subsample 2 who are in Box A, Card 2 (with
cards and boxes as defined in figure 3):1

           Grouped answers estimate = p1 - p2

           where
           p1 = the proportion of subsample 1 in Box B, Card 1
           p2 = the proportion of subsample 2 in Box A, Card 2

           Variance (p1 - p2) = (p1q1/n1) + (p2q2/n2)

           where
           q1 = 1 - p1 = the proportion of subsample 1 not in Box B, Card 1
           q2 = 1 - p2 = the proportion of subsample 2 not in Box A, Card 2
           n1 and n2 = numbers of respondents in subsamples 1 and 2,
           respectively.
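
For readers who wish to try the variance expression with their own figures, a
minimal Python sketch follows. The function names and the input values are
hypothetical. The conversion to a margin of error uses the conventional normal
approximation with z = 1.96 for 95 percent confidence, which is an assumption
added here, and simple random sampling is assumed throughout.

```python
import math


def grouped_answers_variance(p1, n1, p2, n2):
    """Variance of p1 - p2: (p1*q1/n1) + (p2*q2/n2), with q = 1 - p."""
    return p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2


def margin_of_error(p1, n1, p2, n2, z=1.96):
    """Approximate 95 percent margin of error (normal approximation)."""
    return z * math.sqrt(grouped_answers_variance(p1, n1, p2, n2))


# Hypothetical inputs: 60 percent of subsample 1 in Box B of Card 1,
# 30 percent of subsample 2 in Box A of Card 2, 1,000 respondents each.
estimate = 0.60 - 0.30
variance = grouped_answers_variance(0.60, 1000, 0.30, 1000)
moe = margin_of_error(0.60, 1000, 0.30, 1000)
```

Because the two subsample variances add, the estimate of a difference is less
precise than either box percentage taken alone at the same subsample size.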

The immigration status cards in figure 3 are designed so that Boxes A and
B include all major immigration statuses. This design ensures that, on
each card, the Box B categories apply to the largest possible number of
legally present respondents. In designing the cards this way, we reasoned
that this should reduce the question threat associated with choosing Box
B.

As a result, few respondents are expected to choose Box C ("some other
category not in Box A or Box B"). For example, in the 2004 GSS test, only
one foreign-born respondent of more than 200 chose Box C. Therefore, we
believe that for purposes of illustrative variance calculations, it is
reasonable to assume that no one chooses Box C. Under this assumption, the
two mirror-image estimates of the percentage of the foreign-born who are
undocumented would necessarily be exactly the same, as explained below.

Assuming that no respondent chooses Box C, then

           q1 = 1 - p1 = the proportion of subsample 1 in Box A, Card 1
           q2 = 1 - p2 = the proportion of subsample 2 in Box B, Card 2

1For simplicity, the discussion in this appendix assumes simple random
sampling, for both the main sample and the selection of the two
subsamples.

The alternative, mirror-image estimate can then be defined as follows:

           Mirror-image estimate = q2 - q1

As indicated above, q1 and q2 are defined in terms of p1 and p2. Using
algebraic substitution, we have:

           p1 - p2 = (1 - q1) - (1 - q2) = 1 - 1 - q1 + q2 = q2 - q1

In other words, under the assumption that no one chooses Box C, the
mirror-image estimates of the percentage undocumented are, by definition,
identical. Thus, no precision gain follows from combining them.2 No
additional information is provided by a second, mirror-image estimate.
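
The identity can be checked numerically with a short Python sketch. The box
shares below are hypothetical. The second case, in which 1 percent of
subsample 1 chooses Box C, is included only to show that the two estimates
then differ by the Box C share.

```python
def mirror_image_estimates(card1_shares, card2_shares):
    """Return (Box B Card 1 - Box A Card 2, Box B Card 2 - Box A Card 1)."""
    direct = card1_shares["B"] - card2_shares["A"]
    mirror = card2_shares["B"] - card1_shares["A"]
    return direct, mirror


# Case 1: no respondent chooses Box C on either card.
no_c = mirror_image_estimates(
    {"A": 0.40, "B": 0.60, "C": 0.00},
    {"A": 0.30, "B": 0.70, "C": 0.00},
)

# Case 2: 1 percent of subsample 1 chooses Box C.
with_c = mirror_image_estimates(
    {"A": 0.40, "B": 0.59, "C": 0.01},
    {"A": 0.30, "B": 0.70, "C": 0.00},
)

gap_no_c = abs(no_c[0] - no_c[1])        # zero up to floating-point rounding
gap_with_c = abs(with_c[0] - with_c[1])  # equals the Box C share
```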

In contrast, quantitative indirect estimates are based on a combination of
(1) grouped answers data and (2) additional, separate quantitative data or
estimates (for example, per-person estimates of emergency-visit costs
based on respondent reports of number of emergency room visits in the past
year and other information from hospitals on per-visit costs). If the
quantitative data are tallied or totaled for individuals in each box of
each card, the result is four different figures, none of which can be
derived from the others. (There are different respondents in each box, and
each would have separately reported how many emergency room visits, for
example, he or she made in the past year.) Thus, for quantitative
estimates of this type, calculating two independent mirror-image
estimates, and averaging them, may yield a more precise result.

2Logically, if very few persons choose Box C, the precision gains from
combining the mirror-image estimates (which would necessarily be very
similar to each other) would be very small.

Appendix VII: Comments from the Department of Commerce

Appendix VIII: Comments from the Department of Homeland Security

Appendix IX: Comments from the Department of Health and Human Services

Appendix X: GAO Contact and Staff Acknowledgments

GAO Contact

Nancy R. Kingsbury, (202) 512-2700 or [email protected].

Staff Acknowledgments

Key GAO staff contributing to this report include Judith A. Droitcour,
Eric M. Larson, and Penny Pickett. Statistical support was provided by Sid
Schwartz, Mark Ramage, and Anna Maria Ortiz.

Bibliography

Bird, Ronald. Statement of Ronald Bird, Chief Economist, Office of the
Assistant Secretary for Policy, U.S. Department of Labor, before the
Committee on the Judiciary, U.S. Senate, July 5, 2006.

Boruch, Robert, and Joe S. Cecil. Assuring the Confidentiality of Social
Research Data. Philadelphia: University of Pennsylvania Press, 1979.

Camarota, Steven A., and Jeffrey Capizzano. "Assessing the Quality of Data
Collected on the Foreign Born: An Evaluation of the American Community
Survey (ACS): Pilot and Full Study Findings," Immigration Studies White
Papers, Sabre Systems Inc., April 2004.
http://www.sabresys.com/whitepapers/CIS_whitepaper.pdf (Sept. 6, 2006).

Costanzo, Joseph, and others, "Evaluating Components of International
Migration: The Residual Foreign-Born," Population Division Working Paper
61, U.S. Census Bureau, Washington, D.C., June 2002, p. 22.

Droitcour, Judith A., and Eric M. Larson, "An Innovative Technique for
Asking Sensitive Questions: The Three-Card Method," Bulletin de
Methodologie Sociologique, 75 (July 2002): 5-23.

El-Badry, Samia, and David A. Swanson, "Providing Special Census
Tabulations to Government Security Agencies in the United States: The Case
of Arab-Americans," paper presented at the 25th International Population
Conference of the International Union for the Scientific Study of
Population, Tours, France, July 18-23, 2005.

Hill, Kenneth. "Estimates of Legal and Unauthorized Foreign-Born
Population for the United States and Selected States Based on Census
2000." Presentation at the U.S. Census Bureau Conference, Immigration
Statistics: Methodology and Data Quality, Alexandria, Virginia, February
13-14, 2006.

Hoefer, Michael, Nancy Rytina, and Christopher Campbell. Estimates of the
Unauthorized Immigrant Population Residing in the United States: January
2005. Washington, D.C.: Department of Homeland Security, Office of
Immigration Statistics, August 2006.

GAO. Undocumented Aliens: Questions Persist about Their Impact on
Hospitals' Uncompensated Care Costs, GAO-04-472. Washington, D.C.: May 21,
2004.

GAO. Illegal Alien Schoolchildren: Issues in Estimating State-by-State
Costs, GAO-04-733. Washington, D.C.: June 23, 2004.

GAO. Overstay Tracking: A Key Component of Homeland Security and a Layered
Defense, GAO-04-82. Washington, D.C.: May 21, 2004.

GAO. Record Linkage and Privacy: Issues in Creating New Federal Research
and Statistical Information. GAO-01-126SP. Washington, D.C.: April 2001.

GAO. Survey Methodology: An Innovative Technique for Estimating Sensitive
Survey Items, GAO/GGD-00-30. Washington, D.C.: November 1999.

GAO. Immigration Statistics: Information Gaps, Quality Issues Limit
Utility of Federal Data to Policymakers, GAO/GGD-98-164. Washington, D.C.:
July 31, 1998.

Greenberg, Bernard G., and others. "The Unrelated Question Randomized
Response Model: Theoretical Framework." Journal of the American
Statistical Association, 64 (1969): 520-39.

Kincannon, Charles Louis, "Procedures for Providing Assistance to
Requestors for Special Data Products Known as Special Tabulations and
Extracts," memorandum to Associate Directors, Division Chiefs, Bureau of
the Census, Washington, D.C., August 26, 2004.

Locander, William, and others. "An Investigation of Interview Method,
Threat, and Response Distortion." Journal of the American Statistical
Association, 71 (1976): 269-75.

National Research Council, Committee on National Statistics, Local Fiscal
Effects of Illegal Immigration: Report of a Workshop. Washington, D.C.:
National Academy Press, 1996.

Passel, Jeffrey S. "The Size and Characteristics of the Unauthorized
Migrant Population in the U.S.: Estimates Based on the March 2005 Current
Population Survey." Research Report. Washington, D.C.: Pew Hispanic
Center, March 7, 2006.

Passel, Jeffrey S., Rebecca L. Clark, and Michael Fix. "Naturalization and
Other Current Issues in U.S. Immigration: Intersections of Data and
Policy," In Proceedings of the Social Statistics Section of the American
Statistical Association: 1997. Alexandria, Va.: American Statistical
Association, 1997.

Robinson, J. Gregory. "Memorandum for Donna Kostanich." DSSD A.C.E.
Revision II Memorandum Series No. PP-36. Washington, D.C.: U.S. Bureau of
the Census, December 31, 2002.

Rytina, Nancy F. Estimates of the Legal Permanent Resident Population and
Population Eligible to Naturalize in 2004. Washington, D.C.: Department of
Homeland Security, Office of Immigration Statistics, February 2006.

Shryock, Henry S., and Jacob S. Siegel and Associates. The Methods and
Materials of Demography. Washington, D.C.: U.S. Government Printing
Office, 1980.

Siegel, Jacob S., and David A. Swanson. The Methods and Materials of
Demography, 2nd ed. San Diego, Calif.: Elsevier Academic Press, 2004.

U.S. Census Bureau, "The U.S. Census Bureau's Intercensal Population
Estimates and Projections Program: Basic Underlying Principles," paper
distributed by the Census Bureau at its conference on Population
Estimates: Meeting User Needs, Alexandria, Virginia, July 19, 2006.

U.S. Commission on Immigration Reform. U.S. Immigration Policy: Restoring
Credibility: 1994 Report to Congress. Washington, D.C.: U.S. Government
Printing Office, 1994.

U.S. Immigration and Naturalization Service, Office of Policy and
Planning. Estimates of the Unauthorized Immigrant Population Residing in
the United States: 1990 to 2000. Washington, D.C.: January 2003.

U.S. Department of Labor, Findings from the National Agricultural Workers
Survey (NAWS) 2000-2002: A Demographic and Employment Profile of United
States Farm Workers. Research Report 9. Washington, D.C.: March 2005.

Warner, Stanley. "Randomized Response: A Survey Technique for Eliminating
Evasive Answer Bias." Journal of the American Statistical Association, 60
(1965): 63-69.

Warren, Robert, and Jeffrey S. Passel. "A Count of the Uncountable:
Estimates of Undocumented Aliens Counted in the 1980 Census." Demography,
24:3 (1987): 375-93.

Highlights of GAO-06-775 , a report to the Subcommittee on Terrorism,
Technology and Homeland Security, Committee on the Judiciary, U.S. Senate

September 2006

ESTIMATING THE UNDOCUMENTED POPULATION

A "Grouped Answers" Approach to Surveying Foreign-Born Respondents

As greater numbers of foreign-born persons enter, live, and work in the
United States, policymakers need more information--particularly on the
undocumented population, its size, characteristics, costs, and
contributions. This report reviews the ongoing development of a potential
method for obtaining such information: the "grouped answers" approach. In
1998, GAO devised the approach and recommended further study. In response,
the Census Bureau tested respondent acceptance and recently reported
results.

GAO answers four questions. (1) Is the grouped answers approach acceptable
for use in a national survey of the foreign-born? (2) What further
research may be needed? (3) How large a survey is needed? (4) Are any
ongoing surveys appropriate for inserting a grouped answers question
series (to avoid the cost of a new survey)?

For this study, GAO consulted an independent statistician and other
experts, performed test calculations, obtained documents, and interviewed
officials and staff at federal agencies.

The Census Bureau and DHS agreed with the main findings of this report.
DHHS agreed that the National Survey of Drug Use and Health is not an
appropriate survey for inserting a grouped answers question series.

What GAO Recommends

GAO makes no new recommendations in this report.

The grouped answers approach is designed to ask foreign-born respondents
about their immigration status in a personal-interview survey. Immigration
statuses are grouped in Boxes A, B, and C on two different flash
cards--with the undocumented status in Box B. Respondents are asked to pick
the box that includes their current status and are told, "If it's in Box
B, we don't want to know which specific category applies to you."

A random half of respondents are shown the card on the left of the figure
(Card 1), resulting in estimates of the percentage of the foreign-born
population who are in each box of that card. The other half of the
respondents are shown the card on the right, resulting in corresponding
estimates for slightly different boxes. (No one sees both cards.) The
percentage undocumented is estimated by subtraction: The percentage of the
foreign-born who are in Box B of one card minus the percentage who are in
Box A of the other card.

Immigration Status Cards 1 and 2

The grouped answers approach is acceptable to many experts and immigrant
advocates--with certain conditions, such as (for some advocates) private
sector data collection.

Most respondents tested did not object to picking a box. Research is
needed to assess issues such as whether respondents pick the correct box.
A sizable survey--roughly 6,000 or more respondents--would be needed for 95
percent confidence and a margin of error of (plus or minus) 3 percentage
points. The ongoing surveys that GAO identified are not appropriate for
collecting data on immigration status. (For example, one survey takes
names and Social Security numbers, which might affect acceptance of
immigration status questions.) Whether further research or implementation
in a new survey would be justified depends on how policymakers weigh the
need for such information against potential costs and the uncertainties of
future research.