Managing for Results: Analytic Challenges in Measuring Performance
(Letter Report, 05/30/97, GAO/HEHS/GGD-97-138).
Pursuant to a legislative requirement, GAO reviewed the implementation
of the Government Performance and Results Act's (GPRA) requirements in
the pilot phase, focusing on: (1) the analytic and technical challenges
agencies are experiencing as they try to measure program performance;
(2) the approaches they have taken to address these challenges; and (3)
how the agencies have made use of program evaluations or evaluation
expertise in implementing performance measurement.
GAO noted that: (1) the programs included in GAO's review encountered a
wide range of serious challenges; (2) 93 percent of the officials GAO
surveyed reported at least one as a great or very great challenge, and
some were not very far along in implementing the steps required by the
Results Act; (3) eight of the 10 tasks rated most challenging emerged in
the two relatively early stages of the performance measurement process,
identifying goals and developing performance measures; (4) in developing
both goals and performance measures, respondents found it difficult to
move beyond a summary of their program's activities, such as the number
of clients served, to distinguish the desired outcome or result of those
activities; (5) sometimes selecting an outcome measure was impeded,
instead, by conflicting stakeholder views of the program's intended
results or by anticipated data collection problems; (6) issues in the
data collection stage were rated as less serious and revolved around the
programs' lack of control over data that third parties collected, but
programs may have avoided some data issues through selection of measures
for which data already existed; (7) the greatest challenge in the
analysis and reporting stage was separating a program's impact on its
objectives from the impact of external factors, primarily because many
federal programs' objectives are the result of complex systems or
phenomena outside the program's control; (8) in such cases, it is
particularly challenging for agencies to confidently attribute changes
in outcomes to their program, the central task of program impact
evaluation; (9) the programs GAO reviewed had applied a range of
analytic and other strategies to address these challenges; (10) because
they had either volunteered to be GPRA pilots or had already begun
implementing performance measurement, the programs included in GAO's
review were likely to be better suited or prepared for conducting
performance measurement than most federal programs; and (11) the
challenges experienced by the projects that are pilot testing the Act's
requirements suggest that: (a) more typical federal programs may find
performance measurement to be an even greater challenge, particularly if
they do not have access to program evaluation or other technical
resources; and (b) full-scale implementation will require several
iterations to develop valid, reliable, and useful performance reporting systems.
--------------------------- Indexing Terms -----------------------------
REPORTNUM: HEHS/GGD-97-138
TITLE: Managing for Results: Analytic Challenges in Measuring
Performance
DATE: 05/30/97
SUBJECT: Data collection
Program evaluation
Congressional/executive relations
Strategic planning
Reporting requirements
Agency missions
Federal legislation
Cover
================================================================ COVER
Report to Congressional Committees
May 1997
MANAGING FOR RESULTS - ANALYTIC
CHALLENGES IN MEASURING
PERFORMANCE
GAO/HEHS/GGD-97-138
GPRA Analytic Challenges
(973806)
Abbreviations
=============================================================== ABBREV
GPRA - Government Performance and Results Act of 1993
OMB - Office of Management and Budget
Letter
=============================================================== LETTER
B-276736
May 30, 1997
The Honorable Fred Thompson
Chairman
The Honorable John Glenn
Ranking Minority Member
Committee on Governmental Affairs
United States Senate
The Honorable Dan Burton
Chairman
The Honorable Henry A. Waxman
Ranking Minority Member
Committee on Government Reform and Oversight
House of Representatives
Seeking to promote improved government performance and greater public
confidence in government through better planning and reporting of the
results of federal programs, the Congress enacted the Government
Performance and Results Act of 1993 (GPRA), referred to in this report
as "the Results Act" or "GPRA."  The Act established a governmentwide
requirement for agencies to identify agency and program goals and to
report on their results in achieving those goals. Recognizing that
few programs at the time were prepared to track progress toward their
goals, the Act specifies a 7-year implementation time period and
requires the Office of Management and Budget (OMB) to select pilot
tests to help agencies develop experience with the Act's processes
and concepts. The Results Act includes a pilot phase during which
about 70 programs, ranging from the U.S. Geological Survey's
National Water Quality Assessment Program to the entire Social
Security Administration, were designated as GPRA pilot projects.
These and other programs throughout the major agencies have been
gaining experience with the Act's requirements. GPRA mandates that
we review the implementation of the Act's requirements in this pilot
phase and comment on the prospects for compliance by federal agencies
as governmentwide implementation begins in 1997. This report is one
component of our response to that mandate. Specifically, this report
answers the following questions: (1) What analytic and technical
challenges are agencies experiencing as they try to measure program
performance? (2) What approaches have they taken to address these
challenges? And, in particular, because program evaluation studies
are similarly focused on measuring progress toward program goals and
objectives, (3) How have agencies made use of program evaluations or
evaluation expertise in implementing performance measurement?
Indeed, the Act recognizes and encourages a complementary role for
program evaluation by requiring agencies to describe its use in
performance planning and reporting.
To obtain this information, we conducted structured interviews with
program officials in 20 departments and major agencies with
experience in performance measurement. Generally, in each agency, we
selected one official GPRA pilot program and one other program that
had begun to measure program performance. We selected programs to
represent diversity in program purpose, size, and other factors that
we thought might affect their experience. For each program, we
attempted to interview both the program official responsible for
performance measures and a program evaluator or other analyst who had
assisted in this effort.  Because no evaluator was identified in some
programs, and in others the evaluator was also the person responsible
for the performance measurement effort, we conducted 68 structured
interviews with officials from 40 programs.  We asked program
officials to rate the difficulty of challenges or tasks at each of
four stages in the performance measurement process that we defined
for the purposes of this review:
identifying goals: specifying long-term strategic goals and annual
performance goals that include the outcomes of program
activities;
developing performance measures: selecting measures to assess
programs' progress in achieving their goals or intended
outcomes;
collecting data: planning and implementing the collection and
validation of data on the performance measures; and
analyzing data and reporting results: comparing program
performance data with the annual performance goals and reporting
the results to agency and congressional decisionmakers.
Then, for each stage, we asked program officials to describe how they
approached their most difficult challenge and whether and how they
used prior studies and technical staff. A more complete description
of the scope of this review is included in appendix I.
RESULTS IN BRIEF
------------------------------------------------------------ Letter :1
The programs included in our review encountered a wide range of
serious challenges--93 percent of the officials we surveyed reported
at least one as a great or very great challenge. In addition, some
were not very far along in implementing the steps required by the
Results Act. Eight of the 10 tasks rated most challenging emerged in
the two relatively early stages of the performance measurement
process: identifying goals and developing performance measures. For
example, in the stage of identifying goals, respondents found it
particularly difficult to translate long-term strategic goals into
annual performance goals. This was often because the program had a
long-term mission that made it difficult to predict the level of
results that might be achieved on an annual basis.
In developing both goals and performance measures, respondents found
it difficult to move beyond a summary of their program's
activities--such as the number of clients served--to distinguish the
desired outcome or result of those activities--such as the improved
health of the individuals served or the community at large. For
some, the concept of "outcome" was unfamiliar and difficult,
especially for program officials focused on day-to-day activities.
Sometimes selecting an outcome measure was impeded, instead, by
conflicting stakeholder views of the program's intended results or by
anticipated data collection problems. Issues in the data collection
stage were rated as less serious and revolved around the programs'
lack of control over data that third parties collected, but programs
may have avoided some data issues through selection of measures for
which data already existed.
The greatest challenge in the analysis and reporting stage was
separating a program's impact on its objectives from the impact of
external factors, primarily because many federal programs' objectives
are the result of complex systems or phenomena outside the program's
control. In such cases, it is particularly challenging for agencies
to confidently attribute changes in outcomes to their program--the
central task of program impact evaluation. Although the Act does not
require impact evaluations, it does require programs to measure
progress toward achieving their goals and explain why a performance
goal was not met. Because they recognized that simple examination of
outcome measures would not accurately reflect their program's
performance, many of the respondents believed that they ought to
separate the influence of other factors on their program's goals in
order to establish program impact.
The programs we reviewed had applied a range of analytic and other
strategies to address these challenges. To overcome uncertainties in
formulating performance goals that were achievable on an annual
basis, some programs had adopted a multiyear planning horizon for
their performance goals, while others had modified their annual goals
to target more proximate ones over which they had more control. A
wide variety of approaches was used to help define performance
measures, including developing a model of the relationships between
federal, state, and local government activities to identify the
uniquely federal role. Programs that found reliance on others' data
as their greatest data collection challenge tended to either
introduce data verification procedures or search for alternative data
sources. The programs employed several different approaches to
attempt to isolate a program's impact from other influences,
including conducting special studies and monitoring external factors
at the subnational level, where their influence was easier to
observe. Overall, the programs we reviewed had somewhat more
difficulty in resolving their most difficult challenges related to
selecting measures and analyzing performance than in identifying
goals and collecting data; they were less likely to have developed an
approach to meeting these challenges, and they reported less
confidence in the approaches they had developed.
Because they had either volunteered to be GPRA pilots or had already
begun implementing performance measurement, the programs included in
our review were likely to be better suited or prepared for conducting
performance measurement than most federal programs. In addition,
they had the advantage of technical resources: half of these
programs had been the subject of previous evaluations, and almost all
had access to staff trained or experienced in performance measurement
or program evaluation. Most of our respondents found this assistance
helpful, and many said they could have used more such assistance.
For example, an evaluator assisting one program adapted a data
collection instrument from a prior study to collect data on outcomes
that were considered difficult to measure. Also, an administrator
trained in evaluation methods, faced with program outcomes known to
be subject to external influences, developed a series of outcome
measures and looked at the similarity of results across them to
assess program performance.
The challenges experienced by the projects that are pilot testing the
Act's requirements suggest that (1) more typical federal programs may
find performance measurement to be an even greater challenge,
particularly if they do not have access to program evaluation or
other technical resources; and (2) full-scale implementation will
require several iterations to develop valid, reliable, and useful
performance reporting systems. In addition, in cases in which
factors outside the program's control are acknowledged to have
significant influence on key program results, it may be important to
supplement performance measure data with impact evaluation studies to
provide an accurate picture of program effectiveness.
BACKGROUND
------------------------------------------------------------ Letter :2
The Results Act seeks to improve the efficiency, effectiveness, and
public accountability of federal agencies as well as to improve
congressional decision-making. It aims to do so by promoting a focus
on program results and providing the Congress with more objective
information on the achievement of statutory objectives. The Act
outlines a series of steps whereby agencies are required to identify
their goals, measure performance, and report on the degree to which
those goals were met. The Act requires executive branch agencies to
develop, by the end of fiscal year 1997, a strategic plan and to
submit their first annual performance plan to OMB in the fall of
1997. Starting in March of the year 2000, each agency is to submit a
report comparing its performance for the previous fiscal year with
the goals in its annual performance plan. However, OMB also asked
all agencies to include performance measures, if available, with
their budget requests for fiscal year 1998 in order to encourage
planning for meeting the Act's requirements. (App. II describes the
Act's requirements in more detail.) For the purpose of this review,
we identified four stages in the performance measurement process to
represent the analytic tasks involved in producing these documents.
Figure 1 depicts the correspondence between these stages and the
Act's requirements.
Figure 1: A Comparison of Our
Four Stages of the Performance
Measurement Process With GPRA
Requirements
(See figure in printed
edition.)
In the past, some agencies have conducted program evaluations to
provide information to program managers and the Congress about
whether a program is working well or poorly, and why. Most
evaluations of program effectiveness, or program impact, include the
basic planning and analysis steps that the Act requires agencies to
take: defining and clarifying program goals and objectives,
developing measures of program outcomes, and collecting and analyzing
data to draw conclusions about program results. However, program
impact evaluation goes further to establish the causal connection
between outcomes and program activities, separate out the influence
of extraneous factors, develop explanations for why those outcomes
occurred, and thus isolate the program's contribution to those
changes. Thus, where programs are expected to produce changes as a
result of program activities, such as job placement activities for
welfare recipients, outcome measures can tell whether the welfare
caseload decreased. However, a systematic evaluation of a program's
impact would be needed to assess how much of the observed change was
due to an improved economy or to the program. In addition, a
systematic evaluation of how a program was implemented can provide
important information about why a program did or did not succeed and
suggest ways to improve it. However, because the tasks involved
raise technical and logistical challenges, evaluating program impact
generally requires a planned study and, frequently, considerable time
and expense.
The Results Act recognizes the complementary nature of performance
measurement and program evaluation, requiring a description of
previous program evaluations used and a schedule for future program
evaluations in the strategic plan, and a summary of program
evaluation findings in the annual performance report. In addition,
because of the similarities between performance measurement and
program evaluation, we expected that experience with or access to
expertise in program evaluation would assist agencies in addressing
the challenges of performance measurement. Therefore, we included in
our survey programs other than the official GPRA pilots that were
said to have had experience in measuring program results and that may
have had program evaluation experience. In addition, we interviewed
program officials responsible for performance measurement and program
evaluators or other analysts who had assisted in this effort, if
available, and we asked whether prior studies or technical staff had
been involved in the various performance measurement tasks.
AGENCIES ARE STILL IN EARLY
IMPLEMENTATION PHASE OF
PERFORMANCE MEASUREMENT
------------------------------------------------------------ Letter :3
Despite having volunteered to begin measuring program performance,
most of the programs we reviewed had not yet gone through all the
steps of the performance measurement process. Almost all our
respondents (over 96 percent) reported that their programs had begun
the first three stages of performance measurement, and 85 percent had
started data analysis and reporting. But only about 27 percent had
actually completed all four stages (see table 1). Overall, programs
were furthest along with the stage of identifying goals, and least
with the reporting stage, but they did not, of course, need to
"complete" one stage before starting another, because performance
measurement is recognized to be an iterative process in which
measures will be improved over time. For example, if data are
unavailable for the annual performance report, agencies are permitted
to provide whatever data are available, with a notation as to their
incomplete status, and to provide the data in subsequent reports.
Table 1
Percentage of Respondents Reporting That
Their Programs Have Completed
Performance Measurement Stages (for the
Total Sample and Selected Subgroups)
                                Developing                   Analyzing      Completed at
                                performance                  data and       least one
Program           Identifying   measures       Collecting    reporting      round of all
characteristic    goals                        data          results        four stages
----------------- ------------ -------------- ------------ ------------ ------------
Total sample 66% 57% 54% 53% 27%
Program purpose
-----------------------------------------------------------------------------------------
Provide services 64 59 54 49 26
or military
defense
Develop 65 65 60 60 37
information
Administer 78 33 44 56 11
regulations
GPRA status
-----------------------------------------------------------------------------------------
Official pilot 87 67 60 70 38
Other 50 50 50 40 19
Annual budget
-----------------------------------------------------------------------------------------
Less than $100 77 62 77 62 42
million
Between $100 59 48 41 48 15
million and $1
billion
Greater than $1 64 64 50 46 29
billion
Locus of control
-----------------------------------------------------------------------------------------
Federal 70 62 50 68 30
State 67 57 52 47 18
Local or 89 56 90 73 36
quasigovernmental
organization
-----------------------------------------------------------------------------------------
Regulatory programs were far behind in completing at least one round
of all four stages (11 percent), apparently because of their
difficulty with specifying performance measures and data collection.
Official GPRA pilots were twice as likely to have gone through all
four stages as other programs (38 percent and 19 percent,
respectively), in part because they were much further along in goal
identification than the other programs (87 percent compared with 50
percent). Staff from smaller programs reported their programs were
much further along (42 percent had completed all four stages) and
were more likely to have completed at least one reporting cycle than
larger programs. This could stem partly from the fact that most of
the small programs in our sample were GPRA pilots (85 percent). As
such, many would have already submitted to OMB both an annual
performance plan and an annual performance report. However, the
small programs as a whole were also more likely to have completed
data collection than the GPRA pilots as a group (77 percent compared
with 60 percent). In general, little difference in progress was seen
between state- and federally administered programs across the first
three stages, but state-administered programs were not as far along
in analysis and reporting, or in completing a full cycle of the
process, as programs run at either the federal or local level.
Differences in progress among programs with different funding sources
were inconsistent.
PROGRAMS' GREATEST CHALLENGES
GENERALLY CAME IN THE EARLY
STAGES OF IMPLEMENTING
PERFORMANCE MEASUREMENT
------------------------------------------------------------ Letter :4
Almost all of the programs included in our review encountered serious
challenges--93 percent of our respondents rated at least 1 of 30
potential challenges as a great or very great challenge. Most
respondents (74 percent) identified a great challenge in the stage of
identifying goals; 69 percent identified at least one in the stage of
developing performance measures. Fewer reported encountering a great
challenge in the later stages of data collection and reporting
results (50 and 34 percent, respectively).
To indirectly assess which of our four stages of performance
measurement--identifying goals, developing measures, collecting data,
or analyzing and reporting results--provided the most difficult
challenges for these agencies, we rank-ordered each of 30 potential
challenges by respondents' mean ratings of their difficulty. We
found 8 of the 10 challenges with the highest mean ratings among the
two early, relatively conceptual stages of specifying the program's
goals--especially as the outcomes or results of program
activities--and selecting objective, quantifiable measures of them
(see table 2).  Three challenges pertained to the stage of
identifying goals and five to developing measures.  Issues in the two
later stages of data collection and analysis were generally rated
less challenging, except for two items: ascertaining the accuracy and
quality of performance data and separating a program's impact on its
objectives from the impact of external factors.  The latter, although
not specifically required by the Act, is often needed to confidently
attribute results to the program.  (In this and subsequent tables,
the number of valid cases reflects those that had begun that
performance measurement stage and experienced the challenge.)
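The rank-ordering described above is simple to reproduce.  The short
Python sketch below is purely illustrative--it is not GAO's analysis
code, and the challenge names and ratings shown are hypothetical
placeholders.  For each challenge it computes the mean rating on the
1-to-5 scale, the percentage of respondents rating it a great or very
great challenge (a rating of 4 or 5), and the number of valid cases,
then ranks the challenges by mean rating.

    from statistics import mean

    # Ratings use the report's scale of 1 ("little or no challenge")
    # to 5 ("a very great challenge").  Each list holds the valid
    # cases for one potential challenge; names and values are
    # hypothetical.
    ratings = {
        "Translating long-term goals into annual goals": [4, 3, 5, 2, 4, 3],
        "Distinguishing between outputs and outcomes": [3, 4, 4, 2, 3, 4],
        "Using data collected by others": [2, 3, 2, 4, 3, 2],
    }

    summary = []
    for challenge, scores in ratings.items():
        summary.append({
            "challenge": challenge,
            "mean_rating": round(mean(scores), 2),
            "pct_great": round(100 * sum(s >= 4 for s in scores) / len(scores)),
            "valid_cases": len(scores),
        })

    # Rank the challenges from most to least difficult by mean rating.
    for row in sorted(summary, key=lambda r: r["mean_rating"], reverse=True):
        print(f"{row['challenge']}: mean {row['mean_rating']}, "
              f"{row['pct_great']}% great/very great, n={row['valid_cases']}")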
Table 2
The Performance Measurement Stage and
Mean Rating of the 10 Challenges Rated
Most Difficult by Respondents
Mean Valid
Analytic stage Challenge rating\a cases
-------------- ---------------------- -------- --------
Identifying Translating general, 3.36 59
goals long-term strategic
goals to more
specific, annual
performance goals and
objectives
Distinguishing between 3.27 63
outputs and outcomes
Specifying how the 3.20 61
program's operations
will produce the
desired outputs and
outcomes
Developing Getting beyond program 3.52 65
performance outputs--that is,
measures summaries of program
activities--to
develop outcome
measures of the
results of those
activities
Specifying 3.25 65
quantifiable, readily
measurable
performance
indicators
Developing interim or 3.09 54
alternative measures
for program effects
that may not show up
for several years
Estimating a 3.03 60
reasonable level for
expected performance
Defining common, 2.96 46
national performance
measures for
decentralized
programs
Collecting Ascertaining the 2.92 60
data accuracy of and
quality of
performance data
Analyzing data Separating the impact 3.11 45
and reporting of the program from
results the impact of other
factors external to
it
----------------------------------------------------------
\a On a scale of 1 ("little or no challenge") to 5 ("a very great
challenge").
In most programs, respondents rated the same general mix of problems
as their most difficult, except for the regulatory programs, for
which three of their five greatest challenges came from the later two
stages. The problem these regulatory programs ranked as most
difficult was separating the impact of the program on its objectives
from the impact of external factors. They also reported difficulty
with ascertaining the accuracy and quality of performance data and
with acquiring the exact data wanted and in the form desired. This
might be explained by these programs' reliance on the regulated
parties themselves to provide data on their own level of compliance.
Across all stages, the official pilots rated the potential challenges
we posed as less difficult, on the average, than did the other
programs. Pilots also included two challenges from later stages
among their top five most difficult--separating the impact of the
program from that of external factors and using data collected by
others--while the other programs did not. We do not know whether
this was influenced by the pilots' having more experience with a full
reporting cycle than the other programs.
LONG-TERM MISSIONS, RARE
EVENTS, AND DIFFICULTIES IN
CONCEPTUALIZING OUTCOMES
MADE SPECIFYING ANNUAL GOALS
DIFFICULT
---------------------------------------------------------- Letter :4.1
In the stage of identifying goals, the three greatest challenges were
(1) translating general, long-term strategic goals to more specific,
annual performance goals and objectives; (2) distinguishing between
outputs and outcomes; and (3) specifying how the program's operations
would produce the desired outputs and outcomes (see table 3).\1 These
were rated as great or very great challenges by about twice as many
respondents as was reducing the program to a few broad, general
goals.
Table 3
Respondents' Ratings of the Level of
Difficulty Posed by Potential Challenges
in Identifying Goals
                         Percentage
                         rating this as
                         a great or       Mean
                         very great       challenge  Valid
Potential challenge      challenge        rating\a   cases
---------------------- -------------- -------- --------
Translating general, 49 3.36 59
long-term strategic
goals to more
specific, annual
performance goals and
objectives
Distinguishing between 46 3.27 63
outputs and outcomes
Specifying how the 44 3.20 61
program's operations
will produce the
desired outputs and
outcomes
Reconciling 25 2.40 60
potentially
conflicting goals
Reducing the program 23 2.74 62
to a few broad,
general goals
Accommodating state or 18 2.79 38
local goals and
objectives
Identifying critical 19 2.48 58
external factors
Specifying objectives 15 2.30 53
for the entire
program rather than
just certain parts of
it
Distinguishing this 13 2.14 56
program's goals from
those of related
programs
----------------------------------------------------------
\a On a scale of 1 ("little or no challenge") to 5 ("a very great
challenge").
In identifying goals (and performance measures), respondents found it
difficult to respond to the Act's encouragement for agencies to move
beyond summarizing their program's activities--such as measuring the
number of clients served-- to distinguishing the desired outcome or
result of those activities--such as improving the health of the
individuals served or the community at large. Some of our
respondents explained that translating strategic goals for long-term
missions--such as supporting basic science--into annual goals was
particularly difficult because annual goals tend to be artificial and
hard to analyze given the unpredictable nature of scientific
progress. Others reported that the constantly changing nature of
their target--for example, a developing business sector or newly
democratizing country--made annual, linear progress unlikely.
Managerial and process issues were also cited.  As one respondent said,
"It is easier to get agreement on long-term goals, but once you begin
to break them down into annual objectives and specify how you will
achieve them, you get into disagreement over priorities, approaches,
and roles."\2
Distinguishing between outputs and outcomes was found to be a
challenge for several reasons. First, some struggled with the basic
meaning of the concept of outcome. One respondent noted that OMB's
definition of "outcome" varied from one set of guidance to the next.
Another reported that the program's administrators still believed
that regulations were the outcomes and that whatever happened after a
new regulation was issued was beyond their control. Different
administrators, staff, and stakeholders defined outcomes in multiple
ways, depending in part on their regional or national context.
Second, some argued that the nature of their missions made it hard to
develop a measurable outcome. For example, when the goal was to
prevent a rare event, such as a flood or presidential assassination
attempt, the fact that it did not occur is hard to attribute to a
particular function. Similarly, some outcomes, like battles won, may
not be observed in a given year. Thus, it may be conceptually more
difficult to define outcomes for prevention, deterrence, and other
programs that respond to rare events.
Third, in addition to conceptual challenges, there were
administrative obstacles. One respondent reported that because
several states had been developing their own outcome measures for
their program for some time, they had sunk costs in their existing
information systems. Thus, they were opposed to standardizing the
measures solely so that federal administrators could come up with a
new, common measure.
Respondents who said that their most difficult problem in identifying
goals was specifying how program operations would produce outputs and
outcomes did not report anything inherently difficult in building
logic models for programs. Rather, they cited many of the other
potential challenges as factors that impeded this planning step, such
as the role of external factors, the unpredictability of prevention
outcomes or outcomes that may take many years to develop, and their
lack of leverage over state approaches.
--------------------
\1 We ranked the challenges by their means, by the percentage
reporting that they were a great or very great challenge, and by how
often each challenge was reported as the greatest challenge
encountered in that stage. These different methods resulted for the
most part in similar rankings.
\2 OMB also found, in reviewing agency progress in strategic
planning, that virtually every agency had difficulty linking
long-range strategic mission and goals with annual performance goals.
(John A. Koskinen, OMB, letter to the Honorable Dan Glickman,
Secretary of Agriculture, Aug. 9, 1996.)
A SHORT-TERM FOCUS, MULTIPLE
STAKEHOLDERS, AND DATA
CONSTRAINTS MADE SPECIFYING
PERFORMANCE MEASURES
DIFFICULT
---------------------------------------------------------- Letter :4.2
The challenges rated most difficult, on average, in specifying
performance measures were (1) getting beyond program outputs (that
is, summaries of program activities) to develop measures of outcomes
or the results of those activities; (2) specifying quantifiable,
readily measurable performance indicators; and (3) developing interim
or alternative measures for program effects that may not show up for
several years (see table 4). Similar reasons were given for why each
of these challenges was particularly difficult.
Table 4
Respondents' Ratings of the Level of
Difficulty Posed by Potential Challenges
in Developing Performance Measures
                         Percentage
                         rating this as
                         a great or       Mean
                         very great       challenge  Valid
Potential challenge      challenge        rating\a   cases
---------------------- -------------- -------- --------
Getting beyond program 49 3.52 65
outputs, that is,
summaries of program
activities, to
develop outcome
measures of the
results of those
activities
Specifying 42 3.25 65
quantifiable, readily
measurable
performance
indicators
Defining common, 39 2.96 46
national performance
measures for
decentralized
programs
Developing interim or 37 3.09 54
alternative measures
for program effects
that may not show up
for several years
Estimating a 32 3.03 60
reasonable level for
expected program
performance
Developing qualitative 29 2.84 49
measures such as
narrative
descriptions where
numerical measures
could not be had
Planning how to 20 2.40 60
compare actual
program results with
the performance goals
----------------------------------------------------------
\a On a scale of 1 ("little or no challenge") to 5 ("a very great
challenge").
Respondents found that, at the most basic level, defining the
specific outcomes desired for their program was difficult to
accomplish, but it was also complicated by program-specific
conditions. Some said that defining outcome measures required
administrators to change from thinking on a day-to-day basis to
taking a long-term perspective on what they wanted to accomplish, as
indeed the Act intended them to do. Shifting to a long-term
perspective led them to broaden their horizons to consider outcomes
over which they rarely have complete control, introducing additional
uncertainty. More generally, some respondents observed that
"outcome" seemed to be a fuzzier concept than "output," difficult to
think through and specify precisely. These tasks were said to be
particularly difficult in a volatile, complex policy environment.
In addition, to arrive at an outcome definition that would be broadly
accepted, program officials reported having to do a lot of consensus
building with stakeholders who often disagreed on the validity of
outcome measures. Some reported difficulty in getting state program
administrators and other federal stakeholders not only to think
beyond their own program operations, as previously noted, but also to
conceptualize how those diverse activities were related to a common
outcome for the nation as a whole. Others noted that efforts to
agree on measures had to overcome program officials' reluctance to be
measured except in the most favorable light, perhaps out of concern
that performance data would be used to blame program officials rather
than to improve program functioning.
For others, selecting outcome measures was difficult because it was
intertwined with anticipated data collection problems. They noted
that a focus on outcomes involves developing new measures, new
databases, and, often, learning new measurement techniques.
Moreover, the annual reporting requirement was said to force certain
issues: for example, annual data collection needs to be orchestrated
and routinized, thus either raising additional logistics questions or
limiting program officials' choice of measures, if new data
collection was not a practical option.
RESPONDENTS BLAMED THE NEED
TO RELY ON OTHERS FOR THEIR
GREATEST DATA COLLECTION
CHALLENGES
---------------------------------------------------------- Letter :4.3
Although, in general, the potential challenges in data collection
were not considered as difficult as those in other stages, about
one-third of our respondents reported that the following were
particularly challenging: (1) using data collected by others, (2)
ascertaining the accuracy and quality of performance data, and (3)
acquiring the data in a timely way (see table 5). However, these
programs may have avoided some of the data issues we posed through
decisions made in the previous stage to select measures for which the
respondents had existing data. Our respondents said that using data
collected by others was challenging because it was difficult to
ascertain their quality or to ensure their completeness and
comparability. The respondents also found a management challenge in
attempting to overcome resistance by external data providers to
spending money on additional data collection and to sharing costly
data. Two respondents also reported having to deal with deliberate
misreporting by other agencies that were trying to justify higher
funding levels.
Table 5
Respondents' Ratings of the Level of
Difficulty Posed by Potential Challenges
in Data Collection
                         Percentage
                         rating this as
                         a great or       Mean
                         very great       challenge  Valid
Potential challenge      challenge        rating\a   cases
---------------------- -------------- -------- --------
Using data collected 33 2.74 46
by others
Ascertaining the 30 2.92 60
accuracy of and
quality of
performance data
Acquiring the data in 28 2.72 61
a timely way
Acquiring the exact 26 2.74 62
data wanted and in
the form desired
Obtaining baseline 25 2.69 59
data for comparison
Ascertaining the 22 2.81 59
accuracy of and
quality of baseline
data
Identifying and 11 2.25 63
locating sources of
data for the
performance measures
----------------------------------------------------------
\a On a scale of 1 ("little or no challenge") to 5 ("a very great
challenge").
The fact that their data were largely collected by others was the
most frequent explanation of why ascertaining the accuracy and
quality of performance data was a problem. One respondent said that
collecting federal data is not a high priority for most states, and
thus they do not emphasize the data's accuracy. Documentation of
data quality was reportedly often not available or was incomplete.
For example, one respondent said that in his area, most state
record-keeping is manual and hard to audit. Acquiring the data in a
timely way was reported as hindered by lack of adequate database
systems; more often it was said to be hindered by a mismatch between
the data collection time lines and the reporting cycle.
THE INFLUENCE OF FACTORS
BEYOND THE PROGRAM'S CONTROL
MAKES ATTRIBUTING THE
RESULTS TO THE PROGRAM
DIFFICULT
---------------------------------------------------------- Letter :4.4
When it came to analyzing and reporting performance, one challenge
stood out clearly as the most difficult: separating the impact of
the program from the impact of other factors external to the program
(see table 6). Forty-four percent of respondents who had begun this
stage claimed that it was a great or very great challenge. The
difficulty was primarily the fact that the outcomes of many federal
programs are the result of the interplay of several factors, and only
some of these are within the program's control. Even simple,
two-variable interactions are potentially difficult. For instance,
if a new weapon system is introduced late in the fleet training
cycle, lower-than-expected levels of performance could be caused by
problems in the weapon system or in the training program.
Table 6
Respondents' Ratings of the Level of
Difficulty Posed by Potential Challenges
in Analysis and Reporting
                         Percentage
                         rating this as
                         a great or       Mean
                         very great       challenge  Valid
Potential challenge      challenge        rating\a   cases
---------------------- -------------- -------- --------
Separating the impact 44 3.11 45
of the program from
the impact of other
factors external to
the program
Calculating the 24 2.43 49
outputs and outcomes
for any program
components
Having to modify or 23 2.60 43
develop additional
indicators
Understanding the 16 2.25 44
reasons for unmet
goals or
unanticipated results
Comparing actual 13 1.98 47
program performance
results with the
performance goals
Translating the 12 2.24 42
results into
recommendations for
future program
improvement and
better performance
measurement
Data that turned out 11 2.11 44
to be inadequate for
the intended analysis
----------------------------------------------------------
\a On a scale of 1 ("little or no challenge") to 5 ("a very great
challenge").
More importantly, many programs consist of efforts to influence
highly complex systems or phenomena outside government control. In
such cases, one cannot confidently attribute a causal connection
between the program and its outcomes. Respondents noted that
controlling for all external factors in order to measure a program's
effect is very difficult in programs that attempt to intervene in
highly complex systems such as ecosystems, year-to-year weather, or
the global economy. Additionally, respondents pointed to other
factors that can exacerbate this problem, such as very long-term
outcomes that are difficult to link directly to program activity.
Although the Act does not require agencies to conduct formal impact
evaluations, it does require them to (1) measure progress toward
achieving their goals, (2) identify which external factors might
affect such progress, and (3) explain why a goal was not met.
Although few respondents reported difficulty identifying these
external factors during the goal identification stage (19 percent, as
shown in table 3), actually isolating their impact on the outcomes
during analysis was reported to be a more formidable challenge. This
could be due either to analytic or to conceptual problems in
controlling for the influence of other factors. Nevertheless,
because they realized that a simple examination of the outcome
measures would not accurately reflect their program's performance,
many of our respondents believed that they ought to go to the next
step and separate the influence of other factors on their program's
goals, in order to establish their program's impact.
PROGRAMS TOOK VARIED APPROACHES
TO ADDRESS THEIR MOST DIFFICULT
CHALLENGES
------------------------------------------------------------ Letter :5
Respondents reported active efforts to address those challenges they
identified as most difficult in each of the four stages. The
approaches they described covered a range of strategies, from
participatory activities (such as consulting with stakeholders or
providing program managers with training in reporting outcome data)
to applying statistical and measurement methods (such as conducting a
customer survey or developing multiple measures of associated program
outcomes for an outcome that was difficult to measure directly).
Programs applied similar participatory strategies throughout the
performance measurement stages but tended to tailor the analytic
strategies to the particular challenge, sometimes using quite
different approaches to the same challenge. The scope and ingenuity
of some of these approaches demonstrate serious engagement in the
analytic dimension of performance measurement.
Program officials reported relatively high levels of technical staff
involvement across the four performance measurement stages (72 to 82
percent of all those who identified a challenge in those stages; see
table 7). Nevertheless, they appeared to have somewhat more
difficulty resolving their most difficult challenges in the stages of
developing performance measures and analyzing data and reporting
results than in the other two stages. Program respondents were more
likely to report in these stages (11 and 12 percent, respectively)
that their performance measurement team was still trying to determine
what to do. Moreover, respondents also reported feeling more
successful in their responses to the most difficult challenges in
identifying goals and collecting data than with those in selecting
measures and in analysis and reporting. This pattern of experiencing
greater satisfaction in their approaches to the challenges in the
goal identification and data collection stages was even more apparent
when we looked at the single challenge in each stage that the
greatest number of respondents considered most difficult.\3
Table 7
Respondents' Use of Evaluation
Resources, Development of Approaches,
and Views of Success
                             Developing                  Analyzing data
               Identifying   performance   Collecting    and reporting
Item           goals         measures      data          results
-------------- -------- ------------ -------- --------
Evaluation resources
----------------------------------------------------------
Number of 61 62 58 42
respondents
who
identified
one challenge
in the stage
as most
difficult
Percentage who 82% 81% 84% 87%
had access to
prior studies
Percentage of 77% 80% 80% 74%
those who
considered
prior studies
helpful
Percentage who 72% 82% 81% 74%
were assisted
by technical
staff in this
stage
Approaches
----------------------------------------------------------
Developed\a 93% 89% 98% 88%
Yet to be 7% 11% 2% 12%
developed
Views of success
----------------------------------------------------------
Minimally 5% 16% 10% 14%
successful
Somewhat 7% 22% 16% 14%
successful
Moderately 42% 30% 29% 32%
successful
Mostly 18% 24% 28% 34%
successful
Very 28% 8% 17% 7%
successful
----------------------------------------------------------
\a Percentage of approaches to the most difficult challenge in a
stage reported by respondents who had identified one challenge as
most difficult.
--------------------
\3 We did not independently assess the approaches respondents
described.
APPROACHES TO TRANSLATING
LONG-TERM GOALS INTO ANNUAL
GOALS
---------------------------------------------------------- Letter :5.1
In the first stage, identifying goals, the challenge respondents most
frequently identified as their most difficult was translating the
long-term goals established in their strategic plan into annual
performance goals. All 12 respondents selecting this challenge as
their most difficult (representing 10 programs) reported having
developed an approach to this challenge, and most were well satisfied
with how it met the challenge.\4 Half rated their approach as mostly
to very successful, and half rated it as moderately successful in
responding to the challenge. (App. III provides data on
respondents' views of the approach they developed and their use of
evaluation resources for those who selected this as the most serious
challenge in this stage.) This group of respondents was a little less
likely than the full sample to report having access to prior studies
to develop their approaches to identifying goals. Three-quarters had
prior studies to draw on, and three-quarters were assisted by
technical staff. All those with access to prior studies generally
found them to be helpful.
To address the challenge of specifying annual goals that were
consistent with their long-range goals, the respondents reported that
they tended either to use other than an annual time period for
reporting or to modify the global outcome toward which the goals were
directed. (Table 8 shows the types of approaches the programs
developed for this challenge and for the second most frequently
identified challenge.) For example, two respondents reported that
their programs found that setting annual goals was not feasible
because of the exploratory and long-range nature of their work. One
respondent compared the program's role with that of an investment
broker with a portfolio, for which long-term goals are fairly well
identified but for which annual expectations are much less certain.
He added that because the program operates through the grant-funding
mechanism, which is less directive than other forms of financial
assistance, it requires an investment perspective. The manager of
the second program pointed out that it is difficult to set annual
goals for a program targeted on a rapidly changing industry. Both of
these programs had adopted a multiyear planning horizon for their
performance goals.
Table 8
Approaches Taken to the Most Difficult
Challenges in Identifying Goals
                  Number of        Approach to identifying
Challenge         respondents\a    goals
---------------- ------------ --------------------------
Translating 12 Specified performance
long-term goals goals over an extended
into annual period
performance
goals
Focused annual goals on
proximate outcomes
Developed a conceptual
model to specify annual
goals
Focused annual goals on
short-term strategies for
achieving long-term goals
Developed a qualitative
approach
Involved stakeholders
Distinguishing 9 Clarified definitions of
between outputs output and outcome
and outcomes
Focused on known,
quantifiable outcomes
Focused on projected
outputs
Surveyed customers to
identify outcomes
Involved stakeholders
----------------------------------------------------------
\a Number of respondents who identified the challenge as most
difficult and had developed an approach to that challenge.
The two programs in which the desired outcomes were modified tended
to have very global long-range objectives, such as reducing deaths
from breast cancer, where many influences other than the program
can affect either the incidence of cancer or its mortality rate.
Rather than target their annual performance goals directly on the
ultimate goal over which they had little control, the respondents
said that they identified activities, such as screening for disease,
that were known from previous research to be effective in achieving
the long-range goals. They used these activities as the basis for
specifying annual goals. Thus, the program focused its annual goals,
instead, on expanding the delivery of screening, which it can more
directly affect.
--------------------
\4 Among programs represented by two respondents, in some cases, both
identified the same challenge as most difficult. However, in other
cases, each respondent identified a different challenge as most
difficult.
APPROACHES TO DEVELOPING
PERFORMANCE MEASURES THAT
REFLECT OUTCOMES, NOT
OUTPUTS
---------------------------------------------------------- Letter :5.2
Getting beyond outputs to develop outcome measures was the challenge
most often identified as the most difficult in the developing
performance measures stage: 18 respondents, representing 17
programs, cited this problem. This challenge did not seem to be as
easily resolved as the most serious challenge in identifying goals.
Two of these respondents reported that they had yet to develop an
approach to solving this problem, and none of the respondents thought
they had very successfully addressed the challenge. Only 17 percent
believed they were mostly successful, whereas most (about 80 percent)
believed their approach was somewhat to moderately successful.
Respondents finding this challenge particularly difficult had less
access to prior studies and assistance from technical staff than the
total sample. Two-thirds of these respondents had access to prior
studies and technical staff for their approach. All those with
access to technical staff reported that they were involved in
developing measures that reflected outcomes. (See app. III.)
We found a diverse set of approaches for this challenge; some were
focused on conceptual issues, others on measurement issues. (Their
approaches and those for the second most often identified challenge
in this stage are summarized in table 9.) Several respondents
described engaging in conceptual exercises to model the relationships
between the program's activities, actors, and objectives to isolate
and identify the uniquely federal role. For example, respondents for
three programs emphasized the need to recognize the interaction of
the federal program and of state and local government efforts. The
manager of one of these programs observed that it is difficult for
individual agencies at any level of government to specify outcome
measures attributable solely to their program because of the
interplay among programs at different levels in carrying out program
objectives. He thought a more comprehensive measurement model that
encompasses federal as well as state and local government activity
was needed to identify separate federal outcome measures. He said
that his professional community is grappling with the measurement
issues involved, but the model has not been developed yet.
Table 9
Approaches Taken to the Most Difficult
Challenges in Developing Performance
Measures
                  Number of        Approach to developing
Challenge         respondents\a    performance measures
---------------- ------------ --------------------------
Getting beyond 16 Developed a measurement
outputs to model that encompasses
develop outcome state and local activity
measures to identify outcome
measures for the federal
program
Encouraged program
managers to develop
projections for different
funding scenarios
Conceptualized the
outcomes of daily
activities
Used multiple measures
that are interrelated
Developed measures of
customer satisfaction
Used qualitative measures
of outcome
Planned a customer survey
Involved stakeholders
Specifying 8 Identified outcome
quantifiable measures used by similar
performance programs
indicators
Conducted a survey
Involved stakeholders
----------------------------------------------------------
\a Number of respondents who identified the challenge as most
difficult and had developed an approach to that challenge.
In a second joint federal-state program, it was said to be difficult
to gain consensus on a single national outcome because there were
conflicting perspectives in the field on the appropriate intervention
strategy, and states were thus allowed to develop very diverse
programs. One other program used conceptual models or scenario
exercises to help program managers broaden their horizons to identify
the probable outcomes of their daily activities, asking program staff
to imagine what they might be able to accomplish with different
levels of resources.
APPROACHES TO THE NEED TO
RELY ON OTHERS FOR DATA
COLLECTION
---------------------------------------------------------- Letter :5.3
Using data collected by others was identified as most difficult by
more respondents than any other data collection challenge; 11
respondents, representing 9 programs, did so. All reported having
developed an approach to this challenge, and most were satisfied with
it. More than half the respondents believed their approach was
either mostly or very successful.
Respondents reported few resource problems in addressing this
challenge. All the respondents reported that prior studies had been
conducted, and almost all (90 percent) said that technical staff were
available. Most (73 percent) believed the studies were helpful, and
those who did used them to a great extent to identify data collection
strategies (86 percent) and verify the data (63 percent). All those
who had access to technical staff reported that they were involved.
Most of the approaches to this challenge involved either standard
procedures to verify and validate the data submitted to the program
by other agencies or a search for alternative data sources, as shown
in table 10, together with approaches for the next two most
frequently identified challenges. For example, to verify data
submitted by other agencies, some respondents reported that they had
contacted the agency and asked it to correct the data or had hired a
contractor to do so. Another respondent reported that to replace
existing outcome data that the program had obtained from others,
program representatives entered into roundtable discussions with
their customers to identify new variables and undertook a special
study to seek new data sources and design a composite index of the
outcome variables.
Table 10
Approaches Taken to the Most Difficult
Data Collection Challenges
Challenge         Number of       Approach to data
                  respondents\a   collection
----------------  --------------  --------------------------
Using data 11 Verified and validated the
collected by data
others
Researched alternative
data sources
Conducted a special study
and redesigned a survey
to develop new sources of
outcome data
Involved stakeholders
Obtaining 9 Created new data elements
baseline data
for comparison
Used data from other
agencies
Developed a customer
survey
Developed an activity-
based cost system
Involved stakeholders
Provided training
Ascertaining the 9 Used a certified automated
accuracy and data system
quality of
performance
data
Used data verification
procedures
Acknowledged the data
limitations
Provided training
Used management experience
----------------------------------------------------------
\a Number of respondents who identified the challenge as most
difficult and had developed an approach to that challenge.
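To illustrate the composite index mentioned before table 10, the
following sketch (written in Python for illustration only; the
variable names, figures, and equal weighting are assumptions, not the
program's actual design) rescales several outcome variables to a
common 0-1 range and averages them into a single index for each
reporting period.

  # Minimal sketch of a composite outcome index (illustrative only).
  # Variable names, figures, and equal weights are assumptions, not the
  # program's actual design.

  def rescale(values):
      """Rescale a list of raw scores to a common 0-1 range."""
      lo, hi = min(values), max(values)
      if hi == lo:
          return [0.0 for _ in values]
      return [(v - lo) / (hi - lo) for v in values]

  def composite_index(outcome_series):
      """Average several rescaled outcome variables into one index per period."""
      rescaled = [rescale(series) for series in outcome_series.values()]
      periods = len(next(iter(rescaled)))
      return [sum(series[t] for series in rescaled) / len(rescaled)
              for t in range(periods)]

  # Hypothetical outcome variables reported for four years.
  outcomes = {
      "client_employment_rate": [0.52, 0.55, 0.58, 0.60],
      "average_earnings":       [18500, 19000, 19800, 20500],
      "benefit_exits":          [1200, 1250, 1400, 1380],
  }
  print(composite_index(outcomes))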
APPROACHES TO ISOLATING THE
IMPACT OF THE PROGRAM
---------------------------------------------------------- Letter :5.4
Separating the impact of the program from the impact of other factors
external to the program was identified as most difficult by about
half of those who rated challenges in the data analysis and
results-reporting stage, and several had not resolved it. Fourteen
respondents, representing 11 programs, reported having developed an
approach, but 5 respondents, representing 5 programs, had yet to do
so. Respondents' assessments of the approaches they had developed
were modest--28 percent rated their approach as mostly or very
successful in meeting the challenge, whereas 44 percent believed they
were moderately successful. (These data are provided in app. III.)
As in the sample at large, prior studies were available to most of
these programs, and most of these respondents (68 percent) believed
the studies were helpful, including some who had not yet developed
their approach. Although fewer respondents had access to technical
staff (74 percent), more than 90 percent of those who did reported
that the staff were involved in addressing this challenge, including
some respondents whose approaches were still to be developed. (See
app. III.)
Program officials described using a variety of techniques employed in
formal evaluations of program impact as well as other approaches to
address this challenge, as summarized in table 11. Notably, these
techniques were often employed at the subnational level, where the
influence of other variables was either reduced or easier to observe
and control for. For example, because one such program is well aware
that the economy has a strong effect on a loan program's performance,
it monitors changes in the economy very closely, but at the regional
level. Disaggregating the data to follow one regional economy at a
time allows program staff to determine whether an increase in loan
defaults in a given region reflects a faltering economy or indicates
some problem in the program that needs follow-up. Another program,
faced with similar complexities, was said to sponsor special studies
to identify its impact at the local level, where it can control for
more factors. Since this approach would be too expensive to
implement for the entire nation, the program conducts this type of
analysis only in selected localities.
Table 11
Approaches Taken to the Most Difficult
Analysis Challenge
Challenge         Number of       Approach to analysis
                  respondents\a
----------------  --------------  --------------------------
Separating the 14 Specified as outcomes only
impact of the the variables that the
program from program can affect
the impact of
other factors
external to the
program
Advised field offices to
use control groups
Used customer satisfaction
measures
Monitored the economy at
the regional level
Expanded data collection
to include potential
outcome variables
Analyzed time-series data
Analyzed local-level
effects that are more
clearly understood
Involved stakeholders
----------------------------------------------------------
\a Number of respondents who identified the challenge as most
difficult and had developed an approach to that challenge.
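The regional disaggregation described before table 11 can be
illustrated with a brief sketch. In the Python fragment below, the
region names, figures, and thresholds are illustrative assumptions,
not the program's actual data; the logic simply flags regions where
loan defaults rose even though the regional economy did not weaken,
the situation that would call for program follow-up.

  # Illustrative sketch of regional disaggregation (assumed data and
  # thresholds). Flags regions where loan defaults rose even though the
  # regional economy did not weaken, suggesting a program problem rather
  # than an economic one.

  regions = {
      #  region      (default-rate change, unemployment-rate change), in points
      "Northeast":   (+0.8, +1.2),   # defaults up, but economy also weakened
      "Southeast":   (+0.9, -0.3),   # defaults up while economy improved
      "Midwest":     (-0.2, +0.5),
  }

  def needs_followup(default_change, unemployment_change,
                     default_threshold=0.5, economy_threshold=0.5):
      """True when defaults rose noticeably without a matching downturn."""
      return (default_change > default_threshold
              and unemployment_change < economy_threshold)

  for region, (d_change, u_change) in regions.items():
      if needs_followup(d_change, u_change):
          print(f"{region}: rising defaults not explained by the regional economy")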
Other programs minimized the influence of external factors on their
programs' outcomes through their selection of performance measures.
Some programs selected performance measures that are quite proximate
to program outputs, permitting a more direct causal link to be drawn
between program activities and results. Another program did not have
the information it needed to analyze its impacts and settled for
measures of customer satisfaction.
EARLY IMPLEMENTATION WAS
ASSISTED BY EVALUATION
RESOURCES
------------------------------------------------------------ Letter :6
As examples of their agencies' cutting-edge efforts in performance
measurement, these programs appeared to have an unusual degree of
program evaluation support from within their agencies, as shown in
table 12. Despite a 1994 survey that found a continuing decline in
evaluation capacity in the federal government, 58 percent of our
respondents said they had access to prior evaluations of their
program, and 69 percent had access to other studies of their program;
83 percent reported having access to program evaluators or other
technically trained staff.\5 Of those with access to program
evaluators, 89 percent reported that program evaluators in some way
assisted their efforts. Several of the official GPRA pilots were
actually run by program evaluation and planning offices. Almost all
respondents (96 percent) from large programs (those with annual
budgets over $1 billion) reported having access to evaluators, and
even 67 percent of respondents from small programs (with budgets
under $100 million) reported such access. However, among those with
access to evaluators, small programs were less likely than their
large counterparts to actually obtain assistance from evaluators (78
percent compared with 95 percent).
Table 12
Respondents' Reported Access to and Use
of Evaluation Resources
Evaluation resource      Total sample      Number of
                         (percent)         valid cases
----------------------   ---------------   ----------------
Prior studies available
----------------------------------------------------------
Program evaluations 58 67
Other studies 69 65
Either 81 67
Prior studies were helpful in
----------------------------------------------------------
Defining and setting 77 53
goals
Developing measures or 81 53
planning data
collection
Analyzing data and 65 48
reporting results
Evaluation staff
----------------------------------------------------------
Available 83 64
Involved 89 56
Evaluation or technical staff were involved in
----------------------------------------------------------
Defining and setting 80 60
goals
Developing measures or 88 60
planning data
collection
Analyzing data and 68 57
reporting results
----------------------------------------------------------
Respondents considered prior studies of their program more helpful
in the stages of identifying goals (77 percent) and of developing
measures and planning data collection (81 percent) than in the
analysis and reporting stage (65 percent). Prior studies were considered most
helpful with the tasks of defining program goals, describing the
program environment, and developing quantifiable or readily
measurable indicators, but least helpful with setting performance
targets and explaining program results. Similarly, evaluators and
other technically trained staff were said to be most involved in
developing performance measures and data collection strategies (88
percent among those with access), particularly in the task of
developing quantifiable, readily measurable performance measures, and
least involved in the analysis and reporting stage (68 percent).
To develop quantifiable performance measures, for example, one
program used a data collection instrument developed in a prior study
to collect data on the program's effects on the overall family
environment of its target population. An evaluator serving as a
consultant to the program had identified the data collection
instrument.
An administrator of another program, who was trained in evaluation
methods, used his expertise to develop quantifiable measures for the
outcome of a program subject to so many external social and
environmental factors that a single performance measure was difficult
to isolate. He developed a series of measures that are linked to one
another and looked at the overall direction of the measures as the
performance indicator. This approach, he suggested, recognized that
measuring overall performance is a more complex problem for some
programs than looking at a single number or group of numbers.
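A minimal sketch of this "overall direction" approach follows; the
measure names, data, and simple counting rule are illustrative
assumptions, not the administrator's actual measures. The sketch
computes the direction of change for each linked measure and reports
how many are moving the intended way.

  # Illustrative sketch: summarize the overall direction of several linked
  # measures rather than relying on any single number. Measure names, data,
  # and the simple majority rule are assumptions for illustration only.

  def direction(series):
      """Return +1, -1, or 0 for the change from the first to the last value."""
      change = series[-1] - series[0]
      return (change > 0) - (change < 0)

  # Hypothetical linked measures; higher is better for each.
  linked_measures = {
      "habitat_acres_restored": [400, 430, 460],
      "species_count":          [12, 12, 14],
      "water_quality_index":    [71, 70, 74],
  }

  directions = {name: direction(vals) for name, vals in linked_measures.items()}
  improving = sum(1 for d in directions.values() if d > 0)
  print(f"{improving} of {len(directions)} linked measures are moving in the "
        "intended direction")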
Yet, it was in the tasks involved in developing performance measures
and data collection strategies that respondents were most likely to
report they could have used more help: creating quantifiable,
measurable performance indicators (56 percent) and developing or
implementing data collection and verification plans (48 and 49
percent). When asked why they were not able to get the help they
needed, some mentioned lack of time, unavailability of staff, or lack
of performance measurement expertise, but more commonly they reported
that it was hard to know in advance that evaluators' expertise would
be needed (42 percent).
Others were aware that additional research was needed but faced
complex measurement issues that their staff could not resolve. For
example, the respondent whose program is collecting data on family
environment outcomes (previously mentioned) needed more dimensions
than those provided by the data collection instrument the program was
using. The program is conducting exploratory work to identify some
of those dimensions. In addition, it still has to determine how to
measure the program's long-term effects on parents and children.
Another program is looking for sound evidence that services provided
to its clients may prevent those families from applying for and
receiving more expensive benefits from other public programs. The
respondent reported plans to conduct research on this issue.
--------------------
\5 Michael J. Wargo, "The Impact of Federal Government Reinvention
on Federal Evaluation Activity," Evaluation Practice, 16:3 (1995),
pp. 227-37. An earlier, similar assessment can be found in Program
Evaluation Issues (Washington, D.C.: U.S. General Accounting
Office, 1992).
CONCLUSIONS
------------------------------------------------------------ Letter :7
Seeking to improve government performance and public confidence in
government, GPRA established a requirement for executive branch
agencies to identify agency and program goals and report on program
results. In reviewing the progress and challenges of selected
programs' efforts to complete the analytic steps involved, we found
that although agencies have been experimenting with performance
measurement for 3 years or more, most have not completed all the
tasks required by the Act, and many others are still grappling with
the analytic and technical challenges involved. Thus, we expect
agencies' full implementation to be an evolving process requiring
several iterations to achieve valid, reliable, and useful performance
reporting systems. However, we also expect both the agencies and the
Congress to benefit from performance measurement as reporting systems
are strengthened.
The programs we reviewed are not only volunteers but also have more
experience with, and greater access to, analytical resources for
addressing the challenges of performance measurement than is
typical. Although
access to analytic expertise did not solve all these programs'
challenges, most of our respondents considered it helpful, and many
said they could have used even more such assistance. Thus, with full
implementation across the government, more typical federal programs
are likely to find performance measurement an even greater challenge,
particularly if they do not have access to program evaluation or
other analytic resources.
A recurring source of the programs' difficulty both in selecting
appropriate outcome measures and in analyzing their results stemmed
from two features common to many federal programs: the interplay of
federal, state, and local government activities and objectives and
the aim to influence complex systems or phenomena whose outcomes are
largely outside government control. In such cases, it may be
important to supplement performance measurement data with impact
evaluation studies to provide an accurate picture of program
effectiveness. In addition, systematic evaluation of how a program
was implemented can provide important information about why a program
did or did not succeed and suggest ways to improve it.
AGENCY COMMENTS
------------------------------------------------------------ Letter :8
We discussed a draft of this report with a senior official at OMB.
He suggested some technical changes, which we have incorporated.
---------------------------------------------------------- Letter :8.1
We are sending copies of this report to the Chairmen and Ranking
Minority Members of the Senate and House Committees on the Budget,
the Senate and House Committees on Appropriations, and the
Subcommittee on Government Management, Information, and Technology,
House Committee on Government Reform and Oversight; the Director of
OMB; and other interested parties. We will also make copies
available to others on request.
If you have any questions concerning this report or need additional
information, please call William J. Scanlon on (202) 512-4561 or
Stephanie Shipman, Assistant Director, on (202) 512-4041. Other
major contributors to this report are listed in appendix IV.
William J. Scanlon
Director, Advanced Studies and Evaluation Methods
L. Nye Stevens
Director, Federal Management and Workforce Issues
OBJECTIVES, SCOPE, AND METHODOLOGY
=========================================================== Appendix I
In order to provide information that may assist federal agencies in
meeting the analytic challenges of performance measurement and to
help the Congress in interpreting the program performance information
provided, we focused our review of agencies' early experiences with
performance measurement on three questions:
1. What analytic and technical challenges are agencies experiencing
as they try to measure program performance?
2. What approaches have they taken to address these challenges?
3. How have agencies made use of program evaluations or evaluation
expertise in implementing performance measurement?
To capture the broad range of performance measurement challenges that
federal programs are likely to encounter, rather than to precisely
estimate the frequency of those challenges among early implementers,
we selected a nonrandom, purposive sample of federal programs that
had begun measuring their performance. We based the sample on
several factors that we thought might affect their experience.
Generally, we selected two programs each from the 14 cabinet
departments and from 6 independent agencies--one program that had
been designated as an official Government Performance and Results Act
of 1993 (GPRA) pilot and another that had begun performance
measurement activities on its own or in response to the Office of
Management and Budget's (OMB) fiscal year 1998 budget request.
Because some agencies had no official GPRA pilot program, 17 of our
programs were GPRA pilots, while 23 were not. (See the list of
programs we reviewed at the end of this app.) For each program, we
attempted to interview both the program official responsible for
performance measures and a program evaluator or other analyst who had
assisted in this effort. Since no evaluator was identified in some
programs, while in others the evaluator was the person responsible
for the performance measurement effort, we conducted 68 interviews
with officials from 40 programs.
To learn what kinds of technical and analytic challenges agencies
were experiencing, we asked these program officials to rate (on a
five-point scale) the level of difficulty they had experienced with
potential challenges at each stage of the process of developing
performance information: identifying goals, selecting measures,
collecting data, and analyzing data and reporting results. We
identified seven to nine potential challenges for each stage from the
literature on performance measurement and program evaluation and from
pretest interviews. We then asked program officials to identify
their most difficult challenge in each stage, to describe what
approach they took to address it, and to rate (on a five-point scale)
how successfully that approach met the challenge. Finally, we asked
whether prior evaluation studies and program evaluators (or other
technically trained staff), if available, were involved in the
various tasks of developing performance information.
CHARACTERISTICS OF THE SAMPLE
We selected programs to represent diversity on characteristics that
we hypothesized might affect their experience in measuring program
performance: program purpose; program funding size; locus of program
control at the federal, state, or other level; and program funding
through annual or multiyear appropriations. Since the nature of what
a program intends to achieve is the basis for any measurement of its
results, our first criterion was the program's purpose. To capture
the range of activities in the federal budget, we considered three
broad program purposes: (1) administering regulations; (2) providing
services, including military defense; and (3) developing information,
including research and development, and statistical and demonstration
programs. Because the smaller programs may have fewer resources to
spend on oversight but may also have more clearly focused goals than
larger programs, we selected programs with a range of budget sizes.
Additionally, the federal government's level of control over results
may often depend on whether it has decision-making authority for
program structure, objectives, and type of delivery mechanism.
Therefore, we selected a mix of programs whose primary actor is a
federal, state, or local agency or some other organization. We also
thought budgetary independence might affect how programs responded to
the Act's requirements; programs not dependent on the Congress for
annual funding might not be as far along.
Finally, we also considered how relevant a program was to the
agency's core mission. In some agencies, administrative activities
resembling fairly simple processes, such as property procurement and
management, were selected as pilots. Because questions about the
Act's implementation are concerned with how to measure government's
more complex activities, we believed that activities more central to
the agency's mission would provide more information about the future
of the Act's implementation.
Our sample of pilots was generally similar to the entire population
of GPRA pilots in the range of program purposes, but it had a larger
proportion of pilots whose locus of control was at the federal level
(67 percent) than did the population of all pilots (50 percent). It
also had a smaller proportion of pilots with funding under $100
million a year (38 percent compared to 43 percent) (see table I.1).
However, our total sample, including pilots and other programs, had
the same proportion of federally controlled programs as did the
population of pilots (50 percent). It also had somewhat more
information-development programs (29 percent compared to 19 percent),
fewer regulatory programs (13 percent versus 23 percent), and more
large programs with funding over $1 billion (36 versus 24 percent)
than the population of all pilots. Because most federal programs are
funded by annual appropriations, such programs also made up the
largest share of our sample, 82 percent. The other programs in our
sample either
received appropriations for multiple years or were funded for the
most part through the collection of offsetting fees.
Table I.1
Characteristics of Our Sample and All
Official GPRA Pilot Programs
Program                        Other                   Official
characteristic      Pilots     programs     Total      GPRA pilots
--------------      ------     --------     -----      -----------
Program purpose
----------------------------------------------------------
Provide 57% 58% 57% 59%
services or
military
defense
Develop 27 32 29 19
information
Administer 17 11 13 23
regulations
Locus of program control
----------------------------------------------------------
Federal 67 37 50 50
State 23 42 34 36
Other 10 21 16 14
Annual budget
----------------------------------------------------------
Less than $100 38 6 21 43
million
Between $100
million and 31 55 44 28
$1 billion
Greater than 31 39 36 24
$1 billion
Appropriations
----------------------------------------------------------
Annual 79 84 82 \a
Multiyear 21 16 18 \a
----------------------------------------------------------
\a Not available.
We found neither an enumeration of agency efforts to measure program
performance aside from the official pilots nor a characterization of
all federal programs on these dimensions, so we do not know how
representative our sample is of the full population of federal
programs. However, we believe our sample captures the breadth of
federal programs across a range of agencies, purposes, actors, sizes,
and types of budget authority.
DATA COLLECTION AND ANALYSIS
Our survey sought both to characterize the range of analytic
challenges that federal programs are wrestling with governmentwide
and to obtain descriptions of what they are doing to address specific
challenges. To satisfy both objectives, we asked all respondents to
do two things. First, we asked them to rate the difficulty of the
full set of challenges we hypothesized for each of the four
performance measurement stages. This provided us with quantitative
data for the portion of the sample that had at least begun each
stage. Second, we asked them to nominate one challenge in each stage
as the most difficult and to describe, in their own words, why it was
difficult and what approach their program had developed to address
it. This provided us with qualitative data for each challenge that
at least one respondent for a program identified as the most
difficult in that stage.
To identify the challenges that our entire sample considered the most
problematic, we analyzed all respondents' ratings for each challenge
across the four performance measurement stages. To explore why these
challenges were problematic, we analyzed the qualitative data
available from those who had identified them as their most difficult
(in that stage). We then performed a more detailed content analysis
of the approach data, for the single challenge in each stage that the
largest percentage of respondents nominated as their most difficult.
This allowed us to characterize the range of approaches being
developed by subgroups responding to the same challenge. Because
some respondents from the same program identified different
challenges as their most difficult, we reported the results on the
basis of respondents rather than programs.
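The tallying logic behind this analysis can be sketched briefly. The
Python fragment below uses hypothetical responses and challenge
names, not our actual instrument or data; it computes, for each
challenge, the share of respondents who rated it a great or very
great challenge (4 or 5 on the five-point scale).

  # Illustrative sketch of tallying survey ratings (hypothetical responses;
  # not the actual instrument or data). Ratings use a five-point scale where
  # 4 = "great" and 5 = "very great" challenge.

  from collections import Counter

  responses = [
      # (stage, challenge, rating on the five-point scale)
      ("developing measures", "getting beyond outputs", 5),
      ("developing measures", "specifying quantifiable indicators", 4),
      ("collecting data", "using data collected by others", 4),
      ("developing measures", "getting beyond outputs", 3),
  ]

  # Share of ratings per challenge that were "great" or "very great" (4 or 5).
  totals, serious = Counter(), Counter()
  for stage, challenge, rating in responses:
      totals[(stage, challenge)] += 1
      if rating >= 4:
          serious[(stage, challenge)] += 1

  for key, n in totals.items():
      share = 100 * serious[key] / n
      print(f"{key[1]} ({key[0]}): {share:.0f}% rated great or very great")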
We conducted our work between May 1996 and March 1997 in accordance
with generally accepted government auditing standards. However, we
did not independently verify the information reported by our
respondents.
Table I.2 lists the programs, by agency, included in our review.
Table I.2
Programs Included in Our Review
Agency Program or function
-------------------- ------------------------------------
Agency for Democracy program area, civil
International society objective; Population and
Development Health, unintended pregnancies
objective
Department of Cooperative State Research,
Agriculture Education, and Extension Service;
National Agricultural Statistics
Service
Department of Information Dissemination: Patent
Commerce and Trademark Office; National
Institute of Standards and
Technology laboratories
Department of Air Force Air Combat Command; Navy
Defense Atlantic Fleet
Department of Vocational Rehabilitation State
Education Grant Program; Even Start
Department of Energy Office of Energy Efficiency and
Renewable Energy; science and
technology priority area in the
Department's performance agreement
with the President
Department of Health Office of Child Support Enforcement;
and Human Services Performance Partnerships in Health,
Mental Health; Performance
Partnerships in Health, Chronic
Disease
Department of Office of the Chief Financial
Housing and Urban Officer, Departmentwide Debt
Development Collection; affordable housing for
low-income renters priority area in
the Department's performance
agreement with the President
Department of the U.S. Geological Survey, National
Interior Water Quality Assessment Program;
Office of Surface Mining Reclamation
and Enforcement
Department of Organized Crime Drug Enforcement
Justice Task Force; U.S. Marshals Service
Department of Labor Occupational Safety and Health
Administration; Employment and
Training Administration
Department of State Bureau of Diplomatic Security;
International Narcotics Program and
Law Enforcement Affairs
Department of Federal Highway Administration,
Transportation Federal Lands Highway Organization;
Federal Highway Administration,
Federal Aid Highway program
Department of the U.S. Customs Service, Office of
Treasury Enforcement; U.S. Secret Service
Department of Veterans Benefits Administration,
Veterans Affairs Loan Guaranty Service; Veterans
Health Administration, medical care
programs
Environmental Acid Rain Program; Air and Radiation
Protection Agency Program
Federal Emergency Mitigation budget activity area;
Management National Flood Insurance Program
Agency
National Aeronautics Aeronautics; Human Exploration
and Space
Administration
National Science Science and Technology Centers;
Foundation Research Projects
Social Security Entire agency
Administration
----------------------------------------------------------
OVERVIEW OF GPRA REQUIREMENTS
========================================================== Appendix II
The 1993 GPRA, or Results Act, is the primary legislative framework
through which agencies will be required to set goals, measure
performance, and report on the degree to which goals were met. It
requires each federal agency to develop, no later than
the end of fiscal year 1997, strategic plans that cover a period of
at least 5 years and include the agency's mission statement; identify
the agency's long-term strategic goals; and describe how the agency
intends to achieve those goals through its activities and through its
human, capital, information, and other resources. Agencies are to
identify critical external factors that have the potential to affect
the achievement of strategic goals and objectives, include a
description of any program evaluations used to establish goals, and
set out a schedule for periodic future evaluations. Under the Act,
agency strategic plans are the starting point for agencies to set
annual goals for programs and to measure the performance of the
programs in achieving those goals.
Also, the Act requires each agency to submit to OMB, beginning for
fiscal year 1999, an annual performance plan. The first annual
performance plans are to be submitted in the fall of 1997. The
annual performance plan is to provide the direct linkage between the
strategic goals outlined in the agency's strategic plan and what
managers and employees do day to day. In essence, this plan is to
contain the annual performance goals the agency will use to gauge its
progress toward accomplishing its strategic goals and to identify the
performance measures the agency will employ to assess its progress.
Also, OMB will use individual agencies' performance plans to develop
an overall federal government performance plan that OMB is to submit
annually to the Congress with the President's budget, beginning with
the budget for fiscal year 1999.
The Act requires that each agency submit to the President and to the
appropriate authorization and appropriations committees of the
Congress an annual report on program performance for the previous
fiscal year (copies are to be provided to other congressional
committees and to the public upon request). The first of these
reports, on program performance for fiscal year 1999, is due by March
31, 2000, and subsequent reports are due by March 31 for the years
that follow. However, for fiscal years 2000 and 2001, agencies'
reports are to include performance data beginning with fiscal year
1999. For each subsequent year, agencies are to include performance
data for the year covered by the report and 3 prior years.
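The reporting coverage described above can be expressed as a short
sketch. The following Python function reflects a reading of the
schedule for illustration only; it is not statutory text.

  # Sketch of the annual-report coverage rule described above (an
  # illustrative reading of the schedule; not statutory text).

  def fiscal_years_covered(report_year):
      """Fiscal years of performance data a report for the given year includes."""
      if report_year < 1999:
          raise ValueError("Annual reporting begins with fiscal year 1999")
      if report_year <= 2001:
          start = 1999                # FY 2000 and 2001 reports reach back to 1999
      else:
          start = report_year - 3     # thereafter, the year covered plus 3 prior years
      return list(range(start, report_year + 1))

  print(fiscal_years_covered(1999))  # [1999]
  print(fiscal_years_covered(2001))  # [1999, 2000, 2001]
  print(fiscal_years_covered(2003))  # [2000, 2001, 2002, 2003]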
In each report, each agency is to review and discuss its performance
compared with the performance goals it established in its annual
performance plan. When a goal has not been met, the agency's report
is to explain why the goal was not met, describe plans and schedules
for meeting the goal, and, if the goal was impractical or infeasible,
explain the reasons and recommend actions.
Actions needed to accomplish a goal could include legislative,
regulatory, or other actions; when an agency finds a goal to be
impractical or infeasible, the report is to contain a discussion of
whether the goal ought to be modified.
In addition to evaluating the progress made toward achieving annual
goals established in the performance plan for the fiscal year covered
by the report, an agency's program performance report is to evaluate
the agency's performance plan for the fiscal year in which the
performance report was submitted (for example, in their fiscal year
1999 performance reports, due by March 31, 2000, agencies are
required to evaluate their performance plans for fiscal year 2000 on
the basis of their reported performance in fiscal year 1999).
Finally, the report is to include the summary findings of program
evaluations completed during the fiscal year covered by the report.
The Congress recognized that in some cases, not all the performance
data will be available in time for the March 31 reporting date. In
such cases, agencies are to provide whatever data are available, with
a notation as to their incomplete status. Subsequent annual reports
are to include the complete data as part of the trend information.
In crafting GPRA, the Congress also recognized that managerial
accountability for results is linked to managers having sufficient
flexibility, discretion, and authority to accomplish desired results.
The Act authorizes agencies to apply for managerial flexibility
waivers in their annual performance plans beginning with fiscal year
1999. The authority of agencies to request waivers of administrative
procedural requirements and controls is intended to provide federal
managers with more flexibility to structure agency systems to better
support program goals. The nonstatutory requirements that OMB can
waive under the Act generally involve the allocation and use of
resources, such as restrictions on shifting funds among items within
a budget account. Agencies must report in their annual performance
reports on the use and effectiveness of any managerial flexibility
waivers that they receive.
The Act calls for phased implementation so that selected pilot
projects in the agencies can develop experience from implementing the
Act's requirements in fiscal years 1994 through 1996 before
implementation is required for all agencies. About 70 federal
organizations participated in this performance planning and reporting
pilot phase. OMB was required to select at least five agencies from
among the initial pilot agencies to pilot managerial accountability
and flexibility for fiscal years 1995 and 1996; however, OMB did not
do so.\6
Finally, the Act requires OMB to select at least five agencies, at
least three of which have had experience developing performance plans
during the initial GPRA pilot phase, to test performance budgeting
for fiscal years 1998 and 1999. The performance budgets these pilot
projects prepare are intended to provide the Congress with
information on the direct relationship between proposed program
spending and expected program results and on the anticipated effects
of varying spending levels on results. To allow
the agencies more time for learning, OMB is planning to delay this
phase for 1 year.
--------------------
\6 For information on the managerial accountability and flexibility
waiver process, see GPRA: Managerial Accountability and Flexibility
Pilots Did Not Work as Intended (GAO/GGD-97-36, Apr. 10, 1997).
ACCESS TO AND USE OF EVALUATION
RESOURCES
========================================================= Appendix III
                         Translating       Getting beyond                  Separating the impact
                         long-term goals   outputs to                      of the program from
                         into annual       develop           Using data    the impact of factors
                         performance       performance       collected     external to the
Item                     goals             measures          by others     program
----------------------   ---------------   ---------------   -----------   ---------------------
Number of respondents 12 18 12 23
who selected this
challenge as their
most difficult
Number of respondents    12                16                11\a          14\b
who had developed an
approach to their
most difficult
challenge
Number of respondents 0 2 0 5
whose approach was
still to be developed
Number of respondents 9 12 11 19
who had access to
prior studies
Percentage who 100% 75% 73% 68%
considered prior
studies helpful
Number of respondents 10 12 10 17
who had access to
technical staff
Percentage who were      90%               100%              100%          94%
assisted by those
technical staff
Respondents' view of success (percent)\c
----------------------------------------------------------------------------------------
Minimally successful 0 6 9 17
Somewhat successful 0 28 18 11
Moderately successful 50 50 18 44
Mostly successful 33 17 46 22
Very successful 17 0 9 6
----------------------------------------------------------------------------------------
\a The answer given by one respondent did not match the question
format.
\b Answers given by four respondents did not match the question
format.
\c Percentages may add to more than 100 because of rounding.
MAJOR CONTRIBUTORS TO THIS REPORT
========================================================== Appendix IV
The following team members made important contributions to this
report: Daniel G. Rodriguez and Sara E. Edmondson, Senior Social
Science Analysts, co-directed the survey and analysis of agencies'
experiences. Joseph S. Wholey, Senior Adviser for Evaluation
Methodology; Michael J. Curro and J. Christopher Mihm, Assistant
Directors; and Victoria M. O'Dea, Senior Evaluator, provided advice
throughout the development of the report.
RELATED GAO PRODUCTS
===========================================================
GPRA: Managerial Accountability and Flexibility Pilots Did Not Work
as Intended (GAO/GGD-97-36, Apr. 10, 1997).
Performance Budgeting: Past Initiatives Offer Insights for GPRA
Implementation (GAO/AIMD-97-46, Mar. 27, 1997).
Measuring Performance: Strengths and Limitations of Research
Indicators (GAO/RCED-97-91, Mar. 21, 1997).
Child Support Enforcement: Reorienting Management Toward Achieving
Better Program Results (GAO/HEHS/GGD-97-14, Oct. 25, 1996).
Executive Guide: Effectively Implementing the Government Performance
and Results Act (GAO/GGD-96-118, June 1996).
Managing for Results: Achieving GPRA's Objectives Requires Strong
Congressional Role (GAO/GGD-96-79, Mar. 6, 1996).
Block Grants: Issues in Designing Accountability Provisions
(GAO/AIMD-95-226, Sept. 1, 1995).
Managing for Results: Status of the Government Performance and
Results Act (GAO/T-GGD-95-193, June 27, 1995).
Managing for Results: Critical Actions for Measuring Performance
(GAO/T-GGD/AIMD-95-187, June 20, 1995).
Managing for Results: The Department of Justice's Initial Efforts to
Implement GPRA (GAO/GGD-95-167FS, June 20, 1995).
Government Reform: Goal-Setting and Performance
(GAO/AIMD/GGD-95-130R, Mar. 27, 1995).
Block Grants: Characteristics, Experience, and Lessons Learned
(GAO/HEHS-95-74, Feb. 9, 1995).
Program Evaluation: Improving the Flow of Information to the
Congress (GAO/PEMD-95-1, Jan. 30, 1995).
Managing for Results: State Experiences Provide Insights for Federal
Management Reforms (GAO/GGD-95-22, Dec. 21, 1994).
*** End of document. ***