[From the U.S. Government Printing Office, www.gpo.gov]
PROGRAM MANAGEMENT AND THE FEDERAL EVALUATOR

Pamela Horst
Joe N. Nay
John W. Scanlon
Joseph S. Wholey

162-0010-6
An Urban Institute Reprint

The Urban Institute is a nonprofit research corporation established in 1968 to study problems of the nation's urban communities. Independent and nonpartisan, the Institute responds to current needs for disinterested analyses and basic information and attempts to facilitate the application of this knowledge. As part of this effort, it cooperates with federal agencies, states, cities, associations of public officials, the academic community and other sectors of the general public.

The Institute's research findings and a broad range of interpretive viewpoints are published as an educational service. Conclusions expressed in Institute publications are those of the authors and do not necessarily reflect the views of other staff members, officers or trustees of the Institute, or of organizations which provide funds toward support of Institute studies.

These research findings are made available in three series of publications: Books and Reports, Papers and Reprints. A current publications list is available on request.

Publications Office
The Urban Institute
2100 M Street, N.W.
Washington, D.C. 20037

REFER TO URI-101 15 WHEN ORDERING.

"PROGRAM MANAGEMENT AND THE FEDERAL EVALUATOR," by Pamela Horst, Joe N. Nay, John W. Scanlon, and Joseph S. Wholey, is reprinted with permission from the AMERICAN SOCIETY FOR PUBLIC ADMINISTRATION, appearing in PUBLIC ADMINISTRATION REVIEW, July/August, 1974.

August 1974

THE URBAN INSTITUTE
WASHINGTON, D.C.

PROGRAM MANAGEMENT AND THE FEDERAL EVALUATOR

Pamela Horst, Joe N. Nay, John W. Scanlon, and Joseph S. Wholey, The Urban Institute

In 1969, The Urban Institute completed an extensive study of federal evaluation and concluded that, "The most impressive finding about the evaluation of social programs in the federal government is that substantial work in this field is almost non-existent."1 A limited resurvey of the field in 1972 revealed a quite different picture: funds committed to evaluation had mushroomed, many studies had been completed, and the use of large-scale social experimentation was increasing.2

This growth in evaluation has contributed information - often imperfect, sometimes incorrect - to today's arguments about the direction, method, and purpose of social programs. Without evaluation, many arguments would have remained at the level of polemic. There is no question that the presence of program evaluation has heightened the consciousness of federal program managers and policy makers to the fact that they may, from time to time, have to respond to queries about the effectiveness of their programs.

While evaluation has firmly established itself since 1969 in both the budget and the administrative rhetoric of the federal government, there is little evidence to show that evaluation generally leads to more effective social policies or programs. On the contrary, the experience to date strongly suggests that social programs have not been as effective as expected and have not improved in performance following evaluation. This situation can be phrased as a critical management problem which we see confronting government agencies:

Why have those in charge of programs and those who evaluate them not been able to join their efforts in a way that leads more frequently to significant improvements in program performance?

Having been able both to observe and to participate in the development of federal program evaluation, we have chosen here to raise three propositions about the root causes of the above problem. If these causes are the crucial ones and if we can come to understand their true impact, federal program management and evaluation stand on the edge of a period of increasing success. If not, and the causes of these weaknesses continue to be ignored, then evaluation, program management, the programs themselves, and those the programs are intended to serve will all continue to suffer.

This paper elaborates on why the three root causes, when they exist, block further improvement of many programs. The idea of a "preassessment" of program evaluability is introduced as one tool for improving both program management and program evaluation. We begin with a discussion of the conventional treatment of evaluation problems and then present an alternative diagnosis and prescription. Although much of the material presented is addressed to federal managers and federal evaluators, we believe the problems and solutions discussed also hold for state and local government.

Apparent Causes of Evaluation Problems - and an Alternative Statement

Most reviews made to determine what causes programs and their evaluations to be ineffective include one or more of the following conclusions:3

• Evaluations are not planned to support decision making.
• The timing, format, and precision of evaluation studies are not geared to user needs.
• Evaluation findings are not adequately communicated to decision makers.
• Different evaluations of the same program are not comparable.
• Evaluation fails to provide an accumulating, increasingly accurate body of evidence.
• Evaluation studies often address unanswerable questions and produce inconclusive results.

The first three apparent causes deal with aspects of evaluation use. They occur at the interface between the producers of evaluations and the prospective users of evaluation.

The second three apparent causes deal with the methods used by the evaluators in assessing the interventions of the programs in society. They occur at the interface between the producers of evaluation and the program as it exists. They concern flaws in making measurements and comparisons and in drawing conclusions.

Our experience to date in studying the management problem - namely, the lack of significant improvement in program performance - and in watching various agencies attack the apparent causes of the problem has led us to conclude that these six statements largely refer to symptoms, rather than causes. We believe that the causes of the problem may more properly be described by one or more of the following three propositions concerning the program itself:

• Lack of Definition: the problem addressed, the program intervention being made, the expected direct outcome of that intervention, or the expected impact on the overall society or on the problem addressed are not sufficiently well defined to be measurable.
• Lack of Clear Logic: the logic of assumptions linking expenditure of resources, the implementation of a program intervention, the immediate outcome to be caused by that intervention, and the resulting impact are not specified or understood clearly enough to permit testing them.
• Lack of Management: those in charge of the program lack the motivation, understanding, ability, or authority to act on evaluation measurements and comparisons of actual intervention activity, actual outcomes, and actual impact.

When one or more of these three propositions is true, both the problem (lack of significant improvement in program performance) and the six apparent causes listed earlier can easily occur. In cases where the first two propositions hold, an enormous range of possibilities will present themselves as to which measurements and comparisons to make - with no criteria for making sound choices. In cases where the last proposition holds, even exceptionally high quality evaluation is not likely to be used well, if used at all. If a program suffers from one or more of these three flaws, there is a very low probability that evaluation information useful to program improvement can be produced. Thus the program may be "unevaluable" until the flaws are corrected.

A statement that the quality and value of evaluation are strongly affected by the degree to which these three conditions exist is not a startling finding. What has not been realized or acknowledged in the past, however, is that these three factors are not the responsibility of the evaluator. While the conventional apparent causes relate to how the evaluator does his job, these latter three propositions describe an organizational environment over which the evaluator typically has little control. Evaluators, more than any other group in an agency, will appear unable to complete their work successfully when these conditions exist, regardless of how they deal with the apparent causes.

Why the Apparent Causes Are Suspect

In the past few years, we have conducted a number of federal program evaluations and helped to develop evaluation planning systems for several federal agencies. In the course of our work, we have observed many attempts to treat the six apparent causes directly by improving the use and methodology of evaluation. These attempts include policy review and dissemination panels, letting contracts for methodology development, high level reviews of evaluation plans, task forces to select better questions for evaluation, better systems for collecting data, requiring program offices to submit advance descriptions of how evaluation findings will be used, and the tightening of contract selection and monitoring procedures to increase contractor responsiveness to agency needs. In some cases, the "solution" was reorganization: centralizing previously decentralized evaluation units. Since revenue sharing, talk of decentralizing a previously centralized evaluation office has gained popularity. In this case, the headquarters office would no longer be responsible for conducting national program evaluation, but instead would go into the business of building local evaluation capability.

As these proposed solutions were implemented, however, we have continued to talk with and work with participants in the process from the assistant secretary level, through the program level, and on down to the recipients of services. We find that the management problem - that is, the lack of significant improvement of program performance - continues to exist and the same apparent causes continue to be cited, whether there are high or low quality evaluation efforts. Improvements in programs and in delivery of effective services remain far below the levels desired or expected. If the root causes of the problem lay within the evaluation process, we believe that these correctives would be showing some degree of success. Consequently this experience led us to search for alternative explanations and, finally, to the consideration of the three conditions stated above as root causes of the problem.

The Source of the Problem

The significance of the proposed causes can best be understood by contrasting the nature of the intervention that the social programs of today attempt to make in the society at large with the principal types of program interventions attempted in the past. Many older, classical government activities involved program interventions whose nature was clearly defined and agreed upon and which were described in detail in a body of law or regulation (e.g., Social Security). The implementation of such activities was largely an act of administration of the laws and regulations. Evaluation of success or failure of the act of implementation was primarily a matter of assessing compliance with the guiding laws and regulations. Discretion was at a minimum (at least over the short term). Arguments might take place about whether goals were adequate, but the details of the program intervention were determined in advance.

In contrast, many new missions that the federal government has been called upon to undertake (e.g., lowering hard core unemployment) involve problems in which the proper program intervention mechanism is not well understood, or defined, or in some cases even known. Since in these cases no one knows exactly what detailed program intervention will be of value, greater management discretion is allowed and exercised. While in some cases research may be undertaken or experiments may be made to increase understanding, more typically a purportedly successful type of program intervention is simply put into place and an agency or bureau is charged with making it into a successful operation. In this case, evaluation is expected to report to those in charge of a program on whether the use of discretion in choosing specific program intervention techniques was successful and perhaps to suggest modifications or alternatives.

The newer program areas are characterized by uncertainty and discretion: uncertainty as to the nature of the problem and what constitutes effective strategies of intervention, and discretion in how the problem and the intervention are defined and how the intervention is implemented. These conditions make sound and rapid evaluation all the more important to effective management. Consider, however, how today's program environments can disable evaluation through three factors: lack of definition, lack of a clear logic, and lack of management.

Lack of Definition

Examination of program legislation, regulations, policy manuals, plans, and budget to determine what a program intervention is can be very deceptive. What at first seems clear often evaporates when the test of measurability is applied. The language used turns out to be ambiguous precisely where it would have to be specific in order for evaluation to be useful. Three common forms of inadequate language are: the vaporous wish, local project packaging, and how-to-do-it rule making.

The vaporous wish is the eloquent but elusive language of goals put forward for most federal programs. Exactly what are the "unemployability," "alienation," "dependency," and "community tensions" some programs desire to reduce? How would one know when a program crossed the line, successfully converting "poor quality of life" into "adequate quality of life"? Would anyone recognize "improved mental health," "improved local capability," or "revitalized institutions"? The problems addressed by social programs are almost never stated so that institutions, people, or the relevant socioeconomic conditions could be classified according to the degree to which they are afflicted with a problem. It is very hard to propose a solution to a problem that is ill-defined or undefined. How much harder it is to evaluate the success of that proposed solution.

Next, there is the project packaging language which purports to describe the intervention activity to be planted in the field and the expected outcome for those directly served by that activity. As any experienced site visitor will attest, this language is often so annoyingly imprecise that it is difficult to tell what parts of a local operation are under discussion and even harder to distinguish compliance or assess performance. For example, project characteristics prescribed in various program guidelines include: "coordinative mechanism," "integrated services," a "range of modalities," "extended career ladders," "accessibility of services," "continuity of care," "multi-disciplinary teams," "outreach capability," etc. Projects should produce "upgraded job skills," "increased cultural enrichment," "increased personal autonomy," "improved family cohesion," etc. Rarely are useful measures or norms for these activities and outcomes provided.

How-to-do-it rule making is the third kind of language that is commonly found. Here the terms are very concrete and specific. We find guidance on factors like the qualifications of project directors, the contents of affiliation agreements with other local agencies, reporting relationships, the use of consultants, and accounting practices. This guidance appears to be definite and all inclusive. Closer examination shows that it usually tells how to run the part of a project which does not deal directly with the intervention into society. Guidance for the part of the project which actually produces effects in society is not provided.

When these three forms of language predominate, the intervention activities in the field may be diverse indeed. Our experiences examining field operations indicate that program packaging is generally skin deep and that very different project activities and definitions of outcome often parade under the same assumed program names. An examination of 20 projects in the same program will often reveal 20 very different program intervention designs, different in activity and purpose. This means that the program activity and objective, as implemented in the field, cannot be defined on a common base of measurable terms. It is often difficult to find any consensus among federal level policy makers as to what the definitional base should be. This lack of a common framework can disable management and evaluation efforts alike.

It is becoming clearer that many federal social programs are simply envelopes for a large federal investment in a problem area. A program may be deceptive in the sense that it has enough content to allow it to be described in the media, lobbied into existence, and established as a federal effort - and yet the program interventions are not spelled out in any detail. Many program administrators over the last decade have essentially received a program envelope with only vaporous wishes and money inside. Although more detailed definition may not have been necessary in order to spend the money, much more detailed definition is needed to evaluate the process and outcome.

If it is decided that certain programs should be further defined, who in an agency should be responsible for the tasks? It should not be left to the auditors or to the evaluators or to the information system people, because the choice of specific measurable definitions is not merely a technical task. The definition of what is to be measured in a program is central to policy making and program management. If there are many different ways to measure the problem a social program purports to influence, this often means that there are many different problems. For many programs, no one has yet exercised the prerogative of selecting which specific set of social ills the program is trying to cure or the methods of cure. Legislation or regulations rarely make this choice, and the choice has policy implications since it further specifies program intent and intervention. One of the major factors in shaping and directing a program is carefully selecting what the program is going to do. The failure to define measurable interventions, outcomes, and impact for a program is a major policy making defect. Those in charge of the agency and the program, rather than the evaluators, should have primary responsibility for program definitions.

Lack of a Clear Logic of Testable Assumptions

Even if the policy makers or program managers have provided measurable definitions, there still may not be unanimity within a federal social program about design or logic. As a result, different evaluation efforts are often based on different assumptions linking program intervention with immediate outcome and ultimate program impact. The measures and data collection instruments used are those that seem most reasonable to the evaluator. In this context it is easy to understand why evaluation findings are often noncomparable. When there is no carefully determined framework to guide the program, there is, of course, no such framework for evaluation studies. Nor is there a framework for systematically accumulating knowledge of program performance. In fact, it becomes unclear what program performance means.

Program assumptions might be as simple as that "the transfer of money to school districts will raise the reading level of disadvantaged students" or that "the training of the unemployed will lower unemployment." Often the broad program charters from the Congress referred to earlier have caused clusters of competing assumptions to grow up in many social programs. One set of assumptions may be used for arguments with friends, for instance, and another for argument with enemies. This may be good politics, but it makes for difficult evaluation design, since evaluation design should relate to the information needed to validate, refute, or modify a set of operating assumptions.

Without an adequate description of the assumptions governing the intervention of a program into society, it is more likely that evaluators will be asked to address unanswerable questions far removed from the actual activities taking place. To take a quite reasonable example, a program office might insist on funding an evaluation to assess the relative effectiveness of different drug treatment modalities. The evaluator may then find that these modalities do not represent pure, mutually exclusive approaches which are replicated in multiple local settings. He is likely to find, on the contrary, that a "halfway house" or a "therapeutic community" in one locale bears no resemblance in operating assumptions to others which go by the same name. After spending a lot of money, time, and effort, the evaluator will be forced to tell the agency what types of programs are really out there, rather than how successful they are, and also that the only way to test the effectiveness of alternative assumptions of treatment is to implement a program-level experiment, or introduce planned and enforced variations into the program design. Those in charge of the program may feel that the evaluator has once again failed to answer their questions. There are many examples of evaluations being mounted to answer questions which bear no relationship to the program activity actually taking place in the field. This counterproductive practice results from the failure of the agency to describe carefully the program assumptions so that they can be implemented and tested.

Summing up, even when the intervention, expected outcome, and impact are defined in measurable terms, the more subtle questions of the logic linking (a) program expenditures to production of the intervention, (b) intervention to outcome, and (c) outcome to impact on the problem must still be considered. The use of the word "logic" here is not meant to imply that the linking assumptions are loose or tight, valid or invalid, defensible or stupid. All that is implied is that a program in reality is based on an interrelated set of assumptions about what is believed to happen (and sometimes why) when money is spent and the intervention made. The absence of statements of these assumptions might be expected to cause a problem for both program managers and evaluators. The evaluators often notice the absence first, however, because they must design tests of these assumptions. Tests cannot be designed for people who are unable to, or refuse to, state their assumptions.

Once again (as with measurable definitions) the statement of the logic of testable assumptions is a policy question, not one that should be decided by the evaluators. Evaluators should test the assumptions about what works. Those in charge should make the initial assumptions underlying the funding and operation of the program.

Lack of Management

To get at the significance of lack of management, it is important to realize that evaluation is useful only if it is, in fact, a tool of management. A manager has a variety of tools to employ that include direction of his line management, planning, budgeting, audit and financial control, administration (for that part of his activity that can be clearly defined and where a law or set of rules is used to guide program implementation), policy analysis, and evaluation. Evaluation is needed principally in support of policy analysis and management discretion. Evaluation performs the same function for management that audit and control do for budgeting and that compliance checks do for administration.

One way of understanding the role of evaluation as a management tool is to explore how a "textbook manager" might use evaluation in attempts to improve program performance and then to contrast that with the way evaluation more frequently is used.

Evaluators and the "textbook manager" cooperate very well. When the policy decisions about program design are to be made, the evaluator asks the manager to specify the measurable definitions, the assumptions of the program linking these definitions, what kind of performance data would cause the manager to act, and the kinds of action the manager has the authority and willingness to implement. Armed with this guidance, the evaluator estimates the level of error associated with collecting the evidence, estimates the ranges of possible findings, and bounds the cost of the proposed evaluation. The evaluator is then equipped to provide a service not commonly rendered at present. He can advise management on the cost and feasibility of procuring evaluation evidence, and the manager can weigh these factors against the potential value of evidence for improving program performance. When the evaluation is finally commissioned, the evaluator has a clear basis for judging the best level of aggregation, precision, and delivery schedule because he has a user for the proposed evaluation. Many market surveys and internal evaluations are conducted this way in industry. When this kind of rational planning occurs, one does not generate evaluation studies in search of users and uses.

The utility of social program evaluation depends at least in part upon defining the decision context as well as the program design. The "textbook manager" has already defined his program in measurable terms and has indicated what it purports to accomplish. If evaluation is to contribute to program improvement, there must be at least a few decision areas where the manager will rely on program performance feedback (measures of impact, outcome, intervention activities), as well as on political pressures, popular approaches, or his own hunches and beliefs. Else why buy evaluation at all? The "textbook manager" knows in advance and can specify what level of evidence will prompt him to act at all, or cause him to select among alternative actions. Further, he has the authority to act.

Return with us now to reality, where the typical government administrators live. These administrators participate in continual agency debate over program issues, but the debates proceed in a language which means different things to different people. The debates are not centered on a measurable set of program descriptions nor are the assumptions guiding the program intervention made clear enough to be testable. In fact, most of the people in this world will go to great lengths to keep these two things ambiguous in order to expand their area for maneuver. The administrator is a decision maker - he does take action. As in "textbook" management, many of his actions are based on guesses about what is needed, shifting academic opinions and political support, and the demands of a set of higher level policy makers subject to continual turnover. Unlike the "textbook" management, however, the typical government administrator does not establish and test assumptions linking intervention activities to program performance. Typical government administration might be called "pseudo-management," because all its management activity takes place in a process that is not linked to actual program results. In its own terms, such "pseudo-management" is good if its activities remain acceptable to an ever-changing cast of characters at the policy level.

Evaluators and pseudo-managers operate independently of one another. There is no basis for communication between them. The pseudo-manager has no real use for evaluation and the evaluator can provide few, if any, services to assist in pseudo-management. In fact, sound evaluation results may present a clear and present danger to the pseudo-manager. In this environment, the evaluator can expect his work to have minimal impact. The problem for the evaluator is to distinguish pseudo-management from textbook management. On the surface it appears to us that pseudo-management predominates in social agencies; the potential for textbook management is yet unknown.

Our emphasis on identifying actual users of evaluation and on pre-specifying the decision context and uses of evaluation information may seem excessive. Yet the desired use of evaluation information determines not only how much it is worth but also the form and accuracy that it must have. And if those in charge of a program have no use for information about that program, then there is no real way to design an adequate evaluation for them. What this might mean may be demonstrated by an example.

Assume that a federal drug treatment program for heroin abusers defines outcome success in the following terms: the client reveals absence of heroin use six months after discharge from treatment, as tested by three randomly spaced urinalyses during the follow-up period and one urinalysis at the end of the six months. Those in charge of the program say that they require information about this outcome to assist in decisions about the following: allocation of technical assistance among drug treatment projects, reallocation of funds among projects, and assignment of headquarters staff to study problems associated with achieving a desirable outcome level. But suppose those in charge are challenged to specify in advance how decisions might vary with the range of possible evaluation findings. For example, will a task force convene for program redesign if national program cure rates average 5 per cent, 15 per cent, or 50 per cent? Will technical assistance be given to projects whose average cure rate falls below 5 per cent? Is there technical assistance to give? Can projects be closed down? Will a stated national objective of a 30 per cent cure rate be adjusted downward, if the actual average cure rate found is 15 per cent? This type of dialogue would permit the evaluator to assess the potential value of evaluation information by identifying plausible and practical uses for it and also permit the evaluator to assess the specific type and accuracy of the information required.

The level of validity and reliability required in measurable data should be an important factor used in analyzing the method of collection, the cost of data collection, and the methods and cost of data analysis before data collection efforts ever begin. The "conclusiveness" of data only takes on meaning in relation to particular actions the data may suggest. But as we saw earlier, if everything is left ambiguous, no one will know what level of evaluation findings would or should prompt action and therefore what level of validity and reliability are required in the evaluation data. This means that, in our example, drug program evaluations which show cure rates of 2 per cent, 5 per cent, 20 per cent, 50 per cent, and 75 per cent could all be dismissed by the pseudo-managers as "inconclusive" for decision making.

When a single individual does not have the authority to take or to elaborate on the kinds of action mentioned above, those individuals whose consensus is required must be found and consulted. The point is that management of a program is a policy matter. Evaluation cannot prescribe management actions. Rather, the needs of management should define evaluations.

The Consequences of Evaluating When These Conditions Exist

Why should the evaluator worry about the soft, unmeasurable underbelly of social program goals, objectives, and activity; about the obscure logic of program assumptions; or about whether there is a management vacuum? If our analysis is correct, weaknesses in these areas can disable an evaluation effort while making the failure appear to be the evaluator's own doing.

If the agency evaluator, alone or with a contractor, attempts to carry out an evaluation of a program where these flaws exist, our experience indicates that there are two highly likely outcomes. First, the evaluator's attempt to define the program in measurable and logical terms will flounder. No available methodology can bridge the gap between the program as implemented in the field and the program as suggested by program goal statements. Thus the results of his evaluation are likely to be labeled "inconclusive," "abstract," or "an effort to develop methodology." Second, his findings will not be responsive to the information needs of those in charge of the program. He may produce the wrong information or information that is too imprecise or too sparse. Even if the evaluation is technically unimpeachable, those in charge of the program may find it irrelevant to their decision context, seeing no way to act upon the information.

We have suggested that the definition of measurable program design and of testable assumptions about how the program works is a major policy issue which should be resolved by policy makers and program managers within the discretionary boundaries of program legislation. Program policy making is not the job of the agency evaluator, and he should not undertake the task even if it is disguised as a "technical" choice of the proper program measures needed to conduct an evaluation.

Some Sources of Leverage

Is there a strategy that evaluators can adopt to return the jobs of policy making and program management to policy makers and evaluators - and improve the utility and yield of evaluation (and program) dollars? Fortunately, some factors in the present federal environment may supply the leverage needed to force attention to the three conditions (lack of definition, lack of clear logic, and lack of management) that have proven costly to program effectiveness and evaluation.

First, there is less naiveté about federal social programs today. More awareness exists that attacking a vague problem with an unproven social, behavioral, or economic theory is not likely to bring success. Raising issues about program definitions and assumptions is now more likely to strike a responsive chord in this climate.

program and its information base, and can make clear whether a major evaluation effort is or is not warranted. In essence, the three root causes of problems in program evaluation can be transformed into a set of criteria for determining the evaluability of a public program. These criteria are expressed in the following questions:

• Are the problems, intended program interventions, anticipated outcomes, and the expected impact sufficiently well defined as to be measurable?
• In the assumptions linking expenditure to implementation of intervention, intervention to the outcome anticipated, and immediate outcome to the expected impact on the problem, is the logic laid out clearly enough to be tested?
• Is there anyone clearly in charge of the program? Who? What are the constraints on his ability to act? What range of actions might he reasonably take or consider as a result of various possible evaluation findings about the measures and assumptions discussed above?

In a sense the criteria are sequential. Measurable definitions form a basis for the testable assumptions. Then both serve as a basis for the consideration of the range of decisions that those in charge
Secondly, the of the program might make as a result of informa- federal budget is not expanding rapidly, and the tion about actual costs, interventions, outcomes, present Administration and the Congress are plac- and impact. ing more emphasis on accountability. Third, both In practice the evaluator will have to judge the the Congress and citizens are pushing for more degree to which the three criteria are satisfied for effective delivery of public services, and more particular programs. The evaluator generally has evidence of effectiveness. several programs in his agency that can be eval- The evaluator, with some help from high level uated at any one time. In initial planning, the policy makers and program managers, may be able evaluator should focus on testing each program to take advantage of these potential sources of against these three criteria, using the best informa- leverage and use them to force the definition that tion available from the programs themselves to makes evaluation possible. At least he may assure assess how valuable each program may be. This that his efforts are expended in areas where there assessment should be discussed directly with is the best chance of success. The tool that we policy makers and program officials. The interac- recommend he employ is a "preassessment of tion between evaluator and program officials may evaluability" for every program that is a candidate assist policy makers and program officials to for evaluation. defirie the measures and specify the logic of assumptions that need to be tested. I Preassessment of Evaluability The next task is to decide which programs meet all three criteria. Then programs that meet some We recommend a process of pre-evaluation criteria, or almost meet aH criteria, may be sorted 4 design. If conducted in proper detail, this process out. 
Finally, in most agencies, a third group of can provide what might be called a "rapid feed- programs will emerge which satisfy few-if any-of back evaluation" of the present status of a the three criteria. JULY/AUGUST 1974 308 PUBLIC ADMINISTRATI ON REVIEW At this point the evaluator will have completed serious problems on the list to the attention of the his own pr6assessment of the "evaluability" of the top level of the agency hierarchy so they will programs of his agency.. It is almost useless to know which programs are or are not evaluable, and explore questions of use and methodology for why. programs that clearly do not meet the criteria. The These actions may be very risky things to do in next and fmal step is both a possible source of many agencies, but it can prevent a lot of useless leverage for the evaluator and a somewhat risky evaluation attempts and later recrimination. We business in many agencies. believe that they would force improvements in programperformance as well. Clearly Naming the Problem for Others The evaluator has now created three fists of Notes programs: "evaluable," "potentially evaluable with 1. Joseph S. Wholey, et al., Federal Evaluation Policy, further program or management definition," and (Washington, D.C.: The Urban Institute, 1970). 11 2. Garth N. Buchanan and Joseph S. Wholey, "Federal not evaluable." Since these problems are now Level Evaluation," in Evaluation, Vol. 1, No. 1 (Fall understood to involve policy and management 1972), pp. 17-22. questions, as well as evaluation design questions, 3. For a concise overview of the literature in which these the list has two uses. criticisms have been put forward see Francis G. Caro First, the evaluator should evaluate only the (ed.), Readings in Evaluation Research, Russell Sage Foundation, 197 1, pp. 9-15. In our own work we have programs that are evaluable. 
He should agree to had access to unpublished internal assessments of help with the definitional problems of potentially evaluation efforts by several federal agencies; the evaluable programs. But he should not hesitate to majority of these note agency dissatisfaction, with name the nature of the problem. The evaluator their evaluation product and identify many of these should tell policy makers and program managers apparent causes as major influences. 4. See John D. Waller and John W. Scanlon, Urban whether their programs are or are not evaluable, Institute Plan for the Design of an Evaluation (Wash- and why. Second, the evaluator should bring the ington@ D.C.: The Urban Institute, March 1973). JULY/AUGUST 1974 OTHER SELECTED URBAN INSTITUTE PUBLICATIONS BOOKS Federal Evaluation Policy: Analyzing the Effects of Public Programs, Joseph S. Wholey, John W. Scanlon, Hugh G. Duffy, James S. Fukumoto, and Leona M. Vogt, 1970, URI 40001, Paperback, $2.95 Urban Processes: As Viewed By The Social Sciences, Kenneth J. Arrow, James G. March, James S. Coleman, Anthony Downs, and William Gorham, 1970, URI 20001, Paperback, $1,95 The Unemployment-Inflation Dilemma: A Manpower Solution, Charles C. Holt, C. Duncan MacRae, Stuart 0. Schweitzer, and Ralph E. Smith, 1970, URI 60003, Paperback, $2.95 Blacks and Whites: An Experiment in Racial Indicators, Michael J. Flax, 1971, URI 60002, Paperback, $1.50 New Towns In-Town: Why a Federal Program Failed, Martha Derthick, 1972, URI 7000 6, Paperback, $2.95 Public Prices For Public Products, Selma J. Mushkin, Editor, 1972, URI 90009, Paperback, $6.50,- URI 90010, Hard cover, $10.95 What is Revenue Sharing?, Charles J. Goetz, 1972, URI 13000, Paperback, $1.95 Forecasting Local Government Spending, Claudia DeVita Scott, 1972, URI 50010, Hard cover, $4.95 The Struggle to Bring Technology to Cities, 1971, URI 70001, Paperback, $1.95 How Clean is Our City? A Guide for Measuring the Effectiveness of Solid Waste Collection Activities, Louis H. 
Blair and Alfred I. Schwartz, 1972, URI 10004, Paperback, $1.95
How Shall We Collect The Garbage? A Study in Economic Organization, Dennis R. Young, 1972, URI 10008, Paperback, $1.95
Measuring the Effectiveness of Local Government Services: Recreation, Harry P. Hatry and Diana R. Dunn, 1971, URI 70002, Paperback, $1.75
An Introduction to Sample Surveys for Government Managers, Carol H. Weiss and Harry P. Hatry, 1971, URI 30003, Paperback, $1.50*
Obtaining Citizen Feedback: The Application of Citizen Surveys to Local Governments, Kenneth Webb and Harry P. Hatry, 1973, URI 18000, Paperback, $1.95
The High Cost of Education in Cities: An Analysis of the Purchasing Power of the Educational Dollar, Betsy Levin, Thomas Muller, and Corazon Sandoval, 1973, URI 31000, Paperback, $2.50
University Urban Research Centers, Second Edition, Grace M. Taher, Editor, 1971-72, URI 10002, Paperback, $2.75
Cable Television in the Cities: Community Control, Public Access and Minority Ownership, Charles Tate, Editor, 1971, URI 80004, Paperback, $3.95
Governing Metropolitan Areas: A Critical Review of Councils of Government and the Federal Role, Melvin B. Mogulof, 1971, URI 70004, Paperback, $2.25
Measuring the Effectiveness of Local Government Services: Transportation, Richard E. Winnie and Harry P. Hatry, 1972, URI 16000, Paperback, $1.95