Federal Judgeships: The General Accuracy of the Case-Related	 
Workload Measures Used to Assess the Need for Additional District
Court and Courts of Appeals Judgeships (30-MAY-03, GAO-03-788R). 
                                                                 
Biennially, the Judicial Conference, the federal judiciary's	 
principal policymaking body, assesses the judiciary's needs for  
additional judgeships. If the Conference determines that	 
additional judgeships are needed, it transmits a request to	 
Congress identifying the number, type (courts of appeals,	 
district, or bankruptcy), and location of the judgeships it is	 
requesting. In 2003, the Judicial Conference sent to Congress	 
requests for 93 new judgeships--11 for the courts of appeals, 46 
for the district courts, and 36 for the bankruptcy courts. In	 
assessing the need for additional judgeships, the Judicial	 
Conference considers a variety of information, including	 
responses to its biennial survey of individual courts, temporary 
increases or decreases in case filings, and other factors	 
specific to an individual court. However, the Judicial
Conference's analysis begins with the quantitative case-related
workload measures it has adopted for the district courts and
courts of appeals--weighted case filings and adjusted case
filings, respectively. These two
measures recognize, to different degrees, that the time demands  
on judges are largely a function of both the number and 	 
complexity of the cases on their dockets. Some types of cases may
demand relatively little time and others may require many hours  
of work. Generally, each case filed in a district court is	 
assigned a weight representing the average amount of judge time  
the case is expected to require. Using these measures, individual
courts whose past case-related workload meets the threshold	 
established by the Judicial Conference may be considered for	 
additional judgeships. Authorized judgeships are the total number
of judgeships authorized by statute for each district court and  
court of appeals. The Judicial Conference relies on these
quantitative workload measures to be reasonably accurate measures
of judges' case-related workload. Whether these measures are
reasonably accurate rests in turn on the soundness of the
methodology used to develop them. Our
objectives were to (1) determine whether the methods the Judicial
Conference uses to quantitatively measure the case-related	 
workload of district court and court of appeals judges result in
a reasonably accurate measure of judges' case-related workload,
(2) assess the reasonableness of any proposed methodologies to
update the workload measures, and (3) obtain information from the
Administrative Office of the U.S. Courts on the steps the	 
Judiciary takes to ensure that the case filing data required for 
these workload measures are accurate.				 
-------------------------Indexing Terms------------------------- 
REPORTNUM:   GAO-03-788R					        
    ACCNO:   A07042						        
  TITLE:     Federal Judgeships: The General Accuracy of the	      
Case-Related Workload Measures Used to Assess the Need for	 
Additional District Court and Courts of Appeals Judgeships	 
     DATE:   05/30/2003 
  SUBJECT:   Courts (law)					 
	     Evaluation methods 				 
	     Judges						 
	     Performance measures				 
	     Work measurement					 
	     Labor force					 

GAO-03-788R

GAO-03-788R Accuracy of Judges Case-Related Workload Measures

United States General Accounting Office
Washington, DC 20548

May 30, 2003

The Honorable Lamar Smith
Chairman
Subcommittee on Courts, the Internet, and Intellectual Property
Committee on the Judiciary
House of Representatives

Subject: Federal Judgeships: The General Accuracy of the Case-Related
Workload Measures Used to Assess the Need for Additional District Court
and Courts of Appeals Judgeships

Dear Mr. Chairman:

Biennially, the Judicial Conference, the federal judiciary's principal
policymaking body, assesses the judiciary's needs for additional
judgeships. 1 If the Conference determines that additional judgeships are
needed, it transmits a request to Congress identifying the number, type
(courts of appeals, district, or bankruptcy), and location of the
judgeships it is requesting. In 2003, the Judicial Conference sent to
Congress requests for 93 new judgeships--11 for the courts of appeals, 46
for the district courts, and 36 for the bankruptcy courts. 2 In assessing
the need for additional judgeships, the Judicial Conference considers a
variety of information, including responses to its biennial survey of
individual courts, temporary increases or decreases in case filings, and
other factors specific to an individual court. However, the Judicial
Conference's analysis begins with the quantitative case-related workload
measures it has adopted for the district courts and courts of
appeals--weighted case filings and adjusted case filings, respectively.
These two measures recognize, to different degrees, that the time demands
on judges are largely a function of both the number and complexity of the
cases on their dockets. Some types of cases may demand relatively little
time and others may require many hours of work. Generally, each case filed
in a district court is assigned a weight representing the average amount
of judge time the case is expected to require. A case with a weight of
3.0, for example, would be expected to take twice as much time as a case
with a weight of 1.5. In the courts of appeals, pro se case filings--those
in which one or both parties are not represented by an attorney--are
weighted at 0.33 and all other case filings at 1.0.

1 The Chief Justice of the United States presides over the Conference,
which consists of the chief judges of the 13 courts of appeals, a district
judge from each of the 12 geographic circuits, and the chief judge of the
Court of International Trade. The Conference meets twice a year.

2 This report covers the methodology used to develop the case-related
workload measures for district court and courts of appeals judges. We
recently testified on the methodology used to develop the case-related
workload measure for bankruptcy judges. (See Federal Bankruptcy Judges:
Weighted Case Filings as a Measure of Judges' Case-Related Workload,
GAO-03-789T (Washington, D.C.: May 22, 2003)).

Using these measures, individual courts whose past case-related workload
meets the threshold established by the Judicial Conference may be
considered for additional judgeships. These thresholds are 430 weighted
case filings per authorized judgeship for district courts and 500 adjusted
case filings per three-judge panel of authorized judgeships for the
courts of appeals (courts of appeals judges generally hear cases in
rotating panels of three judges each). Authorized judgeships are the total
number of judgeships authorized by statute for each district court and
court of appeals.

The Judicial Conference relies on these quantitative workload measures to
be reasonably accurate measures of judges' case-related workload. Whether
these measures are reasonably accurate rests in turn on the soundness of
the methodology used to develop them. As agreed with your office, our
objectives were to (1) determine whether the methods the Judicial
Conference uses to quantitatively measure the case-related workload of
district court and court of appeals judges result in a reasonably accurate
measure of judges' case-related workload, (2) assess the reasonableness of
any proposed methodologies to update the workload measures, and (3) obtain
information from the Administrative Office of the U.S. Courts (AOUSC) on
the steps the Judiciary takes to ensure that the case filing data required
for these workload measures are accurate. The information for the last
objective is presented in enclosure I. The scope of our work specifically
excluded any analysis of how the Judicial Conference used the case-related
workload measures to develop its current judgeship request.

Results in Brief

The district court weighted case filings, as approved in 1993, appear to
be reasonably accurate and are based on a reasonable methodology. However,
they are about 10 years old, and we have concerns about the research
design approved to update them.

Overall, the weighted case filings, as approved in 1993, appear to be a
reasonably accurate measure of the average time demands that a specific
number and mix of cases filed in a district court could be expected to
place on the district judges in that court. The methodology used to
develop the weights used a valid sampling procedure, developed weights
based on actual case-related time recorded by judges from case filing to
disposition, and included a measure (standard errors) of the statistical
confidence in the final weight for each weighted case type. Without such a
measure, it is not possible to assess the accuracy of the final case
weights. However, the case weights are about 10 years old, and the data on
which the weights were based are as much as 15 years old. Changes since
1993, such as changes in the characteristics of cases filed in federal
district courts and changes in case management practices, may have
affected whether the 1993 weights continue to be a reasonably accurate
measure of the average time burden on district court judges resulting from
a specific volume and mix of cases. Some of these changes may have
increased time demands; others may have reduced time demands. To the
extent that the current case weights understate or overstate the total
case-related time demands on district judges, the weights could
potentially result in the Judicial Conference understating or overstating
the need for new district court judgeships.

The Judicial Conference's Subcommittee on Judicial Statistics has approved
a research design for updating the current case weights, and we have some
concerns about that design. The design would include limited data on the
time judges actually spend on specific types of cases. Much of the time
data used would be based on consensus estimates from groups of experienced
judges. Such data cannot be used to develop an objective, statistical
measure of the accuracy of the final case weights. Without such a measure,
it is not possible to determine whether the case weights are in fact a
reasonably accurate measure of case-related judge workload. In assessing
the need for judgeships in specific courts, the Judicial Conference relies
on the case weights to be a reasonably accurate measure of judges'
case-related workload.

Unlike the district court case weights, the adjusted filings workload
measure for appellate judges is not based on any empirical data regarding
the time that different types of cases required of courts of appeals
judges. The adjusted filings workload measure basically assumes that all
cases have an equal effect on judges' workload with the exception of pro
se cases--those in which one or both parties are not represented by a
lawyer--which are weighted at 0.33, or one-third as much as all other
cases. In the documentation we reviewed, we found no empirical data to
support that assumption. The current court of appeals case-related
workload measure, adopted in 1996, reflects an effort to improve the
previous measure, which may have tended to overstate judgeship needs. At
the time the current measure was developed and approved, using the new
benchmark of 500 adjusted case filings resulted in judgeship numbers that
closely approximated the judgeship needs of the majority of the courts of
appeals, as the judges of each court of appeals perceived them. However,
on the basis of the documentation we reviewed, there is no empirical basis
on which to assess the accuracy of adjusted filings as a measure of
case-related workload for courts of appeals judges.

Weighted Case Filings: District Judge Case-Related Workload Measure Is
Reasonably Accurate, but 10 Years Old, and the Plan to Update It Raises
Some Concerns

The purpose of the district court case weights was to create a measure of
the average judge time that a specific number and mix of cases filed in a
district court would require. Importantly, the weights were designed to be
descriptive, not prescriptive--that is, the weights were designed to
develop a measure of the national average amount of time that judges
actually spent on specific types of cases, not to develop a measure of how
much time judges should spend on various types of cases. Finally, the
weights were designed to measure only case-related judge workload. Judges
have noncase-related duties and responsibilities, such as administrative
tasks, that are not reflected in the case weights.

Case Weights Measure Average Judicial Time Demands

With a few exceptions, such as cases that are remanded to a district court
from the courts of appeals, each civil and criminal case filed in a
district court is assigned a case weight that varies from 0.031 (for cases
involving defaulted student loans or veterans benefit overpayments) to
5.99 (for death penalty habeas corpus cases) based on the subject matter
of the case. 3 The weight of the overall average case is 1.0. All other
case weights were established relative to this national average case. 4
Thus, a case with a weight of 0.5 would be expected to require on average
about half as much judicial time as the national average case. Conversely,
a case with a weight of 2.0 would be expected to take twice as much time
as the national average case. Case weights for criminal felony cases are
applied on a per defendant basis. 5 For example, the case weight for
heroin/cocaine distribution is 2.27. A heroin/cocaine distribution case
with two defendants would be weighted at 4.54--two times the assigned
weight of 2.27. The actual amount of time a judge may spend on individual
cases of any specific type may be more or less than the national average
for that type of case.

The total annual weighted filings for a district are determined by summing
the case weights associated with all the cases filed in the district
during the year. Weighted case filings per authorized judgeship is the
total annual weighted filings divided by the total number of authorized
judgeships. For example, if a district had total weighted filings of 4,600
and 10 authorized judgeships, its weighted filings per authorized
judgeship would be 460. The Judicial Conference uses weighted filings of
430 or more per authorized judgeship as an indication that a district may
need additional judgeships. Thus, a district with 460 weighted filings per
authorized judgeship could be considered for an additional judgeship.
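
The arithmetic described above can be sketched in a few lines of Python.
This is an illustrative sketch only, not the judiciary's actual
computation: the function names are hypothetical, while the 2.27 case
weight, the 430 threshold, and the 4,600 filings / 10 judgeships example
are taken from the text.

```python
from math import isclose

# Threshold stated in the report: 430 weighted filings per authorized judgeship.
DISTRICT_THRESHOLD = 430


def weighted_filing(case_weight, defendants=1):
    """Weight contributed by one filing.

    Criminal felony weights apply per defendant; civil cases use defendants=1.
    """
    return case_weight * defendants


def weighted_filings_per_judgeship(total_weighted_filings, authorized_judgeships):
    """Total annual weighted filings divided by authorized judgeships
    (a count that includes any vacancies)."""
    return total_weighted_filings / authorized_judgeships


def may_be_considered(per_judgeship, threshold=DISTRICT_THRESHOLD):
    """True when the district meets the threshold.

    Meeting the threshold is an indicator only, not a decision.
    """
    return per_judgeship >= threshold


# Heroin/cocaine distribution (weight 2.27) with two defendants.
two_defendant_case = weighted_filing(2.27, defendants=2)

# The report's example: 4,600 total weighted filings, 10 authorized judgeships.
per_judgeship = weighted_filings_per_judgeship(4600, 10)

print(round(two_defendant_case, 2))      # 4.54
print(per_judgeship)                     # 460.0
print(may_be_considered(per_judgeship))  # True
```

Keeping the threshold test separate from the division mirrors the text:
the ratio is a workload measure, and the threshold only flags districts
that may be considered for additional judgeships.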

In assessing judgeship needs, the weighted case filings are calculated
using authorized judgeships (a number which includes any vacancies). This
is a measure of the average workload per judge in a district court if all
the court's authorized judgeships were filled. Calculating the weighted
case filings per active judge--that is, on the basis of the number of
authorized judgeships filled--would show the burden of existing vacancies
on active judges, but not necessarily the need for more judgeship
positions.

3 Weights are assigned to each civil case counted as an original filing,
removal from state courts, or interdistrict transfer (transfers from one
district to another). Weights are also assigned to each felony defendant
counted as an original filing, reopened filing, or interdistrict transfer.
Generally, felonies are those crimes that carry a term of imprisonment of
more than 1 year. Weights are not assigned to civil cases remanded to the
district courts from the courts of appeals, reopened cases, or
multidistrict litigation transfers--cases transferred to a single district
from a number of districts for disposition, such as asbestos or breast
implant litigation.

4 Some types of civil cases were weighted differently if they involved the
United States as a party or were removed from state court to federal
court.

5 The weights do not include nonfelony criminal cases, which are generally
the responsibility of magistrate, not district, judges.

Case Weights Calculated in 1993 Using Time Data Recorded by Judges

The Judicial Conference approved the use of the current district court
case weights in 1993. The weights are based on a "case-tracking time
study," conducted between 1987 and 1993, in which judges recorded the
amount of time they spent on each of their cases included in the time
study. 6 The study included about 8,100 civil cases and about 4,200
criminal cases that were generally "tracked" from filing to disposition. 7
All judges who worked on each case were supposed to record the time they
worked on the case. 8 Data collection for the time study began in November
1987. Districts were brought into the study over a 2-year period, with the
last district entering the study in January 1990. When a district was
brought into the study, a 2-week period was designated for sampling,
during which all cases filed were included in the time study sample.

At the conclusion of the study, sample cases were grouped into civil and
criminal cases, with individual subclassifications (case types) for each,
such as "Contract: Insurance" and "Bank Robbery." Each sample case had a
value associated with it, which was the total number of minutes reported
by the district judge(s) who worked on it. The number of sample cases in
the subclassifications ranged from 18 to 1,563. Within each
subclassification, a simple average and a standard error were computed.
The averages and standard errors were converted into relative values as
the final step in creating the case weights--that is, all the weights were
calculated relative to the time required for the average case in the
study.
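
The per-case-type computation described above might look like the
following sketch. It assumes the standard error is the usual standard
error of the mean (sample standard deviation divided by the square root of
the number of sample cases); the report does not state the exact formula
used. The function name and the minute values are hypothetical and
invented purely for illustration.

```python
import math
import statistics


def mean_and_standard_error(minutes):
    """Return (simple average, standard error of the mean) for one case type.

    Assumption: SE = sample standard deviation / sqrt(n).
    """
    n = len(minutes)
    if n < 2:
        raise ValueError("need at least two sample cases to estimate a standard error")
    mean = statistics.mean(minutes)
    se = statistics.stdev(minutes) / math.sqrt(n)
    return mean, se


# Hypothetical judge-reported minutes for five sample cases of one case type.
sample_minutes = [300.0, 420.0, 510.0, 600.0, 720.0]
mean, se = mean_and_standard_error(sample_minutes)

print(f"average minutes per case: {mean:.1f}")
print(f"standard error: {se:.1f}")
```

Under this assumption, a case type with only 18 sample cases would tend
to have a larger standard error than one with 1,563, all else being equal,
which is why the sample size per subclassification matters.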

Methodology Used to Develop Case Weights Was Reasonable

Overall, the weighted case filings, as approved in 1993, are a reasonably
accurate method of measuring the average judge time that a specific number
and mix of cases filed in a district court could require. The methodology
used to develop the weights is reasonable. It used a valid sampling
procedure, developed weights based on actual case-related time recorded by
judges from case filing to disposition, and included a measure (standard
errors) of the statistical confidence in the final weight for each
weighted case type.

6 The time study for bankruptcy courts was a "diary study" in which judges
recorded the time spent on case-related and noncase-related work during
a 10-week period. Although each method has different strengths and
limitations, each method can produce useful, reasonably accurate results.
Enclosure II includes a comparison of these two methodologies.

7 Not all cases were completed by the end of the study; some were still
pending.

8 This included district judges, senior judges, magistrate judges, and
visiting judges. District judges--nonsenior and senior--exercise the full
judicial authority vested in the district courts. Nonsenior district
judges are those who hold a designated judgeship position and generally
carry a full caseload. Senior district judges are judges who have retired
from regular, full-time active service but remain on the bench and perform
such judicial duties as they are willing and able. Magistrate judges,
appointed for a fixed term of years, exercise the judicial duties
permissible by statute and the Constitution that the district courts
delegate to them. Visiting judges are those visiting from their "home
court" to assist in addressing the workload of the court they are
visiting. Visiting judges may or may not be senior judges. Time reported
by magistrate judges was not included in the final computations of the
case weights.

The sampling method was appropriately designed to ensure that all district
judges and all case types could potentially be included in the sample. The
staggered entry of districts into the study ensured that case samples were
taken throughout the year, reducing or eliminating bias due to seasonal
variation in case filings. Every district court judge could potentially
have been a participant in the study (depending on when the 2-week window
was designated at a given district and case assignments during that
period).

The method of recording the time spent on each case was designed to
capture all judge time spent on a sample case. Although it was not
possible to determine if all reportable judge time was in fact recorded
and reported, validity checks on the reported time were made where
possible. For example, judge-reported courtroom time in each sample case
was compared with the time reported for the same case in the judiciary's
database on courtroom proceedings.

The empirical data on hours expended on each case in the sample were used
to develop the case weights. The case weights for specific types of cases
were basically determined by dividing the total amount of time judges
reported for that type of case by the number of such cases in the study.
For example, if judges reported a total of 2,000 hours for 200 cases of a
specific type in the study, this would translate into 10 hours per case.
Sampling variability in the estimates based on the time study data was
quantified and provided with the weights. The standard error that is
associated with each weight provides an indicator of variability due to
the weight being produced via a sample, rather than data from the universe
of cases during the study period. The standard errors can be used to
display the statistical reliability of the weighted case filings estimate
for each district. Without some measure of statistical reliability, it is
not possible to objectively assess how accurate the case weights are.

The case weights are relative weights. That is, each case weight was
calculated relative to the average case as determined in the study, which
was assigned a value of 1.0. For example, a case type with a weight of 2.0
would be expected to require twice as much judge time as the "average"
case. Relative weights were determined by dividing the absolute weight of
each type of case by the weight or value of the average case. The Federal
Judicial Center (FJC) converted absolute weights to relative weights by
dividing the absolute weight values by 2.132. This value was chosen after
FJC conducted research to determine how to produce a new set of relative
weights that they considered to be comparable to the previous set of
relative weights. As described by FJC officials, this approach was
reasonable.

For the purposes of applying the national weights to individual districts,
the methodology assumed two things: (1) that the district's judges were
typical of district judges as a whole and (2) that the district's cases of
any given type were typical of that case type as a whole. This may or may
not have been true, but these are reasonable assumptions given the purpose
of the study--to develop weights based on national averages, not to
develop weights for individual districts or judges.
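
The two steps described above, averaging reported time per case type and
then scaling to a relative weight, can be sketched as follows. This is a
sketch under stated assumptions: the 2,000 hours / 200 cases example and
the 2.132 divisor come from the text, but the text does not state the unit
in which the absolute values were expressed, so the divisor is treated
here simply as a scaling constant. The function names are hypothetical.

```python
# Divisor reported in the text; the unit of the absolute values is not stated there.
FJC_DIVISOR = 2.132


def absolute_weight(total_judge_time, number_of_cases):
    """Average judge time per case for one case type."""
    return total_judge_time / number_of_cases


def relative_weight(absolute, divisor=FJC_DIVISOR):
    """Scale an absolute value so that the average case has a weight of 1.0."""
    return absolute / divisor


# The report's example: 2,000 hours reported across 200 cases of one type.
hours_per_case = absolute_weight(2000, 200)
print(hours_per_case)  # 10.0

# Hypothetical absolute values, in the same (unstated) unit as the divisor.
average_case = relative_weight(FJC_DIVISOR)
double_case = relative_weight(2 * FJC_DIVISOR)
print(average_case)  # 1.0
print(double_case)   # 2.0
```

Because every case type is divided by the same constant, the ratios
between case types are unchanged; only the scale shifts so that the
average case sits at 1.0.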

Research Design for Updating the District Court Case Weights Raises
Concerns

The case weights are almost 10 years old, and the time data on which they
were based are as much as 15 years old. Changes since the case weights
were finalized in 1993, such as changes in the characteristics of cases
filed in federal district courts and in case management practices, may
affect how accurately the weights continue to reflect the time burden on
district court judges today. For example, since 1993, new civil causes of
action (such as telemarketing issues) and criminal offenses (new terrorism
offenses) needed to be accommodated within the existing case-weight
structure. According to FJC officials, where the new cause of action or
criminal offense is similar to an existing case-weight type, the weight
for the closest case type is assigned. Where the new cause of action or
criminal offense is clearly different from any existing case-weight
category, the weight assigned is that for either "all other civil" for
civil cases or "all other criminal" for criminal cases.

The Subcommittee on Judicial Statistics of the Judicial Conference's
Judicial Resources Committee has approved the research design for revising
the current case weights, with a goal of having new weights submitted to
the Resources Committee for review in the summer of 2004. The research
would be led by FJC, which developed the research design. Although the
methodology for updating the case weights appears to offer the benefits of
reduced judicial burden (no time study data collection), potential cost
savings, and reduced calendar time to develop the new weights, we have
some concerns about the basic research design.

Our principal concerns are two: the challenge of obtaining reliable,
comparable data from two different automated data systems for the analysis
and the limited collection of actual data on the time judges spent on
cases. Essentially, the design for the new case weights relies on three
sources of data for specific types of cases: (1) data from automated
databases identifying the docketed events associated with cases; (2) data
from automated sources on the time associated with courtroom events for
cases; and (3) consensus estimates from structured, FJC-guided
discussions among experienced judges on the judge time required for
noncourtroom events in the cases, such as reading briefs or writing
opinions. The design assumes that judicial time spent on a given case can
be accurately estimated by viewing the case as a set of individual tasks
or events in the case. Information about event frequencies and, where
available, time spent on the events would be extracted from administrative
databases and reports, and then used to develop estimates of the judge
time spent on different types of cases. For event data, the research
design proposes using new technology (the Case Management/Electronic Case
Filing system) that is currently being introduced into the court system
for recording case management information. However, not all courts have
implemented the new system, and data from the existing and new systems
will have to be integrated in the study. Successfully integrating the data
from these two databases will be a challenge. FJC recognizes this and has
developed a strategy for addressing the issues, which includes forming a
technical advisory group from FJC, AOUSC, and individual courts to develop
a method of reliably extracting and integrating data from the two case
management systems for analysis.

Second, the design for developing the new weights does not require judges
to record time spent on individual cases. A significant limitation of the
time data to be used is that the time data available from existing
databases and reports are limited to time associated with courtroom events
and proceedings, while a majority of district judges' time is spent on
case-related work outside the courtroom. The time required for
noncourtroom events, such as reviewing briefs, will be based on the
consensus of groups of experienced judges. Groups of 8 to 13 district
judges in each of the 12 circuits (about 100 in all) will meet in a series
of structured discussions to develop estimates of the time required for
different events in different types of cases within each circuit, using
FJC-developed "default values" as the reference point for developing their
estimates. These default values would be based in part on the existing
case weights and in part on other types of analyses. Following this series
of meetings, a national group of 24 judges (2 from each circuit), using
structured procedures, will consider the data from the 12 circuit groups
and develop consensus time estimates for use in developing the weights.
These consensus time estimates are likely to represent a majority of the
judge time used to develop the new weights. These consensus data are
dependent upon the experience and knowledge of the participating judges
and the accuracy and reliability of the judges' recall about the average
time required for different events in different types of cases--about 150
if all case types in the current case weights were used. The greater the
number of events and types of cases for which judges are asked to make
estimates, the greater the demands on judges to recall accurately the
judge time associated with specific events and types of cases. These
consensus data cannot be used to calculate statistical measures of the
accuracy of the resulting case weights. Thus, it will not be possible to
objectively, statistically assess how accurate the new case weights
are--weights on whose reasonable accuracy the Judicial Conference will
rely in assessing judgeship needs in the future.

A concurrent time study using "case tracking" or "diary" methods would be
advisable to identify potential shortcomings of the event-based procedure
and to assess the relative accuracy of the case weights that are produced
using that procedure. In the absence of a concurrent time study, there
would be no objective, statistical way to determine the accuracy of the
case weights produced by the proposed event-based methodology.

Adjusted Case Filings: Courts of Appeals Judge Workload Measure Lacks
Empirical Basis for Assessing Its Potential Accuracy

The principal quantitative workload measure that the Judicial Conference
uses to assess the need for additional courts of appeals judges is
adjusted case filings. We found the adjusted filings workload measure is
based on available data from standard statistical reports for the courts
of appeals. The measure is not based on any empirical data about the judge
time required by different types of cases in the courts of appeals.

The Judicial Conference's policy is that courts of appeals with adjusted
case filings of 500 or more per three-judge panel may be considered for
additional judgeships. Courts of appeals generally decide cases using
constantly rotating three-judge panels. Thus, if a court had 12
authorized judgeships, those judges could be assigned to four panels of
three judgeships each. The Conference may also consider factors other than
adjusted case filings, such as the geography of the circuit or the median
time from case filings to disposition. For 11 of the 12 courts of appeals,
the Judicial Conference counts all case filings equally, with two
exceptions. (There is no specific workload measure established for the
D.C. circuit, as discussed later.) First, cases refiled and approved for
reinstatement are excluded from total case filings. 9 Second, two-thirds
of pro se cases--defined by AOUSC as cases in which one or both of the
parties are not represented by legal counsel--are deducted from total case
filings (that is, they are effectively weighted at 0.33). For example, a
court with 600 total pro se case filings in fiscal year 2001 would be
credited with 198 adjusted pro se case filings (600 x 0.33). The remaining
non-pro se cases would be weighted at 1.0 each. Thus, a court of appeals
with 1,600 case filings (excluding reinstatements)--600 pro se cases and
1,000 non-pro se cases--would be credited with 1,198 "adjusted" case
filings (198 discounted pro se cases plus 1,000 non-pro se cases). If this
court had 6 judges (allowing two panels of 3 judges each), it would have
599 adjusted case filings per 3-judge panel, and thus, under the Judicial
Conference's policy, could be considered for additional judgeships.
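
The arithmetic in the example above can be sketched in a few lines of
Python. This is an illustrative sketch only, not the judiciary's actual
procedure: the function and constant names are hypothetical, and the 0.33
pro se weight and the 500-filing threshold are simply the figures given in
the text. Reinstated cases are assumed to have been excluded already.

```python
# Illustrative sketch; names are hypothetical, figures are taken from the text.
PRO_SE_WEIGHT = 0.33         # pro se cases are effectively weighted at 0.33
PANEL_SIZE = 3               # courts of appeals sit in three-judge panels
THRESHOLD_PER_PANEL = 500    # adjusted filings per panel for consideration

def adjusted_filings(pro_se, non_pro_se):
    """Adjusted case filings; reinstated cases must already be excluded."""
    return pro_se * PRO_SE_WEIGHT + non_pro_se * 1.0

def adjusted_filings_per_panel(pro_se, non_pro_se, authorized_judgeships):
    """Adjusted filings divided by the number of three-judge panels."""
    panels = authorized_judgeships / PANEL_SIZE
    return adjusted_filings(pro_se, non_pro_se) / panels

total = adjusted_filings(600, 1000)
per_panel = adjusted_filings_per_panel(600, 1000, 6)
# Prints the total, the per-panel figure, and whether the threshold is met.
print(round(total), round(per_panel), per_panel >= THRESHOLD_PER_PANEL)
```

For the court in the example (600 pro se and 1,000 non-pro se filings, 6
judges), the sketch reproduces the 1,198 adjusted filings and 599 adjusted
filings per panel stated above.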

The current case-related workload measure for courts of appeals judges,
adopted in 1996, is similar in concept to the measure we reviewed in 1993.
10 Table 1 illustrates the similarities and differences in the two
measures. Although the current workload measure is expressed in terms of
appellate case filings, both the 1986 and 1996 case-related workload
measures are based on assumptions about the judge workload associated with
merit dispositions. Merit dispositions are cases that are decided on the
legal rights of the parties to the case rather than on technical issues,
such as lack of federal jurisdiction.

The workload measure we reviewed in 1993 was based on 5-year averages of
merit dispositions in each circuit separately, and the result was not
necessarily comparable among circuits because of the different methods
that each circuit used to decide its cases. The current measure uses a
single national standard for all circuits. Using national data on merit
dispositions as a percentage of case filings in 1994, the current workload
measure was based on the assumption that nationally about 55 percent of
all appellate case filings--except for pro se filings and reinstated
filings--result in merit dispositions. Thus, 500 adjusted case filings
would represent 275 merit dispositions--or 20 more than the 255 used in
the 1986 measure. The increase from 255 to 275 was basically a matter of
establishing equity between the district courts and courts of appeals
workload thresholds. For district courts, the Judicial Conference had
raised the threshold for considering additional judgeships from 400 to
430 weighted case filings per judgeship (a 7.5-percent increase). The new
merit dispositions standard raised the threshold for courts of appeals
from 255 to 275 merit dispositions (a 7.8-percent increase).
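
The figures in this paragraph can be checked with a short calculation. The
Python sketch below is illustrative only; the function names are
hypothetical, and the 55 percent merit disposition rate is the national
assumption described in the text.

```python
# Illustrative check of the threshold arithmetic; names are hypothetical.
MERIT_RATE = 0.55  # assumed national share of filings ending in merit dispositions

def merit_dispositions(adjusted_case_filings, rate=MERIT_RATE):
    """Merit dispositions implied by a given number of adjusted filings."""
    return adjusted_case_filings * rate

def percent_increase(old, new):
    """Percentage change from an old threshold to a new one."""
    return (new - old) / old * 100

print(round(merit_dispositions(500)))        # courts of appeals benchmark
print(round(percent_increase(400, 430), 1))  # district court threshold change
print(round(percent_increase(255, 275), 1))  # courts of appeals threshold change
```

The three printed values correspond to the 275 merit dispositions, the
7.5-percent increase, and the 7.8-percent increase cited above.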

9 Such cases were dismissed for procedural defaults when originally filed
but "reinstated" to the court's calendar when the case was later refiled.
The number of such cases, as a proportion of total cases, is generally
small.

10 U.S. General Accounting Office, Federal Judiciary: How the Judicial
Conference Assesses the Need for More Judges, GAO/GGD-93-31 (Washington,
D.C.: Jan. 29, 1993).

Table 1: A Comparison of the 1986 and 1996 Methods of Measuring
Case-Related Workload for Courts of Appeals Judges

1986: The benchmark for considering additional judgeships in a court of
      appeals is 255 merit dispositions per 3-judge panel.
1996: The benchmark for considering additional judgeships in a court of
      appeals is 500 adjusted case filings per 3-judge panel.

1986: Prisoner petition cases (a subset of pro se cases) are counted as
      one-half of an appellate case filing.
1996: Pro se cases (which include prisoner petitions) are counted as
      one-third of an appellate case filing.

1986: Uses a 5-year average merit termination rate for each individual
      circuit.
1996: Uses single standard of 500 adjusted filings for all courts of
      appeals.

1986: No other adjustments.
1996: Does not count number of appeals reinstated after procedural
      default as part of adjusted filings to prevent double counting of
      appeals.

1986: Calculations are based on actual 5-year average merit terminations
      rate for each court of appeals.
1996: Calculations apply to each circuit's appellate case filings.

Source: FJC documentation and interviews.

The current court of appeals case-related workload measure represents an
effort to improve the previous measure. As we noted in our 1993 report,
under the previous measure the courts of appeals' own restraint, not the
workload standard, seemed to have determined the actual number of
appellate judgeships the Judicial Conference requested. At the time the
current measure was developed and approved, using the new benchmark of 500
adjusted case filings resulted in judgeship numbers that closely
approximated the judgeship needs of the majority of the courts of appeals,
as the judges of each court perceived them. The current court of appeals
case-related workload measure principally reflects a policy decision
using historical data on filings and terminations. In 1995, the
Subcommittee on Judicial Statistics of the Judicial Conference's Judicial
Resources Committee sent a survey to the chief judge of each circuit court
of appeals. In the responses, there was no agreement that either the 500
adjusted filings standard or a weight of 0.33 for pro se cases was the
appropriate standard. Unlike the district court case weights, the adjusted
filings workload measure is not based on empirical data regarding the
judge time that different types of cases may require. On the basis of the
documentation we reviewed, we determined there is no empirical basis for
assessing the potential accuracy of adjusted filings as a measure of
case-related judge workload.

The D.C. Circuit--Adjusted Case Filings Not Applicable to Its Unusual
Caseload

In a report to a Judicial Conference subcommittee, 11 FJC discussed some
of the distinctive features of the Court of Appeals for the D.C. Circuit.
The report noted that approximately 30 percent of the circuit's filings in
fiscal years 1996-1997 were administrative agency appeals that occur
almost exclusively in the D.C. circuit and were more burdensome than other
cases in several respects. On average, these cases

had more independently represented participants per case;

were more likely to have participants with multiple objectives, involve
complex or statutory law, and require the mastery of technical or
scientific information;

had more briefs filed per case;

had a higher proportion of cases that were terminated; and

had a higher rate of case consolidation (where two or more cases are
combined for decision).

11 Federal Judicial Center, Assessment of Caseload Burden in the U.S.
Court of Appeals for the D.C. Circuit, Report to the Subcommittee on
Judicial Statistics of the Committee on Judicial Resources of the Judicial
Conference of the United States (Washington, D.C.: 1999).

The report concluded that the need for additional judgeships in the D.C.
circuit should not be measured using the general workload threshold of 500
adjusted case filings per 3-judge panel. However, because no information
was available on judges' actual time expenditures, there was no empirical
basis for suggesting a specific alternative formula for assessing the
D.C. circuit's judgeship needs. The report also concluded that the D.C.
circuit's remaining caseload--that is, all cases other than administrative
agency appeals--was generally not distinguishable from the caseloads of
the other circuits. The report suggested several possible ways to
integrate the D.C. circuit into the existing adjusted weighted filings
system, such as giving greater weight to federal agency appeals or
lowering the general threshold of 500 adjusted filings per 3-judge panel
for the D.C. circuit. The Judicial Conference has not yet adopted any
specific workload measure for the D.C. circuit. However, the Judicial
Conference requested no additional judgeships for the D.C. circuit in
2003.

No Judicial Conference Consensus on How to Revise Adjusted Filings
Workload Measure

In 1993, we recommended that the Judicial Conference improve its workload
measure for the courts of appeals. 12 In the last decade, the Judicial
Conference has considered a number of proposals for developing a revised
case-related workload measure for courts of appeals judges, but the
Conference has been unable to reach a consensus on any approach. As part
of its assistance to the Conference in this effort, FJC in 2001 compiled a
document that reviewed previous proposals to develop some type of case
weighting measure for the courts of appeals. 13 Table 2 outlines some of
these proposals and their advantages and disadvantages, as identified by
FJC.

12 U.S. General Accounting Office, Federal Judiciary: How the Judicial
Conference Assesses the Need for More Judges, GAO/GGD-93-31 (Washington,
D.C.: Jan. 29, 1993).

13 Federal Judicial Center, Review of Previous Appellate Case Weighting
Proposals (Washington, D.C.: Aug. 22, 2001).

Table 2: Past Proposals to Revise the Case-Related Workload Measure for
Courts of Appeals Judges

Proposal 1: Estimation of case burden based on actual time required to
process the case.
Advantages: The quantitative approach would be very thorough. Empirically
based data.
Disadvantages: Judges may not be amenable to the time-consuming task of
recording the hours spent on individual cases. Time spent gathering data
could be used elsewhere.

Proposal 2: Estimate of case burden based on the assessment of burden of
only "certain characteristics" from an already-existing database of
"factors."
Advantages: Would not be very time-consuming for judges. Would assess the
frequencies of certain "factors." Analysis of an existing database would
save time. Can use a "wealth" of factors to get a big picture of the
caseload burden.
Disadvantages: Difficult to agree on which factors to use. Difficult to
decide if presence and absence of factors is enough information. Database
and survey accuracy may be compromised.

Proposal 3: Normative assessment of cases to look qualitatively at the
cases as a whole.
Advantages: Convenient to extract information from surveys or group
discussions.
Disadvantages: Difficult to decide which factors to use. Dependent upon
accuracy of judges' recall about the case. Lack of empirically based data.

Proposal 4: Using multiple regression to use information about the
proportional mix of cases with different defined characteristics in the
different circuits to account for the differences in case termination
level.
Advantages: Quantitative approach to determine factors to use.
Disadvantages: Use of a potentially incomplete model. Inherent statistical
limits. Cannot assess appellate burdens on a national level.

Proposal 5: Using district court weights for the appellate system.
Advantages: Already available data. Save time by using existing data.
Disadvantages: Little consistency between the two court systems. Sacrifice
accuracy.

Proposal 6: Tallying court opinions (published and unpublished).
Advantages: Most appellate judge work leads to production of appellate
opinions in chambers.
Disadvantages: Necessary information cannot be obtained consistently.

Proposal 7: Sampling cases for approximately 3 months for a case-based
study (Nov. 8, 1993).
Advantages: Can project the results of 3 months of cases to the rest of
the year.
Disadvantages: There is no way to anticipate possible sample sizes, so
cannot make a statistical prediction.

Source: FJC documentation.

Additional proposals are variations or combinations of those above. Some
of these possibilities have more potential than others. Generally, methods
that rely principally on empirical data on actual case characteristics and
judge behavior (e.g., time expended on cases) are more appropriate than
those that rely principally on qualitative data because statistical
methods can be used to estimate the accuracy of the resulting workload
measure.

Conclusions

Overall, the methodology used to develop the district court case weights
is reasonable, and the resulting case weights are a reasonably accurate
measure of district court judge case-related workload. However, the
weights are about 10 years old, and the time data on which they are based
are as much as 15 years old. Consequently, it is uncertain whether the
case weights continue to be a reasonably accurate measure of the average
district judge time burden resulting from a specific volume and mix of
cases. The Judicial Conference's Subcommittee on Judicial Statistics has
approved a research design for updating the current case weights, about
which we have two concerns. First, the design would rely in large part on
data from two different case management data systems, and it will be a
challenge to reliably and usefully integrate the data from these two
systems for analysis. FJC recognizes this and is developing a strategy for
addressing the issue. Second, the design includes limited actual data on
the time district judges spend on different types of cases. All the data
on noncourtroom time will be based on estimates developed by 13 groups of
experienced judges (about 124 in all) using structured, guided
discussions. These data cannot be used to calculate statistical measures
of the accuracy of the resulting case weights. Thus, it will not be
possible to objectively, statistically assess how accurate the new case
weights are--weights on whose reasonable accuracy the Judicial Conference
will rely in assessing judgeship needs in the future.

The adjusted case filings workload measure used for the courts of appeals
is not based on actual data about the time that courts of appeals judges
expend on different types of cases. Rather, it represents a policy
judgment of the appropriate workload benchmark for considering new
judgeships that is based on an analysis of past trends in case filings and
merit dispositions. Because of the lack of empirical data on the time
demands on courts of appeals judges, neither we nor the judiciary can
assess whether adjusted filings is a reasonably accurate measure of the
workload of courts of appeals judges. Any methodology to revise the
current workload measure that relies solely on qualitative data is
unlikely to provide reasonably reliable and verifiable estimates of
judges' workload. In 1993, we recommended that the Judicial Conference
develop a better measure of the workload of courts of appeals judges.
Although the Conference has studied many potential methods of improving
its workload measure, it has been unable to agree on any methodology for
doing so.

We recognize that a methodology that provides greater empirical assurance
of a workload measure's accuracy will require judges to document how they
spend their time on cases for at least some period of weeks. We believe
that, given the importance and cost of federal judgeships, this would be a
good investment to ensure that the workload measures that are used to
support judgeship requests are reasonably accurate and based on the best
data available using sound research methods.

Recommendations

We recommend that the Judicial Conference of the United States

update the district court case weights using a methodology that supports
an objective, statistically reliable means of calculating the accuracy of
the resulting weights; and

develop a methodology for measuring the case-related workload of courts
of appeals judges that supports an objective, statistically reliable means
of calculating the accuracy of the resulting workload measures and that
addresses the special case characteristics of the Court of Appeals for the
D.C. Circuit.


Agency Comments and Our Response

We provided the Director of the Administrative Office of the United States
Courts and the Director of the Federal Judicial Center with a draft of
this report for comment. Both provided technical comments, which were
incorporated into the report as appropriate. In a May 27, 2003, letter,
the Chair of the Committee on Judicial Resources of the Judicial
Conference of the United States provided comments (see enc. III) that
offered four major observations: (1) the case-related workload in each
district court and court of appeals for which the Judicial Conference has
requested one or more judgeships considerably exceeds the minimum
thresholds the Conference has established for considering additional
judgeships in district courts and courts of appeals; (2) we did not
provide the full context in which the Judicial Conference uses the
district court case weights in assessing district court judgeship needs;
(3) the workload of the courts of appeals entails important factors that
have defied measurement, including significant differences in case
processing techniques; and (4) we did not fully and accurately describe
the full context of the new district court case weighting study.

With regard to the first two observations, the scope of our work was
limited to an assessment of the relative accuracy of the weighted case
filings and adjusted case filings measures of district court judge and
courts of appeals judge workload, respectively. Our report clearly states
that the workload measures we reviewed are one of many factors the
Judicial Conference considers in assessing judgeship needs, although the
assessment begins with these workload measures. With regard to the courts
of appeals, we recognize that there are significant methodological
challenges in developing a more precise workload measure for the courts of
appeals. However, using the data available, neither we nor the Judicial
Conference can assess the accuracy of adjusted case filings as a measure
of the case-related workload of courts of appeals judges. We believe it
is premature to conclude that it is not possible to develop a
case-related workload measure for courts of appeals judges whose accuracy
can be reasonably determined.

The Deputy Director of FJC provided comments in a May 27, 2003, letter
(see enc. IV). Both the FJC Deputy Director and the Chair of the Judicial
Conference's Committee on Judicial Resources said that we did not fully
describe the proposed methodology for updating the district court case
weights and why this methodology could produce case weights whose accuracy
could be reasonably assessed. We have added language to the report that
provides more detail on the iterative Delphi technique that would be used
to develop the consensus estimates of the judge time required for
noncourtroom events in many different types of cases. FJC agrees that the
Delphi methodology would not support the calculation of standard errors
for the new case weights, but said that it would allow FJC to assess the
integrity of the resulting case weight system. We do not believe that the
proposed methodology can be used to assess the accuracy of weights based
in large part on consensus data. The Delphi technique of guided,
structured discussions inherently relies for its accuracy and reliability
on the experience and knowledge of the participating judges and the
accuracy and reliability of judges' recall about the average time required
for different events in many different types of cases--about 150 if all
case types in the current weights were used. The greater the number of
events and types of cases for which judges are asked to make estimates,
the greater the demands on judges to recall accurately the judge time
required by those events and types of cases. Generally, the Delphi
technique is most appropriate when more precise analytical techniques are
not feasible and the issue could benefit from subjective judgments on a
collective basis. However, more precise analytical techniques are
available and were used to develop the current district court case
weights. We believe that any methodology used should support the
calculation of standard errors. Such statistical measures are essential
for assessing the potential error of the weighted case filings for any
specific district that has requested additional judgeship(s).

We believe that the importance and cost of creating new federal judgeships
require the best possible case-related workload data to support the
assessment of the need for more judgeships. The methodology approved for
the revision of the bankruptcy case weights offers an approach that could
be usefully adopted for the revision of the district court case weights.
The bankruptcy court methodology would use a two-phased approach. First,
new case weights would be developed based on the time data recorded by
bankruptcy judges for a period of weeks--a methodology very similar to
that used to develop the current bankruptcy case weights. The accuracy of
the new case weights could be assessed using standard errors. The second
part represents experimental research to determine if it is possible to
make future revisions to the weights without conducting a time study. The
data from the time study can be used to validate the feasibility of this
approach. If the research determines this is possible, the case weights
could be updated more frequently with less cost than required by a time
study. We believe this methodology would provide (1) more accurate
weighted case filings than the design proposed for revising the district
court case weights and (2) a sounder method of developing and testing the
accuracy of case weights that were developed without a time study.

Objectives, Scope, and Methodology

As agreed with your office, our objectives were to (1) determine whether
the methods the Judicial Conference uses to quantitatively measure the
case-related workload of district court and court of appeals judges
result in a reasonably accurate measure of judges' case-related
workload, (2) assess the reasonableness of any proposed methodologies to
update the workload measures, and (3) obtain information from the AOUSC on
the steps the Judiciary takes to ensure that the case filing data required
for these workload measures are accurate. To do this, we obtained and
reviewed documentation on the methodology used to develop the existing
workload measures and proposals to revise those measures from AOUSC and
FJC and interviewed officials at both agencies. We based our assessments
on our experience with and knowledge of sound research design and
generally accepted statistical analysis methods. We also obtained
information on the methods the judiciary uses to ensure the accuracy of
the case filings data on which the workload measures rely. Although the
Judicial Conference considers a number of factors in assessing judgeship
needs for the district courts and courts of appeals, our work focused only
on the relative accuracy of the weighted case filings and adjusted case
filings measures. We did our work in Washington, D.C., in April and May
2003.

- - - -

We will send copies of this report to interested congressional committees,
the Director, Administrative Office of the U.S. Courts; Director, Federal
Judicial Center; and the Chair, Committee on Judicial Resources, Judicial
Conference of the United States. We will make copies available to others
on request. In addition, this report will be available at no charge on
GAO's Web site at http://www.gao.gov.

If you have any questions about this report, please contact me at (202)
512-8777. The key contributors to this report were David Alexander, Kriti
Bhandari, Rochelle Burns, and Chris Moriarity.

Sincerely yours,

William O. Jenkins, Jr.
Director, Homeland Security and Justice Issues

Enclosures - 4


Enclosure I

Quality Assurance Steps the Judiciary Takes to Ensure the Accuracy of Case
Filing Data for Weighted Filings

Whether the district court case weights are a reasonably accurate measure
of district judge case-related workload is dependent upon two variables:
(1) the accuracy of the case weights themselves and (2) the accuracy of
classifying cases filed in district courts by the case type used for the
case weights. If case filings are inaccurately identified by case type,
then the weights are inaccurately calculated. Because there are fewer
categories used in the courts of appeals workload measure, there is
greater margin for error. The database for the courts of appeals should
accurately identify (1) pro se cases, (2) reinstated cases, and (3) all
cases not in the first two categories.

All current records related to civil and criminal filings that are
reported to the Administrative Office of the U.S. Courts (AOUSC) and used
for the district court case weights are generated by the automated case
management systems in the district courts. Filings records are generated
monthly and transmitted to AOUSC for inclusion in its national database.
On a quarterly basis, AOUSC summarizes and compiles the records into
published tables, and for given periods these tables serve as the basis
for the weighted caseload determinations.

In responses to written questions, AOUSC described numerous steps taken to
ensure the accuracy and completeness of the filings data, including the
following:

Built-in, automated quality control edits are done when data are entered
electronically at the court level. The edits are intended to ensure that
obvious errors are not entered into a local court's database. Examples of
the types of errors screened for are the district office in which the case
was filed, the U.S. Code title and section of the filing, and the judge
code. Most district courts have staff responsible for data quality
control.

A second set of automated quality control edits is used by AOUSC when
transferring data from the court level to its national database. These
edits screen for missing or invalid codes that are not screened for at the
court level, such as dates of case events, the type of proceeding, and the
type of case. Records that fail one or more checks are not added to the
national database and are returned electronically to the originating court
for correction and resubmission.

Monthly listings of all records added to the national database are sent
electronically to the involved courts for verification.

Courts' monthly and quarterly case filings are monitored regularly to
identify and verify significant increases or decreases from the normal
monthly or annual totals.

Tables on case filings are published on the Judiciary's intranet for
review by the courts.

Detailed and extensive statistical reporting guidance is provided to
courts for reporting civil and criminal statistics. This guidance includes
information on general reporting requirements, data entry procedures, and
data processing and reporting programs.

Periodic training sessions are conducted for district court staff on
measures and techniques associated with data quality control procedures.
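
The kind of automated screening described in the steps above can be
illustrated with a short Python sketch. It is a hypothetical example only:
AOUSC did not describe its edits at this level of detail, and the field
names, the list of valid codes, and the function names below are all
assumptions made for illustration.

```python
# Hypothetical sketch of an automated edit check; field names and code
# lists are assumptions, not AOUSC's actual record layout.
REQUIRED_FIELDS = ("event_date", "proceeding_type", "case_type")
VALID_CASE_TYPES = {"civil", "criminal"}

def failed_checks(record):
    """Return a list of problems found in one filing record (a dict)."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS
                if not record.get(name)]
    case_type = record.get("case_type")
    if case_type and case_type not in VALID_CASE_TYPES:
        problems.append(f"invalid case_type: {case_type}")
    return problems

def screen(records):
    """Split records into those accepted and those returned for correction."""
    accepted, returned = [], []
    for record in records:
        problems = failed_checks(record)
        if problems:
            returned.append((record, problems))  # sent back to the court
        else:
            accepted.append(record)              # added to the database
    return accepted, returned
```

In this sketch, a record that fails any check is kept out of the accepted
list and paired with the reasons for rejection, mirroring the described
practice of returning failed records to the originating court for
correction and resubmission.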

AOUSC did not identify any audits to test the accuracy of district court
case filings or any other efforts to verify the accuracy of its electronic
data by comparing the electronic data to "hard copy" case records for
district courts. Within the limited time for our review, AOUSC was unable
to obtain information from individual courts to include in its responses.
We have no information on how effective the procedures AOUSC described may
be in ensuring that the data in the automated databases provided an
accurate and reliable means of assigning weights to district court case
filings.


Enclosure II

Measuring Judicial Workload Using the Collection of Time Study Data

The current bankruptcy court and district court workload measures were
developed using data collected from time studies. The district court time
study took place between 1987 and 1993, and the bankruptcy court time
study took place between 1988 and 1989.

Different procedures were used in these two time studies. The bankruptcy
court time study protocol is an example of a "diary" study, where judges
recorded time and activity details for all of their official business over
a 10-week period. The district court time study protocol is an example of
a "case-tracking" study, where a sample of cases was selected, and all
judges who worked on a given sample case recorded the amount of time they
spent on the case. Time studies, in general, have the substantial benefit
of providing quantitative information that can be used to create objective
and defensible measures of judicial workload, along with the capability to
provide estimates of the uncertainty in the measures.

Estimating Judge Time in Diary and Case-Tracking Studies

At the conclusion of a case-tracking study, total time spent on each
sample case closed during the study period is readily available by summing
the recorded times spent on the case by each judge who worked on the case.
For a given case type, the summed recorded times can be averaged to obtain
an estimate of the average judicial time per case for that case type.
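
The summing and averaging steps described above, together with a standard
error for each average, can be sketched in Python as follows. This is an
illustrative sketch, not the procedure FJC used: the record layout and
function name are hypothetical, and the standard error shown assumes the
sample cases of each type were drawn as a simple random sample.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

def case_weight_estimates(records):
    """Estimate average judge time per case type from time records.

    records: iterable of (case_id, case_type, judge_hours) entries, one
    per judge time entry on a closed sample case (hypothetical layout).
    Returns {case_type: (mean_hours, standard_error, number_of_cases)}.
    """
    hours_by_case = defaultdict(float)
    type_of_case = {}
    for case_id, case_type, hours in records:
        hours_by_case[case_id] += hours   # sum time across all judges
        type_of_case[case_id] = case_type

    totals_by_type = defaultdict(list)
    for case_id, total in hours_by_case.items():
        totals_by_type[type_of_case[case_id]].append(total)

    estimates = {}
    for case_type, totals in totals_by_type.items():
        n = len(totals)
        # A standard error needs at least two cases of the type.
        se = stdev(totals) / sqrt(n) if n > 1 else None
        estimates[case_type] = (mean(totals), se, n)
    return estimates
```

The standard error gives a rough indication of how much an estimated
average could vary from sample to sample, which is the kind of uncertainty
estimate that time study data make possible.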

For a diary study, however, it is necessary to make estimates of judicial
workload for all cases that were not both opened and closed during the
data collection period. This estimation step requires information from the
caseload database, and thus the accuracy of estimates depends in part on
the accuracy of the caseload data. Two kinds of information are required
from the caseload database: case type and length of time the case has been
open. Using these data and the time data judges have recorded for specific
cases, estimates can be made of the overall time required for cases that
were not opened and closed during the calendar period covered by the diary
study.
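
One simple way such an estimate could be assembled is sketched below in
Python. The sketch is an assumption for illustration and is not the
estimation method used in the bankruptcy time study: it assumes that, for
a single case type, the average judge time recorded for open cases of a
given age can be weighted by the share of cases of that type still open at
that age, with both the ages and the shares derived from the caseload
database. The function and parameter names are hypothetical.

```python
from collections import defaultdict

def diary_weight(observations, share_still_open):
    """Estimate total judge hours per case for one case type.

    observations: (age_in_months, judge_hours) pairs, one per open case
        per study month, including open cases with zero recorded hours.
    share_still_open: {age_in_months: fraction of cases of this type
        that are still open at that age}, taken from caseload data.
    """
    hours_at_age = defaultdict(list)
    for age, hours in observations:
        hours_at_age[age].append(hours)

    total = 0.0
    for age, values in hours_at_age.items():
        average = sum(values) / len(values)   # mean hours at this age
        total += average * share_still_open.get(age, 0.0)
    return total
```

Because both inputs other than the recorded hours come from the caseload
database, errors in case type or in the recorded length of time a case has
been open would carry through to the estimate, which is the dependence
noted above.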

Comparing Case-Tracking Studies and Diary Studies

Each study type has advantages and disadvantages. The following outlines
the similarities and differences in terms of burden, timeliness of data
collection, post-data collection steps, accuracy, and comprehensiveness.

Burden on Participants

Each study type places burden on judicial personnel during data
collection. It is not clear that one study type is less burdensome than
the other. The diary study procedure requires more concentrated effort,
but data are collected for a shorter period of time.

Timeliness of Data Collection

Data collection for a diary study can be completed more quickly than for a
case-tracking study.

Post Data Collection Steps

More effort is needed to convert diary study data to judicial workload
estimates than case-tracking study data. Also, the accuracy of estimates
from diary study data depends in part on the accuracy and objectivity of
the information in the caseload database.

Data Accuracy

It is not clear that one study type collects more accurate data than the
other study type. Some of the bankruptcy court case-related time study
data could not be linked to a specific case type due to misreporting
errors and/or errors in the caseload database. Some error of this type
likely is unavoidable because of the requirement to record all time rather
than record time for specific cases only. However, it is plausible that a
diary study collects higher quality data, on average, because all official
time is to be recorded during the study period; judicial personnel become
accustomed to recording their time. In contrast, the data quality for a
case-tracking study could decline over the study's length; for example,
after a substantial proportion of the sample cases are closed, judicial
personnel could become less accustomed to recording time on the remaining
open cases.

Comprehensiveness and Efficiency

In theory, a case-tracking study collects more comprehensive information
about judicial effort on a given case than a diary study, because data for
a sampled case almost always are collected over the duration of the case.
(Data collection may be terminated for a few cases that remain open, or
are reopened, many years after initial filing.)

With the diary approach, the total judicial time that is required for
lengthy case types is estimated by combining "snapshots" of the time
required by such cases of different ages. Thus, in theory, producing
accurate weights for lengthy case types is not problematic. In practice,
however, difficulties may be encountered. For example, in the 1988-1989
bankruptcy time study, the asset and liability information for cases older
than 22 months was inadequate and appropriate adjustments had to be made.
In addition, difficulties may arise if only a small number of cases of the
lengthy type are in the system. This is an issue FJC said it is
considering as it finalizes how to assess the judicial work associated
with mega cases in the upcoming bankruptcy case-weighting study.


Enclosure III

Comments from the Chair of the Judicial Resources Committee, Judicial
Conference of the United States


Enclosure IV

Comments from the Federal Judicial Center
