Hospital Quality Data: CMS Needs More Rigorous Methods to Ensure Reliability of Publicly Released Data (31-JAN-06, GAO-06-54).

The Medicare Modernization Act of 2003 directed that hospitals lose 0.4 percent of their Medicare payment update if they do not submit the clinical data for both Medicare and non-Medicare patients needed to calculate hospital performance on 10 quality measures. The Centers for Medicare & Medicaid Services (CMS) instituted the Annual Payment Update (APU) program to collect these data from hospitals and report their rates on the measures on its Hospital Compare Web site. For hospital quality data to be useful to patients and other users, they need to be reliable, that is, accurate and complete. GAO was asked to (1) describe the processes CMS uses to ensure the accuracy and completeness of data submitted for the APU program, (2) analyze the results of CMS's audit of the accuracy of data from the program's first two calendar quarters, and (3) describe processes used by seven other organizations that assess the accuracy and completeness of clinical performance data.

-------------------------Indexing Terms-------------------------
REPORTNUM: GAO-06-54
ACCNO: A46078
TITLE: Hospital Quality Data: CMS Needs More Rigorous Methods to Ensure Reliability of Publicly Released Data
DATE: 01/31/2006
SUBJECT: Data collection; Data integrity; Hospitals; Medical records; Performance measures; Quality assurance; Quality control; Reporting requirements; Statistical data; Program implementation; Annual Payment Update Program

Report to the Committee on Finance, U.S.
Senate

United States Government Accountability Office
GAO
January 2006

HOSPITAL QUALITY DATA
CMS Needs More Rigorous Methods to Ensure Reliability of Publicly Released Data

Reliability of Hospital Quality Data

GAO-06-54

Contents

Letter
Results in Brief
Background
CMS Has Processes for Checking Data Accuracy but Has No Ongoing Process to Check Completeness
Data Accuracy Baseline Was High Overall, but Statistically Uncertain for Many Hospitals, and Data Completeness Baseline Cannot Be Determined
Other Reporting Systems Use Various Methods to Ensure Data Accuracy and Completeness, Notably an Independent Audit
Conclusions
Recommendations for Executive Action
Agency Comments
Appendix I: Scope and Methodology
Appendix II: Other Reporting Systems
Appendix III: Data Tables on Hospital Accuracy Scores
Appendix IV: Comments from the Centers for Medicare & Medicaid Services
Appendix V: GAO Contact and Staff Acknowledgments

Tables
Table 1: HQA Hospital Quality Measures
Table 2: Percentage and Number of Hospitals Whose Baseline Accuracy Score Met or Fell Below the 80 Percent Threshold, by Measure Set and Quarter
Table 3: Background Information on CMS and Other Reporting Systems
Table 4: Processes Used by CMS and Other Reporting Systems to Ensure Data Accuracy
Table 5: Processes Used by CMS and Other Reporting Systems to Ensure Data Completeness
Table 6: Median Hospital Baseline Accuracy Scores, by Hospital Characteristic, Quarter, and Measure Set
Table 7: Proportion of Hospitals with Baseline Accuracy Scores Not Meeting 80 Percent Threshold, by Hospital Characteristic, Quarter, and Measure Set
Table 8: Percentage of Hospitals with Baseline Accuracy Scores Not Meeting 80 Percent Threshold, by JCAHO-Certified
  Vendor Grouped by Number of Hospitals Served, Quarter, and Measure Set
Table 9: Breadth of Confidence Intervals in Percentage Points Around the Hospital Baseline Accuracy Scores at Selected Percentiles, by Measure Set and Quarter
Table 10: For Hospitals with Confidence Intervals That Included the 80 Percent Threshold, Percentage of Total Hospitals with an Actual Baseline Accuracy Score That Either Met or Failed to Meet the Threshold, by Measure Set and Quarter

Figures
Figure 1: Approximate Times for Collection, Submission, and Reporting of Hospital Quality Data
Figure 2: Baseline Hospital Accuracy Scores at Selected Percentiles, by Measure Set and Quarter
Figure 3: Percentage of Hospitals Whose Baseline Accuracy Score Confidence Intervals Clearly Exceed, Fall Below, or Include the 80 Percent Threshold, by Measure Set and Quarter

Abbreviations
ACC           American College of Cardiology
AMI           acute myocardial infarction
APU program   Annual Payment Update program
CABG          coronary artery bypass grafting
CAP           community-acquired pneumonia
CDAC          Clinical Data Abstraction Center
CMS           Centers for Medicare & Medicaid Services
DAVE          Data Assessment and Verification Project
HF            heart failure
HQA           Hospital Quality Alliance
IFMC          Iowa Foundation for Medical Care
JCAHO         Joint Commission on Accreditation of Healthcare Organizations
MDS           Minimum Data Set
MEDPAR        Medicare Provider Analysis and Review
MMA           Medicare Prescription Drug, Improvement, and Modernization Act
MSA           metropolitan statistical area
NCQA          National Committee for Quality Assurance
PCI           percutaneous coronary intervention
PTCA          percutaneous transluminal coronary angioplasty
QIO           quality improvement organization
SPARCS        Statewide Planning and Research Cooperative System
SSA           Social Security Administration
STS           Society of Thoracic Surgeons

This is a work of the U.S. government and is not subject to copyright protection in the United States. It may be reproduced and distributed in its entirety without further permission from GAO.
However, because this work may contain copyrighted images or other material, permission from the copyright holder may be necessary if you wish to reproduce this material separately.

United States Government Accountability Office
Washington, DC 20548

January 31, 2006

The Honorable Charles E. Grassley
Chairman
The Honorable Max Baucus
Ranking Minority Member
Committee on Finance
United States Senate

The Medicare Prescription Drug, Improvement, and Modernization Act (MMA) of 2003 created a financial incentive for hospitals to submit data to provide information about their quality of care that could be publicly reported.1 Under Section 501(b) of MMA, acute care hospitals shall submit the clinical data from the medical records of all Medicare and non-Medicare patients needed to calculate hospitals' performance on 10 quality measures. If a hospital chooses not to submit the data, it will lose 0.4 percent of its annual payment update from Medicare for a subsequent fiscal year.2 The Centers for Medicare & Medicaid Services (CMS) established the Annual Payment Update program (APU program)3 to implement this provision of MMA. Participating hospitals submit quality data that are used to calculate a hospital's performance on the measures quarterly,4 according to a schedule defined by CMS. MMA affects hospital annual payment updates for fiscal year 2005 through fiscal year 2007.5 For fiscal year 2005, the first year of the program, CMS based its annual payment update on quality data submitted by hospitals for patients discharged between January 1, 2004, and March 31, 2004.

1Pub. L. No. 108-173, § 501(b), 117 Stat. 2066, 2289-90 (amending section 1886(b)(3)(B) of the Social Security Act, to be codified at 42 U.S.C. § 1395ww(b)(3)(B)).

2The reduction in the annual payment update applies to hospitals paid under Medicare's inpatient prospective payment system.
Critical access, children's, rehabilitation, psychiatric, and long-term-care hospitals may elect to submit data for any of the measures, but they are not subject to a reduction in their payment if they choose not to submit data. 3Throughout this report, we refer to CMS's Reporting Hospital Quality Data for the Annual Payment Update program as the "APU program". 4Throughout this report, we refer to the clinical data submitted by hospitals that are used to calculate their performance on the measures as "quality data". 5Senate Bill 1932 would extend the APU program indefinitely. It would also increase the penalty for not submitting data to 2 percent and provide for the Secretary to establish additional measures, beyond the original 10, for payment purposes. Under MMA, the 10 quality measures for which hospitals report data are those established by the Secretary of Health and Human Services as of November 1, 2003. The measures cover three conditions: heart attack, heart failure, and pneumonia. Over 3 million patients were admitted to acute care hospitals in 2002 with these three conditions, representing approximately 10 percent of total acute care hospital admissions. For patients over 65, acute care hospital admissions for the three conditions represented approximately 16 percent of total admissions. The collection of quality data on the 10 measures is part of a larger initiative to provide useful and valid information about hospital quality to the public.6 In April 2005, CMS launched a Web site called "Hospital Compare" to convey information on these and other hospital quality measures to consumers. Additional measures are being introduced by CMS,7 and it is expected that public reporting of hospital quality measures will continue into the future. Hospitals may submit quality data on additional measures for the APU program, but CMS bases any reduction in the annual payment update on the 10 measures referenced in the MMA. 
In addition to this effort, other public and private organizations also administer reporting systems in which clinical data are collected and may be released to the public. In order for publicly released information on the hospital quality measures to be useful to patients, payers, health professionals, health care organizations, regulators, and other users, the quality data used to calculate a hospital's performance on the measures need to be reliable, that is, both accurate and complete. If a hospital submits complete data, that is, data on all the cases that meet the specific inclusion criteria for eligible patients, but the data are not collected, or abstracted, from the patients' medical records accurately, the data will not be reliable. Similarly, if a hospital submits accurate data, but those data are incomplete because the hospital leaves out eligible cases, the data will not be reliable. Data that are not reliable may present a risk to people making decisions based on the data, such as a patient choosing a hospital for treatment. The program's initial, or baseline, data could describe data reliability at the start of the program and provide a reference point for any subsequent assessments. 6According to the Secretary of Health and Human Services, the effort is also intended to provide hospitals with a sense of predictability about public reporting expectations, to standardize data and data collection mechanisms, and to foster hospital quality improvement, in addition to providing information on hospital quality to the public. 7For example, CMS plans to publicly report on the Hospital Compare Web site measures of patient perspectives on seven aspects of hospital care, with national implementation scheduled for 2006. You asked us to provide information on the reliability of publicly reported information on hospital quality obtained through the APU program. 
In this report, we (1) describe the processes CMS uses to ensure that the quality data submitted by hospitals for the APU program are accurate and complete and any plans by CMS to modify its processes; (2) determine the baseline levels of accuracy and completeness for the data for patients discharged from January 2004 through June 2004, the first two quarters of data submitted by hospitals under the APU program; and (3) describe the processes used by seven other organizations that collect clinical performance data to assess the accuracy and completeness of quality data for selected reporting systems. In addressing these objectives, we collected information through interviews, examination of documents, and data analysis. To describe CMS's processes for ensuring the accuracy and completeness of the quality data for the APU program, we interviewed program officials from CMS and its contractors,8 hospital associations, quality improvement organizations (QIO), and hospital data vendors.9 In addition, we examined both publicly available and internal documents from CMS and its contractors. To determine the baseline accuracy and completeness of data submitted for the APU program, we drew on available information collected by CMS. In particular, we analyzed the accuracy of the quality data based on the reabstraction of patient medical records performed by CMS's Clinical Data Abstraction Center (CDAC).10 The reabstraction results available at the time we conducted our analyses pertained to hospital discharges that took place from January 1, 2004, through June 30, 2004.11 We extracted additional information about hospitals from the Medicare Provider of Services database, including the number of Medicare-certified beds and urban or rural location. 
After examining the CDAC data and reviewing the procedures that CMS has put in place to conduct the reabstraction process, we determined that the data were sufficiently reliable to use in estimating the baseline level of accuracy characterizing the quality data submitted by hospitals for those two calendar quarters. Regarding data on completeness of the quality data, we interviewed CMS officials and contractors and examined related documents. To examine the methods used by other reporting systems12 to assess data completeness and accuracy, we conducted structured interviews with officials from seven organizations,13 including government agencies, that administer such systems. We focused on reporting systems that collect clinical rather than administrative data. We selected a mix of systems, in terms of public or private sponsorship, types of providers assessed, and medical conditions covered, to ensure variety. We also spoke with individual health professionals with expert knowledge in the field of hospital quality assessment. 8CMS's contractors for this program are the Iowa Foundation for Medical Care (IFMC) and DynKePRO, LLC. IFMC is the quality improvement organization (QIO) for the state of Iowa. (QIOs are independent organizations that work under contract to CMS to monitor quality of care for the Medicare program and help providers to improve their clinical practices.) Under a separate contract, IFMC operates the national database for hospital quality data known as the QIO clinical warehouse. DynKePRO, LLC, an independent medical auditing firm, operates CMS's Clinical Data Abstraction Center (CDAC), which assesses the accuracy of hospital data submissions. 9Some hospitals contract with data vendors to electronically process, analyze, and transmit patient information. 
Our analysis of the level of accuracy and completeness of the quality data is based on the procedures developed by CMS to validate the data submitted; we have not independently compared the data submitted by hospitals to the original patient clinical records. In addition, we did not assess the performance of hospitals with respect to the quality measures themselves (which show how often the hospitals provided a specified service or treatment when appropriate). We conducted our work from November 2004 through January 2006 in accordance with generally accepted government auditing standards. For more details on our scope and methodology, see appendix I. 10Reabstraction is the re-collection of clinical data for the purpose of assessing the accuracy of hospital abstractions. In the APU program, CDAC compares data originally submitted by the hospitals to those it has reabstracted from the same medical records. 11These were the calendar quarters for which, at the time we conducted our analysis, hospitals had collected the data and CMS had completed its process for reabstracting and assessing the data. We analyzed data for all hospitals affected by section 501(b) of MMA, which were located in 49 states and the District of Columbia. Hospitals in Maryland and Puerto Rico were excluded because they are paid under different payment systems than other acute care hospitals. 12Throughout this report, we refer to this group of quality data reporting systems, each of which collects some type of clinical performance data from designated providers or health plans, as "other reporting systems". 
13The seven organizations were the American College of Cardiology, the California Office of Statewide Health Planning and Development, CMS (the units responsible for monitoring nursing home care through the Data Assessment and Verification Project contract), the Joint Commission on Accreditation of Healthcare Organizations (JCAHO), the National Committee for Quality Assurance, the New York State Department of Health, and the Society of Thoracic Surgeons.

Results in Brief

CMS has processes for ensuring the accuracy of the quality data submitted by hospitals for the APU program but has no ongoing process for assessing the completeness of those data. To check accuracy, one CMS contractor electronically checks the data as they are submitted to the clinical warehouse, and another operates CMS's CDAC, which conducts an independent audit by sampling five patient record abstractions from all the quality data submitted by each hospital in a quarter. CDAC then compares the quality data originally collected by the hospital from the medical records for those five patients to the quality data it has reabstracted from the same medical records. The data are deemed to be accurate if there is 80 percent or greater agreement between these two sets of results. CMS did not require hospitals to meet the 80 percent threshold for the 10 APU measures to receive their full annual payment update for fiscal year 2005. However, for fiscal year 2006, CMS reduced the payment update by 0.4 percentage points for hospitals whose data on the APU measures did not meet the 80 percent threshold. To assess completeness, CMS has twice compared the number of cases submitted by each hospital for the APU program for a given period to the number of claims each hospital submitted to Medicare, once for the fiscal year 2005 update and once for the fiscal year 2006 update.
However, these analyses did not address non-Medicare patient records, and the approach that CMS took in these analyses was not capable of detecting incomplete data for all hospitals. For example, to determine which hospitals could receive the full fiscal year 2006 update, CMS limited its analysis to hospitals that submitted no patient data at all to the clinical warehouse in a given quarter. CMS has not put in place an ongoing process for checking the completeness of the data that hospitals submit for the APU program that would provide accurate and consistent information for all patients and all hospitals. Nor has CMS required hospitals to certify that they submitted data for all eligible patients or a representative sample thereof. We could determine a baseline level of accuracy for the quality data submitted by hospitals for the APU program but not a baseline level of completeness. We found a high overall baseline level of accuracy when we examined CMS's assessment of the data from the first two calendar quarters of 2004. Overall, the median accuracy score exceeded 90 percent, which was well above the 80 percent accuracy threshold set by CMS, and about 90 percent of hospitals met or exceeded that threshold for both the first and the second calendar quarters of 2004. For most hospitals whose accuracy score was well above the threshold, the results based on the reabstraction of five cases were statistically certain. However, for approximately one-fourth to one-third of all the hospitals that CMS assessed for accuracy, the statistical margin of error for their accuracy score included both passing and failing accuracy levels. Consequently, for these hospitals, the five cases that CMS examined were not sufficient to establish with statistical certainty whether the hospital met the threshold level of data accuracy. Accuracy did not vary between rural and urban hospitals, and small hospitals provided data as accurate as those from larger hospitals. 
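The statistical uncertainty described above can be made concrete with a small sketch. This is only an illustration: it treats each reabstracted data element as a binary match and uses a normal-approximation 95 percent confidence interval for a proportion; the function name, the element counts, and the interval method are our assumptions, not CMS's published methodology.

```python
import math

def accuracy_assessment(matching_elements, total_elements, threshold=0.80):
    """Score abstraction accuracy and classify it against the threshold.

    Each reabstracted data element is treated as a binary match between the
    hospital's original abstraction and CDAC's reabstraction (an assumption
    made for illustration).
    """
    score = matching_elements / total_elements
    # Normal-approximation 95% margin of error for a proportion. With the
    # small element counts produced by only five records, this is wide.
    moe = 1.96 * math.sqrt(score * (1 - score) / total_elements)
    low, high = max(0.0, score - moe), min(1.0, score + moe)
    if low > threshold:
        status = "clearly above threshold"
    elif high < threshold:
        status = "clearly below threshold"
    else:
        status = "statistically uncertain"  # interval straddles 80 percent
    return score, (low, high), status

# A hospital matching 100 of 120 elements scores about 83 percent -- above
# the threshold, yet the interval straddles 80 percent.
score, interval, status = accuracy_assessment(100, 120)
```

Under this sketch, only hospitals whose entire interval clears (or misses) 80 percent receive a statistically certain pass or fail, which mirrors why a five-record sample can leave a quarter to a third of hospitals in the uncertain band.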
The completeness baseline could not be determined because CMS did not assess the extent to which all hospitals submitted data on all eligible patients, or a representative sample thereof, for the first two calendar quarters of 2004, and consequently there were no data from which to derive such an assessment. Other reporting systems that collect clinical performance data have adopted various methods to ensure data accuracy and completeness. Some of these methods are used by all of these other reporting systems, such as checking the data electronically to identify missing data. Officials from some of the other systems and an expert in the field stressed the importance of including an independent audit in the methods used by organizations to check data accuracy and completeness. Most other reporting systems that conduct independent audits incorporate three methods as part of their process that CMS does not use in its independent audit. Specifically, most include an on-site visit, focus their audits on a selected number of facilities or reporting entities, and review a minimum of 50 patient medical records per reporting entity during the audit. In order for CMS to ensure that the hospital quality data are accurate and complete, we recommend that the CMS Administrator, focusing on the subset of hospitals for which it is statistically uncertain if they met CMS's accuracy threshold in one or more previous quarters, increase the number of patient records reabstracted by CDAC. We further recommend that CMS require hospitals to certify that they took steps to ensure that they submitted data on all eligible patients, or a representative sample thereof, and that the agency assess the level of incomplete data submitted by hospitals for the APU program to determine the magnitude of underreporting, if any, in order to refine how completeness assessments may be done in future reporting efforts. 
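The count comparison underlying such a completeness assessment can be sketched in a few lines. This is hypothetical: the function name, the 80 percent cutoff, and the data shapes are ours, and a real check would also have to account for non-Medicare patients and for hospitals that legitimately submit a sample rather than all cases.

```python
def completeness_flags(apu_counts, claims_counts, tolerance=0.80):
    """Flag hospitals whose APU submissions cover suspiciously few of the
    Medicare claims they filed for the same conditions and period.

    apu_counts / claims_counts: dicts mapping hospital ID to case counts.
    The tolerance is an illustrative cutoff, not CMS policy; hospitals that
    sample under the program's rules will submit fewer cases legitimately.
    """
    flagged = {}
    for hospital, claims in claims_counts.items():
        submitted = apu_counts.get(hospital, 0)
        if claims > 0 and submitted < tolerance * claims:
            flagged[hospital] = (submitted, claims)
    return flagged
```

Even this crude comparison would catch underreporting that a check limited to hospitals submitting no data at all cannot.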
In commenting on a draft of this report, CMS agreed to implement steps to improve the quality and completeness of the data.

Background

Medicare spends over $136 billion annually on inpatient hospital care for its beneficiaries. To help ensure the quality of the care it purchases through Medicare, CMS launched the Hospital Quality Initiative in 2003. This initiative aims to refine and standardize hospital data, data transmission, and performance measures as part of an effort to stimulate and support significant improvement in the quality of hospital care. One component of this broader initiative is CMS's participation in the Hospital Quality Alliance (HQA), a public-private collaboration that seeks to make hospital performance information more accessible to the public, payers, and providers of care.14 Before the enactment of MMA, HQA had organized a voluntary program for hospitals to submit data on quality of care measures intended for public reporting. For its part as a participant in HQA, CMS set up a central database to receive the data submitted by hospitals and initiated plans for a Web site to post information on hospital quality of care measures. Thus, CMS had a data collection infrastructure in place when MMA established the financial incentive for hospitals to submit quality data.

Selection of Measures

The 10 measures chosen by the Secretary of Health and Human Services for the APU program are the original 10 measures that were adopted by HQA. HQA subsequently adopted additional measures that relate to the same three conditions (heart attacks, heart failure, and pneumonia) and others that relate to surgical infection prevention. (See table 1 for a listing of the APU-measure set and the expanded-measure set.15) Hospitals participating in HQA were encouraged to submit data on the additional measures, but data submitted on the additional measures did not affect whether a hospital received its full payment update under the APU program.
CMS and the QIOs have tested these measures for validity and reliability, and all measures have been endorsed by the National Quality Forum, which fosters agreement on national standards for measurement and public reporting of health care performance data.16

14HQA (formerly called the National Voluntary Hospital Reporting Initiative) was initiated by the American Hospital Association, the Federation of American Hospitals, and the Association of American Medical Colleges. It is supported by CMS, as well as the Joint Commission on Accreditation of Healthcare Organizations, National Quality Forum, American Medical Association, Consumer-Purchaser Disclosure Project, AARP, AFL-CIO, and Agency for Healthcare Research and Quality. Its aim is to provide a single standard quality measure set for hospitals to support public reporting and pay-for-performance efforts.

15Throughout this report, we refer to the 10 measures on which reductions in the annual payment update are based as the "APU-measure set" and to the combination of those 10 with the additional measures adopted by HQA as the "expanded-measure set". HQA added 7 measures for discharges beginning April 1, 2004, and another 5 measures for discharges beginning July 1, 2004, for a total of 22 measures on which hospitals may currently submit data. Thus, the expanded-measure set includes different numbers of measures for different quarters of data.

16The National Quality Forum is a voluntary standard-setting, consensus-building organization representing providers, consumers, purchasers, and researchers.

Table 1: HQA Hospital Quality Measures

APU-measure set (for discharges beginning January 1, 2004)
  Heart attack:
    1. Aspirin at arrival
    2. Aspirin prescribed at discharge
    3. ACE (angiotensin-converting enzyme) inhibitor for left ventricular systolic dysfunction
    4. Beta blocker at arrival
    5. Beta blocker prescribed at discharge
  Heart failure:
    6. Left ventricular function assessment
    7. ACE inhibitor for left ventricular systolic dysfunction
  Pneumonia:
    8. Initial antibiotic received within 4 hours of hospital arrival
    9. Oxygenation assessment
    10. Pneumococcal vaccination status
  Surgical infection prevention: (none)

Expanded-measure set
  For discharges beginning April 1, 2004
    Heart attack: 1-5 above plus
      11. Thrombolytic agent received within 30 minutes of hospital arrival
      12. PTCA (percutaneous transluminal coronary angioplasty) received within 90 minutes of hospital arrival
      13. Adult smoking cessation advice/counseling
    Heart failure: 6-7 above plus
      14. Discharge instructions
      15. Adult smoking cessation advice/counseling
    Pneumonia: 8-10 above plus
      16. Blood culture performed before first antibiotic received in hospital
      17. Adult smoking cessation advice/counseling
    Surgical infection prevention: (none)
  For discharges beginning July 1, 2004
    Heart attack: 1-5, 11-13 above
    Heart failure: 6-7, 14-15 above
    Pneumonia: 8-10, 16-17 above plus
      18. Initial antibiotic selection for CAP (community-acquired pneumonia) in immunocompetent patient
      19. Influenza vaccinationa
    Surgical infection prevention:
      20. Prophylactic antibiotic received within 1 hour prior to surgical incision
      21. Prophylactic antibiotic selection for surgical patientsa
      22. Prophylactic antibiotics discontinued within 24 hours after surgery end

Source: CMS, as of August 4, 2005.
Note: Measures are worded as CMS posted them on www.qnetexchange.org.
aHospitals are collecting data for these measures, but public reporting of hospital performance on these measures has been postponed.

To minimize the APU program's data collection burden on hospitals, CMS and the Joint Commission on Accreditation of Healthcare Organizations (JCAHO) have worked to align their procedures and protocols for collecting and reporting the specific clinical information that is used to score hospitals on the measures. JCAHO-accredited hospitals (approximately 82 percent of hospitals that participate in Medicare) have since 2002 submitted data to JCAHO on the same measures as those in the APU-measure set as well as many of those in the expanded-measure set.
Beginning with the first calendar quarter of data submitted by hospitals for the APU program, hospitals had the option of submitting the same data to CMS that many of them were already collecting for JCAHO. In November 2004, CMS and JCAHO jointly issued a manual laying out the aligned procedures and protocols for discharges beginning January 1, 2005.

Collection, Submission, and Reporting of Quality Data

Hospitals use CMS's definition of the eligible patient population to identify the patients for whom they should collect and submit quality data for each measure. The definition is based on the primary diagnosis and, for the two cardiac conditions, the age of the patient.17 Specifically, hospitals use diagnostic codes and demographic information from the patients' medical and administrative records to determine eligibility based on protocols established by CMS. Once the eligible patients have been identified, hospitals extract from their patients' medical records the specific data items needed for the Iowa Foundation for Medical Care (IFMC) to calculate a hospital's performance, following detailed data abstraction guidelines developed by CMS. Hospitals may submit data for all eligible patients for a given condition, or, if they have more than a specified number of eligible patients, they may draw a random sample according to a formula18 and submit data for those patients only. These data are put into a standardized data format and submitted quarterly through a secure Internet connection to the QIO "clinical warehouse" administered by IFMC. IFMC accepts into the clinical warehouse only the data that meet the formatting and other specifications established by CMS19 and that are submitted before the specified deadline for that quarter. About 80 percent of hospitals rely on data vendors, which typically are collecting the same data for JCAHO, to submit the data for them.
17Patients under 18 years of age are excluded from the eligible patient population for the two cardiac conditions.

18Before hospitals can consider sampling, rather than submitting all of their eligible cases, the number of eligible cases must exceed a minimum sample size that ranges from 60 per quarter for pneumonia cases to 76 for heart failure cases and 78 for heart attack cases. Once hospitals reach that threshold for a given condition, they can submit a random sample of their cases as long as the minimum sample size is met and it includes at least 20 percent of their eligible cases, up to a maximum sample size requirement of 241 for pneumonia, 304 for heart failure, and 311 for heart attacks. For discharges that occurred prior to January 1, 2005, CMS applied a different formula to hospitals not accredited by JCAHO that called for a minimum sample size of 7 for each of the three conditions and a sampling rate of at least 20 percent until a maximum sample size requirement of 70 cases was reached.

IFMC aggregates the information from the individual patient records to generate a rate for each hospital on each of the measures for which the hospital submitted relevant clinical data. These rates show how often a hospital provided the specific service or activity designated in the measures to patients for whom that service or activity was appropriate. Hospitals also collect information on each patient that identifies patients for whom the particular service or activity would not be called for, such as patients with a condition that would make prescribing aspirin or beta blockers medically inappropriate. CMS posts on its Hospital Compare Web site each hospital's rates for all the APU and expanded measures for which it submitted data.20 In November 2004, CMS first posted these rates, based on data from the first quarter of calendar year 2004.
It subsequently posted new rates in March 2005, based on the first two quarters of calendar year 2004 data, and again in September and December 2005 with additional quarters of data. CMS continues to update these rates quarterly, using the four most recent quarters of data available. There can be up to a 14-month time lag between when patients are treated by the hospital and when the resulting rates are posted on the CMS Web site. (See fig. 1.)

19IFMC statistics show that a majority of hospitals ultimately succeed in gaining acceptance for all the cases they have submitted and that less than 10 percent of hospitals have had more than 5 percent of their cases rejected in a given quarter.

20For two measures, influenza vaccination and prophylactic antibiotic selection for surgical patients, CMS has postponed public reporting.

Figure 1: Approximate Times for Collection, Submission, and Reporting of Hospital Quality Data

(a) CMS had to make its determination of hospital eligibility for the fiscal year 2005 annual payment update decision approximately 1 month after hospitals submitted their data for the first quarter.

Implementation of the APU Program

In implementing the APU program, CMS uses the same policies and procedures for collecting and submitting quality data as are used for HQA. For the first annual payment update determined by the APU program, which applied to fiscal year 2005, hospitals were required to begin submitting data by July 1, 2004, for the patients discharged during the first calendar quarter of 2004 (January through March 2004). Data were received from 3,839 hospitals, over 98 percent of those affected by the MMA provision. These figures include 150 hospitals that certified to CMS that they had no eligible patients with the three conditions during the first calendar quarter of 2004. Hospitals that have no eligible patients are not penalized and receive the full annual payment update.
For the second annual payment update determined by the APU program, which applied to fiscal year 2006, participating hospitals were required to continue to submit data in accordance with the quarterly deadlines set by CMS. Failure to meet the requirements of the program and qualify for the full annual payment update in one year does not affect a hospital's ability to participate in and qualify for the full update in the succeeding year. CMS has assigned primary responsibility to the 53 QIOs to inform hospitals about the APU program's requirements and to provide technical assistance to hospitals in meeting those requirements. This includes assistance to hospitals in submitting their data to the clinical warehouse provided by IFMC.

Other Reporting Systems

There are several organizations that administer reporting systems that collect clinical data, some of which also release their data to the public. Some of these organizations are in the public sector, such as state health departments, and some are in the private sector, such as accreditation bodies. Several of these systems have been in existence for a number of years, including one for as long as 16 years. Hospitals, health plans, nursing homes, and other external organizations submit data to these systems on a range of medical conditions, which for most of these systems includes at least one cardiac condition (e.g., percutaneous coronary intervention, coronary artery bypass grafting, heart attack, heart failure). Many of these systems make the results of the data they have collected available for public use. For example, one public organization has been collecting individual, patient-level data on cardiac surgeries from hospitals for the past 16 years and creates reports based on the data collected, which it subsequently posts on its Web site. Additionally, data collected by these reporting systems can also be used for quality improvement efforts and to track performance over time.
(For more background information on other reporting systems, see app. II, table 3.)

CMS Has Processes for Checking Data Accuracy but Has No Ongoing Process to Check Completeness

CMS has processes for ensuring the accuracy of the quality data submitted by hospitals for the APU program, but has no ongoing process to assess whether hospitals are submitting complete data. To check accuracy, IFMC, a CMS contractor, electronically checks the data as they are submitted to the clinical warehouse. In addition, CDAC independently audits the data submitted by hospitals. Specifically, it reabstracts the quality data from medical records for a sample of five patients per quarter for each hospital and compares its results to the quality data submitted by hospitals. The data are deemed to be accurate if there is 80 percent or greater agreement between these two sets of results, a standard that hospitals had to meet for the APU-measure set to qualify for their full annual payment update for fiscal year 2006. To check completeness, CMS has twice compared the number of cases submitted by each hospital for the APU program for a given period to the number of claims the hospital submitted to Medicare, once for the fiscal year 2005 update and once for the fiscal year 2006 update. However, these analyses did not address non-Medicare patient records and the approach that CMS took in these analyses was not capable of detecting incomplete data for all hospitals. CMS has not put in place an ongoing process for checking the completeness of the data that hospitals submit for the APU program that would provide accurate and consistent information for all patients and all hospitals. Moreover, CMS has not required hospitals to certify that they submitted data for all eligible patients or a representative sample thereof.
CMS Checks Data Accuracy Electronically and Through an Independent Audit

CMS employs two processes to check and ensure the accuracy of the quality data submitted by hospitals for the APU program. First, at the time that data are submitted to the clinical warehouse, IFMC, a CMS contractor, electronically checks the data for inconsistencies and missing values. The results are shared with hospitals. After the allotted time for review and correction of the submissions, no more data or corrections may be submitted by hospitals for that quarter. These checks are done whether the hospital submits its data directly to the warehouse or through a data vendor. Second, CDAC conducts quarterly independent audits to verify that the data submitted by hospitals to the clinical warehouse accurately reflect the information in their patients' medical records.21 From among all the patient records submitted to the clinical warehouse each quarter, CMS randomly selects for CDAC's reabstraction five patient records from each participating hospital.22 CDAC sends a request for these patients' medical records to the hospitals, and they send photocopies of the records to CDAC for reabstraction. A CDAC abstractor reviews the medical record, determines if or when a specific action occurred-such as the time when a patient arrived at the hospital-and records that data field accordingly. Once the CDAC reabstraction is complete, the response previously entered into that field by the hospital is compared to that entered by the CDAC abstractor, and CDAC notes whether the two responses match. If they do not match, a second CDAC abstractor reviews the medical record to make a final determination. The results of the CDAC reabstraction are sent to the clinical warehouse, where the individual data matches and mismatches are summed to produce an accuracy score for each hospital.
The accuracy score represents the overall percentage of agreement between data submitted by the hospital and data reabstracted by CDAC across all five cases.23 It is based on all the APU and expanded measures for which the hospital submitted data.24 The score, along with information from CDAC on where the mismatches occurred and why, is shared with the hospital and the hospital's local QIO. CMS considers hospitals achieving an accuracy score of 80 percent or better to have provided accurate data. Hospitals with accuracy scores below 80 have the opportunity to appeal their reabstraction results.25

21DynKePRO, LLC, has operated CDAC since 1994. For 10 years it shared this function with a second firm, but in September 2004 DynKePRO negotiated a new contract with CMS that made it the sole CDAC contractor. In April 2005, DynKePRO became CSC York.

22To be included in the reabstraction process, hospitals must have submitted data on at least six patients across all three conditions in that quarter.

In applying these processes for the fiscal year 2005 annual payment update, CMS did not require hospitals to meet the 80 percent accuracy threshold for the 10 APU measures to qualify for the full update. Rather, to receive their full payment update, hospitals only had to pass the electronic data checking performed when they submitted their data to the clinical warehouse for the first calendar quarter of the APU program-for discharges that occurred from January 2004 through March 2004. Although the accuracy scores were not considered for the payment update, CMS calculated an accuracy score for each quarter in which the hospital submitted at least six cases to the clinical warehouse. Each quarter the accuracy score was based on data for all the measures submitted by the hospital in that quarter and was derived from five randomly selected patient records. Along with the accuracy score, hospitals received information on where mismatches occurred and the reasons for the mismatches.
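The accuracy score computation described above reduces to a percent-agreement calculation over the scorable data elements. The sketch below is illustrative, not CDAC's actual software; the function names are invented for clarity.

```python
def accuracy_score(comparisons):
    """Percent agreement between hospital-submitted data elements and
    CDAC's reabstraction of the same elements.

    `comparisons` is a list of (hospital_value, cdac_value) pairs for
    the scorable data elements across the five sampled records --
    typically about 100 elements in all, per CMS's estimate.
    """
    matches = sum(1 for hosp, cdac in comparisons if hosp == cdac)
    return 100.0 * matches / len(comparisons)

def provided_accurate_data(comparisons, threshold=80.0):
    """CMS's pass/fail rule: 80 percent or better agreement."""
    return accuracy_score(comparisons) >= threshold
```

So a hospital whose submissions matched CDAC's reabstraction on 90 of 100 compared elements would score 90.0 and pass the threshold.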
23The accuracy score is not based on all the data submitted by a hospital. Rather, CMS has identified a specific subset of the data elements that should be counted in computing the accuracy score. In general, CMS included in this subset the clinical data elements needed to calculate the hospital's rate for each of the measures and left out other administrative and demographic information about the patients. CMS estimates that five patient records usually contain about 100 data elements for calculation of the accuracy score, but the actual number of data elements depends on which conditions were involved and the number of measures for which a hospital submitted data.

24Although CMS computes accuracy scores based on data for all measures submitted to the clinical warehouse, it recognizes that the MMA provision affecting hospital payments applies only to data for the 10 measures specified for the APU program. See 69 Fed. Reg. 49080 (Aug. 11, 2004).

25CMS created an appeal process that allows a hospital to challenge the reabstraction results through its local QIO. For data from the first two calendar quarters of 2004, if the QIO agreed with the hospital's interpretation, the appeal was forwarded to CDAC for review and correction, if appropriate. CDAC's decision on the appeal was final. Beginning with data from the third calendar quarter of 2004, appealed cases no longer go back to CDAC. Instead, QIOs make the final decision to uphold either CDAC's or the hospital's interpretation. During this process, hospitals are not allowed to supplement the submitted patient medical records.

In contrast to the prior year, CMS applied the 80 percent threshold for accuracy as a requirement for hospitals to qualify for their full fiscal year 2006 annual payment update.26 IFMC continued to check electronically all of the data as they were submitted for each quarter and calculated accuracy scores quarterly for each hospital.
CMS decided to base its payment update decision on the accuracy score that hospitals obtained for the third calendar quarter of 2004-for discharges that occurred from July 2004 through September 2004.27 This meant that the payment decision rested on the reabstraction results obtained from five randomly selected patient records. If a hospital met the 80 percent accuracy threshold based on all of the quality data it submitted, it received the full payment update. However, if a hospital failed to meet the 80 percent threshold, CMS recomputed the accuracy score using only the data elements required for the APU-measure set. For hospitals that failed again, CMS combined the CDAC reabstraction results from the third calendar quarter of 2004 with the CDAC results from the fourth calendar quarter of 2004 to produce an accuracy score derived from 10 patient medical records.28 CMS then computed accuracy scores first for all the quality data submitted by the hospital and finally for the APU-measure set, if needed to reach the 80 percent threshold. As a result, even though CMS assessed hospital accuracy primarily on the basis of data that exceeded those required for the APU-measure set, hospitals were not denied the full annual payment update except on the basis of the APU-measure set. A possibility does exist, however, that a hospital could have qualified for the full update based on its results for all the data it submitted, even if it would have failed using the APU-measure set. This could happen if the hospital submitted data that matched the CDAC abstractors' entries more consistently for the data entries used exclusively in computing the expanded measures, such as those relating to smoking cessation counseling, than for the data required by the APU-measure set.

2670 Fed. Reg. 47420-47428 (Aug. 12, 2005).
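The tiered sequence CMS applied for the fiscal year 2006 determination can be summarized as a short fallback chain. This is a sketch of the logic as described above, assuming the four scores are already computed; the key names are invented for illustration.

```python
def qualifies_for_full_update(scores, threshold=80.0):
    """Walk the tiered sequence CMS applied for fiscal year 2006.

    `scores` holds four accuracy scores, tried in order; a hospital
    qualifies as soon as any one meets the 80 percent threshold:
      1. third quarter 2004, all submitted measures
      2. third quarter 2004, APU-measure set only
      3. third and fourth quarters combined (10 records), all measures
      4. third and fourth quarters combined, APU-measure set only
    """
    order = ("q3_all", "q3_apu", "q3q4_all", "q3q4_apu")
    return any(scores[key] >= threshold for key in order)
```

Note how the chain ends on the APU-measure set, which is why a hospital could be denied the full update only on the basis of the APU measures.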
27CMS decided not to use accuracy scores from the first two quarters of the APU program because those data were collected before the alignment of CMS and JCAHO data collection specifications had begun to come into effect. Given the time needed to conduct all the steps in the process (see fig. 1), CMS was left with the third calendar quarter of 2004 as the latest full quarter of data that could be used for determining the fiscal year 2006 update. The third calendar quarter also marked HQA's expansion to 22 measures.

28Hospitals had to submit their patient medical records to CDAC for the fourth calendar quarter 2004 reabstractions no later than August 1, 2005, to take advantage of this additional opportunity to pass the 80 percent threshold.

In the future, CMS intends to base its decisions on hospital eligibility for full annual payment updates on accuracy assessments from more than one quarter. Although its concerns about potential alignment issues affecting data for the first two quarters of the APU program led the agency to rely primarily on data from the third calendar quarter for the fiscal year 2006 update, CMS stated that its goal was to use accuracy assessments from four consecutive quarters when it determines hospital eligibility for the fiscal year 2007 full annual payment update. CMS uses the accuracy scores in making decisions on payment updates, but the scores do not affect the information posted on the Hospital Compare Web site. The Web site transmits to the public the rates on the APU and expanded measures that derive from the data that the hospitals submitted to the clinical warehouse.
CMS does not post the accuracy scores generated from the CDAC reabstraction process on the Web site or indicate if the hospital rates are based on data that met CMS's 80 percent threshold for accuracy.29

CMS Has No Ongoing Process to Ensure Completeness of Data Submitted for the APU Program

Although CMS has recognized the importance of obtaining quality data for the APU program on all eligible patients, or a representative sample if appropriate, it has not put in place an ongoing process to ensure that this occurs. For the fiscal year 2005 annual payment update, CMS checked that hospitals submitted data for at least a minimum number of patients by using Medicare claims data to estimate the number of "expected cases" that each hospital should have submitted to the clinical warehouse. To do this, it first calculated the average number of patients for each of the three conditions that each hospital had billed Medicare for over the previous eight calendar quarters (January 2002 through December 2003). Then, if the average number of Medicare claims for a condition was large enough to entitle the hospital to draw a sample instead of submitting data for all the eligible patients to the clinical warehouse, CMS reduced the number of "expected cases" based on the size of the sample.30 CMS told each hospital what its expected numbers of heart attack, heart failure, and pneumonia patients were. If the actual number of patients for whom hospitals submitted data for the APU program was lower, the hospitals were instructed to send a letter to their local QIO, signed by the hospital's CEO or administrator, stating that the hospital had fewer discharged patients for that condition than CMS had estimated. If such a letter was filed, the hospital qualified for the full annual payment update.
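The "expected cases" estimate just described can be sketched as follows. This sketch assumes the pre-2005 CMS sampling parameters (minimum of 7, 20 percent rate, maximum of 70) that footnote 30 indicates CMS applied across the board; the report does not give rounding details, so the arithmetic here is illustrative.

```python
import math

def expected_cases(quarterly_medicare_claims, min_size=7, max_size=70):
    """Estimate the minimum number of cases a hospital should have
    submitted to the clinical warehouse for one condition.

    Averages the hospital's Medicare claims over the prior eight
    calendar quarters; if the average is large enough to permit
    sampling, the expectation is reduced to the required sample size.
    """
    avg = sum(quarterly_medicare_claims) / len(quarterly_medicare_claims)
    if avg <= min_size:
        return math.floor(avg)  # too few cases to sample: expect them all
    required = max(min_size, math.ceil(0.20 * avg))
    return min(required, max_size)
```

Because the expectation is capped at the sample size, a hospital averaging 500 Medicare claims per quarter would be expected to submit only 70 cases, which is one reason the check could not detect many kinds of underreporting.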
In the end, no hospital participating in the APU program was denied a full annual payment update for fiscal year 2005 for submitting data on an insufficient number of patients or any other reason.

29The Hospital Compare Web site identifies instances where rates for a measure were based on fewer than 25 cases and where data were suppressed due to inaccuracies. However, the latter indication reflects situations where a hospital had problems with transmission of its data by a data vendor, not the outcome of the CDAC reabstractions.

For the fiscal year 2006 update decision, CMS took a different approach to using Medicare claims data to address the issue of completeness. CMS used Medicare claims data to check whether hospitals that billed Medicare for any cases with one of the three conditions submitted at least one case to the clinical warehouse. To do this, CMS compared each hospital's Medicare claims for the three conditions for the four calendar quarters of 2004 to the hospital's submissions to the clinical warehouse for those same quarters. CMS identified instances where hospitals had submitted one or more claims for payment to Medicare for any of the three conditions for a quarter when they had not submitted any cases with one of those conditions to the clinical warehouse. On this basis, CMS determined that 110 hospitals would not qualify for the full payment update for fiscal year 2006. CMS conducted two additional analyses involving a comparison of the same Medicare claims data and quality data submissions to identify hospitals that may have submitted incomplete data for the APU program, but these analyses did not affect hospital eligibility for the full fiscal year 2006 payment update.
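The primary fiscal year 2006 completeness check described above amounts to flagging hospital-quarters with Medicare claims but no warehouse submissions. The data structures below are hypothetical stand-ins for the claims and submission records.

```python
def flag_missing_submissions(claims, submissions):
    """Sketch of CMS's fiscal year 2006 completeness check: flag
    (hospital, quarter) pairs where a hospital billed Medicare for at
    least one case with any of the three conditions but submitted no
    cases at all to the clinical warehouse for that quarter.

    `claims` and `submissions` map (hospital_id, quarter) -> case count.
    """
    return sorted(
        key
        for key, claim_count in claims.items()
        if claim_count > 0 and submissions.get(key, 0) == 0
    )
```

A hospital that billed Medicare for dozens of cases but submitted a single case to the warehouse would pass this check, which is why it could detect only the most extreme form of incomplete reporting.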
The additional analyses identified (1) a set of hospitals that may have submitted samples of their eligible cases to the clinical warehouse when, according to the applicable sampling rules, they should have submitted data on all their cases; and (2) another set of hospitals that failed to submit cases to the clinical warehouse for all of the three conditions for which they filed Medicare claims in that quarter. However, in contrast to the hospitals that did not qualify for their full payment update, the hospitals in the second set submitted to the clinical warehouse at least one case for one of the three conditions. A CMS official stated that the agency plans to educate the hospitals identified by these additional analyses on the data submission and sampling requirements for the APU program.

30Originally, CMS intended to apply JCAHO's sampling rules to JCAHO-accredited hospitals, and its own sampling rules to the other hospitals, in computing their "expected cases". JCAHO's sampling procedures called for submitting larger samples to the clinical warehouse than CMS's did. However, when CMS officials determined that they could not reliably identify every hospital that belonged in the JCAHO group, they decided to apply the CMS rules across the board to all hospitals. Therefore, for many JCAHO-accredited hospitals, the number of "expected cases" computed by CMS underestimated the number of Medicare cases for which these hospitals should have submitted data, because JCAHO-accredited hospitals were to submit cases according to the JCAHO sampling rules.

The analysis that CMS conducted using Medicare claims data for its fiscal year 2005 update decision and the three analyses it conducted in conjunction with its fiscal year 2006 update decision shared two limitations: none addressed the completeness of data submissions for non-Medicare patients, and none could detect incomplete data for all hospitals.
Given that non-Medicare patients represent a substantial proportion of the patients treated for heart attacks, heart failure, and pneumonia,31 any minimum number of "expected cases" based on Medicare claims inherently underestimates the total number of patients for which hospitals should have submitted quality data for the APU program. Moreover, the approaches taken in the analyses conducted for both fiscal year updates could not detect incomplete data for many hospitals. For example, in the fiscal year 2005 analysis, the difference between the number of cases expected under the CMS sampling rules and the higher number expected under the sampling rules that applied to JCAHO-accredited hospitals meant that JCAHO-accredited hospitals treating more patients than the minimum CMS sample of seven could have failed to submit data on most of the cases that exceeded the CMS minimum and still have met the number of expected cases set by CMS.32 The analysis that CMS conducted to determine hospital eligibility for the full fiscal year 2006 update also could identify only certain hospitals that submitted incomplete data, in this case limited to hospitals that submitted no patient data at all to the clinical warehouse in a given quarter.

31Non-Medicare patients account for about 40 to 50 percent of all patients hospitalized for heart attacks and pneumonia and 20 to 32 percent of those hospitalized for heart failure. For individual hospitals, these percentages could be higher or lower.

32See appendix I for more detailed information on the limitations that applied to CMS's effort to estimate a minimum number of expected cases for each hospital.

CMS officials acknowledged that the lack of information on non-Medicare patients and the imprecise adjustments that CMS made to take account of the varying sampling procedures that hospitals could have followed limited the conclusions that CMS could draw from its Medicare claims data analysis for the fiscal year 2005 update.
Because of these limitations, CMS officials described their effort as a rough check for inconsistencies between data submitted by hospitals to the clinical warehouse and the cases that the hospitals had billed to Medicare.

CMS has not combined these limited efforts to monitor the completeness of hospital quality data submissions with efforts to clearly inform hospital officials of their obligation to submit complete data. For example, CMS has not explicitly listed submission of complete data as a requirement for participating in the APU program on the "Notice of Participation" that the hospital CEO or administrator must sign when hospitals enroll. The notice states requirements for participating hospitals-including that they must register with the QualityNet Exchange Web site33 and that they must submit data for all measures specified in the APU-measure set by established deadlines. The notice indicates that the submitted data will undergo validation, a reference to the CDAC reabstraction process. However, the notice does not stipulate that hospitals must submit data for all eligible cases, or for a representative sample if appropriate.

We interviewed health professionals familiar with the APU program, several of whom raised concerns about data completeness. One expert in the area of outcomes research noted the potential for systematic underreporting by hospitals. He suggested that, as one approach to detect systematic underreporting, CMS could compare not only the number of patients for whom data were submitted and Medicare claims filed, but also the characteristics of patients for cases submitted to the APU program to the patient characteristics of comparable cases submitted to Medicare for payment. Another expert in the area of clinical quality improvement expressed his concern that the APU program did not verify the completeness of the data.
He observed that hospitals have flexibility in determining which patients are included through their assignment of the patient's primary diagnosis. A QIO official echoed this concern, noting the risk that hospitals could decide not to submit cases where patients had not received the services or activities assessed by the APU measures.

33The QualityNet Exchange Web site is the secure Internet connection used to transmit hospital quality data to the clinical warehouse.

Data Accuracy Baseline Was High Overall, but Statistically Uncertain for Many Hospitals, and Data Completeness Baseline Cannot Be Determined

We could determine a baseline level of accuracy for the quality data submitted for the APU program but not a baseline level of completeness. We found a high overall baseline level of accuracy when we examined CMS's assessment of the data submitted by hospitals for the first two calendar quarters of 2004. The median accuracy score exceeded 90 percent, which was well above the 80 percent accuracy threshold set by CMS, and about 90 percent of hospitals met or exceeded that threshold for both the first and the second calendar quarters of 2004. For most hospitals whose accuracy scores were well above the threshold, the results were statistically certain. However, for approximately one-fourth to one-third of all the hospitals that CMS assessed for accuracy, the statistical margin of error for their accuracy score included both passing and failing accuracy levels. Consequently, for these hospitals, the small number of cases that CMS examined was not sufficient to establish with statistical certainty whether the hospital met the threshold level of data accuracy. Accuracy did not vary between rural and urban hospitals, and small hospitals provided data as accurate as those from larger hospitals.
The completeness baseline could not be determined because CMS did not assess the extent to which all hospitals submitted data on all eligible patients, or a representative sample thereof, for the first two calendar quarters of 2004, and consequently there were no data from which to derive such an assessment.

Baseline Level of Data Accuracy Was High Overall, and Large Majority of Hospitals Met Accuracy Threshold

Overall, the baseline level of data accuracy for the first two quarters of the APU program was high. The median accuracy score achieved by hospitals ranged between 90 and 94 percent, with slightly higher values in the second quarter and for the APU-measure set. (See fig. 2.) In addition, with at least half the hospitals receiving accuracy scores above 90, relatively few failed to reach the 80 percent threshold set by CMS.

Figure 2: Baseline Hospital Accuracy Scores at Selected Percentiles, by Measure Set and Quarter

Note: Figure reflects accuracy scores for hospitals covered by the APU program. Hospitals that submitted fewer than six cases to the clinical warehouse in a quarter did not undergo CDAC reabstraction and therefore did not receive an accuracy score for that quarter. Calculation of accuracy scores for the expanded-measure set was based on all the measures for which a hospital submitted data, which could range from the APU measures alone to a maximum of 17-the APU measures plus as many as 7 additional measures.

In both quarters, 90 to 92 percent of hospitals obtained accuracy scores meeting the threshold using the APU-measure set, and 87 to 90 percent met the threshold using the expanded-measure set (see table 2).34 The 8 to 13 percent of hospitals that did not meet the accuracy threshold represented approximately 300 to 500 hospitals across the country.
Table 2: Percentage and Number of Hospitals Whose Baseline Accuracy Score Met or Fell Below the 80 Percent Threshold, by Measure Set and Quarter

                                            APU-measure set                      Expanded-measure set
                                   Jan.-Mar. 2004    Apr.-June 2004     Jan.-Mar. 2004    Apr.-June 2004
                                   discharges        discharges         discharges        discharges
                                   Pct.     Number   Pct.     Number    Pct.     Number   Pct.     Number
Hospitals whose accuracy score
met 80 percent threshold           90.2     3,290    91.8     3,282     86.8     3,165    90.0     3,217
Hospitals whose accuracy score
fell below 80 percent threshold     9.8       359     8.2       292     13.2       483    10.0       357
Total                               100     3,649     100     3,574      100     3,648     100     3,574

Source: GAO analysis of CMS data.

Note: Calculation of accuracy scores for the expanded-measure set was based on all the measures for which a hospital submitted data, which could range from the APU measures alone to a maximum of 17-the APU measures plus as many as 7 additional measures.

There were minimal differences in baseline accuracy scores among hospitals characterized by urban or rural location and small or large capacity,35 but variation across hospitals served by different data vendors was more substantial. Rural hospitals and smaller hospitals generally received accuracy scores similar to those of urban hospitals and larger hospitals.36 Among the hospitals that used JCAHO-certified data vendors to submit their quality data to the clinical warehouse, a higher percentage of hospitals served by certain data vendors met the 80 percent threshold than did the hospitals served by other data vendors (see app. III, table 8).37

34For our analysis of baseline accuracy, the expanded-measure set includes the seven additional quality measures beyond the APU-measure set that HQA adopted for discharges after March 31, 2004. We found that some hospitals submitted data on the additional measures to the clinical warehouse for discharges occurring before that date, possibly because the hospitals were already collecting those data for JCAHO.
35We assessed hospital capacity in terms of the number of patient beds.

36For more detailed information on the relation of data accuracy to hospital characteristics and use of data vendors, see the tables in appendix III.

Passing the 80 Percent Threshold Is Statistically Uncertain for One-Fourth to One-Third of Hospitals

While the baseline level of data accuracy achieved by hospitals in the aggregate was well above the 80 percent threshold, for approximately one-fourth to one-third of hospitals the determination that a particular hospital met the 80 percent threshold was statistically uncertain. This uncertainty stems primarily from the small number of cases examined for accuracy from each hospital. Because CDAC's reabstraction of the data is limited to five patient records per quarter, the greater sampling variability found in small samples leads to relatively large confidence intervals, reflecting low statistical precision, for the accuracy score of any specific hospital.38 Across all hospitals, the median difference between the upper and lower limits of the confidence interval was 14.0 percentage points using the APU-measure set for first-quarter discharges, dropping to 11.8 percentage points in the second quarter.39 For the expanded-measure set, the median confidence interval was 14.6 percentage points in the first quarter and 13.0 percentage points in the second.

37The data that we obtained from CMS specifically identified data vendors that JCAHO had certified for its own performance reporting system. These data vendors submitted data to the clinical warehouse for 78 to 79 percent of the hospitals we analyzed for the two baseline quarters, while another 13 to 14 percent of hospitals directly submitted their own data.

38Statistical uncertainty occurs because different samples generally produce different results, due to variation among the individual patients selected for different samples.
With larger samples, differences in the results obtained from one sample to another decrease. Calculating a confidence interval provides a way to assess the effect of sample variation on the results obtained. Confidence intervals are usually computed at the 95 percent level. So if 100 samples were selected, the result produced by 95 of them would likely fall between the low and high ends of the confidence interval. For example, one 300-plus-bed hospital in Virginia had an accuracy score of 83.3 for the second calendar quarter of 2004 using the expanded-measure set, with a confidence interval that ranged from 76.8 to 89.9. There is a 95 percent likelihood that any sample selected for that hospital would generate an accuracy score that was greater than 76 and lower than 90. 39The formula used to generate these confidence intervals takes into account variation in the number of individual data elements that were available in the five selected cases to compare the hospital's and CDAC's results. This is the same formula that is used by CMS, with one modification. Whereas CMS applied a one-tailed test at a 95 percent significance level to protect against hospitals receiving a failing score due to sampling error, we applied a two-tailed test at the 95 percent significance level to identify both failing and passing scores that were statistically uncertain. (See app. I.) The wide confidence intervals meant that for a substantial number of hospitals it was statistically uncertain whether a different sample of cases would have altered their result from passing the 80 percent threshold to failing, or vice versa.40 For most hospitals there was statistical certainty that their baseline accuracy score met CMS's 80 percent accuracy threshold. However, other hospitals had confidence intervals for their accuracy scores where the upper limit was 80 or above and the lower limit was less than 80. 
Because the confidence interval around the accuracy score computed for each of these hospitals bracketed the accuracy threshold set by CMS, their results were statistically uncertain.41 Consequently, for these hospitals, the small number of cases that CMS examined was not sufficient to establish whether the hospital met the threshold level for data accuracy. One-third of all the hospitals that CMS assessed for accuracy fell into this uncertain category for first-quarter 2004 discharges using the APU-measure set. (See fig. 3.) This proportion declined to about one-fourth of the hospitals for the second quarter. When the expanded-measure set was used-as CMS has done when calculating its quarterly accuracy scores-the proportion of hospitals whose accuracy scores were statistically uncertain increased compared to the APU-measure set for both the first and the second quarter.

40Most, but not all, of the hospitals with statistically uncertain results had accuracy scores of 80 or above. See table 10 in appendix III.

41For example, if a hospital had a confidence interval that ranged from 77 to 90, taking multiple samples would lead to some samples generating accuracy scores at or above 80 and other samples generating scores of less than 80. Whether that hospital passed the 80 percent accuracy threshold would depend on which of those samples was actually selected.

Figure 3: Percentage of Hospitals Whose Baseline Accuracy Score Confidence Intervals Clearly Exceed, Fall Below, or Include the 80 Percent Threshold, by Measure Set and Quarter

Note: The confidence interval is based on a 95 percent significance level. Calculation of the accuracy scores and confidence intervals for the expanded-measure set was based on all the measures for which a hospital submitted data, which could range from the APU measures alone to a maximum of 17-the APU measures plus as many as 7 additional measures.
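The effect of sample size on this kind of threshold test can be illustrated with a short calculation. The sketch below is not CMS's formula, which adjusts for the number of data elements available in each of the five sampled cases; it uses a simple normal-approximation (Wald) interval for the proportion of matched data elements, and the element counts are hypothetical.

```python
import math

def accuracy_interval(matched, assessed, z=1.96):
    """95 percent two-tailed confidence interval (in percent) for an
    accuracy score computed as matched data elements / assessed elements.
    Simple Wald approximation, for illustration only."""
    p = matched / assessed
    half_width = z * math.sqrt(p * (1 - p) / assessed)
    return (100 * (p - half_width), 100 * (p + half_width))

def classify(matched, assessed, threshold=80.0):
    """Classify a score as clearly above, clearly below, or statistically
    uncertain relative to the threshold, per the two-tailed test."""
    low, high = accuracy_interval(matched, assessed)
    if low >= threshold:
        return "clearly above"
    if high < threshold:
        return "clearly below"
    return "uncertain"  # the interval brackets the threshold

# Five reabstracted cases with roughly 30 data elements each (~150 elements):
print(classify(matched=125, assessed=150))   # 83.3 percent score, but uncertain
# Pooling four quarters of reabstractions (~600 elements), same 83.3 percent score:
print(classify(matched=500, assessed=600))   # clearly above
```

With about 150 assessed elements, a score of 83.3 brackets the 80 percent threshold, much like the Virginia hospital in the example above; quadrupling the element count by pooling quarters narrows the interval enough to resolve the same score.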
These confidence intervals would narrow if CMS drew on multiple quarters of data to bring more cases into the computation of the accuracy scores. CMS has stated its intention to base this accuracy assessment on four quarters of hospital quality data, but so far every accuracy score it has generated and reported to hospitals has been based on a single quarter of data. Moreover, its implementation of the fiscal year 2006 payment update called for using only one quarter of data, with the possibility of adding one more quarter of data for hospitals that failed to meet the accuracy threshold based on the single quarter of data.42

42See 70 Fed. Reg. at 47422.

No Data Were Available to Provide Baseline Assessment of Completeness of Hospital Quality Data

There were no data available from which to estimate a baseline level of completeness for the first two calendar quarters of data submitted for the APU program. In contrast to the system of quarterly reabstractions performed by CDAC to check the accuracy of quality data submitted by hospitals, CMS did not conduct any corresponding assessment of the extent to which all hospitals submitted data on all the cases, or a representative sample of such cases, that met CMS's eligibility criteria for the first two calendar quarters of 2004. The information that CMS did collect was not suitable for estimating the baseline level of data completeness. The Medicare claims data analysis conducted by CMS on the first calendar quarter of data submitted for the APU program was not designed to provide valid information on the magnitude of data incompleteness for each hospital, which is what is needed to estimate a baseline level of data completeness. Although CMS could identify instances where certain hospitals failed to provide quality data on all eligible cases, CMS's analysis did not produce comparable information on data completeness for every hospital.
As noted above, it lacked information on non-Medicare patients and could not adjust properly for the sample sizes that JCAHO-accredited hospitals would have drawn if they followed JCAHO's sampling rules rather than CMS's. The limitations in the CMS analysis would affect some hospitals more than others, depending on how many non-Medicare patients a hospital treated and whether it applied the JCAHO sampling rules. Consequently, had we used information from this analysis to estimate baseline data completeness, our results would have been distorted by the uneven impact of those factors on the information produced for different hospitals.43

In addition, we found no data for assessing the baseline completeness of the quality data provided by hospitals submitting samples of their eligible cases to the clinical warehouse. For hospitals that submitted a sample, their quality data could be incomplete, even if they submitted the expected number of cases, if their samples were not selected in a way that ensured they were representative of all a hospital's patients. If a hospital did not follow appropriate procedures to provide for random selection, the sample might not be representative and therefore could be incomplete. Because the available information from CMS focused on the number of cases submitted, and not on how they were selected, we could not address this aspect of data completeness.

43See appendix I for a more detailed description of this assessment.

Other Reporting Systems Use Various Methods to Ensure Data Accuracy and Completeness, Notably an Independent Audit

Other reporting systems that collect clinical performance data have adopted various methods to ensure data accuracy and completeness, and officials from these systems stressed the importance of including an independent audit in these activities. Most other reporting systems that conduct independent audits incorporate three methods as part of their process that CMS does not use in its independent audit.
Specifically, these systems include an on-site visit, focus their audit on a selected number of facilities or reporting entities, and review a minimum of 50 patient medical records per reporting entity.

Other Reporting Systems Use Various Methods to Check Data

Other reporting systems that collect clinical performance data have adopted various methods to ensure data accuracy and completeness. To check data accuracy, all the other reporting systems we examined assess the data when they are submitted, typically using computers to detect missing or out-of-range data. (See app. II, tables 4 and 5.) In addition, all the other systems have developed standardized data collection processes and measures. When checking data completeness, all the other systems compare submitted data with data from another source, whether inside the facility, such as pharmacy or laboratory records, or outside the facility, such as state hospital discharge data or Medicare claims data. Officials reported that these analyses were done annually or had been done one time, and one said that additional studies were planned.44 Officials from these systems also cite various other methods to consider when ensuring data accuracy and completeness, including reviewing established measures annually, identifying a point person at each facility to provide consistency, establishing channels for ongoing communication, and providing training on a continuous basis.45

44For example, on-site auditors from one reporting system compare the data submitted against catheterization laboratory schedules and hospital billing records for the previous 12 months. Another reporting system hired a contractor to perform a one-time study comparing patient assessment data submitted by a facility against its total Medicare claims to identify instances where patient assessments were missing.
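The count-comparison completeness checks described above can be sketched in a few lines. The facility IDs, counts, and 20 percent tolerance below are hypothetical, chosen for illustration; real systems derive their expected counts from an external source such as claims data, with adjustments for sampling rules.

```python
def flag_underreporting(submitted, expected, tolerance=0.20):
    """Flag facilities whose submitted case count falls short of an
    externally derived expected count by more than the tolerance.
    Both arguments map facility IDs to case counts."""
    flagged = {}
    for facility, exp in expected.items():
        sub = submitted.get(facility, 0)   # facilities that submitted nothing count as 0
        if exp > 0 and (exp - sub) / exp > tolerance:
            flagged[facility] = (sub, exp)
    return flagged

# Hypothetical counts: quality-data submissions vs. claims-based expectation.
submitted = {"H001": 95, "H002": 40, "H003": 0}
expected  = {"H001": 100, "H002": 90, "H003": 75}
print(flag_underreporting(submitted, expected))
# H002 submitted fewer than half its expected cases; H003 submitted none.
```

A check like this identifies gross discrepancies, but as the report notes, it cannot by itself establish the magnitude of underreporting when the expected counts omit some patient populations or apply the wrong sampling rules.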
Other Reporting Systems Conduct Independent Audits

Most of the other reporting systems whose officials we interviewed conduct independent audits that include a comparison of submitted data to medical records. Most other reporting systems that conduct independent audits incorporate three methods as part of their process that CMS does not use in its independent audit. Specifically, they (1) include an on-site visit as part of their independent audit, (2) focus their audits on a selected number of facilities or reporting entities, and (3) review a minimum of 50 patient medical records per reporting entity during the auditing process.

During an on-site visit, auditors are able to review patient medical records for accuracy and interview staff when additional information is needed. Auditors are also able to check the data submitted to their system against other data sources at the facilities, including physician notes, patient or resident rosters, billing records, laboratory records, and pharmacy records. In addition, because auditors from other reporting systems may not visit every facility,46 the systems use various methods to focus the auditing process when selecting which facilities to visit. These include auditing a percentage of all eligible facilities, auditing facilities that did particularly well or poorly, and auditing a subset of facilities each year. Furthermore, most of the other reporting systems that conduct independent audits review a minimum of 50 patient medical records per audited entity as part of their independent auditing process. When selecting which patient medical records to review, some systems take a random sample of the patient population, one system reviews all deaths at the selected facility, and another reviews all instances where the patient died from shock as a result of percutaneous coronary intervention.
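The record-selection step of such an audit might look like the following sketch. The 50-record minimum reflects the practice described above; the 5 percent sampling fraction and the function shape are assumptions for illustration.

```python
import random

def select_audit_records(record_ids, minimum=50, fraction=0.05, seed=None):
    """Draw a simple random sample of patient records for reabstraction:
    a fixed fraction of the submitted records, but never fewer than the
    minimum (or all records, if the entity submitted fewer than that)."""
    rng = random.Random(seed)
    target = max(minimum, round(fraction * len(record_ids)))
    if target >= len(record_ids):
        return list(record_ids)          # small entity: review every record
    return rng.sample(record_ids, target)

# An entity that submitted 2,000 records: the 5 percent fraction exceeds the floor.
sample = select_audit_records([f"rec{i}" for i in range(2000)], seed=1)
print(len(sample))  # 100
# A small entity with 30 records: the audit reviews all of them.
print(len(select_audit_records([f"rec{i}" for i in range(30)], seed=1)))  # 30
```

Compared with CDAC's fixed five cases per hospital per quarter, a floor of 50 records keeps the sampling error of the resulting accuracy score roughly comparable across entities of different sizes.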
45We have also published a document that describes a flexible framework for assessing data reliability, including both accuracy and completeness, when assessing computer-processed data. This document offers procedures that can be adapted to varying circumstances. These procedures include conducting electronic data testing, such as logic tests; ensuring internal control systems are in place that check the data when they are entered into the system and limit access to the system; checking for missing data elements as well as missing case records; and reviewing related documentation, which may include tracing a sample of records large enough to estimate an error rate back to their source documents. See GAO, Assessing the Reliability of Computer-Processed Data, GAO-03-273G, External Version 1 (Washington, D.C.: October 2002).

46An official from one reporting system said that budgetary constraints limit the number of on-site audits that the system can perform. As a result, auditors from that system focus their review on hospitals with outcomes that fall above and below the systemwide average.

Officials at other reporting systems we interviewed and an expert in the field stressed the importance of the independent audit. For example, an official from one of the other reporting systems said that audits conducted by an independent third party are "the best way" to ensure data accuracy and completeness. An official from another reporting system said that having someone independently check the data is "one of the most important things" that an organization can do to check data accuracy and completeness. Additionally, an expert we interviewed said that independent, external audits are "essential."
Though most of the other reporting systems employ an independent auditing process, officials from one system that has yet to implement such a process said their organization recognizes the importance of independently checking the data and is currently designing and implementing an independent auditing process.

Conclusions

Data collected for the APU program affect the payment received by hospitals from Medicare and are used to inform the public about hospital quality. For both these purposes, it is important that CMS is able to ensure that the data are reliable in terms of both accuracy and completeness. CMS has put in place an ongoing process for assessing the accuracy of quality data submitted by hospitals, but the process has limitations. Although CMS checks the accuracy of data electronically as they are submitted and through an independent audit conducted by CDAC, the latter process is limited by the selection of only five cases per quarter per hospital, regardless of the hospital's size. Most hospitals had high baseline accuracy scores that were statistically certain. However, for about one-fourth to one-third of all the hospitals that CMS assessed for the first two calendar quarters of 2004, CMS's determination as to whether the hospital met its accuracy standard was statistically uncertain. This was due primarily to the small number of cases selected for an audit. Although CMS has stated its intention to look at more cases by pooling reabstraction results from more than one calendar quarter, all of the hospital accuracy reports that it has generated to date have been based on a single quarter of data. Officials from other reporting systems that collect clinical performance data told us that they also use an independent audit to check data accuracy, but generally sample a larger number of patient medical records, either by sampling a percentage of total cases submitted or by identifying a minimum number of cases in the sample.
In addition, most other reporting systems focused their audits on a selected number of facilities.

In contrast to CMS's establishment of an ongoing process for assessing data accuracy, the agency has not put in place an ongoing process to check the completeness of the data that hospitals submit. Because of the purposes for which these data may be used, there could be an incentive for hospitals to selectively report data on cases that score well on the quality measures. With no ongoing way to check completeness, CMS does not know whether or how often hospitals submit incomplete data. We believe this is a significant gap in oversight. The process used for the fiscal year 2005 annual payment update compared hospital submissions to Medicare claims data, but as CMS has noted, this did not provide a comparable assessment of each hospital's data, even for Medicare patients alone. Moreover, in its comparison of hospital quality data submissions with Medicare claims for the fiscal year 2006 update, CMS identified more than 100 hospitals that had treated eligible patients in a given quarter but had not submitted data on a single case for that quarter to the clinical warehouse. Yet CMS has not asked hospitals to certify that the data they have submitted constitute all, or a representative sample, of the eligible patient population. The various methods used by other reporting systems to check the completeness of data illustrate the variety of approaches that are available. These include conducting on-site visits as part of their independent audit, comparing data submissions to data from another source maintained by the facility or external to it, and performing such checks annually or at other specified intervals.
Given CMS's plans to continue public reporting efforts after the APU program ends, we believe that processes for checking the reliability of data should continue to be refined in order for the individuals and organizations that use the data to have confidence in the information.

Recommendations for Executive Action

In order for CMS to help ensure the reliability of the quality data it uses to produce information on hospital performance, we recommend that the CMS Administrator undertake the following three actions:

o focusing on the subset of hospitals for which it is statistically uncertain if they met CMS's accuracy threshold in one or more previous quarters, increase the number of patient records reabstracted by CDAC in a subsequent quarter so that the proportion of hospitals with statistically uncertain results is reduced;

o require hospitals to certify that they took steps to ensure that they submitted data on all eligible patients, or a representative sample thereof; and

o assess the level of incomplete data submitted by hospitals for the APU program to determine the magnitude of underreporting, if any, in order to refine how completeness assessments may be done in future reporting efforts.

Agency Comments

In commenting on a draft of this report, CMS stated it appreciated our analysis and recommendations. (CMS's comments appear in app. IV.) The agency noted that the APU program led to a dramatic increase in the number of hospitals that submitted data on the designated 10 quality measures, resulting in public reporting of quality data for about 3,600 hospitals on the agency's Web site. In addition, CMS described the steps it had taken to ensure the accuracy and completeness of the quality data submitted by hospitals for the APU program. It said that the methods it had used were sound, but it agreed that the quality and completeness of the data must be improved.
With respect to reducing the statistical uncertainty of its assessments of the accuracy of hospital quality data submissions, CMS agreed that the quarterly accuracy assessments based on five patient charts can have considerable sampling error and stated that it would improve the stability of its accuracy assessments by using data from four calendar quarters when it assessed hospital eligibility for the fiscal year 2007 annual payment update. CMS expressed concern about having sufficient time within the current data submission schedule to increase the number of patient records reabstracted. However, we recommended in the draft report that hospitals with statistically uncertain results in one or more previous quarters have an increased number of records reabstracted. The assessment of statistical uncertainty for a hospital and the reabstraction of additional records do not need to occur within the same quarter. We have modified slightly the wording of the recommendation to clarify the intended timing of these additional reabstractions.

With respect to ensuring the completeness of quality data submitted by hospitals, CMS agreed that it needs to improve its methods. CMS noted that its comparison of hospital quality data submissions to the claims filed by those hospitals to be paid for treating Medicare beneficiaries uncovered numerous discrepancies. The agency agreed with our recommendation to require hospitals to formally attest to the completeness of the quality data that they submit quarterly. In addition, CMS stated that it would also require each hospital to report the total number of Medicare and non-Medicare patients who were eligible for quality assessment under the APU program.
In terms of assessing the level of incomplete data for the APU program, CMS said it had a process in place to accomplish this. But as we stated in the draft report, CMS's process did not cover all patients and all hospitals: it lacked information on non-Medicare patients even though hospitals were required to submit data on both Medicare and non-Medicare patients. Additionally, the tests that CMS applied could detect incomplete data for only a limited subset of hospitals, in contrast to its assessment of data accuracy, which covered all hospitals that submitted data on six or more cases in a quarter. CMS acknowledged it could assess completeness only for Medicare patients, but said that by requiring hospitals to report an aggregate count of all eligible patients, it would henceforth have the data needed to assess the completeness of both Medicare and non-Medicare quality data submissions. The agency stated it will use these data to provide quarterly feedback to hospitals about the accuracy and completeness of their data submissions, and require them to explain discrepancies between the data they have submitted for the APU program and the aggregate count of eligible patients they have reported.

CMS has not said that it will determine the magnitude of underreporting for the program as a whole, as we recommended. Additionally, by relying on the hospitals themselves to supply data on the number of non-Medicare patients, CMS's proposed approach lacks an independent verification of the completeness of submitted data. This contrasts with the practice of most of the other reporting systems we contacted, as well as experts in the field, who generally underscored the importance of independently checking both the accuracy and the completeness of the quality data.

As arranged with your offices, unless you publicly announce its contents earlier, we plan no further distribution of this report until 30 days after its issue date.
At that time, we will send copies of this report to the Administrator of CMS and other interested parties. We will also make copies available to others on request. In addition, the report will be available at no charge on GAO's Web site at http://www.gao.gov. If you or your staffs have any questions about this report, please contact me at (202) 512-7101 or [email protected]. Contact points for our Offices of Congressional Relations and Public Affairs may be found on the last page of this report. GAO staff who made major contributions to this report are listed in appendix V.

Cynthia A. Bascetta
Director, Health Care

Appendix I: Scope and Methodology

To determine the processes used by the Centers for Medicare & Medicaid Services (CMS) to ensure the accuracy and completeness of data submitted by hospitals for the Annual Payment Update program (APU program), we interviewed both CMS officials and staff at DynKePRO-which operates the Clinical Data Abstraction Center (CDAC)-and the Iowa Foundation for Medical Care (IFMC), two contractors that perform data collection and data quality monitoring tasks for the APU program. In addition, we reviewed documentation on the program available publicly on the Quality Net Exchange Web site1 and the Web sites of several quality improvement organizations (QIO)-contractors to CMS that provide technical assistance to hospitals on the APU program-as well as documents on the APU program provided to us at our request by CMS. We also obtained access to CMS's intranet system and searched for relevant memorandums and other documents regarding CMS's policies and requirements for hospitals that participated in the APU program. To gain insights from other groups involved in the APU program, we interviewed officials from two or more QIOs, state hospital associations, and hospital data vendors that submitted data to the IFMC-operated database for their hospital clients.
Our assessment of the baseline accuracy of the initial APU program data depended on the availability of suitable information from CMS. We examined CMS's reabstraction process to determine if the CDAC assessments of data accuracy would be appropriate for that purpose. Reabstraction is the re-collection of clinical data for the purpose of assessing the accuracy of data abstractions performed by hospitals. In the APU program, CDAC compares data reported by the hospitals to those it has independently obtained from the same medical records. CDAC has instituted a range of procedures, including training of its abstractors and continuous monitoring of interrater reliability, intended to ensure that its abstractors understand and follow its detailed guidance for arriving at abstraction determinations that are correct in terms of CMS's data specifications. We interviewed CDAC staff and observed the implementation of these procedures during a site visit at the CDAC facility. On the basis of this information we concluded that it would be appropriate for us to use the results of the CDAC reabstractions to estimate baseline data accuracy for the APU program. We obtained the results of the reabstractions that CDAC had conducted on samples of the patients for whom hospitals had submitted data from the first two quarters of 2004. These two quarters were the first two data submissions made by hospitals under the APU program and the most recent available when we conducted these analyses. They constituted 20,465 patient records for the first quarter and 20,259 for the second. These files showed, for each data element that CMS used in assessing abstraction accuracy, the correct entry as determined by the CDAC abstractors and whether this matched the value that the hospital had reported. We applied CMS's algorithms for computing hospital scores on the expanded-measure set in order to determine the extent of missing or invalid data. 
We found that approximately 2 to 3 percent of patient records could not be scored on any given APU measure due to missing data. We excluded from the analysis records from critical access hospitals and acute care hospitals in Maryland and Puerto Rico (which are paid under different payment systems than other acute care hospitals and therefore are not subject to a reduced annual payment update under the APU program2) and a small number of records not related to the three medical conditions covered by the APU program.3

Next we applied the scoring rules developed by CMS to assess the accuracy of hospital abstractions. We calculated the accuracy score for each hospital in each quarter, using the data elements needed for the APU-measure set and, separately, for the expanded-measure set. Accuracy scores for the expanded-measure set are based on all the measures for which a hospital submitted data, which could range from the APU measures alone to a maximum of 17-the 10 measures in the APU-measure set plus the 7 additional measures adopted by the Hospital Quality Alliance for hospital discharges through the second calendar quarter of 2004. These scores represented the proportion of data elements where CDAC and the hospital agreed, summing across all the assessed data elements for the five sampled cases. We then calculated the distribution of those scores, and the proportion of hospitals that met or exceeded the 80 percent accuracy threshold that CMS had set.

Next we calculated the confidence interval for each of those accuracy scores, using the formula that CMS had selected for that purpose. However, whereas CMS applied a one-tailed test-passing any hospital that had a confidence interval whose upper bound reached 80 or above-we applied a two-tailed test to assess the statistical uncertainty attached to both passing and failing the threshold.
The one-tailed test that CMS applied prevented hospitals from losing their full annual payment update on the basis of their accuracy score if there was less than a 95 percent probability that a score below 80 would have remained below 80 in another sample. This meant that hospitals with large confidence intervals could have accuracy scores well below 80 and still pass the CMS accuracy requirement. Our analysis focused instead on assessing the level of statistical certainty for all the accuracy scores, both above and below the 80 percent threshold. We sought to identify passing as well as failing scores that could have changed with another sample. To do so, we applied a two-tailed test and observed whether a hospital's confidence interval bracketed the 80 percent threshold. To provide descriptive information about variation in the accuracy scores obtained by hospitals in different situations, we collected additional information about the hospitals from other sources. From the Medicare Provider of Services file we obtained the Social Security Administration metropolitan statistical area code (referred to as the SSA MSA code) and Social Security Administration metropolitan statistical area size code (referred to as the SSA MSA size code) to distinguish between urban and rural hospitals. We also obtained from that source the total number of Medicare-certified beds in order to categorize hospitals by size. To compare the accuracy scores of hospitals that employed different data vendors, we obtained from IFMC the identification codes (but not the names) of the various data vendors certified by the Joint Commission on Accreditation of Healthcare Organizations (JCAHO) that had submitted to the clinical warehouse data for the APU program on behalf of hospitals they served. Those codes were also available in the case tracking information for the patient records in the CDAC database. 
We then identified for each CDAC reabstraction whether the case had originally been submitted by a JCAHO-certified data vendor, and if so, which one. These data were aggregated to generate accuracy scores for each hospital that consistently submitted its quality data through one data vendor in a given quarter. This allowed us to determine the proportion of hospitals served by each JCAHO data vendor that met CMS's 80 percent accuracy threshold. We also calculated the proportion of hospitals that submitted their own quality data to CMS (identified in the CDAC case tracking information by the hospital's Medicare provider ID number) that met the accuracy threshold. Although this analysis was limited to data vendors that were JCAHO-certified, those vendors collectively submitted data to the clinical warehouse for 78 to 79 percent of the hospitals we analyzed in the two baseline quarters. Another 13 to 14 percent of hospitals directly submitted their own data, and we do not have information on how the remaining hospitals submitted data to the clinical warehouse. As was the case for our baseline accuracy assessment, our assessment of the baseline completeness of the data submitted for the APU program depended on the availability of suitable data from CMS. Specifically, we considered using CMS's estimates of minimum expected cases derived from Medicare claims data to arrive at estimates of baseline completeness. The CMS officials we spoke with noted that there were numerous reasons why the two data sources-quality data submissions for the APU program and cases billed to Medicare-would be expected to diverge, apart from any underreporting of quality data by hospitals. The claims data were limited to Medicare fee-for-service patients, whereas the hospitals were obliged to submit quality data on all patients over 18 years of age (over 28 days old for most pneumonia measures), including patients belonging to Medicare health maintenance organizations. 
In addition, hospitals with large numbers of cases could draw samples for the quality data, but would bill for all patients. In making adjustments to its number of "expected cases" to take account of sampling, CMS found that it could not reliably identify the hospitals that should have followed the JCAHO sampling rules, which would result in larger-sized samples. Therefore, in calculating the number of cases it expected hospitals to have submitted to the clinical warehouse, CMS applied to all hospitals across the board the expectation of smaller samples based on rules that pertained to hospitals not accredited by JCAHO. Finally, the Medicare data used for the comparison was an average volume recorded over the previous 2 years, not claims filed for the quarter to which the quality data applied. We found that these limitations precluded our using information from CMS's Medicare claims analysis to assess the baseline completeness of the data submitted by hospitals for the APU program. CMS's comparison of hospital quality data submissions to the clinical warehouse to its estimated number of "expected cases" might have served CMS's purposes, by identifying at least some instances of significant discrepancy between the number of cases for which quality data were submitted and claims filed. However, we determined that it would not provide a reasonable estimate of the magnitude of data completeness for all hospitals. Because the limitations in the CMS analysis would affect some hospitals more than others, depending on how many non-Medicare patients a hospital treated and whether it applied the JCAHO sampling rules, we concluded that using information from this analysis to estimate baseline data completeness would lead to results that were distorted by the uneven impact of those factors on the information produced for different hospitals. 
To obtain information on other processes that could be used to check data accuracy and completeness, we interviewed officials from organizations that administer reporting systems that collect clinical performance data. To select these organizations, we took several steps. We reviewed reports on reporting systems, including two issued by QIOs: IPRO's 2003 Review of Hospital Quality Reports and the Delmarva Foundation's The State-of-the-Art of Online Hospital Public Reporting: A Review of Forty-Seven Websites.4 We solicited input from the authors of each report and interviewed academic researchers who have studied methods of assessing the reliability of performance data. We used online resources to obtain information on federal- and state-administered surveillance efforts. Our selection criteria focused on systems that collected clinical data, as opposed to administrative or claims data, and that were mentioned most often in the reports and interviews cited above. To ensure variation, we selected a mix of systems: those run by public and private organizations, those receiving data from hospitals and those receiving data from other types of providers, and those collecting data across a range of medical conditions and those collecting data on specific medical conditions. Using a structured protocol, we interviewed officials from the following organizations: JCAHO, the National Committee for Quality Assurance, the Society of Thoracic Surgeons, the California Office of Statewide Health Planning and Development, the New York State Department of Health, CMS (the units responsible for the Data Assessment and Verification Project (DAVE) contract for monitoring nursing home care), and the American College of Cardiology. Each organization reviewed and confirmed the accuracy of the information presented in appendix II.
Our analysis is based on the quality measures established for the APU program and the information available as of September 2005 on the accuracy and completeness of data submitted by hospitals for that program. We did not evaluate the appropriateness of these quality measures relative to others that could have been selected. Nor did we examine the actual performance by hospitals on the measures (e.g., how often they provide a particular service or treatment). Our analysis of the baseline level of accuracy and completeness of data submitted for the APU program is based on the procedures developed by CMS to validate the data submitted. We have not independently compared the data submitted by hospitals to the original patient clinical records. We conducted our work from November 2004 through January 2006 in accordance with generally accepted government auditing standards.

Agency Comments

Appendix I: Scope and Methodology

1 We downloaded various documents from the www.qnetexchange.org Web site between December 21, 2004, and January 10, 2006.

2 CMS included hospitals in Puerto Rico in its list of hospitals qualifying for the full fiscal year 2005 update, but determined in conjunction with the fiscal year 2006 payment update decision that Puerto Rico's hospitals were exempt from the APU program requirements. Hospitals in Puerto Rico receive prospective payments from Medicare, but under a different system than other hospitals.

3 The records we excluded were 536 surgery cases for the first quarter and 604 surgery cases for the second quarter, from hospitals providing data on surgical infection prevention measures.
4 IPRO, 2003 Review of Hospital Quality Reports for Health Care Consumers, Purchasers and Providers (Lake Success, N.Y.: October 2003); Delmarva Foundation and the Joint Commission on Accreditation of Healthcare Organizations, The State-of-the-Art of Online Hospital Public Reporting: A Review of Forty-Seven Websites (Easton, Md.: September 2004).

Appendix II: Other Reporting Systems

Table 3: Background Information on CMS and Other Reporting Systems

Centers for Medicare & Medicaid Services (CMS)
  Organization status: Public
  Data submitted by: Hospitals paid under the Inpatient Prospective Payment System
  Reporting requirement: See note (c)
  Data publicly reported: Yes
  Conditions for which data are submitted: Cardiac - acute myocardial infarction (AMI), heart failure (HF); pneumonia; surgical infection prevention
  Number of facilities reporting: 3,839(g)
  Approximate program duration: 2 years

American College of Cardiology (ACC)
  Organization status: Private, nonprofit
  Data submitted by: Facilities with at least one catheterization laboratory (includes in-hospital, freestanding, and/or mobile catheterization laboratories)
  Reporting requirement: Voluntary(d)
  Data publicly reported: No
  Conditions for which data are submitted: Cardiac - diagnostic cardiac catheterization, percutaneous coronary intervention (PCI)
  Number of facilities reporting: 611(h)
  Approximate program duration: 7 years

California Office of Statewide Health Planning and Development
  Organization status: Public
  Data submitted by: Hospitals where cardiac surgeries are performed
  Reporting requirement: Mandatory
  Data publicly reported: Yes
  Conditions for which data are submitted: Cardiac - coronary artery bypass grafting (CABG)
  Number of facilities reporting: 120
  Approximate program duration: 2 years(j)

Data Assessment and Verification Project (DAVE)(a)
  Organization status: Public
  Data submitted by: Nursing homes
  Reporting requirement: Mandatory
  Data publicly reported: Yes
  Conditions for which data are submitted: Resident health care; resident health status
  Number of facilities reporting: 16,266(i)
  Approximate program duration: 1 year

Joint Commission on Accreditation of Healthcare Organizations (JCAHO)(b)
  Organization status: Private, nonprofit
  Data submitted by: JCAHO-accredited hospitals
  Reporting requirement: Mandatory(e)
  Data publicly reported: Yes
  Conditions for which data are submitted: Cardiac - AMI, HF; pneumonia; pregnancy
  Number of facilities reporting: ~3,350
  Approximate program duration: 3 years

National Committee for Quality Assurance (NCQA)
  Organization status: Private, nonprofit
  Data submitted by: Health plans
  Reporting requirement: Mandatory(e)
  Data publicly reported: Yes(f)
  Conditions for which data are submitted: Preventive care, acute and chronic conditions
  Number of facilities reporting: 560
  Approximate program duration: 14 years

New York State Department of Health
  Organization status: Public
  Data submitted by: Hospitals that perform cardiac surgery and/or percutaneous coronary intervention (PCI)
  Reporting requirement: Mandatory
  Data publicly reported: Yes
  Conditions for which data are submitted: Cardiac - CABG, PCI, and valve surgery
  Number of facilities reporting: 49
  Approximate program duration: 16 years

Society of Thoracic Surgeons (STS)
  Organization status: Private, nonprofit
  Data submitted by: Hospitals, surgeons
  Reporting requirement: Voluntary
  Data publicly reported: No
  Conditions for which data are submitted: Cardiac - CABG, aortic and mitral valve surgery; general thoracic surgery; congenital heart surgery
  Number of facilities reporting: 700
  Approximate program duration: 16 years

Sources: CMS, ACC, California Office of Statewide Health Planning and Development, JCAHO, NCQA, New York State Department of Health, and STS.

(a) DAVE is a CMS contract to assess the reliability of minimum data set assessment data that are submitted by nursing homes. Minimum data set assessments are a minimum data set of core elements to use in conducting comprehensive assessments of patient conditions and care needs. These assessments are collected for all residents in nursing homes that serve Medicare and Medicaid beneficiaries.
(b) JCAHO provided information about its ORYX(R) initiative, which integrates outcome and other performance measurement data into the accreditation process.
(c) Under Section 501(b) of the Medicare Prescription Drug, Improvement, and Modernization Act of 2003, hospitals shall submit data for a set of indicators established by the Department of Health and Human Services (HHS) as of November 1, 2003, related to the quality of inpatient care. Section 501(b) also provides that any hospital that does not submit data on the 10 quality measures specified by the Secretary of Health and Human Services will have its annual payment update reduced by 0.4 percentage points for each fiscal year from 2005 through 2007.
(d) Some states and insurance companies have started to require hospital participation.
(e) Data submission is mandatory to maintain accreditation.
(f) Only audited data are publicly reported.
(g) The number of hospitals that submitted data to receive their annual payment update for fiscal year 2005.
(h) The number of facilities enrolled in ACC's National Cardiovascular Data Registry(R) as of July 13, 2005.
(i) This number represents the number of nursing homes that submitted minimum data set assessments between January 1, 2004, and December 31, 2004. Accuracy estimates are made by selecting a random sample of records for off-site and on-site medical record review.
(j) Mandatory reporting of performance data began in 2003.

Table 4: Processes Used by CMS and Other Reporting Systems to Ensure Data Accuracy

Systems: CMS, ACC, California Office of Statewide Health Planning and Development (California), DAVE, JCAHO, NCQA, New York State Department of Health (New York), and STS.

Processes used:
  Training: all eight systems
  Standardized measures or definitions: all eight systems (note (c) applies to CMS and JCAHO)
  Standardized processes for data collection: all eight systems
  Automated data edits when the data come in, as part of the data quality assurance process (to identify missing or out-of-range data): all eight systems (note (d) applies to DAVE)
  Independent audits: CMS, ACC, California, DAVE, NCQA, and New York; JCAHO audits data vendors(e); STS planned(f)
  On-site audits: ACC, California, DAVE, NCQA, and New York; STS planned(f)
  Medical record review: CMS, ACC, California, DAVE, NCQA, and New York; STS planned(f)

Sample size - patients:
  CMS: 5 records
  ACC: 10 percent random sample of medical records, 50 record minimum(g)
  California: 70 records
  DAVE: 13-16 records(h)
  JCAHO: not applicable
  NCQA: 60 records
  New York: 50 records(i)
  STS: planned, minimum of 30 records(j)

Sample size - facilities:
  CMS: all
  ACC: 10 percent random sample of eligible sites(k)
  California: outliers and near-outliers for mortality
  DAVE: 69
  JCAHO: not applicable
  NCQA: all
  New York: 20 programs per year(l)
  STS: planned, 24 facilities per year(m)

Sources: CMS, ACC, California Office of Statewide Health
Planning and Development, JCAHO, NCQA, New York State Department of Health, and STS.

(a) DAVE is a CMS contract to assess the reliability of minimum data set assessment data that are submitted by nursing homes. Minimum data set assessments are a minimum data set of core elements to use in conducting comprehensive assessments of patient conditions and care needs. These assessments are collected for all residents in nursing homes that serve Medicare and Medicaid beneficiaries.
(b) JCAHO provided information about its ORYX(R) initiative, which integrates outcome and other performance measurement data into the accreditation process.
(c) CMS and JCAHO have worked to align their measures. A common set of measures took effect for discharges occurring on or after January 1, 2005.
(d) Data checks occur at the state level, for example, at the state health department, before the data are accessed by DAVE.
(e) JCAHO performs independent audits of data vendors.
(f) STS is planning to incorporate an independent audit into its system. STS officials plan to include an on-site audit and medical record review as part of their audit system.
(g) The 10 percent random sample of medical records is based on annual percutaneous coronary intervention volume.
(h) The number of cases and facilities identified are limited to on-site audits. Additional cases are reviewed as part of the off-site medical record review process.
(i) Auditors review 100 percent of records when significant discrepancies are identified between the chart and what the hospital reported on specific risk factors. In addition, medical record documentation is reviewed for 100 percent of cases with the risk factors "shock" or "stent thrombosis."
(j) STS plans to review a minimum of 30 records as part of its independent auditing process.
(k) ACC defines eligible sites as those facilities with a minimum of 50 records to be abstracted over a specified number of quarters.
(l) The New York State Department of Health typically reviews 20 programs per year. In some instances that can mean percutaneous coronary intervention and cardiac surgery at the same hospital, which would count as two programs.
(m) STS plans to visit 24 facilities per year as part of its independent auditing process.

Table 5: Processes Used by CMS and Other Reporting Systems to Ensure Data Completeness

Systems: CMS, ACC, California Office of Statewide Health Planning and Development (California), DAVE, JCAHO, NCQA, New York State Department of Health (New York), and STS.

Processes used:
  Training: six of the eight systems
  Concurrent review(c): three of the eight systems
  Independent audits: ACC, California, DAVE, NCQA, and New York; JCAHO audits data vendors(d); STS planned(e)
  On-site audits: ACC, California, DAVE, NCQA, and New York; STS planned(e)
  Comparison to another data source: all eight systems

Data sources used for comparison:
  CMS: Billing claims data
  ACC: Medicare billing records; other sources: patient medical records, catheterization laboratory logs, physician notes
  California: Hospital ICD-9 codes(f); other sources: state death files
  DAVE: Medicare claims data; other sources: resident rosters
  JCAHO: ICD-9 codes(f)
  NCQA: Pharmacy records, laboratory records
  New York: Statewide Planning and Research Cooperative System (SPARCS) data
  STS: Medicare provider analysis and review (MEDPAR) data

Frequency of data completeness review:
  CMS: twice(g); ACC: annually(h); California: annually; DAVE: once(i); JCAHO: annually(j); NCQA: annually; New York: annually; STS: once(k)

Sources: CMS, ACC, California Office of Statewide Health Planning and Development, JCAHO, NCQA, New York State Department of Health, and STS.

(a) DAVE is a CMS contract to assess the reliability of minimum data set assessment data that are submitted by nursing homes. Minimum data set assessments are a minimum data set of core elements to use in conducting comprehensive assessments of patient conditions and care needs. These assessments are collected for all residents in nursing homes that serve Medicare and Medicaid beneficiaries.
(b) JCAHO provided information about its ORYX(R) initiative, which integrates outcome and other performance measurement data into the accreditation process.
(c) Under concurrent review, auditors assess data as they are being collected.
(d) JCAHO performs independent audits of data vendors.
(e) STS is planning to incorporate an independent audit into its system. STS officials plan to include an on-site audit as part of their audit system.
(f) The International Classification of Diseases, Ninth Revision (ICD-9) codes were designed to promote international comparability in the collection, processing, classification, and presentation of mortality statistics.
(g) CMS conducted two separate one-time studies that compared Medicare claims data to submitted data.
(h) Data completeness reviews are conducted annually for randomly selected sites as part of the on-site audit process and quarterly for data submissions.
(i) A one-time study was conducted; additional studies are planned.
(j) At a minimum, data completeness reviews are conducted annually.
(k) A one-time study was conducted.

Appendix III: Data Tables on Hospital Accuracy Scores

Rural hospitals and smaller hospitals generally received accuracy scores that differed minimally from those of urban hospitals and larger hospitals. (See tables 6 and 7.) To the extent there are small differences across categories, they do not show a consistent pattern based on geographic location or size.
Table 6: Median Hospital Baseline Accuracy Scores, by Hospital Characteristic, Quarter, and Measure Set

                   January-March 2004 discharges      April-June 2004 discharges
Hospital           APU-measure    Expanded-measure    APU-measure    Expanded-measure
characteristic     set            set                 set            set
Urban              92.7           90.0                94.2           91.5
Rural              93.0           91.1                93.8           91.7
< 50 beds          93.0           91.2                93.9           91.8
50-99 beds         93.2           91.1                94.2           92.2
100-199 beds       92.9           90.5                94.1           91.3
200-299 beds       93.0           90.1                94.2           91.7
300-399 beds       92.7           89.8                93.9           91.0
400-499 beds       92.0           89.5                93.8           91.1
500+ beds          92.0           89.0                94.1           91.0
All hospitals      92.9           90.4                94.1           91.6

Source: GAO analysis of CMS data.

Note: Calculation of accuracy scores for the expanded-measure set was based on all the measures for which a hospital submitted data, which could range from the APU measures alone to a maximum of 17: the APU measures plus as many as 7 additional measures.

Table 7: Proportion of Hospitals with Baseline Accuracy Scores Not Meeting 80 Percent Threshold, by Hospital Characteristic, Quarter, and Measure Set

(Percentage of hospitals not meeting the threshold)

                   January-March 2004 discharges      April-June 2004 discharges
Hospital           APU-measure    Expanded-measure    APU-measure    Expanded-measure
characteristic     set            set                 set            set
Urban              10.3           14.4                7.7            10.3
Rural              9.1            11.6                8.9            9.6
< 50 beds          9.4            12.8                10.3           12.0
50-99 beds         9.6            12.4                8.3            8.5
100-199 beds       8.7            12.3                8.6            9.8
200-299 beds       9.5            12.8                6.0            9.3
300-399 beds       11.8           15.0                6.5            8.6
400-499 beds       10.6           14.1                8.1            11.1
500+ beds          12.2           16.6                8.6            12.2
All hospitals      9.8            13.2                8.2            10.0

Source: GAO analysis of CMS data.

Note: Calculation of accuracy scores for the expanded-measure set was based on all the measures for which a hospital submitted data, which could range from the APU measures alone to a maximum of 17: the APU measures plus as many as 7 additional measures.
CMS deems hospitals that achieve an accuracy score of 80 or better as having met its requirement to submit accurate data. Accuracy scores among hospitals whose data were submitted to CMS by different JCAHO-certified vendors varied more, especially in the percentage of the hospitals that failed to meet the 80 percent threshold. (See table 8.) Collectively, these data vendors submitted data to the clinical warehouse for approximately 78 to 79 percent of hospitals affected by the APU program in the two baseline quarters, while another 13 to 14 percent of hospitals directly submitted their own data. For large data vendors (serving more than 100 hospitals), medium vendors (serving between 20 and 100 hospitals), and small vendors (serving fewer than 20 hospitals), there was marked variation within each size grouping in the proportion of the vendors' hospitals that did not meet the accuracy threshold. Such variation could reflect differences in the hospitals served by different vendors as well as differences in the services provided by those vendors. 
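The vendor-level tabulation described above (grouping hospitals by the vendor that submitted their data, then computing the share of each vendor's hospitals that fell below the 80 percent threshold) can be sketched as follows. The vendor names and scores are hypothetical, not actual APU data.

```python
from collections import defaultdict

# (vendor, hospital accuracy score) pairs -- hypothetical data
hospital_scores = [("vendor_a", 92.0), ("vendor_a", 76.5), ("vendor_a", 88.0),
                   ("vendor_b", 95.0), ("vendor_b", 81.0)]

below_threshold = defaultdict(int)
totals = defaultdict(int)
for vendor, score in hospital_scores:
    totals[vendor] += 1
    if score < 80.0:
        below_threshold[vendor] += 1

# Share of each vendor's hospitals not meeting the 80 percent threshold
for vendor in sorted(totals):
    pct = 100.0 * below_threshold[vendor] / totals[vendor]
    print(f"{vendor}: {pct:.1f}%")
```

The same tabulation, run separately per quarter and measure set, yields figures of the kind shown in table 8.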
Table 8: Percentage of Hospitals with Baseline Accuracy Scores Not Meeting 80 Percent Threshold, by JCAHO-Certified Vendor Grouped by Number of Hospitals Served, Quarter, and Measure Set

                  Percentage not meeting       Percentage not meeting
                  threshold for                threshold for
                  APU-measure set              expanded-measure set
Vendors, grouped
by number of      Jan.-Mar.     Apr.-June      Jan.-Mar.     Apr.-June
hospitals served  2004          2004           2004          2004

Large vendors
Vendor 1          2.6           2.6            3.9           2.6
Vendor 2          7.1           7.2            9.3           7.2
Vendor 3          7.7           9.5            14.0          11.3
Vendor 4          10.1          9.8            11.1          10.2
Vendor 5          11.1          8.4            14.4          10.4
Vendor 6          12.2          10.4           16.5          11.3
Vendor 7          12.4          9.0            12.4          13.6
Vendor 8          13.3          5.8            15.8          7.9

Medium vendors
Vendor 9          2.4           4.5            2.4           2.3
Vendor 10         3.4           3.1            3.4           6.3
Vendor 11         4.2           6.8            6.9           6.8
Vendor 12         4.8           4.8            4.8           6.5
Vendor 13         4.9           2.8            4.9           2.8
Vendor 14         6.4           4.3            8.5           6.4
Vendor 15         7.1           6.0            7.1           7.5
Vendor 16         7.6           5.0            19.0          13.8
Vendor 17         7.9           2.6            9.2           2.6
Vendor 18         8.0           3.4            12.0          6.9
Vendor 19         8.8           2.9            26.5          8.8
Vendor 20         12.1          5.5            17.6          7.7
Vendor 21         13.5          5.6            13.5          8.3
Vendor 22         15.2          13.9           17.7          17.7
Vendor 23         18.4          10.0           28.6          12.0

Small vendors
Vendor 24         0.0           11.8           0.0           11.8
Vendor 25         0.0           7.1            0.0           7.1
Vendor 26         0.0           0.0            0.0           0.0
Vendor 27         0.0           16.7           0.0           16.7
Vendor 28         0.0           0.0            0.0           0.0
Vendor 29         0.0           0.0            0.0           0.0
Vendor 30         0.0           0.0            0.0           0.0
Vendor 31         8.3           0.0            16.7          0.0
Vendor 32         9.1           8.3            9.1           16.7
Vendor 33         9.1           0.0            27.3          0.0
Vendor 34         10.0          9.1            10.0          9.1
Vendor 35         11.1          11.1           11.1          11.1
Vendor 36         20.0          33.3           60.0          33.3
Vendor 37         33.3          0.0            33.3          0.0
Vendor 38         33.3          0.0            33.3          0.0

No vendor         10.2          12.5           11.6          13.2

Source: GAO analysis of CMS data.

Note: Large vendors served more than 100 hospitals, medium vendors served 20 to 100 hospitals, and small vendors served fewer than 20 hospitals. Calculation of accuracy scores for the expanded-measure set was based on all the measures for which a hospital submitted data, which could range from the APU measures alone to a maximum of 17: the APU measures plus as many as 7 additional measures.
CMS deems hospitals that achieve an accuracy score of 80 or better as having met its requirement to submit accurate data.

Rank ordering hospitals by the breadth of the confidence intervals around their accuracy scores, from the narrowest to the widest intervals, shows the large variation that we found across both quarters and measure sets. Hospitals with the narrowest confidence intervals, shown in table 9 as the 10th percentile, had a range of no more than 6 percentage points between the lower and upper limits of their confidence interval. That meant that their accuracy scores from one sample to the next were likely to vary by no more than plus or minus 3 percentage points from the accuracy score obtained in the sample drawn by CMS. By contrast, hospitals with the widest confidence intervals, shown in table 9 as the 90th percentile, exceeded 36 percentage points from the lower limit to the upper limit of their confidence interval. The accuracy scores for these hospitals would likely vary from one sample to the next by 18 percentage points or more, up or down, relative to the accuracy score derived from the CMS sample. For hospitals whose confidence interval included the 80 percent threshold, it was statistically uncertain whether a different sample of cases would have altered their result from passing the 80 percent threshold to failing, or vice versa.

Table 9: Breadth of Confidence Intervals in Percentage Points Around the Hospital Baseline Accuracy Scores at Selected Percentiles, by Measure Set and Quarter

Hospital percentiles,       APU-measure set             Expanded-measure set
from narrowest to widest    Jan.-Mar.     Apr.-June     Jan.-Mar.     Apr.-June
confidence intervals        2004          2004          2004          2004
10th percentile             5.4           0.0           6.0           5.6
25th percentile             8.1           7.3           9.3           8.2
Median                      14.0          11.8          14.6          13.0
75th percentile             24.2          21.5          23.6          21.3
90th percentile             40.3          41.0          37.9          36.8

Source: GAO analysis of CMS data.
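The report does not spell out the exact interval formula used; for illustration, a standard normal-approximation (Wald) confidence interval for a proportion shows why a score based on only a handful of reabstracted data elements can have an interval wide enough to straddle the 80 percent threshold. The element counts below are hypothetical.

```python
import math

def proportion_ci(matches, n, z=1.96):
    """95 percent normal-approximation confidence interval, in percent,
    for an agreement rate of `matches` out of `n` data elements."""
    p = matches / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return 100 * max(0.0, p - half_width), 100 * min(1.0, p + half_width)

# 55 of 60 elements matching: score 91.7, interval roughly 84.7-98.7,
# comfortably above the 80 percent threshold.
print(proportion_ci(55, 60))

# 13 of 15 elements matching: score 86.7, but the interval is roughly
# 69.5-100, so it straddles 80 and pass/fail is statistically uncertain.
lo, hi = proportion_ci(13, 15)
print(lo < 80.0 < hi)  # True
```

With a Wilson or exact (Clopper-Pearson) interval the numbers shift slightly, but the qualitative point is the same: small samples yield intervals that can span the threshold.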
Note: Confidence intervals are based on a 95 percent level of confidence. Calculation of accuracy scores and confidence intervals for the expanded-measure set was based on all the measures for which a hospital submitted data, which could range from the APU measures alone to a maximum of 17: the APU measures plus as many as 7 additional measures.

One-fourth to one-third of hospitals had statistically uncertain results because their confidence interval extended both above and below the 80 percent threshold. Some of these hospitals had accuracy scores of 80 or above, and some had scores of less than 80. Table 10 separates these hospitals into (1) those that had accuracy scores equal to 80 or above and were statistically uncertain and (2) those that had accuracy scores below 80 and were statistically uncertain. The table shows that most of the statistical uncertainty involved hospitals that passed CMS's accuracy threshold but that had a substantial possibility of not passing if a different sample of cases had been reabstracted by CDAC.

Table 10: For Hospitals with Confidence Intervals That Included the 80 Percent Threshold, Percentage of Total Hospitals with an Actual Baseline Accuracy Score That Either Met or Failed to Meet the Threshold, by Measure Set and Quarter

                                  APU-measure set             Expanded-measure set
                                  Jan.-Mar.     Apr.-June     Jan.-Mar.     Apr.-June
                                  2004          2004          2004          2004
Percentage of hospitals whose
actual accuracy score equals
80 or better                      23.9          19.2          28.0          24.0
Percentage of hospitals whose
actual accuracy score equals
less than 80                      8.3           7.0           11.3          8.7
Total                             32.2          26.3          39.2          32.7

Source: GAO analysis of CMS data.

Note: Confidence intervals are based on a 95 percent level of confidence.
Calculation of accuracy scores for the expanded-measure set was based on all the measures for which a hospital submitted data, which could range from the APU measures alone to a maximum of 17: the APU measures plus as many as 7 additional measures. CMS deems hospitals that achieve an accuracy score of 80 or better as having met its requirement to submit accurate data.

Appendix IV: Comments from the Centers for Medicare & Medicaid Services

Appendix V: GAO Contact and Staff Acknowledgments

GAO Contact

Cynthia A. Bascetta, (202) 512-7101 or [email protected]

Acknowledgments

In addition to the contact named above, Linda T. Kohn, Assistant Director; Ba Lin; Nkeruka Okonmah; Eric A. Peterson; Roseanne Price; and Jessica C. Smith made key contributions to this report. (290403)

GAO's Mission

The Government Accountability Office, the audit, evaluation and investigative arm of Congress, exists to support Congress in meeting its constitutional responsibilities and to help improve the performance and accountability of the federal government for the American people. GAO examines the use of public funds; evaluates federal programs and policies; and provides analyses, recommendations, and other assistance to help Congress make informed oversight, policy, and funding decisions. GAO's commitment to good government is reflected in its core values of accountability, integrity, and reliability.

Obtaining Copies of GAO Reports and Testimony

The fastest and easiest way to obtain copies of GAO documents at no cost is through GAO's Web site (www.gao.gov). Each weekday, GAO posts newly released reports, testimony, and correspondence on its Web site. To have GAO e-mail you a list of newly posted products every afternoon, go to www.gao.gov and select "Subscribe to Updates."

Order by Mail or Phone

The first copy of each printed report is free. Additional copies are $2 each.
A check or money order should be made out to the Superintendent of Documents. GAO also accepts VISA and Mastercard. Orders for 100 or more copies mailed to a single address are discounted 25 percent. Orders should be sent to: U.S. Government Accountability Office, 441 G Street NW, Room LM, Washington, D.C. 20548. To order by phone: Voice: (202) 512-6000; TDD: (202) 512-2537; Fax: (202) 512-6061.

To Report Fraud, Waste, and Abuse in Federal Programs

Contact: Web site: www.gao.gov/fraudnet/fraudnet.htm; E-mail: [email protected]; Automated answering system: (800) 424-5454 or (202) 512-7470.

Congressional Relations

Gloria Jarmon, Managing Director, [email protected], (202) 512-4400. U.S. Government Accountability Office, 441 G Street NW, Room 7125, Washington, D.C. 20548.

Public Affairs

Paul Anderson, Managing Director, [email protected], (202) 512-4800. U.S. Government Accountability Office, 441 G Street NW, Room 7149, Washington, D.C. 20548.

www.gao.gov/cgi-bin/getrpt?GAO-06-54. To view the full product, including the scope and methodology, click on the link above. For more information, contact Cynthia A. Bascetta, (202) 512-7101 or [email protected].

Highlights of GAO-06-54, a report to the Committee on Finance, U.S. Senate, January 2006

HOSPITAL QUALITY DATA: CMS Needs More Rigorous Methods to Ensure Reliability of Publicly Released Data

The Medicare Modernization Act of 2003 directed that hospitals lose 0.4 percent of their Medicare payment update if they do not submit clinical data for both Medicare and non-Medicare patients needed to calculate hospital performance on 10 quality measures. The Centers for Medicare & Medicaid Services (CMS) instituted the Annual Payment Update (APU) program to collect these data from hospitals and report their rates on the measures on its Hospital Compare Web site. For hospital quality data to be useful to patients and other users, they need to be reliable, that is, accurate and complete.
GAO was asked to (1) describe the processes CMS uses to ensure the accuracy and completeness of data submitted for the APU program, (2) analyze the results of CMS's audit of the accuracy of data from the program's first two calendar quarters, and (3) describe processes used by seven other organizations that assess the accuracy and completeness of clinical performance data.

What GAO Recommends

GAO recommends that CMS take steps to improve its processes for ensuring the accuracy and completeness of hospital quality data. In commenting on a draft of this report, CMS agreed to implement steps to improve the quality and completeness of the data.

CMS has contracted with an independent medical auditing firm to assess the accuracy of the APU program data submitted by hospitals, but it has no ongoing process in place to assess the completeness of those data. CMS's independent audit checks accuracy by comparing, for a sample of five patient medical records per hospital per calendar quarter, the quality data submitted by the hospital with the quality data that the contractor has reabstracted from the same records. The data are deemed accurate if there is 80 percent or greater agreement between these two sets of results. For the payment updates for fiscal years 2005 and 2006, CMS compared the number of cases submitted by a hospital to the number of Medicare claims that hospital submitted. However, these analyses did not address non-Medicare patient records, and the approach that CMS took in these analyses was not capable of detecting incomplete data for all hospitals. Although GAO found a high overall baseline level of accuracy when it examined CMS's assessment of the data submitted for the first two quarters of the APU program, the results are statistically uncertain for up to one-third of hospitals, and a baseline level of data completeness cannot be determined.
The median accuracy score of 90 to 94 percent (depending on the calendar quarter and measures used) was well above the 80 percent accuracy threshold set by CMS, and about 90 percent of hospitals met or exceeded that threshold for both the first and the second calendar quarters of 2004. However, for approximately one-fourth to one-third of all the hospitals that CMS assessed for accuracy, the statistical margin of error for their accuracy score included both passing and failing accuracy levels. Consequently, for these hospitals, the small number of cases that CMS examined was not sufficient to establish with statistical certainty whether they met the accuracy threshold set by CMS. With respect to completeness of data, CMS did not assess the extent to which all hospitals submitted data on all eligible patients, or a representative sample thereof, for the two baseline quarters. As a result, there were no data from which to derive an assessment of the baseline level of completeness of the quality data that hospitals submitted for the APU program. Other reporting systems that collect clinical performance data have adopted a range of activities to ensure data accuracy and completeness, including some methods employed by all, such as checking the data electronically to identify missing data. Officials from some of the other reporting systems and an expert in the field stressed the importance of including an independent audit among the methods used to check data accuracy and completeness. Most of the other reporting systems incorporate three methods that CMS does not use in its independent audit: most include an on-site visit in their independent audit, focus their audits on a selected number of facilities, and review a minimum of 50 patient medical records during the audit.