Assessing the Reliability of Hospital Performance Measures: A Test-Retest Approach

This document examines the reliability of hospital performance measures using the intra-class correlation coefficient (ICC) and testing methods such as split-sample analysis and bootstrapping. The analyses are based on Medicare FFS datasets and focus on risk-standardized hospital visit rates and risk-standardized readmission rates. The document also discusses the treatment of smaller-volume hospitals and the effect of hierarchical logistic regression models on test-retest reliability.

Split-Half Reliability Method Examples

Example 1

We tested the reliability of the facility measure score by calculating the intra-class correlation coefficient (ICC) of the measure score. To calculate the ICC, we used the Medicare FFS FY 2012-2015 Dataset. For ASCs with two or more urology procedures, these procedures were randomly split into two samples (2 years of combined data for each sample). The ICC evaluates the agreement between the risk-standardized hospital visit rates (RSHVRs) calculated in the two randomly selected samples.

The ICC(2,1) score of 0.45, calculated for two years of data, indicates moderate measure score reliability.

Example 2

We tested the reliability of the facility measure score by calculating the intra-class correlation coefficient (ICC) of the measure score. To calculate the ICC, we used the Medicare FFS CYs 2012-2015 Dataset. For ASCs with two or more general surgery procedures, these procedures were randomly split into two samples within each facility. ASCs with only one procedure were randomly assigned to one of the two samples. The ICC evaluated the agreement between the risk-standardized hospital visit rates (RSHVRs) calculated in the two randomly selected samples [1].

The ICC(2,1) score of 0.530, calculated for four years of data, indicates moderate measure score reliability.

Example 3

We defined reliability as described by Lord and Novick using split-sample methodology. (Lord FM, Novick MR. Statistical Theories of Mental Test Scores. Reading, MA: Addison-Wesley; 1968)

Using split-sample methodology, FTR had a split-half sample correlation estimate of 0.32, with the upper bound on validity (provided by the square root of the Spearman-Brown reliability correction) being 0.56.
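For reference, the validity bound invoked here is the standard psychometric relation, stated generically rather than as a recomputation of the figures above: a measure's correlation with any external criterion is bounded above by the square root of its reliability,

\text{validity} \le \sqrt{\rho_{xx}}.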

Example 4

The reliability of a measurement is the degree to which repeated measurements of the same entity agree with each other. For measures of hospital performance, the measured entity is naturally the hospital, and reliability is the extent to which repeated measurements of the same hospital give similar results. In line with this thinking, our approach to assessing reliability is to consider the extent to which assessments of a hospital using different but randomly selected subsets of patients produce similar measures of hospital performance. That is, we take a "test-retest" approach in which hospital performance is measured once using a random subset of patients, then measured again using a second random subset exclusive of the first, and finally the agreement between the two resulting performance measures is compared across hospitals (Rousson et al., 2002).

For test-retest reliability, we combined index admissions from successive measurement periods into one dataset, randomly sampled half of the patients within each hospital, calculated the measure for each hospital, and repeated the calculation using the second half. Thus, each hospital is measured twice, but each measurement is made using an entirely distinct set of patients. To the extent that the calculated measures of these two subsets agree, we have evidence that the measure is assessing an attribute of the hospital, not of the patients. As a metric of agreement we calculated the intra-class correlation coefficient (ICC) (Shrout and Fleiss, 1979) and assessed the values according to conventional standards (Landis and Koch, 1977). Specifically, we split Dataset 1 into two samples and calculated the RSRR for each hospital in each sample. The agreement of the two RSRRs across hospitals was quantified using the intra-class correlation ICC(2,1) as defined by Shrout and Fleiss (1979).
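As a concrete illustration, here is a minimal sketch of this split-sample procedure in Python. The DataFrame df and its columns hospital_id and outcome are hypothetical, and a raw per-hospital rate stands in for the risk-standardized rates that the source derives from a hierarchical model; the ICC(2,1) computation follows Shrout and Fleiss (1979).

```python
# Minimal sketch of the split-sample test-retest procedure described above.
# Assumed inputs (not from the source): a pandas DataFrame `df` with
# patient-level columns "hospital_id" and "outcome" (1 = event, 0 = none).
import numpy as np
import pandas as pd

def split_sample_icc21(df: pd.DataFrame, seed: int = 0) -> float:
    rng = np.random.default_rng(seed)

    # Randomly split patients in half *within* each hospital.
    half_a, half_b = [], []
    for _, group in df.groupby("hospital_id"):
        idx = rng.permutation(len(group))
        half_a.append(group.iloc[idx[: len(group) // 2]])
        half_b.append(group.iloc[idx[len(group) // 2:]])

    # Measure each hospital twice, once per disjoint half. A raw rate is a
    # placeholder for the hierarchical-model risk-standardized rate.
    rate_a = pd.concat(half_a).groupby("hospital_id")["outcome"].mean()
    rate_b = pd.concat(half_b).groupby("hospital_id")["outcome"].mean()
    paired = pd.concat([rate_a, rate_b], axis=1, keys=["a", "b"]).dropna()

    # ICC(2,1): two-way random effects, absolute agreement (Shrout & Fleiss).
    x = paired.to_numpy()
    n, k = x.shape                      # n hospitals, k = 2 measurements
    row_mean, col_mean, grand = x.mean(axis=1), x.mean(axis=0), x.mean()
    msr = k * ((row_mean - grand) ** 2).sum() / (n - 1)  # between hospitals
    msc = n * ((col_mean - grand) ** 2).sum() / (k - 1)  # between samples
    sse = ((x - row_mean[:, None] - col_mean[None, :] + grand) ** 2).sum()
    mse = sse / ((n - 1) * (k - 1))                      # residual
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

In the source's actual analysis, each half would first be run through the full risk-standardization model before the ICC is computed.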

Using two independent samples provides a stringent estimate of the measure's reliability, compared with using two random but potentially overlapping samples, which would exaggerate the agreement. Moreover, because our final measure is derived using hierarchical logistic regression, and a known property of hierarchical logistic regression models is that smaller-volume hospitals contribute less 'signal', a split sample using a single measurement period would introduce extra noise. This leads to an underestimate of the actual test-retest reliability that would be achieved if the measure were reported using the full measurement period, as evidenced by the Spearman-Brown prophecy formula (Spearman 1910; Brown 1910), which estimates the reliability of the measure if the whole cohort were used, based on an estimate from half the cohort.
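For reference, the Spearman-Brown prophecy formula cited here is the standard one: a reliability estimate \rho_{1/2} from half the cohort projects to a full-cohort reliability of

\rho_{\text{full}} = \frac{2\,\rho_{1/2}}{1 + \rho_{1/2}}.

As an illustration (our arithmetic, not a figure reported by the source), the split-half ICC of 0.55 reported below would project to 2(0.55)/(1.55) ≈ 0.71 for the full three-year sample.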

There were 991,007 admissions in the combined 3-year sample, with 494,297 in one sample and 496,710 in the other randomly selected sample. The agreement between the two RSRRs for each hospital was 0.55, which according to the conventional interpretation is "moderate" (Landis & Koch, 1977). Note that this analysis was limited to hospitals with 12 or more cases in each split sample. The intra-class correlation coefficient is based on a split sample of three years of data, resulting in a volume of patients in each sample equivalent to only 1.5 years of data, whereas the measure is reported with the full three years of data. The correlation coefficient is expected to be higher using the full three-year sample since it would include more patients.

Example 5

To test the reliability of facility-level risk-standardized readmission rates (RSRRs), we calculated the intra-class correlation coefficient (ICC) using a test-retest approach that examines the agreement between repeated measures of the same IPF for the same time period. The randomly sampled sets of admissions from a given hospital are assumed to reflect independent re-measurements of the readmission rate for that hospital. Good reliability is indicated if the risk-standardized measure rates calculated from the random datasets for the same IPF are similar. Higher ICC values indicate stronger agreement and, hence, better measure reliability.

We used two test-retest approaches to generate independent samples of patients within the same IPF: a split-half sampling design and bootstrapping. For split-half sampling, we randomly sampled half of all admissions within each IPF and calculated the measure separately for each half. To the extent that the calculated measures of these two subsets agree, we have evidence that the measure is assessing an attribute of the hospital, not of the patients. As a metric of agreement, we calculated the Pearson correlation between the performance rate estimates: the higher the correlation, the higher the reliability of the measure. In order to produce estimates that are as stable as possible, we repeated this approach 1,000 times, a technique known as bootstrapping [1].
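A minimal sketch of this repeated split-half correlation, under the same assumptions as the earlier sketch (a hypothetical DataFrame df with hospital_id and outcome columns, and a raw rate standing in for the risk-standardized estimate):

```python
# Minimal sketch of the repeated split-half ("bootstrap") correlation.
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

def repeated_split_half(df: pd.DataFrame, n_reps: int = 1000,
                        min_patients: int = 60):
    # Keep hospitals with at least 60 patients (30 per half), mirroring the
    # reporting cutoff described in the next paragraph.
    counts = df.groupby("hospital_id")["outcome"].size()
    df = df[df["hospital_id"].isin(counts[counts >= min_patients].index)]

    corrs = []
    for rep in range(n_reps):
        rng = np.random.default_rng(rep)
        # Random within-hospital assignment of each patient to half A or B.
        in_a = df.groupby("hospital_id")["outcome"].transform(
            lambda g: rng.permutation(len(g)) < len(g) // 2
        ).astype(bool)
        rate_a = df[in_a].groupby("hospital_id")["outcome"].mean()
        rate_b = df[~in_a].groupby("hospital_id")["outcome"].mean()
        r, _ = pearsonr(rate_a, rate_b.loc[rate_a.index])
        corrs.append(r)

    corrs = np.asarray(corrs)
    # Point estimate and a 95% interval across the repeated splits.
    return corrs.mean(), np.percentile(corrs, [2.5, 97.5])
```

Summarizing the 1,000 correlations by their mean and a 95% interval mirrors how the results below report a point estimate with a 95% confidence interval.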

Because we expect hospitals with relatively few cases to have less reliable estimates, we only included scores for hospitals with at least 60 patients in the reliability calculation (i.e., with 30 patients in each of the split samples). This approach is consistent with a reporting strategy that includes smaller hospitals in the measure calculation, but does not publicly release the measure score for smaller hospitals (i.e., labels them in public reporting as having “too few cases” to support a reliable estimate). We note that the minimum sample size for public reporting is a policy choice that balances competing considerations such as the reliability of the measure score and transparency for consumers, and that the cutoff used for this analysis is one of many that might be reasonably used.

In addition, we conducted a second analysis of measure reliability using the intra-class correlation coefficient (ICC) signal-to-noise method to determine a recommended minimum case count to maintain a moderate level of reliability. The ICC is estimated from the random effects model that produces the risk-standardized hospital visit rates, as ICC = V / (V + σ²), where V is the between-hospital variance and σ² is the sampling variance of the estimated provider-level results. Because π²/3 is the sampling variance of the logistic distribution, the ICC of the measure, which is based on a logit model, is ICC = V / (V + π²/3).

We used the intercept variance from the hierarchical logit models used to estimate the measure (0. for inpatient admission, and 0.1108 for ED visits) as the estimate of the between variance. The ICC can be used to calculate the reliability (R) of individual hospitals using the formula R = N / (N + (1 − ICC)/ICC) [1]. The case size required for a given R is N = R(1 − ICC) / (ICC(1 − R)). We looked for the N required to maintain a reliability level of 0.4 or higher.
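A minimal sketch of this signal-to-noise calculation, using only the formulas and the 0.1108 ED-visit intercept variance stated above (the inpatient variance is truncated in the source, so it is not reproduced here):

```python
# Minimum case count for a target reliability, per the formulas above.
import math

def icc_from_variance(v_between: float) -> float:
    # Logit-model sampling variance is pi^2 / 3.
    return v_between / (v_between + math.pi ** 2 / 3)

def min_cases(icc: float, r_target: float = 0.4) -> int:
    # Smallest N with R = N / (N + (1 - ICC)/ICC) >= r_target.
    return math.ceil(r_target * (1 - icc) / (icc * (1 - r_target)))

# The ED-visit intercept variance reported in the text (0.1108) yields an
# ICC of ~0.033 and a minimum of 20 cases, matching the result below.
print(min_cases(icc_from_variance(0.1108)))  # -> 20
```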

There were 942 hospitals with ≥60 patients in their cohorts in the full sample. This sample was randomly split 1,000 times and the Pearson correlation was calculated each time. For the inpatient admission measure, on average, the agreement between the two hospital visit rates for each hospital was 0. (95% confidence interval (CI) = 0.37-0.45), which according to conventional interpretation is "moderate." For the ED visit measure, on average, the agreement between the two hospital visit rates for each hospital was 0.270 (95% CI = 0.22-0.33), which according to conventional interpretation is "moderate."

In addition, we found that to achieve a reliability (ICC) of 0.4, only 25 patients are required for the inpatient admission rate and 20 patients for the ED visit rate per performance period.

Citations

  1. Rousson V, Gasser T, Seifert B. Assessing intrarater, interrater and test–retest reliability of continuous measurements. Statistics in Medicine 2002; 21:3431-3446.