Covariance Structures: Toeplitz, Unstructured and their Comparison in SAS - Prof. Maribeth, Study notes of Biostatistics

These notes cover the estimation of covariance structures using Toeplitz and unstructured models in SAS. The Toeplitz model assumes constant variance within years, while the unstructured model estimates all variances and covariances. The notes also cover the use of likelihood ratio tests (LRT) and the Akaike and Schwarz Bayesian criteria to determine the preferred model, with examples of simulations with missing data and their impact on model fit.

Typology: Study notes

Pre 2010

Uploaded on 08/04/2009

The Effect of Missing Data on Repeated Measures Models
Maribeth Johnson
Medical College of Georgia, Augusta, GA

Longitudinal studies problems
- Subjects don't return for every follow-up visit; there is always some amount of missing data
- PROC MIXED enables examination of correlational structures and variability changes between repeated measurements on experimental units across time
- MIXED handles unbalanced data when the data are missing at random
- When does the degree of sparseness jeopardize inferences and estimates?
- Simulation is a tool that can be used to answer these types of questions

Motivation
- Ongoing longitudinal study at MCG of children from families with a history of hypertension
- Ambulatory SBP
  - every 20 minutes from 6am to 10pm
  - every 30 minutes during the night
- Missing data due to
  - lack of consent
  - technical problems

Unbalanced data structure

Y_1  Y_2  Y_3  Y_4  Frequency  Percent
 1    2    .    .      29       31.5
 1    .    3    .      24       26.1
 1    .    .    4      14       15.2
 1    2    3    .      13       14.1
 1    2    .    4       6        6.5
 1    .    3    4       4        4.3
 1    2    3    4       2        2.2

- 92 children had at least 2 of 4 measurements
- The dataset is only 57% complete
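
The completeness figure can be recomputed from the pattern table. The sketch below is only an arithmetic check: the variable names are illustrative, and the counts are copied from the table.

```python
# Missing-data patterns from the table: (years observed, number of children).
patterns = [
    ((1, 2), 29),
    ((1, 3), 24),
    ((1, 4), 14),
    ((1, 2, 3), 13),
    ((1, 2, 4), 6),
    ((1, 3, 4), 4),
    ((1, 2, 3, 4), 2),
]

n_years = 4
n_children = sum(count for _, count in patterns)
n_observed = sum(len(years) * count for years, count in patterns)
n_possible = n_children * n_years

# Percentage of the possible child-by-year cells that were actually observed.
completeness = 100 * n_observed / n_possible

# Percentage of children who contribute exactly two measurements.
two_only = sum(count for years, count in patterns if len(years) == 2)
share_two_only = 100 * two_only / n_children

print(n_children, n_observed, n_possible)
print(round(completeness, 1), round(share_two_only, 1))
```

The same tally also gives the share of children with only two measurements, which is what makes this dataset so sparse.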

Motivation (cont.)
- Confusing results when MIXED was applied to this data, which might be due to the small sample size and/or the sparseness of the data
- Simulate and compare a complete dataset and one with the actual pattern of missing data
- Also use simulation to investigate sample sizes needed to make correct determinations of underlying V-C structure when data are missing

Objectives
- Simulate a set of correlated data
- Analyze the data and select the preferred model
- Systematically delete observations in a variety of patterns
- Analyze the data and select the preferred model
- Compare results

Simulation
- 4 variables with multivariate normal distribution
  - Mean±SD: 110±10 mmHg for each year
  - Correlations: measurements separated by 1 year r=0.70, by 2 years r=0.60, and by 3 years r=0.48
- Thus, the samples to be generated are of the form

\[
\mathbf{y} =
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix}
\sim N\left[
\begin{pmatrix} 110 \\ 110 \\ 110 \\ 110 \end{pmatrix},
\begin{pmatrix}
100 & 70 & 60 & 48 \\
70 & 100 & 70 & 60 \\
60 & 70 & 100 & 70 \\
48 & 60 & 70 & 100
\end{pmatrix}
\right]
\]
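
The entries of the V-C matrix follow from the stated SD and lag correlations, since each covariance is SD² times the correlation at that lag. A minimal sketch, with illustrative names (`sd`, `lag_corr`, `cov`) that are not from the slides:

```python
# Build the 4x4 V-C matrix from the common SD and the lag correlations.
sd = 10.0
lag_corr = {0: 1.0, 1: 0.70, 2: 0.60, 3: 0.48}  # correlation by years apart
n_years = 4

mean = [110.0] * n_years
cov = [
    [sd * sd * lag_corr[abs(i - j)] for j in range(n_years)]
    for i in range(n_years)
]

for row in cov:
    print([round(value, 1) for value in row])
```

Because the covariance depends only on the distance between years, the result is banded, which is the Toeplitz form.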

Simulation
- For vectors x and y such that
  - x ~ N[0, I] and
  - y = Bx + b, where B and b are constants, then
  - y ~ N[Bμx + b, BΣxB′] = N[b, BB′]
- One method of obtaining B is from a Cholesky decomposition of the desired V-C matrix BB′
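
The transformation can be sketched in plain Python as follows. This is an illustration, not the SAS macro: `cholesky` and `transform` are hypothetical helper names, and the V-C matrix is the one implied by SD 10 and correlations 0.70, 0.60 and 0.48. As a sanity check, the standard normal draws X1 to X4 from the first row of the simulated data listing are pushed through y = Bx + b.

```python
import math


def cholesky(a):
    """Return the lower-triangular B with B B' = a (a symmetric positive definite)."""
    n = len(a)
    b = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(b[i][k] * b[j][k] for k in range(j))
            if i == j:
                b[i][j] = math.sqrt(a[i][i] - s)
            else:
                b[i][j] = (a[i][j] - s) / b[j][j]
    return b


def transform(b_factor, shift, x):
    """Compute y = B x + b for one vector x of independent N(0, 1) draws."""
    n = len(x)
    return [shift[i] + sum(b_factor[i][k] * x[k] for k in range(n)) for i in range(n)]


vc = [
    [100.0, 70.0, 60.0, 48.0],
    [70.0, 100.0, 70.0, 60.0],
    [60.0, 70.0, 100.0, 70.0],
    [48.0, 60.0, 70.0, 100.0],
]
mean = [110.0] * 4

B = cholesky(vc)

# X1..X4 for observation 1 in the simulated data listing.
x_row1 = [0.31638, -0.67424, -0.64694, -0.01246]
sbp_row1 = transform(B, mean, x_row1)
print([round(value, 3) for value in sbp_row1])
```

Because B is lower triangular, SBP1 depends only on X1, SBP2 on X1 and X2, and so on, which is why the first simulated year is simply 110 + 10·X1.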

Simulation
- Each run of the SAS macro simulates 1000 groups of some number of subjects (n)
- Random number seed is reproducible but changes for each subject within each simulation
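
The macro itself is not shown, so the following is only a sketch of one way to get reproducible but distinct random streams. It assumes one seed per simulated group, with the generator advancing from subject to subject. `BASE_SEED`, `simulate_group` and the group size of 150 are illustrative choices, not values taken from the macro.

```python
import math
import random

MEAN = [110.0, 110.0, 110.0, 110.0]
VC = [
    [100.0, 70.0, 60.0, 48.0],
    [70.0, 100.0, 70.0, 60.0],
    [60.0, 70.0, 100.0, 70.0],
    [48.0, 60.0, 70.0, 100.0],
]


def cholesky(a):
    """Return the lower-triangular B with B B' = a."""
    n = len(a)
    b = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(b[i][k] * b[j][k] for k in range(j))
            b[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / b[j][j]
    return b


def simulate_group(n_subjects, b_factor, seed):
    """Simulate one group; the same seed always yields the same group."""
    rng = random.Random(seed)
    group = []
    for _ in range(n_subjects):
        # Fresh N(0, 1) draws for every subject, taken from the group's stream.
        x = [rng.gauss(0.0, 1.0) for _ in MEAN]
        y = [
            MEAN[i] + sum(b_factor[i][k] * x[k] for k in range(i + 1))
            for i in range(len(MEAN))
        ]
        group.append(y)
    return group


B = cholesky(VC)
BASE_SEED = 12345      # hypothetical; any fixed integer gives a reproducible run
N_SIMULATIONS = 1000
N_SUBJECTS = 150

groups = [simulate_group(N_SUBJECTS, B, BASE_SEED + s) for s in range(N_SIMULATIONS)]
print(len(groups), len(groups[0]), len(groups[0][0]))
```

Rerunning with the same `BASE_SEED` reproduces every group exactly, while different groups draw from different streams.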

Simulated Data

OBS   I        X1        X2        X3        X4     SBP1     SBP2     SBP3  SBP4
  1   1   0.31638  -0.67424  -0.64694  -0.01246  113.164  107.400  104.743  106.
  2   2   0.74053   1.69465   1.83787  -0.76201  117.405  127.286  133.904  121.
  3   3   0.03693   0.57211   0.44906   1.19068  110.369  114.344  115.596  122.
  4   4   0.07798   1.79719  -0.60319   0.75104  110.780  123.380  113.308  119.
  5   5   0.47511  -0.28478  -0.14342  -1.41015  114.751  111.292  110.734  100.
  6   6   0.21429  -0.90276  -1.29990  -1.23686  112.143  105.053   98.682   94.
  7   7   0.21891   0.83217   0.62192  -0.97202  112.189  117.475  118.913  109.
  8   8  -0.30969   0.02179  -1.33388   0.92109  106.903  107.988   98.926  109.
  9   9  -0.57588   0.17040   0.60636  -1.12974  104.241  107.186  111.441  102.
 10  10   0.00854  -0.71012   0.39754   1.68386  110.085  104.988  110.039  120.
 11  11  -0.47016  -0.85353  -0.45538  -0.30350  105.298  100.613  100.657  100.
 12  12  -0.08329   0.40870   0.82735  -0.28795  109.167  112.336  116.872  112.
 13  13   0.00831  -0.37762  -0.14332  -0.48013  110.083  107.361  107.570  104.
 14  14  -0.51914   1.02922   0.82259   0.16446  104.809  113.716  116.657  115.
 15  15   1.37881   2.02392  -1.34273   0.41042  123.788  134.105  116.845  121.

Analysis
- Results from the 1000 simulations are analyzed using PROC MIXED
- Use the REPEATED statement to model R, the V-C matrix of the vector of errors
- Separate analyses with models using three different V-C matrices
  - Compound symmetry (CS)
  - Toeplitz (TOEP)
  - Unstructured (UN)

Covariance Structures
- Compound symmetric (CS):
  - Most specific structure
  - Variance within years is constant
  - Common covariance between years
  - Two parameters estimated

\[
\begin{bmatrix}
\sigma^2 + \sigma_1 & \sigma_1 & \sigma_1 & \sigma_1 \\
\sigma_1 & \sigma^2 + \sigma_1 & \sigma_1 & \sigma_1 \\
\sigma_1 & \sigma_1 & \sigma^2 + \sigma_1 & \sigma_1 \\
\sigma_1 & \sigma_1 & \sigma_1 & \sigma^2 + \sigma_1
\end{bmatrix}
\]

Covariance Structures
- Toeplitz (TOEP):
  - Variance within years is constant
  - Estimates the 3 covariances of the banded structure
  - Four parameters estimated
  - The structure that was simulated

\[
\begin{bmatrix}
\sigma^2 & \sigma_1 & \sigma_2 & \sigma_3 \\
\sigma_1 & \sigma^2 & \sigma_1 & \sigma_2 \\
\sigma_2 & \sigma_1 & \sigma^2 & \sigma_1 \\
\sigma_3 & \sigma_2 & \sigma_1 & \sigma^2
\end{bmatrix}
\]

Covariance Structures
- Unstructured (UN):
  - Estimates of all four variances and six covariances
  - All of the correlations between years may be different
  - Most general structure

\[
\begin{bmatrix}
\sigma^2_{11} & \sigma_{21} & \sigma_{31} & \sigma_{41} \\
\sigma_{21} & \sigma^2_{22} & \sigma_{32} & \sigma_{42} \\
\sigma_{31} & \sigma_{32} & \sigma^2_{33} & \sigma_{43} \\
\sigma_{41} & \sigma_{42} & \sigma_{43} & \sigma^2_{44}
\end{bmatrix}
\]
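
The three structures are nested, and their parameter counts (2, 4 and 10 for four years) drive the later model comparisons. A small sketch with hypothetical helper names shows the patterns. Here `compound_symmetry` takes the total diagonal variance and the common covariance directly, which is a simplification of the parameterization PROC MIXED reports.

```python
def compound_symmetry(variance, covariance, t=4):
    """One common variance on the diagonal, one common covariance elsewhere."""
    return [[variance if i == j else covariance for j in range(t)] for i in range(t)]


def toeplitz(bands):
    """bands[0] is the variance, bands[k] the covariance for years k apart."""
    t = len(bands)
    return [[bands[abs(i - j)] for j in range(t)] for i in range(t)]


def unstructured(lower_rows):
    """Build a symmetric matrix from the rows of its lower triangle."""
    t = len(lower_rows)
    return [[lower_rows[max(i, j)][min(i, j)] for j in range(t)] for i in range(t)]


def n_params(structure, t=4):
    """Number of covariance parameters for t repeated measurements."""
    return {"CS": 2, "TOEP": t, "UN": t * (t + 1) // 2}[structure]


# The simulated structure is Toeplitz with these four band values.
simulated = toeplitz([100, 70, 60, 48])
for row in simulated:
    print(row)
print([n_params(name) for name in ("CS", "TOEP", "UN")])
```

CS is the special case of TOEP in which all three bands are equal, and TOEP is the special case of UN in which entries along each diagonal are equal.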

Analysis

  proc mixed data=allt;
    class year;
    model sbp=year;
    /* cov-structure stands for one of CS, TOEP or UN */
    repeated / type=cov-structure sub=i;
    make 'fitting' out=ftun&j;
    make 'CovParms' out=cvun&j;
  run;

- REPEATED statement models the covariance structures in R, the error variance-covariance matrix
- TYPE= option is what determines the V-C structure
- SUB=I option block diagonalizes R since subjects are considered independent

Analysis

  proc mixed data=allt;
    class year;
    model sbp=year;
    /* cov-structure stands for one of CS, TOEP or UN */
    repeated / type=cov-structure sub=i;
    make 'fitting' out=ftun&j;
    make 'CovParms' out=cvun&j;
  run;

- Model fit information from the 3 analyses is merged together to make model comparisons
- Information from all 1000 simulations is appended to one file
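
The slides do not spell out the exact decision rules, so the sketch below is an assumption about how a preferred model could be picked from the three fits. It uses a stepwise likelihood ratio test from CS up to UN at the 5% level, and AIC and BIC in "smaller is better" form computed from -2 log likelihood. The fit values and the effective sample size are made up for illustration. SAS may report these criteria on a different scale and with a different sample size in the BIC penalty.

```python
import math

N_PARAMS = {"CS": 2, "TOEP": 4, "UN": 10}


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability (closed form, valid for even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def prefer_by_lrt(neg2ll, alpha=0.05):
    """Move to a richer structure only if it fits significantly better."""
    preferred = "CS"
    for richer in ("TOEP", "UN"):
        stat = neg2ll[preferred] - neg2ll[richer]
        df = N_PARAMS[richer] - N_PARAMS[preferred]
        if stat > 0 and chi2_sf_even_df(stat, df) < alpha:
            preferred = richer
    return preferred


def prefer_by_ic(neg2ll, n_effective=None):
    """AIC when n_effective is None, otherwise BIC; smaller is better."""
    def score(model):
        q = N_PARAMS[model]
        penalty = 2 * q if n_effective is None else q * math.log(n_effective)
        return neg2ll[model] + penalty
    return min(N_PARAMS, key=score)


# Hypothetical -2 log likelihood values from one simulated dataset.
fit = {"CS": 4300.0, "TOEP": 4270.0, "UN": 4265.0}
lrt_choice = prefer_by_lrt(fit)
aic_choice = prefer_by_ic(fit)
bic_choice = prefer_by_ic(fit, n_effective=150)
print(lrt_choice, aic_choice, bic_choice)
```

Tallying these choices over the 1000 simulations gives percentage tables of the kind shown in the results slides. BIC's heavier penalty is one plausible reason it leans toward CS more often than AIC does.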

Create Missing Observations
- Systematic deletion of observations still produces a random sample
- Specific patterns are possible

Results: Simulation of specific pattern

Y_1  Y_2  Y_3  Y_4  Frequency  Percent
 1    2    .    .      29       31.5
 1    .    3    .      24       26.1
 1    .    .    4      14       15.2
 1    2    3    .      13       14.1
 1    2    .    4       6        6.5
 1    .    3    4       4        4.3
 1    2    3    4       2        2.2

- 92 children had at least 2 of 4 measurements
- The dataset is only 57% complete

Results: Simulation of specific pattern

1000 simulations - n=
Tests of preferred models (%)

Test  Model  Balanced  Specified deletions (N=676)
LRT   CS        3.6    62.
      TOEP     92.2    34.
      UN        4.2     3.
AIC   CS        1.1    43.
      TOEP     93.0    49.
      UN        5.9     7.
BIC   CS       18.4    86.
      TOEP     81.6    13.
      UN        0.0     0.

Results: Simulation of specific pattern

Y_1  Y_2  Y_3  Y_4  Frequency  Percent
 1    2    .    .      29       31.5
 1    .    3    .      24       26.1
 1    .    .    4      14       15.2
 1    2    3    .      13       14.1
 1    2    .    4       6        6.5
 1    .    3    4       4        4.3
 1    2    3    4       2        2.2

- 92 children had at least 2 of 4 measurements
- The dataset is only 57% complete
- 73% of the subjects have only 2 measurements

Optimal Sample Size Determination
- Three levels of missing data
  - 10% deletion
  - 20% deletion
  - 25% deletion
- Three scenarios
  - Even distribution
  - Clustered distribution
    - Crop failure
    - Lost to follow-up

Create Missing Observations

Even distribution scenario
- No observations were deleted from year 1
- Deletions evenly distributed across the last three years
- No subject had more than one missing observation

Create Missing Observations

Even distribution scenario example
- 600 observations for 150 subjects over 4 years
  - 10%: 60 observations are deleted, 20 each from year 2, 3, and 4
  - 20%: 120 observations are deleted, 40 each from year 2, 3, and 4
  - 25%: 150 observations are deleted, 50 each from year 2, 3, and 4 (i.e., each subject is missing one observation)
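
The even distribution rules can be sketched as a small deletion routine. This is an illustration under the stated rules (year 1 untouched, equal deletions in years 2 to 4, at most one missing value per subject). The function name, the seed, and the random choice of which subjects lose an observation are assumptions.

```python
import random


def delete_evenly(n_subjects, percent, seed=0, years=(1, 2, 3, 4)):
    """Return {subject: set of observed years} after an even-distribution deletion."""
    n_total = n_subjects * len(years)
    if (n_total * percent) % 100 != 0:
        raise ValueError("percent must give a whole number of deletions")
    n_delete = n_total * percent // 100
    later_years = years[1:]            # year 1 is never deleted
    per_year, remainder = divmod(n_delete, len(later_years))
    if remainder != 0:
        raise ValueError("deletions cannot be split evenly across years")
    if n_delete > n_subjects:
        raise ValueError("some subject would lose more than one observation")

    rng = random.Random(seed)
    # Each chosen subject loses exactly one observation.
    chosen = rng.sample(range(1, n_subjects + 1), n_delete)
    observed = {s: set(years) for s in range(1, n_subjects + 1)}
    for index, subject in enumerate(chosen):
        observed[subject].discard(later_years[index // per_year])
    return observed


for percent in (10, 20, 25):
    observed = delete_evenly(150, percent)
    missing_by_year = {
        year: sum(1 for seen in observed.values() if year not in seen)
        for year in (1, 2, 3, 4)
    }
    print(percent, missing_by_year)
```

At 25% the routine uses every subject once, which matches the note that each subject is missing one observation.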

Effect of Clustered Missing Data
- Crop failure scenario (CF2, CF3, CF4)
  - CF occurs when all of the missing data fall in the same year
  - Example: 600 observations on 150 subjects
    - 10%: all 60 records are deleted in year 2, or year 3, or year 4
    - 20%: 120 of the 150 observations are deleted in year 2, then year 3, and again for year 4
    - 25%: not possible, since this would delete all 150 observations in a single year
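
A crop failure deletion concentrates every missing value in one year. The sketch below uses hypothetical names and a random choice of affected subjects, which is an assumption. It also shows why the 25% level cannot be run: it would remove the whole year.

```python
import random


def delete_crop_failure(n_subjects, percent, failed_year, seed=0, years=(1, 2, 3, 4)):
    """Return {subject: set of observed years} with all deletions in failed_year."""
    n_delete = n_subjects * len(years) * percent // 100
    if n_delete >= n_subjects:
        # Nothing would remain in failed_year, so its variance and
        # covariances could not be estimated.
        raise ValueError("deletion level removes the entire year")
    rng = random.Random(seed)
    dropped = set(rng.sample(range(1, n_subjects + 1), n_delete))
    return {
        s: set(years) - ({failed_year} if s in dropped else set())
        for s in range(1, n_subjects + 1)
    }


cf2 = delete_crop_failure(150, 10, failed_year=2)
cf4 = delete_crop_failure(150, 20, failed_year=4)
print(sum(1 for seen in cf2.values() if 2 not in seen))
print(sum(1 for seen in cf4.values() if 4 not in seen))
```

Calling the function with `percent=25` raises a `ValueError`, mirroring the "not possible" entry for that level.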

Initial Study Sample Size Determination

1000 simulations - 10% deletion
Tests of preferred models (%)

Test  Model  n=150  n=
LRT   CS       0.8   0.
      TOEP    93.9  94.
      UN       5.3   5.
AIC   CS       0.0   0.
      TOEP    93.5  93.
      UN       6.5   6.
BIC   CS       8.7   4.
      TOEP    91.3  96.
      UN       0.0   0.

Initial Study Sample Size Determination

1000 simulations - 20% deletion
Tests of preferred models (%)

Test  Model  n=150  n=185  n=
LRT   CS       3.3    1.2   0.
      TOEP    90.4   93.3  94.
      UN       6.3    5.5   5.
AIC   CS       1.1    0.3   0.
      TOEP    91.4   92.5  93.
      UN       7.5    7.2   6.
BIC   CS      21.7   12.8   4.
      TOEP    78.3   87.2  95.
      UN       0.0    0.0   0.

Initial Study Sample Size Determination

1000 simulations - 25% deletion
Tests of preferred models (%)

Test  Model  n=150  n=225  n=
LRT   CS       5.2    0.9   0.
      TOEP    89.9   93.7  94.
      UN       4.9    5.4   4.
AIC   CS       2.2    0.1   0.
      TOEP    91.0   92.9  93.
      UN       6.8    7.0   6.
BIC   CS      27.4   10.5   6.
      TOEP    72.6   89.5  93.
      UN       0.0    0.0   0.

Clustered Missing Data Effect
- To determine the effect of missing data concentrated in certain years rather than evenly distributed
- CF2, CF3, CF4 and LFU were simulated at the optimal sample size determined in the prior section

Clustered Missing Data Effect

Optimal sample size for 10% deletion (n=185)
1000 simulations
Tests of preferred models (%)

Test  Model   CF2   CF3   CF4  LFU
LRT   CS      0.2   0.2   0.5   0.
      TOEP   94.0  94.7  93.9  94.
      UN      5.8   5.1   5.6   4.
AIC   CS      0.1   0.0   0.0   0.
      TOEP   93.1  93.4  93.3  93.
      UN      6.8   6.6   6.7   6.
BIC   CS      3.4   4.3   9.4   4.
      TOEP   96.6  95.7  90.6  95.
      UN      0.0   0.0   0.0   0.

Clustered Missing Data Effect

Optimal sample size for 20% deletion (n=225)
1000 simulations
Tests of preferred models (%)

Test  Model   CF2   CF3   CF4  LFU
LRT   CS      0.0   0.0   4.6   1.
      TOEP   94.2  95.0  90.2  93.
      UN      5.8   5.0   5.2   5.
AIC   CS      0.0   0.0   1.2   0.
      TOEP   92.8  93.5  92.6  93.
      UN      7.2   6.5   6.2   6.
BIC   CS      2.6   2.4  27.6   9.
      TOEP   97.4  97.6  72.4  90.
      UN      0.0   0.0   0.0   0.

Clustered Missing Data Effect

Optimal sample size for 25% deletion (n=250)
1000 simulations
Tests of preferred models (%)

Test  Model  LFU
LRT   CS      1.
      TOEP   93.
      UN      5.
AIC   CS      0.
      TOEP   93.
      UN      6.
SBC   CS     16.
      TOEP   83.
      UN      0.

Variance-Covariance Parameters

\[
\mathbf{y} =
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix}
\sim N\left[
\begin{pmatrix} 110 \\ 110 \\ 110 \\ 110 \end{pmatrix},
\begin{pmatrix}
100 & 70 & 60 & 48 \\
70 & 100 & 70 & 60 \\
60 & 70 & 100 & 70 \\
48 & 60 & 70 & 100
\end{pmatrix}
\right]
\]