Covariance Structures: Toeplitz, Unstructured and their Comparison in SAS - Prof. Maribeth, Study notes of Biostatistics

Medical College of Georgia (MCG)Biostatistics

Prof. Maribeth H. Johnson

Estimation of covariance structures using Toeplitz and unstructured models in SAS. The Toeplitz model assumes constant variance within years, while the unstructured model estimates all variances and covariances. The document also covers the use of likelihood ratio tests (LRT) and of Akaike's information criterion (AIC) and Schwarz's Bayesian criterion (BIC) to determine the preferred model, with simulation examples showing how missing data affect model selection.

Typology: Study notes

Pre 2010

Uploaded on 08/04/2009

koofers-user-7im-1 🇺🇸


The Effect of Missing Data on

Repeated Measures Models

Maribeth Johnson

Medical College of Georgia, Augusta, GA

Longitudinal studies problems



- Subjects don't return for every follow-up visit, always some amount of missing data
- PROC MIXED enables examination of correlational structures and variability changes between repeated measurements on experimental units across time
- MIXED handles unbalanced data when the data are missing at random
- When does the degree of sparseness jeopardize inferences and estimates?
- Simulation is a tool that can be used to answer these types of questions

Motivation



- Ongoing longitudinal study at MCG of children from families with a history of hypertension
- Ambulatory SBP
  - every 20 minutes from 6am to 10pm
  - every 30 minutes during the night
- Missing data due to
  - lack of consent
  - technical problems

Unbalanced data structure

Y_1  Y_2  Y_3  Y_4  Frequency  Percent
1    2    .    .           29     31.5
1    .    3    .           24     26.1
1    .    .    4           14     15.2
1    2    3    .           13     14.1
1    2    .    4            6      6.5
1    .    3    4            4      4.3
1    2    3    4            2      2.2

92 children had at least 2 of 4 measurements

The dataset is only 57% complete
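The 57% figure can be checked directly from the frequency table above. The following is a small Python sketch (not part of the original slides); the `patterns` list simply restates the table as (number of measurements, number of children) pairs.

```python
# Each entry: (measurements present in the pattern, number of children).
patterns = [(2, 29), (2, 24), (2, 14), (3, 13), (3, 6), (3, 4), (4, 2)]

children = sum(count for _, count in patterns)        # subjects in the table
observed = sum(k * count for k, count in patterns)    # measurements present
possible = 4 * children                               # 4 planned visits each

completeness = 100 * observed / possible
only_two = 100 * sum(c for k, c in patterns if k == 2) / children

print(children, observed, possible)
print(round(completeness, 1), round(only_two, 1))
```

With these counts, 211 of the 368 planned measurements are present, and 67 of the 92 children contribute only two measurements.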


Motivation (cont.)

- Confusing results when MIXED was applied to these data, which might be due to the small sample size and/or the sparseness of the data
- Simulate and compare a complete dataset and one with the actual pattern of missing data
- Also use simulation to investigate sample sizes needed to make correct determinations of underlying V-C structure when data are missing

Objectives

- Simulate a set of correlated data
- Analyze the data and select the preferred model
- Systematically delete observations in a variety of patterns
- Analyze the data and select the preferred model
- Compare results

Simulation

- 4 variables with multivariate normal distribution
  - Mean±SD 110±10 mmHg for each year
  - Correlations: measurements separated by 1 year r=0.70, by 2 years r=0.60, and by 3 years r=0.48
- Thus, the samples to be generated are of the form

\[
\mathbf{y} =
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix}
\sim N\left[
\begin{pmatrix} 110 \\ 110 \\ 110 \\ 110 \end{pmatrix},
\begin{pmatrix}
100 & 70 & 60 & 48 \\
70 & 100 & 70 & 60 \\
60 & 70 & 100 & 70 \\
48 & 60 & 70 & 100
\end{pmatrix}
\right]
\]

Simulation

- For vectors x and y such that
  - x ~ N[0, I] and
  - y = Bx + b, where B and b are constants, then
  - y ~ N[Bμx + b, BΣxB′] = N[b, BB′], since μx = 0 and Σx = I
- One method of obtaining B is from a Cholesky decomposition of the desired V-C matrix BB′
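The transformation y = Bx + b can be sketched in a few lines. This is a Python illustration using only the standard library, not the author's SAS macro. The helper names `cholesky` and `transform` are hypothetical, the target matrix is built from the stated SD of 10 and correlations 0.70, 0.60 and 0.48, and the vector `x` is copied from the first row of the Simulated Data listing that appears later in the slides.

```python
import math

def cholesky(sigma):
    """Return lower-triangular B such that B B' equals sigma."""
    n = len(sigma)
    B = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(B[i][k] * B[j][k] for k in range(j))
            if i == j:
                B[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                B[i][j] = (sigma[i][j] - s) / B[j][j]
    return B

def transform(B, b, x):
    """Apply y = B x + b to one vector x of independent N(0, 1) draws."""
    n = len(b)
    return [b[i] + sum(B[i][k] * x[k] for k in range(n)) for i in range(n)]

# Desired V-C matrix: variance 100, covariances 70, 60, 48 at lags 1, 2, 3.
SIGMA = [[100, 70, 60, 48],
         [70, 100, 70, 60],
         [60, 70, 100, 70],
         [48, 60, 70, 100]]
MEAN = [110.0] * 4

B = cholesky(SIGMA)

# X1..X4 from the first row of the Simulated Data listing.
x = [0.31638, -0.67424, -0.64694, -0.01246]
y = transform(B, MEAN, x)
print([round(v, 3) for v in y])
```

If the listing was produced this way, the printed values should agree with SBP1 to SBP3 in its first row to the precision shown there.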

Simulation

- Each run of the SAS macro simulates 1000 groups of some number of subjects (n)
- Random number seed is reproducible but changes for each subject within each simulation
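One way to get seeds that are reproducible yet different for every subject is to derive each seed from a fixed base value. The sketch below is in Python and is only an assumption about how such a scheme could look: the slides do not show the macro's seeding rule, and `simulate_standard_normals`, `base_seed` and the formula combining group and subject index are hypothetical.

```python
import random

def simulate_standard_normals(n_subjects, n_groups=1000, base_seed=2009, n_vars=4):
    """Return draws[g][s]: n_vars independent N(0, 1) values per subject."""
    draws = []
    for g in range(n_groups):
        group = []
        for s in range(n_subjects):
            # Hypothetical rule: a distinct, reproducible seed for every
            # (group, subject) pair.
            rng = random.Random(base_seed + g * n_subjects + s)
            group.append([rng.gauss(0.0, 1.0) for _ in range(n_vars)])
        draws.append(group)
    return draws

first = simulate_standard_normals(n_subjects=5, n_groups=3)
again = simulate_standard_normals(n_subjects=5, n_groups=3)
print(first == again)  # same base seed, so the two runs are identical
```

Each x vector produced this way would then be passed through y = Bx + b to obtain correlated SBP values.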

Simulated Data

OBS   I        X1        X2        X3        X4     SBP1     SBP2     SBP3  SBP4
  1   1   0.31638  -0.67424  -0.64694  -0.01246  113.164  107.400  104.743  106.
  2   2   0.74053   1.69465   1.83787  -0.76201  117.405  127.286  133.904  121.
  3   3   0.03693   0.57211   0.44906   1.19068  110.369  114.344  115.596  122.
  4   4   0.07798   1.79719  -0.60319   0.75104  110.780  123.380  113.308  119.
  5   5   0.47511  -0.28478  -0.14342  -1.41015  114.751  111.292  110.734  100.
  6   6   0.21429  -0.90276  -1.29990  -1.23686  112.143  105.053   98.682   94.
  7   7   0.21891   0.83217   0.62192  -0.97202  112.189  117.475  118.913  109.
  8   8  -0.30969   0.02179  -1.33388   0.92109  106.903  107.988   98.926  109.
  9   9  -0.57588   0.17040   0.60636  -1.12974  104.241  107.186  111.441  102.
 10  10   0.00854  -0.71012   0.39754   1.68386  110.085  104.988  110.039  120.
 11  11  -0.47016  -0.85353  -0.45538  -0.30350  105.298  100.613  100.657  100.
 12  12  -0.08329   0.40870   0.82735  -0.28795  109.167  112.336  116.872  112.
 13  13   0.00831  -0.37762  -0.14332  -0.48013  110.083  107.361  107.570  104.
 14  14  -0.51914   1.02922   0.82259   0.16446  104.809  113.716  116.657  115.
 15  15   1.37881   2.02392  -1.34273   0.41042  123.788  134.105  116.845  121.

Analysis

- Results from the 1000 simulations are analyzed using PROC MIXED
- Use the REPEATED statement to model R, the V-C matrix of the vector of errors
- Separate analyses with models using three different V-C matrices
  - Compound symmetry - CS
  - Toeplitz - TOEP
  - Unstructured - UN

Covariance Structures

- Compound symmetric (CS):
  - Most specific structure
  - Variance within years is constant
  - Common covariance between years
  - Two parameters estimated

\[
\begin{bmatrix}
\sigma^2 + \sigma_1 & \sigma_1 & \sigma_1 & \sigma_1 \\
\sigma_1 & \sigma^2 + \sigma_1 & \sigma_1 & \sigma_1 \\
\sigma_1 & \sigma_1 & \sigma^2 + \sigma_1 & \sigma_1 \\
\sigma_1 & \sigma_1 & \sigma_1 & \sigma^2 + \sigma_1
\end{bmatrix}
\]

Covariance Structures

- Toeplitz (TOEP):
  - Variance within years is constant
  - Estimates the 3 covariances of the banded structure
  - Four parameters estimated
  - The structure that was simulated

\[
\begin{bmatrix}
\sigma^2 & \sigma_1 & \sigma_2 & \sigma_3 \\
\sigma_1 & \sigma^2 & \sigma_1 & \sigma_2 \\
\sigma_2 & \sigma_1 & \sigma^2 & \sigma_1 \\
\sigma_3 & \sigma_2 & \sigma_1 & \sigma^2
\end{bmatrix}
\]

Covariance Structures

- Unstructured (UN):
  - Estimates of all four variances and six covariances
  - All of the correlations between years may be different
  - Most general structure

\[
\begin{bmatrix}
\sigma^2_{11} & \sigma_{21} & \sigma_{31} & \sigma_{41} \\
\sigma_{21} & \sigma^2_{22} & \sigma_{32} & \sigma_{42} \\
\sigma_{31} & \sigma_{32} & \sigma^2_{33} & \sigma_{43} \\
\sigma_{41} & \sigma_{42} & \sigma_{43} & \sigma^2_{44}
\end{bmatrix}
\]
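The parameter counts for the three structures (2, 4 and 10 for four time points) follow from how each matrix is built. The Python sketch below, with the hypothetical helpers `cs_matrix`, `toep_matrix` and `n_parameters`, illustrates this; the CS layout assumes the parameterization with σ² + σ₁ on the diagonal and σ₁ elsewhere.

```python
def cs_matrix(sigma2, sigma1, t=4):
    """Compound symmetry: sigma2 + sigma1 on the diagonal, sigma1 elsewhere."""
    return [[sigma1 + (sigma2 if i == j else 0) for j in range(t)]
            for i in range(t)]

def toep_matrix(bands):
    """Toeplitz: bands[0] is the variance, bands[k] the covariance at lag k."""
    t = len(bands)
    return [[bands[abs(i - j)] for j in range(t)] for i in range(t)]

def n_parameters(structure, t=4):
    """Number of covariance parameters for t repeated measurements."""
    return {"CS": 2, "TOEP": t, "UN": t * (t + 1) // 2}[structure]

# The simulated structure: variance 100, covariances 70, 60, 48.
toep = toep_matrix([100, 70, 60, 48])
for row in toep:
    print(row)
print([n_parameters(s) for s in ("CS", "TOEP", "UN")])
```

A CS matrix is the special case of a Toeplitz matrix in which all off-diagonal bands are equal, and a Toeplitz matrix is a special case of UN, which is why the three models are nested.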

Analysis

    proc mixed data=allt;
      class year;
      model sbp=year;
      repeated / type=cov-structure sub=i;  /* cov-structure: CS, TOEP, or UN */
      make 'fitting' out=ftun&j;            /* model fit information */
      make 'CovParms' out=cvun&j;           /* covariance parameter estimates */
    run;

- REPEATED statement models the covariance structures in R, the error variance-covariance matrix
- TYPE= option is what determines the V-C structure
- SUB=I option block diagonalizes R since subjects are considered independent

Analysis

    proc mixed data=allt;
      class year;
      model sbp=year;
      repeated / type=cov-structure sub=i;  /* cov-structure: CS, TOEP, or UN */
      make 'fitting' out=ftun&j;            /* model fit information */
      make 'CovParms' out=cvun&j;           /* covariance parameter estimates */
    run;

- Model fit information from the 3 analyses is merged together to make model comparisons
- Information from all 1000 simulations is appended to one file
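The comparison of the three fits can be sketched as follows. This Python illustration rests on several assumptions that the slides do not confirm: the -2 log-likelihood values are invented for the example, the LRT is applied stepwise (CS against TOEP, then TOEP against UN) at the 5% level, and the BIC penalty uses the number of subjects. The names `prefer_by_lrt` and `prefer_by_ic` are hypothetical.

```python
import math

N_PARMS = {"CS": 2, "TOEP": 4, "UN": 10}
# Upper 5% points of the chi-square distribution for 2 and 6 df.
CHI2_CRIT_05 = {2: 5.991, 6: 12.592}

def prefer_by_lrt(neg2ll):
    """Stepwise LRT on nested models; neg2ll maps structure to -2 log L."""
    preferred = "CS"
    if neg2ll["CS"] - neg2ll["TOEP"] > CHI2_CRIT_05[2]:
        preferred = "TOEP"
        if neg2ll["TOEP"] - neg2ll["UN"] > CHI2_CRIT_05[6]:
            preferred = "UN"
    return preferred

def prefer_by_ic(neg2ll, n_subjects):
    """Smaller-is-better AIC and BIC; returns the preferred structure for each."""
    aic = {m: v + 2 * N_PARMS[m] for m, v in neg2ll.items()}
    bic = {m: v + N_PARMS[m] * math.log(n_subjects) for m, v in neg2ll.items()}
    return {"AIC": min(aic, key=aic.get), "BIC": min(bic, key=bic.get)}

# Hypothetical fit statistics for one simulated dataset.
example = {"CS": 4300.0, "TOEP": 4280.0, "UN": 4275.0}
print(prefer_by_lrt(example), prefer_by_ic(example, n_subjects=150))
```

Repeating this for each of the 1000 simulated datasets and tabulating how often each structure is preferred gives percentages of the kind reported in the results tables.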

Create Missing Observations

- Systematic deletion of observations still produces a random sample
- Specific patterns are possible


Results: Simulation of specific pattern

1000 simulations - n=
Tests of preferred models (%)

Test  Structure  Balanced  Specified deletions (N=676)
LRT   CS              3.6  62.
      TOEP           92.2  34.
      UN              4.2  3.
AIC   CS              1.1  43.
      TOEP           93.0  49.
      UN              5.9  7.
BIC   CS             18.4  86.
      TOEP           81.6  13.
      UN              0.0  0.


73% of the subjects have only 2 measurements

Optimal Sample Size Determination

- Three levels of missing data
  - 10% deletion
  - 20% deletion
  - 25% deletion
- Three scenarios
  - Even distribution
  - Clustered distribution
    - Crop failure
    - Lost to follow-up

Create Missing Observations

- Even distribution scenario
  - No observations were deleted from year 1
  - Deletions evenly distributed across the last three years
  - No subject had more than one missing observation

Create Missing Observations

- Even distribution scenario example
  - 600 observations for 150 subjects over 4 years
  - 10%: 60 observations are deleted, 20 each from year 2, 3, and 4
  - 20%: 120 observations are deleted, 40 each from year 2, 3, and 4
  - 25%: 150 observations are deleted, 50 each from year 2, 3, and 4 (i.e., each subject is missing one observation)
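The even distribution rules (nothing deleted from year 1, equal numbers from years 2 to 4, at most one missing observation per subject) can be sketched in Python as below. The function `even_deletions` and its arguments are hypothetical, and choosing the affected subjects at random is an assumption; the slides do not say how subjects were picked.

```python
import random

def even_deletions(n_subjects, percent, seed=1, years=(2, 3, 4), n_years=4):
    """Return the set of (subject, year) observations to delete."""
    total = n_subjects * n_years * percent // 100
    per_year, remainder = divmod(total, len(years))
    if remainder or total > n_subjects:
        raise ValueError("cannot delete evenly with one deletion per subject")
    rng = random.Random(seed)
    # Distinct subjects, so nobody loses more than one observation.
    chosen = rng.sample(range(1, n_subjects + 1), total)
    return {(subject, years[k // per_year]) for k, subject in enumerate(chosen)}

for pct in (10, 20, 25):
    deleted = even_deletions(150, pct)
    per_year = {y: sum(1 for _, yr in deleted if yr == y) for y in (2, 3, 4)}
    print(pct, len(deleted), per_year)
```

For 150 subjects this gives 60, 120 and 150 deletions, split equally over years 2, 3 and 4, as in the example above.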

Effect of Clustered Missing Data

- Crop failure scenario (CF2, CF3, CF4)
  - CF occurs when all data are missing in the same year
  - Example: 600 observations on 150 subjects
    - 10%: all 60 records are deleted in year 2, or year 3, or year 4
    - 20%: 120 of the 150 observations are deleted in year 2, then year 3, and again for year 4
    - 25%: not possible
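A crop failure deletion concentrates every missing observation in a single year. The Python sketch below uses a hypothetical function `crop_failure_deletions`; random choice of the affected subjects is an assumption, as is the reason given for rejecting 25% (it would remove all 150 observations of that year), which the slides state only as "not possible".

```python
import random

def crop_failure_deletions(n_subjects, percent, year, seed=1, n_years=4):
    """Return the set of (subject, year) observations deleted from one year."""
    total = n_subjects * n_years * percent // 100
    if total >= n_subjects:
        # Assumed reason for "not possible": the whole year would be lost.
        raise ValueError("deletion would remove every observation in the year")
    rng = random.Random(seed)
    chosen = rng.sample(range(1, n_subjects + 1), total)
    return {(subject, year) for subject in chosen}

cf2 = crop_failure_deletions(150, 10, year=2)  # CF2 at 10% deletion
cf4 = crop_failure_deletions(150, 20, year=4)  # CF4 at 20% deletion
print(len(cf2), len(cf4))
```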

Initial Study Sample Size Determination

1000 simulations - 10% deletion
Tests of preferred models (%)

Test  Structure  n=150  n=
LRT   CS           0.8  0.
      TOEP        93.9  94.
      UN           5.3  5.
AIC   CS           0.0  0.
      TOEP        93.5  93.
      UN           6.5  6.
BIC   CS           8.7  4.
      TOEP        91.3  96.
      UN           0.0  0.

Initial Study Sample Size Determination

1000 simulations - 20% deletion
Tests of preferred models (%)

Test  Structure  n=150  n=185  n=
LRT   CS           3.3    1.2  0.
      TOEP        90.4   93.3  94.
      UN           6.3    5.5  5.
AIC   CS           1.1    0.3  0.
      TOEP        91.4   92.5  93.
      UN           7.5    7.2  6.
BIC   CS          21.7   12.8  4.
      TOEP        78.3   87.2  95.
      UN           0.0    0.0  0.

Initial Study Sample Size Determination

1000 simulations - 25% deletion
Tests of preferred models (%)

Test  Structure  n=150  n=225  n=
LRT   CS           5.2    0.9  0.
      TOEP        89.9   93.7  94.
      UN           4.9    5.4  4.
AIC   CS           2.2    0.1  0.
      TOEP        91.0   92.9  93.
      UN           6.8    7.0  6.
BIC   CS          27.4   10.5  6.
      TOEP        72.6   89.5  93.
      UN           0.0    0.0  0.

Clustered Missing Data Effect

- To determine the effect of missing data concentrated in certain years rather than evenly distributed
- CF2, CF3, CF4 and LFU were simulated at the optimal sample size determined in the prior section

Clustered Missing Data Effect

Optimal sample size for 10% deletion (n=185)
1000 simulations
Tests of preferred models (%)

Test  Structure   CF2   CF3   CF4  LFU
LRT   CS          0.2   0.2   0.5  0.
      TOEP       94.0  94.7  93.9  94.
      UN          5.8   5.1   5.6  4.
AIC   CS          0.1   0.0   0.0  0.
      TOEP       93.1  93.4  93.3  93.
      UN          6.8   6.6   6.7  6.
BIC   CS          3.4   4.3   9.4  4.
      TOEP       96.6  95.7  90.6  95.
      UN          0.0   0.0   0.0  0.

Clustered Missing Data Effect

Optimal sample size for 20% deletion (n=225)
1000 simulations
Tests of preferred models (%)

Test  Structure   CF2   CF3   CF4  LFU
LRT   CS          0.0   0.0   4.6  1.
      TOEP       94.2  95.0  90.2  93.
      UN          5.8   5.0   5.2  5.
AIC   CS          0.0   0.0   1.2  0.
      TOEP       92.8  93.5  92.6  93.
      UN          7.2   6.5   6.2  6.
BIC   CS          2.6   2.4  27.6  9.
      TOEP       97.4  97.6  72.4  90.
      UN          0.0   0.0   0.0  0.

Clustered Missing Data Effect

Optimal sample size for 25% deletion (n=250)
1000 simulations
Tests of preferred models (%)

Test  Structure  LFU
LRT   CS         1.
      TOEP       93.
      UN         5.
AIC   CS         0.
      TOEP       93.
      UN         6.
SBC   CS         16.
      TOEP       83.
      UN         0.

Variance-Covariance Parameters

\[
\mathbf{y} =
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix}
\sim N\left[
\begin{pmatrix} 110 \\ 110 \\ 110 \\ 110 \end{pmatrix},
\begin{pmatrix}
100 & 70 & 60 & 48 \\
70 & 100 & 70 & 60 \\
60 & 70 & 100 & 70 \\
48 & 60 & 70 & 100
\end{pmatrix}
\right]
\]
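The entries of this matrix follow from the simulation settings stated earlier (SD of 10 mmHg in every year; correlations 0.70, 0.60 and 0.48 at lags of 1, 2 and 3 years). As a short worked check, writing \(r_k\) for the correlation at lag \(k\) (notation introduced here for illustration):

```latex
\[
\sigma^2 = 10^2 = 100, \qquad
\sigma_k = r_k\,\sigma^2 :\quad
\sigma_1 = 0.70 \times 100 = 70,\quad
\sigma_2 = 0.60 \times 100 = 60,\quad
\sigma_3 = 0.48 \times 100 = 48 .
\]
```

These are the four Toeplitz parameters: one variance and three band covariances.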
