biostatistics cheat sheet, Cheat Sheet of Biostatistics

Complete and schematic biostatistics cheat sheet

Typology: Cheat Sheet

2018/2019

Uploaded on 09/02/2019

ekambar

Population: the entire collection of objects or individuals about which information is desired.
➔ It is easier to take a sample.
◆ Sample: the part of the population that is selected for analysis.
◆ Watch out for:
● a limited sample size that might not be representative of the population.
◆ Simple Random Sampling: every possible sample of a certain size has the same chance of being selected.
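A minimal Python sketch of simple random sampling, assuming a hypothetical list of 100 patient IDs. `random.Random.sample` draws without replacement, so every subset of size k is equally likely; the seed only makes the draw reproducible.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k items without replacement; every size-k subset is equally likely."""
    rng = random.Random(seed)  # seeded generator so the draw is reproducible
    return rng.sample(population, k)

patients = [f"patient_{i}" for i in range(1, 101)]  # hypothetical population of 100
sample = simple_random_sample(patients, 10, seed=42)
print(sample)
```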

Observational Study: there can always be lurking variables affecting the results.
◆ e.g., a strong positive association between shoe size and intelligence for boys (age is a likely lurking variable).
◆ **Should never be used to show causation.**

Experimental Study: lurking variables can be controlled, so it can give good evidence for causation.

Descriptive Statistics Part I

Summary Measures

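Mean: the arithmetic average of the data values, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
◆ **Highly susceptible to extreme values (outliers):** it is pulled toward extreme values.
◆ The mean can never be larger than the max or smaller than the min value, but it can equal the max/min value (when all values are equal).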

Median: in an ordered array, the median is the middle number (the average of the two middle numbers when n is even).
◆ **Not affected by extreme values.**
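A small Python sketch, with hypothetical income values, of how one extreme value drags the mean but barely moves the median:

```python
from statistics import mean, median

incomes = [32, 35, 38, 41, 44]           # hypothetical values, in $1000s
with_outlier = incomes + [900]           # one extreme value added

print(mean(incomes), median(incomes))            # mean 38, median 38
print(mean(with_outlier), median(with_outlier))  # mean about 181.7, median 39.5
```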

Quartiles: split the ranked data into 4 equal groups.
◆ Box and Whisker Plot

➔ Range = $X_{\max} - X_{\min}$

◆ Disadvantages: Ignores the way in which data are distributed; sensitive to outliers

Interquartile Range (IQR) = 3rd quartile − 1st quartile
◆ Not used that much
◆ Not affected by outliers
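A Python sketch comparing range and IQR on hypothetical data with and without one outlier. It uses `statistics.quantiles` with its default "exclusive" method; textbooks and other software use slightly different quartile conventions, so hand-computed Q1/Q3 may differ a little.

```python
from statistics import quantiles

def data_range(xs):
    return max(xs) - min(xs)

def iqr(xs):
    # quantiles(n=4) returns the three cut points [Q1, Q2, Q3]
    q1, _, q3 = quantiles(xs, n=4)
    return q3 - q1

base = list(range(1, 11))     # 1, 2, ..., 10
with_outlier = base + [100]

print(data_range(base), data_range(with_outlier))  # 9 99
print(iqr(base), iqr(with_outlier))                # 5.5 6.0
```

The range jumps from 9 to 99, while the IQR only moves from 5.5 to 6.0.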

Variance: the average squared distance from the mean

$s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$

◆ Squaring gets rid of the negative deviations ◆ units are squared

Standard Deviation: shows variation about the mean

$s_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$

◆ highly affected by outliers ◆ has the same units as the original data ◆ in finance, a horrible measure of risk (trampoline example)
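The two formulas above written out directly in Python (hypothetical data), checked against the standard-library `statistics.variance` and `statistics.stdev`, which also divide by n − 1:

```python
import math
from statistics import stdev, variance

def sample_variance(xs):
    n = len(xs)
    xbar = sum(xs) / n
    # divide by n - 1 (not n): the sample variance s_x^2
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def sample_sd(xs):
    return math.sqrt(sample_variance(xs))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance(data), sample_sd(data))  # about 4.571 and 2.138
print(variance(data), stdev(data))             # standard-library equivalents
```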

Descriptive Statistics Part II

Linear Transformations

➔ Linear transformations change the center and spread of data

➔ $Var(a + bX) = b^2\,Var(X)$

➔ $Average(a + bX) = a + b\,[Average(X)]$

Effects of Linear Transformations:
◆ $mean_{new} = a + b \cdot mean$
◆ $median_{new} = a + b \cdot median$
◆ $stdev_{new} = |b| \cdot stdev$
◆ $IQR_{new} = |b| \cdot IQR$

Z score: the new data set will have mean 0 and variance 1
$z = \frac{X - \bar{X}}{S_X}$
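A Python check of these rules using a Celsius to Fahrenheit conversion ($a = 32$, $b = 1.8$) on hypothetical temperatures, followed by z-scores:

```python
import math
from statistics import mean, median, stdev

celsius = [36.1, 36.6, 37.0, 37.4, 38.2]        # hypothetical body temperatures
a, b = 32.0, 1.8                                 # F = a + b * C
fahrenheit = [a + b * c for c in celsius]

# center shifts and scales: mean_new = a + b*mean, median_new = a + b*median
assert math.isclose(mean(fahrenheit), a + b * mean(celsius))
assert math.isclose(median(fahrenheit), a + b * median(celsius))
# spread only scales: stdev_new = |b| * stdev (the shift a drops out)
assert math.isclose(stdev(fahrenheit), abs(b) * stdev(celsius))

# z-scores: subtract the mean, divide by the standard deviation
m, s = mean(celsius), stdev(celsius)
z = [(x - m) / s for x in celsius]
print([round(v, 3) for v in z])
```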

Empirical Rule: only for mound-shaped data. Approx. 95% of data is in the interval $(\bar{x} - 2s_x,\ \bar{x} + 2s_x) = \bar{x} \pm 2s_x$
◆ only use if you just have the mean and std. dev.

Chebyshev's Rule: use for any set of data and for any number k greater than 1 (1.2, 1.3, etc.): at least $1 - \frac{1}{k^2}$ of the data lies within k standard deviations of the mean.
➔ (Ex) for k = 2 (2 standard deviations), at least 75% of the data falls within 2 standard deviations
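A Python sketch comparing Chebyshev's guaranteed minimum with the actual fraction within k standard deviations, on hypothetical skewed data (where the empirical rule should not be used):

```python
from statistics import mean, stdev

def chebyshev_lower_bound(k):
    """Minimum fraction of ANY data set within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's rule needs k > 1")
    return 1 - 1 / k ** 2

def fraction_within(xs, k):
    m, s = mean(xs), stdev(xs)
    return sum(abs(x - m) <= k * s for x in xs) / len(xs)

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]   # hypothetical, skewed by one large value
print(chebyshev_lower_bound(2))          # 0.75
print(fraction_within(data, 2))          # actual fraction, never below the bound
```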

Detecting Outliers
◆ Classic Outlier Detection (doesn't always work): flag X if $|z| = \left|\frac{X - \bar{X}}{S_X}\right| \ge 2$
➔ The Boxplot Rule: value X is an outlier if $X < Q_1 - 1.5(Q_3 - Q_1)$ or $X > Q_3 + 1.5(Q_3 - Q_1)$
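Both rules as a Python sketch on hypothetical data. One reason the z rule "doesn't always work": the outlier inflates $S_X$ itself, and in a sample of 5 or fewer values no point can reach $|z| \ge 2$ at all. Quartiles come from `statistics.quantiles` (default "exclusive" method), which may differ slightly from a hand calculation.

```python
from statistics import mean, quantiles, stdev

def z_outliers(xs, cutoff=2.0):
    m, s = mean(xs), stdev(xs)
    return [x for x in xs if abs((x - m) / s) >= cutoff]

def boxplot_outliers(xs):
    q1, _, q3 = quantiles(xs, n=4)
    spread = q3 - q1                       # the IQR
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [x for x in xs if x < low or x > high]

data = list(range(1, 11)) + [100]
print(z_outliers(data))        # [100]
print(boxplot_outliers(data))  # [100]
```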

Skewness: measures the degree of asymmetry exhibited by data
◆ negative values = skewed left
◆ positive values = skewed right
◆ if |skewness| < 0.8, you don't need to transform the data
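The sheet does not give a skewness formula, so this Python sketch assumes one common sample version (the adjusted Fisher-Pearson coefficient, $\frac{n}{(n-1)(n-2)}\sum\left(\frac{x_i-\bar{x}}{s}\right)^3$); other software may use a slightly different formula. The data are hypothetical.

```python
from statistics import mean, stdev

def skewness(xs):
    n = len(xs)
    m, s = mean(xs), stdev(xs)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in xs)

symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 1, 1, 2, 10]
for name, xs in [("symmetric", symmetric), ("right_tail", right_tail)]:
    g = skewness(xs)
    print(name, round(g, 3), "transform?", abs(g) >= 0.8)
```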

Measurements of Association

Covariance
◆ Covariance > 0: larger x, larger y
◆ Covariance < 0: larger x, smaller y
◆ $s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$
◆ Units = units of x × units of y
◆ Only the sign of the covariance (+, −, or 0) is interpretable; its size can be any number

Correlation: measures the strength of a linear relationship between two variables

$r_{xy} = \frac{s_{xy}}{s_x\,s_y}$ (covariance divided by the product of the two standard deviations)

◆ Correlation is between −1 and 1
◆ Sign: direction of the relationship
◆ Absolute value: strength of the relationship (−0.6 is a stronger relationship than +0.4)

◆ Correlation doesn't imply causation
◆ The correlation of a variable with itself is 1
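The covariance and correlation formulas as a Python sketch; the dose/response numbers are hypothetical:

```python
import math
from statistics import mean, stdev

def covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    return covariance(xs, ys) / (stdev(xs) * stdev(ys))

dose_mg = [1, 2, 3, 4, 5]            # hypothetical data
response = [2, 4, 6, 8, 10]
print(covariance(dose_mg, response))   # 5.0, in mg x response units
print(correlation(dose_mg, response))  # about 1.0 (perfect positive line)
print(correlation(dose_mg, dose_mg))   # about 1.0: a variable with itself
```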

Combining Data Sets (for $Z = aX + bY$)
◆ $Mean(Z) = \bar{Z} = a\bar{X} + b\bar{Y}$
◆ $Var(Z) = s_Z^2 = a^2\,Var(X) + b^2\,Var(Y) + 2ab\,Cov(X, Y)$
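A Python check, on hypothetical paired data, that the shortcut formulas match the mean and variance computed directly on $Z = aX + bY$:

```python
import math
from statistics import mean, variance

def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

x = [3.0, 5.0, 4.0, 6.0, 7.0]     # hypothetical paired measurements
y = [10.0, 8.0, 9.0, 6.0, 7.0]
a, b = 2.0, 0.5
z = [a * xi + b * yi for xi, yi in zip(x, y)]

# the shortcut formulas give the same answers as computing directly on z
assert math.isclose(mean(z), a * mean(x) + b * mean(y))
assert math.isclose(variance(z),
                    a**2 * variance(x) + b**2 * variance(y) + 2 * a * b * cov(x, y))
print(mean(z), variance(z))
```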

Portfolios
Return on a portfolio:

$R_p = w_A R_A + w_B R_B$

◆ weights add up to 1 ◆ return = mean ◆ risk = std. deviation

Variance of return of portfolio:

$s_p^2 = w_A^2 s_A^2 + w_B^2 s_B^2 + 2 w_A w_B\, s_{A,B}$

◆ Risk (variance) is reduced when stocks are negatively correlated (i.e., when there's a negative covariance).
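A Python sketch of the two-stock portfolio formulas. All means, standard deviations and covariances are hypothetical; the same pair of stocks is evaluated once with negative and once with positive covariance.

```python
import math

def portfolio_stats(w_a, mean_a, mean_b, sd_a, sd_b, cov_ab):
    w_b = 1 - w_a                                   # weights add up to 1
    ret = w_a * mean_a + w_b * mean_b               # expected return (mean)
    var = w_a**2 * sd_a**2 + w_b**2 * sd_b**2 + 2 * w_a * w_b * cov_ab
    return ret, math.sqrt(var)                      # risk = std. deviation

# hypothetical stocks A and B, equal weights
ret_neg, risk_neg = portfolio_stats(0.5, 0.10, 0.06, 0.20, 0.10, cov_ab=-0.01)
ret_pos, risk_pos = portfolio_stats(0.5, 0.10, 0.06, 0.20, 0.10, cov_ab=+0.01)
print(ret_neg, risk_neg)   # same return (about 0.08) ...
print(ret_pos, risk_pos)   # ... but higher risk when covariance is positive
```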

Probability: a measure of uncertainty
◆ All outcomes have to be exhaustive (all options possible) and mutually exclusive (no 2 outcomes can occur at the same time)

Combining Random Variables
◆ If X and Y are independent:
$E(X + Y) = E(X) + E(Y)$
$Var(X + Y) = Var(X) + Var(Y)$
◆ If X and Y are dependent:
$E(X + Y) = E(X) + E(Y)$
$Var(X + Y) = Var(X) + Var(Y) + 2\,Cov(X, Y)$

Covariance: $Cov(X, Y) = E(XY) - E(X)E(Y)$
➔ If X and Y are independent, Cov(X, Y) = 0
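A Python sketch with exact fractions, using a hypothetical joint distribution of two dependent 0/1 variables, confirming $Var(X + Y) = Var(X) + Var(Y) + 2\,Cov(X, Y)$:

```python
from fractions import Fraction as F

# hypothetical joint distribution of two dependent 0/1 variables: {(x, y): P(X=x, Y=y)}
joint = {(0, 0): F(2, 5), (0, 1): F(1, 10), (1, 0): F(1, 10), (1, 1): F(2, 5)}

def expect(g):
    """E[g(X, Y)] over the joint distribution."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

ex, ey = expect(lambda x, y: x), expect(lambda x, y: y)
var_x = expect(lambda x, y: x**2) - ex**2
var_y = expect(lambda x, y: y**2) - ey**2
cov_xy = expect(lambda x, y: x * y) - ex * ey          # E(XY) - E(X)E(Y)
var_sum = expect(lambda x, y: (x + y)**2) - (ex + ey)**2

print(cov_xy)                                          # 3/20
print(var_sum, var_x + var_y + 2 * cov_xy)             # 4/5 4/5
```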

Binomial Distribution: doing something n times
◆ only 2 outcomes: success or failure
◆ trials are independent of each other
◆ probability (p) remains constant

1.) All Failures: $P(\text{all failures}) = (1 - p)^n$

2.) All Successes: $P(\text{all successes}) = p^n$
3.) At least one success: $P(\text{at least 1 success}) = 1 - (1 - p)^n$
4.) At least one failure: $P(\text{at least 1 failure}) = 1 - p^n$
5.) Binomial Distribution Formula for x = exact value: $P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$

6.) Mean (Expectation): $\mu = E(x) = np$

7.) Variance and Standard Dev.: $\sigma^2 = npq$, $\sigma = \sqrt{npq}$, where $q = 1 - p$
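A Python sketch of the binomial formulas for hypothetical values n = 10, p = 0.3, checking the shortcut rules against the exact-value formula:

```python
import math

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.3
q = 1 - p
print(binom_pmf(0, n, p), q**n)                   # all failures, two ways
print(1 - q**n)                                   # at least one success
mean = sum(x * binom_pmf(x, n, p) for x in range(n + 1))
print(mean, n * p)                                # both about 3.0
print(math.sqrt(n * p * q))                       # sigma = sqrt(npq), about 1.449
```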

Binomial Example

Continuous Probability Distributions: the probability that a continuous random variable X will assume any particular value is 0

Density Curves
◆ Area under the curve is the probability that any range of values will occur. ◆ Total area = 1

Uniform Distribution

$X \sim Unif(a, b)$

Uniform Example


Mean for uniform distribution:

$E(X) = \frac{a + b}{2}$

Variance for unif. distribution:

$Var(X) = \frac{(b - a)^2}{12}$
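A Python sketch of the uniform distribution, assuming a hypothetical wait time $X \sim Unif(0, 12)$ minutes. Probabilities are rectangle areas of height $1/(b - a)$:

```python
def uniform_prob(c, d, a, b):
    """P(c <= X <= d) for X ~ Unif(a, b): the area of a rectangle of height 1/(b - a)."""
    lo, hi = max(c, a), min(d, b)          # clip the interval to the support [a, b]
    return max(hi - lo, 0) / (b - a)

def uniform_mean(a, b):
    return (a + b) / 2

def uniform_var(a, b):
    return (b - a) ** 2 / 12

# hypothetical: wait time uniformly distributed between 0 and 12 minutes
print(uniform_prob(3, 6, 0, 12))                # 0.25
print(uniform_mean(0, 12), uniform_var(0, 12))  # 6.0 12.0
```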

Normal Distribution: governed by 2 parameters: μ (the mean) and σ (the standard deviation)

➔ $X \sim N(\mu, \sigma^2)$

Standardize Normal Distribution:

$Z = \frac{X - \mu}{\sigma}$

◆ Z score is the number of standard deviations the related X is from its mean
➔ **P(Z < some value) is just the probability found on the table**
➔ **P(Z > some value) is (1 − probability found on the table)**
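A Python sketch using the standard library's `statistics.NormalDist` in place of a printed Z table, for a hypothetical $X \sim N(100, 15^2)$:

```python
from statistics import NormalDist

# hypothetical: X ~ N(100, 15^2)
mu, sigma = 100, 15
x = 130
z = (x - mu) / sigma                    # 2.0 standard deviations above the mean

table = NormalDist().cdf(z)             # what a standard normal table gives: P(Z < z)
print(z, table)                         # P(X < 130), about 0.977
print(1 - table)                        # P(X > 130), about 0.023
print(NormalDist(mu, sigma).cdf(x))     # same as `table`, without standardizing
```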

Normal Distribution Example

Sums of Normals

Sums of Normals Example:

Cov(X, Y) = 0 because they're independent
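For independent normals, $X + Y \sim N(\mu_X + \mu_Y,\ \sigma_X^2 + \sigma_Y^2)$, since the covariance term is 0. Python's `statistics.NormalDist` supports adding two independent instances; the parameters below are hypothetical.

```python
import math
from statistics import NormalDist

# hypothetical independent normals: X ~ N(10, 3^2), Y ~ N(20, 4^2)
X = NormalDist(10, 3)
Y = NormalDist(20, 4)

total = X + Y      # NormalDist treats the two as independent, so Cov(X, Y) = 0
print(total.mean, total.stdev)                 # 30.0 5.0: variances add, 9 + 16 = 25
print(math.sqrt(X.variance + Y.variance))      # 5.0, the same by hand
print(1 - total.cdf(35))                       # P(X + Y > 35), about 0.159
```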

Central Limit Theorem
◆ As n increases, $\bar{x}$ should get closer to μ (the population mean)
◆ $mean(\bar{x}) = \mu$
➔ $variance(\bar{x}) = \sigma^2 / n$
➔ $\bar{X} \sim N(\mu, \sigma^2 / n)$
◆ If the population is normally distributed, n can be any value
◆ For any population, n needs to be ≥ 30

$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
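A small Python simulation of the Central Limit Theorem: sample means of size n = 30 drawn from a non-normal Unif(0, 1) population ($\mu = 0.5$, $\sigma^2 = 1/12$). The seed and sizes are arbitrary choices.

```python
import random
from statistics import mean, variance

rng = random.Random(0)          # fixed seed for reproducibility
n, reps = 30, 5000              # sample size and number of simulated samples

# population: Unif(0, 1), which is NOT normal; mu = 0.5, sigma^2 = 1/12
sample_means = [sum(rng.random() for _ in range(n)) / n for _ in range(reps)]

print(mean(sample_means))       # close to mu = 0.5
print(variance(sample_means))   # close to sigma^2 / n = 1/360, about 0.00278
```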

Confidence Intervals: tell us how good our estimate is
◆ **Want high confidence and a narrow interval**
◆ **As confidence increases, the interval also gets wider**

A. One Sample Proportion

➔ $\hat{p} = \frac{x}{n} = \frac{\text{number of successes in sample}}{\text{sample size}}$

We are thus 95% confident that the true population proportion is in the interval…
We are assuming that n is large, $n\hat{p} > 5$, and our sample size is less than 10% of the population size.
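The interval formula itself did not survive in this copy, so this Python sketch assumes the standard large-sample (Wald) interval $\hat{p} \pm z\sqrt{\hat{p}(1-\hat{p})/n}$, with hypothetical counts. It also shows the interval widening as confidence rises.

```python
import math
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Large-sample (Wald) interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# hypothetical: 40 successes in a sample of 100
for conf in (0.90, 0.95, 0.99):
    lo, hi = proportion_ci(40, 100, conf)
    print(conf, round(lo, 3), round(hi, 3))   # the interval widens as confidence rises
```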

**One Sample Hypothesis Tests**

1. Confidence Interval (can be used only for two-sided tests)
2. Test Statistic Approach (Population Mean)
3. Test Statistic Approach (Population Proportion)
4. P-Values
◆ a number between 0 and 1
◆ the larger the p-value, the more consistent the data are with the null
◆ the smaller the p-value, the more consistent the data are with the alternative
◆ **If P is low (less than 0.05), H0 must go: reject the null hypothesis**
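The test-statistic formulas are missing from this copy; this Python sketch assumes the usual large-sample z statistics, $z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ for a mean and $z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$ for a proportion, with a two-sided p-value. All inputs are hypothetical.

```python
import math
from statistics import NormalDist

def two_sided_p(z):
    """P(|Z| >= |z|) under the standard normal null distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def z_test_mean(xbar, mu0, s, n):
    return (xbar - mu0) / (s / math.sqrt(n))

def z_test_proportion(p_hat, p0, n):
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# hypothetical data
z_mean = z_test_mean(xbar=52, mu0=50, s=10, n=100)    # 2.0
z_prop = z_test_proportion(p_hat=0.6, p0=0.5, n=100)  # about 2.0
for z in (z_mean, z_prop):
    p = two_sided_p(z)
    print(round(z, 3), round(p, 4), "reject H0" if p < 0.05 else "fail to reject H0")
```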

**Two Sample Hypothesis Tests**

1. Comparing Two Proportions (Independent Groups)
➔ Calculate Confidence Interval
➔ Test Statistic for Two Proportions

2. Comparing Two Means (large independent samples, n > 30)
➔ Calculating Confidence Interval
➔ Test Statistic for Two Means
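The formulas for these tests did not survive, so this Python sketch assumes the common large-sample versions: standard error $\sqrt{s_1^2/n_1 + s_2^2/n_2}$ for two means, and a pooled $\bar{p}$ for the two-proportion test statistic (unpooled standard error for its interval). The sample numbers are hypothetical.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)     # about 1.96

def two_sided_p(z):
    return 2 * (1 - NormalDist().cdf(abs(z)))

def two_means(x1, s1, n1, x2, s2, n2):
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    diff = x1 - x2
    return diff / se, (diff - Z95 * se, diff + Z95 * se)

def two_proportions(k1, n1, k2, n2):
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)              # pooled p under H0: p1 = p2
    z = (p1 - p2) / math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # unpooled, for the CI
    return z, (p1 - p2 - Z95 * se, p1 - p2 + Z95 * se)

# hypothetical samples
z_m, ci_m = two_means(105, 15, 50, 100, 12, 50)
z_p, ci_p = two_proportions(60, 100, 45, 100)
print(round(z_m, 3), round(two_sided_p(z_m), 4), ci_m)   # 0 inside CI -> fail to reject
print(round(z_p, 3), round(two_sided_p(z_p), 4), ci_p)   # 0 outside CI -> reject
```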

Matched Pairs: the two samples are DEPENDENT
◆ Example:
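A Python sketch of the matched-pairs idea (not the sheet's original example, which is missing): work with the within-pair differences and run a one-sample test on them. The blood-pressure numbers are hypothetical; with only 5 pairs, the statistic would be compared against a t table with n − 1 degrees of freedom.

```python
import math
from statistics import mean, stdev

# hypothetical blood pressure for the same 5 patients, before and after treatment
before = [200, 190, 210, 220, 205]
after = [195, 185, 200, 215, 204]

# dependent samples: work with the within-pair differences, not two separate groups
d = [b - a for b, a in zip(before, after)]
d_bar, s_d, n = mean(d), stdev(d), len(d)
t = d_bar / (s_d / math.sqrt(n))     # one-sample statistic on the differences (H0: mean diff = 0)
print(d, d_bar, round(t, 3))
```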

Assumptions of Simple Linear Regression
1. We model the AVERAGE of something rather than the thing itself

2.

◆ As ε (noise) gets bigger, it’s harder to find the line

Estimating $S_e$

$S_e^2 = \frac{SSE}{n - 2}$

◆ $S_e^2$ is our estimate of $\sigma^2$
➔ $S_e = \sqrt{S_e^2}$ is our estimate of σ
➔ 95% of the Y values should lie within the interval $b_0 + b_1 X \pm 1.96\,S_e$
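A Python sketch of a least-squares line and $S_e = \sqrt{SSE/(n-2)}$ on hypothetical data. The $\pm 1.96\,S_e$ band is the rough rule from the sheet; it ignores the uncertainty in $b_0$ and $b_1$, so a formal prediction interval is somewhat wider.

```python
import math

def fit_line(xs, ys):
    """Least-squares b0, b1 plus S_e = sqrt(SSE / (n - 2))."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return b0, b1, sse, math.sqrt(sse / (n - 2))

xs = [1, 2, 3, 4, 5]                 # hypothetical data
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1, sse, se = fit_line(xs, ys)
x_new = 6
print(b0, b1, se)
print(b0 + b1 * x_new - 1.96 * se, b0 + b1 * x_new + 1.96 * se)  # rough 95% range for Y at x = 6
```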

Example of Prediction Intervals:

Standard Errors for $b_1$ and $b_0$
➔ standard errors go up when noise goes up
◆ $s_{b_0}$: amount of uncertainty in our estimate of $\beta_0$ (small s good, large s bad)
◆ $s_{b_1}$: amount of uncertainty in our estimate of $\beta_1$

Confidence Intervals for $b_1$ and $b_0$
➔ n small → bad
➔ $s_e$ big → bad
➔ $s_x^2$ small → bad (want the x's spread out for a better guess)

Regression Hypothesis Testing
◆ always a two-sided test
◆ want to test whether the slope ($\beta_1$) is needed in our model
$H_0: \beta_1 = 0$ (don't need x)
$H_a: \beta_1 \neq 0$ (need x)
Need X in the model if:
a. 0 isn't in the confidence interval
b. |t| > 1.96
c. p-value < 0.05

Test Statistic for Slope/Y-intercept: can only be used if n > 30; if n < 30, use p-values
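A Python sketch of the slope test, assuming the standard result $s_{b_1} = S_e / \sqrt{(n-1)s_x^2}$ (which matches the "n small, $s_e$ big, $s_x^2$ small → bad" bullets), the large-sample interval $b_1 \pm 1.96\,s_{b_1}$ and $t = b_1 / s_{b_1}$. The data are hypothetical, with n = 40 > 30.

```python
import math
from statistics import variance

def slope_test(xs, ys):
    """Slope, its standard error, a rough 95% CI and t = b1 / s_b1 (large-sample rule)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))
    # small n, big s_e, or small s_x^2 all make s_b1 larger ("bad")
    s_b1 = s_e / math.sqrt((n - 1) * variance(xs))
    t = b1 / s_b1
    return b1, s_b1, (b1 - 1.96 * s_b1, b1 + 1.96 * s_b1), t

xs = list(range(1, 41))                                   # hypothetical, n = 40
ys = [3 + 0.5 * x + (-1) ** x * 2 for x in xs]            # line plus alternating noise
b1, s_b1, ci, t = slope_test(xs, ys)
need_x = abs(t) > 1.96 and not (ci[0] <= 0 <= ci[1])
print(round(b1, 3), round(s_b1, 3), ci, round(t, 2), need_x)
```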

Multiple Regression

➔ Variable Importance:
◆ higher t-value, lower p-value = variable is more important
◆ lower t-value, higher p-value = variable is less important (or not needed)

Adjusted R-squared (k = # of X's)

◆ Adj. R-squared will go down as you add junk x variables
◆ Adj. R-squared will go up only if the x you add in is very useful
➔ **Want Adj. R-squared to go up and $S_e$ low for a better model**
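The adjusted R-squared formula is missing here; this Python sketch assumes the standard $R^2_{adj} = 1 - (1 - R^2)\frac{n-1}{n-k-1}$ and uses hypothetical R² values to show a junk X lowering it and a useful X raising it.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k predictors (X's)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

n = 50
before = adjusted_r2(0.800, n, k=3)   # hypothetical model with 3 X's
junk = adjusted_r2(0.801, n, k=4)     # add a junk X: R^2 barely moves
useful = adjusted_r2(0.850, n, k=4)   # add a useful X: R^2 jumps
print(round(before, 4), round(junk, 4), round(useful, 4))  # junk goes down, useful goes up
```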

The Overall F Test

◆ Always want to reject the F test (reject the null hypothesis)
◆ Look at the p-value (if < 0.05, reject the null)
$H_0: \beta_1 = \beta_2 = \beta_3 = \dots = \beta_k = 0$ (don't need any X's)
$H_a$: at least one $\beta_j \neq 0$ (need at least 1 X)
◆ If no x variables are needed, then SSR = 0 and SST = SSE
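A Python sketch of the overall F statistic, assuming the standard form $F = \frac{SSR/k}{SSE/(n-k-1)}$ with $SST = SSR + SSE$. Turning F into a p-value needs the F distribution (not in Python's standard library), so only the statistic is computed; the ANOVA numbers are hypothetical.

```python
def overall_f(ssr, sse, n, k):
    """Overall F statistic: explained variation per X over unexplained variation per d.f."""
    msr = ssr / k                   # mean square for regression
    mse = sse / (n - k - 1)         # mean square error (equals S_e^2)
    return msr / mse

# hypothetical ANOVA numbers: n = 50 observations, k = 3 X's
ssr, sse = 80.0, 20.0
sst = ssr + sse                     # SST = SSR + SSE
print(sst, overall_f(ssr, sse, n=50, k=3))   # large F -> small p-value -> reject H0
print(overall_f(0.0, sst, n=50, k=3))        # no useful X's: SSR = 0, SST = SSE, F = 0
```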

Modeling Regression

Backward Stepwise Regression
  1. Start with all variables in the model
  2. At each step, delete the least important variable, based on the largest p-value above 0.05
  3. Stop when you can't delete any more
➔ Will see Adj. R-squared go up and $S_e$ go down
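A Python sketch of the backward stepwise loop. `fit_pvalues` is a hypothetical callable standing in for a real regression refit; the stand-in below returns fixed, made-up p-values, whereas a real refit would change them at every step.

```python
def backward_stepwise(variables, fit_pvalues, alpha=0.05):
    """Backward elimination driven by p-values.

    fit_pvalues is a hypothetical callable: given the current list of X's, it
    refits the model and returns {variable: p-value}.
    """
    current = list(variables)                  # 1. start with all variables
    while current:
        pvalues = fit_pvalues(current)
        worst = max(current, key=pvalues.get)  # least important = largest p-value
        if pvalues[worst] <= alpha:            # 3. nothing left above 0.05: stop
            break
        current.remove(worst)                  # 2. delete it and refit
    return current

# stand-in for a real regression fit, with made-up p-values
FAKE_P = {"age": 0.001, "weight": 0.30, "dose": 0.04, "height": 0.70}
def fake_fit(xs):
    return {x: FAKE_P[x] for x in xs}

print(backward_stepwise(["age", "weight", "dose", "height"], fake_fit))  # ['age', 'dose']
```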

Dummy Variables: an indicator variable that takes on a value of 0 or 1; allows intercepts to change

Interaction Terms: allow the slopes to change
◆ interaction between 2 or more x variables that will affect the Y variable

How to Create Dummy Variables (Nominal Variables)
◆ If C is the number of categories, create (C − 1) dummy variables for describing the variable
◆ One category is always the "baseline", which is included in the intercept
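A Python sketch that builds C − 1 dummy columns for a hypothetical list of seasons, with Summer as the baseline (its rows are all zeros, so its effect sits in the intercept):

```python
def make_dummies(values, baseline):
    """C - 1 indicator columns for a nominal variable; the baseline gets none."""
    levels = sorted(set(values) - {baseline})        # every category except the baseline
    return {level: [1 if v == level else 0 for v in values] for level in levels}

seasons = ["Summer", "Wtr", "Spr", "Fall", "Wtr"]   # C = 4 categories
dummies = make_dummies(seasons, baseline="Summer")
for name, column in dummies.items():
    print(name, column)
# Fall [0, 0, 0, 1, 0]
# Spr [0, 0, 1, 0, 0]
# Wtr [0, 1, 0, 0, 1]
```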

Recoding Dummy Variables

Example: How many hockey sticks are sold in the summer (original equation, Summer = baseline):
hockey = 100 + 10 Wtr − 20 Spr + 30 Fall
Write the equation for how many hockey sticks are sold in the winter (Winter = baseline):
hockey = 110 + 20 Fall − 30 Spr − 10 Summer
➔ **Always need to get the same exact values from the original equation**
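A Python sketch of the recoding rule behind this example: the new intercept adds the new baseline's coefficient, every other coefficient subtracts it, and the old baseline gets its negative. The function names are illustrative; the check confirms both equations predict the same value for every season.

```python
def recode(eq, old_baseline, new_baseline):
    shift = eq[new_baseline]                   # new baseline's effect vs. the old one
    new = {"intercept": eq["intercept"] + shift, old_baseline: -shift}
    for level, coef in eq.items():
        if level not in ("intercept", new_baseline):
            new[level] = coef - shift
    return new

def predict(eq, season):
    # intercept plus the coefficient of this season's dummy (0 if it is the baseline)
    return eq["intercept"] + eq.get(season, 0)

original = {"intercept": 100, "Wtr": 10, "Spr": -20, "Fall": 30}   # baseline: Summer
recoded = recode(original, "Summer", "Wtr")                        # baseline: Winter
print(recoded)   # intercept 110, Summer -10, Spr -30, Fall 20
for season in ["Summer", "Wtr", "Spr", "Fall"]:
    print(season, predict(original, season), predict(recoded, season))
```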