Basic Biostatistics formula sheet, Cheat Sheet of Biostatistics

This biostatistics formula sheet includes sum of squares, mean, variance, standard deviation, median, interquartile range, and upper and lower fences. From San Jose State University.

Typology: Cheat Sheet

2021/2022

Uploaded on 02/07/2022

Basic Biostatistics Formulas
Jane Pham & B. Burt Gerstman
C:\data\biostat-text\formulas1.doc Last printed 12/20/2007 4:39:00 PM Page 1 of 3
Exploratory and Summary Statistics (Chapters 3 & 4)
Sum of squares
    Parameter: $df \times \sigma^2$
    Point estimate: $SS$
    Formula: $SS = \sum_{i=1}^{n} (x_i - \bar{x})^2$
    Interpretation: No easy interpretation.

Mean
    Parameter: $\mu$
    Point estimate: $\bar{x}$
    Formula: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
    Interpretation: A measure of central location; balancing point.

Variance
    Parameter: $\sigma^2$
    Point estimate: $s^2$
    Formula: $s^2 = \frac{SS}{n - 1}$
    Interpretation: A measure of spread expressed in units squared.

Standard deviation
    Parameter: $\sigma$
    Point estimate: $s$
    Formula: $s = \sqrt{s^2}$ or $s = \sqrt{\frac{SS}{n - 1}}$
    Interpretation: A measure of spread expressed in data units. More appropriate for descriptive purposes.

Notes
• Mean and standard deviation are best suited to symmetrical distributions.
• When the distribution is Normal, 68% of data points lie within ±1σ of μ, 95% within ±2σ of μ, and 99.7% within ±3σ of μ.
• For other distributions, use Chebychev's rule (e.g., at least 75% of data lie within ±2σ of μ).
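The four summary formulas above build on one another, which a short Python sketch can make concrete. The function name `summary_statistics` and the sample values are hypothetical, chosen only for illustration; only the Python standard library is used.

```python
import math

def summary_statistics(data):
    """Return n, mean, sum of squares, variance and standard deviation."""
    n = len(data)
    mean = sum(data) / n                      # x-bar = (1/n) * sum of x_i
    ss = sum((x - mean) ** 2 for x in data)   # SS = sum of squared deviations from x-bar
    variance = ss / (n - 1)                   # s^2 = SS / (n - 1)
    sd = math.sqrt(variance)                  # s = sqrt(s^2), back in data units
    return {"n": n, "mean": mean, "ss": ss, "variance": variance, "sd": sd}

# Hypothetical sample, for illustration only
sample = [2, 4, 4, 5, 7, 8]
stats = summary_statistics(sample)
print(stats)
```

For this sample the mean is 5, the sum of squares is 24, and the variance is 24 / 5 = 4.8.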
Median
    Formula: the median has a depth of $\frac{n + 1}{2}$
    Interpretation: A measure of central location.

Interquartile range (IQR)
    Formula: $IQR = Q_3 - Q_1$
    Interpretation: A measure of spread, aka "hinge-spread."

Lower fence ($F_l$)
    Formula: $F_l = Q_1 - 1.5(IQR)$
    Interpretation: Helps determine the lower inside value and the lower outside value(s).

Upper fence ($F_u$)
    Formula: $F_u = Q_3 + 1.5(IQR)$
    Interpretation: Helps determine the upper inside value and the upper outside value(s).

5-point summary
    Q0 – Minimum
    Q1 – First quartile
    Q2 – Median
    Q3 – Third quartile
    Q4 – Maximum

Notes on the boxplot
• Boxplots provide information about location, spread, and shape. The box contains the middle 50% of the data. The line inside the box is the median.
• Anything above the upper fence or below the lower fence is "outside." (Fences are not drawn.) Plot outside values as separate points.
• The lower whisker is drawn from Q1 to the lower inside value. The upper whisker is drawn from Q3 to the upper inside value.
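The boxplot quantities above can be sketched in Python as follows. The sheet does not say how Q1 and Q3 are located, so this sketch assumes Tukey's hinges (the medians of the lower and upper halves of the sorted data, with the overall median counted in both halves when n is odd); other quartile rules give slightly different values. The function names and the sample are hypothetical.

```python
def median_of_sorted(values):
    """Median of an already sorted list, located at depth (n + 1) / 2."""
    n = len(values)
    depth = (n + 1) / 2
    if depth.is_integer():
        return values[int(depth) - 1]          # depth is 1-based, lists are 0-based
    # Fractional depth: average the two values on either side of it
    return (values[int(depth) - 1] + values[int(depth)]) / 2

def boxplot_summary(data):
    values = sorted(data)
    n = len(values)
    half = (n + 1) // 2                        # assumption: median sits in both halves when n is odd
    q1 = median_of_sorted(values[:half])       # lower hinge
    q2 = median_of_sorted(values)              # median
    q3 = median_of_sorted(values[n - half:])   # upper hinge
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in values if lower_fence <= x <= upper_fence]
    outside = [x for x in values if x < lower_fence or x > upper_fence]
    return {
        "five_point": (values[0], q1, q2, q3, values[-1]),
        "iqr": iqr,
        "fences": (lower_fence, upper_fence),
        "lower_inside": inside[0],             # end of the lower whisker
        "upper_inside": inside[-1],            # end of the upper whisker
        "outside": outside,                    # plotted as separate points
    }

# Hypothetical sample, for illustration only
box = boxplot_summary([3, 5, 6, 7, 8, 9, 10, 12, 30])
print(box)
```

With this sample, Q1 = 6 and Q3 = 10, so IQR = 4 and the fences fall at 0 and 16. The value 30 is outside, and the upper whisker ends at the upper inside value, 12.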
Probability (Chapters 5–7)

• Probability: relative frequency in the population; expected proportion after a very long run of trials; can be used to quantify subjective statements.

• Properties of probabilities.
    Basic: (1) $0 \le \Pr(A) \le 1$; (2) $\Pr(S) = 1$; (3) $\Pr(\bar{A}) = 1 - \Pr(A)$; and (4) $\Pr(A \text{ or } B) = \Pr(A) + \Pr(B)$ for disjoint events.
    Advanced: (5) If A and B are independent, $\Pr(A \text{ and } B) = \Pr(A) \cdot \Pr(B)$; (6) $\Pr(A \text{ or } B) = \Pr(A) + \Pr(B) - \Pr(A \text{ and } B)$; (7) $\Pr(B|A) = \Pr(A \text{ and } B) / \Pr(A)$; (8) $\Pr(A \text{ and } B) = \Pr(A) \cdot \Pr(B|A)$; (9) $\Pr(B) = \Pr(B \text{ and } A) + \Pr(B \text{ and } \bar{A})$; (10) Bayes' Theorem (p. 111).

• Binomial variables: $X \sim b(n, p)$, $\Pr(X = x) = {}_nC_x \, p^x q^{n-x}$, where ${}_nC_x = \frac{n!}{x!(n-x)!}$ and $q = 1 - p$.

• Cumulative probability: $\Pr(X \le x)$ = sum of all probabilities up to and including $\Pr(X = x)$; corresponds to the AUC in the left tail of the pmf or pdf.

• Normal variables: $X \sim N(\mu, \sigma)$. To determine $\Pr(X \le x)$, standardize $z = \frac{x - \mu}{\sigma}$ and look up the cumulative probability in a Z table. Use the fact that the AUC sums to 1 to determine probabilities for various ranges. To find a value that corresponds to a given probability, look up the closest $z_p$ in the Z table and then unstandardize according to $x = \mu + z_p \cdot \sigma$.
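The binomial and Normal calculations above can be sketched with the Python standard library. Here `statistics.NormalDist` stands in for the printed Z table, which is a substitution of this sketch rather than something the sheet describes. The function names and the numeric inputs are hypothetical.

```python
from math import comb
from statistics import NormalDist

def binomial_pmf(x, n, p):
    """Pr(X = x) for X ~ b(n, p): nCx * p^x * q^(n - x), with q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p**x * q ** (n - x)

def binomial_cdf(x, n, p):
    """Pr(X <= x): sum of all probabilities up to and including Pr(X = x)."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))

def normal_cdf(x, mu, sigma):
    """Pr(X <= x) for X ~ N(mu, sigma): standardize, then use the standard Normal cdf."""
    z = (x - mu) / sigma
    return NormalDist().cdf(z)        # plays the role of the Z table lookup

def normal_value(prob, mu, sigma):
    """Value x with Pr(X <= x) = prob: find z_p, then unstandardize."""
    z_p = NormalDist().inv_cdf(prob)
    return mu + z_p * sigma

# Hypothetical inputs, for illustration only
print(binomial_pmf(2, 4, 0.5))        # Pr(X = 2) when X ~ b(4, 0.5)
print(binomial_cdf(2, 4, 0.5))        # Pr(X <= 2)
print(normal_cdf(130, 100, 15))       # Pr(X <= 130) when X ~ N(100, 15)
print(normal_value(0.975, 100, 15))   # 97.5th percentile of N(100, 15)
```

Probabilities for other ranges follow from the same functions, for example Pr(X > x) = 1 − `normal_cdf(x, mu, sigma)`, because the AUC sums to 1.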

Introduction to Inference (Chapters 8–11)

• The sampling distribution of the mean (SDM) is governed by the central limit theorem, law of large numbers, and square root law. When n is large, $\bar{x} \sim N(\mu, \sigma_{\bar{x}})$, where $\sigma_{\bar{x}}$ is the standard error (SE) and is equal to $\sigma / \sqrt{n}$. The standard error is estimated by $s / \sqrt{n}$ when the population standard deviation is not known.

• (1 − α)100% confidence interval for μ. Use $\bar{x} \pm z_{1-\alpha/2} \cdot SE_{\bar{x}}$ when σ is known. Use $\bar{x} \pm t_{n-1,\,1-\alpha/2} \cdot SE_{\bar{x}}$ when relying on s.

• Hypothesis testing basics. Know all the steps, not just the conclusion, and keep in mind that hypothesis tests require certain conditions (e.g., Normality, independence, data quality) to be valid. The steps are:
    A. $H_0$ and $H_1$. [For a one-sample test of a mean, $H_0: \mu = \mu_0$, where $\mu_0$ is the mean specified by the null hypothesis.]
    B. Test statistic. [For a one-sample test of a mean, use either $z_{stat} = \frac{\bar{x} - \mu_0}{SE_{\bar{x}}}$ or $t_{stat} = \frac{\bar{x} - \mu_0}{SE_{\bar{x}}}$ with $df = n - 1$.]
    C. P-value. Convert the test statistic to a P-value. Small P ⇒ strong evidence against $H_0$.
    D. Significance level. It is unwise to draw too firm a line. However, you can use the conventions regarding marginal significance, significance, and high significance when first learning.

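• Power and sample size basics. Approach from an estimation, testing, or "power" perspective. The sample size requirement for limiting the margin of error m is given by $n = \left( z_{1-\alpha/2} \frac{\sigma}{m} \right)^2$. The power of testing a mean is $1 - \beta = \Phi\left( -z_{1-\alpha/2} + \frac{|\Delta| \sqrt{n}}{\sigma} \right)$, where Φ is the standard Normal cumulative distribution function and Δ is the difference worth detecting. The sample size requirement of a one-sample z or t test is $n = \frac{\sigma^2 \left( z_{1-\beta} + z_{1-\alpha/2} \right)^2}{\Delta^2}$. It is OK to use s as a substitute for σ in power and sample size formulas, when necessary.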
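The inference formulas in this section can be sketched in Python for the case where σ is known. This sketch makes several assumptions the sheet does not state: the P-value is computed as two-sided, sample sizes are rounded up to the next whole number, and the power formula is written as $\Phi(-z_{1-\alpha/2} + |\Delta|\sqrt{n}/\sigma)$. The t-based interval and test are omitted because the standard library has no t distribution; a t table or a statistics package would be needed. All function names and inputs are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard Normal distribution, used in place of a Z table

def standard_error(sigma, n):
    """SE of the mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)

def z_confidence_interval(xbar, sigma, n, alpha=0.05):
    """(1 - alpha)100% CI for mu when sigma is known: xbar +/- z * SE."""
    margin = Z.inv_cdf(1 - alpha / 2) * standard_error(sigma, n)
    return xbar - margin, xbar + margin

def z_test(xbar, mu0, sigma, n):
    """One-sample z statistic and its P-value (two-sided, by assumption)."""
    z_stat = (xbar - mu0) / standard_error(sigma, n)
    p_value = 2 * (1 - Z.cdf(abs(z_stat)))
    return z_stat, p_value

def n_for_margin(sigma, m, alpha=0.05):
    """Sample size that limits the margin of error to m (rounded up)."""
    return ceil((Z.inv_cdf(1 - alpha / 2) * sigma / m) ** 2)

def power_of_test(delta, sigma, n, alpha=0.05):
    """Power for detecting a difference delta with a test of a mean."""
    return Z.cdf(-Z.inv_cdf(1 - alpha / 2) + abs(delta) * sqrt(n) / sigma)

def n_for_power(delta, sigma, power=0.80, alpha=0.05):
    """Sample size for a one-sample z or t test (rounded up)."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_beta = Z.inv_cdf(power)         # z_{1 - beta}, since power = 1 - beta
    return ceil(sigma**2 * (z_beta + z_alpha) ** 2 / delta**2)

# Hypothetical study: xbar = 106, mu0 = 100, sigma = 15, n = 25
print(z_confidence_interval(106, 15, 25))
print(z_test(106, 100, 15, 25))
print(n_for_margin(15, 5))
print(power_of_test(6, 15, 25))
print(n_for_power(6, 15))
```

In the hypothetical study the standard error is 15 / √25 = 3, so the test statistic is (106 − 100) / 3 = 2. When σ is unknown, s can replace σ in the power and sample size functions, as the sheet notes.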