
















































Statistics 102
Logistic Regression
Colin Rundel
April 15, 2013
Background
1. Background
2. GLMs
3. Logistic Regression
4. Additional Example
Background
At this point we have covered:
- Simple linear regression: the relationship between a numerical response and a numerical or categorical predictor
- Multiple regression: the relationship between a numerical response and multiple numerical and/or categorical predictors
What we haven't seen is what to do when the predictors are weird (nonlinear, complicated dependence structure, etc.) or when the response is weird (categorical, count data, etc.).
Background
Odds are another way of quantifying the probability of an event, commonly used in gambling (and logistic regression).
Odds

For some event $E$,
$$\text{odds}(E) = \frac{P(E)}{P(E^c)} = \frac{P(E)}{1 - P(E)}$$
Similarly, if we are told the odds of $E$ are $x$ to $y$, then
$$\text{odds}(E) = \frac{x}{y} = \frac{x/(x+y)}{y/(x+y)}$$
which implies $P(E) = x/(x+y)$ and $P(E^c) = y/(x+y)$.
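As a quick illustration of the definitions above, here is a minimal Python sketch converting between probabilities and odds (the helper names `odds` and `prob_from_odds` are illustrative, not from the lecture):

```python
def odds(p):
    """Odds of an event with probability p: P(E) / (1 - P(E))."""
    return p / (1 - p)

def prob_from_odds(x, y):
    """Probability implied by odds of 'x to y': P(E) = x / (x + y)."""
    return x / (x + y)

# An event with probability 0.75 has odds of 3 (i.e. 3 to 1)
print(odds(0.75))            # 3.0
print(prob_from_odds(3, 1))  # 0.75
```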
GLMs
1. Background
2. GLMs
3. Logistic Regression
4. Additional Example
GLMs
Example: the Donner Party data, giving the survival status (Died or Survived) of party members along with their age and sex.

      Age    Sex     Status
1     23.00  Male    Died
2     40.00  Female  Survived
3     40.00  Male    Survived
4     30.00  Male    Died
5     28.00  Male    Died
...
43    23.00  Male    Survived
44    24.00  Male    Died
45    25.00  Female  Survived
GLMs
Status vs. Gender:

            Male   Female
Died          20        5
Survived      10       10
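As a sketch, the same kind of table could be tabulated in Python with pandas; the data frame below uses only the five rows shown in the listing above, with column names taken from that listing:

```python
import pandas as pd

# First five rows of the Donner Party listing shown above
donner = pd.DataFrame({
    "Age":    [23.0, 40.0, 40.0, 30.0, 28.0],
    "Sex":    ["Male", "Female", "Male", "Male", "Male"],
    "Status": ["Died", "Survived", "Survived", "Died", "Died"],
})

# Cross-tabulate survival status by sex (the full 45-row data gives the 20/5/10/10 table)
print(pd.crosstab(donner["Status"], donner["Sex"]))
```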
GLMs
It seems clear that both age and gender have an effect on someone's survival. How do we come up with a model that will let us explore this relationship?
Even if we set Died to 0 and Survived to 1, this isn’t something we can transform our way out of - we need something more.
GLMs
It turns out that there is a very general way of addressing this type of problem in regression, and the resulting models are called generalized linear models (GLMs). Logistic regression is just one example of this type of model.
All generalized linear models have the following three characteristics:
1. A probability distribution describing the outcome variable
2. A linear model: $\eta = \beta_0 + \beta_1 X_1 + \cdots + \beta_n X_n$
3. A link function that relates the linear model to the parameter of the outcome distribution: $g(p) = \eta$, or equivalently $p = g^{-1}(\eta)$
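To make the three pieces concrete for logistic regression, here is a minimal simulation sketch in Python using statsmodels (the data are simulated purely for illustration; they are not the Donner data, and the coefficient values are made up):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# (2) A linear model: eta = beta_0 + beta_1 * x
n = 200
x = rng.uniform(15, 65, size=n)
eta = 1.5 - 0.05 * x

# (3) The (inverse) link function: p = g^{-1}(eta), here the inverse logit
p = 1 / (1 + np.exp(-eta))

# (1) A probability distribution for the outcome: Bernoulli/binomial
y = rng.binomial(1, p)

# Fit the corresponding GLM: binomial family with its default (logit) link
fit = sm.GLM(y, sm.add_constant(x), family=sm.families.Binomial()).fit()
print(fit.params)  # estimated intercept and slope (true values 1.5 and -0.05)
```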
Logistic Regression
Logistic regression is a GLM used to model a binary categorical variable using numerical and categorical predictors.
We assume a binomial distribution produced the outcome variable, and we therefore want to model p, the probability of success, for a given set of predictors.
To finish specifying the logistic model, we just need to establish a reasonable link function that connects $\eta$ to $p$. There are a variety of options, but the most commonly used is the logit function.
Logit function

$$\mathrm{logit}(p) = \log\left(\frac{p}{1 - p}\right), \quad \text{for } 0 \le p \le 1$$
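As a quick check of this definition, here is a minimal Python sketch of the logit and its inverse (the function names are illustrative):

```python
import numpy as np

def logit(p):
    """logit(p) = log(p / (1 - p)); maps probabilities in (0, 1) to the whole real line."""
    return np.log(p / (1 - p))

def inv_logit(eta):
    """Inverse of the logit (the logistic function); maps any real eta back into (0, 1)."""
    return 1 / (1 + np.exp(-eta))

print(logit(0.5))             # 0.0
print(inv_logit(logit(0.9)))  # 0.9 (round trip, up to floating point)
```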