Logistic Regression: A Statistical Approach using GLMs and the Donner Party Example, Slides of Machine Learning

A lecture note from Statistics 102 at the University of California, Berkeley, delivered by Colin Rundel on April 15, 2013. The lecture focuses on logistic regression, a generalized linear model (GLM) used to model binary categorical variables using numerical and categorical predictors. The Donner Party example is used to illustrate the concepts, with data provided on the survival status (Died or Survived) of various members based on their age and gender.

What you will learn

  • How can we interpret the coefficients in a logistic regression model?
  • What is logistic regression and how is it different from linear regression?
  • What is the significance of the Donner Party example in understanding logistic regression?
  • What is the role of the odds ratio in logistic regression?
  • How can we test hypotheses about the coefficients in a logistic regression model?

Lecture 20 - Logistic Regression
Statistics 102
Colin Rundel
April 15, 2013


Outline

1. Background
2. GLMs
3. Logistic Regression
4. Additional Example

Background

Regression so far ...

At this point we have covered:

  • Simple linear regression: the relationship between a numerical response and a single numerical or categorical predictor
  • Multiple regression: the relationship between a numerical response and multiple numerical and/or categorical predictors

What we haven’t seen is what to do when the predictors are weird (nonlinear, complicated dependence structure, etc.) or when the response is weird (categorical, count data, etc.)


Odds

Odds are another way of quantifying the probability of an event, commonly used in gambling (and logistic regression).

For some event E,

    odds(E) = P(E) / P(E^c) = P(E) / (1 − P(E))

Similarly, if we are told the odds of E are x to y, then

    odds(E) = x / y = [x / (x + y)] / [y / (x + y)]

which implies P(E) = x / (x + y) and P(E^c) = y / (x + y).
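The odds-to-probability conversion above can be sketched in a few lines of Python (the function names here are mine, not from the slides):

```python
def odds(p):
    """Odds of an event with probability p: P(E) / (1 - P(E))."""
    return p / (1.0 - p)

def prob_from_odds(x, y):
    """If the odds of E are x to y, then P(E) = x / (x + y)."""
    return x / (x + y)

# Odds of 3 to 1 correspond to a probability of 3/4,
# and converting that probability back gives odds of 3.
p = prob_from_odds(3, 1)   # 0.75
o = odds(p)                # 3.0
```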


GLMs

Example - Donner Party - Data

         Age    Sex     Status
    1    23.00  Male    Died
    2    40.00  Female  Survived
    3    40.00  Male    Survived
    4    30.00  Male    Died
    5    28.00  Male    Died
    ...
    43   23.00  Male    Survived
    44   24.00  Male    Died
    45   25.00  Female  Survived


Example - Donner Party - EDA

Status vs. Gender:

                Male   Female
    Died          20        5
    Survived      10       10
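The table already suggests that survival differed sharply by gender; a quick computation of the survival proportions (the dict layout is my own):

```python
# Counts from the Status vs. Gender table above
counts = {
    "Male":   {"Died": 20, "Survived": 10},
    "Female": {"Died": 5,  "Survived": 10},
}

# Proportion surviving within each gender
survival_rate = {
    sex: c["Survived"] / (c["Died"] + c["Survived"])
    for sex, c in counts.items()
}
# Males: 10/30, Females: 10/15
```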


Example - Donner Party - ???

It seems clear that both age and gender have an effect on someone’s survival; how do we come up with a model that will let us explore this relationship?


Even if we set Died to 0 and Survived to 1, this isn’t something we can transform our way out of - we need something more.


Generalized linear models

It turns out that there is a very general way of addressing this type of problem in regression, and the resulting models are called generalized linear models (GLMs). Logistic regression is just one example of this type of model.


All generalized linear models have the following three characteristics:

  1. A probability distribution describing the outcome variable
  2. A linear model: η = β₀ + β₁X₁ + ⋯ + βₙXₙ
  3. A link function that relates the linear model to the parameter of the outcome distribution: g(p) = η, or equivalently p = g⁻¹(η)
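As a concrete sketch of the three components for logistic regression with a single predictor (the coefficient values below are made up for illustration, not fitted estimates):

```python
import math
import random

# 2. The linear model: eta = beta0 + beta1 * age
#    (beta0 and beta1 are hypothetical values for illustration only)
beta0, beta1 = 1.6, -0.08

def linear_predictor(age):
    return beta0 + beta1 * age

# 3. The inverse link maps eta back to a probability; for the logit
#    link, p = g^{-1}(eta) = 1 / (1 + exp(-eta))
def inverse_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

# 1. The outcome distribution: a Bernoulli (binomial with n = 1) draw
#    with success probability p
def simulate_outcome(age, rng=random.Random(42)):
    p = inverse_logit(linear_predictor(age))
    return 1 if rng.random() < p else 0

# With these made-up coefficients, eta = 0 at age 20, so p = 0.5
p20 = inverse_logit(linear_predictor(20.0))
```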

Logistic Regression

Logistic regression is a GLM used to model a binary categorical variable using numerical and categorical predictors.

We assume that a binomial distribution produced the outcome variable, and we therefore want to model p, the probability of success for a given set of predictors.


To finish specifying the logistic model we just need to establish a reasonable link function that connects η to p. There are a variety of options, but the most commonly used is the logit function.

Logit function

    logit(p) = log( p / (1 − p) ),  for 0 < p < 1
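A minimal numeric check of the logit and its inverse (function names are mine): the logit maps a probability in (0, 1) to the whole real line, and its inverse maps back.

```python
import math

def logit(p):
    """Log odds: log(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Inverse of the logit: maps any real eta back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

# logit(0.5) = log(1) = 0, and the two functions undo each other
assert logit(0.5) == 0.0
assert abs(inv_logit(logit(0.3)) - 0.3) < 1e-12
```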