HPG 6104 • Epidemiological Methods II

Logistic Regression in Epidemiology

Week 6 Student Companion Guide

Why Logistic Regression?

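Epidemiologic outcomes are often binary (disease/no disease). Linear regression is a poor fit for such outcomes because its fitted values are unbounded and can fall below 0 or above 1, which are not valid probabilities. Logistic regression instead models the logit (log-odds) of the outcome as a linear function of the predictors, so predicted probabilities always fall between 0 and 1.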
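As a small illustration of why the logit keeps predictions in range, the sketch below uses base R's qlogis() (logit) and plogis() (inverse logit). The probabilities and linear-predictor values are made up for illustration only.

```r
# Hypothetical probabilities, chosen only for illustration
p <- c(0.05, 0.50, 0.95)

# logit: probability -> log-odds (an unbounded scale)
log_odds <- qlogis(p)          # same as log(p / (1 - p))

# inverse logit: any real-valued linear predictor -> probability in (0, 1)
eta   <- c(-10, -2, 0, 2, 10)  # hypothetical values of b0 + b1*X1 + ...
p_hat <- plogis(eta)           # same as 1 / (1 + exp(-eta))

range(p_hat)                   # always strictly between 0 and 1
```

However extreme the linear predictor becomes, the back-transformed probability never leaves the (0, 1) interval.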

Feature	Linear Regression	Logistic Regression
Outcome Type	Continuous (e.g., BP)	Binary (0/1)
Coefficient	Change in mean Y per unit of X	Change in log-odds per unit of X
Effect Measure	Coefficient (β)	Odds Ratio (exp[β])

Interpreting Model Output

# The Equation
logit(p) = β₀ + β₁X₁ + ... + βₖXₖ

# Getting the OR
OR = exp(β₁)
95% CI = exp(β₁ ± 1.96 × SE(β₁))
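A worked example of the two formulas above may help. The coefficient and standard error here are hypothetical numbers, not output from any real model.

```r
# Hypothetical estimates, chosen only to illustrate the arithmetic
beta1 <- 0.405   # log-odds change per one-unit increase in X1
se1   <- 0.150   # standard error of beta1

z  <- qnorm(0.975)                       # about 1.96 for a 95% interval
or <- exp(beta1)                         # odds ratio
ci <- exp(beta1 + c(-1, 1) * z * se1)    # exponentiate the limits, not the SE

round(c(OR = or, lower = ci[1], upper = ci[2]), 2)
```

With these inputs the OR is roughly 1.50 with a 95% CI of about 1.12 to 2.01. The interval is built on the log-odds scale and then exponentiated, which is why it is not symmetric around the OR.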

OR < 1

Inverse association (often described as protective). Each one-unit increase in X multiplies the odds of the outcome by a factor below 1, holding the other predictors constant.

OR = 1

No association. The odds of the outcome are the same at every level of X.

OR > 1

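Positive association (higher odds). Each one-unit increase in X multiplies the odds of the outcome by a factor above 1, holding the other predictors constant. The OR describes odds, not risk; it approximates the risk ratio only when the outcome is rare.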
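Because the OR is defined per one-unit increase in X, it is often rescaled to a more meaningful increment for continuous exposures. The sketch below assumes a hypothetical age coefficient to show that the OR for a c-unit increase is exp(c × β).

```r
# Hypothetical coefficient: log-odds change per 1-year increase in age
beta_age <- 0.03

or_per_1  <- exp(beta_age)        # OR for a 1-year increase
or_per_10 <- exp(10 * beta_age)   # OR for a 10-year increase

# Multiplicative scale: the 10-year OR is the 1-year OR raised to the 10th power
all.equal(or_per_10, or_per_1^10)
```

The effect compounds multiplicatively, so the 10-year OR is not 10 times the 1-year OR.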

Assumptions Checklist

Binary Outcome: The dependent variable must be strictly dichotomous (0 or 1).

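Independence of Observations: Observations must be independent of one another, with no clustering (e.g., patients within clinics or repeated measures on the same person). If the data are nested, use a method that accounts for the correlation, such as a mixed model.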

Linearity in the Logit: Each continuous predictor must have a linear relationship with the log-odds of the outcome, not with the probability itself. Check with the Box-Tidwell test.

Adequate Sample Size (EPV Rule): Aim for at least 10 outcome events per predictor variable (EPV) in the model, the so-called Rule of 10.
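Two of the checks above can be sketched in base R. Everything here is an assumption for illustration: the data are simulated, epv() is a hypothetical helper, "events" is taken to mean the less frequent outcome category, and the linearity check is a Box-Tidwell-style approach (adding an x × log(x) term) rather than a packaged test.

```r
set.seed(6104)  # arbitrary seed so the simulated example is reproducible

# Simulated data (hypothetical): one continuous predictor, linear in the logit
n <- 500
x <- runif(n, min = 20, max = 80)   # must be > 0 for the log term below
y <- rbinom(n, size = 1, prob = plogis(-4 + 0.06 * x))

# --- Events per variable (EPV) ---
# Assumed convention: the less frequent outcome category counts as "events"
epv <- function(outcome, n_predictors) {
  min(sum(outcome == 1), sum(outcome == 0)) / n_predictors
}
epv(y, n_predictors = 1)

# --- Box-Tidwell-style linearity check ---
# Add x * log(x) to the model; a small p-value for that term suggests the
# relationship between x and the log-odds is not linear.
bt_fit <- glm(y ~ x + I(x * log(x)), family = binomial)
bt_p   <- coef(summary(bt_fit))[3, "Pr(>|z|)"]   # row 3 is the x*log(x) term
bt_p
```

For example, a dataset with 45 events and 155 non-events supports about 3 predictors at an EPV of 15. If the x × log(x) term is clearly significant, consider transforming the predictor or modelling it with categories or splines.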

R Implementation

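# 1. Fit the model
model <- glm(diabetes ~ glucose + age + mass,
             data = df,
             family = binomial(link = 'logit'))

# 2. Extract ORs and 95% CIs
# confint() on a glm returns profile-likelihood limits; confint.default()
# returns the Wald limits, exp(β ± 1.96 * SE). The (Intercept) row is the
# baseline odds, not an odds ratio.
exp(cbind(OR = coef(model), confint(model)))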
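To try the workflow end to end without a dataset, the sketch below simulates a data frame with the same column names. The simulated values and coefficients are invented for illustration and do not reflect any real study. It also compares Wald and profile-likelihood intervals and converts the model to a predicted probability for one hypothetical person.

```r
set.seed(1)  # arbitrary seed for reproducibility

# Simulated stand-in for df (all values hypothetical)
n  <- 400
df <- data.frame(
  glucose = rnorm(n, mean = 120, sd = 30),
  age     = round(runif(n, min = 21, max = 70)),
  mass    = rnorm(n, mean = 32, sd = 6)
)
eta <- -8 + 0.04 * df$glucose + 0.02 * df$age + 0.05 * df$mass
df$diabetes <- rbinom(n, size = 1, prob = plogis(eta))

model <- glm(diabetes ~ glucose + age + mass,
             data = df,
             family = binomial(link = "logit"))

# Wald intervals match the hand formula exp(beta +/- 1.96 * SE)
wald_or <- exp(cbind(OR = coef(model), confint.default(model)))

# Profile-likelihood intervals (what confint() gives for a glm)
prof_or <- exp(cbind(OR = coef(model), confint(model)))

# Predicted probability for one hypothetical person
new_person <- data.frame(glucose = 140, age = 50, mass = 30)
p_new <- predict(model, newdata = new_person, type = "response")

wald_or
prof_or
p_new
```

The two sets of intervals are usually close in moderate to large samples. Using type = "response" returns a probability; the default for predict() on a glm returns the linear predictor (log-odds).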