HPG 6104 • Epidemiological Methods II

Logistic Regression in Epidemiology

Week 6 Student Companion Guide

01

Why Logistic Regression?

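Epidemiologic outcomes are often binary (disease vs. no disease). Linear regression is inappropriate for such outcomes because its fitted values are unbounded and can yield predicted probabilities below 0 or above 1. Logistic regression instead models the logit (the log-odds, ln[p / (1 − p)]) as a linear function of the predictors, so predicted probabilities always fall between 0 and 1.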

| Feature | Linear Regression | Logistic Regression |
|---|---|---|
| Outcome Type | Continuous (e.g., BP) | Binary (0/1) |
| Coefficient | Change in mean Y | Change in log-odds |
| Effect Measure | Coefficient (β) | Odds Ratio (exp[β]) |
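
To make the bounded-prediction point concrete, here is a minimal R sketch. The linear-predictor values in `eta` are hypothetical and chosen only to span a wide range; `plogis()` and `qlogis()` are base R's inverse-logit and logit functions.

```r
# Hypothetical linear-predictor values, from very negative to very positive
eta <- c(-10, -2, 0, 2, 10)

# Inverse logit: maps any real number to a probability strictly inside (0, 1)
p <- plogis(eta)
p

# Logit: maps those probabilities back to the log-odds scale
all.equal(qlogis(p), eta)
```

However extreme the linear predictor becomes, the back-transformed probability stays inside (0, 1), which an untransformed linear model cannot guarantee.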
02

Interpreting Model Output

# The Equation
logit(p) = β₀ + β₁X₁ + ... + βₖXₖ

# Getting the OR
OR = exp(β₁)
95% CI = exp(β₁ ± 1.96 * SE(β₁))
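
As a worked illustration of the two formulas above, the sketch below uses a hypothetical coefficient and standard error (`beta1` and `se1` are made-up values, not output from any course dataset).

```r
beta1 <- 0.5   # hypothetical coefficient on the log-odds scale
se1   <- 0.1   # hypothetical standard error of beta1

# Odds ratio: exponentiate the coefficient
or <- exp(beta1)

# 95% CI: build the interval on the log-odds scale, then exponentiate
ci <- exp(beta1 + c(-1, 1) * 1.96 * se1)

round(c(OR = or, lower = ci[1], upper = ci[2]), 2)
# OR = 1.65, 95% CI 1.36 to 2.01
```

The interval is built on the log-odds scale first, so it is not symmetric around the OR after exponentiation.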

OR < 1

Protective association. Each one-unit increase in X decreases the odds of the outcome, holding the other predictors constant.

OR = 1

No association. The odds of the outcome do not change with the exposure. A 95% CI that includes 1 is consistent with no association.

OR > 1

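Increased odds. Each one-unit increase in X increases the odds of the outcome, holding the other predictors constant. The OR approximates the risk ratio only when the outcome is rare.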
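
Because the OR applies per one-unit increase in X, a larger increment is handled by scaling the coefficient before exponentiating. The sketch below assumes a hypothetical OR of 1.05 per year of age; the value is illustrative only.

```r
# Hypothetical coefficient: log-odds change per 1-year increase in age
beta_age <- log(1.05)            # corresponds to OR = 1.05 per year

or_1yr  <- exp(beta_age)         # OR for a 1-year increase
or_10yr <- exp(10 * beta_age)    # OR for a 10-year increase, equal to 1.05^10

round(c(or_1yr, or_10yr), 2)     # 1.05 1.63
```

The 10-year OR is the 1-year OR raised to the tenth power, not ten times the 1-year OR, because effects are additive on the log-odds scale and multiplicative on the odds scale.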

03

Assumptions Checklist

1. Binary Outcome: The dependent variable must be strictly dichotomous (coded 0 or 1).
2. Independence of Observations: Observations must not be clustered (e.g., repeated measures on the same person, or patients within the same clinic). Use mixed models or GEE if data are nested.
3. Linearity in the Logit: Continuous predictors must have a linear relationship with the log-odds, not with the probability. Check with the Box-Tidwell test.
4. Adequate Sample Size (EPV Rule): As a rule of thumb (the "rule of 10"), aim for ≥ 10 outcome events per predictor variable (EPV) in the model, counting events in the less frequent outcome category.
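
Some of these checks can be sketched in a few lines of R. The example below runs on simulated data (`df_sim`, `age`, and `disease` are hypothetical names and values, not the course dataset). It implements the Box-Tidwell idea in its common form, adding an x * log(x) term to the model and examining its coefficient, which requires a strictly positive predictor.

```r
# Simulated data for illustration only
set.seed(6104)
n <- 500
age <- runif(n, 20, 80)                  # continuous, strictly positive predictor
p <- plogis(-4 + 0.06 * age)             # true model is linear in the logit
disease <- rbinom(n, size = 1, prob = p)
df_sim <- data.frame(disease, age)

# Assumption 1: the outcome is strictly 0/1
stopifnot(all(df_sim$disease %in% c(0, 1)))

# Assumption 4: events per variable, using the less frequent outcome class
n_predictors <- 1
epv <- min(table(df_sim$disease)) / n_predictors
epv

# Assumption 3: Box-Tidwell-style check, adding age * log(age) to the model
bt_model <- glm(disease ~ age + I(age * log(age)),
                data = df_sim, family = binomial(link = "logit"))
summary(bt_model)$coefficients
```

A small p-value for the `I(age * log(age))` term would suggest non-linearity in the logit. Independence (assumption 2) is a matter of study design and cannot be verified from a model fit alone.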
04

R Implementation

# 1. Fit the model
model <- glm(diabetes ~ glucose + age + mass,
            data = df,
            family = binomial(link = 'logit'))

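# 2. Extract ORs and 95% CIs
# Note: confint() on a glm returns profile-likelihood intervals;
# confint.default(model) gives Wald intervals (β ± 1.96 * SE).
exp(cbind(OR = coef(model), confint(model)))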
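
The guide does not show how `df` is built, so the following self-contained sketch substitutes simulated data with the same column names. All values and the simulation coefficients are hypothetical. It fits the same model, extracts Wald ORs with `confint.default()`, and computes a predicted probability for one hypothetical individual.

```r
# Simulated stand-in for df (hypothetical values; column names follow the guide)
set.seed(42)
n <- 400
df <- data.frame(
  glucose = rnorm(n, mean = 120, sd = 30),
  age     = round(runif(n, 21, 70)),
  mass    = rnorm(n, mean = 32, sd = 6)
)
lp <- -8 + 0.035 * df$glucose + 0.03 * df$age + 0.06 * df$mass
df$diabetes <- rbinom(n, size = 1, prob = plogis(lp))

# Same model specification as in the guide
model <- glm(diabetes ~ glucose + age + mass,
             data = df, family = binomial(link = "logit"))

# Wald ORs and 95% CIs, built from the coefficient standard errors
wald <- exp(cbind(OR = coef(model), confint.default(model)))
round(wald, 3)

# Predicted probability for a hypothetical individual
new_person <- data.frame(glucose = 140, age = 50, mass = 35)
p_hat <- predict(model, newdata = new_person, type = "response")
p_hat
```

`type = "response"` returns the prediction on the probability scale; the default, `type = "link"`, returns the log-odds.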
© University of Nairobi