Week 2 Companion Handout
A large share of epidemiologic data is categorical (e.g., smoking status, HIV status, vaccination status). Once the data are arranged in a contingency table, we need to choose an appropriate test to assess association.
| Situation | Usual Statistical Approach |
|---|---|
| Two categorical variables, adequate sample size | Chi-square test |
| 2×2 table with small expected counts (< 5 in any cell) | Fisher's exact test |
| Need to control for one categorical confounder using strata | Mantel-Haenszel analysis |
| Multiple confounders, complex modelling, interaction | Logistic regression |
The chi-square test assesses whether there is evidence of an association between two categorical variables. The null hypothesis is that the variables are independent.
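As a rough illustration of the chi-square calculation, the sketch below computes the Pearson statistic for a 2×2 table by hand, using only the Python standard library. The counts are hypothetical, the function name `chi_square_2x2` is made up for this handout, and no continuity correction is applied. In practice a statistics package would normally do this for you.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    # Expected count under independence: row total * column total / n
    expected = [[r * k / n for k in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    # A 2x2 table has 1 degree of freedom; for 1 df the upper-tail
    # chi-square probability equals erfc(sqrt(statistic / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value, expected

# Hypothetical counts: rows = exposed / unexposed, columns = diseased / not diseased
table = [[30, 70], [15, 85]]
statistic, p_value, expected = chi_square_2x2(table)
print(f"chi-square = {statistic:.2f}, p = {p_value:.4f}")
print("smallest expected count:", min(min(row) for row in expected))
```

Printing the smallest expected count is a quick way to check whether the large-sample approximation is reasonable for the table at hand.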
Fisher's exact test is especially useful when sample sizes are small or expected cell counts are low. It does not rely on large-sample approximations.
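To show what "exact" means here, the following sketch computes a two-sided Fisher's exact p-value for a 2×2 table directly from the hypergeometric distribution, with the row and column totals held fixed. The counts are hypothetical and `fisher_exact_2x2` is an illustrative name. The two-sided rule used (sum the probabilities of all tables no more likely than the observed one) is one common convention, assumed here.

```python
from math import comb

def fisher_exact_2x2(table):
    """Two-sided Fisher's exact p-value for a 2x2 table."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # given the observed row and column totals
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)   # smallest possible top-left count
    highest = min(row1, col1)      # largest possible top-left count
    p_observed = prob(a)
    # Sum the probabilities of every table that is as likely as,
    # or less likely than, the observed table (small tolerance for rounding)
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )

# Hypothetical small study: rows = exposed / unexposed, columns = diseased / not diseased
small_table = [[3, 1], [1, 3]]
p_fisher = fisher_exact_2x2(small_table)
print(f"Fisher's exact p = {p_fisher:.4f}")
```

Every expected count in this table is 2, so the chi-square approximation would not be trustworthy, which is the situation Fisher's exact test is meant for.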
A small p-value tells you that data at least as extreme as those observed would be unlikely if the null hypothesis were true. It does not tell you how strong the association is, or whether it is causal. You must look at the odds ratio (OR) for direction and strength!
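A minimal sketch of the odds ratio calculation is given here, with an approximate 95% confidence interval based on the standard error of log(OR) (often called the Woolf method). The counts are hypothetical, `odds_ratio_ci` is an illustrative name, and the formula assumes that no cell is zero.

```python
import math

def odds_ratio_ci(table, z=1.96):
    """Odds ratio with an approximate confidence interval (log method).

    Assumes all four cells are non-zero. z = 1.96 gives roughly 95% coverage.
    """
    (a, b), (c, d) = table
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts: rows = exposed / unexposed, columns = diseased / not diseased
or_table = [[30, 70], [15, 85]]
odds_ratio, ci_lower, ci_upper = odds_ratio_ci(or_table)
print(f"OR = {odds_ratio:.2f} (95% CI {ci_lower:.2f} to {ci_upper:.2f})")
```

An OR above 1 indicates higher odds of disease among the exposed, and below 1 indicates lower odds. The width of the interval shows how precisely the OR is estimated.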
Sometimes an observed crude association is distorted by a third variable (such as age). The Mantel-Haenszel method combines stratum-specific 2×2 tables to produce a single adjusted estimate of association.
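The sketch below illustrates the idea with the Mantel-Haenszel pooled odds ratio, computed as the sum of a·d/n over strata divided by the sum of b·c/n. The two age strata are hypothetical counts constructed to show confounding, and the function names are illustrative.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio across a list of 2x2 tables."""
    numerator = sum(a * d / (a + b + c + d) for (a, b), (c, d) in strata)
    denominator = sum(b * c / (a + b + c + d) for (a, b), (c, d) in strata)
    return numerator / denominator

def crude_or(strata):
    """Odds ratio after collapsing all strata into one 2x2 table."""
    a = sum(t[0][0] for t in strata)
    b = sum(t[0][1] for t in strata)
    c = sum(t[1][0] for t in strata)
    d = sum(t[1][1] for t in strata)
    return (a * d) / (b * c)

# Hypothetical data, one 2x2 table per age stratum.
# Rows = exposed / unexposed, columns = diseased / not diseased.
strata = [
    [[10, 90], [20, 180]],   # younger: stratum-specific OR = 1
    [[80, 120], [20, 30]],   # older:   stratum-specific OR = 1
]
or_crude = crude_or(strata)
or_mh = mantel_haenszel_or(strata)
print(f"crude OR = {or_crude:.2f}, Mantel-Haenszel OR = {or_mh:.2f}")
```

In this made-up example the exposed group is mostly older and older people have more disease, so the crude OR suggests an association that disappears once age is taken into account.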