Lord's paradox

From Wikipedia, the free encyclopedia

In statistics, Lord's paradox raises the issue of when it is appropriate to control for baseline status. In three papers, Frederic M. Lord gave examples in which statisticians could reach different conclusions depending on whether they adjust for pre-existing differences.[1][2][3] Holland & Rubin (1983) use these examples to illustrate that the data may support multiple valid descriptive comparisons, but that causal conclusions require an underlying (untestable) causal model.[4]

Lord's formulation

Sketch of Lord's paradox.

The most famous formulation of Lord's paradox comes from his 1967 paper:[1]

“A large university is interested in investigating the effects on the students of the diet provided in the university dining halls and any sex differences in these effects. Various types of data are gathered. In particular, the weight of each student at the time of his arrival in September and his weight the following June are recorded.” (Lord 1967, p. 304)

In both September and June, the overall distribution of male weights is the same, although individuals' weights have changed, and likewise for the distribution of female weights.

Lord imagines two statisticians who use different common statistical methods but reach opposite conclusions.

One statistician does not adjust for initial weight, instead using analysis of variance (ANOVA) with gain scores (each individual's final weight − initial weight) as the outcome. The first statistician claims no significant difference between the sexes: "[A]s far as these data are concerned, there is no evidence of any interesting effect of diet (or of anything else) on student weights. In particular, there is no evidence of any differential effect on the two sexes, since neither group shows any systematic change." (p. 305) Visually, the first statistician sees that neither group mean ('A' and 'B') has changed, and concludes that the diet had no causal impact.

The second statistician adjusts for initial weight, using analysis of covariance (ANCOVA), and compares (adjusted) final weights as the outcome. He finds a significant difference between the two sexes. Visually, the second statistician fits a regression model (green dotted lines), finds that the intercept differs between males and females, and concludes that the diet had a larger impact on males.
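The disagreement between the two statisticians can be reproduced with simulated data. The following is a minimal sketch, not Lord's own analysis: it assumes normally distributed weights, a linear relationship between September and June weight, and hypothetical parameter values (`mu_f`, `mu_m`, `sigma`, `rho`) chosen only for illustration. Each sex keeps the same weight distribution from September to June while individual weights change.

```python
import numpy as np


def simulate_lord(n=5000, mu_f=60.0, mu_m=75.0, sigma=8.0, rho=0.6, seed=0):
    """Simulate n females and n males (all parameter values are hypothetical).

    Within each sex, June weight has the same mean and variance as
    September weight and correlates with it at level rho.
    """
    rng = np.random.default_rng(seed)
    male = np.repeat([0.0, 1.0], n)            # 0 = female, 1 = male
    mu = np.where(male == 1.0, mu_m, mu_f)     # group mean for each student
    sept = mu + sigma * rng.standard_normal(2 * n)
    # Noise variance is chosen so that Var(june) == Var(sept) == sigma**2.
    noise = sigma * np.sqrt(1.0 - rho**2) * rng.standard_normal(2 * n)
    june = mu + rho * (sept - mu) + noise
    return male, sept, june


def gain_score_difference(male, sept, june):
    """Statistician 1: difference in mean gain (June - September), males minus females."""
    gain = june - sept
    return gain[male == 1.0].mean() - gain[male == 0.0].mean()


def ancova_sex_coefficient(male, sept, june):
    """Statistician 2: coefficient on sex when regressing June weight on
    an intercept, September weight, and the sex indicator."""
    design = np.column_stack([np.ones_like(sept), sept, male])
    beta, *_ = np.linalg.lstsq(design, june, rcond=None)
    return beta[2]


if __name__ == "__main__":
    male, sept, june = simulate_lord()
    print("gain-score difference:", gain_score_difference(male, sept, june))
    print("ANCOVA sex coefficient:", ancova_sex_coefficient(male, sept, june))
```

Under these assumptions the gain-score difference should be close to zero, while the ANCOVA coefficient should be close to `(mu_m - mu_f) * (1 - rho)`, which is 6 for the default values. Both numbers describe the same data; they answer different questions.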

Responses

There have been many attempts to resolve and interpret the paradox, and to relate it to other statistical paradoxes. Although the example was initially framed as a paradox, later authors have used it to clarify the importance of untestable assumptions in causal inference.

Importance of modeling assumptions

Bock (1975)

Bock responded to the paradox by positing that both statisticians in the scenario are correct once the question being asked is clarified. The first statistician (who compares group means and distributions) is asking "are there differences in average weight gain?", whereas the second is asking "what are the differences in individual weight gain?"[5]

Cox and McCullagh (1982)

Cox and McCullagh interpret the problem by constructing a model of what could have happened had the students not eaten in the dining hall, assuming that a student's weight would then have stayed constant. They conclude that the first statistician was right when asking about group differences, while the second was right when asking about the effect on an individual.[5]

Holland and Rubin (1983)

Holland & Rubin (1983)[4] argue that both statisticians have captured accurate descriptive features of the data: Statistician 1 accurately finds no difference in average weight change between the two sexes, while Statistician 2 accurately finds a larger average weight gain for males, conditional on a male and a female having the same starting weight. However, when turning these descriptions into causal statements, they implicitly assert that weight would otherwise have stayed constant (Statistician 1) or that it would have followed the posited linear model (Statistician 2).

“In summary, we believe that the following views resolve Lord's Paradox. If both statisticians made only descriptive statements, they would both be correct. Statistician 1 makes the unconditional descriptive statements that the average weight gains for males and females are equal; Statistician 2 makes the conditional (on X) statement that for males and females of equal September weight, the males gain more than the females. In contrast, if the statisticians turned these descriptive statements into causal statements, neither would be correct or incorrect because untestable assumptions determine the correctness of causal statements... Statistician 1 is wrong because he makes a causal statement without specifying the assumption needed to make it true. Statistician 2 is more cautious, since he makes only a descriptive statement. However, unless he too makes further assumptions, his descriptive statement is completely irrelevant to the campus dietician's interest in the effect of the dining hall diet.” (pg. 19)

Moreover, the underlying assumptions necessary to turn descriptive statements into causal statements are untestable. Unlike descriptive statements (e.g. "the average height in the US is X"), causal statements involve a comparison between what happened and what would have happened absent an intervention. The latter is unobservable in the real world, a fact that Holland & Rubin term "the fundamental problem of causal inference" (pg. 10). This explains why researchers often turn to experiments: while we still never observe both potential outcomes for a single subject, experiments let us make statistical claims about these differences in the population under minimal assumptions. Absent an experiment, modelers should carefully describe the model they use to make causal statements and justify that model as strongly as possible.
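The role of randomization can be illustrated with a small simulation. This is a sketch under stated assumptions, not an analysis from Holland & Rubin: the function name, the constant additive effect `true_effect`, and the selection rule are all hypothetical. The simulation generates both potential outcomes for every subject, which is never possible with real data, and then reveals only one of them.

```python
import numpy as np


def compare_assignments(n=20000, true_effect=2.0, seed=1):
    """Return difference-in-means estimates under two assignment rules.

    Both potential outcomes (y0, y1) are simulated for every subject,
    but each estimate uses only the outcome matching the assignment.
    All numeric values are hypothetical.
    """
    rng = np.random.default_rng(seed)
    baseline = rng.normal(70.0, 10.0, n)         # pre-treatment weight
    y0 = baseline + rng.normal(0.0, 2.0, n)      # outcome without treatment
    y1 = y0 + true_effect                        # outcome with treatment

    randomized = rng.random(n) < 0.5             # coin-flip assignment
    self_selected = baseline > 70.0              # assignment depends on baseline

    def diff_in_means(treated):
        observed = np.where(treated, y1, y0)     # only one outcome is seen
        return observed[treated].mean() - observed[~treated].mean()

    return diff_in_means(randomized), diff_in_means(self_selected)


if __name__ == "__main__":
    randomized_estimate, selected_estimate = compare_assignments()
    print("randomized assignment:", randomized_estimate)
    print("baseline-dependent assignment:", selected_estimate)
```

With randomized assignment the estimate should land near `true_effect`. When assignment depends on baseline weight, the same comparison mixes the effect with pre-existing differences, so a causal reading requires further modeling assumptions.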

Initial weight as a mediator

Pearl (2016)[5] elaborates on previous critiques to illustrate why the so-called "paradox" appears surprising at first, and argues that both researchers could actually be measuring valid (but different) causal quantities. To focus on the comparison between boys and girls, he considers sex to be the "treatment" of interest.

First, Pearl describes why the result appears surprising at first: despite a positive association between being male and final weight within each stratum of initial weight, there appears to be no difference when all weight strata are averaged together. The puzzle is therefore closely related to Simpson's paradox, and shares the same resolution: statistical associations may disappear or reverse upon aggregation when strata are of different sizes.

Second, Pearl posits a causal DAG to describe the situation, whereby sex and initial weight both influence final weight.[6] This frames initial weight as a mediating variable: sex influences final weight both through a direct effect and an indirect effect (by influencing initial weight, which then influences final weight). Note that none of these variables are confounders, so controls are not strictly necessary in this model. However, the choice of whether to control for initial weight dictates which effect the researcher is measuring: the first statistician does not control and measures a total effect, while the second does control and measures a direct effect.
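The distinction between total and direct effects can be made concrete with a worked equation. The following sketch assumes a linear version of the DAG, which the source does not state explicitly; S (sex indicator), X (initial weight), Y (final weight) and the coefficients a, b, c are hypothetical symbols introduced here for illustration.

```latex
\begin{align*}
X &= \alpha_X + a\,S + \varepsilon_X, \\
Y &= \alpha_Y + b\,X + c\,S + \varepsilon_Y, \\
\text{direct effect of } S \text{ on } Y &= c, \\
\text{total effect of } S \text{ on } Y &= c + ab, \\
\text{total effect of } S \text{ on the gain } Y - X &= c + ab - a = c - a(1 - b).
\end{align*}
```

Under these assumptions, the first statistician's finding of no sex difference in average gain corresponds to c = a(1 − b), while the second statistician's coefficient on sex, after adjusting for X, estimates c. The two quantities can differ whenever a ≠ 0 and b ≠ 1, so both results can hold in the same data.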

"Cases where total and direct effects differ in sign are commonplace. For example, we are not at all surprised when smallpox inoculation carries risks of fatal reaction, yet reduces overall mortality by eradicating smallpox. The direct effect (fatal reaction) in this case is negative for every stratum of the population, yet the total effect (on mortality) is positive for the population as a whole." (pg 4)

Tu, Gunnell, and Gilthorpe (2008) use a similar causal framework, but counter that the conceptualization of direct and total effect is not the best framework in many cases because there are many different variables that could be controlled for, without an experimental basis that these are separate causal paths.[7]

Relation to other paradoxes

According to Tu, Gunnell, and Gilthorpe, Lord's paradox is the continuous version of Simpson's paradox.[8] Those authors state that Lord's paradox, Simpson's paradox, and the suppression of covariates by uncorrelated predictor variables are all instances of the same phenomenon, which they call the reversal paradox.

Importance

Broadly, the "fundamental problem of causal inference"[4] and related aggregation concepts such as Simpson's paradox play major roles in applied statistics. Lord's paradox and associated analyses provide a powerful teaching tool for understanding these fundamental statistical concepts.

More directly, Lord's Paradox may have implications for both education and health policies that attempt to reward educators or hospitals for the improvements that their children/patients made under their care, which is the basis for No Child Left Behind.[9] It has also been suspected to be at work in studies linking IQ to eye defects.[10]

References

  1. Lord, F. M. (1967). A paradox in the interpretation of group comparisons. Psychological Bulletin, 68, 304–305. doi:10.1037/h0025105
  2. Lord, F. M. (1969). Statistical adjustments when comparing preexisting groups. Psychological Bulletin, 72, 336–337. doi:10.1037/h0028108
  3. Lord, F. M. (1975). Lord's paradox. In S. B. Anderson, S. Ball, R. T. Murphy, & Associates, Encyclopedia of Educational Evaluation (pp. 232–236). San Francisco, CA: Jossey-Bass.
  4. Holland, Paul W.; Rubin, Donald B. (1983). "On Lord's paradox". Principles of modern psychological measurement. pp. 3–25.
  5. Cited in Pearl, Judea (2016-01-01). "Lord's Paradox Revisited – (Oh Lord! Kumbaya!)". Journal of Causal Inference. 4 (2). doi:10.1515/jci-2016-0021. ISSN 2193-3677.
  6. Clark, Michael. "Lord's Paradox". m-clark.github.io. Retrieved 2021-05-16.
  7. From the text: "Whilst the total effect of birth weight on BP is not affected by the numbers of intermediate body size variables in the model, the estimation of 'direct' effect differs when different intermediate variables are adjusted for. Unless there is experimental evidence to support the notion that there are indeed different paths of direct and indirect effects from birth weight to BP, we are cautious of using such terminology to label the results from multiple regression, as with model 3. In other words, to determine whether the unconditional or conditional relationship reflects the true physiological relationship between birth weight and blood pressure, experiments in which birth weight and current weight can be manipulated are required in order to estimate the impact of birth weight on blood pressure." (p. 8) Yu-Kang Tu, David Gunnell, Mark S Gilthorpe. Simpson's Paradox, Lord's Paradox, and Suppression Effects are the same phenomenon – the reversal paradox. Emerg Themes Epidemiol. 2008; 5: 2. doi:10.1186/1742-7622-5-2 PMC 2254615
  8. Yu-Kang Tu, David Gunnell, Mark S Gilthorpe. Simpson's Paradox, Lord's Paradox, and Suppression Effects are the same phenomenon – the reversal paradox. Emerg Themes Epidemiol. 2008; 5: 2. doi:10.1186/1742-7622-5-2 PMC 2254615
  9. DB Rubin, EA Stuart, EL Zanutto. A potential outcomes view of value-added assessment in education. Journal of Educational and Behavioral Statistics, Vol. 29, No. 1, Value-Added Assessment Special Issue (Spring, 2004), pp. 103–116. doi:10.3102/10769986029001103
  10. Sorjonen, K. et al. Refractive state, intelligence, education, and Lord's paradox. Intelligence 61, March–April 2017, 115–119. doi:10.1016/j.intell.2017.01.011