Law of the unconscious statistician

In probability theory and statistics, the law of the unconscious statistician, or LOTUS, is a theorem used to calculate the expected value of a function g(X) of a random variable X when one knows the probability distribution of X but does not know the distribution of g(X). The form of the law can depend on the form in which one states the probability distribution of the random variable X. If it is a discrete distribution and one knows its probability mass function f_X (but not f_{g(X)}), then the expected value of g(X) is

\[ \operatorname{E}[g(X)] \;=\; \sum_x g(x)\, f_X(x), \]
where the sum is over all possible values x of X. If it is a continuous distribution and one knows its probability density function f_X (but not f_{g(X)}), then the expected value of g(X) is

\[ \operatorname{E}[g(X)] \;=\; \int_{-\infty}^{\infty} g(x)\, f_X(x)\,\mathrm{d}x. \]
If one knows the cumulative distribution function F_X (but not F_{g(X)}), then the expected value of g(X) is given by a Riemann–Stieltjes integral

\[ \operatorname{E}[g(X)] \;=\; \int_{-\infty}^{\infty} g(x)\,\mathrm{d}F_X(x) \]
(again assuming X is real-valued).[1][2][3][4]
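
For instance, the discrete form can be verified numerically. The short Python sketch below (with a probability mass function and a function g invented purely for illustration) computes E[g(X)] by LOTUS and compares it with a direct Monte Carlo estimate:

    import random

    # Hypothetical discrete distribution of X, given by its pmf f_X.
    pmf = {-1: 0.2, 0: 0.5, 2: 0.3}

    def g(x):
        return x ** 2  # the function whose expectation we want

    # LOTUS: E[g(X)] = sum over x of g(x) * f_X(x),
    # computed without ever finding the distribution of g(X).
    lotus = sum(g(x) * p for x, p in pmf.items())

    # Monte Carlo check: sample X directly and average g(X).
    values, probs = zip(*pmf.items())
    samples = random.choices(values, weights=probs, k=200_000)
    mc = sum(g(x) for x in samples) / len(samples)

    print(lotus)  # 0.2*1 + 0.5*0 + 0.3*4 = 1.4
    print(mc)     # approximately 1.4

Both numbers agree up to simulation error, even though the distribution of g(X) is never written down.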

Etymology

This proposition is known as the law of the unconscious statistician because of a purported tendency to use the identity without realizing that it must be treated as the result of a rigorously proved theorem, not merely a definition.[4]

Joint distributions

A similar property holds for joint distributions. For discrete random variables X and Y, a function of two variables g, and joint probability mass function f(x, y):[5]

\[ \operatorname{E}[g(X,Y)] \;=\; \sum_y \sum_x g(x,y)\, f(x,y). \]
In the absolutely continuous case, with f(x, y) being the joint probability density function,

\[ \operatorname{E}[g(X,Y)] \;=\; \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x,y)\, f(x,y)\,\mathrm{d}x\,\mathrm{d}y. \]
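
The bivariate discrete form can be illustrated in the same way; the joint probability mass function below is invented purely for demonstration:

    # Hypothetical joint pmf of (X, Y); the probabilities sum to 1.
    joint_pmf = {
        (0, 0): 0.1, (0, 1): 0.2,
        (1, 0): 0.3, (1, 1): 0.4,
    }

    def g(x, y):
        return x * y + 1

    # Bivariate LOTUS: E[g(X, Y)] = sum over (x, y) of g(x, y) * f(x, y).
    expectation = sum(g(x, y) * p for (x, y), p in joint_pmf.items())
    print(expectation)  # 0.1*1 + 0.2*1 + 0.3*1 + 0.4*2 = 1.4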

Proof

This law is not a trivial result of definitions as it might at first appear, but rather must be proved.[5][6][7]

Continuous case

For a continuous random variable X, let Y = g(X), and suppose that g is differentiable and that its inverse g^{-1} is monotonic. By the formula for inverse functions and differentiation,

\[ \frac{\mathrm{d}}{\mathrm{d}y}\left[g^{-1}(y)\right] \;=\; \frac{1}{g'\!\left(g^{-1}(y)\right)}. \]

Because x = g^{-1}(y),

\[ \mathrm{d}x \;=\; \frac{1}{g'\!\left(g^{-1}(y)\right)}\,\mathrm{d}y. \]

So that by a change of variables,

\[ \int_{-\infty}^{\infty} g(x)\, f_X(x)\,\mathrm{d}x \;=\; \int g\!\left(g^{-1}(y)\right) f_X\!\left(g^{-1}(y)\right)\, \frac{1}{g'\!\left(g^{-1}(y)\right)}\,\mathrm{d}y \;=\; \int y\, f_X\!\left(g^{-1}(y)\right)\, \frac{1}{g'\!\left(g^{-1}(y)\right)}\,\mathrm{d}y. \]

Now, notice that the cumulative distribution function satisfies F_Y(y) = P(Y ≤ y) = P(g(X) ≤ y) = P(X ≤ g^{-1}(y)) (taking g to be increasing; the decreasing case is analogous), so that F_Y(y) = F_X(g^{-1}(y)). Then, by the chain rule,

\[ f_Y(y) \;=\; \frac{\mathrm{d}}{\mathrm{d}y} F_X\!\left(g^{-1}(y)\right) \;=\; f_X\!\left(g^{-1}(y)\right)\, \frac{1}{g'\!\left(g^{-1}(y)\right)}. \]

Combining these expressions, we find

\[ \int_{-\infty}^{\infty} g(x)\, f_X(x)\,\mathrm{d}x \;=\; \int_{-\infty}^{\infty} y\, f_Y(y)\,\mathrm{d}y. \]

By the definition of expected value,

\[ \operatorname{E}[g(X)] \;=\; \operatorname{E}[Y] \;=\; \int_{-\infty}^{\infty} y\, f_Y(y)\,\mathrm{d}y \;=\; \int_{-\infty}^{\infty} g(x)\, f_X(x)\,\mathrm{d}x. \]
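
As a numerical sanity check of this argument (a minimal sketch, assuming for illustration that X is Exponential(1) and g(x) = √x, which is monotonic and differentiable on the support of X), the code below evaluates the LOTUS integral against f_X and the integral of y against the density f_Y obtained from the change-of-variables formula above:

    import numpy as np
    from scipy.integrate import quad

    # Hypothetical setup: X ~ Exponential(1), g(x) = sqrt(x).
    f_X = lambda x: np.exp(-x)            # density of X
    g = lambda x: np.sqrt(x)
    g_inv = lambda y: y ** 2              # inverse of g
    g_prime = lambda x: 0.5 / np.sqrt(x)  # derivative of g

    # LOTUS side: E[g(X)] = integral of g(x) f_X(x) dx; f_Y is never needed.
    lhs, _ = quad(lambda x: g(x) * f_X(x), 0, np.inf)

    # Change-of-variables side: f_Y(y) = f_X(g^{-1}(y)) / g'(g^{-1}(y)),
    # then E[Y] = integral of y f_Y(y) dy.
    f_Y = lambda y: f_X(g_inv(y)) / g_prime(g_inv(y))
    rhs, _ = quad(lambda y: y * f_Y(y), 0, np.inf)

    print(lhs, rhs)  # both approximately sqrt(pi)/2 = 0.8862...

Both integrals return the same value, as the proof predicts.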

Discrete case

Let Y = g(X). Then begin with the definition of expected value:

\[ \operatorname{E}[Y] \;=\; \sum_y y \Pr(Y = y) \;=\; \sum_y y \sum_{x:\, g(x) = y} \Pr(X = x) \;=\; \sum_y \sum_{x:\, g(x) = y} g(x) \Pr(X = x) \;=\; \sum_x g(x) \Pr(X = x), \]

where the final equality holds because the sets {x : g(x) = y}, as y ranges over the possible values of Y, partition the possible values of X.
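
To see the regrouping in a concrete case (a made-up example, not taken from the cited sources): let X be uniform on {−1, 0, 1} and g(x) = x², so Y = g(X) equals 0 with probability 1/3 and 1 with probability 2/3. Summing over the values of Y or over the values of X gives the same result:

\[ \operatorname{E}[g(X)] \;=\; \underbrace{0 \cdot \tfrac{1}{3}}_{y = 0} + \underbrace{1 \cdot \tfrac{2}{3}}_{y = 1} \;=\; \tfrac{2}{3} \;=\; (-1)^2 \cdot \tfrac{1}{3} + 0^2 \cdot \tfrac{1}{3} + 1^2 \cdot \tfrac{1}{3}. \]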

From measure theory

A technically complete derivation of the result is available using arguments in measure theory, in which the probability space of a transformed random variable g(X) is related to that of the original random variable X. The steps here involve defining a pushforward measure for the transformed space, and the result is then an example of a change of variables formula:[5]

\[ \operatorname{E}[g(X)] \;=\; \int_{\Omega} g(X(\omega))\,\mathrm{d}\operatorname{P}(\omega) \;=\; \int_{\mathbb{R}} g(x)\,\mathrm{d}\operatorname{P}_X(x), \]

where P_X = P ∘ X^{-1} is the pushforward measure, that is, the distribution of X.

We say X has a density if its distribution P_X is absolutely continuous with respect to the Lebesgue measure λ. In that case

\[ \mathrm{d}\operatorname{P}_X \;=\; f_X\,\mathrm{d}\lambda, \]

where f_X is the density (see Radon–Nikodym derivative). So the above can be rewritten as the more familiar

\[ \operatorname{E}[g(X)] \;=\; \int_{-\infty}^{\infty} g(x)\, f_X(x)\,\mathrm{d}x. \]
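
The two sides of this change-of-variables formula can also be compared numerically; the sketch below assumes, purely for illustration, that the underlying probability space is Ω = [0, 1] with Lebesgue measure and that X(ω) = −log(1 − ω), the inverse-CDF construction of an Exponential(1) random variable:

    import numpy as np
    from scipy.integrate import quad

    g = lambda x: x ** 2              # a measurable function with finite expectation
    X = lambda w: -np.log(1.0 - w)    # X(omega) on Omega = [0, 1]
    f_X = lambda x: np.exp(-x)        # density of the pushforward measure P_X on R

    # Integral of g(X(omega)) over the underlying probability space Omega:
    lhs, _ = quad(lambda w: g(X(w)), 0, 1)

    # Integral of g(x) against the density of the pushforward measure on R:
    rhs, _ = quad(lambda x: g(x) * f_X(x), 0, np.inf)

    print(lhs, rhs)  # both approximately 2, the second moment of Exponential(1)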

References

  1. Eric Key (1998). "Lecture 6: Random variables". Lecture notes, University of Leeds. Archived 2009-02-15 at the Wayback Machine.
  2. Bengt Ringner (2009). "Law of the unconscious statistician". Unpublished note, Centre for Mathematical Sciences, Lund University.
  3. Blitzstein, Joseph K.; Hwang, Jessica (2014). Introduction to Probability (1st ed.). Chapman and Hall. p. 156.
  4. DeGroot, Morris; Schervish, Mark (2014). Probability and Statistics (4th ed.). Pearson Education Limited. p. 213.
  5. Ross, Sheldon M. (2010). Introduction to Probability Models (10th ed.). Elsevier, Inc.
  6. Virtual Laboratories in Probability and Statistics, Sect. 3.1 "Expected Value: Definition and Properties", item "Basic Results: Change of Variables Theorem".
  7. Rumbos, Adolfo J. (2008). "Probability Lecture Notes" (PDF). Retrieved 6 November 2018.