Ridge regression

Ridge regression is a method of estimating the coefficients of multiple-regression models in scenarios where the independent variables are highly correlated.^[1] It has been used in many fields including econometrics, chemistry, and engineering.^[2]

The theory was first introduced by Hoerl and Kennard in 1970 in their Technometrics papers “Ridge Regression: Biased Estimation for Nonorthogonal Problems” and “Ridge Regression: Applications to Nonorthogonal Problems”.^[3]^[4]^[1] This was the result of ten years of research into the field of ridge analysis.^[5]

Ridge regression was developed as a possible solution to the imprecision of least squares estimators when linear regression models have some multicollinear (highly correlated) independent variables. The ridge regression (RR) estimator gives a more precise estimate of the parameters, as its variance and mean squared error are often smaller than those of the least squares estimators previously derived.^[6]^[2]

Mathematical details

In standard linear regression, an ${\textstyle n\times 1}$ column vector ${\textstyle y}$ is to be projected onto the column space of the ${\textstyle n\times p}$ design matrix ${\textstyle X}$ (typically ${\textstyle p\ll n}$ ) whose columns are highly correlated. The ordinary least squares estimator of the coefficients ${\textstyle \beta \in \mathbb {R} ^{p\times 1}}$ by which the columns are multiplied to get the orthogonal projection ${\textstyle X\beta }$ is

$${\widehat {\beta }}=(X^{T}X)^{-1}X^{T}y$$

(where ${\textstyle X^{T}}$ is the transpose of ${\textstyle X}$ ).
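The least squares formula can be illustrated with a minimal NumPy sketch. The helper name `ols_estimate` and the toy data are assumptions made for illustration and do not come from the cited sources. The sketch solves the normal equations ${\textstyle (X^{T}X)\beta =X^{T}y}$ rather than forming the inverse explicitly, which gives the same estimator but is usually the more stable numerical route.

```python
import numpy as np

def ols_estimate(X, y):
    """Ordinary least squares: solve (X^T X) beta = X^T y for beta."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Solving the linear system is equivalent to applying (X^T X)^{-1} X^T y,
    # without computing the inverse explicitly.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical toy data: n = 5 observations, p = 2 nearly collinear columns.
X = np.array([[1.0, 1.01],
              [2.0, 1.98],
              [3.0, 3.02],
              [4.0, 3.99],
              [5.0, 5.01]])
y = np.array([2.0, 4.1, 5.9, 8.0, 10.1])

beta_ols = ols_estimate(X, y)

# Condition number of X^T X; a large value signals highly correlated columns.
cond_ols = np.linalg.cond(X.T @ X)
print("OLS coefficients:", beta_ols)
print("Condition number of X^T X:", cond_ols)
```

Because the two columns of this toy matrix are almost identical, the condition number is large, which is the situation in which the least squares coefficients become imprecise.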

By contrast, the ridge regression estimator is

$${\widehat {\beta }}_{\text{ridge}}=(X^{T}X+kI_{p})^{-1}X^{T}y$$

where ${\textstyle I_{p}}$ is the ${\textstyle p\times p}$ identity matrix and ${\textstyle k>0}$ is small. The name 'ridge' refers to the shape along the diagonal of ${\textstyle I_{p}}$.
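As a rough sketch of the ridge formula, the following NumPy code computes ${\textstyle (X^{T}X+kI_{p})^{-1}X^{T}y}$ on a small, nearly collinear design matrix. The function name `ridge_estimate`, the toy data, and the value `k = 0.1` are all assumptions chosen for illustration; the text above does not prescribe how to choose ${\textstyle k}$.

```python
import numpy as np

def ridge_estimate(X, y, k):
    """Ridge estimator: solve (X^T X + k I_p) beta = X^T y for beta."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    p = X.shape[1]
    # Adding k to every diagonal entry of X^T X is the only change
    # relative to the ordinary least squares normal equations.
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

# Hypothetical toy data: n = 5 observations, p = 2 nearly collinear columns.
X = np.array([[1.0, 1.01],
              [2.0, 1.98],
              [3.0, 3.02],
              [4.0, 3.99],
              [5.0, 5.01]])
y = np.array([2.0, 4.1, 5.9, 8.0, 10.1])

k = 0.1  # illustrative value only, not a recommended setting

beta_ridge = ridge_estimate(X, y, k)
beta_ols = ridge_estimate(X, y, 0.0)  # k = 0 reduces to the least squares formula

# Compare the conditioning of the matrix being inverted in each case.
cond_before = np.linalg.cond(X.T @ X)
cond_after = np.linalg.cond(X.T @ X + k * np.eye(X.shape[1]))

print("OLS coefficients:  ", beta_ols)
print("Ridge coefficients:", beta_ridge)
print("Condition number without / with k:", cond_before, cond_after)
```

In this sketch the added diagonal term lowers the condition number of the matrix being inverted, and the ridge coefficient vector has a smaller Euclidean norm than the least squares one, which reflects the shrinkage that trades a small bias for lower variance.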

References

[1] Hilt, Donald E.; Seegrist, Donald W. (1977). "Ridge, a computer program for calculating ridge regression estimates".
[2] Gruber, Marvin (26 February 1998). Improving Efficiency by Shrinkage: The James–Stein and Ridge Regression Estimators. ISBN 9780824701567.
[3] Hoerl, Arthur E., and Robert W. Kennard. “Ridge Regression: Biased Estimation for Nonorthogonal Problems.” Technometrics, vol. 12, no. 1, 1970, pp. 55–67. JSTOR: www.jstor.org/stable/1267351. Accessed 13 March 2021.
[4] Hoerl, Arthur E., and Robert W. Kennard. “Ridge Regression: Applications to Nonorthogonal Problems.” Technometrics, vol. 12, no. 1, 1970, pp. 69–82. JSTOR: www.jstor.org/stable/1267352. Accessed 13 March 2021.
[5] Beck, James Vere; Arnold, Kenneth J. (1977). Parameter Estimation in Engineering and Science. ISBN 9780471061182.
[6] Jolliffe, I. T. (9 May 2006). Principal Component Analysis. ISBN 9780387224404.
