Tukey's range test

Tukey's range test, also known as Tukey's test, Tukey method, Tukey's honest significance test, or Tukey's HSD (honestly significant difference) test,^[1] is a single-step multiple comparison procedure and statistical test. It can be used to find means that are significantly different from each other.

Named after John Tukey,^[2] it compares all possible pairs of means and is based on a studentized range distribution (q), which is similar to the distribution of t from the t-test (see below).^[3]

Tukey's test compares the means of every treatment to the means of every other treatment; that is, it applies simultaneously to the set of all pairwise comparisons

$$\mu_i - \mu_j$$

and identifies any difference between two means that is greater than the expected standard error. The confidence coefficient for the set, when all sample sizes are equal, is exactly $1-\alpha$ for any $0\leq \alpha \leq 1$. For unequal sample sizes, the confidence coefficient is greater than $1-\alpha$. In other words, the Tukey method is conservative when there are unequal sample sizes.

A common mistaken belief is that Tukey's HSD test should only be used following a significant ANOVA. The ANOVA is not necessary because the Tukey test controls the Type I error rate on its own.

Assumptions

The observations being tested are independent within and among the groups.
The groups associated with each mean in the test are normally distributed.
There is equal within-group variance across the groups associated with each mean in the test (homogeneity of variance).

The test statistic

Tukey's test is based on a formula very similar to that of the $t$-test. In fact, Tukey's test is essentially a $t$-test, except that it corrects for family-wise error rate.

The formula for Tukey's test is:

$$q_s = \frac{Y_A - Y_B}{SE},$$

where $Y_A$ is the larger of the two means being compared, $Y_B$ is the smaller of the two means being compared, and $SE$ is the standard error of the sum of the means.

This $q_s$ value can then be compared to a $q$ value from the studentized range distribution. If the $q_s$ value is larger than the critical value $q_\alpha$ obtained from the distribution, the two means are said to be significantly different at level $\alpha$, where $0\leq \alpha \leq 1$.^[3]
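The comparison described above can be sketched in Python. This is only an illustration under stated assumptions: the group sizes are equal, SE is taken as $\sqrt{MSE/n}$ (with MSE the pooled within-group variance), which is the convention that matches tables of the studentized range tabulated without the $\sqrt{2}$ factor, and the critical value 3.77 is a commonly tabulated figure for α = 0.05, k = 3, df = 12. The function names and the numbers are hypothetical.

```python
import math

def q_statistic(mean_a: float, mean_b: float, mse: float, n: int) -> float:
    """Return q_s = (larger mean - smaller mean) / SE.

    Assumption: SE = sqrt(mse / n), where mse is the pooled within-group
    variance and n is the common group size.
    """
    larger, smaller = max(mean_a, mean_b), min(mean_a, mean_b)
    return (larger - smaller) / math.sqrt(mse / n)

def is_significant(q_s: float, q_crit: float) -> bool:
    """Two means are declared different when q_s exceeds the critical value."""
    return q_s > q_crit

# Hypothetical example: k = 3 groups of n = 5 observations, pooled variance 2.5.
Q_CRIT = 3.77  # assumed table value for alpha = 0.05, k = 3, df = 12
q_far = q_statistic(8.0, 3.0, mse=2.5, n=5)   # means far apart
q_near = q_statistic(4.0, 3.0, mse=2.5, n=5)  # means close together
print(is_significant(q_far, Q_CRIT), is_significant(q_near, Q_CRIT))
```

With these made-up numbers the first pair exceeds the critical value and the second does not.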

Since the null hypothesis for Tukey's test states that all means being compared are from the same population (i.e. $\mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$), the means should be normally distributed (according to the central limit theorem). This gives rise to the normality assumption of Tukey's test.

The studentized range (q) distribution

The Tukey method uses the studentized range distribution. Suppose that we take a sample of size n from each of k populations with the same normal distribution N(μ, σ²), that $\bar{y}_{\min}$ is the smallest of these sample means and $\bar{y}_{\max}$ is the largest, and that S² is the pooled sample variance from these samples. Then the following random variable has a studentized range distribution:

$$q = \frac{\bar{y}_{\max} - \bar{y}_{\min}}{S\sqrt{2/n}}$$

This value of q is the basis of the critical value of q, based on three factors:

α (the Type I error rate, or the probability of rejecting a true null hypothesis)
k (the number of populations)
df (the number of degrees of freedom, N – k, where N is the total number of observations)
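To see how a critical value depends on α, k and df, the distribution of q can be approximated by simulation. The sketch below is a Monte Carlo illustration, not a substitute for published tables: it draws k samples of size n from a standard normal distribution, computes q with the $S\sqrt{2/n}$ denominator used in the formula above, and takes the empirical $(1-\alpha)$ quantile. The function names, the number of draws and the seed are arbitrary assumptions.

```python
import math
import random

def sample_variance(xs):
    """Unbiased sample variance of a list of numbers."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def simulate_q(k, n, rng):
    """One draw of q under the null: k groups of size n from N(0, 1)."""
    groups = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
    means = [sum(g) / n for g in groups]
    # With equal group sizes the pooled variance is the mean of the
    # within-group sample variances.
    pooled_var = sum(sample_variance(g) for g in groups) / k
    return (max(means) - min(means)) / math.sqrt(pooled_var * 2.0 / n)

def estimate_critical_q(k, n, alpha=0.05, draws=20000, seed=0):
    """Empirical (1 - alpha) quantile of simulated q values."""
    rng = random.Random(seed)
    qs = sorted(simulate_q(k, n, rng) for _ in range(draws))
    index = min(int((1.0 - alpha) * draws), draws - 1)
    return qs[index]

q_scaled = estimate_critical_q(k=3, n=5)   # df = N - k = 15 - 3 = 12
q_unscaled = q_scaled * math.sqrt(2.0)     # same quantile without the sqrt(2) factor
print(round(q_scaled, 2), round(q_unscaled, 2))
```

Multiplying the estimate by $\sqrt{2}$ puts it on the scale of tables that omit the $\sqrt{2}$ factor, where the commonly tabulated value for α = 0.05, k = 3, df = 12 is about 3.77; the simulated figure should land near it, up to sampling noise.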

The distribution of q has been tabulated and appears in many textbooks on statistics. In some tables the distribution of q has been tabulated without the $\sqrt{2}$ factor. To determine which convention a given table uses, one can look up the value for k = 2 and compare it to the corresponding value of Student's t-distribution with the same degrees of freedom and the same α. In addition, R offers a cumulative distribution function (ptukey) and a quantile function (qtukey) for q.
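The k = 2 check can be written as a small helper. The sketch below assumes the two-sided critical value of Student's t is the right comparison point (for k = 2 the range of the two means is the absolute difference), and it uses two widely tabulated numbers as inputs: 2.228 for t with 10 degrees of freedom at α = 0.05, and 3.151 for q with k = 2 and df = 10. The function name, the labels it returns and the tolerance are hypothetical.

```python
import math

def classify_q_table(q_k2: float, t_two_sided: float, rel_tol: float = 0.01) -> str:
    """Guess a table's convention from its k = 2 entry.

    "with sqrt(2)":    q uses the denominator S * sqrt(2 / n), so q equals |t|.
    "without sqrt(2)": q uses the denominator S / sqrt(n), so q equals sqrt(2) * |t|.
    """
    if math.isclose(q_k2, t_two_sided, rel_tol=rel_tol):
        return "with sqrt(2)"
    if math.isclose(q_k2, math.sqrt(2.0) * t_two_sided, rel_tol=rel_tol):
        return "without sqrt(2)"
    return "unknown"

# Assumed table entries for alpha = 0.05 and df = 10.
T_CRIT = 2.228      # two-sided Student's t critical value
Q_TABLE_K2 = 3.151  # studentized range entry for k = 2
convention = classify_q_table(Q_TABLE_K2, T_CRIT)
print(convention)
```

A table classified as "without sqrt(2)" supplies values that can be divided by $\sqrt{2}$ before multiplying by a standard error of the form $\widehat{\sigma}\sqrt{2/n}$.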

Confidence limits

The Tukey confidence limits for all pairwise comparisons with confidence coefficient of at least 1 − α are

$$\bar{y}_{i\bullet} - \bar{y}_{j\bullet} \pm \frac{q_{\alpha;k;N-k}}{\sqrt{2}}\,\widehat{\sigma}_{\varepsilon}\sqrt{\frac{2}{n}} \qquad i,j = 1,\ldots,k \quad i \neq j.$$

Notice that the point estimator and the estimated variance are the same as those for a single pairwise comparison. The only difference between the confidence limits for simultaneous comparisons and those for a single comparison is the multiple of the estimated standard deviation.
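As a minimal sketch of these limits for equal group sizes, the Python code below estimates $\widehat{\sigma}_{\varepsilon}$ from the pooled within-group sum of squares and applies the formula to every pair. The data are made up, the function name is hypothetical, and the critical value 3.77 is an assumed table entry for α = 0.05, k = 3, df = 12 taken from a table that omits the $\sqrt{2}$ factor.

```python
import math
from itertools import combinations

def tukey_intervals(groups, q_crit):
    """Simultaneous limits for all pairwise mean differences (equal n only)."""
    sizes = {len(values) for values in groups.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch requires equal group sizes")
    n = sizes.pop()
    k = len(groups)
    means = {name: sum(values) / n for name, values in groups.items()}
    # Pooled within-group sum of squares, with N - k degrees of freedom.
    sse = sum((x - means[name]) ** 2
              for name, values in groups.items() for x in values)
    sigma_hat = math.sqrt(sse / (n * k - k))
    half_width = q_crit / math.sqrt(2) * sigma_hat * math.sqrt(2 / n)
    return {
        (a, b): (means[a] - means[b] - half_width,
                 means[a] - means[b] + half_width)
        for a, b in combinations(groups, 2)
    }

# Hypothetical data: three groups of five observations each.
data = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 3, 4, 5, 6],
    "C": [6, 7, 8, 9, 10],
}
intervals = tukey_intervals(data, q_crit=3.77)
for pair, (low, high) in intervals.items():
    print(pair, round(low, 3), round(high, 3))
```

An interval that excludes zero corresponds to a pair of means that differ significantly at level α. In this made-up data set the A and B interval covers zero, while both intervals involving C do not.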

Also note that the sample sizes must be equal when using the studentized range approach, and that $\widehat{\sigma}_{\varepsilon}$ is the standard deviation of the entire design, not just that of the two groups being compared. It is nonetheless possible to work with unequal sample sizes. In this case, one has to calculate the estimated standard deviation for each pairwise comparison, as formalized by Clyde Kramer in 1956; the procedure for unequal sample sizes is therefore sometimes referred to as the Tukey–Kramer method, which is as follows:

$$\bar{y}_{i\bullet} - \bar{y}_{j\bullet} \pm \frac{q_{\alpha;k;N-k}}{\sqrt{2}}\,\widehat{\sigma}_{\varepsilon}\sqrt{\frac{1}{n_i} + \frac{1}{n_j}}$$

where $n_i$ and $n_j$ are the sizes of groups i and j respectively. The degrees of freedom for the whole design are also applied.
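The only change from the equal-size case is the term under the square root. The short Python sketch below computes the Tukey–Kramer half-width for one pair and compares it with the equal-size formula. The function names and all numbers are illustrative assumptions: a pooled variance of 2.5, and 3.77 as the assumed critical value for α = 0.05, k = 3, df = 12.

```python
import math

def tukey_kramer_half_width(q_crit, sigma_hat, n_i, n_j):
    """Half-width of the limits for groups of sizes n_i and n_j."""
    return q_crit / math.sqrt(2) * sigma_hat * math.sqrt(1 / n_i + 1 / n_j)

def equal_n_half_width(q_crit, sigma_hat, n):
    """Half-width of the limits when every group has size n."""
    return q_crit / math.sqrt(2) * sigma_hat * math.sqrt(2 / n)

Q_CRIT = 3.77                 # assumed table value, df = N - k = 12
SIGMA_HAT = math.sqrt(2.5)    # assumed pooled standard deviation

balanced = tukey_kramer_half_width(Q_CRIT, SIGMA_HAT, 5, 5)
unbalanced = tukey_kramer_half_width(Q_CRIT, SIGMA_HAT, 4, 6)
print(round(balanced, 4), round(unbalanced, 4))
```

When $n_i = n_j = n$ the Tukey–Kramer expression reduces to the equal-size formula, since $1/n + 1/n = 2/n$. For the same total of ten observations, the unbalanced split gives a slightly wider interval.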

Notes

[1] Lowry, Richard. "One Way ANOVA – Independent Samples". Vassar.edu. Archived from the original on October 17, 2008. Retrieved December 4, 2008. Also occasionally as "honestly," see e.g. Morrison, S.; Sosnoff, J. J.; Heffernan, K. S.; Jae, S. Y.; Fernhall, B. (2013). "Aging, hypertension and physiological tremor: The contribution of the cardioballistic impulse to tremorgenesis in older adults". Journal of the Neurological Sciences. 326 (1–2): 68–74. doi:10.1016/j.jns.2013.01.016.
[2] Tukey, John (1949). "Comparing Individual Means in the Analysis of Variance". Biometrics. 5 (2): 99–114. JSTOR 3001913.
[3] Linton, L.R.; Harder, L.D. (2007). Biology 315 – Quantitative Biology Lecture Notes. University of Calgary, Calgary, AB.

External links

NIST/SEMATECH e-Handbook of Statistical Methods: Tukey's method
