## Methodology and Application of Oneway ANOVA

**Eva Ostertagová**^{1}, **Oskar Ostertag**^{2}

^{1}Department of Mathematics and Theoretical Informatics, Faculty of Electrical Engineering and Informatics, Technical University of Košice, Nemcovej 32, 042 00 Košice, Slovak Republic

^{2}Department of Applied Mechanics and Mechatronics, Faculty of Mechanical Engineering, Technical University of Košice, Letná 9, 042 00 Košice, Slovak Republic

### Abstract

This paper describes one-way ANOVA, a powerful statistical technique that can be used in many engineering and manufacturing applications, and presents an example of its application. The technique analyzes the variability in data in order to infer whether the population means differ. The application data were analyzed in MATLAB, which performs these calculations.

**Keywords:** one-way ANOVA test, normality tests, homoscedasticity tests, multiple comparison tests, MATLAB

*American Journal of Mechanical Engineering*, 2013, 1 (7), pp. 256-261.

DOI: 10.12691/ajme-1-7-21

Received October 15, 2013; Revised October 28, 2013; Accepted November 13, 2013

**Copyright:** © 2013 Science and Education Publishing. All Rights Reserved.

### Cite this article:

- Ostertagová, Eva, and Oskar Ostertag. "Methodology and Application of Oneway ANOVA." *American Journal of Mechanical Engineering* 1.7 (2013): 256-261.

- Ostertagová, E., & Ostertag, O. (2013). Methodology and Application of Oneway ANOVA. *American Journal of Mechanical Engineering*, *1*(7), 256-261.

- Ostertagová, Eva, and Oskar Ostertag. "Methodology and Application of Oneway ANOVA." *American Journal of Mechanical Engineering* 1, no. 7 (2013): 256-261.

### 1. Introduction

Analysis of variance (ANOVA) is a statistical procedure concerned with comparing the means of several samples. It can be thought of as an extension of the *t*-test for two independent samples to more than two groups. The purpose is to test for significant differences between class means, and this is done by analyzing the variances.

The ANOVA test of the hypothesis is based on a comparison of two independent estimates of the population variance ^{[3]}.

When performing an ANOVA procedure the following assumptions are required:

- The observations are independent of one another.

- The observations in each group come from a normal distribution.

- The population variances in each group are the same (homoscedasticity).

ANOVA is one of the most commonly quoted advanced research methods in the professional business and economic literature. The technique is particularly useful for interpreting experimental outcomes and for determining the influence of some factors on other processing parameters.

The original ideas of analysis of variance were developed by the English statistician Sir Ronald A. Fisher (1890-1962) in his book “Statistical Methods for Research Workers” (1925). Much of the early work in this area dealt with agricultural experiments ^{[1]}.

### 2. Oneway ANOVA Test Procedure

The simplest case is one-way ANOVA. A one-way analysis of variance is used when the data are divided into groups according to only one factor.

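Assume that the data $y_{11}, y_{12}, \dots, y_{1n_1}$ are a sample from population 1, $y_{21}, y_{22}, \dots, y_{2n_2}$ are a sample from population 2, ..., and $y_{k1}, y_{k2}, \dots, y_{kn_k}$ are a sample from population *k*. Let $y_{ij}$ denote the *j*^{th} observation from the *i*^{th} group (level).

The values $y_{ij}$ are realizations of independent normal random variables $Y_{ij}$, $i = 1, 2, \dots, k$ and $j = 1, 2, \dots, n_i$, with mean $\mu_i$ and constant standard deviation $\sigma$, $Y_{ij} \sim N(\mu_i, \sigma^2)$. Alternatively, each $Y_{ij} = \mu_i + \varepsilon_{ij}$, where $\varepsilon_{ij}$ are normally distributed independent random errors, $\varepsilon_{ij} \sim N(0, \sigma^2)$. Let $N = n_1 + n_2 + \dots + n_k$ be the total number of observations (the total sample size across all groups), where $n_i$ is the sample size for the *i*^{th} group.

The parameters of this model are the population means $\mu_1, \mu_2, \dots, \mu_k$ and the common standard deviation $\sigma$.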

Using many separate two-sample *t*-tests to compare many pairs of means is a bad idea because we don’t get a *p*-value or a confidence level for the complete set of comparisons together.
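The cost of running many separate tests can be illustrated with a short calculation. The sketch below assumes, as a simplification, that the pairwise tests are independent (pairwise *t*-tests that share samples are not exactly independent), and the group counts in `k` are hypothetical. It shows how the probability of at least one false rejection grows with the number of comparisons.

```matlab
% Familywise type I error rate when m tests are each run at level alpha.
% Assumption: the m tests are treated as independent (an approximation).
alpha = 0.05;                  % per-comparison significance level
k = 3:6;                       % number of groups (hypothetical values)
m = k .* (k - 1) / 2;          % number of pairwise comparisons for each k
fwer = 1 - (1 - alpha) .^ m;   % probability of at least one false rejection
disp([k; m; fwer]');           % columns: k, m, familywise error rate
```

Already for three groups the familywise rate under this assumption is well above the nominal 0.05, which is why a single overall test is preferred.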

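We will be interested in testing the null hypothesis

$$H_0:\ \mu_1 = \mu_2 = \dots = \mu_k \tag{1}$$

against the alternative hypothesis

$$H_1:\ \mu_i \neq \mu_l \ \text{ for at least one pair } i \neq l \tag{2}$$

(there is at least one pair with unequal means).

Let $\bar{y}_i$ represent the mean of sample *i* ($i = 1, 2, \dots, k$):

$$\bar{y}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} y_{ij} \tag{3}$$

$\bar{y}$ represent the grand mean, the mean of all the data points:

$$\bar{y} = \frac{1}{N} \sum_{i=1}^{k} \sum_{j=1}^{n_i} y_{ij} \tag{4}$$

$s_i^2$ represent the sample variance of sample *i*:

$$s_i^2 = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} \left( y_{ij} - \bar{y}_i \right)^2 \tag{5}$$

and $s^2$ be an estimate of the variance $\sigma^2$ common to all *k* populations,

$$s^2 = \frac{1}{N - k} \sum_{i=1}^{k} (n_i - 1)\, s_i^2 \tag{6}$$

ANOVA compares the variation between groups (levels) with the variation within samples by analyzing their variances.

Define the total sum of squares *SST*, the sum of squares for error (or within groups) *SSE*, and the sum of squares for treatments (or between groups) *SSC*:

$$SST = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left( y_{ij} - \bar{y} \right)^2 \tag{7}$$

$$SSE = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left( y_{ij} - \bar{y}_i \right)^2 = \sum_{i=1}^{k} (n_i - 1)\, s_i^2 \tag{8}$$

$$SSC = \sum_{i=1}^{k} n_i \left( \bar{y}_i - \bar{y} \right)^2 \tag{9}$$

Consider the deviation of an observation from the grand mean, written in the following way:

$$y_{ij} - \bar{y} = \left( y_{ij} - \bar{y}_i \right) + \left( \bar{y}_i - \bar{y} \right) \tag{10}$$

Notice that the left side is at the heart of *SST*, and the right side has the analogous pieces of *SSE* and *SSC*. Squaring both sides and summing over all observations makes the cross-product term vanish, so that:

$$SST = SSE + SSC \tag{11}$$

The total mean sum of squares *MST*, the mean sum of squares for error *MSE*, and the mean sum of squares for treatment *MSC* are:

$$MST = \frac{SST}{N - 1} \tag{12}$$

$$MSE = \frac{SSE}{N - k} \tag{13}$$

$$MSC = \frac{SSC}{k - 1} \tag{14}$$

The one-way ANOVA, assuming the test conditions are satisfied, uses the following test statistic:

$$F = \frac{MSC}{MSE} \tag{15}$$

Under *H*_{0} this statistic has Fisher's distribution $F(k-1, N-k)$. If the test statistic satisfies

$$F > F_{1-\alpha}(k-1, N-k) \tag{16}$$

where $F_{1-\alpha}(k-1, N-k)$ is the $(1-\alpha)$ quantile of the *F*-distribution with $k-1$ and $N-k$ degrees of freedom, then the hypothesis *H*_{0} is rejected at significance level *α* ^{[1, 3]}.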

The results of the computations that lead to the *F*-statistic are presented in an ANOVA table, the form of which is shown in Table 1.

Statistical software usually adds a column with the *p*-value to this table. The *p*-value is the probability, computed under the null hypothesis, of obtaining a value of the test statistic at least as extreme as the one observed. If $p < \alpha$, where *α* is the chosen significance level, the null hypothesis is rejected at the $(1-\alpha) \cdot 100$ % confidence level.
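The procedure of this section can be sketched directly from the definitions of the sums of squares. The data in `groups` are made up for illustration, with unequal group sizes; `finv` and `fcdf` are assumed to be available from the Statistics Toolbox. This is a minimal sketch, not a replacement for a dedicated ANOVA routine.

```matlab
% One-way ANOVA computed from the definitions (hypothetical data).
groups = {[12 15 11 14], [18 17 20 16 19], [13 12 15]};  % made-up observations
alpha  = 0.05;                          % chosen significance level
k  = numel(groups);                     % number of groups
n  = cellfun(@numel, groups);           % group sizes n_i
N  = sum(n);                            % total sample size
yi = cellfun(@mean, groups);            % group means
y  = [groups{:}];                       % all observations in one row vector
gm = mean(y);                           % grand mean

SST = sum((y - gm).^2);                                  % total sum of squares
SSE = sum(cellfun(@(g) sum((g - mean(g)).^2), groups));  % within groups
SSC = sum(n .* (yi - gm).^2);                            % between groups
assert(abs(SST - (SSE + SSC)) < 1e-9);  % decomposition SST = SSE + SSC

MSC = SSC / (k - 1);                    % mean square for treatments
MSE = SSE / (N - k);                    % mean square for error
F   = MSC / MSE;                        % test statistic
Fcrit = finv(1 - alpha, k - 1, N - k);  % critical value of the F-distribution
p     = 1 - fcdf(F, k - 1, N - k);      % p-value of the observed F
fprintf('F = %.3f, critical value = %.3f, p = %.4f\n', F, Fcrit, p);
```

Rejecting *H*_{0} when `F > Fcrit` and rejecting it when `p < alpha` are the same decision expressed in two ways.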

### 3. Post Hoc Comparison Procedures

One possible approach to the multiple comparison problem is to make each comparison independently using a suitable statistical procedure. For example, a statistical hypothesis test could be used to compare each pair of means, $\mu_i$ and $\mu_l$, $i, l = 1, 2, \dots, k$; $i \neq l$, where the null and alternative hypotheses are of the form

$$H_0:\ \mu_i = \mu_l, \qquad H_1:\ \mu_i \neq \mu_l \tag{17}$$

An alternative way to test for a difference between $\mu_i$ and $\mu_l$ is to calculate a confidence interval for $\mu_i - \mu_l$. A confidence interval is formed using a point estimate, a margin of error, and the formula

$$\text{point estimate} \pm \text{margin of error} \tag{18}$$

The point estimate is the best guess for the value of $\mu_i - \mu_l$ based on the sample data. The margin of error reflects the accuracy of the guess based on variability in the data. It also depends on a confidence coefficient, which is often denoted by $1 - \alpha$. The interval is calculated by subtracting the margin of error from the point estimate to get the lower limit and adding the margin of error to the point estimate to get the upper limit ^{[6]}.

If the confidence interval for $\mu_i - \mu_l$ does not contain zero (thereby ruling out that $\mu_i = \mu_l$), then the null hypothesis is rejected and $\mu_i$ and $\mu_l$ are declared different at level of significance *α*.

The multiple comparison tests for population means, as well as the *F*-test, have the same assumptions.

There are many different multiple comparison procedures that deal with these problems. Some of these procedures are as follows: Fisher's method, Tukey's method, Scheffé's method, Bonferroni's adjustment method, and the Dunn-Šidák method. Some require equal sample sizes, while some do not. The choice of a multiple comparison procedure used with an ANOVA will depend on the type of experimental design used and the comparisons of interest to the analyst ^{[8]}.

The Fisher (LSD) method essentially does not correct the type I error rate for multiple comparisons and is generally not recommended relative to other options.

The Tukey (HSD) method controls the type I error rate very well and is generally considered an acceptable technique. There is also a modification of the test for situations where the number of subjects is unequal across cells, called the Tukey-Kramer test.

The Scheffé test can be used for the family of all pairwise comparisons but will always give wider confidence intervals than the other tests ^{[6]}. Scheffé's procedure is perhaps the most popular of the post hoc procedures, the most flexible, and the most conservative.

There are several different ways to control the experimentwise error rate. One of the easiest is to use the Bonferroni correction. If we plan on making *m* comparisons or conducting *m* significance tests, the Bonferroni correction is to simply use $\alpha/m$ as our significance level rather than *α*. This simple correction guarantees that our experimentwise error rate will be no larger than *α*. Notice that these results are more conservative than with no adjustment. The Bonferroni correction is probably the most commonly used post hoc test, because it is highly flexible, very simple to compute, and can be used with any type of statistical test (e.g., correlations), not just post hoc tests with ANOVA.

The Šidák method has a bit more power than the Bonferroni method. So from a purely conceptual point of view, the Šidák method is always preferred.
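The difference between the two corrections can be seen from the per-comparison significance levels they produce. The sketch below uses the standard forms $\alpha/m$ for Bonferroni and $1-(1-\alpha)^{1/m}$ for Šidák; the number of groups `k` is a hypothetical value chosen for illustration.

```matlab
% Per-comparison significance levels under the Bonferroni and Sidak corrections.
alpha = 0.05;                         % desired experimentwise error rate
k = 3;                                % number of groups (hypothetical)
m = nchoosek(k, 2);                   % number of pairwise comparisons
alphaBonf  = alpha / m;               % Bonferroni: alpha divided by m
alphaSidak = 1 - (1 - alpha)^(1/m);   % Sidak: exact for independent tests
fprintf('Bonferroni: %.6f, Sidak: %.6f\n', alphaBonf, alphaSidak);
```

The Šidák level is slightly larger than the Bonferroni level for any `m > 1`, which is the source of its small gain in power.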

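The confidence interval for $\mu_i - \mu_l$ is calculated using one of the following formulas, where $\bar{y}_i$ and $\bar{y}_l$ are the sample means, $n_i$ and $n_l$ the sample sizes, and *MSE* the mean sum of squares for error from section 2:

- by the Fisher method (LSD, Least Significant Difference),

$$\bar{y}_i - \bar{y}_l \pm t_{1-\alpha/2}(N-k) \sqrt{MSE \left( \frac{1}{n_i} + \frac{1}{n_l} \right)} \tag{19}$$

where $t_{1-\alpha/2}(N-k)$ is the quantile of the Student's *t* probability distribution with $N-k$ degrees of freedom;

- by the Tukey-Kramer method (HSD, Honestly Significant Difference),

$$\bar{y}_i - \bar{y}_l \pm \frac{q_{1-\alpha}(k, N-k)}{\sqrt{2}} \sqrt{MSE \left( \frac{1}{n_i} + \frac{1}{n_l} \right)} \tag{20}$$

where $q_{1-\alpha}(k, N-k)$ represents the quantile of the Studentized range probability distribution;

- by the Scheffé method,

$$\bar{y}_i - \bar{y}_l \pm \sqrt{(k-1)\, F_{1-\alpha}(k-1, N-k)} \sqrt{MSE \left( \frac{1}{n_i} + \frac{1}{n_l} \right)} \tag{21}$$

- by the Bonferroni method,

$$\bar{y}_i - \bar{y}_l \pm t_{1-\alpha^{*}/2}(N-k) \sqrt{MSE \left( \frac{1}{n_i} + \frac{1}{n_l} \right)} \tag{22}$$

where $\alpha^{*} = \alpha/m$ and $m = k(k-1)/2$ is the number of pairwise comparisons in the family;

- by the Dunn-Šidák method,

$$\bar{y}_i - \bar{y}_l \pm t_{1-\alpha^{*}/2}(N-k) \sqrt{MSE \left( \frac{1}{n_i} + \frac{1}{n_l} \right)} \tag{23}$$

where $\alpha^{*} = 1 - (1-\alpha)^{1/m}$ and $m = k(k-1)/2$ ^{[2]}.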
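To compare how conservative the methods are, the margins of error can be evaluated side by side. The sketch below uses the standard textbook forms of the Fisher, Bonferroni, Dunn-Šidák and Scheffé intervals. The summary values (`MSE`, `N`, `k`, group sizes) are assumed inputs chosen for illustration, and `tinv` and `finv` are assumed to be available from the Statistics Toolbox. The Tukey-Kramer margin is omitted because it needs a Studentized range quantile, which this sketch does not compute.

```matlab
% Margins of error (half-widths) of pairwise confidence intervals.
alpha = 0.05;  k = 3;  N = 18;         % assumed design: 3 groups, 18 observations
MSE = 7.6;  ni = 6;  nl = 6;           % assumed mean square error and group sizes
m  = k * (k - 1) / 2;                  % number of pairwise comparisons
se = sqrt(MSE * (1/ni + 1/nl));        % standard error of the difference of means

hwLSD     = tinv(1 - alpha/2, N - k) * se;                      % Fisher (LSD)
hwBonf    = tinv(1 - (alpha/m)/2, N - k) * se;                  % Bonferroni
hwSidak   = tinv(1 - (1 - (1 - alpha)^(1/m))/2, N - k) * se;    % Dunn-Sidak
hwScheffe = sqrt((k - 1) * finv(1 - alpha, k - 1, N - k)) * se; % Scheffe

fprintf('LSD %.3f | Sidak %.3f | Bonferroni %.3f | Scheffe %.3f\n', ...
        hwLSD, hwSidak, hwBonf, hwScheffe);
```

For these inputs the half-widths increase in the order LSD, Dunn-Šidák, Bonferroni, Scheffé, which matches the ranking of the methods from least to most conservative described above.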

### 4. Tests for Homogeneity of Variances

Many statistical procedures, including analysis of variance, assume that the different populations have the same variance. The test for equality of variances is used to determine if the assumption of equal variances is valid.

We will be interested in testing the null hypothesis

$$H_0:\ \sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2 \tag{24}$$

against the alternative hypothesis

$$H_1:\ \sigma_i^2 \neq \sigma_l^2 \ \text{ for at least one pair } i \neq l \tag{25}$$

There are many tests of homogeneity of variances. Commonly used tests are the Bartlett (1937), Hartley (1940, 1950), Cochran (1941), Levene (1960), and Brown and Forsythe (1974) tests. The Bartlett, Hartley and Cochran tests are technically tests of homogeneity. The Levene and Brown and Forsythe methods actually transform the data and then test for equality of means.

Note that Cochran's and Hartley's tests assume that there are equal numbers of observations in each group.

The tests of Bartlett, Cochran, Hartley and Levene may be applied when the number of samples is $k > 2$. In that situation, the power of these tests turns out to be different. When the assumption of the normal distribution holds, these tests may be ranked by decreasing power as follows: Cochran, Bartlett, Hartley, Levene. This preference order also holds when the normality assumption is violated. An exception concerns situations in which the samples come from distributions with heavier tails than the normal law. For example, for samples from the Laplace distribution the Levene test turns out to be slightly more powerful than the three others ^{[7]}.

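Bartlett's test has the following test statistic:

$$B = \frac{1}{c} \left[ (N-k) \ln s^2 - \sum_{i=1}^{k} (n_i - 1) \ln s_i^2 \right] \tag{26}$$

where the constant $c = 1 + \frac{1}{3(k-1)} \left( \sum_{i=1}^{k} \frac{1}{n_i - 1} - \frac{1}{N-k} \right)$ and the meaning of all the other symbols is evident (see section 2). The hypothesis *H*_{0} is rejected at significance level *α* when

$$B > \chi^2_{1-\alpha}(k-1) \tag{27}$$

where $\chi^2_{1-\alpha}(k-1)$ is the critical value of the *chi*-*square* distribution with $k-1$ degrees of freedom.

Cochran's test is one of the best methods for detecting cases where the variance of one of the groups is much larger than that of the other groups. This test uses the following test statistic:

$$C = \frac{\max_i s_i^2}{\sum_{i=1}^{k} s_i^2} \tag{28}$$

The hypothesis *H*_{0} is rejected at significance level *α* when

$$C > C_{\alpha}(k, n-1) \tag{29}$$

where *n* is the common sample size of the groups and the critical value $C_{\alpha}(k, n-1)$ is found in special statistical tables.

Hartley's test uses the following test statistic:

$$F_{\max} = \frac{\max_i s_i^2}{\min_i s_i^2} \tag{30}$$

The hypothesis *H*_{0} is rejected at significance level *α* when

$$F_{\max} > F_{\max,\alpha}(k, n-1) \tag{31}$$

where the critical value $F_{\max,\alpha}(k, n-1)$ is found in special statistical tables ^{[2]}.

Originally Levene's test was defined as the one-way analysis of variance on the absolute residuals $z_{ij} = |y_{ij} - \bar{y}_i|$, $i = 1, 2, \dots, k$ and $j = 1, 2, \dots, n_i$, where *k* is the number of groups and *n*_{i} the sample size of the *i*^{th} group. The test statistic has Fisher's distribution $F(k-1, N-k)$ and is given by:

$$W = \frac{(N-k) \sum_{i=1}^{k} n_i \left( \bar{z}_i - \bar{z} \right)^2}{(k-1) \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left( z_{ij} - \bar{z}_i \right)^2} \tag{32}$$

where $\bar{z}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} z_{ij}$, $\bar{z} = \frac{1}{N} \sum_{i=1}^{k} \sum_{j=1}^{n_i} z_{ij}$, $N = \sum_{i=1}^{k} n_i$.

To apply the ANOVA test, several assumptions must be verified, including normal populations, homoscedasticity, and independent observations. The absolute residuals do not meet any of these assumptions, so Levene's test is an approximate test of homoscedasticity ^{[5]}.

Brown and Forsythe subsequently proposed using the absolute deviations from the median $\tilde{y}_i$ of the *i*^{th} group, so that $z_{ij} = |y_{ij} - \tilde{y}_i|$.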
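The statistics of this section can be evaluated directly from their usual textbook definitions. The sketch below does this for a small balanced data set; the matrix `Y` is hypothetical, and `chi2cdf` and `fcdf` are assumed to be available from the Statistics Toolbox. Critical values for Cochran's and Hartley's statistics are not computed, since they come from special tables.

```matlab
% Homogeneity-of-variance statistics from their definitions (hypothetical data).
Y = [10 12  9; 11 15 10; 13 11  8; 12 14 11; 14 13  9];  % rows: observations, columns: groups
[n, k] = size(Y);                        % common group size and number of groups
N  = n * k;                              % total sample size
s2 = var(Y);                             % sample variances s_i^2, one per group
sp = sum((n - 1) * s2) / (N - k);        % pooled variance s^2

% Bartlett statistic with its correction constant, and chi-square p-value
c  = 1 + (k / (n - 1) - 1 / (N - k)) / (3 * (k - 1));
B  = ((N - k) * log(sp) - (n - 1) * sum(log(s2))) / c;
pB = 1 - chi2cdf(B, k - 1);

% Cochran and Hartley statistics
C    = max(s2) / sum(s2);
Fmax = max(s2) / min(s2);

% Levene-type statistic: one-way ANOVA F computed on absolute deviations Z
levW = @(Z) ((N - k) / (k - 1)) * sum(n * (mean(Z) - mean(Z(:))).^2) / ...
            sum(sum(bsxfun(@minus, Z, mean(Z)).^2));
W_L  = levW(abs(bsxfun(@minus, Y, mean(Y))));    % Levene: deviations from group means
W_BF = levW(abs(bsxfun(@minus, Y, median(Y))));  % Brown-Forsythe: from group medians
pL   = 1 - fcdf(W_L,  k - 1, N - k);
pBF  = 1 - fcdf(W_BF, k - 1, N - k);

fprintf('B = %.3f (p = %.3f), C = %.3f, Fmax = %.3f\n', B, pB, C, Fmax);
fprintf('Levene W = %.3f (p = %.3f), Brown-Forsythe W = %.3f (p = %.3f)\n', ...
        W_L, pL, W_BF, pBF);
```

The helper `levW` is an illustrative name; using the same function for both variants makes explicit that they differ only in the centre used for the deviations.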

### 5. Example from Technical Practice

What follows is an example of the one-way ANOVA procedure using the statistical software package, MATLAB.

One important factor in selecting software for word processing and database management systems is the time required to learn how to use a particular system. In order to evaluate three database management systems, a firm devised a test to see how many training hours were needed for six of its word processing operators to become proficient in each of three systems ^{[9]}. The data from this experiment are in the Table 2. Using a 5 % significance level, is there any difference between the training time needed for the three systems?

**5.1. Testing the Assumption of Normality**

One of the first steps in using the one-way ANOVA test is to test the assumption of normality. Even if the distribution is somewhat different from normal, one-way ANOVA can still work well if the sample sizes are large enough. However, when sample sizes are small, one-way ANOVA can be unreliable if the data in one or more of the groups come from a highly non-normal distribution.

Normality can be evaluated by graphical and statistical methods. For example, the normal probability plot is a graph specifically designed to check for normality. If the data come from a normal distribution, the points should form a line. The statistical methods include diagnostic hypothesis tests for normality, where the null hypothesis is that there is no significant departure from normality for each of the groups/levels. The alternative hypothesis is that there is a significant departure from normality. The main tests for the assessment of normality are the Kolmogorov-Smirnov (K-S) test, the Lilliefors test (corrected K-S test), the Shapiro-Wilk test, the Anderson-Darling test, the Cramér-von Mises test, the D'Agostino test and the Jarque-Bera test.

For the above example we are using MATLAB with the functions ^{[4]}:

- [h,p]=lillietest(x,0.05,'norm') for the Lilliefors test,

- [h,p]=swtest(x,0.05) for the Shapiro-Wilk test.

For example, the Shapiro-Wilk test at significance level 0.05 does not reject normality for system 1, system 2, or system 3 (each *p*-value exceeds 0.05).

We would conclude that there is no significant departure from normality at any level of the independent variable.
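A per-group normality check can be scripted as a short loop. The sketch below uses only `lillietest` with its default significance level of 0.05 and takes the training-hours data from the MATLAB command in section 5.2; with only six observations per group the test has little power, so the output is a rough screen rather than proof of normality.

```matlab
% Lilliefors normality test applied to each system (column) separately.
X = [20 17 15 19 14 13; 18 17 14 20 13 12; 23 25 20 21 19 20]';  % training hours
nGroups = size(X, 2);
h = zeros(1, nGroups);                 % h(g) = 1 means normality is rejected
p = zeros(1, nGroups);                 % p-values (lillietest may warn if p is
                                       % outside its tabulated range)
for g = 1:nGroups
    [h(g), p(g)] = lillietest(X(:, g));
    fprintf('system %d: h = %d, p = %.3f\n', g, h(g), p(g));
end
```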

**5.2. Testing the Assumption of Homogeneity of Variances**

We seek to test equality of variances (see section 4) and have run Bartlett's test in MATLAB:

X=[20,17,15,19,14,13;18,17,14,20,13,12;23,25,20,21,19,20]';

[p,stats]=vartestn(X)

From this analysis in MATLAB, the *p*-value for Bartlett's test exceeds the significance level 0.05 used here.

Therefore, we fail to reject the null hypothesis $H_0:\ \sigma_1^2 = \sigma_2^2 = \sigma_3^2$.
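The same function can also run the Levene and Brown-Forsythe variants from section 4, which is a useful cross-check when normality is in doubt. The sketch below assumes the `'TestType'` and `'Display'` options of `vartestn` in the Statistics Toolbox; the variable names are illustrative.

```matlab
% Bartlett, Levene and Brown-Forsythe tests on the training-hours data.
X = [20 17 15 19 14 13; 18 17 14 20 13 12; 23 25 20 21 19 20]';
pBartlett = vartestn(X, 'Display', 'off');                              % default test
pLevene   = vartestn(X, 'TestType', 'LeveneAbsolute', 'Display', 'off');
pBrownF   = vartestn(X, 'TestType', 'BrownForsythe',  'Display', 'off');
fprintf('Bartlett p = %.3f, Levene p = %.3f, Brown-Forsythe p = %.3f\n', ...
        pBartlett, pLevene, pBrownF);
```

Setting `'Display'` to `'off'` suppresses the summary table and box plot, so the call can be used inside scripts.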

**5.3. Hypothesis Testing Using ANOVA**

Letting *µ*_{1}, *µ*_{2}, and *µ*_{3} be the means for the three systems, the null hypothesis is $H_0:\ \mu_1 = \mu_2 = \mu_3$. The alternative is $H_1:\ \mu_i \neq \mu_l$ for at least one *i*, *l* pair ($i \neq l$).

In MATLAB we use the command:

[p,tbl,stats]=anova1(X)

This returns an ANOVA table, showing the value of the *F*-statistic and the *p*-value, and a boxplot of the three groups. The results of the calculations for this case are summarized in Table 3 and Figure 1.

Since the *p*-value is less than the given significance level 0.05 for this problem, we reject the null hypothesis. There is a difference between the mean learning times for at least two of the three database management systems.
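The toolbox result can be cross-checked against the definitions of the sums of squares from section 2. The sketch below is one way to do that for the balanced training-hours data; it assumes `fcdf` and the `'off'` display option of `anova1` from the Statistics Toolbox.

```matlab
% Hand computation of the one-way ANOVA for the training-hours data.
X = [20 17 15 19 14 13; 18 17 14 20 13 12; 23 25 20 21 19 20]';
[n, k] = size(X);                                 % 6 operators, 3 systems
N   = n * k;                                      % total sample size
SSC = n * sum((mean(X) - mean(X(:))).^2);         % between-systems sum of squares
SSE = sum(sum(bsxfun(@minus, X, mean(X)).^2));    % within-systems sum of squares
F   = (SSC / (k - 1)) / (SSE / (N - k));          % F = MSC / MSE
pManual  = 1 - fcdf(F, k - 1, N - k);             % p-value from the F-distribution
pToolbox = anova1(X, [], 'off');                  % same test, figures suppressed
fprintf('SSC = %.3f, SSE = %.3f, F = %.3f\n', SSC, SSE, F);
fprintf('p (manual) = %.4f, p (anova1) = %.4f\n', pManual, pToolbox);
```

For these data the hand computation gives *SSE* = 114 and *F* ≈ 7.57 on 2 and 15 degrees of freedom, and the two *p*-values should agree up to rounding.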

**Figure 1.** Boxplot of the three groups

**5.4. Pairwise Comparison**

When the null hypothesis is rejected using the *F*-test in ANOVA, we want to know where the difference among the means is. To determine which pairs of means are significantly different, and which are not, we can use the multiple comparison tests.

MATLAB implements, for example, the Tukey-Kramer procedure, the Bonferroni procedure and the Dunn-Šidák procedure, and reports the results in terms of confidence intervals.

Now we can construct the 95 % confidence intervals for the differences $\mu_i - \mu_l$ in each pair of population group means, $i, l = 1, 2, 3$; $i \neq l$.

In MATLAB we use the following series of commands:

multcompare(stats,'alpha',.05,'ctype','tukey-kramer')

multcompare(stats,'alpha',.05,'ctype','bonferroni')

multcompare(stats,'alpha',.05,'ctype','dunn-sidak')

The statistical outputs are, respectively, shown in Table 4, Table 5, Table 6 and Figure 2.

Figure 2 is an interactive figure. Clicking on a group symbol highlights the groups from which the selected group differs significantly; the result is reported at the bottom of the figure.

**Figure 2.** The interactive figure

Using all three multiple comparison methods, we find that system 3 takes significantly longer to learn than systems 1 and 2, which do not differ significantly from each other.
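The conclusion can also be read programmatically from the matrix that `multcompare` returns. The sketch below assumes the documented layout of that matrix (group indices in columns 1 and 2, the interval limits in columns 3 and 5, the estimated difference in column 4) and flags a pair as different when its interval excludes zero, as described in section 3.

```matlab
% Flag significantly different pairs from the Tukey-Kramer intervals.
X = [20 17 15 19 14 13; 18 17 14 20 13 12; 23 25 20 21 19 20]';
[~, ~, stats] = anova1(X, [], 'off');             % ANOVA without figures
c = multcompare(stats, 'alpha', .05, 'ctype', 'tukey-kramer', 'display', 'off');
differs = c(:, 3) > 0 | c(:, 5) < 0;              % interval does not contain zero
for r = 1:size(c, 1)
    fprintf('systems %d and %d: [%.2f, %.2f], different = %d\n', ...
            c(r, 1), c(r, 2), c(r, 3), c(r, 5), differs(r));
end
```

Changing the `'ctype'` value to `'bonferroni'` or `'dunn-sidak'` repeats the check for the other two procedures.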

### 6. Conclusion

In many statistical applications in business administration, psychology, social science, and the natural sciences we need to compare more than two groups. The ANOVA method was developed for testing hypotheses about more than two population means.

The ANOVA test procedure compares the variation in observations between samples (sum of squares for groups, *SSC*) to the variation within samples (sum of squares for error, *SSE*). The ANOVA *F*-test rejects the null hypothesis that the mean responses are equal in all groups if *SSC* is large relative to *SSE*.

The analysis of variance assumes that the observations are normally and independently distributed with the same variance for each treatment or factor level ^{[3]}. If the normality assumption of the one-way ANOVA *F*-test is not met, we can use the Kruskal-Wallis rank test.
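As a sketch of that fallback, the Kruskal-Wallis test can be run on the same data layout as the one-way ANOVA. The call below assumes the `kruskalwallis` function of the Statistics Toolbox, with the third argument suppressing the figures, and reuses the training-hours data from section 5.

```matlab
% Kruskal-Wallis rank test as a nonparametric alternative to one-way ANOVA.
X = [20 17 15 19 14 13; 18 17 14 20 13 12; 23 25 20 21 19 20]';
pKW = kruskalwallis(X, [], 'off');     % columns of X are treated as groups
fprintf('Kruskal-Wallis p = %.4f\n', pKW);
```

The test works on ranks rather than raw values, so it does not rely on normality, at the cost of some power when the normal model does hold.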

### Acknowledgement

This article was created by implementation of the grant project VEGA no. 1/0102/11 Experimental methods and modeling techniques in-house manufacturing and non manufacturing processes.

### References

[1] Aczel, A.D., Complete Business Statistics, Irwin, 1989.

[2] Brown, M., Forsythe, A., "Robust tests for the equality of variances," Journal of the American Statistical Association, 364-367. 1974.

[3] Montgomery, D.C., Runger, G.C., Applied Statistics and Probability for Engineers, John Wiley & Sons, 2003.

[4] Ostertagová, E., Applied Statistic (in Slovak), Elfa, Košice, 2011.

[5] Parra-Frutos, I., "The behaviour of the modified Levene's test when data are not normally distributed," Comput Stat, Springer, 671-693. 2009.

[6] Rafter, J.A., Abell, M.L., Braselton, J.P., "Multiple Comparison Methods for Means," SIAM Review, 44 (2). 259-278. 2002.

[7] Rykov, V.V., Balakrishnan, N., Nikulin, M.S., Mathematical and Statistical Models and Methods in Reliability, Springer, 2010.

[8] Stephens, L.J., Advanced Statistics demystified, McGraw-Hill, 2004.

[9] Taylor, S., Business Statistics. www.palgrave.com.