---
output:
html_document:
fig_caption: yes
number_sections: yes
theme: readable
toc: yes
toc_depth: 3
---
```{r setup0, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
```{r message=FALSE, warning=FALSE, paged.print=FALSE, echo=FALSE, include=FALSE}
# load packages for chapter
options(scipen = 999)
options(digits = 10)
library(bookdown)
library(emmeans)
library(ggplot2)
library(dplyr)
library(kableExtra)
library(knitr)
library(tables)
library(pander)
library(multcomp)
library(agricolae)
library(nlme)
library(car)
library(tidyr)
```
# Analysis of Variance {#chAnova}
## Learning Objectives for Chapter
1. Translate scientific questions into null and alternative hypotheses.
1. State the null and alternative hypotheses for ANOVA.
1. State the logical structure of testing differences among means using ANOVA.
1. Write down the model for ANOVA and label its components.
1. Define experimental unit and determine number of experimental units in an experiment.
1. State the typical assumptions for ANOVA.
1. Test ANOVA assumptions.
1. Describe how variance is partitioned in a simple one-way ANOVA.
1. Calculate sum of squares and the corresponding degrees of freedom.
1. Compare ANOVA with t-tests for hypothesis testing.
1. Calculate experimental error with and without subsamples.
1. Run an ANOVA and interpret the results in terms of the original scientific question.
## Introduction to ANOVA
In the previous chapter we used t-tests to test the null hypothesis that two means are equal. The general idea was that if sample means differ much more than expected when the true means are equal, we conclude that the means are not equal. In each test we incurred a risk of making a mistake, either by rejecting a true null hypothesis or by failing to reject a false one. People are particularly keen on making sure that they do not reject true null hypotheses^[There may be some reasons to favor concern about type I vs. type II error, but neither error is inherently more troubling. It is more useful to assess the relative importance of errors by thinking about their consequences. What happens if you fail to reject a false hypothesis? What happens if you reject one that is true?]. This means that we want to make sure that we control the probability of making a type I error, called $\alpha$, and that we keep it at the selected value, usually 0.05.
However, there are situations when researchers want to test more than two treatments at the same time. For example, a researcher may want to test the effects of four different fertilizers on grain yield, or of four different diets on milk quality. If we used independent t-tests to assess the hypothesis that all means are equal, with four treatments (A, B, C, D) we would have to do 6 tests (AB, AC, AD, BC, BD, CD), each with a probability of error of 0.05. If the tests are independent, the probability of making at least one error in the set of comparisons is $1 - 0.95^6 = `r round(1-0.95^6, 3)`$, which is much greater than the nominal 0.05. In an experiment with 10 treatments, one can make 45 comparisons. If the comparisons are independent, which is the worst-case scenario, the probability of making at least one type I error is $1 - 0.95^{45} = 0.90$!
Analysis of variance helps correct this problem by performing a single test of the null hypothesis, regardless of the number of treatments. Preview: the remedy is limited if the null hypothesis is rejected (Be the first to ask why in lecture and get an Easter egg!! Make sure to mention that you want your Easter egg.)
```{block, type='mydef'}
The probability of making an error in one test is $\alpha$ and it is called the *test-wise* error rate.\
The probability of making at least one error in a set or "family" of tests is called *family-wise* error rate.\
For all tests done in an experiment, the error rate is called *experiment-wise*.
```
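The error-rate arithmetic above is easy to verify directly in R; a quick sketch (the numbers match the examples in the text):

```{r fwer-check}
# Probability of at least one type I error among m independent tests
alpha <- 0.05
m4  <- choose(4, 2)    # 6 pairwise comparisons among 4 treatments
m10 <- choose(10, 2)   # 45 pairwise comparisons among 10 treatments
round(1 - (1 - alpha)^c(m4, m10), 2)   # 0.26 and 0.90
```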
Before we present the method in detail, we illustrate it with an intuitive example. The null hypothesis of equality of all means is rejected if the variation among groups or sample averages is much larger than expected on the basis of the variation within samples. All objects in each of the square groups below were randomly selected from a single larger set called *parent* set. The set of objects in any square is called a *group*. Does it look like all groups have the same parent?
```{r anova.intuition1, echo=FALSE, out.width='60%', fig.cap="Does it look like all groups come from the same parent population?"}
# Code to generate 4 groups from two populations.
library(randomcoloR)
set.seed(1005)
par(mfrow = c(2,2), mai = c(1, 0.1, 0.1, 0.1))
for (i in 1:4) {
plot(rep(1:3, 3),
rep(1:3, each = 3),
pch = 21,
cex = 6,
xlim = c(-1, 5),
ylim = c(0.5, 3.5),
bg = randomColor(count = 9,
hue = c(rep("orange", 3), "yellow")[i],
luminosity = c(rep("light", 3), "bright")[i]),
axes = FALSE,
xlab = paste("GROUP", i, sep = " "),
ylab = "")
}
```
Is the answer the same for the next figure?
```{r anova.intuition2, echo=FALSE, out.width='60%', fig.cap="Does it look like all groups come from the same parent population?"}
# Code to generate 4 groups from a single population.
set.seed(3833)
par(mfrow = c(2,2), mai = c(1, 0.1, 0.1, 0.1))
for (i in 1:4) {
plot(rep(1:3, 3),
rep(1:3, each = 3),
pch = 21,
cex = 6,
xlim = c(-1, 5),
ylim = c(0.5, 3.5),
bg = randomColor(count = 9,
hue = "orange",
luminosity = "light"),
axes = FALSE,
xlab = paste("GROUP", i, sep = " "),
ylab = "")
}
```
Chances are that you thought one group was different from the rest in the first case, whereas no group stood out in the second case. Your eye (actually, mostly your brain) automatically assessed the variability between and within groups and compared them. One group in the first picture differed more from the rest than expected based on the variation of color within groups. This is the basis for ANOVA.
The intuitive recognition of group differences above is probably not the same for all people because of variation in the way people perceive and process color. The subjective method to assess differences is not reliable. Therefore, a formal and objective method has to be used. Most of the rest of this chapter consists of explaining how the intuition is formalized to allow calculations that are not subjective and that lead to the same results, regardless of the person doing the test.
In analysis of variance we use data to calculate the following:
1. Total variation among observations, variation between groups and variation within groups.
2. Ratio of between-group to within-group variance and its expected statistical distribution.
3. Probability of obtaining a ratio equal to or more extreme than the observed one if means are all equal.
If the ratio (calculated F-value) observed is too extreme, we reject the null hypothesis.
The main idea behind ANOVA is based directly on the most important equation in this course:
<br>
\begin{equation}
\sigma^2_{\bar{Y}} = \frac{\sigma_Y^2}{n} \quad \text{ for populations} \\[20pt]
S^2_{\bar{Y}} = \frac{S_Y^2}{n} \quad \text{ for samples} \\[20pt]
\end{equation}
<br>
```{block, type = 'stattip'}
- The variance of the average is the variance of the original random variable divided by the size of the sample used to calculate the average.
```
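The stat tip above can be checked with a short simulation (the mean, standard deviation, and sample size below are chosen only for illustration):

```{r var-avg-sim}
# Simulate many sample averages and compare their variance to sigma^2 / n
set.seed(123)
sigma <- 3
n <- 25
avgs <- replicate(10000, mean(rnorm(n, mean = 0, sd = sigma)))
var(avgs)   # should be close to 3^2 / 25 = 0.36
```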
When there are several treatments, and IF THE TREATMENTS HAVE NO EFFECTS on the response variable, the observations in each treatment are a sample from the same population, and all samples come from the same population. Therefore, there is one sample from that common population for each treatment in the experiment. Each sample provides an average, so we have as many averages as there are treatments. Because we have many averages, we can estimate the variance among averages $\sigma^2_{\bar{Y}}$ directly, using the usual formula for the estimated variance of a random variable applied to the k independent averages:
$$S^2_{\bar{Y}} = \frac{1}{k-1} \sum_{i=1}^k (\bar{Y}_i - \bar{Y}_{..})^2$$
We obtain an estimate of the variance of Y by multiplying the estimated variance of the averages by the sample size used to calculate each average, i.e., the number of replicates r:
$$S^2_Y = r \ S^2_{\bar{Y}}$$
Note that this calculation results in a valid estimate of the variance ONLY if all means and variances are equal.
We can also estimate the variance using the pooled variance:
$$S^2_Y = \frac{1}{k \ (r - 1)} \sum_{i=1}^k \sum_{j=1}^r (Y_{ij} - \bar{Y}_{i.})^2$$
The pooled variance is the average of the squared deviations of each observation from its sample average, and thus it does not require the means to be equal to be a valid estimate of the common variance for all treatments. This pooled variance is based on the same concept used for comparing two means with independent samples in [Chapter 8](#Case1). Pooling variances is desirable because it leads to better precision and narrower confidence intervals. A pooled estimated variance is based on more observations and is therefore more stable (the variance of the estimated variance is smaller!). Moreover, the variance of Student's t distribution decreases as the degrees of freedom increase. Pooling leads to smaller critical t-values, which means more power (lower probability of a type II error) and narrower confidence intervals.
If indeed the first estimate of the variance above, based on the variation among treatment averages, is a valid one, the quotient of the first estimate over the second, based on the pooled variance, should follow an F distribution with values about 1.0. We saw this when testing for [differences between independent variance estimates](#compare2Variances). The statistical distribution resulting from the quotient of two variance estimates is called an F-distribution, which was fully described in the *F Distribution* section of the [Probability chapter](#chProb). If the calculated quotient, called the "calculated F-value," is too far from 1.0, we conclude that the two variances are not equal; therefore, the estimate of the variance based on the averages is not valid, and the averages are considered too different to have come from the same population. Thus, the reasoning goes, at least two means must be different, and the null hypothesis is rejected. This is further illustrated with a numerical example below.
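A small simulation sketches this logic: when all samples really do come from one population, the two variance estimates agree and their ratio hovers around 1. The values of k, r, the mean, and the standard deviation below are illustrative:

```{r two-estimates-sim}
# Two estimates of the common variance when H0 is true
set.seed(42)
k <- 4
r <- 6
y <- matrix(rnorm(k * r, mean = 10, sd = 2), nrow = r)  # one column per sample
est.among  <- r * var(colMeans(y))    # based on variation among averages
est.pooled <- mean(apply(y, 2, var))  # pooled within-sample variance
est.among / est.pooled                # F-like ratio, near 1 under H0
```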
## Model and Partitioning of Variance
When the different components of an experiment are identified and the effects are additive, then we can use a linear equation to calculate the value of any observation in the experiment. When we assign each of k treatments randomly to r replicates, we have an experimental design called "Completely Randomized Design" or CRD. The model for this design states that each observation is a random draw from a normal distribution that has a potentially different mean for each treatment, but with a common variance for all treatments. Moreover, the treatment mean is expressed as an overall mean plus a treatment effect. The model is written as follows:
<br>
\begin{align}
&Y_{ij} = \mu_{..} + \tau_i + \epsilon_{ij} \quad \quad i = 1, \dots, k, \quad j = 1, \dots, r \\[10pt]
&\epsilon_{ij} \sim iid \ N(0, \sigma^2) \\[15pt]
&\quad \quad \text{where} \\[15pt]
&\mu_{..} \quad \text{is the overall mean} \quad \mu_{..} = \frac{1}{k}\sum_{i = 1}^{k} \mu_{i.}\\[15pt]
&\tau_i \quad \text{is the effect of treatment i } \\[15pt]
&\epsilon_{ij} \quad \text{is the experimental error for treatment i and replicate j} \\[15pt]
& iid \quad \text{means "independent and identically distributed"}
(\#eq:modelCRD)
\end{align}
<br>
When we do not know the parameters of the actual distributions, which is usually the case, parameters are estimated and each observation is partitioned as follows:
<br>
\begin{align}
&Y_{ij} = \hat{\mu}_{..} + \hat{\tau}_i + \hat{\epsilon}_{ij} \\[10pt]
&= \bar{Y}_{..} + \hat{\tau}_i + e_{ij} \\[15pt]
&= \bar{Y}_{i.} + e_{ij} \quad \quad \text{where} \\[15pt]
&\bar{Y}_{..} \quad \text{is the overall average calculated as } \frac{1}{k}\sum_{i = 1}^{k} \bar{Y}_{i.}\ \\[15pt]
&\bar{Y}_{i.} \quad \text{is the average for treatment i} \\[15pt]
&e_{ij} \quad \text{is the residual error for treatment i and repetition j} \\[15pt]
(\#eq:estModelCRD)
\end{align}
<br>
Based on the model above, each observation is partitioned into components:
- overall mean, which is estimated with the average of treatment averages,
- treatment effects, which are estimated as the differences between treatment averages and overall average,
- error, which is estimated with the residual or difference between observation and treatment average.
<br>
\begin{align}
Y_{ij} &= \bar{Y}_{..} + \hat{\tau}_i + \hat{\epsilon}_{ij} \\[15pt]
Y_{ij}- \bar{Y}_{..} &= \hat{\tau}_i + \hat{\epsilon}_{ij} \\[15pt]
Y_{ij}- \bar{Y}_{..} &= (\bar{Y}_{i.} - \bar{Y}_{..})+ (Y_{ij} - \bar{Y}_{i.}) \\[15pt]
(\#eq:obsPartCRD)
\end{align}
<br>
The total deviation from observation to overall average is partitioned into the deviation from treatment average to overall average plus residual, where the residual is the difference between observation and treatment average. It can be shown that the sum of all total deviations squared equals the sum of squared estimated treatment effects plus the sum of squared residuals: total sum of squares equals treatment sum of squares plus residual sum of squares.
<br>
\begin{align}
\sum_{i=1}^k \sum_{j=1}^r (Y_{ij}- \bar{Y}_{..})^2 &= \sum_{i=1}^k \sum_{j=1}^r (\bar{Y}_{i.} - \bar{Y}_{..})^2 + \sum_{i=1}^k \sum_{j=1}^r (Y_{ij} - \bar{Y}_{i.})^2 \\[15pt]
TSS &= SST + SSE\\[15pt]
\text{where} \\[15pt]
TSS &= \sum_{i=1}^k \sum_{j=1}^r (Y_{ij}- \bar{Y}_{..})^2 \\[15pt]
SST &= \sum_{i=1}^k \sum_{j=1}^r (\bar{Y}_{i.} - \bar{Y}_{..})^2 = \ r \ \sum_{i=1}^k (\bar{Y}_{i.} - \bar{Y}_{..})^2\\[15pt]
SSE &= \sum_{i=1}^k \sum_{j=1}^r (Y_{ij} - \bar{Y}_{i.})^2\\[15pt]
(\#eq:ssPartCRD)
\end{align}
<br>
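The additivity of the sums of squares is easy to verify numerically; a sketch with a small made-up data set of three groups of four observations:

```{r ss-partition-check}
# Numerical check that TSS = SST + SSE
y   <- c(1, 2, 2, 3,  2, 3, 2, 3,  3, 4, 3, 2)   # 3 groups of 4 (made up)
grp <- rep(c("A", "B", "C"), each = 4)
gm  <- mean(y)        # overall average
tm  <- ave(y, grp)    # treatment average repeated for each observation
TSS <- sum((y - gm)^2)
SST <- sum((tm - gm)^2)
SSE <- sum((y - tm)^2)
c(TSS = TSS, SST = SST, SSE = SSE)   # 7 = 2 + 5
```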
## Degrees of freedom
Degrees of freedom are quantities associated with sums of squares. Each sum of squares has an associated number that is called its degrees of freedom. We operationally define **degrees of freedom** associated with a sum of squares as the number of different terms in the sum minus the number of independent parameters estimated to be used in the calculation.
In reality, degrees of freedom refer to the number of dimensions in which vectors can vary. The basic mathematics of statistics involves high-dimensional spaces and subspaces. In short, and for those who are curious, a sample of size N is a vector in N dimensions and has N degrees of freedom because it can vary freely over N dimensions. Once a sample is set, the overall average is a vector in N dimensions with each coordinate equal to the average of the observations. The average vector and the observation vector define a hyperplane in N-1 dimensions where the difference between observation and average resides. The difference $Y_{i} - \bar{Y}_{.}$ is the vector of deviations or residuals (for the simplest model $Y_i = \mu + \epsilon_i$) from observation to average. Given a fixed sample, this vector of residuals has only N-1 degrees of freedom, because it must stay in that subspace. The restriction comes from the fact that, given the overall average and N-1 of the observation coordinates, the last one can be calculated as $Y_N = N \ \bar{Y}_. - \sum_{i=1}^{N-1} Y_i$.
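The constraint in the last sentence is easy to demonstrate with a tiny sample (the numbers are arbitrary):

```{r df-constraint}
# With the average and N-1 observations fixed, the last observation is determined
y <- c(3, 4, 8)
N <- length(y)
N * mean(y) - sum(y[-N])   # recovers the last observation
```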
This concept is illustrated for a sample consisting of three observations, so we can plot them in a 3D figure.
<br>
```{r 3Dsample, echo=FALSE, warning=FALSE, fig.cap="Three-dimensional representation of a sample with 3 observations. The longest arrow (hypotenuse) represents the sample vector. The base of the triangle is the average and the third side is the vector of deviations from the average to the observation."}
# library(rgl)
smpl <- c(3, 4, 8)
avg <- rep(mean(smpl), 3)
rsdl <- smpl - avg
xyz0 <- c(0, 0, 0)
xyz.tips <- rbind(smpl, avg, smpl)
xyz.orig <- rbind(xyz0, xyz0, avg)
colnames(xyz.tips) <- c("obs1", "obs2", "obs3")
colnames(xyz.orig) <- c("obs1", "obs2", "obs3")
ssr <- rsdl %*% rsdl
#
# plot3d(xyz.tips,
# xlab = "observation 1",
# ylab = "observation 2",
# zlab = "observation 3",
# box = FALSE,
# xlim = c(0, 10),
# ylim = c(0, 10),
# zlim = c(0, 10))
#
#
# arrow3d(xyz0, xyz.tips[1,], type = "rotation", col = "grey", barblen = 0.05, width = 0.4)
# arrow3d(xyz0, xyz.tips[2,], type = "rotation", col = "lightblue", barblen = 0.05, width = 0.4)
# arrow3d(xyz.tips[1,], xyz.tips[2,], type = "rotation", col = "maroon", barblen = 0.05, width = 0.4)
# planes3d(a = -4,
# b = 5,
# c = -1,
# d = 0,
# alpha = 0.4)
library(plot3D)
arrows3D(x0 = xyz.orig[, 1],
y0 = xyz.orig[, 2],
z0 = xyz.orig[, 3],
x1 = xyz.tips[, 1],
y1 = xyz.tips[, 2],
z1 = xyz.tips[, 3],
xlim = c(0, 10),
ylim = c(0, 10),
zlim = c(0, 10),
phi = 35,
theta = 450,
lwd = 2,
d = 3,
bty = "g",
ticktype = "detailed")
```
<br>
Note that the three arrows form a right triangle with the observation vector as the hypotenuse and the average and residual vectors as the other two sides. Therefore, the squared length of the observation vector equals the squared length of the average vector plus the squared length of the residual vector. Applying the Pythagorean theorem, the squared length of the observation vector is the sum of squares of the individual observations; the squared length of the average vector is 3 times the squared average; and the squared length of the residual vector is the sum of squared residuals or deviations. The squared length of the average vector is called the "correction factor" because it is used to calculate the total variation about the average by subtracting it from the squared length of the observation vector.
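This decomposition can be verified numerically for a three-observation sample like the one plotted above:

```{r pythagoras-check}
# Squared length of observation vector = correction factor + residual SS
y    <- c(3, 4, 8)
avg  <- rep(mean(y), length(y))   # average vector
rsdl <- y - avg                   # residual vector
c(sum(y^2), sum(avg^2) + sum(rsdl^2))   # both equal 89
```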
A really good reading to understand the concept of degrees of freedom is Walker [@Walker1940]. The following quote is taken from that paper:
>"A canoe or an automobile moves over a two-dimensional surface which lies upon a three-dimensional space, is a section of a three-dimensional space. At any given moment, the position of the canoe, or auto, can be given by two coordinates. Referred to a four-dimensional space-time universe, three coordinates would be needed to give its location, and its path would be a space of three dimensions, lying upon one of four."
>
> --- Walker (1940)
Thus, the car and canoe move in a 4-dimensional space but they have only 3 degrees of freedom, because they must move on a preset surface. Given the surface, their position in space and time is completely determined by 3 numbers. Analogously, the residuals exist in an N-dimensional space, but they must lie on the hyperplane defined by the observation and the mean vector, which has N-1 dimensions or degrees of freedom. The geometric interpretation of sums of squares is applicable to all linear models, and in particular to the few that we study in this course. Although realistic graphical representations can only be made for the simplest models, the graphical analogies are valid for all models. If you find the geometric interpretation easier to understand than the equations, let your instructors know and they will help you pursue better understanding using the geometric approach.
**Total degrees of freedom**
Consider an experiment where r plots are assigned randomly to each of k treatments (i indexes treatments and j indexes replicates or plots). Such an experiment is a completely randomized design and has a total of N = r k plots or experimental units. The concept of experimental unit is explained in detail in the [Chapter about Experimental Design](#chEdesign). For now, just think of an experimental unit as a unit of experimental material (e.g., animal, group of animals, pot, land plot, Petri dish, etc.) that yields an observation independent of the rest of the observations.
When we use the average to estimate the overall mean, the set of deviations from the overall average has only rk-1 degrees of freedom (df). Thus, the total sum of squares or length of the vector of residuals has rk-1 df. This is called the total degrees of freedom associated with the total sum of squares of a set of samples.
<br>
\begin{align}
\text{Total sum of squares } &= TSS = \sum_{i=1}^k \sum_{j=1}^r (Y_{ij} - \bar{Y}_{..})^2 \\[15pt]
\text{Total df } &= r \ k \quad \text{observations} - 1 \ \text{estimated mean}
\end{align}
<br>
Using the operational definition above we can calculate the total df as follows: the summation has rk independent terms (one for each observation), but we use one parameter estimate, the estimate of the overall mean. Therefore the total df are rk-1.
**Degrees of freedom of treatments**
<br>
\begin{align}
&\text{Sum of squares of Treatments } \\[15pt]
= SST &= \sum_{i=1}^k \sum_{j=1}^r (\bar{Y}_{i.} - \bar{Y}_{..})^2 \\[15pt]
&= r \sum_{i=1}^k (\bar{Y}_{i.} - \bar{Y}_{..})^2 \\[15pt]
\text{Treatment df } &= k \quad \text{treatments} - 1 \ \text{estimated mean}
\end{align}
<br>
The summation has k independent terms, one for each treatment, and it uses the estimate of the overall mean. As a result, the treatment degrees of freedom equals k-1.
**Degrees of freedom of residuals**
<br>
\begin{align}
\text{Residual sum of squares } &= SSE = \sum_{i=1}^k \sum_{j=1}^r (Y_{ij} - \bar{Y}_{i.})^2 \\[15pt]
\text{Residual df } &= r \ k \quad \text{observations} - k \ \text{estimated treatment means }\\[15pt]
= dfe &= k(r-1)
\end{align}
<br>
The summation has rk independent terms, one for each observation, and it uses the k estimates of the treatment means. Therefore, the residual degrees of freedom are rk - k = k(r-1).
Both sum of squares and degrees of freedom are additive in the sense that the total is equal to the sum of the components:
<br>
\begin{align}
TSS &= SST + SSE \\[15pt]
\text{df Total } &= \text{df Treatment }+ dfe
\end{align}
<br>
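The bookkeeping above can be sketched in a few lines; k and r below are illustrative values:

```{r df-additivity}
# Degrees of freedom for a CRD with k treatments and r replicates
k <- 3
r <- 4
df.total <- r * k - 1     # 11
df.trt   <- k - 1         # 2
df.error <- k * (r - 1)   # 9
df.total == df.trt + df.error   # additivity holds
```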
## ANOVA Table
The equations above show how each observation and the total variation are partitioned into treatment and residual. Total sum of squares is partitioned into SS of treatments and SS of residual or error ^[We try to use the term "residual" for the calculated values, which are estimates of the unknown true errors. We mention the term "error" because most books use "SSE" to refer to the SS of residuals. The "E" comes from "error."]. Each sum of squares has an associated number of degrees of freedom. By dividing the SS by the df we obtain **mean squares**. The mean square of residuals (MSE) is the best estimate of the common variance for all samples, and it does not depend on treatment means being equal. The mean square of treatments (MST) has an expectation equal to the common variance PLUS a component that is directly proportional to the differences among means. Only when means are equal is the MST another estimate of the variance of the error. See the optional section on Expected Mean Squares for details.
<br>
|Source | df | Sum of Squares (SS) | Mean Squares (MS) | Calculated F |
|:----------------:|:--------:|:-------------------:|:-----------------:|:------------:|
|Total | k r - 1 | TSS | | |
|Among Treatments | k - 1 | SST | MST | MST / MSE |
|Within Treatments | k (r-1) | SSE | MSE | |
<br>
Sums of squares, mean squares, and the calculated F are all statistics. The values based on one sample are specific realized values of random variables, which have their own expected values, variances, and distributions. If the null hypothesis that all treatment means are equal is true and all assumptions are correct, the calculated F has an F distribution with mean or expectation equal to $dfe/(dfe - 2)$, where dfe is the residual or error degrees of freedom.
```{block, type = 'stattip'}
- When MST is much greater than MSE, calculated F will be much greater than expected if treatment means were equal.
- If calculated F is greater than critical F value, or equivalently, if the probability of an F more extreme than calculated F is smaller than $\alpha$, we REJECT Ho.
```
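The critical value and the expectation of the F distribution are available directly in R; the degrees of freedom below are illustrative (e.g., k = 3 treatments with r = 4 replicates):

```{r f-crit}
df1 <- 2   # treatment df, k - 1
df2 <- 9   # error df, k(r - 1)
qf(0.95, df1, df2)   # critical F at alpha = 0.05 (about 4.26)
df2 / (df2 - 2)      # expectation of the F distribution (about 1.29)
```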
## Assumptions of ANOVA
In order for the ANOVA results and interpretation to be valid, the following assumptions must be true:
a. the variable of interest must be normally distributed
b. each experimental unit must be independent from any other experimental unit
c. variances of the different treatments must be all equal ("homogeneous")
d. treatment and residual effects must be additive
These assumptions are part of the model above and are important for ANOVA, although some deviation from certain assumptions is not always a grave problem. The main part of the model states clearly that each observation is composed of three additive parts: overall mean, treatment effect, and error. Further, the errors (one for each observation) are assumed to be independent and identically distributed, following a normal distribution with mean 0 and common variance $\sigma^2$ (or standard deviation $\sigma$).
<br>
```{block, type = 'stattip'}
ANOVA involves two tests of equality of variances:
1. The focal test that MST and MSE both estimate the same variance, which is a test of $H_0$.
2. The ancillary test that variances within treatments are equal, which is a corroboration of the assumption of homogeneity of variances.
```
<br>
## ANOVA example
We will study the ANOVA procedure with an example. First, the analysis is presented with the theory but as concisely as possible, from research question to conclusion, using R functions. Then, each step is explained in detail and calculations are presented step by step.
This is a fictitious example with elements based on actual research. Alfalfa geneticists and breeders created two transgenic varieties with genetic modifications of the metabolic pathways that synthesize lignin. Lignin is indigestible fiber that reduces the nutritional quality of forages. A common, unmodified variety and the two GMO varieties need to be compared to determine if the modification leads to better nutrition of cattle. Twelve animals were randomly selected from a large herd of beef cattle for the experiment. Varieties were labeled A, B and C, where A is the unmodified one. Twelve identical ping-pong balls were used for the randomization of treatment assignment to experimental units (animals). Four balls were labelled "A," four "B" and four "C." Balls were put into a bucket and mixed, then a blindfolded operator drew them randomly, one at a time and without replacement. The random sequence of letters was written down. Animals were lined up haphazardly and assigned the treatments from left to right in the order given by the random sequence. Each animal was housed in an individual pen and fed the corresponding variety of alfalfa for 30 days. Weight gain in lbs per day was calculated for the last 10 days of the experiment and used as the response variable.
<br>
```{r TblAnovaExmpl1, echo=FALSE, warning=FALSE}
aaAnovaExmpl1.dat <- read.table(header = TRUE, text = "
variety wgain
A 1.0
B 2.0
C 3.0
A 2.0
B 3.0
C 4.0
A 2.0
B 2.0
C 3.0
A 3.0
B 3.0
C 2.0
"
)
knitr::kable(
aaAnovaExmpl1.dat,
digits = 2,
align = "cc",
caption = 'Fictitious data. Weight gain in lbs/day of steers fed different varieties of alfalfa. The experiment was a completely randomized design with four replicates of each treatment'
) %>%
kable_styling(bootstrap_options = c("condensed"),
font_size = 11,
full_width = F,
position = "center") %>%
column_spec(c(1,2), width = "1.5in")
```
<br>
**Null hypothesis**: Weight gain is the same for all alfalfa varieties tested.
\begin{align}
H_0: \quad &\mu_A = \mu_B = \mu_C \quad &\text{or equivalently:} \\
&\tau_A = \tau_B = \tau_C = 0 \quad &\text{treatment effects are zero}
\end{align}
**Alternative hypothesis** There is at least one pair of varieties that differs in the resulting weight gains.
\begin{align}
H_A: \quad &\mu_A \ne \mu_B \quad \text{or} \quad \mu_A \ne \mu_C \quad \text{or} \quad \mu_B \ne \mu_C \\
\end{align}
### Formulas and calculations in R
We use the `lm` and `anova` functions of R. The `lm` function finds the optimal values of the estimated parameters by minimizing the residual sum of squares. The `anova` function uses the estimated parameters to partition the total variation and gives the resulting calculated F with the corresponding probability of observing a value at least that extreme if indeed all varieties result in the same weight gain.
```{r, echo = TRUE, warning=FALSE}
transg.aa.m1 <- lm(wgain ~ variety, data = aaAnovaExmpl1.dat)
summary(transg.aa.m1)
```
The output shows the formula used for the model, the quartiles for the residuals, and a table of coefficients with the estimated mean of the reference variety A (Intercept), the estimated differences of varieties B and C from A (varietyB and varietyC), and the corresponding standard errors and t-tests for the null hypotheses that each coefficient is zero. Note that with R's default parameterization the Intercept is the mean of the first treatment level, not the overall mean, and varietyB and varietyC estimate $\tau_B - \tau_A$ and $\tau_C - \tau_A$. We are not interested in testing that the coefficients are zero, so the t-tests can be ignored. The bottom part of the report shows the standard deviation of residuals and their degrees of freedom, the $R^2$ and adjusted $R^2$, and finally the calculated F statistic with its degrees of freedom and the probability of observing a value equal to that or more extreme if $H_0$ were true. The $R^2$ is the proportion of the total variation represented by the variation among treatments. We will ignore the adjusted $R^2$ for now.
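As a self-contained check of how to read these coefficients (the data are re-entered here so the chunk runs on its own): under R's default treatment contrasts, the Intercept equals the average of variety A, and the other coefficients are differences from A.

```{r}
# Rebuild the example data so this sketch is self-contained.
d <- data.frame(
  variety = rep(c("A", "B", "C"), times = 4),
  wgain   = c(1, 2, 3,  2, 3, 4,  2, 2, 3,  3, 3, 2)
)
m <- lm(wgain ~ variety, data = d)

coef(m)["(Intercept)"]  # 2.0 = average of variety A
coef(m)["varietyB"]     # 2.5 - 2.0 = 0.5, difference of B from A
coef(m)["varietyC"]     # 3.0 - 2.0 = 1.0, difference of C from A
```

With sum-to-zero contrasts (`contr.sum`) the Intercept would instead estimate the overall mean.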
Before we move further, we need to inspect the residuals to make sure that there are no deviations from the assumptions. If residuals clearly do not meet assumptions, we cannot use ANOVA and the results above are not relevant.
```{r AnovaResPlot1, echo = TRUE, warning=FALSE, out.width = '90%', fig.align='center', fig.cap ="Left: Plot of residuals against estimated treatment means for the transgenic alfalfa experiment. Points were jittered horizontally because several residuals were equal. Right: quantile-quantile plot of residuals. If residuals have a normal distribution they should not deviate significantly from the straight line."}
par(mfrow = c(1,2))
plot(residuals(transg.aa.m1) ~ jitter(fitted(transg.aa.m1)),
ylab = "Residuals",
xlab = "Treatment averages (lbs/day)")
qqp(residuals(transg.aa.m1), id = FALSE)
```
<br>
The graph on the left shows that the vertical variation of points does not differ extremely among treatments, which are depicted as different values in the horizontal axis. Therefore, we do not have sufficient information to question the assumption of homogeneity of variance. The graph on the right shows that the observed quantiles do not deviate significantly (they are within the dashed curves) from the normal quantiles, so we cannot question the assumption of normality. In this data we do not have enough information to assess the independence of the observations, so that assumption is based on the way that animals were treated and kept separately, and it is not challenged. One possible way to challenge the independence of residuals would be to determine if observations have more similar residuals when animals are in closer pens, but the spatial distribution of pens is not known (the data are fictitious and no spatial distribution was created)^[It is strongly recommended that experimenters always record the operator ID, date, time and spatial location of all observations. For lab experiments, location on benches, batches, runs etc. should be recorded with each observation. The time-place-set-group information will allow experimenters to test for lack of homogeneity in variance and lack of independence in the residuals, as well as for nuisance effects of groupings or batches].
Although there are other more formal tests to assess whether assumptions are met, a visual inspection of the residuals is usually sufficient. Formal methods to test for deviation from assumptions are beyond the scope of this course. This data set is very small and the residuals do not show major problems, so the analysis is pursued. Note that when doing real data analysis, this step should be more detailed.
```{r, echo = TRUE, warning=FALSE}
kable(anova(transg.aa.m1), digits = c(0, 0, 3, 3, 4)) %>%
kable_styling(full_width = FALSE)
```
The analysis of variance table has columns for source of variation, degrees of freedom, sum of squares, mean square, calculated F-value and probability of observing an F that large or larger if the null hypothesis were true. Of course, the F value and the probability are the same as in the previous table. The probability reported is the probability that an F-distributed random variable with 2 df in the numerator and 9 df in the denominator will take values equal to or greater than the calculated F. If the means are indeed all the same, then the calculated F should behave as an F-distributed random variable with 2 df in the numerator and 9 df in the denominator. If the probability is too low, we reject the premise that the calculated F follows that distribution, which cascades back to rejecting $H_0$.
Because the probability associated with the calculated F-value is larger than 0.05, we cannot reject the null hypothesis and the study is inconclusive. This is the same as saying that the calculated F is smaller than the critical $F$. We do not have enough evidence to say that the varieties lead to different weight gains, because the variation among treatment means is not sufficiently larger than the variation within treatments due to experimental error (variation among experimental units due to other, unmeasured causes).
Keep in mind that not being able to reject the null hypothesis is not the same as accepting it. The reason for this is that *it is very easy not to reject a null hypothesis*: all you have to do is not try hard enough, for example by obtaining a very small sample. If not being able to reject the null meant that you accept it, then you would be able to accept any null hypothesis, which obviously is not very useful. It would simply lead you to be wrong most of the time.
If the above is not clear, consider the following. Say you took a multiple-choice test and obtained 75% correct. The questions are a sample of your knowledge. Your professor hypothesizes that your real knowledge of the material is at 73%. With the sample size available, he cannot reject his null hypothesis. Will you accept a grade of 73%?
```{block, type = 'stattip'}
- Failure to reject the null hypothesis DOES NOT mean that you accept it. It means the study is inconclusive, unless you did a calculation of power.
```
Even though treatments were not significantly different, we can calculate confidence intervals for the means and we can estimate the minimum difference between means that could be detected with the present variance and sample size. In all cases, the best estimate of the variance of the error we have is the MSE with dfe degrees of freedom.
**Confidence interval for a treatment mean**
<br>
\begin{align}
&{U \atop L} = \hat{\mu}_{i.} \pm t_{(1-\alpha/2), dfe} \ S_{\hat{\mu}_{i.}} \quad \text{ where} \\[15pt]
& \hat{\mu}_{i.} = \bar{Y}_{i.} \quad \text{is the estimated treatment mean or average} \\[15pt]
&t_{(1-\alpha/2), dfe} \quad \text{ is the } \quad 1-\alpha/2 \quad \text{ t quantile with dfe degrees of freedom} \\[15pt]
&S_{\hat{\mu}_{i.}} = S_{\bar{Y}_{i.}} \quad \text{is the standard deviation of the estimated treatment mean} \\[15pt]
&S_{\bar{Y}_{i.}}^2 = MSE/r \\[15pt]
(\#eq:CiTrtMeanCRD)
\end{align}
<br>
The key step in the calculation is finding $S_{\bar{Y}_{i.}}$. We recall that the treatment average is the average of all replicates for the treatment and apply the most important equation of this course. If the estimated variance of $Y$ is $S^2_Y$, then the estimated variance of $\bar{Y}_{i.}$ is $S^2_Y/r_i$ where $r_i$ is the number of replicates in treatment i. If the experiment is balanced and all treatments have the same number of replications, then $r = r_i$ which in the example is 4 replications. For $S^2_Y$ we use the best estimate of the variance of the error we have, which is the MSE. Note that the MSE is preferred to using the estimated variance based just on the r replicates for the treatment because it has more degrees of freedom. This works if the assumption of homogeneity of variance is correct.
For the calculation of confidence intervals we use the `emmeans` function in the `emmeans` [@Lenth2018] package of R. The `emmeans` function computes the estimated treatment means, their standard errors and confidence intervals. Note that the code below repeats the call in order to print the table in a nice format using the functions `kable` and `kable_styling`. Thus, you can see the table first as direct output from R and then as formatted by the software used to write this book.
<br>
```{r aaAnovaTbl}
emmeans::emmeans(transg.aa.m1, "variety")
emmeans(transg.aa.m1, "variety") %>%
kable(caption = "Estimated mean weight gains for animals
fed each variety of alfalfa, with the corresponding
standard errors and confidence intervals.",
digits = c(0, 1, 4, 0, 4, 4)) %>%
kable_styling(full_width = FALSE)
```
<br>
Table \@ref(tab:aaAnovaTbl) shows that all estimated treatment means have the same SE and df of the error, which leads to identical confidence interval widths. The t-value for the confidence intervals, known as "critical" t or colloquially "t from table" (as opposed to the calculated t), is $t_{\ 0.975, \ 9} =$ `r qt(0.975,9)`.
**Comparing two treatment means**
In this specific example the single F-test did not lead to rejection of $H_0$, so comparisons among means should not be pursued. However, we present the calculations that are necessary for comparing means using the *Least Significant Difference* (LSD) procedure. In this procedure, we calculate the smallest difference necessary to conclude that any two means are different. Any two means that differ by more than the LSD are considered to be "significantly" different, which means that we reject the null that they are the same.
As we saw in [Chapter 8: Two populations](#ch2pops), when we test the $H_0$ that two means are equal we have two random variables: each of the two estimated means. Because experimental units were randomly assigned to treatments and treated independently, the estimates should be, and are assumed to be, independent random variables. The test of equality becomes a test of whether the difference between the two random variable realizations is significantly different from zero. That is, we ask "is the difference much greater than would be expected if the two means were actually equal?" The difference between two normal random variables is also a normal random variable with mean equal to the difference in the true means and variance equal to the sum of the variances of the two estimated means. Because we are assuming that there is homogeneity of variance, the two variances are equal, so we use a pooled estimated variance. In the two-sample case in Chapter 8 we pooled the two estimated variances. When we have several treatments, we use the estimated variance pooling all within-treatment variation, which is the MSE.
<br>
\begin{align}
&H_0: \mu_{1.} = \mu_{2.} \quad \equiv \quad \mu_{1.} - \mu_{2.} = 0 \\[15pt]
&H_A: \mu_{1.} \ne \mu_{2.}\\[15pt]
& \bar{d} = \bar{Y}_{1.} - \bar{Y}_{2.} \sim N(\mu_{\bar{d}}, \sigma^2_{\bar{d}}) \\[15pt]
& \sigma^2_{\bar{d}} = \sigma^2_{\bar{Y}_{1.}} + \sigma^2_{\bar{Y}_{2.}} = 2 \ \sigma^2_Y/r \quad \text{ which is estimated by } \quad S^2_{\bar{d}} = 2 \ MSE/r\\[15pt]
&\text{ therefore, if } \quad \mu_{\bar{d}} = 0 \implies t_{calc} = \frac{\bar{d}}{S_{\bar{d}}} \sim t_{dfe} \\[15pt]
(\#eq:DifTrtMeanCRD)
\end{align}
<br>
The decision rule is to compare the calculated t with the critical "two-tailed" t for $\alpha$. If the calculated t is greater than the critical t, then $H_0$ is rejected and the treatment means are said to be "significantly different." Note that if $$\bar{d}/S_{\bar{d}} \gt t_{(1-\alpha/2)}$$ then $$\bar{d} \gt t_{(1-\alpha/2)} \ S_{\bar{d}}$$ so the product of the critical t times the standard error of the difference between two averages is the smallest difference that will lead to a rejection of the null hypothesis.
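Before calling a packaged routine, the LSD can be sketched directly from these formulas, using the MSE (5/9, with 9 df) and r = 4 from this experiment:

```{r}
mse <- 5/9    # mean square error from the ANOVA table
dfe <- 9      # error degrees of freedom
r   <- 4      # replicates per treatment

se_diff <- sqrt(2 * mse / r)     # standard error of a difference of two means
t_crit  <- qt(0.975, df = dfe)   # two-tailed critical t for alpha = 0.05
lsd     <- t_crit * se_diff

se_diff  # 0.527
t_crit   # 2.262
lsd      # 1.192
```

Any pair of treatment averages that differ by more than this LSD would be declared significantly different at $\alpha = 0.05$.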
```{r}
library(agricolae) #load package necessary for LSD test
LSD.test(transg.aa.m1,
"variety",
console = TRUE,
p.adj = "none")
```
The output reports:
- MSE,
- a table with estimated treatment means with their group standard deviations (without pooling), number of observations in each group, confidence intervals (same as reported by `emmeans` above) and rank of each extreme of the confidence intervals,
- $\alpha$, dfe and critical t for the given $\alpha$, and dfe,
- least significant difference,
- table with treatment averages connected with letters.
The last part of the output, a table of averages followed by letters, is a standard way to present the results. Means that share at least one letter are not significantly different from each other. In this case a single letter follows all the means because there are no significant differences.
### Detailed calculations
In this section we repeat some of the analysis above in more detail, showing the calculations that are "behind the scenes" when we use the high-level R functions.
There are two principal goals for the calculations:
1. get the calculated-F to compare to the "F from table," and
1. get the standard error of estimated means and differences between estimated means to test pairs of means or make confidence intervals.
To obtain the calculated F we need to complete the ANOVA table, which contains the MSE. The standard error of estimated means and of their differences is calculated using the MSE.
Sums of squares, the central components of the ANOVA table, can be calculated by making a table with all components for each of the observations and then squaring and summing the components.
```{r}
# First calculate treatment averages and add them to each corresponding observation
# in a new column. We use the aggregate function.
var.avg <- aggregate(wgain ~ variety,
data = aaAnovaExmpl1.dat,
FUN = mean)
names(var.avg)[2] <- "avg.wgain" # Change name to avoid conflict during merge.
aaAnovaExmpl1.dat <- merge(aaAnovaExmpl1.dat, var.avg, all = TRUE)
# Calculate treatment effects
aaAnovaExmpl1.dat$trt.fx <- aaAnovaExmpl1.dat$avg.wgain - mean(aaAnovaExmpl1.dat$wgain)
# Calculate residuals
aaAnovaExmpl1.dat$rsdl <- aaAnovaExmpl1.dat$wgain - aaAnovaExmpl1.dat$avg.wgain
# Calculate total deviations from average
aaAnovaExmpl1.dat$tot.dev <- with(aaAnovaExmpl1.dat, trt.fx + rsdl)
print(aaAnovaExmpl1.dat)
```
The data frame now has the original observations plus their decomposition into overall average ($\bar{Y}_{..} =$ `r mean(aaAnovaExmpl1.dat$wgain)`), treatment effect and residual. The total deviation from each observation to the overall average is the sum of treatment effect and residual; for example, for two observations of variety A:
\begin{equation}
Y_{11} = 1.0 = 2.5 + (-0.5) + (-1.0) \\[15pt]
Y_{14} = 3.0 = 2.5 + (-0.5) + 1.0
\end{equation}
Total sum of squares is the sum of the squared values in the `tot.dev` column:
\begin{align}
TSS &= (-1.5)^2 + 0.5^2 + (-0.5)^2 + (-0.5)^2 + 0.5^2 + (-0.5)^2 \\
&+ 0.5^2 + (-0.5)^2 + 0.5^2 + 1.5^2 + 0.5^2 + (-0.5)^2 \\
&= 7.0
\end{align}
Treatment and residual SS can be found in a similar way using the corresponding columns.
\begin{align}
SST &= (-0.5)^2 + (-0.5)^2 + (-0.5)^2 + (-0.5)^2 + 0^2 + 0^2 \\
&+ 0^2 + 0^2 + 0.5^2 + 0.5^2 + 0.5^2 + 0.5^2 \\
&= 2.0
\end{align}
\begin{align}
SSE &= (-1)^2 + 1^2 + 0^2 + 0^2 + 0.5^2 + (-0.5)^2 + 0.5^2 \\
&+ (-0.5)^2 + 0^2 + 1^2 + 0^2 + (-1)^2 \\
&= 5.0
\end{align}
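These sums can be verified numerically by squaring and summing the decomposition columns; a self-contained sketch (the data are re-entered so the chunk runs on its own):

```{r}
# Re-enter the observations, grouped by variety A, B, C.
wgain   <- c(1, 2, 2, 3,  2, 3, 2, 3,  3, 4, 3, 2)
variety <- rep(c("A", "B", "C"), each = 4)

trt.avg <- ave(wgain, variety)     # treatment average for each observation
trt.fx  <- trt.avg - mean(wgain)   # treatment effects
rsdl    <- wgain - trt.avg         # residuals

TSS <- sum((trt.fx + rsdl)^2)      # total sum of squares: 7
SST <- sum(trt.fx^2)               # treatment sum of squares: 2
SSE <- sum(rsdl^2)                 # error sum of squares: 5
```

Note that the partition holds exactly: TSS = SST + SSE = 2 + 5 = 7.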
Degrees of freedom are as follows: for SSE there are 12 terms in the summation, and we estimated 3 treatment means to calculate it. Therefore, dfe = 12 - 3 = 9. Using the geometric approach, the observation vector is in a 12-dimensional space and the vector of treatment means is in a 3-dimensional subspace, so the residual vector is restricted to the remaining 9 dimensions.
For TSS there are 12 independent terms and one estimate of the overall mean. Thus, total df = 12 - 1 = 11. Treatment or variety SS has 3 independent terms that repeat over each of the replications and it uses one estimate for the overall mean. Thus, df treatments = 3 - 1 = 2.
$$ MSE = \frac{SSE}{dfe} = \frac{5.0} {9} = 0.55556$$
$$MST = \frac{SST}{df_{treat}} = \frac{2.0} {2} = 1.0$$
$$F_{calc} = \frac{MST}{MSE} = 1.8 \quad \text{with 2 df in the numerator and 9 df in the denominator}$$
Critical F and the probability associated with the observed F are calculated in R as follows. Recall that the value of the critical F is a quantile, more specifically the $1 - \alpha$ quantile:
```{r}
# Critical F quantile
qf(p = 0.05, df1 = 2, df2 = 9, lower.tail = FALSE)
# Probability of F > Fcalc
pf(q = 1.8, df1 = 2, df2 = 9, lower.tail = FALSE)
```
This completes all the information in the ANOVA table. Now we proceed to make a confidence interval for an estimated treatment mean, for example for variety C. The standard error of the estimated mean is the square root of the MSE divided by the number of observations used to estimate the mean. In this case, all treatments have the same sample size or number of replications, and thus all treatment standard errors are the same. However, in other experiments sample sizes can differ among treatments, and the standard error will be larger for the treatments with fewer observations.
$$S_C = \sqrt{\frac{MSE}{r}} = \sqrt{\frac{0.55556}{4}} = 0.372678$$
The critical value of t for the CI is obtained using the `qt` function in R. For a confidence interval we always use the two-tailed quantile, which leaves $\alpha/2$ in each tail:
```{r}
# Critical t
qt(p = 0.05/2, df = 9, lower.tail = FALSE)
# Average for variety C
mean(aaAnovaExmpl1.dat$wgain[aaAnovaExmpl1.dat$variety == "C"])
```
We corroborate that the value obtained is the same as above when we used the `LSD.test` function.
$${U \atop L} = 3.0 \ \pm \ 2.262157 \cdot 0.372678 = {3.843056 \atop 2.156944}$$
Finally, we construct a test and calculate the LSD to compare the means of varieties A and C. Because we are calculating a difference between two estimated means, the variance is twice that of each estimated mean. Critical t is the same, given that we use the same $\alpha = 0.05$ and that the dfe are the same as before. The mean of variety A is 2.0.
$$S_{\bar{d}} = S_{C-A} = \sqrt{\frac{2MSE}{r}} = \sqrt{\frac{2\cdot0.55556}{4}} = 0.5270463$$
$$\bar{d} = 3.0 - 2.0 = 1.0$$
$$t_{calc} = \frac{\bar{d}}{S_{\bar{d}}} = \frac{1.0}{0.5270463} = 1.897367$$
The probability associated with the calculated t is found in R taking into account that the test is two-tailed, so the probability for one tail has to be multiplied by 2:
```{r}
2 * pt(1.897367, df = 9, lower.tail = FALSE)
```
The LSD is the product of the critical t and $S_{\bar{d}}$:
$$ LSD = 2.262157 \cdot 0.5270463 = 1.192262$$
We cannot reject the null hypothesis that the means for varieties A and C are equal because of three equivalent facts (in addition to the non-significant ANOVA test):
1. The probability associated with calculated t is greater than 0.05.
1. Calculated t is smaller than critical t.
1. The difference between averages is smaller than the LSD.
## Exercises and Solutions
1. Explain what analysis of variance means in practical terms. How can one use variances to compare the means of different treatments?
2. What are the assumptions of the analysis of variance and why are each of them required prior to analysis?
3. Data is collected on 4 groups of 30 Romney ewes that were clipped and shorn on different days across 3 replicated time periods in New Zealand. Mean clean fleece weight (g) is shown in the table below.
<br>
Table: (\#tab:SheepWool) Mean clean fleece weight (g) is measured from ewes that were clipped and shorn on different days across three consecutive 112 day intervals (Bigham, 1974)
| Treatment | Rep 1 | Rep 2 | Rep 3 | Total | Mean |
|--------------:|:--------------:|:--------------:|:--------------:|:---------:|:---------:|
| A(28,28) | 1.44 | 1.13 | 1.93 | 4.50 | |
| B(56,56) | 1.34 | 1.11 | 1.87 | 4.32 | |
| C(112,112) | 1.30 | 1.03 | 1.88 | 4.21 | |
| D(28,112) | 1.33 | 1.00 | 1.87 | 4.20 | |
<br>
```{r}
sheep.wool <- data.frame(
'A' = c(1.44, 1.13, 1.93),
'B' = c(1.34, 1.11, 1.87),
'C' = c(1.30, 1.03, 1.88),
'D' = c(1.33, 1.00, 1.87))
```
a. What is the MST and MSE of this data?
b. What is the calculated F-value of this data?
c. Is there a significant difference among the treatments at a 5% significance level?
d. Is there a significant difference among the treatments at a 1% significance level?
e. Is there a significant difference among the treatments at a 10% significance level?
4. If treatments are considered significantly different at the 5% level, will they also be considered significantly different at the 10% level? At the 1% level?
5. Data is collected on the yield (bushels/acre) in four different varieties of wheat replicated at a single location.
<br>
Table: (\#tab:WheatYield) Wheat yield (bushels/acre) is measured from four different varieties.
| Treatment | Rep 1 | Rep 2 | Rep 3 | Rep 4 | Rep 5 | Rep 6 |
|--------------:|:--------------:|:--------------:|:--------------:|:--------------:|:--------------:|:--------------:|
| A | 60 | 40 | 45 | 55 | | |
| B | 25 | 30 | 25 | 35 | 30 | 25 |
| C | 55 | 50 | 40 | 60 | 55 | |
| D | 60 | 55 | 55 | 45 | 50 | |
<br>
```{r}
wheat.yield <- data.frame(
'A' = c(60, 40, 45, 55, NA, NA),
'B' = c(25, 30, 25, 35, 30, 25),
'C' = c(55, 50, 40, 60, 55, NA),
'D' = c(60, 55, 55, 45, 50, NA))
```
a. What is the MST and MSE of this data?
b. What is the calculated F-value?
c. Is there a significant difference among the treatments at the 5% significance level?
## Homework
### Introduction to ANOVA
This set of exercises introduces the need for and shows the use of a method called Analysis of Variance (ANOVA). This method is used to determine if there are differences in the means of several populations. Instead of two populations, now there are many. We start with a type of ANOVA called One-way ANOVA. The term "One-way" means that the populations may differ in only one factor. For example, the factor can be the amount of fertilizer applied to a crop or the type of diet fed to an animal. Conversely, "Two-way" ANOVA involves more than one factor, for example when plots differ not only in the amount of fertilizer but also in the amount of irrigation. For now we focus on one-way ANOVA.
### Need for ANOVA
We already know how to compare two means. When we have many means and want to determine if any of them are different, we could simply test all possible pairs. Why is this not a good idea unless we take precautions? The first part of this homework illustrates why.
### Comparison of Diets
You are looking for diets that reduce blood cholesterol in rats. You created 14 test diets and did an experiment in which 4 randomly selected rats were independently fed each diet. Blood cholesterol was measured on each rat at the end of 30 days of feeding. Because you do this before knowing about ANOVA, you decide to test for differences using the independent observations procedure for two population means, taking two diets at a time. You use $\alpha = 0.05$ for each test. Assume that the tests are all independent.
1. How many different tests of $H_0: \mu_1 =\mu_2$ do you make? Consider that each time you make a test you are choosing 2 out of 14 means. The order of the means does not matter, because we are making 2-tailed tests.
2. What is the probability that you make a type I error in each test separately?
3. What is the probability that you make a type I error in just one of all the tests?
4. What is the probability that you make a type I error in none of the tests?
5. What is the probability that you make a type I error in at least one test (family-wise error rate)?
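As a generic illustration of the logic behind these questions (using a made-up number of means, not the one in the exercise): with k means there are $\binom{k}{2}$ pairwise tests, and for m independent tests at level $\alpha$ the family-wise error rate is $1 - (1 - \alpha)^m$.

```{r}
k     <- 5      # number of means (illustrative only, not the homework value)
alpha <- 0.05   # per-test type I error rate

m      <- choose(k, 2)    # number of pairwise tests: 10
p_none <- (1 - alpha)^m   # probability of no type I error in any test
fwer   <- 1 - p_none      # probability of at least one type I error

m     # 10
fwer  # about 0.40
```

Even with only 5 means, the chance of at least one false rejection is roughly 40%, far above the nominal 5%.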
### Use of ANOVA
When you make a lot of tests, your chance of making at least one mistake can be quite large. ANOVA avoids the problem with your "all-tests" approach by doing a single test that simultaneously checks for evidence of at least one difference. Interestingly, although the test is to determine if there are differences among means, it actually is a test of equality of variances estimated in two different ways. So first we practice testing for equality (or difference) of variances. When two random variables have normal distributions with equal variance, the ratio of sample variances from each population has an F distribution (see [F Distribution](#FDist)).
### Test of Equality of Variances
The data below are samples of forage yield (ton/ha) from a variety of fescue grown with and without irrigation. Test the hypothesis that irrigated and rainfed yields have the same variance.
<br>
```{r}
test.of.eq.var <- data.frame(
rainfed = c(4.55531, 3.28384,3.69763, 4.97093, 4.86167, 3.51156, 5.37135, 4.17079, 3.51320, 4.02560, 2.79041, 3.55350, 3.22606, 5.89298, 1.66526, 4.16665, 3.50212, 3.72783, 2.36477, 3.30673, 4.40777, 3.66480, 3.35171, 3.10409, 2.85976, 3.81750, 5.39221, 4.99803,4.11388, 4.11004),
irrigated = c(13.10275, 11.66548, 8.40485, 8.37005, 10.32271, 10.22372, 8.08630, 12.99069, 11.70000, 12.10000, 7.87410, 10.22707, 12.10700, 11.00452, 10.64577, 8.68308, 6.83079, 12.25721, 10.20000, 12.52602, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA))
```
<br>
```{block, type = 'rtip'}
Data frames are rectangular tables with data. All columns must have the same length. Because our two treatments do not contain the same number of samples, `NA` values are included to make columns of equal length that can be joined into a data frame.
```
<br>
For the following calculations, assume an $\alpha = 0.05$, and $df_{rainfed} = 29$ and $df_{irrigated} = 19$ to calculate these values.
6. Calculate the sample variances and report the "calculated" or sample F value.
7. What is the probability of observing an F-value larger than the one observed?
8. What is the critical F "from table?"
9. Interpret the results of the test of equality of variances.
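As a pattern for this kind of test (using made-up numbers, not the forage data): compute the ratio of the two sample variances and compare it to F quantiles with the corresponding degrees of freedom.

```{r}
y1 <- c(4.1, 3.6, 5.0, 4.4, 3.2, 4.8, 3.9, 4.5)      # made-up sample 1
y2 <- c(9.0, 12.5, 7.8, 11.2, 13.4, 8.1, 10.6, 12.9) # made-up sample 2

f_calc <- var(y2) / var(y1)   # larger sample variance in the numerator
df1 <- length(y2) - 1
df2 <- length(y1) - 1

f_crit <- qf(0.975, df1, df2)                          # upper critical F
p_two  <- 2 * pf(f_calc, df1, df2, lower.tail = FALSE) # two-tailed p-value
```

The same test is packaged in R as `var.test(y2, y1)`, which reports the same ratio and two-sided p-value.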
### Estimation of the Variance Between and Within Samples
For this section of the exercise we will use yield (ton/ha) data about 10 fescue varieties, all grown in rainfed conditions. We assume that the variances are the same for all varieties. This is an important point to understand: we assume that the variance is the same for all varieties, regardless of their means. We are referring to the variances within each sample. Because the variances are assumed to be the same, we can follow the idea used in Case 1 of two populations means (textbook page 88) and pool the estimates to get a better overall estimate of the variance.
These are randomly simulated data. All variances are equal to 1 and the true means are below. These values are given just FYI, to see how samples differ.
```{r}
fescue.samples <- data.frame(
'var1' = c(3.10, 3.88, 2.37, 3.95, 2.42, 1.72, 3.28, 2.51, 2.44),
'var2' = c(3.84, 2.65, 3.10, 2.83, 1.81, 3.86, 4.25, 4.21, 5.44),
'var3' = c(5.80, 5.87, 4.25, 2.56, 4.32, 3.05, 4.31, 3.67, 2.36),
'var4' = c(5.03, 2.18, 3.49, 4.50, 1.99, 2.95, 3.48, 2.74, 3.89),
'var5' = c(3.90, 4.24, 4.60, 5.00, 4.04, 4.85, 4.65, 4.26, 5.47),
'var6' = c(7.57, 3.76, 6.66, 5.87, 5.69, 6.41, 5.12, 6.74, 6.04),
'var7' = c(2.47, 4.23, 2.63, 4.68, 3.22, 3.88, 3.17, 3.62, 3.22),
'var8' = c(3.23, 4.15, 4.12, 6.05, 3.66, 4.90, 3.88, 3.76, 3.15),
'var9' = c(4.01, 5.46, 4.31, 5.63, 5.22, 4.38, 4.75, 4.09, 4.60),
'var10' = c(6.96, 5.79, 6.76, 6.25, 6.83, 7.24, 7.26, 5.00, 4.66))
#Actual Population Means
pop.means <- data.frame(
'var1' = 3.0,
'var2' = 3.5,
'var3' = 4.0,
'var4' = 3.0,
'var5' = 5.0,
'var6' = 6.0,
'var7' = 3.2,
'var8' = 3.9,
'var9' = 4.5,
'var10' = 6.0)
df <- data.frame('var1' = 8, 'var2' = 8, 'var3' = 8, 'var4' = 8, 'var5' = 8, 'var6' = 8, 'var7' = 8, 'var8' = 8, 'var9' = 8, 'var10' = 8)
#Sample Averages for Each Variety
sample.avg <- sapply(fescue.samples, mean)
#Sample Variances for Each Variety
sample.var <- sapply(fescue.samples, var)
```
For the following calculations, assume an $\alpha = 0.05$, and $df_{var_i} = 8$ to calculate these values.
$S^2_{\bar{Y}} = \frac{S^2_{Y}}{r}$
$S^2 = \frac{(r_1 - 1)S^2_{1} + (r_2 - 1)S^2_{2} +(r_3 - 1)S^2_{3} + (r_4 - 1)S^2_{4} + (r_5 - 1)S^2_{5} + (r_6 - 1)S^2_{6} + (r_7 - 1)S^2_{7} + (r_8 - 1)S^2_{8} + (r_9 - 1)S^2_{9} + (r_{10} - 1)S^2_{10}}{(r_1 -1) + (r_2 -1) + (r_3 -1) + (r_4 -1) + (r_5 -1) + (r_6 -1) + (r_7 -1) + (r_8 -1) + (r_9 -1) + (r_{10} -1)}$
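The pooled variance above is a weighted average of the group variances, with degrees of freedom as weights; a generic sketch on made-up groups (not the fescue data, and with unequal sizes to show the general case):

```{r}
# Three made-up samples with different sizes
g1 <- c(3.1, 2.8, 3.6, 3.3)
g2 <- c(4.0, 4.4, 3.7, 4.1, 4.3)
g3 <- c(2.9, 3.5, 3.2)

groups <- list(g1, g2, g3)
df_i  <- sapply(groups, function(g) length(g) - 1)  # r_i - 1 for each group
var_i <- sapply(groups, var)                        # S^2_i for each group

pooled_var <- sum(df_i * var_i) / sum(df_i)         # df-weighted average
```

When all groups have the same size, this reduces to the simple average of the group variances.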
10. Estimate the pooled variance for the 10 varieties of fescue.
Now we recall that the variance of averages for samples from a population is the variance of individual observations divided by sample size. If it is true that all means are the same for all varieties, then we can obtain another estimate of the individual variance by calculating the variance among averages and multiplying by sample size.
11. Estimate the variance of yield using the variance among averages and sample size. Refer to the most important equation for this course.
12. What is the overall average?
13. What is the pooled within variance?
14. What is the variance of averages?
15. What is the second variance estimate?
16. What is $F_{calc}$?
17. What is $F_{critical}$?
18. What is P($F$ > $F_{calc}$)?
19. What is $df_{numerator}$?
20. What is $df_{denominator}$?
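The logic of the questions above can be illustrated on simulated data (not the fescue samples): when all population means are equal, $r$ times the variance of the group averages and the pooled within-group variance both estimate $\sigma^2_Y$, so their ratio behaves as an F random variable.

```{r}
set.seed(42)                      # made-up simulated data, equal true means
ntrt <- 6                         # number of groups (illustrative)
nrep <- 9                         # replicates per group
y <- matrix(rnorm(ntrt * nrep, mean = 5, sd = 1), nrow = nrep)

avg_j  <- colMeans(y)             # group averages
within <- mean(apply(y, 2, var))  # pooled within-group variance (equal r)
among  <- nrep * var(avg_j)       # r times the variance of the averages

f_calc <- among / within          # compare to F with ntrt - 1, ntrt*(nrep - 1) df
p_val  <- pf(f_calc, df1 = ntrt - 1, df2 = ntrt * (nrep - 1),
             lower.tail = FALSE)
```

This ratio is exactly the F statistic that `anova(lm(...))` would report for these data arranged in long format.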
## Laboratory Exercises
### Plant Sciences {#LabCh09PLS}
Prepare an .Rmd document starting with the following text, where you substitute the corresponding information for author name and date.
"---" Unquote the three dashes
title: "Lab05 CRD Anova"
author: "YourFirstName YourLastName"
date: "enter date here"
output: html_document
"---" Unquote the three dashes
```{r setupPS, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(pander)
library(ggplot2)
```
#### Instructions
For this lab you will modify this file and submit this file with the file name changed so it has your email ID (the part before @) in lower case instead of "email." Do not add spaces to the file name.
This is a markdown document. You will type your code and run it one line at a time as you add it in the lines indicated below. Add code **ONLY** in the areas between "\```{r}" and "\```". These areas are highlighted with a light grey color. Run each line and parts to learn and experiment until you get the result you want. Keep the lines that worked and move on. At any time you can see if your document "knits" or not by clicking on the Knit HTML icon at the top. Once you have completed all work, knit your document and save the html file produced with the same file name but with an html extension (Lab01email.html).
**Submit BOTH files for your lab report using the appropriate Canvas tool**
For each part and question below, type your code in the grey area below, between the sets of back-ticks (```) to perform the desired computation and get output. Type your answers **below** the corresponding grey area.
In this exercise we analyze the data resulting from an experiment at the UC Davis Plant Sciences research fields. The data and description of the experimental setup have been simplified to facilitate understanding. For the purpose of this lab we will consider that the experiment was designed as a Completely Randomized Design in which 12 treatments were applied randomly to 13 plots each. We will also assume that the variances of the errors or residuals are the same in all treatments. One medusahead plant was grown in each plot and its total seed production was recorded at maturity (seedMass.g).
Treatments resulted from combining two levels of nitrogen fertilization (n: no fertilizer added and N: nitrogen added), two levels of watering (w: no water other than rain, W: with water added) and three environments (s: areas without addition of seed representing the typical California Annual Grassland, S: areas where native perennial grasses were seeded, E: edge between seeded and unseeded areas). Based on previous experience and plant biology, we expect medusahead seed production to be lowest without water or fertilizer and when exposed to competition by native perennial grasses.
This exercise has four parts. First, we read in and explore the data with some descriptive statistics. We will use a transformation to normalize the distribution of the response variable. Second, each observation is partitioned into the components indicated by the model and then the corresponding sums of squares (SS) and degrees of freedom (df) are computed. The ANOVA table and tests are done with the resulting values of SS and df. Third, we repeat the analysis using pre-existing R functions to do the ANOVA tables and tests directly. Finally, we calculate confidence intervals for treatment means and back-transform them to be able to interpret results in the original units of grams of seed produced per plant.
#### Part 1. Inspection and summary of data [25 points]
Get a histogram of the data. Make a graph showing box-plots of the data for each treatment. Notice that the data have a highly skewed distribution, so a logarithmic transformation is necessary. Create a new column called logSmass that contains the log of seed mass after adding 0.3 to the mass. We add 0.3 to avoid problems with the zeroes, because log(0) is not defined. Plot histograms and box-plots of the new variable. Inspect the box-plots and determine if there appear to be any effects of treatments on the seed production of the invasive exotic weed. Create a table of averages and standard errors by treatment.
In the first R chunk we read in the data.
```{r}
# seed <- read.csv(file = "Lab05SeedMassData.txt", header = TRUE)
seed <- read.table(header = TRUE, text = "
id block Treatment seedMass.g
1 1 WNE 0.8452
2 1 WNE 1.628599896
3 1 WNE 1.71330012
5 1 WNE 0.605599925
63 2 WNE 1.478700073
64 2 WNE 1.655799925
65 2 WNE 3.579000285
66 2 WNE 2.52810012
67 2 WNE 0.676500068
119 3 WNE 0.84060003
120 3 WNE 0.455200004
121 3 WNE 0.642700076
122 3 WNE 0
6 1 wNE 2.39799978
7 1 wNE 3.630199812