---
output:
  html_document:
    fig_caption: yes
    number_sections: yes
    theme: readable
    toc: yes
    toc_depth: 3
---
```{r setup0, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
```{r message=FALSE, warning=FALSE, paged.print=FALSE, echo=FALSE, include=FALSE}
# load packages for chapter
options(digits = 10)
library(bookdown)
library(emmeans)
library(ggplot2)
library(dplyr)
library(kableExtra)
library(knitr)
library(tables)
library(pander)
library(multcomp)
library(agricolae)
library(nlme)
library(car)
library(tidyr)
```
# Treatment Structures and Comparisons {#chTrtstr}
## Learning Objectives for Chapter {#LearnObj12}
1. Identify treatments and factor levels in a factorial experiment.
1. State the null and alternative hypotheses of a factorial experiment.
1. Define and compare simple and main effects in a factorial.
1. Given a graph of factorial experiment results, determine if there are main effects and/or interaction effects.
1. Calculate PLSD confidence intervals for a pairwise comparison of means and determine if two means are significantly different based on the CIs.
1. Design factorial treatment combinations to answer a research question.
1. Transform a real-world question into a null hypothesis about means.
1. Create linear combinations of means that address practical real-world questions.
1. Define statistical interaction and explain it graphically.
1. Test hypotheses involving more than two means.
1. Interpret results of an ANOVA with factorial treatments.
1. Interpret a graph of a factorial experiment, including description of linear (L), quadratic (Q) and cubic (C) trends.
1. Given a set of treatments, formulate and test useful contrasts.
1. Determine if a linear combination of means is a contrast.
1. Create contrasts to estimate the value and variance of interactions.
1. Estimate the variance of a linear combination of means.
1. Make confidence intervals for linear combination of estimated means.
1. Calculate and use the Least Significant Difference (LSD) to test for differences among treatments.
1. Apply the Protected LSD and explain how it ameliorates the inflation of alpha.
1. Create linear combinations of estimated treatment means.
1. Create contrasts to answer research questions such as effects of treatments, linear and quadratic effects of continuous treatment variables.
## Pairwise Comparisons {#PairComp}
ANOVA performs one test for the whole experiment, and one for each effect or line item in the model, including treatments. The single F test for treatments tests the null hypothesis that there are no treatment effects. If the F-value is greater than the critical value from the table, we reject $H_0$ and accept the alternative, but we still do not know which treatments or factor levels are significantly different from each other. Estimated means need to be compared to determine which ones differ.
### Least Significant Difference
Means are compared by subtracting one from the other and testing whether the difference is significantly different from zero.
$$H_0: \mu_1 = \mu_2 \\
H_0: \mu_1 - \mu_2 = 0$$
Because the means are not known we use the estimates. Estimates of means are statistics, and because they are calculated as linear functions of normally distributed random variables, they are themselves normally distributed random variables.
```{block, type = 'stattip'}
- Recall: Linear combinations of normal random variables are normal random variables.
- The mean of a linear combination is the linear combination of the means.
- The variance of the difference between two random variables is the sum of the individual variances minus twice their covariance.
- Estimated means of balanced experiments are independent random variables (i.e. their covariance is zero).
```
The variance of an estimated mean depends on the number of observations used to estimate the mean (the number of observations averaged) and on the variance of individual observations, which comes from the random errors. The relationship between the variance of an average and the variance of individual observations is given by the **PRIME DIRECTIVE** of this course.
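As a sketch of these relationships, the R code below uses arbitrary values (the `sigma2`, `r` and `dfe` below are illustrative, not from any experiment in this chapter) to compute the variance of a mean, the variance of a difference of two independent means, and the resulting Least Significant Difference.

```{r}
# Arbitrary values for illustration (not from any experiment in this chapter)
sigma2 <- 400 # variance of individual observations (MSE)
r <- 6        # number of observations averaged per mean
dfe <- 9      # degrees of freedom of the residuals
# PRIME DIRECTIVE: variance of an average of r independent observations
(var.mean <- sigma2 / r)
# Variance of the difference between two independent estimated means
(var.diff <- 2 * sigma2 / r)
# Least Significant Difference for a pairwise comparison at alpha = 0.05
(LSD <- qt(p = 0.975, df = dfe) * sqrt(2 * sigma2 / r))
# Simulation check: the variance of simulated averages approaches sigma2 / r
set.seed(1)
var(replicate(10000, mean(rnorm(r, mean = 0, sd = sqrt(sigma2)))))
```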
### Experiment-wise and Family-wise Error Rates
When multiple means are compared sequentially, a new problem arises. To understand the solution, first we need to fully understand the problem. In testing hypotheses we are particularly interested in avoiding problems of "false positives" where two averages appear to be significantly different while they really are not. This is called "Type I" error. The nominal probability of making a Type I error in a test is $\alpha$. This is referred to as the "test-wise" or "comparison-wise" error rate, because it is the error rate in each single comparison or test.
The probability of making at least one Type I error in a set of tests comparing treatments or testing other contrasts is called the "family-wise" or "experiment-wise" error rate. The family-wise error rate tends to increase as the number of tests in the family increases, because an error in any of the tests is an error for the family. If tests are independent and the test-wise error rate is kept constant at $\alpha$, the number of errors in a family has a Binomial distribution. The probability of not making **any** errors in n comparisons, assuming independence, is $(1-\alpha)^n$. Thus, the probability of making *at least* one error is $1-(1-\alpha)^n$.
As an example, consider an experiment where there are 6 treatments. The total number of pairwise comparisons is
$$\left(
\begin{array}{c}
6 \\
2
\end{array}
\right) = \frac{6!}{2!(6-2)!} = 15$$
With 15 comparisons, the probability of making at least one error in the set is `r 1-(1-0.05)^15`. Obviously, this error rate is much greater than the nominal rate for each comparison. Several methods are available to make sure that the family-wise error rate remains close to the nominal 5%.
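These quantities are easy to verify with a couple of lines of R; a minimal sketch:

```{r}
# Number of pairwise comparisons among 6 treatments
choose(6, 2)
# Probability of at least one Type I error in 15 independent tests at alpha = 0.05
1 - (1 - 0.05)^15
# Family-wise error rate grows quickly with the number of comparisons
n.tests <- c(1, 5, 15, 45)
data.frame(n.tests, familywise.error = round(1 - (1 - 0.05)^n.tests, 3))
```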
## Factorials {#Factorials}
Factorials are experiments where treatments have a special structure resulting from combining levels of different factors. The treatment structure allows us to assess the effects of several treatment factors as well as the interaction between factors. The concept of **interaction** is fundamental in statistics and in life in general and it is treated in detail below. Let's concentrate on the factorial structure for now.
For a factorial set of treatments we need at least two factors or variables, each of which will take at least two values. Suppose you are trying to invite someone to have a meal with you and you want to maximize the chances of being accepted. Two main factors come into play: the type of cuisine you propose and the meal. In your town you have Mexican, Thai, Chinese, American and Japanese restaurants. You can have lunch or dinner. Therefore, you have 5 x 2 options for the invitation. For another example, imagine that you want to determine the best combination of speed and tire pressure to maximize your fuel efficiency while commuting. The car manufacturer gives a recommendation for tire pressure, but that is not specifically adjusted for the roads in your commute, or for your typical range of speeds. You can choose several tire pressures around the manufacturer's recommendation, and several speeds that are reasonable around your average. Most likely, the effect of tire pressure will depend on speed because the relative importance of tire resistance and bouncing depends on speed. Say you choose 32, 34 and 36 psi for tire pressure and 65, 70 and 75 mph for speed. You will have to test 3 x 3 combinations. These are called complete factorials because each level of each factor appears with each level of the other factors. Sometimes we say that the factors are *crossed*, as in a table where rows are levels of one factor and columns are levels of the other factor (Table \@ref(tab:CrossedFactors)).
<br><br>
Table: (\#tab:CrossedFactors) Treatments resulting from the factorial combination of three levels of tire pressure (columns) and three levels of speed (rows). The treatment structure is a 3x3 factorial.
| | 32 psi | 34 psi | 36 psi |
|--------:|:--------:|:--------:|:--------:|
| 65 mph | 65-32 | 65-34 | 65-36 |
| 70 mph | 70-32 | 70-34 | 70-36 |
| 75 mph | 75-32 | 75-34 | 75-36 |
<br><br>
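As a side note, the complete set of crossed treatment combinations can be generated in R with `expand.grid`; a minimal sketch for the tire pressure by speed example:

```{r}
# All 3 x 3 combinations of tire pressure and speed form the complete factorial
(trts <- expand.grid(pressure.psi = c(32, 34, 36),
                     speed.mph = c(65, 70, 75)))
nrow(trts) # 9 treatments
```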
### Interactions
Statistically, a two-way interaction (or simply an interaction) is present when the effect of one factor depends on the level of another. The combined effect of changes in two factors is larger (synergistic or positive interaction) or smaller (antagonistic or negative interaction) than the sum of the individual effects obtained when each factor is changed one at a time. Interactions are common and we are familiar with them in daily life, but we typically do not take the time to think of them as interactions. Take for example the effect of putting on a nice warm coat on your level of comfort. It is a desirable effect if it is cold and not so good if it is really hot. There is an interaction between clothing and weather on comfort.
Three variables are involved in a two-way interaction, two predictor variables or treatment factors and one response variable. We say that predictor variable 1 (clothing) interacts with predictor variable 2 (ambient temperature) to determine the value of the response variable (level of comfort). Typical examples of interactions are:
- environment by genotype interaction on phenotype;
- drug interaction of alcohol with opioids on alertness;
- nitrogen and water supply interaction on crop yield.
Interactions are scale dependent, a consequence of the fact that interactions depend on the response variable. An interaction that is significant on one scale may become non-significant after a logarithmic transformation of the response variable.
Hamilton and Saito [@HamiltonSaito2014] describe an interaction between level of education and political affiliation on belief in anthropogenic climate change (Figure \@ref(fig:HamiltonSaito2014)). The effect of increasing education on the proportion of people who believe that climate change is mostly anthropogenic depends on their political affiliation: while for Democrats belief increases with increasing education, for Republicans there is no effect, and for Tea Party members it decreases.
<br>
```{r HamiltonSaito2014, message=FALSE, warning=FALSE, paged.print=FALSE, out.width = '50%', fig.align='center', echo=FALSE, fig.cap ="Example of an interaction graph where political party and level of education interact in determining the probability that a person believes that climate change is mostly anthropogenic. See citation in text."}
knitr::include_graphics("images/HamiltonSaito2014.png")
```
<br>
Olson et al. [@OlsonEtAl1991] reported a typical genotype by environment interaction on calf birth weight, showing that different cattle breeds have specific adaptations to environmental conditions. Brahman is a breed of *Bos indicus* that evolved in Asia and has been selected for tropical conditions, whereas Angus and Hereford are breeds that were selected for colder climates with higher quality forages. The effect of location on calf birth weight (kg) depends on the breed; therefore, there is an interaction.
<br>
```{r OlsonEtAl1991, message=FALSE, warning=FALSE, paged.print=FALSE, out.width = '60%', fig.align='center', echo=FALSE, fig.cap ="Example of genotype by environment interaction on calf birth weight. British crosses (Angus-Hereford) perform better in Nebraska, whereas Brahman crosses do better in the semitropical climate of Florida. Vertical bars are standard errors of estimated means"}
calfBW <- read.table(header = TRUE, text = "
Location BreedType weight SE
Florida Angus-Hereford 26.9 0.7
Florida Brahman-Hereford 28.8 0.7
Nebraska Angus-Hereford 37.4 0.6
Nebraska Brahman-Hereford 36.0 0.5
")
ggplot(data = calfBW, aes(x = Location, y = weight, group = BreedType)) +
geom_line(size = 1.5,
aes(linetype = BreedType,
color = BreedType)) +
geom_point(size = 4,
aes(color = BreedType),
position = position_dodge(0.05)) +
geom_errorbar(aes(ymin = weight - SE,
ymax = weight + SE,
color = BreedType),
width = 0.2,
position = position_dodge(0.05)) +
labs(x = "Location", y = "Calf birth weight (kg)") +
theme(legend.position = c(0.7, 0.2))
```
<br>
Factorial treatment combinations can result from combining levels of more than two factors. There is no theoretical limit to the number of factors that can be included, but the number of treatments grows very fast with the number of factors and the number of levels per factor. For example, a factorial of three factors, each with three levels, has 27 different treatments.
When there are more than two factors, new levels of interaction appear in the partition of variation and of each observation. With three factors, say A, B and C, there are three 2-way interactions (AB, AC and BC) and one 3-way interaction (ABC). A 3-way interaction means that the 2-way interaction between two of the three factors differs between the levels of the third factor.
Photosynthesis and plant growth are known to require nitrogen (mostly for enzymes, nucleic acids and structural proteins), light and carbon dioxide. Growth rate typically responds nonlinearly to each of these factors and is limited by all of them. Figure \@ref(fig:threeWayA) shows a representation of a typical pattern of response.
<br>
```{r threeWayA, message=FALSE, warning=FALSE, paged.print=FALSE, out.width = '60%', fig.align='center', echo=FALSE, fig.cap ="Representation of effects of light intensity, nitrogen and carbon dioxide concentration on photosynthesis rate by a well watered plant. $N_0$ and $N_1$ represent low and high nitrogen fertilization. $CO_{2_0}$ and $CO_{2_1}$ are low and high concentrations of carbon dioxide. Effects of increasing $CO_2$ concentration with high light intensity are labeled $e_1$ and $e_2$ for high and low N. Effects of increasing $CO_2$ concentration with low light intensity are labeled $e_3$ and $e_4$ for high and low N."}
knitr::include_graphics("images/3WayFactorialA.png")
```
<br>
To work our way up to the 3-way interaction we start with the "effects" of increasing $CO_2$ concentration. The effect of increasing $CO_2$ concentration with high light and N is calculated on the red curve in the top graph. The four curves are separated into different panels to facilitate the visualization of effects (Figure \@ref(fig:threeWayB)).
<br>
```{r threeWayB, message=FALSE, warning=FALSE, paged.print=FALSE, out.width = '100%', fig.align='center', echo=FALSE, fig.cap ="Same information as in the previous figure, but presented in a separate set of axes for each combination of light intensity and N fertilizer addition. Top row has high N, and right column has high light intensity. The effect of increasing $CO_2$ concentration is depicted by a vertical bar and labeled $e_1$ to $e_4$."}
knitr::include_graphics("images/3WayFactorialB.png")
```
<br>
To calculate the effect of increasing $CO_2$ concentration with high light and N, we move vertically on $CO_{2_0}$ and $CO_{2_1}$ until we reach the red curve. At each of the two points reached on the red curve we move horizontally to the left until we reach the Y axis, where we read the photosynthesis rate (PR) for low and high $CO_2$. The difference between the two PRs is the effect of increasing $CO_2$ for high light intensity and high level of N fertilization, and it is labeled $e_1$. The process is repeated for each of the other graphs to get the effect of increasing $CO_2$ in each of the other three combinations of light and N, labeled $e_2$ through $e_4$.
<br>
First, consider only the two graphs with high light intensity (right column of Figure \@ref(fig:threeWayB)). We have a large effect of $CO_2$ with high N ($e_1$, red vertical bar) and a smaller effect of $CO_2$ with low N ($e_2$, yellow vertical bar). Therefore, the effect of $CO_2$ depends on the level of N, which fits the definition of a significant 2-way interaction. The value of this interaction is the difference in length between the red and the yellow bars, or $e_1 - e_2$, which is represented in Figure \@ref(fig:threeWayC) with a green vertical bar.
Now consider only the two graphs with low light intensity (left column of Figure \@ref(fig:threeWayB)). By repeating the same procedure applied in the right column we get two new effects of increasing $CO_2$ ($e_3$ with high and $e_4$ with low N) and their difference, which is the value of the 2-way interaction between $CO_2$ and N with low light (dark red or maroon vertical bar).
Finally, just like the difference in effects of one factor between levels of a second one is the 2-way interaction, the difference between 2-way interactions across levels of a third factor is the 3-way interaction, which is represented by the dark pink vertical bar.
<br>
```{r threeWayC, message=FALSE, warning=FALSE, paged.print=FALSE, out.width = '100%', fig.align='center', echo=FALSE, fig.cap ="Two-way interactions between $CO_2$ and nitrogen are calculated as differences of effects of $CO_2$ between levels of N. There is a large difference in the high light condition and a smaller difference in the low light condition. The difference between the 2-way $CO_2$ x N interactions in the two light conditions constitutes the 3-way light x $CO_2$ x N interaction."}
knitr::include_graphics("images/3WayFactorialC.png")
```
<br>
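A numeric sketch may make this chain of differences concrete. The photosynthesis rates below are made-up values chosen only to mimic the pattern in the figures; they are not data:

```{r}
# Hypothetical photosynthesis rates for the 8 light x N x CO2 combinations
pr <- c(hiL.hiN.hiC = 30, hiL.hiN.loC = 18,
        hiL.loN.hiC = 16, hiL.loN.loC = 12,
        loL.hiN.hiC = 10, loL.hiN.loC = 8,
        loL.loN.hiC = 7,  loL.loN.loC = 6)
# Effects of increasing CO2 within each light x N combination
e1 <- pr[["hiL.hiN.hiC"]] - pr[["hiL.hiN.loC"]] # high light, high N
e2 <- pr[["hiL.loN.hiC"]] - pr[["hiL.loN.loC"]] # high light, low N
e3 <- pr[["loL.hiN.hiC"]] - pr[["loL.hiN.loC"]] # low light, high N
e4 <- pr[["loL.loN.hiC"]] - pr[["loL.loN.loC"]] # low light, low N
# 2-way CO2 x N interactions within each light level
(ixn.hiLight <- e1 - e2)
(ixn.loLight <- e3 - e4)
# 3-way interaction: difference between the two 2-way interactions
(ixn.3way <- ixn.hiLight - ixn.loLight)
```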
Four factors produce six 2-way interactions (AB, AC, AD, BC, BD, CD), four 3-way interactions (ABC, ABD, ACD, BCD) and one 4-way interaction (ABCD), and so on. Two-way interactions are common and easy to understand intuitively; 3-way interactions may be common and are a bit harder to understand intuitively; 4-way interactions are almost impossible to understand in general terms for most people. However, I would guess that in the "real world" there are all sorts of complex interactions. Consider for example that you study plant growth rate and you use a factorial of four factors, each with 2 levels: with or without water, with or without carbon dioxide, with or without light and at -20 and 20 C. The plant will only grow in one of the 16 treatments, the one that includes water, carbon dioxide, light and a positive temperature. That is a 4-way interaction.
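These counts follow from the binomial coefficient: with $k$ factors there are $\binom{k}{m}$ $m$-way interactions, which R computes with `choose`:

```{r}
# Numbers of 2-, 3- and 4-way interactions in a four-factor factorial
choose(4, 2:4) # 6 two-way, 4 three-way, 1 four-way
```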
### Interactions and Main Effects
When a factorial has a significant interaction, the **main effects may have little meaning**, because they average over conditions that have different consequences for the factor in question. Take for example a typical (realistic but fictitious data) experiment in which a strong ecotype by environment interaction is present. Two ecotypes (Montanus and Desertorum) are grown in two environments (desert and mountain) in what is called a reciprocal transplant experiment, where each ecotype is grown in its native environment and in the other's environment, with replication. Mature size of the organisms (could be plants or animals or anything else that is alive and has a mature size) is measured and analyzed.
<br>
```{r interxVsMain, message=FALSE, warning=FALSE, paged.print=FALSE, out.width = '60%', fig.align='center', echo=FALSE, fig.cap ="Interaction plot showing the effects of ecotype and environment on mature size. Although there are obvious responses to the treatments, main effects show no response for either factor."}
knitr::include_graphics("images/interxVsMain.png")
```
<br>
Each ecotype does better than the other when grown in its original environment (Panel A, Figure \@ref(fig:interxVsMain)). However, on average over ecotypes and over environments, there are no differences in mature size (Panels B and C, Figure \@ref(fig:interxVsMain)). The main effects hide the clear pattern of response exhibited by the individual treatment means. It would not make sense to say that either factor has no effect; however, it is true that averaged over the levels of ecotype, environment had no effect, and vice versa. Thus, whenever interactions are present and significant, interpretation of results should start with an inspection of interaction graphs.
<br>
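A minimal sketch with hypothetical numbers shows how a perfect crossover cancels the main effects; the sizes below are invented to mimic Panel A:

```{r}
# Hypothetical mature sizes for a reciprocal transplant with a perfect crossover
transplant <- expand.grid(ecotype = c("Montanus", "Desertorum"),
                          environment = c("mountain", "desert"))
transplant$size <- c(10, 4, 4, 10) # each ecotype is larger in its home environment
# Marginal means are identical for both factors despite the obvious cell differences
aggregate(size ~ ecotype, data = transplant, FUN = mean)
aggregate(size ~ environment, data = transplant, FUN = mean)
```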
### Model and Calculations in Factorials
Factorial treatments can be used with different experimental designs, because the treatment structure is independent of the experimental design that determines how treatments are assigned to experimental units. Consider an experiment in which two factors (say nitrogen fertilization and amount of irrigation) are used to create a factorial treatment structure by combining all levels of both factors. Each combination is a treatment, and we can (but sometimes we may not want to) ignore the treatment structure when assigning treatments to experimental units. We start with the simplest experimental design, the completely randomized design (CRD), where treatments are assigned to experimental units completely at random.
To make the equations general, we will call
- nitrogen *factor A* with levels $a_1, \dots, a_{k_a}$, and
- irrigation *factor B* with levels $b_1, \dots, b_{k_b}$.
For example, the third nitrogen fertilization level is called $a_3$, and the first level of irrigation is $b_1$; thus, the treatment with level 3 of nitrogen and level 1 of irrigation is called $a_3b_1$. The symbol $k_a$ represents the number of levels of factor A, and $k_b$ is the number of levels of factor B. The statistical model for this experiment is:
<br><br>
\begin{equation}
Y_{ijh} = \mu_{ij.} + \epsilon_{ijh} \\[20pt]
= \mu_{...} + \tau_{ij} + \epsilon_{ijh} \\[20pt]
= \mu_{...} + \alpha_i + \beta_j + \alpha\beta_{ij} + \epsilon_{ijh} \\[20pt]
\epsilon_{ijh} \sim N(0,\sigma) \\[20pt]
\text{where: } \quad \mu_{...} \quad \text{is the overall mean} \\[20pt]
\mu_{ij.} = \mu_{...} + \alpha_i + \beta_j + \alpha\beta_{ij} \quad \text{is the mean for treatment} \ a_ib_j\\[20pt]
\tau_{ij} = \alpha_i + \beta_j + \alpha\beta_{ij} \quad \text{is the effect of treatment} \ a_ib_j\\[20pt]
\alpha_i \quad \text{is the effect of level } a_i \text{ of factor } A\\[20pt]
\beta_j \quad \text{is the effect of level } b_j \text{ of factor } B \\[20pt]
\alpha\beta_{ij} \quad \text{is the interaction between } a_i \text{ and } b_j
(\#eq:crdFactorial)
\end{equation}
<br><br>
The model implies that each observation is partitioned into the five components present in the model: overall mean, effect of A, effect of B, interaction AB and random error. As usual, the model has a counterpart that uses estimates instead of the true but unknown parameter values. When the design is complete and balanced such that all treatments have the same number of replications,
<br><br>
\begin{equation}
Y_{ijh} = \hat{\mu}_{ij.} + \hat{\epsilon}_{ijh} \\[20pt]
= \hat{\mu}_{...} + \hat{\tau}_{ij} + \hat{\epsilon}_{ijh} \\[20pt]
= \hat{\mu}_{...} + \hat{\alpha}_i + \hat{\beta}_j + \hat{\alpha\beta}_{ij} + e_{ijh} \\[20pt]
\text{where: } \quad \hat{\mu}_{...} = \bar{Y}_{...} \quad \text{is the overall average} \\[20pt]
\hat{\tau}_{ij} = \bar{Y}_{ij.} - \bar{Y}_{...} = \hat{\alpha}_i + \hat{\beta}_j + \hat{\alpha\beta}_{ij} \quad \text{is the estimated effect of treatment} \ a_ib_j\\[20pt]
\hat{\alpha}_i = \bar{Y}_{i..} - \bar{Y}_{...} \quad \text{is the estimated effect of level } a_i \text{ of factor } A\\[20pt]
\hat{\beta}_j = \bar{Y}_{.j.} - \bar{Y}_{...} \quad \text{is the estimated effect of level } b_j \text{ of factor } B \\[20pt]
\hat{\alpha\beta}_{ij} = \bar{Y}_{ij.} - \bar{Y}_{i..} - \bar{Y}_{.j.} + \bar{Y}_{...} \quad \text{is the estimated interaction between } a_i \text{ and } b_j
(\#eq:crdFactorialHat)
\end{equation}
<br><br>
If the experiment is laid out in a randomized complete block design, we simply add a term for blocks to the model (Equation \@ref(eq:rcbdFactorial)). Block effects are calculated as the difference between block averages and the overall average, as usual.
<br><br>
\begin{equation}
Y_{ijh} = \mu_{ij.} + B_h + \epsilon_{ijh} \\[20pt]
= \mu_{...} + \tau_{ij} + B_h + \epsilon_{ijh} \\[20pt]
\text{where: } \quad B_h \quad \text{is the effect of block h} \\[20pt]
B_h = \mu_{..h} - \mu_{...} \\[20pt]
\text{which is estimated as} \\[20pt]
\hat{B}_h = \bar{Y}_{..h} - \bar{Y}_{...} \\[20pt]
(\#eq:rcbdFactorial)
\end{equation}
<br><br>
#### R Code for Factorial in RCBD
The equations and calculations are illustrated with a numerical example. We conducted an experiment to determine the competitive ability of two naturalized grasses of the California Annual Grassland (Laca, unpublished data), *Lolium multiflorum* (Lam.) and *Bromus hordeaceus* (L.) which we will call Lomu and Brho for short. The experiment was laid out in an RCBD with 10 blocks and 6 treatments. Blocks were used because the field had fertility gradients caused by differences in soil types. Treatment structure was a 2 x 3 factorial of **target** plant (the species of the plant measured, which was Lomu or Brho) and **competitor** (self, both and none), which was the species surrounding the target individual. "Self" competitor was a competitor of the same species as the target (intraspecific competition), "both" was a mix of both species, and "none" meant that the target plant was grown in an area that was kept clear of any other species within a radius of 10 cm of the target plant. All plots were circular and had a radius of 10 cm around the target plant (Figure \@ref(fig:LomuBrhoTrtFig)). Areas around plots were kept clear of vegetation. Target plant mass and height were measured at maturity. Plant mass was transformed to meet the assumption of homogeneity of variance, but the range of values is comparable to the original units in mg per plant. We analyze plant mass as a function of blocks and the factorial set of treatments.
**Goal and hypothesis** The goal of the study (for this section) was to determine if intra and interspecific competition were different for Lomu and Brho. The null hypothesis is that the effect of adding competitors does not depend on whether the competitor is of the same species as the target. For logistic reasons, instead of using treatments with only interspecific competitors, we used a mixture of both species. Therefore, the main reason for the experiment was to determine if there was a significant statistical interaction between target and competitor species. We also had more specific hypotheses about which species was the superior competitor and the relative effects of intra and interspecific competition. Those are addressed in the [Section about Contrasts](#Contrasts).
<br>
```{r LomuBrhoTrtFig, message=FALSE, warning=FALSE, paged.print=FALSE, out.width = '80%', fig.align='center', echo=FALSE, fig.cap ="Treatments used in the experiment to study competition between ryegrass (Lomu) and softchess (Brho). Rows are for target plants measured and columns are for competitor. Plots were circles of 20 cm diameter established in a field."}
knitr::include_graphics("images/LomuBrhoTrtFig.png")
```
<br>
```{r message=FALSE, warning=FALSE, paged.print=FALSE, echo=FALSE, include=FALSE}
lb.compDat <- read.table(header = TRUE, text = "
block species competitor ht.cm mass.mg
1 Brho 1.Self 23.9 203
1 Brho 2.Both 18 133
1 Brho 3.None 31.5 269
1 Lomu 1.Self 23.8 205
1 Lomu 2.Both 32.3 245
1 Lomu 3.None 27.8 391
2 Brho 1.Self 23.4 268
2 Brho 2.Both 25.6 221
2 Brho 3.None 30.8 340
2 Lomu 1.Self 42.6 369
2 Lomu 2.Both 19.9 226
2 Lomu 3.None 26.5 268
4 Brho 1.Self 36.7 413
4 Brho 2.Both 32.7 317
4 Brho 3.None 25.5 279
4 Lomu 1.Self 39.8 365
4 Lomu 2.Both 48.8 453
4 Lomu 3.None 63.7 649
5 Brho 1.Self 29 346
5 Brho 2.Both 23.4 287
5 Lomu 1.Self 25.9 240
5 Lomu 2.Both 20.5 205
5 Lomu 3.None 43 475
7 Brho 1.Self 15.2 136
7 Brho 2.Both 12.4 132
7 Brho 3.None 27 427
7 Lomu 1.Self 18.8 254
7 Lomu 2.Both 16.4 181
7 Lomu 3.None 26.9 425
8 Brho 1.Self 16.8 212
8 Brho 2.Both 12.3 109
8 Lomu 1.Self 13.8 252
8 Lomu 2.Both 14.8 207
8 Lomu 3.None 16 347
10 Brho 1.Self 15.9 205
10 Brho 2.Both 18.3 243
10 Brho 3.None 17.8 313
10 Lomu 1.Self 11.1 166
10 Lomu 2.Both 10.5 143
10 Lomu 3.None 21.6 410
11 Brho 1.Self 20.4 242
11 Brho 2.Both 12.2 129
11 Brho 3.None 24.5 345
11 Lomu 1.Self 12.3 186
11 Lomu 2.Both 14.2 177
11 Lomu 3.None 49.8 563
13 Brho 1.Self 35.9 401
13 Brho 2.Both 17.6 135
13 Brho 3.None 29.4 394
13 Lomu 1.Self 26 229
13 Lomu 2.Both 25.1 320
13 Lomu 3.None 60.8 508
14 Brho 1.Self 25.7 326
14 Brho 2.Both 17.9 162
14 Lomu 1.Self 15.5 190
14 Lomu 2.Both 26.7 222
14 Lomu 3.None 54.3 540
")
lb.compDat$block <- factor(lb.compDat$block)
# Impute data for Brho in blocks 5-3.None, 8-3.None and 14-3.None
library(nlme)
library(car)
lb.imp.m1 <- lme(fixed = mass.mg ~ species * competitor,
random = ~ 1 | block,
data = lb.compDat)
lb.imp.m2 <- lme(fixed = log(ht.cm) ~ species * competitor,
random = ~ 1 | block,
data = lb.compDat)
newdata <- data.frame(block = factor(c(5, 8, 14)),
species = "Brho",
competitor = "3.None")
set.seed(890)
newdata$mass.mg <- round(predict(lb.imp.m1, newdata = newdata) +
rnorm(dim(newdata)[1], mean = 0, sd = lb.imp.m1$sigma), 0)
newdata$ht.cm <- round(exp(predict(lb.imp.m2, newdata = newdata) +
rnorm(dim(newdata)[1], mean = 0, sd = lb.imp.m2$sigma)), 1)
lb.compDat <- rbind(lb.compDat, newdata)
lb.compDat <- lb.compDat[
with(lb.compDat,
order(as.numeric(as.character(block)),
species,
competitor)),
]
```
```{r LomuBrhoCompDat, message=FALSE, warning=FALSE, paged.print=TRUE, echo=FALSE}
list(lb.compDat[1:30, c(1:3,5)], lb.compDat[31:60, c(1:3,5)]) %>%
kable(caption = "Data from experiment on competition between ryegrass and softchess (Laca, unpublished data)",
row.names = FALSE,
digits = c(0, 0, 0, 1, 1)) %>%
kable_styling(font_size = 10,
bootstrap_options = c("condensed"))
```
First we analyze the data using high-level R functions for linear models. Then, we will recreate the calculations "by hand" using basic functions and applying the formulas directly derived from the model. Prior to looking at results we check for homogeneity of residuals and normality using graphics. Residuals should be distributed symmetrically about zero, and should not exhibit any patterns when plotted against the predicted values for each observation. The quantile-quantile scatterplot should be closely grouped around a straight line.
<br>
```{r resdlLomuBrhoComp, message=FALSE, warning=FALSE, fig.cap="Residual plots for factorial of target and competitor in RCBD.", out.width='50%', fig.show='hold', fig.align='center'}
lb.m1 <- lm(mass.mg ~ block + species * competitor, data = lb.compDat)
plot(residuals(lb.m1) ~ fitted(lb.m1),
xlab = "Fitted values (Y hats)",
ylab = "Residuals (e's)",
pch = 16,
col = "blue")
abline(h = 0, lty = 2, col = "red")
qqp(residuals(lb.m1), pch = 16, col = "blue")
```
<br>
Observations 12 and 15 are a little out of the assumed distribution (Figure \@ref(fig:resdlLomuBrhoComp)) because they have extremely large residuals compared to the rest of the observations. In a real analysis, we would explore the reasons why those observations have large residuals by double checking data entry and field notes, as well as integrity of lab samples. We note but ignore the outliers in the rest of the explanation, as the extreme residuals do not change the way calculations are done.
<br>
```{r aovLomuBrhoComp, message=FALSE, warning=FALSE, echo=FALSE}
kable(anova(lb.m1),
caption = "Analysis of variance table for Lomu vs. Brho competition experiment.",
digits = c(0, 0, 0, 2, 5)) %>% kable_styling(full_width = FALSE)
```
<br>
All effects in the model were significant. The line items in the ANOVA table (Table \@ref(tab:aovLomuBrhoComp)) correspond to specific hypotheses, and the fact that the probabilities listed in the last column are all smaller than 0.05 means that all null hypotheses are rejected. In general, the hypotheses tested are that effects are all zero, but the origin of the hypotheses is a set of questions about intra and interspecific competition. Our main question is whether intra and interspecific competition is the same in both species. This is convenient because it focuses first on the interaction. In general, interactions should be tested first in factorials. If they are significant, then specific comparisons of cell means should be performed and interpreted, because marginal means may not be interpretable (see [Section about Contrasts](#Contrasts)).
**Interaction between species and competitor:** Is the effect of competition the same in both species? If competitive effects are equal, the **effect of adding competitors** will be the same in all cases. This effect is the difference between the control with no competition and each of the treatments with competitors, within species. To say that competitive strength differs between species is the same as stating that the effects of competition (factor B) differ between levels of species (factor A), which is the definition of interaction. So our first interesting hypothesis refers directly to the interaction.
<br>
$$\text{ Null hypothesis: competition effects are the same for Lomu and Brho } \\[15pt]
H_0: \alpha\beta_{ij} = 0 \quad \text{for all pairs } \ i,j \\[15pt]
\text{which is equivalent to: } \\[15pt]
\begin{equation}
H_0: \quad
\begin{cases}
\mu_{LomuNone} - \mu_{LomuSelf} = \mu_{BrhoNone} - \mu_{BrhoSelf} & \quad \text{and} \\
\mu_{LomuNone} - \mu_{LomuBoth} = \mu_{BrhoNone} - \mu_{BrhoBoth} & \quad \text{and} \\
\mu_{LomuBoth} - \mu_{LomuSelf} = \mu_{BrhoBoth} - \mu_{BrhoSelf}
\end{cases} \\[45pt]
\text{Alternative hypothesis: at least one of the three pairs of effects is not equal} \\[15pt]
\end{equation}$$
<br>
The hypothesis about interactions is tested in the line of the ANOVA labeled `species:competitor`. The calculated F value is `r round(anova(lb.m1)$'F value'[4], 2)`, which is rather large. The probability of observing a calculated F of such value or larger when $H_0$ is true (called "p-value") is `r round(anova(lb.m1)$'Pr(>F)'[4], 4)`, which is smaller than the $\alpha = 0.05$ required for rejecting the null hypothesis. Thus, we reject $H_0$ and accept $H_a$, which means that competition effects differ between the species. Because the interaction is "significant," we can proceed to more detailed comparisons of cell means to answer the question more specifically: which species is the superior competitor, and how does competition against both competitors differ from competing against self? Those specific hypotheses are addressed below (see [Section about Contrasts](#Contrasts)).
**Marginal competition effect:** is there an effect of competition on mature mass? We pose the null hypothesis that leads to a strong inference by its rejection: on average over species, competition does not affect mature size. If there were no competition effect, then plant mass at maturity for each species would not be affected by the competitor treatment. Lomu and Brho may differ in mean size, but their size would not be affected by the competitor. Keep in mind that a significant interaction was detected, so the interpretation of marginal results may be misleading and has to be considered with caution. The line corresponding to `competitor` in the ANOVA table (Table \@ref(tab:aovLomuBrhoComp)) shows a p-value of `r round(anova(lb.m1)$'Pr(>F)'[3], 4)`, which is significant. The null hypothesis that competition has no effect, or that mature size is the same regardless of competition, is rejected.
<br>
$$\text{ Null hypothesis: there is no competition effect on mature size } \\[15pt]
H_0: \mu_{.None} = \mu_{.Self} = \mu_{.Both} \\[15pt]
\text{Alternative hypothesis: at least one of the three pairs is not equal} \\[15pt]
\begin{equation}
H_a: \quad
\begin{cases}
\mu_{.None} \neq \mu_{.Self} & \quad \text{or} \\
\mu_{.None} \neq \mu_{.Both} & \quad \text{or} \\
\mu_{.Self} \neq \mu_{.Both}
\end{cases}
\end{equation}$$
<br>
Note that the hypotheses were posed as means or effects being equal or not equal. It would be equivalent to state that the differences between the terms on each side of the "=" sign are zero. For example:
<br>
$$H_0: \mu_{.None} = \mu_{.Self} = \mu_{.Both} \\[15pt]$$
is equivalent to
<br>
$$H_0:\mu_{.None} - \mu_{.Self} = 0 \quad \text{AND} \quad \mu_{.None} - \mu_{.Both} = 0 \quad \text{AND} \quad \mu_{.Self} - \mu_{.Both} = 0 $$
<br>
**Marginal species effect:** do species differ in mature size, averaged over competition treatments? Because of the significant interaction detected, the same note about caution applies here. The interaction means that the difference in mature size between species depends on competition. Ignoring the interaction for now, we test the marginal effect of species. Based on the line for `species` in the ANOVA table, we reject the null hypothesis that mature plant mass is the same for both species.
<br>
$$\text{ Null hypothesis: there is no species effect on mature size } \\[15pt]
H_0: \mu_{Lomu.} = \mu_{Brho.} \\[15pt]
\text{Alternative hypothesis: mature size differs between species} \\[15pt]
\begin{equation}
H_a: \quad
\mu_{Lomu.} \neq \mu_{Brho.}
\end{equation}$$
<br>
Results can be presented using a table or a figure. A figure facilitates the interpretation of interactions. A table also includes standard errors and marginal means, which is more complete but harder to interpret quickly. Both are presented below for illustration.
<br>
```{r LomuBrhoCompFig1, message=FALSE, warning=FALSE, paged.print=FALSE, out.width = '60%', fig.align='center', echo=FALSE, fig.cap ="Effects of species and competitor on mature plant size. Vertical bars are 95% confidence intervals. Plant mass is in transformed units, but values are commensurate with mass in mg."}
LomuBrhoMeans <- as.data.frame(emmeans(lb.m1, c("competitor", "species")))
ggplot(data = LomuBrhoMeans, aes(x = competitor, y = emmean, group = species)) +
geom_line(size = 1.5,
aes(linetype = species,
color = species)) +
geom_point(size = 4,
aes(color = species),
position = position_dodge(0.05)) +
geom_errorbar(aes(ymin = lower.CL,
ymax = upper.CL,
color = species),
width = 0.2,
position = position_dodge(0.05)) +
labs(x = "Competitor", y = "Mature plant mass") +
theme(legend.position = c(0.7, 0.2))
```
<br>
```{r message=FALSE, warning=FALSE, paged.print=FALSE}
cmm <- as.data.frame(emmeans(lb.m1, "competitor"))
smm <- as.data.frame(emmeans(lb.m1, "species"))
cspmm <- as.data.frame(emmeans(lb.m1, "species", "competitor"))
```
<br>
Table: (\#tab:SppCompTable) Estimated marginal and cell means of mature plant mass for competition experiment. Mature plant mass is in transformed units ^[$Y = 1000 \ \frac{[4 + \log(PlantMass.g + 0.03)]} {6}$]. Standard errors of cell means, marginal competitor and marginal species means are `r round(cspmm[1, 4], 2)`, `r round(cmm[1, 3], 2)` and `r round(smm[1, 3], 2)`.
| | **Competitor** | | | |
|---------------:|:-------------------------:|:-------------------------:|----------------------------|-------------------------|
|**Species** | **Self** | **Both** | **None** | **Spp means** |
| Brho | `r round(cspmm[1, 3], 1)` | `r round(cspmm[3, 3], 1)` | `r round(cspmm[5, 3], 1)` | `r round(smm[1, 2], 1)` |
| Lomu | `r round(cspmm[2, 3], 1)` | `r round(cspmm[4, 3], 1)` | `r round(cspmm[6, 3], 1)` | `r round(smm[2, 2], 1)` |
|**Comp. means** | `r round(cmm[1, 2], 1)` | `r round(cmm[2, 2], 1)` | `r round(cmm[3, 2], 1)` | |
<br>
**R Code Summary**
The analysis of the factorial experiment, without the code for making the pretty figures and table, was very simple. It required only a handful of short lines of code:
```{r FactorialCode1, message=FALSE, warning=FALSE}
# Fit model to data. lb.compDat is the data frame with data
lb.m1 <- lm(mass.mg ~ block + species * competitor, data = lb.compDat)
anova(lb.m1) # Perform ANOVA based on model
plot(residuals(lb.m1) ~ fitted(lb.m1)) # Check homogeneity of variance
qqp(residuals(lb.m1)) # Check normality
# Get marginal means, S.E. and CI's for competitor
emmeans(lb.m1, "competitor")
# Get marginal means, S.E. and CI's for species
emmeans(lb.m1, "species")
# Get cell means, S.E. and CI's for treatments
emmeans(lb.m1, "species", "competitor")
```
#### Detailed Calculations for Factorial in RCBD
In this section we show the basic calculations to obtain parameter estimates and complete the ANOVA table for a factorial in a Randomized Complete Block Design. First, we obtain the parameter estimates by using overall and marginal averages. Then we partition each observation using the parameters and calculate the corresponding sums of squares. Finally, degrees of freedom are computed and the ANOVA table is constructed.
Treatment averages:
```{r message=FALSE, warning=FALSE, include=FALSE}
brho <- which(lb.compDat$species == "Brho")
self <- which(lb.compDat$competitor == "1.Self")
brho.self <- intersect(brho, self)
lb.compDat[brho.self, "mass.mg"]
paste(as.character(lb.compDat[brho.self, "mass.mg"]), collapse = " + ")
mean(lb.compDat[brho.self, "mass.mg"])
```
<br>
$$\hat{\mu}_{ij.} = \bar{Y}_{ij.} = \frac{1}{r}\sum^{h=r}_{h=1}Y_{ijh} \\[25pt]
\text{For example: } \\[25pt]
\hat{\mu}_{Brho,1.Self} = \frac{203 + 268 + 413 + 346 + 136 + 212 + 205 + 242 + 401 + 326}{10} \\[25pt]
= `r mean(lb.compDat[brho.self, "mass.mg"])`$$
<br>
We can calculate treatment or cell averages using the `aggregate` function in R. Averages are merged with the data set so we can calculate treatment effects and later partition them into main effects and interactions.
```{block, type='rtip'}
Writing code in R can be greatly facilitated by carefully planning the names of objects. Note how the names below are organized such that it is easy to modify the code for species to get the code for competitors. The first chunk for cell averages was copied and quickly edited to get the chunks for marginal and block averages.
```
```{r}
(cell.avgs <- aggregate(mass.mg ~ species + competitor,
data = lb.compDat,
FUN = mean))
names(cell.avgs)[3] <- "cell.avg"
lb.compDat <- merge(lb.compDat,
cell.avgs,
all = TRUE)
```
Marginal averages are calculated as the averages of the corresponding cell averages.
$$\hat{\mu}_{i..} = \bar{Y}_{i..} = \frac{1}{k_b}\sum^{j=k_b}_{j=1}\bar{Y}_{ij.} = \frac{1}{k_b}\sum^{j=k_b}_{j=1}\hat{\mu}_{ij.} \\[25pt]
\text{where } k_b = 3 \text{ is the number of levels of factor B (competitor). For example: } \\[25pt]
\hat{\mu}_{Brho} = \frac{275.2 + 186.8 + 342.9}{3} = `r mean(cell.avgs[cell.avgs$species == "Brho", "cell.avg"])`$$
<br>
```{r}
(s.avgs <- aggregate(cell.avg ~ species, # get species averages
data = cell.avgs,
FUN = mean))
names(s.avgs)[2] <- "sp.avg" # change column name to facilitate merge
(comp.avgs <- aggregate(cell.avg ~ competitor, # get competitor averages
data = cell.avgs,
FUN = mean))
names(comp.avgs)[2] <- "comp.avg" # change column name to facilitate merge
```
<br>
$$\hat{\mu}_{.j.} = \bar{Y}_{.j.} = \frac{1}{k_a}\sum^{i=k_a}_{i=1}\bar{Y}_{ij.} = \frac{1}{k_a}\sum^{i=k_a}_{i=1}\hat{\mu}_{ij.} \\[25pt]
\text{where } k_a = 2 \text{ is the number of levels of factor A (target species). For example: } \\[25pt]
\hat{\mu}_{Self} = \frac{275.2 + 245.6}{2} = `r mean(comp.avgs[comp.avgs$competitor == "1.Self", "comp.avg"])`\\[30pt]$$
We follow an analogous procedure to estimate overall^[In this experiment, the overall average is the same as the estimated overall mean because the data are balanced (all treatments appear once in each block). When the number of observations differs among treatments, the estimated overall mean is not the overall average, but the average of all estimated treatment means.] and block means with block averages. As an exercise, write down the equations to calculate block averages for this data set before reading any further. The equations are shown further down (Equation \@ref(eq:FactBlockAvg)).
```{r}
# merge data set with marginal species and competitor averages
lb.compDat <- merge(lb.compDat,
s.avgs,
all = TRUE)
lb.compDat <- merge(lb.compDat,
comp.avgs,
all = TRUE)
# calculate block averages and merge with data
b.avgs <- aggregate(mass.mg ~ block,
data = lb.compDat,
FUN = mean)
names(b.avgs)[2] <- "block.avg" # change column name to facilitate merge
lb.compDat <- merge(lb.compDat,
b.avgs,
all = TRUE)
print(lb.compDat[ , -4], digits = 5) # data with averages
```
<br>
\begin{equation}
\hat{\mu}_{..h} = \bar{Y}_{..h} = \frac{1}{k_a k_b} \sum^{i=k_a}_{i=1}\sum^{j=k_b}_{j=1}Y_{ijh}
(\#eq:FactBlockAvg)
\end{equation}
<br>
\begin{equation}
\hat{\mu}_{...} = \frac{1}{k_a k_b} \sum^{i=k_a}_{i=1}\sum^{j=k_b}_{j=1}\hat{\mu}_{ij.}
(\#eq:FactAllAvg)
\end{equation}
<br>
**Sums of Squares** Now that we have all the averages, we can calculate the sums of squares (SS), which correspond to the effects in the model:
- TSS or total sum of squares is the sum of squared differences between each observed value ($Y_{ijh}$) and the overall average ($\hat{\mu}_{...}$).
- SST or treatment sum of squares is the sum of squared differences between treatment (cell) averages ($\hat{\mu}_{ij.}$) and the overall average ($\hat{\mu}_{...}$).
- SSB or block sum of squares is the sum of squared differences between block averages ($\hat{\mu}_{..h}$) and the overall average ($\hat{\mu}_{...}$).
- SSE or residual (error) sum of squares is the sum of squared differences between observed values ($Y_{ijh}$) and predicted values ($\hat{Y}_{ijh}$), where the predicted value is the sum of the overall average and all effects.
Because the treatments have a factorial structure, each treatment effect $\tau_{ij}$ is partitioned into its components $\alpha_i$, $\beta_j$ and $\alpha\beta_{ij}$. Accordingly, SST is partitioned into the sums of squares for factor A, factor B and the A x B interaction.
*Total sum of squares*
We use R to implement the equations that define each of the effects and the corresponding sums of squares. First we calculate the overall average and use it to compute TSS. In the code below, `all.avg` is $\hat{\mu}_{...}$ and the column `lb.compDat$mass.mg` is the set of all $Y_{ijh}$. Thus, `sum((lb.compDat$mass.mg - all.avg) ^ 2)` = $\sum_{i=1}^{k_a}\sum_{j=1}^{k_b}\sum_{h=1}^{r}(Y_{ijh} - \hat{\mu}_{...})^2$.
```{r}
(all.avg <- mean(cell.avgs$cell.avg)) # get overall average
# Total sum of squares
(TSS <- sum((lb.compDat$mass.mg - all.avg) ^ 2))
```
*Sum of squares of Blocks*
Now we compute SSB. In the code below, `lb.compDat$block.avg` is $\hat{\mu}_{..h}$; thus, `sum((lb.compDat$block.avg - all.avg) ^ 2)` = $\sum_{i=1}^{k_a}\sum_{j=1}^{k_b}\sum_{h=1}^{r}(\hat{\mu}_{..h} - \hat{\mu}_{...})^2 = k_a k_b \sum_{h=1}^{r}(\hat{\mu}_{..h} - \hat{\mu}_{...})^2$
```{r}
# Sum of squares of blocks
(SSB <- sum((lb.compDat$block.avg - all.avg) ^ 2))
```
*Sum of squares of Treatments*
For the SST we use the column `lb.compDat$cell.avg`, which contains the estimated mean for each combination of target species and competitor assigned to each observation.
```{block, type='stattip'}
Keep in mind that each observation is partitioned into an estimate of the overall mean plus all effects: block, factor A, factor B, interactions AxB and residual. All sums for SS are over all observations, although some estimates are repeated. For example, the estimated effect of Brho is the same for all plots where Brho was the target species.
```
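The remaining sums of squares follow the same pattern; the sketch below completes them using the averages already merged into the data (the object names `SST`, `SSA`, `SSB.comp`, `SSAxB` and `SSE` are ours). In this balanced design they should match the `Sum Sq` column of `anova(lb.m1)`.

```{r}
# Treatment (cell) sum of squares
(SST <- sum((lb.compDat$cell.avg - all.avg) ^ 2))
# Partition of SST into main effects and interaction
(SSA <- sum((lb.compDat$sp.avg - all.avg) ^ 2))        # factor A: species
(SSB.comp <- sum((lb.compDat$comp.avg - all.avg) ^ 2)) # factor B: competitor
(SSAxB <- sum((lb.compDat$cell.avg - lb.compDat$sp.avg -
                 lb.compDat$comp.avg + all.avg) ^ 2))  # A x B interaction
# Residual sum of squares: what remains of TSS after blocks and treatments
(SSE <- TSS - SSB - SST)
```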
### Advantages of Factorials
Consider that you have to study the effects of irrigation and fertilization with N on corn yield. You could take a simple approach and do one experiment for irrigation and another for nitrogen, or you could study both factors at the same time in a factorial experiment. Note that "factorial" refers to the way treatments are organized and says nothing about the experimental design in terms of how treatments are assigned to experimental units.
Here we demonstrate two main advantages of factorials:
- ability to detect interactions between factors, and
- much greater efficiency than separate experiments.
To show the advantage of a factorial, we first simulate two separate experiments and then a single combined experiment with both factors. By comparing the results we will be able to see that the factorial approach has advantages.
<br>
#### Effects of Irrigation alone
Using the simple approach, the experiment for irrigation includes two treatments: one or two irrigations. For this experiment, a moderate nitrogen level of 160 lb/ac is selected. Treatments are repeated three times in each of two blocks (Table \@ref(tab:tblIrrigDat)). In this case we include replicates of treatments within blocks to have a total number of observations comparable to the case when we test both factors at once. The original data set has 20 observations, so we will simulate 12 observations ^[We use 12 instead of 10 for simplicity, to have the same number of replicates in all treatments. This is not absolutely necessary, but different sample sizes per treatment make things a bit more complicated.] for the irrigation experiment and 10 observations for the nitrogen experiment.
<br>
```{r message=FALSE, warning=FALSE, paged.print=FALSE, echo=FALSE}
# Wheat yield data from page 137 of SG book
# IxN <- read.csv("IrrigationNitroFactorial.csv", header = TRUE)
IxN <- read.table(header = TRUE, text = "
Block Irrigation Nitrogen Yield
1 1 0 31.7
1 2 160 68.5
1 1 160 56.8
1 1 80 50
1 1 240 57.3
1 2 0 38.9
1 2 240 73.5
1 2 80 61.4
1 2 320 70.6
1 1 320 53.1
2 2 240 72.9
2 2 160 66.7
2 1 240 59.4
2 2 0 40
2 1 160 60.6
2 1 80 48.6
2 2 320 72.3
2 1 320 54.1
2 2 80 53.1
2 1 0 39.1
")
IxN$Block <- factor(IxN$Block)
IxN$Irrigation <- factor(IxN$Irrigation)
IxN$Nitrogen <- factor(IxN$Nitrogen)
IxN <- IxN[order(IxN$Block, IxN$Irrigation, IxN$Nitrogen), ]
# Get estimated parameters and use them to generate simulated data for single experiments.
set.seed(123)
ixn.m1 <- lm(Yield ~ Irrigation * Nitrogen + Block, data = IxN)
IrrigDat <- expand.grid(Block = levels(IxN$Block),
Irrigation = factor(c(1,2)),
Nitrogen = factor(160),
rep = c(1:3))
IrrigDat$Yield <- predict(object = ixn.m1,
newdata = IrrigDat) +
rnorm(dim(IrrigDat)[1], mean = 0, sd = summary(ixn.m1)$sigma)
```
```{r tblIrrigDat, echo= FALSE}
kable(IrrigDat,
caption = "Simulated data for a simple experiment to test the effect of irrigation on wheat yield.") %>% kable_styling(full_width = FALSE)
```
<br>
We analyze these data using the following model (Equation \@ref(eq:irrigMod1)), which is known to have the correct form because the data were simulated based on the same model. Note that the model does not mention the effect of nitrogen because the same dose of nitrogen was applied to all plots.
<br>
\begin{equation}
Y_{ijh} = \mu_{...} + \tau_i + B_j + \epsilon_{ijh} \\[20pt]
\tau_i = \text{ effect of irrigation treatment with level i} \\
B_j = \text{ effect of block j} \\
\epsilon_{ijh} = \text{ random error in treatment i, block j and replicate h}
(\#eq:irrigMod1)
\end{equation}
<br>
```{r IrrigAnova, message=FALSE, warning=FALSE, paged.print=FALSE, echo=FALSE}
irrig.m1 <- lm(Yield ~ Irrigation + Block, data = IrrigDat)
kable(anova(irrig.m1),
caption = "ANOVA table for irrigation effects on yield.",
digits = c(0, 1, 1, 2, 5)) %>%
kable_styling(full_width = FALSE)
```
<br>
```{r IrrigEmmeans, message=FALSE, warning=FALSE, paged.print=FALSE, echo=FALSE}
kable(emmeans(irrig.m1, "Irrigation"),
caption = "Effect of irrigation on wheat yield",
digits = c(0, 1, 2, 0, 2, 2)) %>%
kable_styling(full_width = FALSE)
```
<br>
The ANOVA table shows that there is a significant difference in yield between treatments. Table \@ref(tab:IrrigEmmeans) contains the estimated means for both irrigation levels and shows that yield is significantly greater with two irrigations than with a single one. The precision of the estimates is reflected both in the MSE in the ANOVA table and in the width of the confidence intervals in the table of means, which is about `r round(2 * qt(p = 0.975, df = irrig.m1$df.residual) * sqrt(anova(irrig.m1)[3, 3] / 6), 2)` bushels/ac (1 bushel of wheat is equivalent to 60 lb). The width of the confidence interval is calculated as twice the product of the critical t value, which depends on the degrees of freedom of the residuals (dfe = 9), and the standard error, which is the square root of the MSE divided by the square root of the number of experimental units averaged to get each estimated mean. There were 12 experimental units equally divided between the two treatments, so six experimental units were used for the average of each treatment.
```{r message=FALSE, warning=FALSE, paged.print=FALSE, echo=FALSE, include=FALSE}
tcrit <- qt(p = 0.975, df = irrig.m1$df.residual)
MSE <- anova(irrig.m1)[3, 3]
```
\begin{equation}
\text{CI width } = 2 \ t_{dfe, \alpha} \ \sqrt{\frac{MSE}{r}} \\[20pt]
\text{where the number of plots averaged per treatment is } r = 6 \\[20pt]
dfe = `r irrig.m1$df.residual` \\[20pt]
t_{dfe, \alpha} = `r round(tcrit, 4)` \\[20pt]
\text{therefore, CI width } = 2 \times `r round(tcrit, 4)` \times `r round(sqrt(MSE / 6), 3)` = `r round(2 * tcrit * sqrt(MSE / 6), 3)`
\end{equation}
<br>
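The same width can be reproduced directly from the fitted model; a one-line sketch:

```{r}
# CI width = 2 * t(dfe, 0.975) * sqrt(MSE / r), with r = 6 plots per treatment mean
2 * qt(p = 0.975, df = irrig.m1$df.residual) * sqrt(anova(irrig.m1)[3, 3] / 6)
```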
#### Effects of Nitrogen alone
Continuing with the simple approach of one factor at a time, we do a second experiment to determine the effect of nitrogen fertilization on wheat yield. Five equally-spaced doses of nitrogen (0, 80, 160, 240 and 320 lb/ac) are selected. For this experiment, one irrigation is used for all plots. The five treatments are replicated in each of two blocks, such that each treatment appears in a single experimental unit per block.
```{r message=FALSE, warning=FALSE, paged.print=FALSE, include=FALSE}
NitroDat <- expand.grid(Block = levels(IxN$Block),
Irrigation = factor(1),
Nitrogen = factor(c(0, 80, 160, 240, 320)))
set.seed(123)
NitroDat$Yield <- predict(object = ixn.m1,
newdata = NitroDat) +
rnorm(dim(NitroDat)[1], mean = 0, sd = summary(ixn.m1)$sigma)
```
```{r tblNitroDat, echo= FALSE}
kable(NitroDat,
caption = "Simulated data for a simple experiment to test the effect of nitrogen on wheat yield.") %>% kable_styling(full_width = FALSE)
```
<br>
We analyze these data using the following model (Equation \@ref(eq:NitroMod1)), which is known to have the correct form because the data were simulated based on the same model. Note that the model does not mention the effect of irrigation because the same single irrigation was applied to all plots.
<br>
\begin{equation}
Y_{ij} = \mu_{...} + \tau_i + B_j + \epsilon_{ij} \\[20pt]
\tau_i = \text{ effect of nitrogen treatment with level i} \\
B_j = \text{ effect of block j} \\
\epsilon_{ij} = \text{ random error in treatment i, block j}
(\#eq:NitroMod1)
\end{equation}
<br>
```{r NitroAnova, message=FALSE, warning=FALSE, paged.print=FALSE, echo=FALSE}
nitro.m1 <- lm(Yield ~ Nitrogen + Block, data = NitroDat)
kable(anova(nitro.m1),
caption = "ANOVA table for nitrogen effects on yield.",
digits = c(0, 1, 1, 2, 5)) %>%
kable_styling(full_width = FALSE)
```
<br>
```{r NitroEmmeans, message=FALSE, warning=FALSE, paged.print=FALSE, echo=FALSE}
kable(emmeans(nitro.m1, "Nitrogen"),
caption = "Effect of nitrogen on wheat yield",
digits = c(0, 1, 2, 0, 2, 2)) %>%
kable_styling(full_width = FALSE)
```
<br>