-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy pathPS_Lab01_F2018.Rmd
200 lines (97 loc) · 6.44 KB
/
PS_Lab01_F2018.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
---
title: "Lab01: Data Exploration and Summaries"
author: "YourFirstName YourLastName"
date: "today's date here"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
### Plant sciences Lab
### Instructions
For this lab you will modify this file and submit this file with the file name changed so it has your email ID (the part before @) in lower case instead of "email." Do not add spaces to the file name.
This is a markdown document. You will type your code and run it one line at a time as you add it in the lines indicated below. Add code **ONLY** in the areas between `{r}` and ` ``` `. These areas are highlighted with a light grey color. Run each line and parts to learn and experiment until you get the result you want. Keep the lines that worked and move on. At any time you can see if your document "knits" or not by clicking on the Knit HTML icon at the top. Once you have completed all work, knit your document and save the html file produced with the same file name but with an html extension (Lab01email.html).
**Submit BOTH files for your lab report using the appropriate Canvas tool**
For each part and question below, type your code in the grey area below, between the sets of back-ticks (```) to perform the desired computation and get output. Type your answers below the corresponding grey area. A completed example follows.
### Example Question with Answer
Create a vector with 5 random numbers from the standard normal distribution. Call it "x." Calculate the average and variance of the values in x. Create a second 5-number vector using the same procedure. Call it "y." Calculate the sum of the elements in y. Make a nicely formatted table showing x and y as columns. Calculate x + y and then sum(x) + sum(y). Discuss the difference between the last two calculations.
```{r}
library(pander) # to do nice tables
x = rnorm(5)
mean(x)
var(x)
y = rnorm(5)
sum(y)
pander(data.frame(x = x, y = y))
x + y
sum(x) + sum(y)
```
The x + y and sum(x) + sum(y) are totally different. The former results in a vector of five numbers, each number being the sum of the corresponding elements in x and y. The latter results in a single number that is the sum of the 5 numbers in x plus the 5 numbers in y. (Full credit!)
### Part 1. Data input [10 points]
Load the iris data that comes with R into a data frame called myiris, and get its structure. Use help() to learn what the iris data are about. When were the data first published? Hint: you need help. [5 points]
```{r iris.info, echo = TRUE, include = TRUE}
myiris <- iris # put iris data into dataframe object or container called myiris
str(myiris)
## add code to ask for information about dataframe object iris
```
When were the data first published? Your answer here:.
### Part 2. Coefficient of variation [15 points]
Do a web search and explain the use of the coefficient of variation and one of its main disadvantages. Hint: what happens when your mean is zero or close to zero? [15 points]
Your answer here:
### Part 3. Five number summary table of iris data [15 points]
Make a data frame and a nicely formatted table containing the following statistics: mean, median, range, minimum, maximum, standard deviation, coefficient of variation and sample size for each of the measurements (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) in myiris data. What variable has the most variation relative to the average? [15 points]
```{r summary.table1, echo=FALSE}
mymeans <- sapply(X = myiris[ , 1:4], FUN = mean) # put means in a column
mymedians <- sapply(X = myiris[, 1:4], FUN = median)
mymaxs <- sapply(X = myiris[, 1:4], FUN = max)
# add formula to get the column with minima here
mymins <-
myranges <- mymaxs - mymins
# add formula to get standard deviation here
mysds <-
mycvs <- mysds / mymeans # calculate coeff. of variation
myNs <- sapply(X = myiris[, 1:4], FUN = length) # calculate sample size
# Put all columns together into a data frame object called "Table1"
Table1 <- data.frame(Mean = mymeans,
Median = mymedians,
Min = mymins,
Max = mymaxs,
Range = myranges,
SD = mysds,
CV = mycvs,
N = myNs)
pander(Table1) # use pander to create a nicely formatted table.
```
What variable has the most variation relative to the average? Your answer here:
### Part 4. Frequency table and histogram [15 points]
Make and print a nicely formatted frequency table for the sepal length of the flowers. Calculate the number of classes (bins) using the formula in the LabO1 instructive. Make a histogram using your calculations "by-hand" and then a new histogram using the hist() function. [15 points]
```{r iris.hist1, echo = TRUE, include = TRUE}
(sample.size <- length(myiris$Sepal.Length))
(nbins <- max(8, (1 + log(sample.size, 2))))
myiris$bin <- cut(myiris$Sepal.Length, breaks = nbins) # create column with bin
(freq.table <- as.data.frame(xtabs( ~myiris$bin))) # creates frequency table
pander(freq.table) # format the table a little better
plot(freq.table) # make histogram "by hand"
hist(myiris$Sepal.Length) # by default it uses Sturges rule for bins.
```
### Part 5. Box and whisker plot [15 points]
Create two box and whisker plots for the sepal length. Label at least three of the elements in the plot. Create a new box and whisker plot with a *range* argument value of 0.5. [15 points]
```{r iris.box1, echo = TRUE, include = TRUE}
boxplot(myiris$Sepal.Length) # enter the data frame name, $ and the variable name to designate the data used by boxplot
text(x = 0.6, y = 5.8, labels = "Median")
# Add two more labels at the correct locations by completing the code.
# Use the x and y arguments to specify where the labels go.
text(x = , y = , labels = )
text(x = , y = , labels = )
```
### Part 6. fivenum() function [10 points]
What does the fivenum() function do? [5 points]
Your answer here:
### Part 7. Making a vector [10 points]
What function allows you to **combine** numbers to make numeric vectors? Use the function to make a vector with the numbers 1, 3, 4, 9 and call it "sa.vector." [5 points]
```{r iris.c1, echo = TRUE, include = TRUE}
```
What function allows you to **combine** numbers to make numeric vectors? Your answer here:
### Part 8. Knitting [10 points]
Knit this file into html. [10 points]
<!-- ##----------------END PLANT SCIENCES LAB-----------------### -->