diff --git a/AS_Markdown_HW1.Rmd b/AS_Markdown_HW1.Rmd new file mode 100644 index 0000000..6702e44 --- /dev/null +++ b/AS_Markdown_HW1.Rmd @@ -0,0 +1,194 @@ +--- +title: "ExtraPractice2" +author: "W. Evan Johnson (recreated by Arman Sawhney)" +date: "2024-05-08" +output: + html_document: + code_folding: none + toc: true + toc_float: true + theme: "flatly" +editor_options: + chunk_output_type: console +--- + +```{r setup, include=FALSE} +knitr::opts_chunk$set(echo = TRUE) +``` + +## Introduction and Goals + +This Extra Practice exercise is to verify that you can recreate and run R Markdown scripts and properly knit them. Your goal will be to recreate the .Rmd code for this document. Special thanks to [Augie Wifler](https://rpubs.com/augie-wifler/993548), from whose code some of this material was obtained and modified. + +## R Markdown Basics {.tabset} + +This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>. A handy cheatsheet can also be found here <https://www.rstudio.com/wp-content/uploads/2016/03/rmarkdown-cheatsheet-2.0.pdf>. + +When you click the Knit button at the top of the RStudio interface, a document will be generated that includes text (i.e., white sections of the R script), R code (gray-green sections of the script), and output from running your code. This combination provides maximum flexibility to explain what you’re doing using text chunks, to show how you did it using R code chunks, to share results of those analyses by printing the R output, and to interpret those results using additional text. + +### The Header + +At the top of your `.Rmd` you see: +```{r header, eval=FALSE} +--- +title: "Nanostring Analysis" +author: "Evan Johnson" +date: "12/5/2019" +output: + html_document: + code_folding: hide + toc: true + toc_float: true + theme: "flatly" +editor_options: + chunk_output_type: console +--- +``` + +### Code chunks + +R code can be inserted in gray sections as follows (see below). 
You can insert a code chunk anywhere by coding it directly into your document, by clicking the insert tab up above, or by using the hot-key combination **Ctrl-Alt-i:** + +```{r chunks, eval=FALSE} +summary(pressure) +``` + +You can run each code chunk by clicking the green arrow in the upper right hand corner of the gray box. + +### Code chunk options (including global options) + +You can name each code chunk by adding a short description after the r. For example, an `.Rmd` file usually starts with a chunk named `{r setup}`. Each name must be unique: copying and modifying code chunks is a clever thing to do, but be sure to give each copy a new name before you try to knit. + +The first line of code in this chunk sets default printing levels for all future code chunks. `echo=TRUE` means that your code will always be included in your knitted document, along with any output. For homework assignments, you should almost always use `echo = TRUE` so that we can evaluate your code, but if you don’t want to include the code (e.g., lots of ggplot code to generate a plot), you can use `echo = FALSE` to prevent the code from being printed. + +The second line of code sets the output width to 80, which will fit on most monitors (including laptops). You can change this to a smaller value if you have problems viewing all output, or larger values to prevent line-wrap. + +```{r options, eval=FALSE} + +knitr::opts_chunk$set(echo = TRUE) +options(width = 80) # custom-fit this for your own monitor + +``` + +## Some other formatting tips and tricks {.tabset} + +As you can see above in the Rmd file, I included three # symbols in front of the text. This increases the size of the text when we knit. The fewer # symbols you have, the larger the text. 
See examples below (text size will only be altered when you knit your final document): + +### Paragraph headings + +```{r headings, eval=FALSE} +# Biggest +## Less Big +### Getting Smaller (I like this one best for sub-headings) +#### Slightly bigger than normal text +``` +**There is little point in putting 5 # symbols in front of your text. It’s barely bigger than the default text size.** + +### Text formatting + +Including an asterisk symbol in front of and behind text will *italicize* it. Using two asterisks in a row will make the text **bold**. + +Before you knit, it is important that **YOUR CODE IS ERROR FREE, COMPLETE, AND IN PROPER ORDER**. If not, you will get error messages when you try to knit that are not easy to interpret. So before you knit an .Rmd document, it’s good to start with a clean global environment and run every code chunk starting from the beginning. If there are no errors, you should be able to knit. + +### More on Markdown + +R markdown is a format for **literate programming** documents. It is based on **markdown**, a markup language that is widely used to generate html pages. You can learn more about markdown here: [click here](https://www.markdowntutorial.com/) + +## R coding basics {.tabset} + +### Load libraries + +Here is a code chunk where I load any R packages that I want to use. `dplyr` is a great package for data manipulation. `ggplot2` is a great package for plotting data. Cheatsheets for `dplyr` and `ggplot2` can be found here: + +`dplyr`: <https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf> + +`ggplot2`: <https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf> + +You’ll notice in the Rmd file that I included additional information at the top of the code chunk about warnings and messages. When you load a package, you’ll often have a bunch of messages and warnings pop up (usually relating to the version of R that you’re running). The **message=FALSE** and **warning=FALSE** chunk options will suppress this information from your knitted document, which will help it look cleaner and more professional. 
+ +```{r libraries, message=FALSE, warning=FALSE} +require(dplyr) +require(ggplot2) +require(cowsay) +``` + +### If R seems scary + +There are many ways to get help! Google is your friend. If you’re having a coding issue, odds are someone else has had that same problem, and before you can fully type in your question you’ll find that Google autofills for you. You’ll also find that there are many ways to do the same thing. If you were in Biometry, you’re probably familiar with the R package called `swirl`. This is a nice user-friendly package that can teach you some basic R commands and statistical analyses while you use R. +```{r} +# you may edit this message, as needed! +# you may also choose a different animal if you don't like the stegosaurus +# try typing "sort(names(animals))" in the console to find other options +say("This is me doing the homework", "stegosaurus") +``` + +### Loading data + +This next code chunk loads data and does some basic summaries. `iris` is a default data set included in the R datasets. The `data()` function is only needed when a data set comes from an R package. If you’re using your own dataset, you often need to read in the data using other methods (we will go over this at some future time). The `head()` function shows your first six observations and `tail()` will display your last six observations. The `str()` function tells you the type of data for each column (i.e., numeric for the first 4 variables and factor (categorical) for Species); `glimpse()` is a tidyverse version of `str()`. + +These are handy functions to use when you first load data to make sure it was properly imported into R. + +```{r data} +data("iris") +head(iris) +``` + +```{r tail} +tail(iris) +``` +```{r string} +str(iris) +``` + +### Data manipulation with dplyr + +Now we’ll do some simple data manipulation to showcase the dplyr package, creating a new summary dataset called `slBYspecies` to examine differences in sepal length among the three species of iris. 
In the code chunk below, the `%>%` symbols are called pipes, and they are tidyverse shorthand for “do all of these things in sequence”. There are no **NA** observations in the iris data, so the `na.rm = TRUE` statements aren’t needed here, but most biological data have missing values, and without `na.rm = TRUE` these summaries would return **NA**. + +```{r manipulation} +# summarize sepal length by species +# round sd to 3 decimal places +slBYspecies <- iris %>% + group_by(Species) %>% + summarise(meanSL = mean(Sepal.Length, na.rm = TRUE), + sdSL = round(sd(Sepal.Length, na.rm = TRUE), 3), + maxSL = max(Sepal.Length, na.rm = TRUE), + minSL = min(Sepal.Length, na.rm = TRUE), + cnt=length(Species)) %>% + # calculate the standard error as standard deviation divided by square root of sample size + mutate(seSL = sdSL/sqrt(cnt)) + +# print the manipulated data set +slBYspecies +``` + +### Plotting with ggplot2 + +Here are some basic plots with `ggplot2`. This is mainly to demonstrate how you can have figures embedded within your knitted document. However, it will also give you an intro to using `ggplot2`. + +```{r plotting} +# boxplot of sepal length by species +ggplot(iris, aes(x = Species, y = Sepal.Length)) + + geom_boxplot() + + xlab("Species") + ylab("Sepal Length (cm)") # provide custom axis labels +``` + +```{r manipulation_and_plotting_i_feel_like_a_james_bond_villain} +# scatterplot of sepal length vs. sepal width by species +ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, + group = Species, color = Species, fill = Species)) + + geom_point(stat = "identity") + + geom_smooth(method = "lm") + # use linear model (lm) to provide line of best fit + xlab("Sepal width (cm)") + ylab("Sepal length (cm)") + + theme_classic() +``` + + +## Reproducibility (Session info) + +It can be helpful to end your Markdown file with a record of what versions of R and R packages you are using. 
This is useful if you return to your code after a day, a week, a month…or even a year or more later and find that your code doesn’t run properly anymore (perhaps because one or more packages have been modified). To keep a record of this information, you can use the `sessionInfo()` function. + +Once you’ve determined that you can run each code chunk in this file, try knitting the entire document by clicking the Knit icon near the top of the page. A drop-down menu will give you the options of knitting to HTML, PDF, and Word. You can try all 3, but HTML is the easiest to work with in terms of formatting and is the preferred format in most cases. + +```{r} +sessionInfo() +``` \ No newline at end of file diff --git a/AS_Markdown_HW1.html b/AS_Markdown_HW1.html new file mode 100644 index 0000000..e5790bc --- /dev/null +++ b/AS_Markdown_HW1.html @@ -0,0 +1,718 @@ + + + + + + + + + + + + + + + +ExtraPractice2 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + +
+

Introduction and Goals

+

This Extra Practice exercise is to verify that you can recreate and +run R Markdown scripts and properly knit them. Your goal will be to +recreate the .Rmd code for this document. Special thanks to +Augie Wifler, from +whose code some of this material was obtained and modified.

+
+
+

R Markdown Basics

+

This is an R Markdown document. Markdown is a simple formatting +syntax for authoring HTML, PDF, and MS Word documents. For more details +on using R Markdown see http://rmarkdown.rstudio.com. A handy cheatsheet can +also be found here https://www.rstudio.com/wp-content/uploads/2016/03/rmarkdown-cheatsheet-2.0.pdf.

+

When you click the Knit button at the top of the RStudio interface, a +document will be generated that includes text (i.e., white sections of +the R script), R code (gray-green sections of the script), and output +from running your code. This combination provides maximum flexibility to +explain what you’re doing using text chunks, to show how you did it +using R code chunks, to share results of those analyses by printing the +R output, and to interpret those results using additional text.

+
+

The Header

+

At the top of your .Rmd you see:

+
---
+title: "Nanostring Analysis"
+author: "Evan Johnson"
+date: "12/5/2019"
+output:
+  html_document:
+    code_folding: hide
+    toc: true
+    toc_float: true
+    theme: "flatly"
+editor_options: 
+  chunk_output_type: console
+---
+
+
+

Code chunks

+

R code can be inserted in gray sections as follows (see below). You +can insert a code chunk anywhere by coding it directly into your document, by +clicking the insert tab up above, or by using the hot-key combination +Ctrl-Alt-i:

+
summary(pressure)
+

You can run each code chunk by clicking the green arrow in the upper +right hand corner of the gray box.

+
+
+

Code chunk options (including global options)

+

You can name each code chunk by adding a short description after the +r. For example, an .Rmd file usually starts with a chunk +named {r setup}. Each name must be unique: copying and +modifying code chunks is a clever thing to do, but be sure to give +each copy a new name before you try to knit.

+

The first line of code in this chunk sets default printing levels for +all future code chunks. echo=TRUE means that your code will +always be included in your knitted document, along with any output. For +homework assignments, you should almost always use +echo = TRUE so that we can evaluate your code, but if you +don’t want to include the code (e.g., lots of ggplot code to generate a +plot), you can use echo = FALSE to prevent the code from +being printed.

+

The second line of code sets the output width to 80, which will fit +on most monitors (including laptops). You can change this to a smaller +value if you have problems viewing all output, or larger values to +prevent line-wrap.

+
knitr::opts_chunk$set(echo = TRUE)
+options(width = 80) # custom-fit this for your own monitor
+
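To see what the `width` setting described above actually changes, here is a small base R sketch. The value 40 and the example vector are arbitrary choices for illustration, not part of the assignment.

```r
# Temporarily narrow the console width; options() returns the old settings
old <- options(width = 40)

# The printed vector now wraps after about 40 characters
print(seq(100, 2000, by = 100))

# Restore the previous width so later output is unaffected
options(old)
```

Saving the old settings and restoring them afterwards keeps the experiment from affecting the rest of the session.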
+
+
+

Some other formatting tips and tricks

+

As you can see above in the Rmd file, I included three # symbols in +front of the text. This increases the size of the text when we knit. The +fewer # symbols you have, the larger the text. See examples below (text +size will only be altered when you knit your final document):

+
+

Paragraph headings

+
# Biggest
+## Less Big
+### Getting Smaller (I like this one best for sub-headings)
+#### Slightly bigger than normal text
+

There is little point in putting 5 # symbols in front of your +text. It’s barely bigger than the default text size.

+
+
+

Text formatting

+

Including an asterisk symbol in front of and behind text will +italicize it. Using two asterisks in a row will make the text +bold.

+

Before you knit, it is important that YOUR CODE IS ERROR +FREE, COMPLETE, AND IN PROPER ORDER. If not, you will get error +messages when you try to knit that are not easy to interpret. So before +you knit an .Rmd document, it’s good to start with a clean global +environment and run every code chunk starting from the beginning. If +there are no errors, you should be able to knit.

+
+
+

More on Markdown

+

R markdown is a format for literate programming +documents. It is based on markdown, a markup language +that is widely used to generate html pages. You can learn more about +markdown here: click +here

+
+
+
+

R coding basics

+
+

Load libraries

+

Here is a code chunk where I load any R packages that I want to use. +dplyr is a great package for data manipulation. +ggplot2 is a great package for plotting data. Cheatsheets +for dplyr and ggplot2 can be found here:

+

dplyr: https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf

+

ggplot2: https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf

+

You’ll notice in the Rmd file that I included additional information +at the top of the code chunk about warnings and messages. When you load +a package, you’ll often have a bunch of messages and warnings pop up +(usually relating to the version of R that you’re running). The +message=FALSE and warning=FALSE +chunk options will suppress this information from your knitted document, +which will help it look cleaner and more professional.

+
require(dplyr)
+require(ggplot2)
+require(cowsay)
+
+
+

If R seems scary

+

There are many ways to get help! Google is your friend. If you’re +having a coding issue, odds are someone else has had that same problem, +and before you can fully type in your question you’ll find that Google +autofills for you. You’ll also find that there are many ways to do the +same thing. If you were in Biometry, you’re probably familiar with the R +package called swirl. This is a nice user-friendly package +that can teach you some basic R commands and statistical analyses while +you use R.

+
# you may edit this message, as needed!
+# you may also choose a different animal if you don't like the stegosaurus
+# try typing "sort(names(animals))" in the console to find other options
+say("This is me doing the homework", "stegosaurus")
+
## 
+##  ------ 
+## This is me doing the homework 
+##  ------ 
+##  \   
+##   \  
+##    \
+## 
+##    .-~~^-.
+##  .'  O    \
+## (_____,    \
+##  `----.     \
+##        \     \
+##         \     \
+##          \     `.             _ _
+##           \       ~- _ _ - ~       ~ - .
+##            \                              ~-.
+##             \                                `.
+##              \    /               /           \
+##               `. |         }     |         }    \
+##                 `|        /      |        /       \
+##                  |       /       |       /          \
+##                  |      /`- _ _ _|      /.- ~ ^-.     \
+##                  |     /         |     /          `.    \
+##                  |     |         |     |             -.   ` . _ _ _ _ _ _
+##                  |_____|         |_____|                ~ . _ _ _ _ _ _ _ >
+
+
+

Loading data

+

This next code chunk loads data and does some basic summaries. +iris is a default data set included in the R datasets. The +data() function is only needed when a data set comes from an R package. If +you’re using your own dataset, you often need to read in the data using +other methods (we will go over this at some future time). The +head() function shows your first six observations and +tail() will display your last six observations. The +str() function tells you the type of data for each column (i.e., +numeric for the first 4 variables and factor (categorical) for Species); +glimpse() is a tidyverse version of str().

+

These are handy functions to use when you first load data to make +sure it was properly imported into R.

+
data("iris")
+head(iris)
+
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
+## 1          5.1         3.5          1.4         0.2  setosa
+## 2          4.9         3.0          1.4         0.2  setosa
+## 3          4.7         3.2          1.3         0.2  setosa
+## 4          4.6         3.1          1.5         0.2  setosa
+## 5          5.0         3.6          1.4         0.2  setosa
+## 6          5.4         3.9          1.7         0.4  setosa
+
tail(iris)
+
##     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
+## 145          6.7         3.3          5.7         2.5 virginica
+## 146          6.7         3.0          5.2         2.3 virginica
+## 147          6.3         2.5          5.0         1.9 virginica
+## 148          6.5         3.0          5.2         2.0 virginica
+## 149          6.2         3.4          5.4         2.3 virginica
+## 150          5.9         3.0          5.1         1.8 virginica
+
str(iris)
+
## 'data.frame':    150 obs. of  5 variables:
+##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
+##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
+##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
+##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
+##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
+
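The text above notes that your own data usually has to be read in by some means other than `data()`. Here is a minimal sketch of one common route, assuming the data live in a CSV file. So that there is a file to read, `iris` is first written to a temporary file. The path `csv_path` and the name `my_data` are purely illustrative stand-ins for your own file and object.

```r
# Hypothetical setup: write iris to a temporary CSV so there is a file to read.
# In practice, csv_path would be the path to your own file.
csv_path <- tempfile(fileext = ".csv")
write.csv(iris, csv_path, row.names = FALSE)

# Read the file back in; stringsAsFactors = TRUE keeps Species as a factor
my_data <- read.csv(csv_path, stringsAsFactors = TRUE)

# The same checks used above confirm that the import worked
head(my_data)
str(my_data)
```

Running `head()` and `str()` right after the import is a quick way to catch columns that were read as the wrong type.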
+
+

Data manipulation with dplyr

+

Now we’ll do some simple data manipulation to showcase the dplyr +package, creating a new summary dataset called slBYspecies +to examine differences in sepal length among the three species of iris. +In the code chunk below, the %>% symbols are called pipes, and they +are tidyverse shorthand for “do all of these things in sequence”. There +are no NA observations in the iris data, so the +na.rm = TRUE statements aren’t needed here, but most +biological data have missing values, and without +na.rm = TRUE these summaries would return NA.

+
# summarize sepal length by species
+# round sd to 3 decimal places
+slBYspecies <- iris %>%
+  group_by(Species) %>%
+  summarise(meanSL = mean(Sepal.Length, na.rm = TRUE), 
+            sdSL = round(sd(Sepal.Length, na.rm = TRUE), 3), 
+            maxSL = max(Sepal.Length, na.rm = TRUE), 
+            minSL = min(Sepal.Length, na.rm = TRUE), 
+            cnt=length(Species)) %>%
+  # calculate the standard error as standard deviation divided by square root of sample size
+  mutate(seSL = sdSL/sqrt(cnt)) 
+
+# print the manipulated data set
+slBYspecies 
+
## # A tibble: 3 × 7
+##   Species    meanSL  sdSL maxSL minSL   cnt   seSL
+##   <fct>       <dbl> <dbl> <dbl> <dbl> <int>  <dbl>
+## 1 setosa       5.01 0.352   5.8   4.3    50 0.0498
+## 2 versicolor   5.94 0.516   7     4.9    50 0.0730
+## 3 virginica    6.59 0.636   7.9   4.9    50 0.0899
+
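Because `iris` has no missing values, the effect of `na.rm = TRUE` is not visible in the output above. The base R sketch below uses a small made-up vector (not taken from `iris`) to show what happens when a value is missing, and how the standard error can be computed from the non-missing observations only. The names `x`, `n_obs`, and `se` are illustrative.

```r
# Hypothetical measurements with one missing value
x <- c(5.1, 4.9, NA, 4.7)

mean(x)                 # NA: the missing value propagates
mean(x, na.rm = TRUE)   # mean of the three observed values

# Standard error: sd divided by the square root of the number of
# non-missing observations (length(x) would also count the NA)
n_obs <- sum(!is.na(x))
se <- sd(x, na.rm = TRUE) / sqrt(n_obs)
se

# round() works in decimal places, signif() in significant digits
round(12.3456, 3)    # 12.346
signif(12.3456, 3)   # 12.3
```

Counting only the non-missing values matters once data contain `NA`; with complete data such as `iris`, it gives the same count as `length()`.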
+
+

Plotting with ggplot2

+

Here are some basic plots with ggplot2. This is mainly +to demonstrate how you can have figures embedded within your knitted +document. However, it will also give you an intro to using +ggplot2.

+
# boxplot of sepal length by species
+ggplot(iris, aes(x = Species, y = Sepal.Length)) +
+  geom_boxplot() +
+  xlab("Species") + ylab("Sepal Length (cm)") # provide custom axis labels
+

+
# scatterplot of sepal length vs. sepal width by species
+ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, 
+                 group = Species, color = Species, fill = Species)) +
+  geom_point(stat = "identity") +
+  geom_smooth(method = "lm") + # use linear model (lm) to provide line of best fit
+  xlab("Sepal width (cm)") + ylab("Sepal length (cm)") +
+  theme_classic()
+
## `geom_smooth()` using formula = 'y ~ x'
+

+
+
+
+

Reproducibility (Session info)

+

It can be helpful to end your Markdown file with a record of what +versions of R and R packages you are using. This is useful if you return +to your code after a day, a week, a month…or even a year or more +later and find that your code doesn’t run properly anymore (perhaps +because one or more packages have been modified). To keep a record of +this information, you can use the sessionInfo() function.

+

Once you’ve determined that you can run each code chunk in this file, +try knitting the entire document by clicking the Knit icon near the top +of the page. A drop-down menu will give you the options of knitting to +HTML, PDF, and Word. You can try all 3, but HTML is the easiest to work +with in terms of formatting and is the preferred format in most +cases.

+
sessionInfo()
+
## R version 4.3.2 (2023-10-31)
+## Platform: aarch64-apple-darwin20 (64-bit)
+## Running under: macOS Sonoma 14.1
+## 
+## Matrix products: default
+## BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
+## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
+## 
+## locale:
+## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
+## 
+## time zone: America/New_York
+## tzcode source: internal
+## 
+## attached base packages:
+## [1] stats     graphics  grDevices utils     datasets  methods   base     
+## 
+## other attached packages:
+## [1] cowsay_0.9.0  ggplot2_3.4.4 dplyr_1.1.3  
+## 
+## loaded via a namespace (and not attached):
+##  [1] Matrix_1.6-5      gtable_0.3.4      jsonlite_1.8.7    highr_0.10       
+##  [5] compiler_4.3.2    crayon_1.5.2      tidyselect_1.2.0  rmsfact_0.0.3    
+##  [9] jquerylib_0.1.4   splines_4.3.2     scales_1.2.1      yaml_2.3.7       
+## [13] fastmap_1.1.1     lattice_0.22-5    R6_2.5.1          labeling_0.4.3   
+## [17] generics_0.1.3    knitr_1.45        tibble_3.2.1      munsell_0.5.0    
+## [21] bslib_0.6.0       pillar_1.9.0      rlang_1.1.1       utf8_1.2.4       
+## [25] cachem_1.0.8      xfun_0.41         sass_0.4.7        cli_3.6.1        
+## [29] mgcv_1.9-0        withr_2.5.2       magrittr_2.0.3    digest_0.6.33    
+## [33] grid_4.3.2        rstudioapi_0.15.0 fortunes_1.5-4    nlme_3.1-163     
+## [37] lifecycle_1.0.3   vctrs_0.6.4       evaluate_0.23     glue_1.6.2       
+## [41] farver_2.1.1      fansi_1.0.5       colorspace_2.1-0  rmarkdown_2.25   
+## [45] tools_4.3.2       pkgconfig_2.0.3   htmltools_0.5.7
+
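If only a few details from the session record are needed, they can be pulled out individually. This is a small base R sketch. The package queried (`stats`, which ships with R) and the temporary file path `info_path` are illustrative choices.

```r
# R version as a single string
R.version.string

# Version of one package; "stats" is part of every R installation
packageVersion("stats")

# Save the full session record to a text file
# (a temporary path is used here purely for illustration)
info_path <- tempfile(fileext = ".txt")
writeLines(capture.output(sessionInfo()), info_path)
```

Saving the record to a file next to the analysis is one way to keep it available even if the knitted document is not at hand.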
+ + + + +
+ + + + + + + + + + + + + + + diff --git a/YouGotThis.txt b/YouGotThis.txt new file mode 100644 index 0000000..8d6fe04 --- /dev/null +++ b/YouGotThis.txt @@ -0,0 +1,2 @@ +You got this! +Proud of the progress you've made!