
Commit 7c42d15

data wrangling intro + another typo
1 parent 1b0aa8b commit 7c42d15


158 files changed

+1378
-988
lines changed

1_08_getting_help.Rmd

+1-1
@@ -100,7 +100,7 @@ demo()
 When we use the `demo` function like this it only lists the demos associated with packages that have been loaded in the current session (via `library`). If we want to see all the demos we can run we need to use the somewhat cryptic `demo(package = .packages(all.available = TRUE))`.
 
 In order to actually run a demo we use the `demo` function, setting the `topic` and `package` arguments. For example, to run the "colors" demo in the __grDevices__ package we would use:
-```{r, echo=FALSE}
+```{r, eval=FALSE}
 demo(colors, package = "grDevices", ask = FALSE)
 ```
 This particular demo shows off some of the pre-defined colours we might use to customise the appearance of a plot. We've suppressed the output though because so much is produced.

2_03_tidy_data_dplyr_intro.Rmd

+1-1
@@ -2,7 +2,7 @@
 
 ## Introduction
 
-[Data wrangling]
+Data wrangling refers to the process of manipulating raw data into the format that we want it in, for example for data visualisation or statistical analyses. There are a wide range of ways we may want to manipulate our data, for example by creating new variables, subsetting the data, or calculating summaries. Data wrangling is often a time consuming process. It is also not the most interesting part of any analysis - we are interested in answering biological questions, not in formatting data. However, it is a necessary step to go through to be able to conduct the analyses that we're really interested in. Learning how to manipulate data efficiently can save us a lot of time and trouble and is therefore a really important skill to master.
 
 ## The value of **dplyr** {#why-dplyr}
 

docs/a-quick-introduction-to-r.html

+31-31
Large diffs are not rendered by default.

docs/building-in-complexity.html

+32-33
Large diffs are not rendered by default.

docs/building-pipelines.html

+525
Large diffs are not rendered by default.

docs/customising-plots.html

+34-34
Large diffs are not rendered by default.

docs/data-frames.html

+25-25
Large diffs are not rendered by default.

docs/dplyr-and-the-tidy-data-concept.html

+21-21
@@ -6,9 +6,8 @@
66
<meta charset="UTF-8">
77
<meta http-equiv="X-UA-Compatible" content="IE=edge">
88
<title>APS 135: Introduction to Exploratory Data Analysis with R</title>
9-
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
109
<meta name="description" content="Course book for Introduction to Exploratory Data Analysis with R (APS 135) in the Department of Animal and Plant Sciences, University of Sheffield.">
11-
<meta name="generator" content="bookdown 0.3 and GitBook 2.6.7">
10+
<meta name="generator" content="bookdown 0.5 and GitBook 2.6.7">
1211

1312
<meta property="og:title" content="APS 135: Introduction to Exploratory Data Analysis with R" />
1413
<meta property="og:type" content="book" />
@@ -26,7 +25,7 @@
2625
<meta name="author" content="Dylan Z. Childs">
2726

2827

29-
<meta name="date" content="2017-03-21">
28+
<meta name="date" content="2018-02-07">
3029

3130
<meta name="viewport" content="width=device-width, initial-scale=1">
3231
<meta name="apple-mobile-web-app-capable" content="yes">
@@ -35,7 +34,6 @@
3534

3635
<link rel="prev" href="working-directories-and-data-files.html">
3736
<link rel="next" href="working-with-variables.html">
38-
3937
<script src="libs/jquery-2.2.3/jquery.min.js"></script>
4038
<link href="libs/gitbook-2.6.7/css/style.css" rel="stylesheet" />
4139
<link href="libs/gitbook-2.6.7/css/plugin-bookdown.css" rel="stylesheet" />
@@ -117,6 +115,7 @@
117115
<body>
118116

119117

118+
120119
<div class="book without-animation with-summary font-size-2 font-family-1" data-basepath=".">
121120

122121
<div class="book-summary">
@@ -277,8 +276,8 @@
277276
</ul></li>
278277
<li class="chapter" data-level="15.3" data-path="grouping-and-summarising-data.html"><a href="grouping-and-summarising-data.html#removing-grouping-information"><i class="fa fa-check"></i><b>15.3</b> Removing grouping information</a></li>
279278
</ul></li>
280-
<li class="chapter" data-level="16" data-path="building-piplines.html"><a href="building-piplines.html"><i class="fa fa-check"></i><b>16</b> Building piplines</a><ul>
281-
<li class="chapter" data-level="16.1" data-path="building-piplines.html"><a href="building-piplines.html#why-do-we-need-pipes"><i class="fa fa-check"></i><b>16.1</b> Why do we need ‘pipes’?</a></li>
279+
<li class="chapter" data-level="16" data-path="building-pipelines.html"><a href="building-pipelines.html"><i class="fa fa-check"></i><b>16</b> Building pipelines</a><ul>
280+
<li class="chapter" data-level="16.1" data-path="building-pipelines.html"><a href="building-pipelines.html#why-do-we-need-pipes"><i class="fa fa-check"></i><b>16.1</b> Why do we need ‘pipes’?</a></li>
282281
</ul></li>
283282
<li class="part"><span><b>III Exporing Data</b></span></li>
284283
<li class="chapter" data-level="17" data-path="exploratory-data-analysis.html"><a href="exploratory-data-analysis.html"><i class="fa fa-check"></i><b>17</b> Exploratory data analysis</a><ul>
@@ -373,7 +372,7 @@ <h1>
373372
<h1><span class="header-section-number">Chapter 11</span> <strong>dplyr</strong> and the tidy data concept</h1>
374373
<div id="introduction-4" class="section level2">
375374
<h2><span class="header-section-number">11.1</span> Introduction</h2>
376-
<p>[Data wrangling]</p>
375+
<p>Data wrangling refers to the process of manipulating raw data into the format that we want it in, for example for data visualisation or statistical analyses. There are a wide range of ways we may want to manipulate our data, for example by creating new variables, subsetting the data, or calculating summaries. Data wrangling is often a time consuming process. It is also not the most interesting part of any analysis - we are interested in answering biological questions, not in formatting data. However, it is a necessary step to go through to be able to conduct the analyses that we’re really interested in. Learning how to manipulate data efficiently can save us a lot of time and trouble and is therefore a really important skill to master.</p>
377376
</div>
378377
<div id="why-dplyr" class="section level2">
379378
<h2><span class="header-section-number">11.2</span> The value of <strong>dplyr</strong></h2>
@@ -443,18 +442,18 @@ <h3><span class="header-section-number">11.4.1</span> Tibble (<code>tbl</code>)
443442
iris_tbl &lt;-<span class="st"> </span><span class="kw">tbl_df</span>(iris)
444443
<span class="co"># print it</span>
445444
iris_tbl</code></pre></div>
446-
<pre><code>## # A tibble: 150 × 5
445+
<pre><code>## # A tibble: 150 x 5
447446
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
448447
## &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;fctr&gt;
449-
## 1 5.1 3.5 1.4 0.2 setosa
450-
## 2 4.9 3.0 1.4 0.2 setosa
451-
## 3 4.7 3.2 1.3 0.2 setosa
452-
## 4 4.6 3.1 1.5 0.2 setosa
453-
## 5 5.0 3.6 1.4 0.2 setosa
454-
## 6 5.4 3.9 1.7 0.4 setosa
455-
## 7 4.6 3.4 1.4 0.3 setosa
456-
## 8 5.0 3.4 1.5 0.2 setosa
457-
## 9 4.4 2.9 1.4 0.2 setosa
448+
## 1 5.1 3.5 1.4 0.2 setosa
449+
## 2 4.9 3.0 1.4 0.2 setosa
450+
## 3 4.7 3.2 1.3 0.2 setosa
451+
## 4 4.6 3.1 1.5 0.2 setosa
452+
## 5 5.0 3.6 1.4 0.2 setosa
453+
## 6 5.4 3.9 1.7 0.4 setosa
454+
## 7 4.6 3.4 1.4 0.3 setosa
455+
## 8 5.0 3.4 1.5 0.2 setosa
456+
## 9 4.4 2.9 1.4 0.2 setosa
458457
## 10 4.9 3.1 1.5 0.1 setosa
459458
## # ... with 140 more rows</code></pre>
460459
<p>Notice that only the first 10 rows are printed. This is much nicer than trying to wade through every row of a data frame.</p>
@@ -481,8 +480,9 @@ <h3><span class="header-section-number">11.4.2</span> The <code>glimpse</code> f
481480
</div>
482481
</div>
483482
<a href="working-directories-and-data-files.html" class="navigation navigation-prev " aria-label="Previous page"><i class="fa fa-angle-left"></i></a>
484-
<a href="working-with-variables.html" class="navigation navigation-next " aria-label="Next page""><i class="fa fa-angle-right"></i></a>
485-
483+
<a href="working-with-variables.html" class="navigation navigation-next " aria-label="Next page"><i class="fa fa-angle-right"></i></a>
484+
</div>
485+
</div>
486486
<script src="libs/gitbook-2.6.7/js/app.min.js"></script>
487487
<script src="libs/gitbook-2.6.7/js/lunr.js"></script>
488488
<script src="libs/gitbook-2.6.7/js/plugin-search.js"></script>
@@ -491,7 +491,7 @@ <h3><span class="header-section-number">11.4.2</span> The <code>glimpse</code> f
491491
<script src="libs/gitbook-2.6.7/js/plugin-bookdown.js"></script>
492492
<script src="libs/gitbook-2.6.7/js/jquery.highlight.js"></script>
493493
<script>
494-
require(["gitbook"], function(gitbook) {
494+
gitbook.require(["gitbook"], function(gitbook) {
495495
gitbook.start({
496496
"sharing": {
497497
"github": false,
@@ -526,7 +526,7 @@ <h3><span class="header-section-number">11.4.2</span> The <code>glimpse</code> f
526526
(function () {
527527
var script = document.createElement("script");
528528
script.type = "text/javascript";
529-
script.src = "https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
529+
script.src = "https://cdn.bootcss.com/mathjax/2.7.1/MathJax.js?config=TeX-MML-AM_CHTML";
530530
if (location.protocol !== "file:" && /^https?:/.test(script.src))
531531
script.src = script.src.replace(/^https?:/, '');
532532
document.getElementsByTagName("head")[0].appendChild(script);

docs/exploratory-data-analysis.html

+15-15
@@ -6,9 +6,8 @@
66
<meta charset="UTF-8">
77
<meta http-equiv="X-UA-Compatible" content="IE=edge">
88
<title>APS 135: Introduction to Exploratory Data Analysis with R</title>
9-
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
109
<meta name="description" content="Course book for Introduction to Exploratory Data Analysis with R (APS 135) in the Department of Animal and Plant Sciences, University of Sheffield.">
11-
<meta name="generator" content="bookdown 0.3 and GitBook 2.6.7">
10+
<meta name="generator" content="bookdown 0.5 and GitBook 2.6.7">
1211

1312
<meta property="og:title" content="APS 135: Introduction to Exploratory Data Analysis with R" />
1413
<meta property="og:type" content="book" />
@@ -26,16 +25,15 @@
2625
<meta name="author" content="Dylan Z. Childs">
2726

2827

29-
<meta name="date" content="2017-03-21">
28+
<meta name="date" content="2018-02-07">
3029

3130
<meta name="viewport" content="width=device-width, initial-scale=1">
3231
<meta name="apple-mobile-web-app-capable" content="yes">
3332
<meta name="apple-mobile-web-app-status-bar-style" content="black">
3433

3534

36-
<link rel="prev" href="building-piplines.html">
35+
<link rel="prev" href="building-pipelines.html">
3736
<link rel="next" href="introduction-to-ggplot2.html">
38-
3937
<script src="libs/jquery-2.2.3/jquery.min.js"></script>
4038
<link href="libs/gitbook-2.6.7/css/style.css" rel="stylesheet" />
4139
<link href="libs/gitbook-2.6.7/css/plugin-bookdown.css" rel="stylesheet" />
@@ -117,6 +115,7 @@
117115
<body>
118116

119117

118+
120119
<div class="book without-animation with-summary font-size-2 font-family-1" data-basepath=".">
121120

122121
<div class="book-summary">
@@ -277,8 +276,8 @@
277276
</ul></li>
278277
<li class="chapter" data-level="15.3" data-path="grouping-and-summarising-data.html"><a href="grouping-and-summarising-data.html#removing-grouping-information"><i class="fa fa-check"></i><b>15.3</b> Removing grouping information</a></li>
279278
</ul></li>
280-
<li class="chapter" data-level="16" data-path="building-piplines.html"><a href="building-piplines.html"><i class="fa fa-check"></i><b>16</b> Building piplines</a><ul>
281-
<li class="chapter" data-level="16.1" data-path="building-piplines.html"><a href="building-piplines.html#why-do-we-need-pipes"><i class="fa fa-check"></i><b>16.1</b> Why do we need ‘pipes’?</a></li>
279+
<li class="chapter" data-level="16" data-path="building-pipelines.html"><a href="building-pipelines.html"><i class="fa fa-check"></i><b>16</b> Building pipelines</a><ul>
280+
<li class="chapter" data-level="16.1" data-path="building-pipelines.html"><a href="building-pipelines.html#why-do-we-need-pipes"><i class="fa fa-check"></i><b>16.1</b> Why do we need ‘pipes’?</a></li>
282281
</ul></li>
283282
<li class="part"><span><b>III Exporing Data</b></span></li>
284283
<li class="chapter" data-level="17" data-path="exploratory-data-analysis.html"><a href="exploratory-data-analysis.html"><i class="fa fa-check"></i><b>17</b> Exploratory data analysis</a><ul>
@@ -381,7 +380,7 @@ <h2><span class="header-section-number">17.1</span> Introduction</h2>
381380
<li><p>to provide a foundation for further data collection.</p></li>
382381
</ul>
383382
<p>EDA involves a mix of both numerical and visual methods of analysis. Statistical methods are sometimes used to supplement EDA, but its main purpose is to facilitate understanding before diving into formal statistical modelling.</p>
384-
<p>Even if we think we already know what kind of analysis we need to pursue, it’s always a good idea to <strong>explore a data set before diving into the analysis</strong>. At the very least, this will help us to determine whether or not our plans are sensible. Very often it uncovers new patterns and insights. In this chapter we’re going to examine some basic concepts that underpin EDA: 1) classifying different types data, and 2) distinguishing between populations and samples. This will set us up to learn how to explore our data in later chapters.</p>
383+
<p>Even if we think we already know what kind of analysis we need to pursue, it’s always a good idea to <strong>explore a data set before diving into the analysis</strong>. At the very least, this will help us to determine whether or not our plans are sensible. Very often it uncovers new patterns and insights. In this chapter we’re going to examine some basic concepts that underpin EDA: 1) classifying different types of data, and 2) distinguishing between populations and samples. This will set us up to learn how to explore our data in later chapters.</p>
385384
</div>
386385
<div id="variables" class="section level2">
387386
<h2><span class="header-section-number">17.2</span> Statistical variables and data</h2>
@@ -421,10 +420,10 @@ <h3><span class="header-section-number">17.2.2</span> Ratio vs. interval scales
421420
</div>
422421
<div id="populations-samples" class="section level2">
423422
<h2><span class="header-section-number">17.3</span> Populations, samples and distributions</h2>
424-
<p>When we collect data of any kind, we are working a sample of objects (e.g. trees, insects, fields) from a wider population. We usually want to know something about the wider population, but since it’s impossible to study every member of the population, we study the properties of one or more samples instead.</p>
423+
<p>When we collect data of any kind, we are working with a sample of objects (e.g. trees, insects, fields) from a wider population. We usually want to know something about the wider population, but since it’s impossible to study every member of the population, we study the properties of one or more samples instead.</p>
425424
<p>The problem with samples is that they are ‘noisy’. If we were repeat the same data collection protocol more than once we should expect to end up with a different sample each time, even if the wider population never changes. This results purely from chance variation in the sampling of different units. Picking apart the relationship between samples and populations is the basis of much of statistics. This topic is best dealt with in a dedicated statistics book, so we won’t develop these ideas in much detail here.</p>
426425
<p>The reason we mention the distinction between a population and a sample is because EDA is primarily concerned with properties of samples—it aims to characterise the sample in hand without trying to say too much about the wider population from which it is derived.</p>
427-
<p>When we talk about “exploring a variable” what we are really doing is exploring is the <strong>sample distribution</strong> of that variable. What is this? The sample distribution is a statement about the frequency with which different values occur in a particular sample. Imagine we took a a sample of undergraduates and measured their height. The majority of students would be round about 1.7m tall, even though there would obviously be some variation among students. Men would tend to be slightly taller than women, and very small or very tall people would be rare. We know from experience that no one in in this sample would be over 3 meters tall. These are all statements about a (hypothetical) sample distribution of undergraduate heights.</p>
426+
<p>When we talk about “exploring a variable” what we are really doing is exploring is the <strong>sample distribution</strong> of that variable. What is this? The sample distribution is a statement about the frequency with which different values occur in a particular sample. Imagine we took a sample of undergraduates and measured their height. The majority of students would be round about 1.7m tall, even though there would obviously be some variation among students. Men would tend to be slightly taller than women, and very small or very tall people would be rare. We know from experience that no one in this sample would be over 3 meters tall. These are all statements about a (hypothetical) sample distribution of undergraduate heights.</p>
428427
<p>Our goal when exploring the sample distribution of a variable is to answer questions such as, What are the most common values of the variable; and How much do observations differ from one another? Rather than simply describing these properties in verbal terms, as we did above, we want to describe in a more informative way. There are two ways to go about this:</p>
429428
<ol style="list-style-type: decimal">
430429
<li><p><strong>Calculate descriptive statistics</strong>. Descriptive statistics are used to quantify the basic features of a sample distribution. They provide simple summaries about the sample that can be used to make comparisons and draw preliminary conclusions. For example, we often use ‘the mean’ to summarise the ‘most likely’ values of a variable in a sample.</p></li>
@@ -442,9 +441,10 @@ <h2><span class="header-section-number">17.4</span> Relationships</h2>
442441
</div>
443442
</div>
444443
</div>
445-
<a href="building-piplines.html" class="navigation navigation-prev " aria-label="Previous page"><i class="fa fa-angle-left"></i></a>
446-
<a href="introduction-to-ggplot2.html" class="navigation navigation-next " aria-label="Next page""><i class="fa fa-angle-right"></i></a>
447-
444+
<a href="building-pipelines.html" class="navigation navigation-prev " aria-label="Previous page"><i class="fa fa-angle-left"></i></a>
445+
<a href="introduction-to-ggplot2.html" class="navigation navigation-next " aria-label="Next page"><i class="fa fa-angle-right"></i></a>
446+
</div>
447+
</div>
448448
<script src="libs/gitbook-2.6.7/js/app.min.js"></script>
449449
<script src="libs/gitbook-2.6.7/js/lunr.js"></script>
450450
<script src="libs/gitbook-2.6.7/js/plugin-search.js"></script>
@@ -453,7 +453,7 @@ <h2><span class="header-section-number">17.4</span> Relationships</h2>
453453
<script src="libs/gitbook-2.6.7/js/plugin-bookdown.js"></script>
454454
<script src="libs/gitbook-2.6.7/js/jquery.highlight.js"></script>
455455
<script>
456-
require(["gitbook"], function(gitbook) {
456+
gitbook.require(["gitbook"], function(gitbook) {
457457
gitbook.start({
458458
"sharing": {
459459
"github": false,
@@ -488,7 +488,7 @@ <h2><span class="header-section-number">17.4</span> Relationships</h2>
488488
(function () {
489489
var script = document.createElement("script");
490490
script.type = "text/javascript";
491-
script.src = "https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
491+
script.src = "https://cdn.bootcss.com/mathjax/2.7.1/MathJax.js?config=TeX-MML-AM_CHTML";
492492
if (location.protocol !== "file:" && /^https?:/.test(script.src))
493493
script.src = script.src.replace(/^https?:/, '');
494494
document.getElementsByTagName("head")[0].appendChild(script);
