Commit

CH01 Spelling Corrections
spelling corrections for CH01.
sszullo committed Oct 1, 2018
1 parent 99c5dd4 commit 10ad1b9
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions 01.0_Intro2Stats.Rmd
@@ -75,7 +75,7 @@ The latter statement shows the use of units for quantitative variables. The popu

<br>
```{block, type='stattip'}
-- Statistics and statistical tests cannot and do not attempt to categorically prove or disprove statements and hypotheses about the real world. They simply provide methods to determine how consistent, in a proabilitstic way, statements and hypotheses are with data.
+- Statistics and statistical tests cannot and do not attempt to categorically prove or disprove statements and hypotheses about the real world. They simply provide methods to determine how consistent, in a probabilistic way, statements and hypotheses are with data.
```
<br>

@@ -187,7 +187,7 @@ Chapter 14: Chi-Square
```
<br>

-This first chapter introduces statistics and the rest of the book. It gives a general introduction to the need for formal methods that yield repeatable quantitative results to deal with uncertainty in an optimal manner. Chapter 2 is an introduction to the tool that we use for doing statistics: R and RStudio. R is the program that actually does all the computations; essentially, it is a programming language for statistics. RStudio is an integrated development environment (IDE) that handles the creation and debugging of code, as well as the input and output. Chapter 2 is necessary because this book is written using RStudio and code for R. There are examples, exercises and explanations that use R in the rest of the chapters. Chapter 3 presents the required math skills and symbols that are used in the rest of the book. Chapter 4 discusses the differences between populations and samples, and how to visualize and summarize the data collected. Chapter 5 introduces important concepts underlying probability and randomness and how it can be applied practically. Chapter 6 discusses what random variables are, how random variables are classified and sampled, and the most important distributions that random variables follow. Chapter 7 discusses the concept of hypothesis testing in theory and practice and how confidence intervals can be used to determine levels of significance. Chapter 8 presents how to test for significant differences between two population means. Chapter 9 presents how to compare if more than two treatments are significantly different from eachother using Analysis of Variance (ANOVA). Chapter 10 presents different ways to identify error throughout an experiment and important componenets related to experimental design. Chapter 11 presents the concept of "blocking" for error using the Randomized Complete Block Design and how to run an ANOVA with blocks. Chapter 12 discusses how to identify treatments and factor levels of a factorial experiment and how to determine the least significant difference (LSD) for pairwise comparisons. Chapter 13 discusses the partitioning of variance to form a simple linear regression. Chapter 14 presents how to perform a goodness of fit test using Chi-Square for discrete data.
+This first chapter introduces statistics and the rest of the book. It gives a general introduction to the need for formal methods that yield repeatable quantitative results to deal with uncertainty in an optimal manner. Chapter 2 is an introduction to the tool that we use for doing statistics: R and RStudio. R is the program that actually does all the computations; essentially, it is a programming language for statistics. RStudio is an integrated development environment (IDE) that handles the creation and debugging of code, as well as the input and output. Chapter 2 is necessary because this book is written using RStudio and code for R. There are examples, exercises and explanations that use R in the rest of the chapters. Chapter 3 presents the required math skills and symbols that are used in the rest of the book. Chapter 4 discusses the differences between populations and samples, and how to visualize and summarize the data collected. Chapter 5 introduces important concepts underlying probability and randomness and how it can be applied practically. Chapter 6 discusses what random variables are, how random variables are classified and sampled, and the most important distributions that random variables follow. Chapter 7 discusses the concept of hypothesis testing in theory and practice and how confidence intervals can be used to determine levels of significance. Chapter 8 presents how to test for significant differences between two population means. Chapter 9 presents how to compare if more than two treatments are significantly different from each other using Analysis of Variance (ANOVA). Chapter 10 presents different ways to identify error throughout an experiment and important components related to experimental design. Chapter 11 presents the concept of "blocking" for error using the Randomized Complete Block Design and how to run an ANOVA with blocks. Chapter 12 discusses how to identify treatments and factor levels of a factorial experiment and how to determine the least significant difference (LSD) for pairwise comparisons. Chapter 13 discusses the partitioning of variance to form a simple linear regression. Chapter 14 presents how to perform a goodness of fit test using Chi-Square for discrete data.


In chapter 4 we start working with data. Data are summarized and descriptions of central tendency and variability are calculated. These exploratory techniques are implemented in R, which we use to create frequency tables, histograms and box-and-whisker plots.
@@ -210,7 +210,7 @@ Theme: There are compelling arguments indicating that statistics has been misund

A main problem in science is the misuse and misunderstanding of the so-called p-value. The p-value is the probability of observing the specific realized data if it came from a data-generating process as described by the null hypothesis and assumptions. In *frequentist* as opposed to *Bayesian* statistics, hypotheses are tested by determining if the p-value is too low. The logic can be a little convoluted, and that is why we will emphasize its explanation throughout the book. The problem is that many people think that the p-value has other meanings, like being the probability that the hypothesis is true (incorrect). In addition to that, p-value has been used in a way that differs from what it was intended for. These days p-values are largely used to determine what is published. Given that any hypothesis will be eventually rejected if it is tested a sufficient number of times, obviously the publication of only the times when it is rejected is not a good representation of the results of experimentation. ALL the times when it was not rejected SHOULD ALSO be considered.
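
The point about repeated testing can be illustrated with a small simulation. The following is a minimal sketch and not part of the book's code: the seed, the number of experiments, the sample size, and the use of a one-sample t-test are all assumptions chosen for illustration. Every simulated experiment tests a null hypothesis that is true by construction, so each rejection is a false positive.

```r
# Sketch: repeated tests of a TRUE null hypothesis (assumed setup, not from the book).
set.seed(1)            # arbitrary seed so the run is repeatable
n_experiments <- 1000  # hypothetical number of independent experiments
n_obs <- 20            # hypothetical sample size per experiment
alpha <- 0.05          # conventional significance threshold

# Each experiment draws data from N(0, 1) and tests H0: mean = 0, which is true.
p_values <- replicate(
  n_experiments,
  t.test(rnorm(n_obs, mean = 0, sd = 1), mu = 0)$p.value
)

# Fraction of experiments that reject the true null; expected to be near alpha.
rejection_rate <- mean(p_values < alpha)
print(rejection_rate)

# Chance of at least one rejection among k independent tests of a true null.
k <- c(1, 5, 20, 100)
p_at_least_one <- 1 - (1 - alpha)^k
print(round(p_at_least_one, 3))
```

Under these assumptions the rejection rate should land close to `alpha`, and the probability of at least one rejection approaches 1 as `k` grows, which is why reporting only the rejections gives a distorted picture.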

-<a href="https://https://youtu.be/cOgscsjUvvE" target="_blank">What is the p-value?</a>
+<a href="https://https://youtube/cOgscsjUvvE" target="_blank">What is the p-value?</a>

<a href="https://www.nature.com/news/scientific-method-statistical-errors-1.14700" target="_blank">Not understanding and misusing p-value</a>

@@ -283,7 +283,7 @@ Students work in pairs. Each pair of students is given a set of three equal toke

The simulation is most efficient if all realize that the case where there is no switch should have an expected success rate of 1/3, if indeed the procedure achieves equal probabilities. In that case we may just assume the expected results. However, it may be good to experience the variability about the expected rate.

-Solution: The car can be in any of the thre letters. For each position of the car, the contestant can pick any of the three letters. The host can only show a letter that is not the first letter and is not the second one. If the contestant chose the car, then the host can pick out of two letters. Otherwise, the host only has one choice. The final letter can be either the second one (host initial choice) or the one on the unopened door. We can calculate how many options there are when the car is in door A and then simply multiply by three, to account for the cases when the car is behind A, B, or C.
+Solution: The car can be in any of the three letters. For each position of the car, the contestant can pick any of the three letters. The host can only show a letter that is not the first letter and is not the second one. If the contestant chose the car, then the host can pick out of two letters. Otherwise, the host only has one choice. The final letter can be either the second one (host initial choice) or the one on the unopened door. We can calculate how many options there are when the car is in door A and then simply multiply by three, to account for the cases when the car is behind A, B, or C.
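
The counting argument in the solution can be checked by enumerating every path of the game. This is a minimal sketch, not the book's own code: the function name `monty_hall_exact` is hypothetical, and it assumes the car position and the first pick are each uniform over the three doors, and that the host chooses uniformly at random when two doors are available.

```r
# Exact enumeration of the game, weighting each path by its probability.
# Assumption: the host picks uniformly at random when two doors are available.
doors <- c("A", "B", "C")

monty_hall_exact <- function() {
  p_stay <- 0
  p_switch <- 0
  for (car in doors) {
    for (pick in doors) {
      # The host may open any door that is neither the contestant's pick nor the car.
      host_options <- setdiff(doors, c(pick, car))
      for (host in host_options) {
        # Path probability: car position (1/3) * first pick (1/3) * host choice.
        p_path <- (1 / 3) * (1 / 3) * (1 / length(host_options))
        # Switching means taking the one door that is neither picked nor opened.
        switched <- setdiff(doors, c(pick, host))
        if (pick == car) p_stay <- p_stay + p_path
        if (switched == car) p_switch <- p_switch + p_path
      }
    }
  }
  c(stay = p_stay, switch = p_switch)
}

probs <- monty_hall_exact()
print(probs)
```

Under these assumptions, staying wins with probability 1/3 and switching wins with probability 2/3, in line with the expected 1/3 success rate for the no-switch case.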

The easiest way to think of this problem is to use the objects behind the doors to label the doors. There are three cases, where Door1 is the door the contestant chooses first:
