Second Edition of CRC book

@ismayc ismayc released this 27 Mar 21:40
· 1 commit to v2 since this release
c6626cc
  • Created the https://moderndive.com/v2/ website to host content for the Second Edition (v2 and later versions)

  • Removed the previous data sets promotions (Chapter 9) and evals (Chapters 5, 6, and 10), replacing them with un_member_states_2024 and spotify_by_genre

  • Replaced pennies with almonds_bowl in Chapter 7

  • Moved some sections around in Chapters 7 and 10 to improve readability

  • Moved model selection to Chapter 10 instead of Chapter 6

  • Added coffee_quality and old_faithful_2024 examples to Chapter 10

  • Improved theory-based discussions in Chapters 8, 10, and 11

  • Added use of the fit() function for simulation-based inference with multiple linear regression

  • Added the infer package with fit() to Chapter 11 to discuss inference for regression

  • Added content in the Appendices

  • Switched code chunks from %>% to the base pipe |>, in line with other updates. A few chunks that call inline functions like "*"() still use %>%, since that reads more clearly than the base-pipe equivalent.

  • Explicitly addressed the group_by() warning message in the text and fixed index.Rmd to remove options(dplyr.summarise.inform = FALSE)

  • Added relocate() to the end of Chapter 3

  • Added envoy_flights and early_january_2023_weather to {moderndive} package

  • Explained that {nycflights23} is an updated version of {nycflights13} using the {anyflights} package

  • Updated code and discussion throughout the book to use {nycflights23} instead of {nycflights13}

  • Chapter 2 Data Visualization: Removed the soft introduction to the %>% operator (from Ch 3 Data Wrangling), since it only confused readers. Instead, we now use the prepared alaska_flights and early_january_weather data frames from moderndive version 0.5.3

  • Chapter 6 Multiple Regression: Per @kmkinnaird's suggestion, we split "6.3.1 Model selection" into:

    • "6.3.1 Model selection using visualizations"
    • "6.3.2 Model selection using R-squared" (newly added)
  • Chapter 7 Sampling: Per @kmkinnaird's suggestion, refactored as follows:

    • "7.3.1 Terminology & notation": clustered definitions according to theme and connected back to sampling exercises
    • "7.3.2 Statistical definitions":
    • Moved "7.5.2 Central Limit Theorem" to its own section to make it more prominent and not an afterthought
    • Created a new "7.6.2 Theory-based standard errors", which splits "8.7.2 Theory-based confidence intervals" into two parts and moves the earlier part to Chapter 7 Sampling. That way, all 4 statistical inference chapters (Ch 7-11) each have their own "theory-based X" subsection at the end, bridging the gap between simulation-based and traditional methods.