This repository includes data and R scripts for the tutorial:
From Data to Models: classification, prediction, and synthesis
given by Carlo A. Furia at USI in October 2019.
All data comes from public datasets; see the sources files for links to the
original sources.
- Install the R platform from https://stat.ethz.ch/CRAN/ by following the instructions for your operating system.
- (Optional, but recommended) Install the RStudio IDE from https://www.rstudio.com/products/rstudio/download/ by choosing the free version of RStudio Desktop and following the instructions for your operating system.
- Get a snapshot of this repository:
  - Download the archive file https://github.com/bugcounting/data2models/archive/master.zip
  - Unpack it into a directory in your system (we'll call it root in these instructions)
- Open RStudio and issue the following commands from the pull-down menus:
  - File -> Open File -> pick file root/data2models-master/spam/pull_data.R
  - Session -> Set Working Directory -> To Source File Location
  - Code -> Source

  This will download the remaining libraries and data.
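
If you prefer the R console to the menus, the download, unpack, and source steps above can be sketched roughly as follows. The helper names get_snapshot and source_in_place are hypothetical (they are not part of this repository), and the sketch assumes that the archive unpacks into a data2models-master subdirectory, as the path in the instructions suggests.

```r
# URL of the repository snapshot, as given in the instructions above.
repo_url <- "https://github.com/bugcounting/data2models/archive/master.zip"

# Hypothetical helper: download the snapshot and unpack it under `root`.
get_snapshot <- function(root, url = repo_url) {
  dir.create(root, showWarnings = FALSE, recursive = TRUE)
  archive <- file.path(root, "master.zip")
  # mode = "wb" keeps the binary archive intact on Windows.
  status <- download.file(url, destfile = archive, mode = "wb")
  if (status != 0) stop("download failed with status ", status)
  unzip(archive, exdir = root)
  # Assumption: the archive unpacks into root/data2models-master.
  file.path(root, "data2models-master")
}

# Hypothetical helper: source a script with its own directory as the
# working directory, then restore the previous working directory.
source_in_place <- function(script) {
  script <- normalizePath(script, mustWork = TRUE)
  old_wd <- setwd(dirname(script))
  on.exit(setwd(old_wd), add = TRUE)
  source(basename(script))
  invisible(script)
}

# Usage (needs network access; "root" is a placeholder directory name):
# repo_dir <- get_snapshot("root")
# source_in_place(file.path(repo_dir, "spam", "pull_data.R"))
```

Unlike the RStudio menu command, source_in_place restores the previous working directory when it finishes; remove the on.exit line if you want to stay in the script's directory afterwards.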
- The book Machine Learning for Hackers by Drew Conway and John Myles White (O'Reilly, 2012) is a practical presentation of several machine learning techniques (including naive Bayes classifiers and linear regression) with complete code examples in R. The Bayesian spam classifier example is based on Chapter 3 of the book.
- Dirk Schumacher, the author of the rpicosat library used in the Sudoku example, has a blog post where he describes how to build the same propositional encoding of Sudoku constraints that we use in sat/sat.R, but using a different approach in R.
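
To give an idea of what a propositional encoding of Sudoku looks like, here is a minimal sketch in base R. It is one common encoding, not necessarily the one in sat/sat.R or in the blog post, and the function names var_id, at_most_one, and sudoku_clauses are hypothetical. Each clause is an integer vector in the usual DIMACS style, where a negative number stands for a negated variable.

```r
# Map (row i, column j, digit d), each in 1..9, to a variable in 1..729.
var_id <- function(i, j, d) (i - 1L) * 81L + (j - 1L) * 9L + d

# Binary clauses stating that no two of the given variables are both true.
at_most_one <- function(vars) {
  pairs <- combn(vars, 2)
  lapply(seq_len(ncol(pairs)), function(k) -pairs[, k])
}

# Clauses for the rules of Sudoku on an empty 9x9 grid.
sudoku_clauses <- function() {
  clauses <- list()
  idx <- 1:9
  # Every cell holds at least one digit and at most one digit.
  for (i in idx) for (j in idx) {
    cell <- var_id(i, j, idx)
    clauses <- c(clauses, list(cell), at_most_one(cell))
  }
  for (d in idx) {
    for (k in idx) {
      clauses <- c(clauses,
                   at_most_one(var_id(k, idx, d)),  # digit d in row k
                   at_most_one(var_id(idx, k, d)))  # digit d in column k
    }
    # Digit d appears at most once in each 3x3 box.
    for (bi in c(0L, 3L, 6L)) for (bj in c(0L, 3L, 6L)) {
      box <- expand.grid(i = bi + 1:3, j = bj + 1:3)
      clauses <- c(clauses, at_most_one(var_id(box$i, box$j, d)))
    }
  }
  clauses
}

clauses <- sudoku_clauses()
length(clauses)
```

A given clue, such as "the cell in row 1, column 2 holds 7", becomes the one-literal clause var_id(1L, 2L, 7L) appended to the list. A list of integer vectors like this is, as far as we know, the input format that rpicosat expects, but check the package documentation before relying on that.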