From Data to Models

This repository includes data and R scripts for the tutorial:

From Data to Models: classification, prediction, and synthesis

given by Carlo A. Furia at USI in October 2019.

All data comes from public datasets; see the files named sources for links to the originals.

Requirements to run the tutorial

1. Install the R platform from https://stat.ethz.ch/CRAN/ by following the instructions for your operating system.
2. (Optional, but recommended) Install the RStudio IDE from https://www.rstudio.com/products/rstudio/download/ by choosing the free version of RStudio Desktop and following the instructions for your operating system.
3. Get a snapshot of this repository:
   - Download the archive file https://github.com/bugcounting/data2models/archive/master.zip
   - Unpack it into a directory on your system (we'll call it root in these instructions)
4. Open RStudio and issue the following commands from the pull-down menus:
   - File -> Open File -> pick file root/data2models-master/spam/pull_data.R
   - Session -> Set Working Directory -> To Source File Location
   - Code -> Source
   This will download the remaining libraries and data.
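
If you prefer the R console to the menus, the download, unpack, and source steps above can be approximated with base R functions. This is only a sketch and is not part of the repository: the helper names pull_script and setup_tutorial are hypothetical, and root stands for whatever directory you choose. It also assumes that pull_data.R only needs the working directory to point at its own location while it runs.

```r
# Console sketch of the menu steps above (hypothetical helpers, not part of the repository).
archive_url <- "https://github.com/bugcounting/data2models/archive/master.zip"

# Path of the setup script inside the unpacked snapshot.
pull_script <- function(root) {
  file.path(root, "data2models-master", "spam", "pull_data.R")
}

# Download the snapshot into `root`, unpack it, and source the setup script.
setup_tutorial <- function(root = "root") {
  dir.create(root, showWarnings = FALSE, recursive = TRUE)
  zip_file <- file.path(root, "master.zip")
  # mode = "wb" keeps the binary archive intact on Windows.
  download.file(archive_url, destfile = zip_file, mode = "wb")
  unzip(zip_file, exdir = root)
  # chdir = TRUE temporarily switches the working directory to the script's
  # folder while it runs, similar to "To Source File Location" in RStudio.
  source(pull_script(root), chdir = TRUE)
}
```

Calling setup_tutorial("root") would then perform steps 3 and 4 in one go. Unlike the menu route, source(..., chdir = TRUE) restores the previous working directory afterwards, so call setwd() yourself if you want to keep working inside the spam directory.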

The book Machine Learning for Hackers by Drew Conway and John Myles White (O'Reilly, 2012) is a practical presentation of several machine learning techniques (including naive Bayes classifiers and linear regression) with complete code examples in R. The Bayesian spam classifier example is based on Chapter 3 of the book.
Dirk Schumacher, the author of the rpicosat library used in the Sudoku example, has a blog post describing how to build the same propositional encoding of Sudoku constraints that we use in sat/sat.R, but with a different approach in R.
