Deniz edited this page Nov 21, 2013 · 9 revisions

### a) Cross Validation and kNN

The wine dataset is in the data folder. To learn about the dataset, see the UCI repository page:

http://archive.ics.uci.edu/ml/datasets/Wine

i) Fit a kNN classifier. Perform cross-validation to pick the best value of k.

ii) Perform cross-validation to pick the best model between logistic regression and kNN. Does the answer change when you use a different CV technique?
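As a starting point for part (i), here is a minimal sketch of k-fold cross-validation around a plain numpy kNN. Loading the actual wine data is left out; the two-cluster toy data below is a hypothetical stand-in so the snippet runs on its own, and the helper names (`knn_predict`, `cv_accuracy`) are ours, not from any library.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k):
    # Euclidean-distance kNN with a majority vote over the k nearest labels
    preds = []
    for x in X_test:
        d = np.sqrt(((X_train - x) ** 2).sum(axis=1))
        nearest = y_train[np.argsort(d)[:k]]
        preds.append(np.bincount(nearest).argmax())
    return np.array(preds)

def cv_accuracy(X, y, k, n_folds=5, seed=0):
    # shuffle the indices once, split into folds, average held-out accuracy
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, n_folds)
    accs = []
    for i in range(n_folds):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yhat = knn_predict(X[train], y[train], X[test], k)
        accs.append((yhat == y[test]).mean())
    return float(np.mean(accs))

# toy two-class data standing in for the wine features (hypothetical)
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

best_k = max([1, 3, 5, 7, 9], key=lambda k: cv_accuracy(X, y, k))
print("best k:", best_k)
```

The same loop answers part (ii) if you swap the classifier: score each candidate model with the same folds and compare the averaged held-out accuracies. Changing the CV scheme (e.g. the number of folds, or leave-one-out) can change which model wins, which is the point of the question.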

### b) Bias-Variance Framework

This example illustrates the bias-variance framework. The ultimate goal is to achieve a respectable Eout; hence, model selection should be a function of the availability of data.

Consider a case where the target function is sin( π x ) and the input distribution is uniform on [ -1, 1 ]. Assume that the training set has two examples and that the learning algorithm minimizes the mean squared error.

Assume that we have two learning models, each consisting of all hypotheses of the form:

h(x) = ax + b

h(x) = b

Run 10,000 random trials with N = 2 training examples, fitting both models on every run.

i) What is the average hypothesis for each model?

ii) Report the bias and variance for both models. Which model generalizes better out-of-sample?
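The experiment can be sketched as follows, assuming numpy. On each trial we draw two points from the target, fit the line exactly through them with `np.polyfit`, and take the mean of the two targets as the MSE-optimal constant; bias and variance are then measured against the average hypothesis on a dense grid over [-1, 1]. The variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
trials = 10_000

# For each trial, draw N=2 points from sin(pi*x) and fit both models
a_s, b_s, c_s = [], [], []
for _ in range(trials):
    x = rng.uniform(-1, 1, 2)
    y = np.sin(np.pi * x)
    a, b = np.polyfit(x, y, 1)   # h(x) = ax + b, exact through 2 points
    a_s.append(a)
    b_s.append(b)
    c_s.append(y.mean())         # h(x) = b, the MSE-optimal constant

# evaluate every fitted hypothesis on a dense grid over [-1, 1]
xs = np.linspace(-1, 1, 1000)
f = np.sin(np.pi * xs)
H = {
    "h(x) = ax + b": np.outer(a_s, xs) + np.array(b_s)[:, None],
    "h(x) = b": np.repeat(np.array(c_s)[:, None], len(xs), axis=1),
}

results = {}
for name, preds in H.items():
    gbar = preds.mean(axis=0)            # average hypothesis g-bar(x)
    bias = np.mean((gbar - f) ** 2)      # squared error of g-bar vs target
    var = np.mean((preds - gbar) ** 2)   # spread of hypotheses around g-bar
    results[name] = (bias, var)
    print(f"{name}: bias={bias:.2f}, var={var:.2f}, Eout~{bias + var:.2f}")
```

With N = 2 the constant model should show higher bias but much lower variance than the line, so its expected out-of-sample error bias + var comes out smaller: the simpler model generalizes better at this sample size.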

### Hint(s)

i) The graphs on the left below illustrate sample trials for each model. The charts on the right depict the average hypothesis and the variance of each model.

(Figures: Bias and Variance - simple model; Bias and Variance - complex model)

ii) MSE can be defined as:

```python
import numpy as np

def mse(y, h):
    return np.mean(np.square(y - h))
```

where the y's are the target function values and the h's are your predictions, based on your hypothesis, over [-1, 1].
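As a quick sanity check of this hint, the MSE of the zero function against the target over a dense grid on [-1, 1] recovers the bias of the constant model, since its average hypothesis is close to h(x) = 0 (the grid size here is an arbitrary choice):

```python
import numpy as np

def mse(y, h):
    return np.mean(np.square(y - h))

xs = np.linspace(-1, 1, 1000)
target = np.sin(np.pi * xs)
# the average constant hypothesis is near h(x) = 0, so this MSE
# approximates the bias of the h(x) = b model (about 0.5)
print(mse(target, np.zeros_like(xs)))
```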
