05. Generalization

Antonio Erdeljac edited this page Feb 26, 2019 · 1 revision

Generalization

Topic: Generalization

Course: GMLC

Date: 16 February 2019

Professor: Not specified

Resources

Key Points

Generalization - a model’s ability to adapt to new, previously unseen data drawn from the same distribution as the training data
Overfitting - occurs when a model is overly complex and fits the training data so closely that it fails to adapt to new data
Training set - the dataset used to train the model
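Test set
- a dataset used to test the model after initial training (and validation, if a validation set is used)
- Must be large enough to give statistically meaningful results
- Must not be reused over and over (repeatedly testing on the same set risks fitting the model to it)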
Prediction - the model’s output when provided with an example
Stationary - the distribution doesn’t change within the data set
Independent and identically distributed (i.i.d.) - examples are drawn at random from the same distribution and don’t influence each other
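
To make overfitting concrete, here is a minimal sketch that is not part of the course material. It assumes a made-up data distribution (y = 2x + 1 plus Gaussian noise) and compares two hypothetical models: a simple least-squares line and a "memorizer" that stores every training example (a 1-nearest-neighbor lookup). All names (make_examples, fit_line, fit_memorizer, mse) are illustrative.

```python
import random

def make_examples(n, rng):
    """Draw n i.i.d. examples from one fixed distribution: y = 2x + 1 + noise."""
    examples = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        y = 2.0 * x + 1.0 + rng.gauss(0.0, 1.0)
        examples.append((x, y))
    return examples

def fit_line(train):
    """Simple model: least-squares line, returned as a predict(x) function."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_memorizer(train):
    """Overly complex model: memorizes every training example (1-nearest neighbor)."""
    def predict(x):
        nearest = min(train, key=lambda ex: abs(ex[0] - x))
        return nearest[1]
    return predict

def mse(predict, examples):
    """Mean squared error of a model's predictions on a set of examples."""
    return sum((predict(x) - y) ** 2 for x, y in examples) / len(examples)

rng = random.Random(0)               # fixed seed so the run is repeatable
train_set = make_examples(30, rng)   # used to fit both models
test_set = make_examples(300, rng)   # fresh examples from the same distribution

line = fit_line(train_set)
memorizer = fit_memorizer(train_set)

train_line, test_line = mse(line, train_set), mse(line, test_set)
train_memo, test_memo = mse(memorizer, train_set), mse(memorizer, test_set)

print(f"line:      train MSE = {train_line:.3f}, test MSE = {test_line:.3f}")
print(f"memorizer: train MSE = {train_memo:.3f}, test MSE = {test_memo:.3f}")
```

The memorizer reproduces its training set perfectly, so its training error is zero, yet it also memorizes the noise and does worse on the test set. That gap between training and test error is the symptom of overfitting described above.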

Check your understanding

Explain when overfitting occurs
How do we train models to adapt properly to new, unseen data? (Which 2 sets do we use?)
What are the qualities of a good test set?
What do we mean by saying that examples are i.i.d.?

Summary of Notes

A good model is not so complex that it overfits (adapts poorly to new data)
A good model is created using 2 subsets - a training set and a test set
Examples provided for test sets must be independent and identically distributed (i.i.d.), drawn from the same distribution as the training set
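
As a rough illustration of the two-subset idea, the sketch below shuffles a dataset and splits it into a training part and a test part. The function name train_test_split, the 20% test fraction, and the integer "examples" are assumptions made for this example, not something specified in the course.

```python
import random

def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle the examples, then hold out a fraction of them as the test set."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(examples)              # copy, so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # shuffling avoids an ordering bias in the split
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

examples = list(range(100))  # stand-in for real (features, label) examples
train_part, test_part = train_test_split(examples, test_fraction=0.2, seed=42)

print(len(train_part), len(test_part))
```

Because the two parts are disjoint, the model never sees the test examples during training, which is what lets the test set stand in for new data.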