05. Generalization

Antonio Erdeljac edited this page Feb 26, 2019 · 1 revision

Generalization


Topic: Generalization

Course: GMLC

Date: 16 February 2019

Professor: Not specified


Resources


Key Points


  • Generalization - a model’s ability to adapt to new, previously unseen data drawn from the same distribution as the data used to create the model

  • Overfitting - occurs when a model is so complex, and fits the training data so closely, that it fails to adapt to new data

  • Training set - the subset of the data used to train the model

  • Test set

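    • A subset of the data used to test the model after initial training and any vetting against a validation set

    • Must be large enough to give statistically meaningful results

    • Must not be reused over and over; repeatedly evaluating against the same test set risks unintentionally fitting the model to it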

  • Prediction - a model’s output when provided with an input example

  • Stationarity - the distribution the examples are drawn from doesn’t change within the data set

  • Independent and identically distributed (i.i.d.) - examples are drawn independently (they don’t influence each other) and from the same distribution; shorthand for the randomness of the variables
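The training set and test set described above can be produced by shuffling the examples and cutting the shuffled list in two. The following is a minimal sketch, not a method prescribed by the course: the helper name `train_test_split`, its parameters, and the integer stand-in data are all assumptions made for illustration.

```python
import random

def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle the examples, then split them into (training set, test set).

    Hypothetical helper written for these notes, not a library function.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(examples)              # copy, so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # random order: both subsets come from the same distribution
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

examples = list(range(100))                # stand-in for 100 labeled examples
train_set, test_set = train_test_split(examples, test_fraction=0.2, seed=42)
print(len(train_set), len(test_set))       # 80 20
```

Shuffling before splitting matters when the stored order of the examples is not random (for example, sorted by date or label); without it the two subsets could follow different distributions. The test set should then be held back and used sparingly, in line with the "must not be repeated" point above.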

Check your understanding


  • Explain when overfitting occurs.

  • How do we train models to adapt properly to new, unseen data? (Which two sets do we use?)

  • What are the qualities of a good test set?

  • What do we mean by saying that examples are i.i.d.?

Summary of Notes


  • A good model is not so complex that it overfits (adapts poorly to new data)

  • A good model is built using two subsets of the data: a training set and a test set

  • Examples for the training and test sets must be drawn independently and from the same distribution (i.i.d.)
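To make the first two summary points concrete, here is a small sketch that compares a simple model with an overly flexible one on held-out data. Everything in it is an assumption chosen for illustration: the synthetic data (y = 2x plus Gaussian noise), the helper names `mse`, `fit_line` and `fit_memorizer`, and the use of a nearest-example "memorizer" as a stand-in for a model that is too complex.

```python
import random

def mse(model, data):
    """Mean squared error of a model (a function x -> prediction) on (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

def fit_line(train):
    """Simple model: least-squares straight line through the training data."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in train)
             / sum((x - mean_x) ** 2 for x, _ in train))
    return lambda x: mean_y + slope * (x - mean_x)

def fit_memorizer(train):
    """Overly flexible model: predicts the label of the closest training example."""
    return lambda x: min(train, key=lambda pair: abs(pair[0] - x))[1]

# Hypothetical data: every example is drawn i.i.d. from the same distribution.
rng = random.Random(0)
data = []
for _ in range(1000):
    x = rng.uniform(0, 10)
    data.append((x, 2 * x + rng.gauss(0, 1)))
train, test = data[:500], data[500:]   # already in random order, so a plain cut suffices

line_model = fit_line(train)
memo_model = fit_memorizer(train)
for name, model in [("line", line_model), ("memorizer", memo_model)]:
    print(name, "train MSE:", round(mse(model, train), 3),
          "test MSE:", round(mse(model, test), 3))
```

The memorizer reproduces every training label exactly, so its training error is zero, yet on the test set it does worse than the straight line because it has also memorized the noise. That gap between training and test error is the symptom of overfitting that the test set exists to reveal.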