notes and work on deep learning.
The aim of these notes is to synthesize theory from several different "classical" sources and to introduce insights from newer literature.
See statistical_theory_background.md in this repo for some of the needed math background.
The measures we use for distances (for example, between images in nearest neighbors classification) are metrics.
From real analysis, we know a metric is a bivariate operator $d : X \times X \to \mathbb{R}$ satisfying
$d(a,b) \geq 0$, $d(a,b) = 0 \iff a = b$, $d(a,b) = d(b,a)$, and $d(a,b) \leq d(a,c) + d(c,b)$.
We call the third property symmetry.
We call the fourth property the triangle inequality.
The distances we often use in vector math are the $L_1$ (Manhattan) distance $d_1(x,y) = \sum_i |x_i - y_i|$ and the $L_2$ (Euclidean) distance $d_2(x,y) = \sqrt{\sum_i (x_i - y_i)^2}$.
You can prove that both of these satisfy the metric axioms above.
Notice that for any norm $\lVert \cdot \rVert$ on $\mathbb{R}^n$, the function $d(x,y) = \lVert x - y \rVert$ satisfies those axioms.
So, both of these distances are just the metrics induced by the $L_1$ and $L_2$ norms.
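As a quick numerical sanity check (not a proof, and the helper names `d1`/`d2` are my own), here is a small NumPy sketch that spot-checks the four metric axioms for the $L_1$ and $L_2$ distances on random vectors:

```python
import numpy as np

def d1(x, y):
    """L1 (Manhattan) distance between two vectors."""
    return np.sum(np.abs(x - y))

def d2(x, y):
    """L2 (Euclidean) distance between two vectors."""
    return np.sqrt(np.sum((x - y) ** 2))

rng = np.random.default_rng(0)
for d in (d1, d2):
    for _ in range(1000):
        a, b, c = rng.normal(size=(3, 10))
        assert d(a, b) >= 0                          # non-negativity
        assert np.isclose(d(a, a), 0)                # identity: d(a, a) = 0
        assert np.isclose(d(a, b), d(b, a))          # symmetry
        assert d(a, b) <= d(a, c) + d(c, b) + 1e-12  # triangle inequality
```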
If we have two images $I_1$ and $I_2$ and we want to perform a nearest neighbors classification on them, we can use the $L_1$ distance $d_1(I_1, I_2) = \sum_p \left| I_1^p - I_2^p \right|$,
where the sum runs over all pixels $p$ and the images are compared elementwise.
The Euclidean distance is $d_2(I_1, I_2) = \sqrt{\sum_p \left( I_1^p - I_2^p \right)^2}$.
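A minimal sketch of how these distances could be used to find the nearest training image to a test image (the array shapes, the fake data, and the variable names are illustrative assumptions, not something fixed by these notes):

```python
import numpy as np

def l1_distance(a, b):
    """Manhattan distance: sum of absolute elementwise differences."""
    return np.sum(np.abs(a - b))

def l2_distance(a, b):
    """Euclidean distance: square root of the sum of squared differences."""
    return np.sqrt(np.sum((a - b) ** 2))

# Fake data: 100 "training images" and one "test image", each 32x32, flattened.
rng = np.random.default_rng(0)
train_images = rng.normal(size=(100, 32 * 32))
test_image = rng.normal(size=32 * 32)

# 1-nearest-neighbor lookup under the L1 distance.
dists = np.array([l1_distance(test_image, img) for img in train_images])
print("index of nearest training image:", np.argmin(dists))
```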
When training something like a k-nearest-neighbors classifier, we have to choose a value for $k$.
Tuning this value of $k$ is an example of hyperparameter tuning.
Usually, selecting the best hyperparameters is a matter of trial and error: you try training the classifier with several hyperparameter combinations before deciding which configuration is best.
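A hedged sketch of that trial-and-error loop, here using scikit-learn's `KNeighborsClassifier` on a toy dataset with a single held-out validation split (the dataset, the split, and the candidate values of $k$ are arbitrary illustrative choices):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Toy dataset standing in for the real training data.
X, y = load_digits(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

# Try several values of k and keep the one with the best validation accuracy.
best_k, best_acc = None, -np.inf
for k in (1, 3, 5, 7, 9, 15):
    clf = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    acc = clf.score(X_val, y_val)
    print(f"k={k}: validation accuracy = {acc:.3f}")
    if acc > best_acc:
        best_k, best_acc = k, acc

print("best k:", best_k)
```

In practice, k-fold cross-validation is a more robust version of this single-split loop.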
Recall from multivariable calculus that the gradient of a multivariate function $f : \mathbb{R}^n \to \mathbb{R}$ is the vector of its partial derivatives, $\nabla f(x) = \left( \frac{\partial f}{\partial x_1}(x), \dots, \frac{\partial f}{\partial x_n}(x) \right)$.
We can also define the gradient on vector-valued functions in an alternative way (this one was popular in problem sets for my real analysis class) which more clearly shows the gradient as a derivative.
Say $f : \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at a point $a \in \mathbb{R}^n$.
Then the gradient generalizes to the (total) derivative $Df(a)$, the unique linear map from $\mathbb{R}^n$ to $\mathbb{R}^m$ satisfying $\lim_{h \to 0} \frac{\lVert f(a + h) - f(a) - Df(a)\,h \rVert}{\lVert h \rVert} = 0$,
where $Df(a)$ can be written as the $m \times n$ Jacobian matrix of partial derivatives; when $m = 1$, that matrix is just the gradient $\nabla f(a)$ written as a row vector.
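To make the limit definition concrete, here is a small sketch (the example function and helper names are my own) that compares the analytic gradient of $f(x) = x_0^2 + 3x_1$ with a central finite-difference approximation, i.e. the limit above evaluated at a small but nonzero $h$:

```python
import numpy as np

def f(x):
    """Example scalar-valued function f(x) = x_0^2 + 3*x_1."""
    return x[0] ** 2 + 3.0 * x[1]

def grad_f(x):
    """Analytic gradient: (df/dx_0, df/dx_1) = (2*x_0, 3)."""
    return np.array([2.0 * x[0], 3.0])

def numerical_gradient(func, x, h=1e-5):
    """Approximate each partial derivative with central finite differences."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (func(x + e) - func(x - e)) / (2.0 * h)
    return grad

a = np.array([1.5, -2.0])
print(grad_f(a))                  # analytic gradient at a
print(numerical_gradient(f, a))   # should agree closely
```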
Gradient descent is then the method of using the gradient vector to iteratively find a local minimum of a (hopefully convex) loss function.
So, if we have the MSE (mean squared error) loss $\mathrm{MSE}(\theta) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i(\theta) \right)^2$,
where the $y_i$ are the true targets and the $\hat{y}_i(\theta)$ are the model's predictions under parameters $\theta$,
gradient descent then minimizes this function step by step via the update $\theta \leftarrow \theta - \eta \, \nabla_\theta \mathrm{MSE}(\theta)$, where the step size is determined by the learning rate $\eta$.
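A minimal gradient descent sketch on an MSE loss, assuming a simple linear model $\hat{y} = Xw$ with synthetic data (the model, data, learning rate, and number of steps are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data: y = X @ w_true + noise.
n, d = 200, 3
X = rng.normal(size=(n, d))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=n)

def mse(w):
    """Mean squared error of the linear model with parameters w."""
    return np.mean((y - X @ w) ** 2)

def mse_grad(w):
    """Gradient of the MSE with respect to w: -(2/n) * X^T (y - X w)."""
    return -2.0 / n * X.T @ (y - X @ w)

w = np.zeros(d)   # initial parameters
lr = 0.1          # learning rate (step size)
for step in range(500):
    w = w - lr * mse_grad(w)   # gradient descent update

print("learned w:", w)         # should be close to w_true
print("final MSE:", mse(w))
```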
Issue: this math doesn't render properly in the README (lots of bugs); need to make a PDF copy of these notes.