We call this function, $\mathcal{L}$, the _cost function_ or [loss function](https://en.wikipedia.org/wiki/Loss_function).
```{note}
This is called the _mean square error_ loss function, and is one possible choice for $\mathcal{L}(A_{ij})$, but [many others exist](https://en.wikipedia.org/wiki/Loss_function).
```
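
As a concrete illustration, here is a minimal sketch of evaluating this loss for a single training pair $({\bf x}^k, {\bf y}^k)$, assuming the simple linear model ${\bf z} = {\bf A}{\bf x}^k$ (the model form, function, and variable names are illustrative assumptions, not from the text); averaging instead of summing over the outputs would only change an overall constant:

```python
import numpy as np

# A minimal sketch (not from the original text) of the loss for a
# single training pair, assuming the linear model z = A @ x.
def loss(A, x, y):
    z = A @ x            # model output, shape (N_out,)
    e = z - y            # error on the output layer
    return np.sum(e**2)  # sum of squared errors over the outputs
```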
* Update the matrix ${\bf A}$ based on the training pair $({\bf x}^k, {\bf y}^k)$.
where we used the fact that the $\delta_{ip}$ means that only a single term contributes to the sum.
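
Assuming the linear model ${\bf z} = {\bf A}{\bf x}^k$ (a sketch of the result, since the derivation itself is not reproduced here), the surviving term gives

$$\frac{\partial \mathcal{L}}{\partial A_{pq}} = 2\,(z_p - y_p^k)\,x_q^k \equiv 2\, e_p^k\, x_q^k$$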
```{note}
Observe that:
* $e_p^k \equiv (z_p - y_p^k)$ is the error on the output layer,
  and the correction is proportional to the error (as we would
  expect).
* The $k$ superscripts here remind us that this is the result of
  only a single pair of data from the training set.
```
Now ${\bf z}$ and ${\bf y}^k$ are both vectors of size $N_\mathrm{out} \times 1$ and ${\bf x}^k$ is a vector of size $N_\mathrm{in} \times 1$, so we can write this expression for the matrix as a whole as:

$$\frac{\partial \mathcal{L}}{\partial {\bf A}} = 2\,({\bf z} - {\bf y}^k)\,({\bf x}^k)^\intercal = 2\,{\bf e}^k\,({\bf x}^k)^\intercal$$
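
Putting the pieces together, here is a minimal sketch of the single-pair update step (again assuming the linear model and sum-of-squares loss above; `eta`, the learning rate, and all names are illustrative):

```python
import numpy as np

# A sketch of one gradient-descent step for a single training pair
# (x^k, y^k), assuming z = A @ x and the sum-of-squares loss above.
def update(A, x, y, eta=0.1):
    e = A @ x - y                # error on the output layer, e^k
    grad = 2.0 * np.outer(e, x)  # dL/dA = 2 e^k (x^k)^T, shape (N_out, N_in)
    return A - eta * grad        # correction proportional to the error
```

The outer product is what gives the update the $N_\mathrm{out} \times N_\mathrm{in}$ shape of ${\bf A}$ itself.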