Hi, my name is Ramiro. I was reading the code and I have a question.
When you update the parameters related to the input layer and the hidden layer (W1, b1), you calculate the derivative of the activation function. I think it is done in this line (in ann.py):
dZ = pY_T.dot(self.W2.T) * (1 - Z*Z) # tanh
In the particular case of tanh, I think that (1 - Z*Z) is the derivative. If this is correct, why do we use Z? Recall what is stored in Z:
Z = np.tanh(X.dot(self.W1) + self.b1)
I think that we should evaluate the derivative at X.dot(self.W1) + self.b1 instead, which is the same as using np.arctanh(Z). So the result should be (1 - np.arctanh(Z)*np.arctanh(Z)).
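For reference, here is a small numerical sketch (not from the repo, just an illustration) comparing the expression written in terms of the stored activation Z with a finite-difference estimate of the tanh derivative at the pre-activation:

```python
import numpy as np

# If Z = tanh(a), where a plays the role of X.dot(W1) + b1,
# then dZ/da = 1 - tanh(a)**2 = 1 - Z*Z,
# i.e. the derivative can be written directly in terms of Z.
a = np.linspace(-2.0, 2.0, 5)   # stand-in for the pre-activation values
Z = np.tanh(a)

analytic = 1 - Z * Z            # derivative expressed via the stored activation

eps = 1e-6                      # central-difference check of d(tanh)/da
numeric = (np.tanh(a + eps) - np.tanh(a - eps)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-8))  # → True
```

This is only a check of the identity d(tanh(a))/da = 1 - tanh(a)**2; the variable names a, analytic, and numeric are placeholders, not code from ann.py.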
I'm probably wrong; I just want to know why.
Thanks!
R.