@@ -26,7 +26,7 @@ the initial hidden state. The output of the `cell` is considered to be:
The input `x` should be an array of size `in x len` or `in x len x batch_size`,
where `in` is the input dimension of the cell, `len` is the sequence length, and `batch_size` is the batch size.
- The `state` should be a valid state for the recurrent cell. If not provided, it obtained by calling
+ The `state` should be a valid state for the recurrent cell. If not provided, it is obtained by calling
`Flux.initialstates(cell)`.
The output is an array of size `out x len x batch_size`, where `out` is the output dimension of the cell.
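Not part of the diff, but as a rough sketch of the shape convention described above (the helper name and all sizes below are hypothetical), stepping a cell such as `RNNCell` through the sequence and stacking the per-step hidden states produces the `out x len x batch_size` array:

```julia
using Flux

# Hypothetical helper: apply a recurrent cell step by step over a sequence.
# `x` has size in x len x batch_size; the result has size out x len x batch_size.
function apply_cell(cell, x, state = Flux.initialstates(cell))
    ys = similar(x, size(state, 1), size(x, 2), size(x, 3))
    for t in axes(x, 2)
        state = cell(x[:, t, :], state)   # new hidden state, out x batch_size
        ys[:, t, :] .= state
    end
    return ys
end

cell = RNNCell(3 => 5)
x = rand(Float32, 3, 10, 8)   # in x len x batch_size
y = apply_cell(cell, x)       # 5 x 10 x 8
```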
@@ -107,7 +107,7 @@ See [`RNN`](@ref) for a layer that processes entire sequences.
rnncell(x, [h])
- The arguments of the forward pass are:
+ The arguments for the forward pass are:
- `x`: The input to the RNN. It should be a vector of size `in` or a matrix of size `in x batch_size`.
- `h`: The hidden state of the RNN. It should be a vector of size `out` or a matrix of size `out x batch_size`.
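For concreteness, a single-step call consistent with the argument shapes above might look like this (sizes are hypothetical, not taken from the docstring):

```julia
using Flux

rnncell = RNNCell(3 => 5)
x = rand(Float32, 3, 16)    # in x batch_size
h = zeros(Float32, 5, 16)   # out x batch_size
h = rnncell(x, h)           # updated hidden state, 5 x 16
h0 = rnncell(x)             # omitting h uses Flux.initialstates(rnncell)
```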
@@ -210,12 +210,12 @@ end
The most basic recurrent layer. Essentially acts as a `Dense` layer, but with the
output fed back into the input each time step.
- In the forward pass computes
+ The forward pass computes
```math
h_t = \sigma(W_i x_t + W_h h_{t-1} + b)
```
- for all `len` steps `t` in the in input sequence.
+ for all `len` steps `t` in the input sequence.
See [`RNNCell`](@ref) for a layer that processes a single time step.
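A hedged usage sketch of the sequence-level behavior (sizes are hypothetical, and `return_state` is assumed to be passed as a keyword argument as in the argument list below):

```julia
using Flux

rnn = RNN(4 => 8)
x = rand(Float32, 4, 20, 32)   # in x len x batch_size
y = rnn(x)                     # hidden states for all steps, 8 x 20 x 32

rnn_s = RNN(4 => 8; return_state = true)
y2, h_last = rnn_s(x)          # also returns the state of the last iteration
```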
@@ -225,7 +225,7 @@ See [`RNNCell`](@ref) for a layer that processes a single time step.
- `σ`: The non-linearity to apply to the output. Default is `tanh`.
- `return_state`: Option to return the last state together with the output. Default is `false`.
- `init_kernel`: The initialization function to use for the input to hidden connection weights. Default is `glorot_uniform`.
- - `init_recurrent_kernel`: The initialization function to use for the hidden to hidden connection weights. Default is `glorot_uniform`.
+ - `init_recurrent_kernel`: The initialization function to use for the hidden-to-hidden connection weights. Default is `glorot_uniform`.
- `bias`: Whether to include a bias term initialized to zero. Default is `true`.
# Forward
@@ -239,7 +239,7 @@ The arguments of the forward pass are:
If given, it is a vector of size `out` or a matrix of size `out x batch_size`.
If not provided, it is assumed to be a vector of zeros, initialized by [`initialstates`](@ref).
- Returns all new hidden states `h_t` as an array of size `out x len x batch_size`. When `return_state = true` it returns
+ Returns all the new hidden states `h_t` as an array of size `out x len x batch_size`. When `return_state = true` it returns
a tuple of the hidden states `h_t` and the last state of the iteration.
# Examples
@@ -330,11 +330,13 @@ Behaves like an RNN but generally exhibits a longer memory span over sequences.
In the forward pass, it computes
```math
- i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i)
- f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f)
- c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)
- o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o)
- h_t = o_t \odot \tanh(c_t)
+ \begin{aligned}
+ i_t &= \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i)\\
+ f_t &= \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f)\\
+ c_t &= f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)\\
+ o_t &= \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o)\\
+ h_t &= o_t \odot \tanh(c_t)
+ \end{aligned}
```
See also [`LSTM`](@ref) for a layer that processes entire sequences.
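A hedged single-step sketch (hypothetical sizes; it assumes the cell takes and returns the `(h, c)` state pair implied by the recursions above):

```julia
using Flux

lstmcell = LSTMCell(3 => 5)
x = rand(Float32, 3, 16)     # in x batch_size
h = zeros(Float32, 5, 16)    # hidden state h_{t-1}
c = zeros(Float32, 5, 16)    # cell state c_{t-1}
h, c = lstmcell(x, (h, c))   # h_t and c_t from the equations above
```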
@@ -430,14 +432,16 @@ recurrent layer. Behaves like an RNN but generally exhibits a longer memory span
See [this article](https://colah.github.io/posts/2015-08-Understanding-LSTMs/)
for a good overview of the internals.
- In the forward pass, computes
+ In the forward pass, it computes
```math
- i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i)
- f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f)
- c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)
- o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o)
- h_t = o_t \odot \tanh(c_t)
+ \begin{aligned}
+ i_t &= \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i)\\
+ f_t &= \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f)\\
+ c_t &= f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)\\
+ o_t &= \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o)\\
+ h_t &= o_t \odot \tanh(c_t)
+ \end{aligned}
```
for all `len` steps `t` in the input sequence.
See [`LSTMCell`](@ref) for a layer that processes a single time step.
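A rough sequence-level sketch with hypothetical sizes (not taken from the docstring's own examples):

```julia
using Flux

lstm = LSTM(4 => 8)
x = rand(Float32, 4, 20, 32)   # in x len x batch_size
h = lstm(x)                    # hidden states h_t for all steps, 8 x 20 x 32
```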
@@ -447,7 +451,7 @@ See [`LSTMCell`](@ref) for a layer that processes a single time step.
- `in => out`: The input and output dimensions of the layer.
- `return_state`: Option to return the last state together with the output. Default is `false`.
- `init_kernel`: The initialization function to use for the input to hidden connection weights. Default is `glorot_uniform`.
- - `init_recurrent_kernel`: The initialization function to use for the hidden to hidden connection weights. Default is `glorot_uniform`.
+ - `init_recurrent_kernel`: The initialization function to use for the hidden-to-hidden connection weights. Default is `glorot_uniform`.
- `bias`: Whether to include a bias term initialized to zero. Default is `true`.
# Forward
@@ -536,10 +540,12 @@ This implements the variant proposed in v1 of the referenced paper.
In the forward pass, it computes
```math
- r = \sigma(W_{xi} x + W_{hi} h + b_i)
- z = \sigma(W_{xz} x + W_{hz} h + b_z)
- h̃ = \tanh(W_{xh} x + r \odot W_{hh} h + b_h)
- h' = (1 - z) \odot h̃ + z \odot h
+ \begin{aligned}
+ r &= \sigma(W_{xi} x + W_{hi} h + b_i)\\
+ z &= \sigma(W_{xz} x + W_{hz} h + b_z)\\
+ h̃ &= \tanh(W_{xh} x + r \odot W_{hh} h + b_h)\\
+ h' &= (1 - z) \odot h̃ + z \odot h
+ \end{aligned}
```
See also [`GRU`](@ref) for a layer that processes entire sequences.
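For illustration (hypothetical sizes), one update step of the rule above:

```julia
using Flux

grucell = GRUCell(3 => 5)
x = rand(Float32, 3, 16)    # in x batch_size
h = zeros(Float32, 5, 16)   # previous hidden state h
h = grucell(x, h)           # h' from the update rule above, 5 x 16
```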
@@ -635,10 +641,12 @@ the variant proposed in v1 of the referenced paper.
The forward pass computes
```math
- r_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i)
- z_t = \sigma(W_{xz} x_t + W_{hz} h_{t-1} + b_z)
- h̃_t = \tanh(W_{xh} x_t + r_t \odot W_{hh} h_{t-1} + b_h)
- h_t = (1 - z_t) \odot h̃_t + z_t \odot h_{t-1}
+ \begin{aligned}
+ r_t &= \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i)\\
+ z_t &= \sigma(W_{xz} x_t + W_{hz} h_{t-1} + b_z)\\
+ h̃_t &= \tanh(W_{xh} x_t + r_t \odot W_{hh} h_{t-1} + b_h)\\
+ h_t &= (1 - z_t) \odot h̃_t + z_t \odot h_{t-1}
+ \end{aligned}
```
for all `len` steps `t` in the input sequence.
See [`GRUCell`](@ref) for a layer that processes a single time step.
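A hedged sequence-level sketch with hypothetical sizes:

```julia
using Flux

gru = GRU(4 => 8)
x = rand(Float32, 4, 20, 32)   # in x len x batch_size
h = gru(x)                     # hidden states h_t for all steps, 8 x 20 x 32
```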
@@ -724,10 +732,12 @@ This implements the variant proposed in v3 of the referenced paper.
The forward pass computes
```math
- r = \sigma(W_{xi} x + W_{hi} h + b_i)
- z = \sigma(W_{xz} x + W_{hz} h + b_z)
- h̃ = \tanh(W_{xh} x + W_{hh̃} (r \odot W_{hh} h) + b_h)
- h' = (1 - z) \odot h̃ + z \odot h
+ \begin{aligned}
+ r &= \sigma(W_{xi} x + W_{hi} h + b_i)\\
+ z &= \sigma(W_{xz} x + W_{hz} h + b_z)\\
+ h̃ &= \tanh(W_{xh} x + W_{hh̃} (r \odot W_{hh} h) + b_h)\\
+ h' &= (1 - z) \odot h̃ + z \odot h
+ \end{aligned}
```
and returns `h'`. This is a single time step of the GRU.
@@ -813,10 +823,12 @@ the variant proposed in v3 of the referenced paper.
The forward pass computes
```math
- r_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i)
- z_t = \sigma(W_{xz} x_t + W_{hz} h_{t-1} + b_z)
- h̃_t = \tanh(W_{xh} x_t + W_{hh̃} (r_t \odot W_{hh} h_{t-1}) + b_h)
- h_t = (1 - z_t) \odot h̃_t + z_t \odot h_{t-1}
+ \begin{aligned}
+ r_t &= \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i)\\
+ z_t &= \sigma(W_{xz} x_t + W_{hz} h_{t-1} + b_z)\\
+ h̃_t &= \tanh(W_{xh} x_t + W_{hh̃} (r_t \odot W_{hh} h_{t-1}) + b_h)\\
+ h_t &= (1 - z_t) \odot h̃_t + z_t \odot h_{t-1}
+ \end{aligned}
```
for all `len` steps `t` in the input sequence.
See [`GRUv3Cell`](@ref) for a layer that processes a single time step.
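Construction and calling of the v3 variants are assumed to mirror `GRUCell` and `GRU`; a brief hedged sketch with hypothetical sizes:

```julia
using Flux

gruv3cell = GRUv3Cell(3 => 5)
h1 = gruv3cell(rand(Float32, 3, 16), zeros(Float32, 5, 16))   # single step: h'

gruv3 = GRUv3(4 => 8)
hs = gruv3(rand(Float32, 4, 20, 32))                          # 8 x 20 x 32
```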
function Base.show(io::IO, m::GRUv3)
    print(io, "GRUv3(", size(m.cell.Wi, 2), " => ", size(m.cell.Wi, 1) ÷ 3, ")")
- end
+ end