Layers with special behavior on dynamic spatial axes
This is a list of layers with special behavior on dynamic spatial axes, i.e. axes with dynamic sequence lengths, where considering the padding or the sequence lengths is important for correct behavior; or in general any layer where the output tensor (placeholder) depends on the sequence lengths.
- `SoftmaxOverSpatialLayer` will make sure that the padded frames are masked away.
- `BatchSoftmaxLayer`
- `ReduceLayer`: `sum`, `max` etc. will ignore the padded frames.
- `MathNormLayer`: shares code with `ReduceLayer` internally.
- `DotLayer`, when reducing a dynamic spatial axis
- `BatchNormLayer` (and batch norm in general on any layer)
- (`NormLayer` actually should have special behavior and ignore padded frames, but currently it incorrectly does not (#575))
- `SliceNdLayer`
- `SeqLenMaskLayer`
- `FlattenBatchLayer`
- `PostfixInTimeLayer`
- (`CumsumLayer` with `reverse=True` should ignore padded frames, but currently it does not (#574))
- `LossLayer` (deprecated); see below for losses
- `RecLayer` with `direction=-1`
- `SelfAttentionLayer` (deprecated)
- (`LengthLayer` would return the sequence lengths)
(This list is currently incomplete.)
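To illustrate the kind of masking such layers need (this is a standalone NumPy sketch, not RETURNN code), here is a softmax over a dynamic time axis that masks away padded frames, as e.g. `SoftmaxOverSpatialLayer` does; the function name and shapes are assumptions for the example:

```python
import numpy as np

def masked_softmax(logits, seq_lens):
    """Softmax over the time axis, masking away padded frames.

    logits: [batch, time] float array.
    seq_lens: [batch] int array of valid lengths per sequence.
    Padded positions get probability exactly 0.
    """
    batch, time = logits.shape
    # mask[b, t] is True for valid frames, False for padding.
    mask = np.arange(time)[None, :] < seq_lens[:, None]
    # Set padded logits to -inf so exp() maps them to 0.
    logits = np.where(mask, logits, -np.inf)
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)
```

Without the masking step, padded frames would receive nonzero probability mass and the results would depend on the (arbitrary) padding values.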
Sequence lengths also matter for the losses. For the framewise losses, it matters for the accumulation (it will ignore the padded frames), and obviously it also matters for all the sequence losses such as CTC.
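A minimal sketch of such a masked framewise accumulation (again plain NumPy, not RETURNN's actual loss code; the function name and shapes are assumptions) for a framewise cross-entropy loss:

```python
import numpy as np

def framewise_ce_loss(log_probs, targets, seq_lens):
    """Framewise cross-entropy, accumulated only over valid frames.

    log_probs: [batch, time, classes] log-probabilities.
    targets: [batch, time] int class indices.
    seq_lens: [batch] int valid lengths per sequence.
    """
    batch, time, _ = log_probs.shape
    # mask[b, t] is True for valid frames, False for padding.
    mask = np.arange(time)[None, :] < seq_lens[:, None]
    # Pick the log-prob of the target class per frame: [batch, time].
    picked = np.take_along_axis(log_probs, targets[..., None], axis=2)[..., 0]
    # Padded frames are multiplied by 0, so they do not contribute;
    # normalize by the number of valid frames.
    return -(picked * mask).sum() / mask.sum()
```

The key point is that both the sum and the normalization only count valid frames; accumulating over the full padded tensor would bias the loss toward short sequences.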
Somewhat related is the option `recurrent` on each layer class (or loss). `recurrent=False` implies that neither the sequence lengths nor the ordering of frames matter. But this is not exactly the same criterion as this list: e.g. `ConvLayer` has `recurrent=True`, but `ConvLayer` does not make use of the sequence lengths.
The obvious example of a layer where the dynamic spatial axes do not matter is `LinearLayer`.
(Partly related is the list of layers with special behavior for recurrent automatic optimization.)