[Bug] Include last attention layer in feature output #4

dillonalaird · 2023-11-30T19:21:59Z

The last output index should be 7 instead of 6, at least for the SA12 architecture. On SA12, returning index 6 as the final feature skips the last attention layer, while index 7 includes it. The Timm implementation does include the final attention layer in its output. I trained both models for segmentation on ADE20k using mmsegmentation with this configuration:

model = dict(
    type='EncoderDecoder',
    data_preprocessor=data_preprocessor,
    backbone=dict(
        type='FastViTSA12',
        pretrained=True,
    ),

    neck=dict(
        type='FPN',
        in_channels=[64, 128, 256, 512],
        out_channels=256,
        num_outs=4,
    ),

    decode_head=dict(
        type='FPNHead',
        in_channels=[256, 256, 256, 256],
        in_index=[0, 1, 2, 3],
        feature_strides=[4, 8, 16, 32],
        channels=128,
        dropout_ratio=0.1,
        num_classes=1,
        norm_cfg=norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss',
            use_sigmoid=False,
            loss_weight=1.0,
        ),
    ),
)

The two backbones differ as follows:

Model	Parameters	ADE20k Val mIoU
Apple FastViT SA12 FPN	8.3M	30
Timm FastViT SA12 FPN	14.6M	39

With the final attention layer included, the mIoU and parameter count line up much more closely with the paper's reported numbers.
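The off-by-one described above can be illustrated with a small, dependency-free sketch. It does not use the real FastViT code: the block names and the ordering in `NETWORK_NAMES` are assumptions chosen to match the description (an extra block at index 6 just before the final attention stage at index 7), and `forward_features` is a hypothetical stand-in for a backbone that collects features at the given indices.

```python
from typing import Callable, List, Sequence

# A toy "block" appends its own name to a trace, so each collected feature
# records exactly which blocks had run when it was taken.
Block = Callable[[List[str]], List[str]]


def make_block(name: str) -> Block:
    def block(trace: List[str]) -> List[str]:
        return trace + [name]
    return block


# ASSUMED layout of the flattened SA12 backbone (illustrative names only).
NETWORK_NAMES = [
    "stage0",            # index 0
    "patch_embed1",      # index 1
    "stage1",            # index 2
    "patch_embed2",      # index 3
    "stage2",            # index 4
    "patch_embed3",      # index 5
    "pos_emb",           # index 6: extra block before the last stage
    "stage3_attention",  # index 7: final attention stage
]


def forward_features(out_indices: Sequence[int]) -> List[List[str]]:
    """Run every block in order and collect a feature at each out index."""
    network = [make_block(name) for name in NETWORK_NAMES]
    trace: List[str] = []
    outs: List[List[str]] = []
    for idx, block in enumerate(network):
        trace = block(trace)
        if idx in out_indices:
            outs.append(list(trace))
    return outs


def last_feature_includes_attention(out_indices: Sequence[int]) -> bool:
    """True if the final collected feature has passed through attention."""
    return "stage3_attention" in forward_features(out_indices)[-1]


if __name__ == "__main__":
    for indices in ([0, 2, 4, 6], [0, 2, 4, 7]):
        print(indices, last_feature_includes_attention(indices))
```

Under this assumed layout, both index lists yield four feature maps, but only the list ending in 7 passes the last feature through the attention stage, which would be consistent with the smaller parameter count and lower mIoU observed when index 6 is used.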

include last attention layer in feature output

ce7d33a
