
about FLOPs calculation #5

@liulei2140


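The general way to calculate the number of parameters and the FLOPs of a convolutional layer (n filters of size h × w over c input channels, an H × W output feature map, and +1 for each filter's bias):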

  • Param = n * ( h * w * c + 1 )
  • FLOPs = H * W * n * ( h * w * c + 1 )
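The two formulas above can be checked with a small helper. This is a sketch assuming the usual meaning of the symbols (n output channels, an h × w kernel over c input channels, an H × W output map, +1 for the bias) and counting one multiply-accumulate as one FLOP, as the formula implies. The function name `conv_params_flops` is hypothetical.

```python
def conv_params_flops(n, h, w, c, H, W):
    """Parameters and FLOPs of one conv layer, following
    Param = n * (h * w * c + 1) and FLOPs = H * W * n * (h * w * c + 1).

    n: number of filters (output channels)
    h, w: kernel height and width
    c: number of input channels
    H, W: height and width of the output feature map
    The +1 accounts for each filter's bias term.
    """
    params = n * (h * w * c + 1)
    # Every output position uses all of the layer's weights once.
    flops = H * W * params
    return params, flops


# Example: 3x3 conv, 64 -> 128 channels, 56x56 output map.
params, flops = conv_params_flops(n=128, h=3, w=3, c=64, H=56, W=56)
print(params, flops)  # 73856 231612416
```

Pruning a layer's output channels shrinks its n, and pruning the channels it reads shrinks its c, so both kinds of pruning lower the count.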

Besides calculating FLOPs this way, your code treats channels as pruned according to this criterion: "Channels are assumed to be pruned if their l2 norm is very small or if magnitude of gradient is very small."

```python
# Excerpt: p, weight_size, nunits and divider_bool are defined earlier in the enclosing loop (not shown).
if (len(weight_size) == 4) or (len(weight_size) == 2) or (len(weight_size) == 1):
    if not (p.grad is None):
        # consider gradients as well: if the gradient is below a specific threshold, then we claim the parameter to be removed
        divider_grad = p.grad.data.pow(2).view(nunits, -1).sum(dim=1).pow(0.5)
        eps = 1e-8
        divider_bool_grad = divider_grad.gt(eps).view(-1).float()
        divider_bool = divider_bool_grad * divider_bool

        if (len(weight_size) == 4) or (len(weight_size) == 2):
            # get gradient for input:
            divider_grad_input = p.grad.data.pow(2).transpose(0,1).contiguous().view(p.data.size(1),-1).sum(dim=1).pow(0.5)
            divider_bool_grad_input = divider_grad_input.gt(eps).view(-1).float()

            divider_input = p.data.pow(2).transpose(0,1).contiguous().view(p.data.size(1), -1).sum(dim=1).pow(0.5)
            divider_bool_input = divider_input.gt(eps).view(-1).float()
            divider_bool_input = divider_bool_input * divider_bool_grad_input
            # if the gradient is small, remove the channel
```
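To make the snippet's logic concrete outside the training loop, here is a dependency-free sketch of the same idea for a 4-D conv weight of shape (n, c, h, w) stored as nested lists. It assumes, as the snippet suggests, that `divider_bool` is the mask from the weight-norm test; because the masks are multiplied, a channel is kept only if both its weight norm and its gradient norm exceed `eps`. The helper names (`channel_norms`, `keep_mask`), the toy weights and the 4x4 output map are hypothetical.

```python
import math

EPS = 1e-8  # same threshold as in the snippet


def channel_norms(t, axis):
    """L2 norm of each output (axis=0) or input (axis=1) channel of a
    4-D tensor given as nested lists with shape (n, c, h, w)."""
    n, c = len(t), len(t[0])
    sq = [0.0] * (n if axis == 0 else c)
    for i in range(n):
        for j in range(c):
            k = i if axis == 0 else j
            sq[k] += sum(v * v for row in t[i][j] for v in row)
    return [math.sqrt(s) for s in sq]


def keep_mask(weight, grad, axis):
    """1.0 where both the weight norm and the gradient norm exceed EPS,
    mirroring divider_bool = divider_bool_grad * divider_bool."""
    w_norm = channel_norms(weight, axis)
    g_norm = channel_norms(grad, axis)
    return [float(wn > EPS) * float(gn > EPS) for wn, gn in zip(w_norm, g_norm)]


# Toy 1x1 conv with n=3 filters over c=2 input channels.
weight = [[[[0.5]], [[1.0]]],   # filter 0: nonzero weights
          [[[0.0]], [[0.0]]],   # filter 1: all-zero weights
          [[[2.0]], [[0.3]]]]   # filter 2: nonzero weights ...
grad = [[[[0.1]], [[0.2]]],
        [[[0.1]], [[0.1]]],
        [[[0.0]], [[0.0]]]]     # ... but an exactly zero gradient

out_mask = keep_mask(weight, grad, axis=0)
in_mask = keep_mask(weight, grad, axis=1)
print(out_mask, in_mask)  # [1.0, 0.0, 0.0] [1.0, 1.0]

# Plug the surviving channel counts into FLOPs = H * W * n * (h * w * c + 1),
# here with an assumed 4x4 output map and the 1x1 kernel above.
n_kept, c_kept = int(sum(out_mask)), int(sum(in_mask))
flops = 4 * 4 * n_kept * (1 * 1 * c_kept + 1)
print(n_kept, c_kept, flops)  # 1 2 48
```

Filter 2 is the case the question is about: its weights are far from zero, yet its zero gradient alone marks it as pruned, which lowers n in the FLOPs count.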

Channels with a small norm should indeed be discarded. But the claim that channels with a very small gradient can also be discarded confuses me. If a channel's gradient is small, I think it has already reached the optimum of the loss function for this iteration. In addition, a small gradient does not mean that the channel's l2 norm is also small.
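The last point can be illustrated with a toy two-layer linear model (a hypothetical setup, not code from the repository): a hidden unit with large incoming weights but a zero outgoing weight has a weight norm far above eps, yet its gradient is exactly zero, and removing it leaves the output unchanged. With eps = 1e-8, the gradient test in the snippet only flags gradients that are essentially zero; whether disconnected units like this one are what the criterion targets is an assumption.

```python
import math

# Toy model: hidden = W1 @ x, y = v . hidden, loss = 0.5 * (y - t)**2.
# Hidden unit 1 has large incoming weights, but its outgoing weight
# v[1] is zero (an assumed, hand-picked setup).
x = [1.0, 2.0]
t = 3.0
W1 = [[0.5, -0.2],   # hidden unit 0
      [4.0, 5.0]]    # hidden unit 1: large weights
v = [1.5, 0.0]       # nothing downstream reads hidden unit 1


def forward(W1, v):
    hidden = [sum(w * xi for w, xi in zip(row, x)) for row in W1]
    return sum(vi * hi for vi, hi in zip(v, hidden))


y = forward(W1, v)
# dLoss/dW1[i][j] = (y - t) * v[i] * x[j]
grad_W1 = [[(y - t) * v[i] * xj for xj in x] for i in range(len(W1))]

weight_norms = [math.sqrt(sum(w * w for w in row)) for row in W1]
grad_norms = [math.sqrt(sum(g * g for g in row)) for row in grad_W1]
print(weight_norms[1], grad_norms[1])  # large norm, zero gradient

# Dropping hidden unit 1 leaves the output unchanged.
y_pruned = forward([W1[0]], [v[0]])
print(y == y_pruned)  # True
```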

Can you help me understand the criteria?
