Is Gradient Clipping in the code as it is in the paper?

Is the gradient clipping rule $g_r = g_q 1_{|r| \le 1}$ still used in the code?
The only clipping I can find is `p.org.copy_(p.data.clamp_(-1,1))` in `def train():`


`optimizer.zero_grad()`
`loss.backward()`
`for p in list(model.parameters()):`
`    if hasattr(p,'org'):`
`        p.data.copy_(p.org)`
`optimizer.step()`
`for p in list(model.parameters()):`
`    if hasattr(p,'org'):`
`        p.org.copy_(p.data.clamp_(-1,1))`
If this is gradient clipping, shouldn't it be applied before `optimizer.step()`?
I also don't understand the purpose of `p.org.copy_(p.data.clamp_(-1,1))`, since `p.org` is binarized later anyway (the result is the same if `p.data` is not clamped).
Thank you
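
To make the two operations in the question concrete, here is a minimal, framework-free sketch on scalar weights. It is not code from the repository: the names `masked_grad`, `clamp`, `steps_to_flip` and the step size 0.5 are hypothetical, and mapping 0 to +1 in `sign` is an assumption. `masked_grad` follows the rule $g_r = g_q 1_{|r| \le 1}$ and acts on the gradient, while `clamp` acts on the stored real-valued weight after an update.

```python
def sign(x):
    """Binarize a real-valued weight to +1 or -1 (0 maps to +1 by assumption)."""
    return 1.0 if x >= 0 else -1.0


def masked_grad(r, g_q):
    """Gradient rule g_r = g_q * 1_{|r| <= 1}: pass g_q only where |r| <= 1."""
    return g_q if abs(r) <= 1.0 else 0.0


def clamp(w, lo=-1.0, hi=1.0):
    """Weight clipping: keep the stored real-valued weight inside [lo, hi]."""
    return max(lo, min(hi, w))


def steps_to_flip(w, step, clip):
    """Count updates (w -= step) until sign(w) flips, with or without clipping."""
    start = sign(w)
    n = 0
    while sign(w) == start:
        w -= step
        if clip:
            w = clamp(w)
        n += 1
    return n


# Clamping never changes the binarized value used in the next forward pass.
same_sign = all(sign(clamp(w)) == sign(w) for w in [-3.0, -0.2, 0.4, 2.5])

# It does bound how far the stored weight drifts over many updates.
# Warm-up: ten updates of +0.5 push both weights away from zero.
w_clipped, w_free = 0.5, 0.5
for _ in range(10):
    w_clipped = clamp(w_clipped + 0.5)  # stays at 1.0 once it reaches the bound
    w_free = w_free + 0.5               # keeps growing without clipping

# Now reverse the update direction and count steps until the sign flips.
flips_clipped = steps_to_flip(w_clipped, 0.5, clip=True)
flips_free = steps_to_flip(w_free, 0.5, clip=False)
```

In this toy setting `same_sign` is true, which matches the observation that binarizing right after the clamp gives the same result. The difference only shows up across steps: the clipped weight needs fewer reversed updates to change sign than the unclipped one. Whether this is the intent of the clamp in `train()` is not confirmed by the source.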
