
use DDP.no_sync when grad_accum and DDP #215

@oclivegriffin

Description


When using DDP and gradient accumulation at the same time, we should use the `DDP.no_sync` context manager to get some free training speed: it skips the gradient all-reduce on every micro-batch except the last one, so cross-process communication happens only once per optimizer step instead of once per backward pass.

https://chatgpt.com/share/e/68efa990-9ee8-800c-99da-b078c3d2ac7b
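A minimal sketch of the pattern, assuming a DDP-wrapped model; the model, optimizer, `accum_steps`, and `micro_batches` here are illustrative stand-ins, not this repo's actual training loop. It uses a single-process `gloo` group so it runs standalone:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # Single-process gloo group so the sketch runs on CPU without launchers.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=0, world_size=1)

    model = DDP(torch.nn.Linear(4, 1))
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    accum_steps = 4
    micro_batches = [torch.randn(8, 4) for _ in range(accum_steps)]

    opt.zero_grad()
    for i, x in enumerate(micro_batches):
        is_last = i == accum_steps - 1
        if is_last:
            # Last micro-batch: backward() outside no_sync, so DDP
            # fires the all-reduce exactly once per optimizer step.
            loss = model(x).pow(2).mean() / accum_steps
            loss.backward()
        else:
            # Non-final micro-batches: accumulate gradients locally,
            # skipping the per-backward all-reduce.
            with model.no_sync():
                loss = model(x).pow(2).mean() / accum_steps
                loss.backward()
    opt.step()

    dist.destroy_process_group()
    return float(loss)

if __name__ == "__main__":
    main()
```

Note that `no_sync` only suppresses synchronization; gradients still accumulate in `param.grad` as usual, and the final unwrapped `backward()` reduces the full accumulated gradient.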

