
Training slowdown caused by frequent API calls #5

@DabinSheng

Description


In the training pipeline, the original implementation runs inference against a locally deployed Qwen3-32B model.
I replaced it with the Qwen API, and the training script runs correctly.
However, the frequent API calls have significantly slowed down training.

The paper mentions that training the 4B model used 32 × H100 GPUs.
Did inference calls account for a large portion of that training time as well, or did you use a local acceleration or caching mechanism to avoid this slowdown?
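For reference, the caching idea I have in mind is something like the sketch below: memoize API responses keyed by the prompt and sampling parameters, so repeated queries during training hit the cache instead of the network. All names here are hypothetical (`call_fn` stands in for whatever wrapper the pipeline uses around the actual Qwen API client); this is just to illustrate the question, not the repo's implementation.

```python
import hashlib
import json


class InferenceCache:
    """Memoize model responses keyed by (prompt, sampling params)."""

    def __init__(self, call_fn):
        # call_fn: any callable that takes (prompt, **params) and
        # returns the model's text, e.g. a thin Qwen API wrapper.
        self.call_fn = call_fn
        self._store = {}

    def generate(self, prompt, **params):
        # Build a deterministic key from the prompt and parameters.
        raw = json.dumps([prompt, params], sort_keys=True).encode("utf-8")
        key = hashlib.sha256(raw).hexdigest()
        if key not in self._store:
            # Only the first occurrence pays the API-call latency.
            self._store[key] = self.call_fn(prompt, **params)
        return self._store[key]
```

With a cache like this, identical prompts (common when the same training samples are revisited across epochs) cost one API round trip instead of many; a disk-backed store would additionally persist across runs.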
