In the training pipeline, the original implementation uses a locally deployed Qwen3-32B model for inference.
I replaced it with the Qwen API, and the training script runs correctly.
However, frequent API calls have significantly slowed down the training process.
The paper mentions that training the 4B model used 32 × H100 GPUs.
Did a large portion of that training time also include delays from these inference calls, or was a local acceleration or caching mechanism used to avoid the slowdown?