In the training pipeline, the original implementation uses a locally deployed Qwen3-32B model for inference.
I replaced it with the Qwen API, and the training script runs correctly.
However, frequent API calls have significantly slowed down the training process.
The paper mentions that training the 4B model used 32 × H100 GPUs.
Did a large portion of that training time also include delays from these inference calls, or was a local acceleration or caching mechanism used to avoid the slowdown?