Status: Open (4 of 4 sub-issues completed)
Description
Checklist
- This refactor maintains backward compatibility with all user-facing APIs.
- For large-scale refactors, I've prepared a phased implementation plan.
Current Limitations
PR #528 is too large to review and merge as a single unit.
Proposed Refactor Plan
The following changes will be implemented across separate, focused pull requests:
- Refactor the `InferenceEngine` API to accept importable strings (plus init kwargs) as rollout workflow arguments (#525)
- Refactor the `InferenceEngine` API to support launching server processes (#530)
- Modify the `wait` method of `InferenceEngine` to allow silent `TimeoutError` handling (#531)
- Consolidate the sglang and vllm engine unit tests (#514)
- Refactor training statistics collection to use inline `stats_tracker` calls throughout and disregard returned statistics (#532)
- Implement a Flask-based RPC server with serialization and corresponding tests (#533)
- Implement `LocalScheduler` with corresponding tests
- Implement `TrainController` with an SFT example
- Implement `RolloutController` with an eval-only example (#469)
- Add an end-to-end single-controller RL training script
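To illustrate the first item, here is a minimal sketch of resolving a workflow given as an importable string path plus init kwargs. The helper name `resolve_workflow` and the accepted path forms (`package.module.ClassName` or `package.module:ClassName`) are hypothetical and not taken from #525; the sketch only shows the general `importlib` technique. One plausible benefit of string paths is that they are easy to serialize when a workflow must be constructed inside another process, such as a remote server or controller.

```python
import importlib
from typing import Any, Optional


def resolve_workflow(workflow: Any, workflow_kwargs: Optional[dict] = None) -> Any:
    """Return a workflow object from either an instance or an importable path.

    Hypothetical helper (not the project's actual API): accepts
    "package.module.ClassName" or "package.module:ClassName" and
    instantiates the resolved class with workflow_kwargs.
    """
    if not isinstance(workflow, str):
        # Already a constructed workflow object; pass it through unchanged.
        return workflow

    # Split the path into the module to import and the attribute to fetch.
    if ":" in workflow:
        module_path, _, attr_name = workflow.partition(":")
    else:
        module_path, _, attr_name = workflow.rpartition(".")
    if not module_path or not attr_name:
        raise ValueError(f"Not an importable path: {workflow!r}")

    module = importlib.import_module(module_path)
    cls = getattr(module, attr_name)
    # Construct the workflow with the (serializable) init kwargs.
    return cls(**(workflow_kwargs or {}))
```

For example, `resolve_workflow("fractions:Fraction", {"numerator": 1, "denominator": 3})` builds `Fraction(1, 3)`, while passing an existing object returns it unchanged, so callers that already construct workflows directly keep working.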