DeepNVMe update #7215

tjruwase · 2025-04-12T18:35:24Z

FastPersist
ZeRO-Inference+SGLang

* Integrate NVIDIA GPUDirect Storage into nvme library
* 1) Remove debug prints; 2) Create write file with random data; 3) Delete target file before new writes
* Workaround gds perf issue by leaking buffers
* DGX2 mount/unmount utilities
* Formatting
* Add torch save/load
* Add torch save/load
* Remove gds
* Add torch legacy save
* Update to new cli
* Add function signatures; add file_offset arg to read/write APIs
* Remove redundant asserts
* Add DeepSpeedFileWriter
* Add mock and python file writers
* Format fixes
* More perf counters
* Fix pinned_offset bug; Show as not real python file object
* Buffer copy speed
* Add torch_fastio option
* Format fixes
* Measure torch_fastio perf
* Force flush
* Formatting
* Renamings
* Fix device bug
* Disable torch.distributed requirement
* Renaming
* Integrate fast model checkpointing
* Double I/O buffer optimization
* Support larger sizes
* Refactoring; save_storage api
* Cast to byte tensor
* Handle storage object saves
* Remove mysterious import
* Api to save storage object list; refactor stats
* add pytorch optimization
* fixed some syntax errors
* comment out save_storage for mock
* uncomment save storage for mock
* fixed indentation
* Yangli2/fastio double buffer pytorch optimized (#291)
* making deepspeed/runtime/fp16/loss_scaler/dynamiclossscale serializable
* Dump fast_writer stats only on rank 0
* Configuration option for fused fp16 optimizer
* Update to new API
* Format fixes
* Update to master (#340)
* Support torch* optimization for version 1.12
* Formatting
* Versioned torch* optimization
* Versioned torch* optimizations (#341)
* fp16 fused mode
* fp16 fused mode (#342)
* Support serialization versions
* Support serialization of different torch versions (#343)
* distributed ckpt draft (#349)
  * inject parallel write
  * Support serialization of different torch versions (#343) (#345)
  * finish split distributed write
  * split based-on num_bytes
  * resolving single node python test
  * remove irrelevant prints
  * format
* torch serialization options
* Configurable torch serialization (#350)
* Distributed writer slicing on byte boundary
* Fix typo
* FastFileWriter Config; Parallel writer nodes
* Minor fix
* remove warning from fast-io-ckpt (#354)
* Relocate debug print
* Parallel writing through byte boundary slicing (#351)
* fix broken mock_file_writer (#357)
* Report write speed
* DP writing
* DP MoE checkpoints; generalize DP dense checkpoints for socket/machine options
* Various improvements (#376)
* Decoupled checkpointing
* New MP slicing algorithm
* Format fixes
* Decoupled checkpointing support (#384)
* add io multiplier for larger scale simulation (#411)
  * add io multiplier config for simulation
  * remove prints and test correctness
  * format
* Merge with master
* Format fixes
* Guanhua/fast io clean v5 (#435)
  * Add environment variable to make nvcc compilation more verbose (#2759)
  * Bing/formatting correction (#2764)
    * modify engine.py for formatting
    * commit formatting changes on engine.py
  * Add links to new azureML examples (#2756)
  * Fix hardcoded instances to fp16 in optimizer creation log messages to the correct dtype. (#2743)
    * Remove hardcoded instances to fp16 in log messages.
    * Add model_dtype to print the correct format
    * Respond to PR feedback
  * Refactor/Pydantify monitoring config (#2640)
    * pydantify monitoring configs
  * Pin minimum `packaging` requirement (#2771)
  * Fix for diffusers v0.12.0 (#2753)
  * update copyright in aio
  * type fix in ds_py_aio_handle
  * update year in aio/py_test
  * fix description in util pybind
  * update and remove prints in fast_file_writer
  * remove del print
  * remove dist barrier in engine.py
  * update year in runtime/model_ckpt
  * add todo in runtime/model_ckpt/util.py
  * update year
  * reverse pip3
  * update opbuilder
  * format
  * modify print for python
  * fix print capability
  * fix print
  * some fix in flops_profiler (#2068)
    * bugs in profiler:
      1. Tensor.bmm missed in _patch_tensor_methods function
      2. missed functions in _reload_functionals and _reload_tensor_methods functions
      3. torch.mm and torch.Tensor.mm will have same __name__ in wrapFunc, my suggestion is use __str__ instead.
    * formatting
  * fix upsample flops compute by skipping unused kargs (#2773)
    * fix upsample flops compute by skipping unused kargs
    * fix format
    * format
  * Fix broken kernel inject bug (#2776)
  * format
  * remove zero change
  * fix engine issue
* Formatting
* Formatting
* Debug file delete slowdown
* Investigate write perf
* Investigate write perf
* Fix missing args
* Fix microbenchmark and unit tests (#450)
* Formatting
* Rebase attempts
* updates for running with newest dependencies
* Pydantic fixes
* Rebase fixes
* Fix rebase bugs
* Add DS utils for tensor casting
* Format fixes
* Fix GDS
* Update with io_engine API
* Continued rebase
* Integrate GDS into writer factory
* Add --venv_script option
* Formatting fix

Co-authored-by: jerryyangli <[email protected]>
Co-authored-by: Yang Li <[email protected]>
Co-authored-by: Guanhua Wang <[email protected]>
Co-authored-by: Connor Holmes <[email protected]>
Co-authored-by: Bing Xie <[email protected]>
Co-authored-by: cassieesvelt <[email protected]>
Co-authored-by: Jeff Rasley <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
Co-authored-by: Michael Wyatt <[email protected]>
Co-authored-by: Carlos Mocholí <[email protected]>
Co-authored-by: swli <[email protected]>
Co-authored-by: Cheng Li <[email protected]>
Co-authored-by: Molly Smith <[email protected]>
Co-authored-by: Ubuntu <[email protected]>

Signed-off-by: Olatunji Ruwase <[email protected]>

tjruwase and others added 23 commits February 25, 2025 13:17

Rename DeepNVMe blog

4c75bb8

Signed-off-by: Olatunji Ruwase <[email protected]>

Rebase docs/blogs

b1a3ea8

Signed-off-by: Olatunji Ruwase <[email protected]>

Rebase with master

94ec9da

Signed-off-by: Olatunji Ruwase <[email protected]>

Rebase docs

24fa62a

Signed-off-by: Olatunji Ruwase <[email protected]>

New blog

fcaff17

Signed-off-by: Olatunji Ruwase <[email protected]>

Update doc

5c8e2ac

Signed-off-by: Olatunji Ruwase <[email protected]>

Update doc

b146114

Signed-off-by: Olatunji Ruwase <[email protected]>

Update blog

4343ef9

Signed-off-by: Olatunji Ruwase <[email protected]>

Update blog

772fd9d

Signed-off-by: Olatunji Ruwase <[email protected]>

Add FastPersist results

efd5712

Signed-off-by: Olatunji Ruwase <[email protected]>

Add FastPersist results

41df704

Signed-off-by: Olatunji Ruwase <[email protected]>

Fix cast bug of zero-sized tensors

924f8ef

Signed-off-by: Olatunji Ruwase <[email protected]>

More tweaks

cbf98c0

Signed-off-by: Olatunji Ruwase <[email protected]>

More tweaks

d2625cc

Signed-off-by: Olatunji Ruwase <[email protected]>

More tweaks

82fb720

Signed-off-by: Olatunji Ruwase <[email protected]>

Acknowledgements

2780c8a

Signed-off-by: Olatunji Ruwase <[email protected]>

Acknowledgements

dfccd4a

Signed-off-by: Olatunji Ruwase <[email protected]>

Fix filename dangling pointer

898ea6d

Signed-off-by: Olatunji Ruwase <[email protected]>

sglang/zero_inference support

f82fb75

Blog updates

07a9bd0

Rebase

840fa8c

Formatting

05b3bdf

tjruwase requested review from tohtana and jomayeri April 12, 2025 18:35

tjruwase requested a review from loadams as a code owner April 12, 2025 18:35

tjruwase requested a review from GuanhuaWang April 12, 2025 18:35

tjruwase added 3 commits April 12, 2025 14:53

Update news section

26db85f

Revert unneeded changes

7a30fba

rebase

2a8cbd0

tjruwase mentioned this pull request Apr 12, 2025

DeepNVMe update deepspeedai/DeepSpeedExamples#966

Open

Fix UTs

11ef763

Signed-off-by: Olatunji Ruwase <[email protected]>
