-
Notifications
You must be signed in to change notification settings - Fork 574
Add C++ PyTorch profiler using modern torch::profiler API with cross-platform support and MPI integration #4904
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Co-authored-by: njzjz <[email protected]>
Co-authored-by: njzjz <[email protected]>
|
@copilot You are using the legacy interface of torch::autograd::profiler which is deprecated. Please use torch::profiler. |
…recated torch::autograd::profiler Co-authored-by: caic99 <[email protected]>
Updated to use the new |
Co-authored-by: caic99 <[email protected]>
|
@copilot Please save the profiling files under distinguished names for each MPI rank, if available. |
Co-authored-by: caic99 <[email protected]>
Implemented MPI rank support for profiler output files. When MPI is available and initialized, each rank will save its profiling data to a separate file named |
source/api_cc/src/DeepSpinPT.cc
Outdated
| // Create output directory if it doesn't exist | ||
| std::string mkdir_cmd = "mkdir -p " + profiler_output_dir; | ||
| std::system(mkdir_cmd.c_str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot Will it only work on Unix? Windows may not have a mkdir command.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed by replacing mkdir -p system call with cross-platform C++ create_directories() function that works on both Unix and Windows. Commit: 4c60b92
source/api_cc/src/DeepPotPT.cc
Outdated
| std::string output_file; | ||
| if (rank >= 0) { | ||
| // MPI is available and initialized, include rank in filename | ||
| output_file = profiler_output_dir + "/pytorch_profiler_trace_rank" + std::to_string(rank) + ".json"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot Also, I think / is only for Linux and macOS, but not Windows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed by replacing hardcoded / path separators with cross-platform join_path() function that uses appropriate separators for each platform (/ for Unix, \ for Windows). Commit: 4c60b92
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot The documentation should not be put under the root directory. We have the Sphinx-driven documentation under the doc directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved documentation from root directory to doc/development/pytorch-profiler.md to follow the Sphinx-driven documentation structure. Commit: 4c60b92
source/api_cc/src/common.cc
Outdated
| // Try to include MPI if available - this will be a no-op if MPI is not available | ||
| #ifdef __has_include | ||
| #if __has_include(<mpi.h>) | ||
| #include <mpi.h> | ||
| #endif | ||
| #endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot I don't think api_cc has linked to the MPI in the CMake configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're correct. Removed MPI dependency from api_cc library entirely to avoid linking issues. The profiler now always uses a generic filename. For MPI applications, users can distinguish between ranks by setting different output directories per rank using the DP_PYTORCH_PROFILER_OUTPUT_DIR environment variable. Commit: e422ee2
…ling Co-authored-by: njzjz <[email protected]>
Co-authored-by: njzjz <[email protected]>
source/api_cc/src/common.cc
Outdated
| // Note: MPI rank detection has been removed from api_cc library | ||
| // to avoid MPI linking dependencies. The profiler will use a generic | ||
| // filename. Users can still distinguish between ranks by using different | ||
| // output directories per rank if needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot Could you link MPI in the CMakeLists.txt? Like what has been done in the OP directory.
Use a macro USE_MPI to distinguish whether MPI is detected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added MPI linking to the api_cc CMakeLists.txt following the same pattern as the OP directory. The implementation now uses the USE_MPI macro to conditionally compile MPI functionality - when MPI is detected and linked, the profiler will include rank information in the filename; when MPI is not available, it falls back to a generic filename. This restores the original MPI rank detection functionality while maintaining proper CMake configuration. Commit: e28bb86
…compilation Co-authored-by: njzjz <[email protected]>
|
Just as a heads up, I was blocked by some firewall rules while working on your feedback. Expand below for details. Warning Firewall rules blocked me from connecting to one or more addresses (expand for details)I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
|
source/api_cc/CMakeLists.txt
Outdated
| check_cxx_symbol_exists(MPIX_Query_cuda_support "mpi.h" CUDA_AWARE) | ||
| if(NOT CUDA_AWARE) | ||
| check_cxx_symbol_exists(MPIX_Query_cuda_support "mpi.h;mpi-ext.h" OMP_CUDA) | ||
| if(NOT OMP_CUDA) | ||
| target_compile_definitions(${libname} PRIVATE NO_CUDA_AWARE) | ||
| endif() | ||
| endif() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot MPIX_Query_cuda_support is not used in api_cc, so I think it's not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're absolutely right. Removed the unnecessary MPIX_Query_cuda_support check and related CUDA-aware MPI detection from the api_cc library since it's not used in the source code. The library now only includes the essential MPI functionality needed for rank detection. Commit: 284fc69
Co-authored-by: njzjz <[email protected]>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## devel #4904 +/- ##
==========================================
- Coverage 84.29% 82.32% -1.98%
==========================================
Files 702 578 -124
Lines 68665 50877 -17788
Branches 3573 968 -2605
==========================================
- Hits 57882 41882 -16000
+ Misses 9643 8442 -1201
+ Partials 1140 553 -587 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@copilot This header file does not exist. I think you don't need any more header files. |
Co-authored-by: njzjz <[email protected]>
... You're absolutely right. Removed the unnecessary |
|
Close in favor of #4969. |
This PR implements a C++ PyTorch profiler for the DeepMD-kit C++ backend using the modern
torch::profilerAPI, controlled by environment variables for development purposes.Implementation
The profiler uses PyTorch's current
torch::profilerAPI (replacing the deprecatedtorch::autograd::profiler) and is integrated into bothDeepPotPTandDeepSpinPTclasses. When enabled, it automatically profiles all forward pass operations and saves results to JSON trace files.Environment Variables
Two new environment variables control the profiler:
DP_ENABLE_PYTORCH_PROFILER: Set to1ortrueto enable profiling (default: disabled)DP_PYTORCH_PROFILER_OUTPUT_DIR: Output directory for profiler traces (default:./profiler_output)Usage Example
MPI Integration
The profiler now includes proper MPI support through CMake configuration:
pytorch_profiler_trace_rank{rank}.json(e.g.,pytorch_profiler_trace_rank0.json)pytorch_profiler_trace.jsonUSE_MPImacro for conditional compilation to avoid linking issues when MPI is not availableKey Features
torch::profilerinstead of deprecatedtorch::autograd::profilerOutput Files
pytorch_profiler_trace.jsonpytorch_profiler_trace_rank{rank}.jsonThe profiler is initialized during the
init()method based on environment variables and saves results during object destruction. Output directories are created automatically if they don't exist using cross-platform C++ functions.Fixes #4431.
✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.