
Conversation


@Chengqian-Zhang Chengqian-Zhang commented Nov 8, 2025

In this PR:

  1. Support writing fitting stat to stat_file and loading fitting stat from stat_file
  2. Ensure the fitting stat calculation is correct when using default_fparam
  3. Support sharing fitting stat when using share_fitting in multitask mode.
  4. Log the progress of the fitting stat calculation via log.info.

Summary by CodeRabbit

  • New Features

    • Default frame parameters auto-fill missing samples and are exposed via a new accessor.
    • Per-parameter statistics can be computed, saved to disk, and restored.
    • Multitask training supports probability-weighted parameter sharing with a protection factor; default fparam propagated into data requirements.
    • Statistics item supports scalar scaling for aggregation.
  • Refactor

    • Parameter-sharing and statistic propagation flows reorganized for consistent buffering and persistence.
  • Tests

    • Extensive new tests for stat computation, persistence, and multitask sharing (includes new test data).


@github-actions github-actions bot added the Python label Nov 8, 2025
@Chengqian-Zhang Chengqian-Zhang marked this pull request as draft November 8, 2025 10:16

coderabbitai bot commented Nov 8, 2025

📝 Walkthrough

Adds default frame-parameter (fparam) accessors and automatic population; extends fitting statistics with on-disk persistence, NumPy-based aggregation, and protection; threads per-model probabilities and a protection factor into parameter-sharing across wrapper, trainer, and fitting layers; adds tests and test data for these behaviors.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **DP atomic & model API**<br>`deepmd/pt/model/atomic_model/dp_atomic_model.py`, `deepmd/pt/model/model/make_model.py` | Added `get_default_fparam()` to expose the default fparam; `wrapped_sampler` populates missing fparam from the default when available (see the sketch after this table). |
| **Fitting / statistics core**<br>`deepmd/pt/model/task/fitting.py` | Extended `share_params()` to accept a model probability and a protection factor; added stat persistence methods (`save_to_file_fparam`/`save_to_file_aparam`, `restore_*`); `compute_input_stats()` accepts `stat_file_path`, uses NumPy aggregation, and optionally loads/saves per-type stat files; added `get_stats()` and `get_default_fparam()`. |
| **Training orchestration**<br>`deepmd/pt/train/training.py` | Computes and normalizes per-model probabilities; derives and validates a common `data_stat_protect`; passes `model_key_prob_map` and `data_stat_protect` into `share_params`; propagates the default fparam into the `DataRequirementItem` for fparam. |
| **Wrapper parameter sharing**<br>`deepmd/pt/train/wrapper.py` | `share_params()` signature expanded to accept `model_key_prob_map` and `data_stat_protect`; computes a per-link `frac_prob` and forwards `model_prob` and the protection factor into the underlying `share_params` calls. |
| **Stat utilities**<br>`deepmd/utils/env_mat_stat.py` | `StatItem` constructor loosened to accept a float `number`; added `__mul__(self, scalar: float)` for scalar scaling of statistics. |
| **Tests & test data**<br>`source/tests/pt/model/water/data/...`, `source/tests/pt/test_fitting_stat.py` | Added raw test data files and comprehensive tests for stat computation, file I/O, multi-task weighting, protection, and default fparam handling. |
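
For orientation, the default-fparam auto-fill summarized in the first row might look roughly like this minimal sketch; the dict keys and helper name are assumptions based on the summary above, not deepmd's actual implementation:

```python
from typing import Optional

import numpy as np


def fill_default_fparam(sampled, default_fparam: Optional[np.ndarray]):
    """Hypothetical stand-in for wrapped_sampler's back-fill: every sampled
    system without an 'fparam' entry gets the model default broadcast to
    all of its frames. Key names are assumptions, not deepmd's actual ones."""
    for system in sampled:
        if default_fparam is not None and "fparam" not in system:
            nframes = system["coord"].shape[0]  # assumed per-frame coord array
            # Tile the 1-D default (dim_fparam,) to shape (nframes, dim_fparam).
            system["fparam"] = np.tile(default_fparam, (nframes, 1))
    return sampled
```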

Sequence Diagram(s)

sequenceDiagram
    participant Trainer
    participant Wrapper
    participant Fitting
    participant DPAtomicModel

    Trainer->>Trainer: build model_key_prob_map & data_stat_protect
    Trainer->>Wrapper: share_params(shared_links, model_key_prob_map, data_stat_protect)
    activate Wrapper
    Wrapper->>Wrapper: for each link compute frac_prob = prob_link/prob_base
    Wrapper->>Fitting: share_params(base, level, model_prob=frac_prob, protection=data_stat_protect, resume)
    deactivate Wrapper

    Trainer->>DPAtomicModel: request data requirements (asks for default fparam)
    DPAtomicModel->>Fitting: compute_input_stats(merged, protection, stat_file_path)
    activate Fitting
    alt stat files exist
        Fitting->>Fitting: restore_fparam_from_file / restore_aparam_from_file
    else
        Fitting->>Fitting: aggregate stats (NumPy), apply protection
        Fitting->>Fitting: save_to_file_fparam / save_to_file_aparam
    end
    Fitting->>DPAtomicModel: return stats and default_fparam (if any)
    deactivate Fitting
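The load-or-compute branch above could look roughly like the following sketch. It is not the code under review: the stat file names (`fparam_avg`, `fparam_std`) and exact call signatures are assumptions, although the `DPPath` helpers themselves (`is_dir`, `mkdir`, `load_numpy`, `save_numpy`) appear in the review's code graph below.

```python
from typing import Optional

import numpy as np

from deepmd.utils.path import DPPath


def load_or_compute_fparam_stat(merged, stat_file_path: Optional[DPPath], protection=1e-2):
    """Hypothetical sketch of the load-or-compute flow from the diagram."""
    # Restore branch: an earlier run already persisted the statistics.
    if stat_file_path is not None and stat_file_path.is_dir():
        avg = (stat_file_path / "fparam_avg").load_numpy()
        std = (stat_file_path / "fparam_std").load_numpy()
        return avg, std
    # Compute branch: aggregate fparam over all sampled frames with NumPy.
    data = np.concatenate([frame["fparam"] for frame in merged], axis=0)
    avg = data.mean(axis=0)
    # "Apply protection": floor the std so later normalization never divides by ~0.
    std = np.maximum(data.std(axis=0), protection)
    # Persist so subsequent runs can skip the data pass entirely.
    if stat_file_path is not None:
        stat_file_path.mkdir(exist_ok=True)
        (stat_file_path / "fparam_avg").save_numpy(avg)
        (stat_file_path / "fparam_std").save_numpy(std)
    return avg, std
```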

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Areas needing extra attention:

  • deepmd/pt/model/task/fitting.py — stat aggregation, NumPy ↔ torch conversions, file I/O format, StatItem usage, and buffer/link semantics when sharing params.
  • deepmd/pt/train/wrapper.py & deepmd/pt/train/training.py — correctness of model probability computation/normalization and propagation to share_params calls.
  • deepmd/pt/model/atomic_model/dp_atomic_model.py — sampler behavior when injecting default fparam and interplay with DataRequirementItem defaults.
  • Tests — validate test assumptions about file formats and stat protection logic.


Suggested reviewers

  • wanghan-iapcm
  • njzjz
  • anyangml

Pre-merge checks

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00%, below the required threshold of 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title describes a specific fix related to calculating fitting statistics with default fparam and shared fitting, which aligns with the main objectives of the PR. |




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
deepmd/pt/model/model/make_model.py (1)

9-9: Remove unused numpy import.

The numpy import is not used anywhere in this file.

Apply this diff:

-import numpy as np
deepmd/pt/train/training.py (1)

636-642: Fix unnecessary f-string prefix.

The assertion message on line 637 uses an f-string without any placeholders.

Apply this diff:

-            assert np.allclose(_data_stat_protect, _data_stat_protect[0]), f"Model key 'data_stat_protect' must be the same in each branch when multitask!"
+            assert np.allclose(_data_stat_protect, _data_stat_protect[0]), "Model key 'data_stat_protect' must be the same in each branch when multitask!"

The logic correctly validates consistency and propagates the protection value to parameter sharing.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 25fa707 and 4c3072e.

📒 Files selected for processing (6)
  • deepmd/pt/model/atomic_model/dp_atomic_model.py (3 hunks)
  • deepmd/pt/model/model/make_model.py (2 hunks)
  • deepmd/pt/model/task/fitting.py (6 hunks)
  • deepmd/pt/train/training.py (2 hunks)
  • deepmd/pt/train/wrapper.py (2 hunks)
  • deepmd/utils/env_mat_stat.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Always run ruff check . and ruff format . before committing changes to Python code

Files:

  • deepmd/pt/train/wrapper.py
  • deepmd/pt/model/atomic_model/dp_atomic_model.py
  • deepmd/utils/env_mat_stat.py
  • deepmd/pt/train/training.py
  • deepmd/pt/model/task/fitting.py
  • deepmd/pt/model/model/make_model.py
🧬 Code graph analysis (5)
deepmd/pt/train/wrapper.py (1)
deepmd/pt/model/task/fitting.py (1)
  • share_params (66-128)
deepmd/pt/model/atomic_model/dp_atomic_model.py (4)
deepmd/pt/model/model/make_model.py (2)
  • has_default_fparam (530-532)
  • get_default_fparam (535-536)
deepmd/pt/model/task/fitting.py (3)
  • has_default_fparam (599-601)
  • get_default_fparam (603-604)
  • compute_input_stats (208-269)
deepmd/pd/model/atomic_model/dp_atomic_model.py (2)
  • has_default_fparam (414-416)
  • wrapped_sampler (387-397)
deepmd/pt/model/atomic_model/base_atomic_model.py (1)
  • has_default_fparam (138-140)
deepmd/pt/train/training.py (4)
deepmd/pt/model/task/fitting.py (4)
  • share_params (66-128)
  • get_default_fparam (603-604)
  • has_default_fparam (599-601)
  • get_dim_fparam (595-597)
deepmd/pt/train/wrapper.py (1)
  • share_params (63-139)
deepmd/pt/model/atomic_model/dp_atomic_model.py (3)
  • get_default_fparam (355-356)
  • has_default_fparam (351-353)
  • get_dim_fparam (347-349)
deepmd/utils/data.py (1)
  • DataRequirementItem (745-825)
deepmd/pt/model/task/fitting.py (5)
deepmd/utils/path.py (13)
  • DPPath (28-158)
  • mkdir (149-158)
  • mkdir (270-282)
  • mkdir (472-490)
  • save_numpy (70-77)
  • save_numpy (200-211)
  • save_numpy (358-370)
  • load_numpy (50-57)
  • load_numpy (180-188)
  • load_numpy (335-343)
  • is_dir (115-116)
  • is_dir (249-251)
  • is_dir (439-445)
deepmd/utils/env_mat_stat.py (3)
  • StatItem (26-98)
  • compute_avg (58-73)
  • compute_std (75-98)
deepmd/pt/utils/utils.py (6)
  • to_numpy_array (224-224)
  • to_numpy_array (228-228)
  • to_numpy_array (231-247)
  • to_torch_tensor (251-251)
  • to_torch_tensor (255-255)
  • to_torch_tensor (258-276)
deepmd/pt/model/atomic_model/dp_atomic_model.py (1)
  • get_default_fparam (355-356)
deepmd/pt/model/model/make_model.py (1)
  • get_default_fparam (535-536)
deepmd/pt/model/model/make_model.py (3)
deepmd/pt/model/atomic_model/dp_atomic_model.py (1)
  • get_default_fparam (355-356)
deepmd/pt/model/task/fitting.py (1)
  • get_default_fparam (603-604)
deepmd/pt/model/network/network.py (1)
  • Tensor (36-37)
🪛 Ruff (0.14.3)
deepmd/pt/train/training.py

637-637: f-string without any placeholders

Remove extraneous f prefix

(F541)

deepmd/pt/model/task/fitting.py

269-270: Expected an indented block after if statement

(invalid-syntax)


272-272: unindent does not match any outer indentation level

(invalid-syntax)


272-272: Expected a statement

(invalid-syntax)


272-272: Expected a statement

(invalid-syntax)


272-273: Expected a statement

(invalid-syntax)


273-273: Unexpected indentation

(invalid-syntax)


297-297: unindent does not match any outer indentation level

(invalid-syntax)


298-298: Unexpected indentation

(invalid-syntax)


304-304: unindent does not match any outer indentation level

(invalid-syntax)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
  • GitHub Check: Test Python (6, 3.9)
  • GitHub Check: Test Python (5, 3.9)
  • GitHub Check: Test Python (4, 3.12)
  • GitHub Check: Test Python (5, 3.12)
  • GitHub Check: Test Python (4, 3.9)
  • GitHub Check: Test Python (6, 3.12)
  • GitHub Check: Test Python (3, 3.12)
  • GitHub Check: Test Python (3, 3.9)
  • GitHub Check: Test Python (2, 3.12)
  • GitHub Check: Test Python (1, 3.12)
  • GitHub Check: Test Python (1, 3.9)
  • GitHub Check: Test Python (2, 3.9)
  • GitHub Check: Build C++ (clang, clang)
  • GitHub Check: Build C++ (cuda, cuda)
  • GitHub Check: Build C++ (cuda120, cuda)
  • GitHub Check: Build C++ (rocm, rocm)
  • GitHub Check: Build C++ (cpu, cpu)
  • GitHub Check: Analyze (python)
  • GitHub Check: Build wheels for cp311-win_amd64
  • GitHub Check: Analyze (c-cpp)
  • GitHub Check: Build wheels for cp311-macosx_arm64
  • GitHub Check: Build wheels for cp310-manylinux_aarch64
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
  • GitHub Check: Build wheels for cp311-macosx_x86_64
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build C library (2.14, >=2.5.0,<2.15, libdeepmd_c_cu11.tar.gz)
  • GitHub Check: Test C++ (false)
  • GitHub Check: Test C++ (true)
🔇 Additional comments (13)
deepmd/utils/env_mat_stat.py (1)

51-56: LGTM!

The scalar multiplication operator correctly scales all statistical components for probability-weighted aggregation in multitask training. The implementation properly supports the weighted averaging workflow where statistics from multiple models are combined using probability weights.

deepmd/pt/model/model/make_model.py (1)

534-536: LGTM!

The method correctly delegates to the atomic model and follows the established pattern for other similar accessors in this class.

deepmd/pt/train/wrapper.py (1)

63-63: LGTM!

The extended signature correctly supports probability-weighted parameter sharing for multitask training. The parameters align with the updated share_params implementation in the fitting net.

deepmd/pt/model/atomic_model/dp_atomic_model.py (2)

329-337: LGTM!

The logic correctly populates missing fparam with default values when available. The check for both "find_fparam" and "fparam" ensures proper handling of data loading states.


342-342: LGTM!

The stat_file_path propagation enables proper persistence of fparam/aparam statistics, and the get_default_fparam method correctly delegates to the fitting net.

Also applies to: 355-356

deepmd/pt/train/training.py (2)

619-632: LGTM!

The model probability calculation correctly supports both explicit configuration and data-driven defaults, with proper normalization and validation to ensure a valid probability distribution.
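
As a rough illustration of what is being checked here, per-model probabilities might be derived like this; the config keys and the fallback-to-data-size rule are assumptions based on the summary, not the exact training.py code:

```python
import numpy as np


def normalize_model_probs(model_prob_cfg, model_keys, nbatches):
    """Sketch: use explicit per-model probabilities when configured,
    fall back to that branch's data size otherwise, then normalize
    so the result is a valid probability distribution."""
    probs = np.array(
        [float(model_prob_cfg.get(key, nbatches[key])) for key in model_keys]
    )
    assert (probs > 0.0).all(), "model probabilities must be positive"
    return probs / probs.sum()
```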


1344-1351: LGTM!

The default fparam handling correctly retrieves and converts the default value from the model, passing it to the data requirement with proper type conversion.

deepmd/pt/model/task/fitting.py (6)

66-128: LGTM!

The extended share_params correctly implements probability-weighted parameter sharing for multitask training. The logic properly accumulates weighted statistics for fparam/aparam buffers and links them to the base class.
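
Conceptually, the weighted accumulation reduces to scaling each branch's StatItem by its model probability before summing. A sketch using the `__mul__` added in this PR together with StatItem's existing `__add__` (method names per the code graph below; the numbers are illustrative):

```python
from deepmd.utils.env_mat_stat import StatItem

# Stats for one fparam dimension from two task branches.
stat_a = StatItem(number=100.0, sum=250.0, squared_sum=700.0)
stat_b = StatItem(number=60.0, sum=90.0, squared_sum=200.0)

# Probability-weighted merge: branch A carries 70% of the weight.
merged = stat_a * 0.7 + stat_b * 0.3
avg = merged.compute_avg()
std = merged.compute_std(protection=1e-2)  # protection floors tiny std values
```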


130-206: LGTM!

The persistence methods correctly save and restore fparam/aparam statistics using numpy arrays, with proper path handling and logging.


208-266: LGTM!

The fparam statistics computation correctly implements the load-or-compute pattern with proper persistence and type conversions.


304-310: LGTM!

The get_stats method properly validates that statistics have been computed before returning them.


603-604: LGTM!

The method correctly exposes the default fparam tensor and aligns with the existing has_default_fparam accessor.


11-11: LGTM!

The new imports are properly used throughout the file for type hints and statistics handling.

Also applies to: 45-50


codecov bot commented Nov 8, 2025

Codecov Report

❌ Patch coverage is 88.81579% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.26%. Comparing base (da452d7) to head (d6120a0).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| deepmd/pt/model/task/fitting.py | 87.80% | 15 Missing ⚠️ |
| deepmd/pt/train/training.py | 86.66% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##            devel    #5038      +/-   ##
==========================================
+ Coverage   84.24%   84.26%   +0.01%     
==========================================
  Files         709      709              
  Lines       70236    70345     +109     
  Branches     3623     3620       -3     
==========================================
+ Hits        59169    59274     +105     
- Misses       9900     9902       +2     
- Partials     1167     1169       +2     

☔ View full report in Codecov by Sentry.

@Chengqian-Zhang Chengqian-Zhang marked this pull request as ready for review November 9, 2025 12:20
Comment on lines +51 to +56
def __mul__(self, scalar: float) -> "StatItem":
return StatItem(
number=self.number * scalar,
sum=self.sum * scalar,
squared_sum=self.squared_sum * scalar,
)
Member


There are some type issues here: number is int and scalar is float, and int * float = float, so the result cannot be assigned to number (which expects an int).

Collaborator Author


When using share_fitting, the fitting input stats are weighted by model_prob, so StatItem.number may become a float.
I changed the type of number from int to float. Do you think that is reasonable?
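
For reference, a minimal sketch of StatItem after widening `number` to float; attribute names are taken from the snippet above, and the real class carries additional methods:

```python
class StatItem:
    """Sketch of StatItem after the type change.

    Weighted aggregation (stat * model_prob) can produce fractional
    sample counts, so `number` is a float rather than an int.
    """

    def __init__(
        self, number: float = 0.0, sum: float = 0.0, squared_sum: float = 0.0
    ) -> None:
        self.number = number
        self.sum = sum
        self.squared_sum = squared_sum
```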


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
deepmd/utils/env_mat_stat.py (1)

53-58: Consider adding __rmul__ for symmetric scalar multiplication.

Right now StatItem * scalar works but scalar * StatItem will not. If you expect stats to be scaled inside generic numeric code (e.g., with map/sum or broadcasting), adding __rmul__ improves ergonomics without changing behavior.

You could implement it as:

 class StatItem:
@@
     def __mul__(self, scalar: float) -> "StatItem":
         return StatItem(
             number=self.number * scalar,
             sum=self.sum * scalar,
             squared_sum=self.squared_sum * scalar,
         )
+
+    def __rmul__(self, scalar: float) -> "StatItem":
+        return self * scalar
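With __rmul__ in place, 0.5 * stat behaves identically to stat * 0.5, so generic numeric code that happens to put the scalar on the left keeps working.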
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6f18144 and d6120a0.

📒 Files selected for processing (1)
  • deepmd/utils/env_mat_stat.py (2 hunks)
🔇 Additional comments (1)
deepmd/utils/env_mat_stat.py (1)

31-42: Switching number to float aligns with scalar scaling and persistence.

Updating number to float in both the docstring and constructor is consistent with __mul__ and avoids the previous int/float type mismatch when scaling stats (e.g., by probabilities or weights). This also matches how values are saved/loaded via NumPy arrays where everything is stored as floats.

@Chengqian-Zhang
Collaborator Author

Could someone please help me rerun the UT?
I did not change any code related to the C library.

@iProzd iProzd enabled auto-merge November 20, 2025 11:13
auto-merge was automatically disabled November 23, 2025 09:01

Head branch was pushed to by a user without write access


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d6120a0 and 07483a7.

📒 Files selected for processing (2)
  • deepmd/pt/model/atomic_model/dp_atomic_model.py (2 hunks)
  • deepmd/pt/model/model/make_model.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
deepmd/pt/model/atomic_model/dp_atomic_model.py (2)
deepmd/pt/model/model/make_model.py (2)
  • has_default_fparam (531-533)
  • get_default_fparam (535-536)
deepmd/pt/model/task/fitting.py (2)
  • has_default_fparam (639-641)
  • get_default_fparam (643-644)
deepmd/pt/model/model/make_model.py (2)
deepmd/pt/model/atomic_model/dp_atomic_model.py (1)
  • get_default_fparam (373-374)
deepmd/pt/model/task/fitting.py (1)
  • get_default_fparam (643-644)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
  • GitHub Check: Build wheels for cp311-win_amd64
  • GitHub Check: Build wheels for cp311-macosx_arm64
  • GitHub Check: Build wheels for cp310-manylinux_aarch64
  • GitHub Check: Build wheels for cp311-macosx_x86_64
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Test Python (5, 3.12)
  • GitHub Check: Test Python (5, 3.9)
  • GitHub Check: Test Python (4, 3.9)
  • GitHub Check: Test Python (6, 3.9)
  • GitHub Check: Test Python (4, 3.12)
  • GitHub Check: Test Python (6, 3.12)
  • GitHub Check: Test Python (3, 3.9)
  • GitHub Check: Analyze (c-cpp)
  • GitHub Check: Test Python (2, 3.9)
  • GitHub Check: Test Python (3, 3.12)
  • GitHub Check: Test Python (1, 3.9)
  • GitHub Check: Test Python (2, 3.12)
  • GitHub Check: Test Python (1, 3.12)
  • GitHub Check: Test C++ (false)
  • GitHub Check: Analyze (python)
  • GitHub Check: Build C++ (rocm, rocm)
  • GitHub Check: Test C++ (true)
  • GitHub Check: Build C++ (cuda, cuda)
  • GitHub Check: Build C++ (cuda120, cuda)
  • GitHub Check: Build C++ (clang, clang)
  • GitHub Check: Build C++ (cpu, cpu)
  • GitHub Check: Build C library (2.14, >=2.5.0,<2.15, libdeepmd_c_cu11.tar.gz)
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
🔇 Additional comments (2)
deepmd/pt/model/model/make_model.py (1)

535-536: LGTM! Consistent delegation pattern.

The new get_default_fparam() method correctly delegates to the atomic model, following the same pattern as existing methods like has_default_fparam() (line 531-533). This provides clean access to default frame parameters through the model hierarchy.

deepmd/pt/model/atomic_model/dp_atomic_model.py (1)

373-374: LGTM! Consistent delegation pattern.

The new get_default_fparam() method correctly delegates to the fitting net, following the same pattern as has_default_fparam() (line 369-371). This provides the necessary accessor for default frame parameters.
