Fix: supporting gpt-oss HF eagle #398
Conversation
Walkthrough

Adds an eagle-only speculative-decoding export path gated by `spec_opt_only(model)`.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Caller
    participant Export as export_hf_checkpoint
    participant Plugin as hf_spec_export
    participant Saver as safetensors.save_file
    participant Default as _export_hf_checkpoint
    Caller->>Export: export_hf_checkpoint(model, out_dir)
    Export->>Plugin: spec_opt_only(model)
    alt eagle-only speculative
        Export->>Plugin: export_spec_ckpt_state_dict(model)
        Plugin-->>Export: draft_state_dict
        Export->>Saver: save_file(draft_state_dict, "out_dir/model.safetensors")
        Export->>Plugin: export_spec_ckpt_config(model)
        Plugin-->>Export: config_json
        Export->>Export: write "out_dir/config.json"
        Export-->>Caller: return (early exit)
    else non-speculative or mixed
        Export->>Default: _export_hf_checkpoint(model, out_dir)
        Default-->>Export: standard artifacts
        Export-->>Caller: return
    end
    note over Export,Plugin: Removed rename/prune and config-adjust hooks from normal path
```
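The gating predicate is the pivot of this diagram. Below is a minimal sketch of what `spec_opt_only` plausibly checks, assuming `_modelopt_state` is an iterable of (mode_name, config) pairs as hinted by the assertion-message suggestion later in this review; this is an illustration, not the actual implementation:

```python
# Illustrative sketch only -- the real spec_opt_only lives in
# modelopt/torch/export/plugins/hf_spec_export.py (lines 51-56).
# Assumption: _modelopt_state is an iterable of (mode_name, config) pairs.
def spec_opt_only(model) -> bool:
    state = getattr(model, "_modelopt_state", None)
    if not state:
        return False
    mode_names = [mode for mode, _config in state]
    return mode_names == ["eagle"]
```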
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks: ❌ 1 warning · ✅ 2 passed
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff           @@
##             main     #398   +/-   ##
=======================================
  Coverage   73.79%   73.79%
=======================================
  Files         171      171
  Lines       17591    17591
=======================================
+ Hits        12981    12982       +1
+ Misses       4610     4609       -1
```

☔ View full report in Codecov by Sentry.
Actionable comments posted: 0
🧹 Nitpick comments (4)
modelopt/torch/export/unified_export_hf.py (1)

512-518: Consider honoring the save_modelopt_state parameter and adding error handling.

The early-exit path correctly prevents errors with offline training checkpoints, but consider these improvements:

- The `save_modelopt_state` parameter (line 499) is unused in this path. If users request modelopt state preservation, should it be saved separately?
- File write operations lack error handling, unlike the try-except block in the standard export path (lines 520-550).
- Consider using `Path` operations for consistency: `export_dir / "model.safetensors"` instead of f-strings.

Optional refactor to use Path operations:

```diff
 if spec_opt_only(model):
-    save_file(export_spec_ckpt_state_dict(model), f"{export_dir}/model.safetensors")
-    with open(f"{export_dir}/config.json", "w") as file:
+    save_file(export_spec_ckpt_state_dict(model), export_dir / "model.safetensors")
+    with open(export_dir / "config.json", "w") as file:
         json.dump(export_spec_ckpt_config(model), file, indent=4)
     return
```

modelopt/torch/export/plugins/hf_spec_export.py (3)
77-79: Track the TODO for cleaner lm_head.weight handling.

The temporary fix for handling missing `eagle_lm_head.weight` works but should be addressed. The fallback to `model.state_dict()["lm_head.weight"]` could fail if the key doesn't exist in the base model either.

Do you want me to open a new issue to track this technical debt?
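For illustration, a more defensive version of the fallback could look like the sketch below; the key names come from the review text, but the helper itself is hypothetical, not the repository's code:

```python
# Hypothetical helper (not the repository's code): prefer the draft head,
# fall back to the base model's lm_head, and fail with a clear message.
def _get_lm_head_weight(draft_state_dict, model):
    if "eagle_lm_head.weight" in draft_state_dict:
        return draft_state_dict["eagle_lm_head.weight"]
    base_state = model.state_dict()
    if "lm_head.weight" in base_state:
        return base_state["lm_head.weight"]
    raise KeyError(
        "Neither 'eagle_lm_head.weight' nor 'lm_head.weight' found; "
        "cannot export the draft lm_head."
    )
```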
141-141: Fix typo in comment.

Minor typo: "load fron eagle config" should be "load from eagle config".

```diff
-    # First, we try to load fron eagle config.
+    # First, we try to load from eagle config.
```
62-62: Consider a more descriptive assertion message.

The assertion message "Not purely eagle model." could be more helpful for debugging. Consider providing information about what optimization modes were found.

```diff
-    assert spec_opt_only(model), "Not purely eagle model."
+    opt_modes = getattr(model, "_modelopt_state", None)
+    assert spec_opt_only(model), (
+        f"Expected purely eagle model but found optimization modes: {opt_modes}. "
+        "This export path only supports models with a single 'eagle' optimization."
+    )
```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (4)

- examples/speculative_decoding/README.md (1 hunks)
- modelopt/torch/export/plugins/hf_spec_export.py (2 hunks)
- modelopt/torch/export/unified_export_hf.py (3 hunks)
- modelopt/torch/speculative/eagle/default_config.py (1 hunks)
🧰 Additional context used

🧬 Code graph analysis (1)

modelopt/torch/export/unified_export_hf.py (1)
- modelopt/torch/export/plugins/hf_spec_export.py (3)
  - export_spec_ckpt_config (84-148)
  - export_spec_ckpt_state_dict (59-81)
  - spec_opt_only (51-56)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

- GitHub Check: wait-checks / wait
- GitHub Check: wait-checks / wait
🔇 Additional comments (1)

modelopt/torch/speculative/eagle/default_config.py (1)

50-50: Verify head_dim in eagle default_config.

In modelopt/torch/speculative/eagle/default_config.py (line 50), `head_dim` is set to 64. Confirm that this matches `hidden_size / num_attention_heads` in the same file (or document why it intentionally differs) to avoid silent mis-inference.
Actionable comments posted: 1
🧹 Nitpick comments (2)
modelopt/torch/export/unified_export_hf.py (2)
515-517: Prefer Path operations over f-string concatenation.

Since `export_dir` is already a `Path` object (line 509), using Path operations would be more idiomatic and portable.

Apply this diff:

```diff
-    save_file(export_spec_ckpt_state_dict(model), f"{export_dir}/model.safetensors")
-    with open(f"{export_dir}/config.json", "w") as file:
+    save_file(export_spec_ckpt_state_dict(model), str(export_dir / "model.safetensors"))
+    with open(export_dir / "config.json", "w") as file:
```
512-513: Consider replacing the temporary workaround with a permanent solution.

The comment indicates this is a "temp workaround to avoid error with offline spec ckpt during _export_hf_checkpoint". Consider investigating the root cause and implementing a more robust solution that properly handles speculative decoding models within the standard export flow.

Do you want me to help identify the specific errors that occur when passing speculative decoding models to `_export_hf_checkpoint`, or open an issue to track the permanent fix?
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (1)

- modelopt/torch/export/unified_export_hf.py (3 hunks)
🔇 Additional comments (3)

modelopt/torch/export/unified_export_hf.py (3)

29-29: LGTM!

The `save_file` import from `safetensors.torch` is correctly added to support the new speculative-only export path.

57-57: Verify the new import functions are properly implemented.

Ensure that `export_spec_ckpt_config`, `export_spec_ckpt_state_dict`, and `spec_opt_only` from `.plugins` are correctly implemented, handle edge cases, and have appropriate test coverage.

Run the following script to verify the functions exist and review their implementation:
512-518: Verify whether hf_quant_config.json should be saved for backward compatibility.

The main export path saves `hf_quant_config.json` for backward compatibility (line 524), but the early exit path for speculative-only models does not. Verify whether speculative-only models need this file or if downstream tools will handle its absence correctly.

Run the following script to check if any downstream code expects this file:
```python
    # NOTE: (hg) Early exit for speculative decoding models
    # This is a temp workaround to avoid error with offline spec ckpt during _export_hf_checkpoint
    if spec_opt_only(model):
        save_file(export_spec_ckpt_state_dict(model), f"{export_dir}/model.safetensors")
        with open(f"{export_dir}/config.json", "w") as file:
            json.dump(export_spec_ckpt_config(model), file, indent=4)
        return
```
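For context, calling the public entry point on a speculative-only model is what exercises this branch. A usage sketch, under the assumption that the call signature matches the sequence diagram above (the model variable and output directory are illustrative):

```python
# Illustrative usage; signature per the sequence diagram above.
from modelopt.torch.export.unified_export_hf import export_hf_checkpoint

# eagle_draft_model: a model converted with modelopt's eagle mode (assumed to exist).
export_hf_checkpoint(eagle_draft_model, "exported_eagle")
# For an eagle-only model this writes exactly two files:
#   exported_eagle/model.safetensors
#   exported_eagle/config.json
```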
Add error handling to the early exit path.

The early exit path lacks error handling that exists in the main export flow (lines 520-550). If `save_file` or the file write operations fail, the error will propagate without proper context or cleanup.

Apply this diff to add error handling:
```diff
 export_dir.mkdir(parents=True, exist_ok=True)
 # NOTE: (hg) Early exit for speculative decoding models
 # This is a temp workaround to avoid error with offline spec ckpt during _export_hf_checkpoint
 if spec_opt_only(model):
-    save_file(export_spec_ckpt_state_dict(model), f"{export_dir}/model.safetensors")
-    with open(f"{export_dir}/config.json", "w") as file:
-        json.dump(export_spec_ckpt_config(model), file, indent=4)
-    return
+    try:
+        save_file(export_spec_ckpt_state_dict(model), f"{export_dir}/model.safetensors")
+        with open(f"{export_dir}/config.json", "w") as file:
+            json.dump(export_spec_ckpt_config(model), file, indent=4)
+        return
+    except Exception as e:
+        warnings.warn(
+            "Cannot export speculative-only model to the model_config. The modelopt-optimized model state_dict"
+            " can be saved with torch.save for further inspection."
+        )
+        raise e
```
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
    export_dir.mkdir(parents=True, exist_ok=True)
    # NOTE: (hg) Early exit for speculative decoding models
    # This is a temp workaround to avoid error with offline spec ckpt during _export_hf_checkpoint
    if spec_opt_only(model):
        try:
            save_file(export_spec_ckpt_state_dict(model), f"{export_dir}/model.safetensors")
            with open(f"{export_dir}/config.json", "w") as file:
                json.dump(export_spec_ckpt_config(model), file, indent=4)
            return
        except Exception as e:
            warnings.warn(
                "Cannot export speculative-only model to the model_config. The modelopt-optimized model state_dict"
                " can be saved with torch.save for further inspection."
            )
            raise e
```
🤖 Prompt for AI Agents

In modelopt/torch/export/unified_export_hf.py around lines 512 to 518, the early exit branch that writes the speculative checkpoint lacks the error handling present in the main export path; wrap the calls to save_file(...) and writing config.json in a try/except, and on exception log a clear error message including the exception details, remove any partially written files (model.safetensors and config.json) and the export_dir if appropriate, then re-raise the exception so callers can handle it (mirror the main export flow's cleanup and logging behavior).
Force-pushed from e386c69 to 08bf1d3.
Actionable comments posted: 0
♻️ Duplicate comments (1)
modelopt/torch/export/unified_export_hf.py (1)
516-522: Add error handling to the early exit path.

The early exit path lacks error handling that exists in the main export flow (lines 524-554). If `save_file` or file write operations fail, errors will propagate without proper context or cleanup.

Apply this diff to add error handling:

```diff
 export_dir.mkdir(parents=True, exist_ok=True)
 # NOTE: (hg) Early exit for speculative decoding models
 # This is a temp workaround to avoid error with offline spec ckpt during _export_hf_checkpoint
 if spec_opt_only(model):
-    save_file(export_spec_ckpt_state_dict(model), f"{export_dir}/model.safetensors")
-    with open(f"{export_dir}/config.json", "w") as file:
-        json.dump(export_spec_ckpt_config(model), file, indent=4)
-    return
+    try:
+        save_file(export_spec_ckpt_state_dict(model), f"{export_dir}/model.safetensors")
+        with open(f"{export_dir}/config.json", "w") as file:
+            json.dump(export_spec_ckpt_config(model), file, indent=4)
+        return
+    except Exception as e:
+        warnings.warn(
+            "Cannot export speculative-only model to the model_config. The modelopt-optimized model state_dict"
+            " can be saved with torch.save for further inspection."
+        )
+        raise e
```
🧹 Nitpick comments (1)

examples/speculative_decoding/README.md (1)

118-118: Clarify "in-framework evaluation" terminology.

The note states that "in-framework evaluation is supported only for online training," but the term "in-framework" may not be immediately clear to users. Consider briefly explaining what this means (e.g., evaluation using the training framework vs. serving frameworks) for better clarity.

Consider this alternative wording:

```diff
-**Note**: In-framework evaluation is supported only for online training. For offline training checkpoints, please export the model and evaluate it using serving frameworks.
+**Note**: Direct evaluation using `ar_validate.py` is supported only for online training checkpoints (which include both the base and draft models). For offline training checkpoints (draft model only), please export the model first and evaluate it using serving frameworks like TRT-LLM or SGLang.
```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (4)

- examples/speculative_decoding/README.md (1 hunks)
- modelopt/torch/export/plugins/hf_spec_export.py (2 hunks)
- modelopt/torch/export/unified_export_hf.py (3 hunks)
- modelopt/torch/speculative/eagle/default_config.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- modelopt/torch/speculative/eagle/default_config.py
- modelopt/torch/export/plugins/hf_spec_export.py
🧰 Additional context used

🧬 Code graph analysis (1)

modelopt/torch/export/unified_export_hf.py (1)
- modelopt/torch/export/plugins/hf_spec_export.py (3)
  - export_spec_ckpt_config (84-148)
  - export_spec_ckpt_state_dict (59-81)
  - spec_opt_only (51-56)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: linux
- GitHub Check: wait-checks / wait
- GitHub Check: wait-checks / wait
- GitHub Check: code-quality
- GitHub Check: build-docs
🔇 Additional comments (3)

modelopt/torch/export/unified_export_hf.py (3)

29-29: LGTM!

The import of `save_file` from `safetensors.torch` is appropriate for the new early exit path that writes `model.safetensors`.

57-57: LGTM!

The updated imports align with the new speculative-only export architecture. The functions `spec_opt_only`, `export_spec_ckpt_state_dict`, and `export_spec_ckpt_config` properly handle the draft-only checkpoint export flow.

516-522: Confirm the head_dim addition in the default eagle config.

The entry `"head_dim": 128` exists in modelopt/torch/speculative/eagle/default_config.py.
Actionable comments posted: 0

🧹 Nitpick comments (1)

tests/examples/speculative_decoding/test_eagle.py (1)

30-30: LGTM! Consider verifying consistency with the base model.

The addition of `head_dim: 64` to the EAGLE config aligns with the PR objectives to support models where `hidden_size != num_heads * head_dim`. The value appears reasonable for this ultra-tiny test configuration.

For improved clarity and maintainability, consider:

- Adding a brief comment explaining the choice of `head_dim: 64` and how it relates to the base model's dimensions
- Verifying that this value is consistent with the `tiny_llama_path` model's `hidden_size`

Example:

```diff
 "num_attention_heads": 2,
 "num_key_value_heads": 2,
-"head_dim": 64,
+"head_dim": 64,  # Explicit head_dim to avoid incorrect inference from hidden_size
```
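As a quick illustration of the consistency check this comment asks for — the tiny model's `hidden_size` of 128 is an assumption here, only `head_dim: 64` is confirmed by the diff:

```python
# Assumed tiny-model dimensions; only head_dim = 64 comes from the diff above.
num_attention_heads = 2
head_dim = 64
hidden_size = 128  # assumption for the tiny test model
# With these numbers the explicit head_dim matches what HF would infer anyway:
assert hidden_size // num_attention_heads == head_dim
```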
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (1)

- tests/examples/speculative_decoding/test_eagle.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: linux
- GitHub Check: wait-checks / wait
- GitHub Check: wait-checks / wait
- GitHub Check: build-docs
- GitHub Check: code-quality
LGTM for the export code.
> Alternatively, we can export the checkpoint and run evaluation on serving frameworks. See sections below.
>
> **Note**: In-framework evaluation is supported only for online training. For offline training checkpoints, please export the model and evaluate it using serving frameworks.
qq, what's the difference between online training and offline training?
In online training, we load both the teacher model and the student model into GPU DRAM and run the teacher forward plus the student forward/backward in each training step.

In offline training, we run teacher inference first (with HF or TRT-LLM), dump the hidden states to disk, and then train the draft model alone on the dumped distillation signals. The resulting checkpoint therefore does not contain teacher modules. : )
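A compact sketch of the two flows described above; all function and keyword names here are hypothetical, for illustration only:

```python
import torch

# Online: teacher and draft share a training step, both resident in GPU memory.
def online_step(teacher, draft, batch, optimizer):
    with torch.no_grad():
        teacher_out = teacher(**batch, output_hidden_states=True)
    loss = draft(**batch, teacher_hidden=teacher_out.hidden_states[-1]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Offline: dump teacher hidden states once, then train the draft alone later.
@torch.no_grad()
def dump_teacher_signals(teacher, dataloader, out_dir):
    teacher.eval()
    for i, batch in enumerate(dataloader):
        out = teacher(**batch, output_hidden_states=True)
        torch.save(out.hidden_states[-1].cpu(), f"{out_dir}/hidden_{i}.pt")

def offline_step(draft, batch, hidden_path, optimizer):
    teacher_hidden = torch.load(hidden_path)  # distillation signal from disk
    loss = draft(**batch, teacher_hidden=teacher_hidden).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```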
Signed-off-by: h-guo18 <[email protected]>
Force-pushed from c4d51eb to 80c6cad.
What does this PR do?
Type of change: Bug fix
Overview:

This PR contains two minor fixes to support gpt-oss eagle training:

1. Added `head_dim` to the default eagle config to prevent Llama from inferring the head dim as `hidden_size / num_heads`. That inference yields the wrong head dim for models like GPT-OSS, where `hidden_size != num_heads * head_dim`.
2. Added an early exit for speculative-only checkpoints before `_export_hf_checkpoint`, which triggers an error for offline training checkpoints.

Other changes:
- `wandb.init()` in eagle_utils.py;

Usage
Not changed.
Testing
Tested with gpt-oss-120b: offline training, export, and evaluation of the exported checkpoint on spec-bench.
Summary by CodeRabbit

- New Features
- Documentation
- Refactor
- Config
- Tests
- Chores