Handle fetch optimizer states for the KV ZCH load state dict case #4512

Open
wants to merge 1 commit into
base: main
from

q10
Contributor

@q10 q10 commented Jul 17, 2025

Summary:
This diff updates `KVZCHCachedData` to hold multiple optimizer states per table in `cached_optimizer_states_per_table`, and updates `apply_state_dict` to write multiple optimizer states per table row to the cache. This is needed so that other optimizers, such as Partial Rowwise Adam, can work with SSD TBE.
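
The change in shape described above can be sketched roughly as follows. This is a minimal illustration only, assuming plain Python lists and dicts in place of tensors; `CachedDataSketch` and `apply_rows` are hypothetical names and do not reflect the actual `KVZCHCachedData` fields or the `apply_state_dict` signature.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class CachedDataSketch:
    """Hypothetical stand-in for the cached data (not the real KVZCHCachedData)."""

    # Outer list: one entry per table.
    # Inner list: one entry per optimizer state (an Adam-style optimizer keeps two).
    # Each state maps a row id to that row's state values.
    cached_optimizer_states_per_table: List[List[Dict[int, List[float]]]] = field(
        default_factory=list
    )


def apply_rows(
    cache: CachedDataSketch,
    table_idx: int,
    row_ids: Sequence[int],
    states_per_row: Sequence[Sequence[Sequence[float]]],
) -> None:
    """Write every optimizer state of each given row into the cache for one table."""
    tables = cache.cached_optimizer_states_per_table
    # Make sure an entry exists for this table.
    while len(tables) <= table_idx:
        tables.append([])
    table_states = tables[table_idx]

    for row_id, row_states in zip(row_ids, states_per_row):
        # Make sure there is one slot per optimizer state.
        while len(table_states) < len(row_states):
            table_states.append({})
        for state_idx, values in enumerate(row_states):
            table_states[state_idx][row_id] = list(values)
```

The point of the sketch is only that each table now carries a list of states rather than a single one, so a row with two optimizer states lands in two separate per-state slots.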

There are 4 cases to handle when fetching the split optimizer states:

  1. The no-KV ZCH case
  2. The KV ZCH case where `self.load_state_dict` is `True` (i.e. fall back to `self._cached_kvzch_data`)
  3. The KV ZCH case where `self.load_state_dict` is `False` and `self.enable_optimizer_offloading` is `False`
  4. The KV ZCH case where `self.load_state_dict` is `False` and `self.enable_optimizer_offloading` is `True`

This diff completes the handling of returning optimizer states for case 2: the KV ZCH case where `self.load_state_dict` is `True`.
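
The four cases listed above amount to a small decision tree over three flags. The sketch below is illustrative only: `FetchPath`, `select_fetch_path`, and the `kv_zch_enabled` parameter are hypothetical names introduced here, and the actual code in this diff may structure the branching differently.

```python
from enum import Enum


class FetchPath(Enum):
    """Hypothetical labels for the four cases in the summary."""

    NO_KV_ZCH = 1             # case 1
    KV_ZCH_CACHED = 2         # case 2: read from the cached KV ZCH data
    KV_ZCH_NO_OFFLOADING = 3  # case 3
    KV_ZCH_OFFLOADING = 4     # case 4


def select_fetch_path(
    kv_zch_enabled: bool,
    load_state_dict: bool,
    enable_optimizer_offloading: bool,
) -> FetchPath:
    """Map the three flags onto one of the four cases."""
    if not kv_zch_enabled:
        return FetchPath.NO_KV_ZCH
    if load_state_dict:
        # The offloading flag is not consulted here; the cached data is used.
        return FetchPath.KV_ZCH_CACHED
    if not enable_optimizer_offloading:
        return FetchPath.KV_ZCH_NO_OFFLOADING
    return FetchPath.KV_ZCH_OFFLOADING
```

Ordering the checks this way mirrors the list: the `load_state_dict` check comes before the offloading check, so case 2 applies regardless of `enable_optimizer_offloading`.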

Reviewed By: emlin

Differential Revision: D77771359


netlify bot commented Jul 17, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | 1b04531 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/6878bb16dc24cb000845a4fc |
| 😎 Deploy Preview | https://deploy-preview-4512--pytorch-fbgemm-docs.netlify.app |

@meta-cla meta-cla bot added the cla signed label Jul 17, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D77771359

Summary:
X-link: facebookresearch/FBGEMM#1561

@q10 q10 force-pushed the export-D77771359 branch from 355313a to 1b04531 Compare July 17, 2025 08:57