avoid using warning tensor in cpu tbe op #3631

842974287 · 2025-01-29T18:00:43Z

Summary:
The main reason for this change is that using the `warning` tensor in the CPU op can cause crashes for some models: https://fburl.com/tupperware/1rt3jvj7.

I'm not entirely sure why this line (https://fburl.com/code/x8obwbue) crashes when we call `__sync_fetch_and_add`. The pointer seems to be invalid, but I don't understand why it would be. However, the current implementation does have a race condition in inference: the `warning` tensor (https://fburl.com/code/1nu3ipli) comes from this module (https://fburl.com/code/99m9lt5e), and in inference multiple threads call the CPU ops, so every thread modifies the same tensor memory. (I'm still not sure why this would invalidate the pointer address, though.)

Using a tensor to control the log-once behavior in the CPU op does not appear to be necessary, so this change removes the use of the `warning` tensor from the CPU op to fix the race condition.
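The summary does not show the replacement code. As a minimal sketch, assuming the only goal is "warn at most once per process", a function-local `std::once_flag` with `std::call_once` gives thread-safe log-once behavior without reading or writing any tensor memory. `warn_once_hypothetical` and `simulate_concurrent_calls` are hypothetical names for illustration, not FBGEMM APIs, and the actual change may use a different mechanism.

```cpp
#include <atomic>
#include <cassert>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper (not an FBGEMM API): prints `msg` at most once per
// process, however many threads call it concurrently. The once-state is a
// function-local static, so no shared tensor storage is involved.
// Returns true only in the single thread that actually printed the message.
inline bool warn_once_hypothetical(const char* msg) {
  static std::once_flag flag;
  bool emitted = false;
  std::call_once(flag, [&] {
    std::cerr << "WARNING: " << msg << '\n';
    emitted = true;
  });
  return emitted;
}

// Hypothetical driver: simulates several inference threads hitting the
// CPU op at once and returns how many of them emitted the warning.
int simulate_concurrent_calls(int num_threads) {
  std::atomic<int> emitters{0};
  std::vector<std::thread> workers;
  workers.reserve(num_threads);
  for (int i = 0; i < num_threads; ++i) {
    workers.emplace_back([&emitters] {
      if (warn_once_hypothetical("example warning from CPU TBE op")) {
        emitters.fetch_add(1, std::memory_order_relaxed);
      }
    });
  }
  for (auto& w : workers) {
    w.join();
  }
  return emitters.load();
}
```

`std::call_once` guarantees the callable runs exactly once even under contention, so the first batch of threads produces exactly one warning and every later call is a no-op. Note that this state is per process rather than per module instance; if per-module logging were required, this sketch would not cover it.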

Differential Revision: D68840262


netlify · 2025-01-29T18:01:01Z

✅ Deploy Preview for pytorch-fbgemm-docs ready!

Name	Link
🔨 Latest commit	`75bf4f9`
🔍 Latest deploy log	https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/679a6cceea45b20008bf79fe
😎 Deploy Preview	https://deploy-preview-3631--pytorch-fbgemm-docs.netlify.app


facebook-github-bot · 2025-01-29T18:01:52Z

This pull request was exported from Phabricator. Differential Revision: D68840262

facebook-github-bot added the cla signed label Jan 29, 2025

facebook-github-bot added the fb-exported label Jan 29, 2025
