avoid using warning tensor in cpu tbe op #3631
Open
Summary:
The main reason for this change is that the current code can cause crashes for some models: https://fburl.com/tupperware/1rt3jvj7.

I'm not entirely sure why the crash occurs on this line https://fburl.com/code/x8obwbue when we call `__sync_fetch_and_add`. It seems the pointer is invalid, but I don't understand how it could become invalid. However, the current implementation does have a race condition in inference. The `warning` tensor https://fburl.com/code/1nu3ipli comes from this module https://fburl.com/code/99m9lt5e, and in inference multiple threads call the CPU ops. Each thread modifies the same tensor memory (I'm still not sure why this would invalidate the pointer address, though).

It does not appear necessary to use a tensor to control the log-once behavior in the CPU op. This change removes the use of the `warning` tensor in the CPU op to fix the race condition.

Differential Revision: D68840262
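
For illustration only, here is a minimal sketch of one way to get log-once behavior across threads without any shared tensor state. It is not the code in this PR: the helper names (`claim_first_warning`, `op_with_warning`, `count_warnings`) and the warning message are hypothetical, and the sketch assumes a plain `std::atomic<bool>` flag is an acceptable substitute for the counter.

```cpp
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

// Hypothetical helper (not from the PR): returns true for exactly one caller.
inline bool claim_first_warning(std::atomic<bool>& already_warned) {
  // exchange() atomically sets the flag and returns its previous value,
  // so only the first thread to arrive observes `false`.
  return !already_warned.exchange(true, std::memory_order_relaxed);
}

// Hypothetical stand-in for the CPU op body; returns 1 if this call logged.
inline int op_with_warning(std::atomic<bool>& already_warned) {
  if (claim_first_warning(already_warned)) {
    std::fprintf(stderr, "warning emitted (logged once)\n");
    return 1;
  }
  return 0;
}

// Calls the stand-in op from several threads and counts emitted warnings.
inline int count_warnings(int num_threads) {
  std::atomic<bool> already_warned{false};
  std::atomic<int> logged{0};
  std::vector<std::thread> threads;
  for (int i = 0; i < num_threads; ++i) {
    threads.emplace_back(
        [&] { logged += op_with_warning(already_warned); });
  }
  for (auto& t : threads) {
    t.join();
  }
  return logged.load();
}
```

In this sketch the flag lives in ordinary process memory rather than in a tensor passed into the op, so concurrent inference threads never write to tensor storage just to decide whether to log.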