
Updated parse_policy_info function in augment.py #13509

Open

wants to merge 1 commit into master
Conversation

LakshmiKalaKadali (Collaborator)

This PR adds more flexible control over the randomness in the augmentation process by changing `level += tf.random.normal([], dtype=tf.float32)` to `level += level_std * tf.random.normal([], dtype=tf.float32)` in the function `_parse_policy_info`, instead of always using a standard deviation of 1.
```diff
@@ -1869,7 +1869,7 @@ def _parse_policy_info(name: str,
   func = NAME_TO_FUNC[name]

   if level_std > 0:
-    level += tf.random.normal([], dtype=tf.float32)
+    level += level_std * tf.random.normal([], dtype=tf.float32)
```
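
For context, a minimal, self-contained sketch of the perturbation this line performs after the change. This is not the repository code; the `perturb_level` helper and the `_MAX_LEVEL = 10.0` constant are stand-ins for the surrounding logic in `augment.py`:

```python
import tensorflow as tf

_MAX_LEVEL = 10.0  # stand-in for the constant used in augment.py


def perturb_level(level: float, level_std: float) -> tf.Tensor:
  """Illustrative helper: adds Gaussian noise to an augmentation level.

  Before this PR the noise always had standard deviation 1 whenever
  level_std > 0; after this PR the standard deviation is level_std itself.
  """
  level = tf.constant(level, dtype=tf.float32)
  if level_std > 0:
    level += level_std * tf.random.normal([], dtype=tf.float32)
  # Keep the perturbed level in the valid range, mirroring the clipping the
  # real code applies downstream.
  return tf.clip_by_value(level, 0.0, _MAX_LEVEL)


# With level_std=0.5 the perturbation is roughly half as wide as before.
print(float(perturb_level(5.0, 0.5)))
```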
Member

This will change the behavior completely. I am worried about the effects of this. Have you done any tests verifying it won't break existing results that use this augmentation?
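
To make the behavior change concrete: with the same random draw, the old and new formulas agree only when `level_std == 1.0`, so any existing config with a different `level_std` will see different augmentation levels. A quick, illustrative comparison (not part of the PR):

```python
import tensorflow as tf

tf.random.set_seed(42)
noise = tf.random.normal([], dtype=tf.float32)

level, level_std = 5.0, 0.3
old_level = level + noise              # pre-PR: noise std fixed at 1
new_level = level + level_std * noise  # post-PR: noise std = level_std
print(float(old_level), float(new_level))  # differ whenever level_std != 1
```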

Member

It's OK to bring more flexibility, but the default behavior should keep backward compatibility.
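
One hypothetical way to add the flexibility while keeping the default behavior unchanged (a sketch, not something proposed in the PR) is to introduce a separate scaling parameter, here called `noise_scale`, that defaults to 1.0:

```python
import tensorflow as tf


def perturb_level(level: tf.Tensor,
                  level_std: float,
                  noise_scale: float = 1.0) -> tf.Tensor:
  """Hypothetical backward-compatible variant (not from the PR).

  level_std keeps its original role: noise is only added when it is > 0.
  noise_scale is a new, hypothetical parameter defaulting to 1.0, so the
  default behavior matches the current code exactly; callers who want
  wider or narrower noise can override it.
  """
  if level_std > 0:
    level += noise_scale * tf.random.normal([], dtype=tf.float32)
  return level
```

With the default, existing configs behave exactly as today; only callers that explicitly set `noise_scale` get the new behavior.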

@yeqingli added the `ready to pull` (create internal pr review and merge automatically) label on Jan 14, 2025
@LakshmiKalaKadali removed the `ready to pull` (create internal pr review and merge automatically) label on Jan 20, 2025
@OrangeDoro left a comment

Hi! I'm a grad student working on a research project about using large language models to automate code review. Based on your commit 06cbb37 and the changes in official/vision/ops/augment.py, my tool generated this comment:

  1. Null Value Checks: The function _parse_policy_info does not check if the parameters are None before using them. It is advisable to add checks at the beginning of the function to ensure that these parameters are valid.
  2. Data Type and Range Validation: Ensure that level_std is validated before it is used in the multiplication. If level_std is negative or not a number, it could lead to unexpected behavior. Consider adding checks to ensure that level_std is a non-negative float.
  3. Type Checks: There are no checks to ensure that the types of the parameters are as expected. For instance, replace_value should be a list of integers, and level_std should be a float.
  4. Clipping of Level: Verify that the new value of level after scaling does not exceed _MAX_LEVEL or fall below 0, especially if level_std is large.
  5. Handling Abnormal Page Data: The code does not handle cases where level might exceed _MAX_LEVEL after the addition of noise. It is important to ensure that the input to tf.clip_by_value is valid.
  6. Functionality of level_to_arg: Ensure that the functions mapped in args can handle the new range of level values correctly. If any of these functions expect level to be within a specific range, the scaling could lead to errors or unexpected behavior.
  7. Function Argument Validation: The function level_to_arg returns a dictionary of functions based on the name parameter. There should be a check to ensure that name is valid and exists in the args dictionary.
  8. Scaling of Random Normal Value: Ensure that level_std is intended to be a scaling factor for the randomness; otherwise, this could introduce unintended behavior.
  9. Error Handling: Consider implementing error handling for cases where the random generation or subsequent calculations fail. This can prevent the application from crashing or behaving unpredictably.
  10. Testing: Implement unit tests that cover various scenarios, including edge cases where level_std is 0, very small, or very large, to ensure that the changes do not introduce any logical errors in the overall functionality.
  11. Testing for Variability: Add tests to verify the variability of level when level_std is set to different values (e.g., 0, positive values). Ensure that the output level reflects the expected range based on the input level_std.
  12. Boundary Tests: Add tests to check the behavior of the level variable when level_std is set to 0. The output should equal the input level.
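
For what it's worth, items 10-12 could be covered by a small test along these lines, written against the illustrative `perturb_level` helper sketched earlier in this thread (again, an assumption, not code from the repository):

```python
import tensorflow as tf

_MAX_LEVEL = 10.0


def perturb_level(level, level_std):
  # Same illustrative helper sketched earlier in the thread.
  level = tf.constant(level, dtype=tf.float32)
  if level_std > 0:
    level += level_std * tf.random.normal([], dtype=tf.float32)
  return tf.clip_by_value(level, 0.0, _MAX_LEVEL)


class LevelPerturbationTest(tf.test.TestCase):

  def test_zero_std_leaves_level_unchanged(self):
    # level_std == 0 skips the noise branch, so the level passes through.
    self.assertAllClose(perturb_level(5.0, 0.0), 5.0)

  def test_larger_std_gives_wider_spread(self):
    # Repeated draws with a larger level_std should spread out more.
    def spread(std, n=200):
      samples = [float(perturb_level(5.0, std)) for _ in range(n)]
      mean = sum(samples) / n
      return sum((s - mean) ** 2 for s in samples) / n

    tf.random.set_seed(0)
    small = spread(0.1)
    tf.random.set_seed(0)
    large = spread(2.0)
    self.assertGreater(large, small)


if __name__ == "__main__":
  tf.test.main()
```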

As part of my research, I'm trying to understand how useful these comments are in real-world development. If you have a moment, I'd be super grateful if you could quickly reply to these two yes/no questions:

  1. Does this comment provide suggestions from a dimension you hadn’t considered?

  2. Do you find this comment helpful?

Thanks a lot for your time and feedback! And sorry again if this message is a bother.

Labels
`models:official` (models that come under official repository)