refactor!: Rename filter_relationship_one_to_one to require_relationship_one_to_one and add drop_unique parameter
#67
|
@borchero @AndreasAlbertQC wdyt of this fix? One could argue that these functions are only to be used inside Happy to adjust the API if you have a better idea. |
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@            Coverage Diff            @@
##              main       #67   +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files           50        50
  Lines         2904      2908    +4
=========================================
+ Hits          2904      2908    +4
|
Title changed: "drop_non_unique parameter to filter_relationship_one_to_one" → "keep_only_unique parameter to filter_relationship_one_to_one"
|
@delsner I think your change is good, but I wonder if we need to do something else to mitigate confusing results if the dataframes are not unique on the join keys already. If I don't set the new flag, I will get nonsensical results, right?
I am hesitating between 1.3 and 2. |
Potentially, yes.
2 is easy, I can just update the docstring. Any thoughts @borchero? |
|
@delsner I'd be happy with counting option 1.3 as a bug fix bc I think the current behavior generates incorrect output data |
|
I made the current choice deliberately as additional uniqueness validations are really expensive. Hence, my initial thought would be to go for (2). Nevertheless, it potentially makes sense to make this behavior opt-in, i.e. require the user to set some kind of flag to skip the additional validation step. However, I'm not sure I love the extension of the |
|
@borchero and I decided to rename these functions to |
Title changed: "keep_only_unique parameter to filter_relationship_one_to_one" → "filter_relationship_one_to_one to require_relationship_one_to_one and add filter_unique parameter"
lgtm, thanks <3
Title changed: "filter_relationship_one_to_one to require_relationship_one_to_one and add filter_unique parameter" → "filter_relationship_one_to_one to require_relationship_one_to_one and add drop_unique parameter"
Thanks 😄
Motivation
filter_relationship_one_to_one only checks for a 1:1 relationship if both data frames are unique w.r.t. the provided join key. Also, the docstrings are incorrect, as the join columns cannot be inferred.
Changes
drop_unique to allow filtering for 1:1 or 1:{1,N} relationships in case join columns do not uniquely identify rows
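
To illustrate the motivation, here is a minimal pure-Python sketch of why a join-only check can give misleading results when the join key is not unique on both sides. It is not the library's implementation: the helper names (keys_via_inner_join, keys_with_exact_one_to_one) and the sample keys are hypothetical, and it deliberately does not model the drop_unique parameter itself.

```python
from collections import Counter


def keys_via_inner_join(lhs_keys, rhs_keys):
    """Hypothetical naive check: emit one key per matching (lhs, rhs) row pair,
    which is what an inner join on the key would produce."""
    return [left for left in lhs_keys for right in rhs_keys if left == right]


def keys_with_exact_one_to_one(lhs_keys, rhs_keys):
    """Hypothetical strict check: keep only keys that occur exactly once
    on the left side AND exactly once on the right side."""
    lhs_counts = Counter(lhs_keys)
    rhs_counts = Counter(rhs_keys)
    return sorted(
        key
        for key, count in lhs_counts.items()
        if count == 1 and rhs_counts.get(key, 0) == 1
    )


# Key 2 is duplicated on the left, key 3 is duplicated on the right,
# and key 4 has no partner on the left.
lhs = [1, 2, 2, 3]
rhs = [1, 2, 3, 3, 4]

naive = keys_via_inner_join(lhs, rhs)
strict = keys_with_exact_one_to_one(lhs, rhs)

print(naive)   # [1, 2, 2, 3, 3]
print(strict)  # [1]
```

The naive result still contains keys 2 and 3 even though neither is in a 1:1 relationship; only an explicit uniqueness check excludes them. As noted in the discussion, that extra validation is expensive, which is the trade-off weighed there.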