-
-
Notifications
You must be signed in to change notification settings - Fork 370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new where reduction #1155
Add new where reduction #1155
Conversation
@@ -1159,6 +1223,75 @@ def _finalize(bases, **kwargs): | |||
raise NotImplementedError("mode is currently implemented only for rasters") | |||
|
|||
|
|||
class where(FloatingReduction): | |||
def __init__(self, selector: Reduction, lookup_column: str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs a docstring with a good example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added a docstring. It is somewhat tortuous English when written in a generic form so I have added a concrete example that should help.
I am using the argument name selector
because long-term I would like to divide the Reduction
class hierarchy into two:
Selection
reductions that use values from a column without modifying them, e.g.first
andmax
.Combination
reductions that do some form of mathematical combination of the values from a column, e.g.mean
,std
,count
.
Given that division, a where
reduction will accept a Selection
but not a Combination
to select values from the lookup_column
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, sounds good. Agreed that it's tortuous, but the example does help a lot.
Thanks! Can you clarify the current status of types? I.e. can you return an integer aggregate when testing on a float condition? |
|
Ok, I guess we'll need to deal with datatype issues when we support using the Pandas index as the "column" (actually just imputed values that act like a column, hence needing special support). |
Rebased on top of main to pick up the CI fixes. |
The reduction in coverage is mostly due to changes to the CUDA |
Codecov Report
@@ Coverage Diff @@
## main #1155 +/- ##
==========================================
+ Coverage 85.39% 85.43% +0.03%
==========================================
Files 35 35
Lines 7819 7941 +122
==========================================
+ Hits 6677 6784 +107
- Misses 1142 1157 +15
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
Pinging @jbednar. I'd like to merge this and add the extra functionality (such as use of a virtual integer row index) as separate PRs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those look like some painful changes, and it's great to have them behind us! Thanks, and it looks good to me!
datashader/reductions.py
Outdated
@@ -507,6 +531,10 @@ def __init__(self, cat_column, reduction=count()): | |||
self.categorizer = category_codes(cat_column) | |||
else: | |||
raise TypeError("first argument must be a column name or a CategoryPreprocess instance") | |||
|
|||
if isinstance(reduction, where): | |||
raise TypeError("'by' reduction cannot use a 'where' reduction") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make "use" more specific? Cannot accept, does not support, etc.? It's confusing to read about reductions using reductions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -1159,6 +1223,75 @@ def _finalize(bases, **kwargs): | |||
raise NotImplementedError("mode is currently implemented only for rasters") | |||
|
|||
|
|||
class where(FloatingReduction): | |||
def __init__(self, selector: Reduction, lookup_column: str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, sounds good. Agreed that it's tortuous, but the example does help a lot.
This partially implements issue #1126, adding a new
where
reduction that accepts either amax
ormin
reduction. Best illustrated via an example:which outputs
You can think of this using the
max('value')
reduction as normal, but then returning the corresponding values from the'other'
column rather that thevalue
column.What it currently supports:
where
takes either amin
ormax
selector reduction.summary
or categoricalby
reduction.Note that there is no support for use of
first
andlast
within awhere
because there is no advantage in doing this, you can just use thefirst
orlast
directly on their own.Future improvements:
lookup_column
is not specified, use the index of the row in the supplied DataFrame.max_n
,min_n
,first_n
,last_n
reductions.All of these are possible but fiddly to implement, so I would rather have partial functionality available for users to experiment with and I can add these improvements over time.
Currently some combinations of lines and dask give different results depending on the number of dask partitions, but this has always been the situation and is no worse here.