between_time api for koalas dataframe #1968
Codecov Report
@@ Coverage Diff @@
## master #1968 +/- ##
=======================================
Coverage 94.60% 94.60%
=======================================
Files 49 50 +1
Lines 10890 10905 +15
=======================================
+ Hits 10302 10317 +15
Misses 588 588
@ueshin, @HyukjinKwon can you take a look at this PR?
-------
values_between_time : array of integers
"""
return self.index.to_pandas().indexer_between_time(
@shril can we avoid calling `to_pandas()`? It brings all the data from the other nodes to the single client node, which can easily cause an OOM.
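As a rough illustration of why collecting is not strictly required: the time-of-day check is a per-row predicate, so in principle it can be evaluated where each row lives instead of on the client. The sketch below is plain Python, not Koalas or Spark code; `in_time_window` is a hypothetical helper name, and the wrap-around handling is an assumption modelled on how pandas' `between_time` treats a start time later than the end time.

```python
from datetime import datetime, time


def in_time_window(ts, start, end, include_start=True, include_end=True):
    """Return True if the time-of-day of `ts` falls between `start` and `end`.

    Hypothetical helper for illustration only; not part of Koalas.
    """
    t = ts.time()
    after_start = t >= start if include_start else t > start
    before_end = t <= end if include_end else t < end
    if start <= end:
        # Ordinary window, e.g. 09:00 -> 17:00.
        return after_start and before_end
    # Window wraps past midnight, e.g. 22:00 -> 02:00.
    return after_start or before_end


rows = [
    datetime(2019, 1, 1, 0, 30),
    datetime(2019, 1, 1, 9, 15),
    datetime(2019, 1, 1, 23, 45),
]

# Each row is tested independently, so no global view of the index is needed.
kept = [ts for ts in rows if in_time_window(ts, time(9, 0), time(17, 0))]
```

Because the predicate only looks at one timestamp at a time, the same comparison could be expressed as a column filter in a distributed engine; how that would be wired into Koalas is not shown here.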
@HyukjinKwon, my idea here was to convert only the index column to pandas, since pandas already supports `DatetimeIndex` and therefore provides the `indexer_between_time()` function.
If you want me to, I can start with `DatetimeIndex` support in Koalas first, and then come back to this PR.
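For reference, a small pandas-only sketch of what `DatetimeIndex.indexer_between_time()` returns: integer positions of the index entries whose time-of-day falls inside the window. The sample timestamps are made up for illustration, and nothing here touches Koalas.

```python
import pandas as pd

# Four sample timestamps: 00:00, 08:00, 16:00, and 00:00 on the next day.
idx = pd.DatetimeIndex(
    [
        "2019-01-01 00:00",
        "2019-01-01 08:00",
        "2019-01-01 16:00",
        "2019-01-02 00:00",
    ]
)

# Integer positions of entries between 07:00 and 17:00 (both ends inclusive by default).
positions = idx.indexer_between_time("07:00", "17:00")

# The positions can then be used to select rows, e.g. with iloc.
pdf = pd.DataFrame({"value": [10, 20, 30, 40]}, index=idx)
selected = pdf.iloc[positions]
```

This works because the whole index is held in local memory, which is exactly the property that does not carry over to a distributed index.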
Thanks for the contribution, @shril. 😄
Yeah, I'd say we need `DatetimeIndex` support in Koalas rather than relying on pandas', because converting only the index column to pandas can also easily OOM.
@itholic, I'm starting a new PR for `DatetimeIndex`.
@HyukjinKwon can we leave this PR open until I finish the other PR?
Thanks. :)
@shril,
Sure, we can leave this PR as it is and revisit it after finishing `DatetimeIndex`! :D
@vmdhhh,
Thanks for your interest in Koalas!
`DatetimeIndex` in Koalas is currently not actually a `DatetimeIndex`, but just an `Index`.
It is only shown as `DatetimeIndex` when calling repr, as below.
>>> from datetime import datetime
>>> import databricks.koalas as ks
>>> idx = ks.Index([datetime(2019, 1, 1, 0, 0, 0), datetime(2019, 2, 3, 0, 0, 0)])
>>> idx  # Its repr is `DatetimeIndex` because Koalas internally converts it to pandas and uses pandas' repr.
DatetimeIndex(['2019-01-01', '2019-02-03'], dtype='datetime64[ns]', freq=None)
>>> type(idx)  # It is actually an instance of `Index`, not `DatetimeIndex`.
<class 'databricks.koalas.indexes.Index'>
>>> type(idx.to_pandas())
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>
`is_all_dates` just checks whether the data in the Index is of the Spark `TimestampType` type.
koalas/databricks/koalas/indexes.py
Line 1973 in b81afcc
return isinstance(self.spark.data_type, TimestampType)
Thank you @itholic for clearing this.
Currently |
Implement Koalas Missing APIs #1929