between_time api for koalas dataframe #1968

shril · 2020-12-12T08:14:44Z

Implement Koalas Missing APIs #1929

codecov-io · 2020-12-12T11:03:48Z

Codecov Report

Merging #1968 (b81afcc) into master (b65891d) will increase coverage by 0.00%.
The diff coverage is 96.87%.

@@           Coverage Diff           @@
##           master    #1968   +/-   ##
=======================================
  Coverage   94.60%   94.60%           
=======================================
  Files          49       50    +1     
  Lines       10890    10905   +15     
=======================================
+ Hits        10302    10317   +15     
  Misses        588      588

Impacted Files	Coverage Δ
databricks/koalas/config.py	`99.00% <ø> (ø)`
databricks/koalas/plot/plotly.py	`94.73% <94.73%> (ø)`
databricks/koalas/plot/core.py	`91.72% <95.23%> (-1.05%)`	⬇️
databricks/koalas/frame.py	`96.79% <100.00%> (+0.04%)`	⬆️
databricks/koalas/series.py	`96.92% <100.00%> (+0.01%)`	⬆️
...bricks/koalas/tests/plot/test_frame_plot_plotly.py	`100.00% <100.00%> (ø)`
...ricks/koalas/tests/plot/test_series_plot_plotly.py	`96.61% <100.00%> (+0.31%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b65891d...8bae109. Read the comment docs.

shril · 2020-12-12T17:14:07Z

@ueshin, @HyukjinKwon can you take a look at this PR?

HyukjinKwon · 2020-12-16T05:29:28Z

databricks/koalas/frame.py

+        -------
+        values_between_time : array of integers
+        """
+        return self.index.to_pandas().indexer_between_time(


@shril, can we avoid calling to_pandas()? It brings all the data from the other nodes to the single client node, which can easily cause an OOM.

shril · 2020-12-16

@HyukjinKwon, my idea here was to convert just the index column to pandas, since pandas has API support for DatetimeIndex and hence the indexer_between_time() function.

If you want me to, I can also start on DatetimeIndex support in Koalas first, and then work on this PR.

itholic · 2020-12-17

Thanks for the contribution, @shril. 😄

Yeah, I'd say we need DatetimeIndex support in Koalas rather than using pandas', because converting only the index column to pandas can also easily cause an OOM.
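For context, the check behind indexer_between_time depends only on each row's own time of day, which is why it does not inherently require collecting the index on one node. The following is a minimal pure-Python sketch of that selection logic, not Koalas or pandas code; the helper name `positions_between_time` and the sample timestamps are hypothetical, and the wrap-around behaviour when the start time is later than the end time follows what pandas documents for indexer_between_time.

```python
from datetime import datetime, time


def positions_between_time(timestamps, start, end,
                           include_start=True, include_end=True):
    """Return integer positions whose time of day lies between start and end.

    Hypothetical helper for illustration only; it mirrors the documented
    semantics of pandas' indexer_between_time on a plain Python list.
    """
    def after_start(t):
        return t >= start if include_start else t > start

    def before_end(t):
        return t <= end if include_end else t < end

    if start <= end:
        # Ordinary window within a single day.
        def keep(t):
            return after_start(t) and before_end(t)
    else:
        # Window wraps past midnight, e.g. 22:00 to 01:00.
        def keep(t):
            return after_start(t) or before_end(t)

    return [i for i, ts in enumerate(timestamps) if keep(ts.time())]


# Sample data (made up for this sketch).
index = [
    datetime(2020, 12, 12, 0, 0),
    datetime(2020, 12, 12, 8, 14),
    datetime(2020, 12, 12, 17, 14),
    datetime(2020, 12, 13, 23, 30),
]

daytime = positions_between_time(index, time(8, 0), time(18, 0))
overnight = positions_between_time(index, time(22, 0), time(1, 0))
print(daytime)    # [1, 2]
print(overnight)  # [0, 3]
```

Because each position is decided from a single timestamp, the same predicate could in principle be expressed as a per-row filter on the distributed data instead of on a collected pandas index.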

shril · 2020-12-18

@itholic, I'm starting a new PR for DatetimeIndex.
@HyukjinKwon, can we leave this PR open until I implement the other PR?
Thanks. :)

vmdhhh · 2020-12-18

I am fairly new to this, so I am sorry if I sound naive, but doesn't the is_all_dates function in Koalas provide support for DatetimeIndex? @itholic @shril

itholic · 2020-12-18

@shril,
Sure, we can just leave this PR as it is and revisit it after finishing DatetimeIndex! :D

@vmdhhh,
Thanks for your interest in Koalas!
A DatetimeIndex in Koalas is currently not actually a DatetimeIndex, just an Index.
It is shown as DatetimeIndex only when calling repr, as below.

>>> from datetime import datetime
>>> import databricks.koalas as ks
>>> idx = ks.Index([datetime(2019, 1, 1, 0, 0, 0), datetime(2019, 2, 3, 0, 0, 0)])
>>> idx  # Its repr is `DatetimeIndex` since we internally convert this to pandas and use pandas' repr.
DatetimeIndex(['2019-01-01', '2019-02-03'], dtype='datetime64[ns]', freq=None)
>>> type(idx)  # So it is actually an instance of `Index`, not `DatetimeIndex`.
<class 'databricks.koalas.indexes.Index'>
>>> type(idx.to_pandas())
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>

is_all_dates just checks whether the data type of the Index is Spark's TimestampType.

koalas/databricks/koalas/indexes.py

Line 1973 in b81afcc

return isinstance(self.spark.data_type, TimestampType)

vmdhhh · 2020-12-18

Thank you, @itholic, for clearing this up.

xinrong-meng · 2021-08-03T21:35:59Z

between_time is now implemented: https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.between_time.html. May I close this pull request?

Shril Kumar added 8 commits December 12, 2020 13:43

between_time api for koalas dataframe

4e8963a

PEP8 Formatting

717a61e

Typo

a393654

Reformatted for black test

0fe20f4

Added test_between_time

e8ee30b

PEP8 formatting

da3b1b8

Ran dev/reformat

4d525a8

removed from unsupported funtion

00012ac

AssertionError fix

8bae109

HyukjinKwon reviewed Dec 16, 2020

shril mentioned this pull request Dec 19, 2020

Support for DateTime Index in Koalas #1976

Closed

xinrong-meng closed this Aug 3, 2021
