-
Couldn't load subscription status.
- Fork 176
Support push down sort after limit #4657
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
# Conflicts: # opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/OpenSearchIndexScanRule.java # opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/OpenSearchSortIndexScanRule.java # opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/PushDownContext.java
| } | ||
|
|
||
| @Test | ||
| public void testHeadThenSort() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] Better to check NoPushDownIT case. We may need different branch for these two test cases
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
|
High level question: At the beginning, I thought you are targeting to push down a |
Yes, this a general PR for enhancement not only bug fix. In the process of finding solution to the issue #4570, I found it hard to only fix that specific issue because But with this general enhancement, the original issue can also been addressed. |
| // pushed down since we don't promise collation with only limit. | ||
| .predicate( | ||
| Predicate.not(AbstractCalciteIndexScan::isLimitPushed) | ||
| Predicate.not(AbstractCalciteIndexScan::isTopKPushed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add a case for below ppl?
source = t | head 100 | stats count() as cnt | sort cnt
The sort cnt must not be pushed down through head 100.
Cannot for following ppl either.
source = t | head 100 | eval rand = rand() | sort rand
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any sort expression evaluated after limit cannot pushed through limit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both cases cannot push the sort and it's not related to whether there is limit in the PPL query. Their no push down reason are:
- We cannot push down metric sort into agg unless the agg is nullable=false
- We don't support script push down for sort currently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you add these cases in explain IT:
- It's not the case of pushdown sort agg metrics due to there is no bucket.
- got it. but better to add a explain IT.
* Support push down sort after limit Signed-off-by: Heng Qian <[email protected]> * Fix IT Signed-off-by: Heng Qian <[email protected]> * Fix IT after merging main Signed-off-by: Heng Qian <[email protected]> * spotless apply Signed-off-by: Heng Qian <[email protected]> --------- Signed-off-by: Heng Qian <[email protected]> (cherry picked from commit d4a2d19) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Support push down sort after limit * Fix IT * Fix IT after merging main * spotless apply --------- (cherry picked from commit d4a2d19) Signed-off-by: Heng Qian <[email protected]> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* default-main: (34 commits) Enhance dynamic source clause to support only metadata filters (opensearch-project#4554) Make nested alias type support referring to outer context (opensearch-project#4673) Update big5 ppl queries and check plans (opensearch-project#4668) Support push down sort after limit (opensearch-project#4657) Use table scan rowType in filter pushdown could fix rename issue (opensearch-project#4670) Fix: Support Alias Fields in MIN, MAX, FIRST, LAST, and TAKE Aggregations (opensearch-project#4621) Fix bin nested fields issue (opensearch-project#4606) Add `per_minute`, `per_hour`, `per_day` function support (opensearch-project#4531) Pushdown sort aggregate metrics (opensearch-project#4603) Followup: Change ComparableLinkedHashMap to compare Key than Value (opensearch-project#4648) Mitigate the CI failure caused by 500 Internal Server Error (opensearch-project#4646) Allow renaming group-by fields to existing field names (opensearch-project#4586) Publish internal modules separately for downstream reuse (opensearch-project#4484) Revert "Update grammar files and developer guide (opensearch-project#4301)" (opensearch-project#4643) Support Automatic Type Conversion for REX/SPATH/PARSE Command Extractions (opensearch-project#4599) Replace all dots in fields of table scan's PhysType (opensearch-project#4633) Return comparable LinkedHashMap in `valueForCalcite()` of ExprTupleValue (opensearch-project#4629) Refactor JsonExtractAllFunctionIT and MapConcatFunctionIT (opensearch-project#4623) Pushdown case function in aggregations as range queries (opensearch-project#4400) Update GEOIP function to support IP types as input (opensearch-project#4613) ... # Conflicts: # docs/user/ppl/functions/conversion.rst
* default-main: (34 commits) Enhance dynamic source clause to support only metadata filters (opensearch-project#4554) Make nested alias type support referring to outer context (opensearch-project#4673) Update big5 ppl queries and check plans (opensearch-project#4668) Support push down sort after limit (opensearch-project#4657) Use table scan rowType in filter pushdown could fix rename issue (opensearch-project#4670) Fix: Support Alias Fields in MIN, MAX, FIRST, LAST, and TAKE Aggregations (opensearch-project#4621) Fix bin nested fields issue (opensearch-project#4606) Add `per_minute`, `per_hour`, `per_day` function support (opensearch-project#4531) Pushdown sort aggregate metrics (opensearch-project#4603) Followup: Change ComparableLinkedHashMap to compare Key than Value (opensearch-project#4648) Mitigate the CI failure caused by 500 Internal Server Error (opensearch-project#4646) Allow renaming group-by fields to existing field names (opensearch-project#4586) Publish internal modules separately for downstream reuse (opensearch-project#4484) Revert "Update grammar files and developer guide (opensearch-project#4301)" (opensearch-project#4643) Support Automatic Type Conversion for REX/SPATH/PARSE Command Extractions (opensearch-project#4599) Replace all dots in fields of table scan's PhysType (opensearch-project#4633) Return comparable LinkedHashMap in `valueForCalcite()` of ExprTupleValue (opensearch-project#4629) Refactor JsonExtractAllFunctionIT and MapConcatFunctionIT (opensearch-project#4623) Pushdown case function in aggregations as range queries (opensearch-project#4400) Update GEOIP function to support IP types as input (opensearch-project#4613) ... Signed-off-by: Asif Bashar <[email protected]>
Description
Support push down sort after limit if there is only limit pushed, since PPL or other database all don't promise sequence for only limit operator. Thus it should be acceptable to transform
limit + sorttosort + limit.However, we should avoid pushing down
sort, if there is existingsortbeforelimit. In such case, users intense to retrieve the Top-K values on the first sort fields from our index. The order will be overridden by the secondsortif we keep pushing down it. So this PR prevents this case by detecting whethertop-kis pushed down already.Since #4501 has always introduced a limit before sort for join or subsearch, which will block sort push down. This PR will also enhance these scenarios, especially for left join -- both sides will have limit and sort pushed down then. See
CalciteExplainIT::testExplainScalarCorrelatedSubqueryInSelectRelated Issues
Resolves #4570
Check List
--signoffor-s.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.