Conversation

@qianheng-aws
Collaborator

@qianheng-aws qianheng-aws commented Oct 24, 2025

Description

Support pushing down a sort that follows a limit when the limit is the only operation pushed down so far. Neither PPL nor other databases guarantee any row order for a bare limit operator, so it should be acceptable to transform limit + sort into sort + limit.
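
A minimal sketch of why this rewrite is acceptable, using a hypothetical in-memory list instead of an OpenSearch index. The class name, the rows, and both method names are assumptions made for illustration and are not code from this PR. Both forms return the requested number of rows in ascending order of the sort key; they only differ in which rows the limit happened to keep, which a bare limit never promised.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical in-memory model; not code from this PR.
class LimitSortDemo {
  // Each row is {id, age}; the list order stands in for whatever order the index returns.
  static final List<int[]> ROWS = List.of(
      new int[] {1, 30}, new int[] {2, 20}, new int[] {3, 40}, new int[] {4, 25});

  static final Comparator<int[]> BY_AGE = Comparator.comparingInt(r -> r[1]);

  // `head n | sort age` run literally: keep the first n rows, then sort them.
  static List<int[]> headThenSort(int n) {
    return ROWS.stream().limit(n).sorted(BY_AGE).collect(Collectors.toList());
  }

  // Pushed-down form, sort + limit: the n rows with the smallest age.
  static List<int[]> sortThenHead(int n) {
    return ROWS.stream().sorted(BY_AGE).limit(n).collect(Collectors.toList());
  }

  // Renders the age column as a comma-separated string.
  static String ages(List<int[]> rows) {
    return rows.stream().map(r -> String.valueOf(r[1])).collect(Collectors.joining(","));
  }

  public static void main(String[] args) {
    System.out.println(ages(headThenSort(2))); // first two rows, sorted by age
    System.out.println(ages(sortThenHead(2))); // two youngest rows, sorted by age
  }
}
```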

However, we should avoid pushing down the sort if another sort already precedes the limit. In that case, users intend to retrieve the top-K values on the first sort's fields from the index, and pushing down the second sort would override that order. This PR prevents that by detecting whether a top-K has already been pushed down.
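
The guard described here can be sketched roughly as follows. `PushState` and its methods are hypothetical names for illustration. The real logic lives in the scan and push down context classes touched by this PR, and may differ in detail.

```java
// Hypothetical model of what has been pushed into the index scan; names are illustrative only.
class PushState {
  private boolean sortPushed;
  private boolean limitPushed;

  boolean isLimitPushed() {
    return limitPushed;
  }

  // Top-K means the scan already applies both a sort and a limit.
  boolean isTopKPushed() {
    return sortPushed && limitPushed;
  }

  void pushLimit() {
    limitPushed = true;
  }

  // Returns true if the sort was pushed; refuses once a top-K is in place.
  boolean tryPushSort() {
    if (isTopKPushed()) {
      return false;
    }
    sortPushed = true; // with a bare limit already pushed, the scan now behaves as sort + limit
    return true;
  }

  public static void main(String[] args) {
    PushState headSort = new PushState(); // head 10 | sort a
    headSort.pushLimit();
    System.out.println(headSort.tryPushSort());

    PushState topKSort = new PushState(); // sort a | head 10 | sort b
    topKSort.tryPushSort();
    topKSort.pushLimit();
    System.out.println(topKSort.tryPushSort());
  }
}
```

In this model a bare limit no longer blocks the sort, while a second sort on top of an existing top-K stays outside the scan.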

Since #4501, a limit is always introduced before the sort for join or subsearch, which blocks sort push down. This PR also improves those scenarios, especially left join, where both sides now have limit and sort pushed down. See CalciteExplainIT::testExplainScalarCorrelatedSubqueryInSelect.

Related Issues

Resolves #4570

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@qianheng-aws qianheng-aws added the enhancement New feature or request label Oct 24, 2025
@LantaoJin LantaoJin changed the title Support push down sort after limit Support push down sort after system limit Oct 24, 2025
Signed-off-by: Heng Qian <[email protected]>
# Conflicts:
#	opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/OpenSearchIndexScanRule.java
#	opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/OpenSearchSortIndexScanRule.java
#	opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/PushDownContext.java
}

@Test
public void testHeadThenSort() throws IOException {
Contributor

[nit] Better to check the NoPushDownIT case as well. We may need a different branch for these two test cases.

Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
@LantaoJin
Member

High-level question: at first I thought you were targeting pushing down a sort through sysLimit, so I renamed the PR title. But from the description, it seems to be a general-purpose change to push down the sort through any limit?

@qianheng-aws
Collaborator Author

High-level question: at first I thought you were targeting pushing down a sort through sysLimit, so I renamed the PR title. But from the description, it seems to be a general-purpose change to push down the sort through any limit?

Yes, this is a general enhancement PR, not only a bug fix.

While looking for a solution to issue #4570, I found it hard to fix only that specific issue, because syslimit has already been replaced with Calcite's LogicalSort before optimization. So we are not able to do anything with the system limit operator we added.

But with this general enhancement, the original issue is also addressed.
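
To illustrate why the system limit cannot be singled out at optimization time, here is a simplified, hypothetical stand-in for a sort node that carries collation keys and an optional fetch. Calcite's own sort node follows the same general shape, but `SortNode` below is an assumption for illustration, not the real class. A limit is simply a sort node with no collation, so a user `head` and a system limit with the same row count look identical.

```java
import java.util.Arrays;
import java.util.List;

// Simplified, hypothetical stand-in for a Calcite-style sort node (not the real class).
final class SortNode {
  final List<String> collation; // sort keys; empty when the node only limits
  final Integer fetch;          // row cap; null when the node only sorts

  private SortNode(List<String> collation, Integer fetch) {
    this.collation = collation;
    this.fetch = fetch;
  }

  static SortNode limit(int n) {
    return new SortNode(List.of(), n);
  }

  static SortNode sort(String... keys) {
    return new SortNode(Arrays.asList(keys), null);
  }

  static SortNode topK(int n, String... keys) {
    return new SortNode(Arrays.asList(keys), n);
  }

  boolean isPureLimit() {
    return collation.isEmpty() && fetch != null;
  }

  boolean isTopK() {
    return !collation.isEmpty() && fetch != null;
  }

  String describe() {
    return "Sort(collation=" + collation + ", fetch=" + fetch + ")";
  }

  public static void main(String[] args) {
    SortNode userHead = SortNode.limit(100);    // from `head 100` (arbitrary count)
    SortNode systemLimit = SortNode.limit(100); // from a system limit with the same count
    System.out.println(userHead.describe());
    System.out.println(userHead.describe().equals(systemLimit.describe()));
  }
}
```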

@qianheng-aws qianheng-aws changed the title Support push down sort after system limit Support push down sort after limit Oct 27, 2025
// pushed down since we don't promise collation with only limit.
.predicate(
-Predicate.not(AbstractCalciteIndexScan::isLimitPushed)
+Predicate.not(AbstractCalciteIndexScan::isTopKPushed)
Member

@LantaoJin LantaoJin Oct 27, 2025

Can we add a case for the PPL below?

source = t | head 100 | stats count() as cnt | sort cnt

The sort cnt must not be pushed down through head 100.

Nor can the sort be pushed down for the following PPL.

source = t | head 100 | eval rand = rand() | sort rand

Member

Any sort expression evaluated after the limit cannot be pushed through the limit.
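
One way to picture this rule: a sort can only move beneath the limit if every sort key already exists in the scan output. The check below is a hypothetical sketch with made-up field names, not the PR's implementation. Keys such as `cnt` or `rand` from the two queries above are computed after the limit, so they are not available at the scan.

```java
import java.util.Arrays;
import java.util.Set;

// Hypothetical check for illustration; not code from this PR.
class SortKeyCheck {
  // Fields the index scan can produce directly (assumed for this example).
  static final Set<String> SCAN_FIELDS = Set.of("age", "balance");

  // True only if every sort key is a plain scan field.
  static boolean keysAvailableAtScan(String... sortKeys) {
    return SCAN_FIELDS.containsAll(Arrays.asList(sortKeys));
  }

  public static void main(String[] args) {
    System.out.println(keysAvailableAtScan("age"));  // plain index field
    System.out.println(keysAvailableAtScan("cnt"));  // produced by stats after head
    System.out.println(keysAvailableAtScan("rand")); // produced by eval after head
  }
}
```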

Collaborator Author

Neither case can push down the sort, and that is unrelated to whether there is a limit in the PPL query. The reasons for no push down are:

  1. We cannot push down a metric sort into an agg unless the agg is nullable=false.
  2. We don't currently support script push down for sort.

Member

Could you add these cases to the explain IT?

  1. This is not the case of pushing down a sort on agg metrics, because there is no bucket.
  2. Got it, but it is still better to add an explain IT.

LantaoJin previously approved these changes Oct 27, 2025
yuancu previously approved these changes Oct 27, 2025
@qianheng-aws qianheng-aws merged commit d4a2d19 into opensearch-project:main Oct 27, 2025
82 of 83 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Oct 28, 2025
* Support push down sort after limit

Signed-off-by: Heng Qian <[email protected]>

* Fix IT

Signed-off-by: Heng Qian <[email protected]>

* Fix IT after merging main

Signed-off-by: Heng Qian <[email protected]>

* spotless apply

Signed-off-by: Heng Qian <[email protected]>

---------

Signed-off-by: Heng Qian <[email protected]>
(cherry picked from commit d4a2d19)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
yuancu pushed a commit that referenced this pull request Oct 28, 2025
* Support push down sort after limit

* Fix IT

* Fix IT after merging main

* spotless apply

---------

(cherry picked from commit d4a2d19)

Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
asifabashar added a commit to asifabashar/sql that referenced this pull request Oct 28, 2025
* default-main: (34 commits)
  Enhance dynamic source clause to support only metadata filters (opensearch-project#4554)
  Make nested alias type support referring to outer context (opensearch-project#4673)
  Update big5 ppl queries and check plans (opensearch-project#4668)
  Support push down sort after limit (opensearch-project#4657)
  Use table scan rowType in filter pushdown could fix rename issue (opensearch-project#4670)
  Fix: Support Alias Fields in MIN, MAX, FIRST, LAST, and TAKE Aggregations (opensearch-project#4621)
  Fix bin nested fields issue (opensearch-project#4606)
  Add `per_minute`, `per_hour`, `per_day` function support (opensearch-project#4531)
  Pushdown sort aggregate metrics (opensearch-project#4603)
  Followup: Change ComparableLinkedHashMap to compare Key than Value (opensearch-project#4648)
  Mitigate the CI failure caused by 500 Internal Server Error (opensearch-project#4646)
  Allow renaming group-by fields to existing field names (opensearch-project#4586)
  Publish internal modules separately for downstream reuse (opensearch-project#4484)
  Revert "Update grammar files and developer guide (opensearch-project#4301)" (opensearch-project#4643)
  Support Automatic Type Conversion for REX/SPATH/PARSE Command Extractions (opensearch-project#4599)
  Replace all dots in fields of table scan's PhysType (opensearch-project#4633)
  Return comparable LinkedHashMap in `valueForCalcite()` of ExprTupleValue (opensearch-project#4629)
  Refactor JsonExtractAllFunctionIT and MapConcatFunctionIT (opensearch-project#4623)
  Pushdown case function in aggregations as range queries (opensearch-project#4400)
  Update GEOIP function to support IP types as input (opensearch-project#4613)
  ...

# Conflicts:
#	docs/user/ppl/functions/conversion.rst
asifabashar added a commit to asifabashar/sql that referenced this pull request Oct 28, 2025

Signed-off-by: Asif Bashar <[email protected]>
Labels

backport 2.19-dev enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] system limit from plugins.ppl.join.subsearch_maxout prevents sort push down for sort merge join

4 participants