Conversation

@nastra
Contributor

@nastra nastra commented Nov 18, 2025

This pushes down the LIMIT from Spark to the underlying Scan. Spark is still expected to apply the LIMIT itself, but its value is pushed down through the Scan and used as min-rows-requested (introduced by #14565) for server-side scan planning. It serves as a hint so the server does not have to return more rows than necessary: the server is not required to return that many rows, since the scan may not produce them, and it may also return more rows than requested.
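
For readers less familiar with the Spark DSv2 side of this, the sketch below shows roughly how a pushed LIMIT can reach a scan builder and be forwarded as a row-count hint. This is a minimal illustration, not the PR's actual code: only Spark's SupportsPushDownLimit interface is assumed, and LimitAwareScanBuilder / pushedLimit() are invented names.

```java
import org.apache.spark.sql.connector.read.SupportsPushDownLimit;

// Minimal sketch (invented names, not the PR's SparkScanBuilder): Spark hands the
// LIMIT to the scan builder via SupportsPushDownLimit, and the builder can forward
// it to the underlying scan as a min-rows-requested hint.
abstract class LimitAwareScanBuilder implements SupportsPushDownLimit {
  private Integer pushedLimit; // null means no LIMIT was pushed

  @Override
  public boolean pushLimit(int limit) {
    this.pushedLimit = limit;
    // isPartiallyPushed() defaults to true, so Spark still applies the LIMIT on
    // top of whatever the source returns; the pushed value is only a hint.
    return true;
  }

  protected Integer pushedLimit() {
    return pushedLimit; // a builder would pass this to the scan as the hint
  }
}
```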

@nastra nastra changed the title from "Spark, Core: Add Limit pushdown to Scan" to "Spark 4.0, Core: Add Limit pushdown to Scan" on Nov 18, 2025
Comment on lines 297 to 299
public ThisT minRowsRequested(int numRows) {
  return newRefinedScan(table, schema, context.minRowsRequested(numRows));
}
Contributor

Suggested change:
-  public ThisT minRowsRequested(int numRows) {
-    return newRefinedScan(table, schema, context.minRowsRequested(numRows));
-  }
+  public ThisT minRowsRequested(Integer numRows) {
+    return newRefinedScan(table, schema, context.minRowsRequested(numRows));
+  }

Contributor Author

why would we want to make this an Integer instead of an int?


/**
* Create a new scan that returns files with at least the given number of rows. This is used as a
* hint during server-side scan planning to not have to return more rows than necessary. It is not
Contributor

Why only server-side scan planning? We could extend this to any scan.

If the intention is strictly server-side scan planning, I would recommend a separate interface, so an implementation can implement both Scan and LimitAwareScan (?)

Contributor

@geruh geruh Nov 19, 2025

Yeah, but also this is a lot lighter than the open PR #13451, which adds some local optimizations for non-REST catalogs.

Contributor Author

I've removed that wording to not limit this to server-side scan planning

@nastra nastra closed this Nov 19, 2025
@nastra nastra reopened this Nov 19, 2025
@nastra nastra force-pushed the limit-pushdown-from-spark branch 2 times, most recently from bc87a63 to 501993f on November 19, 2025 08:37
Comment on lines +296 to +299
@Override
public ThisT minRowsRequested(long numRows) {
  return newRefinedScan(table, schema, context.minRowsRequested(numRows));
}
Contributor

How are we gating that this is only called for the RESTCatalog, given that for other catalogs it's a no-op?

Contributor Author

I don't think we have to gate this in any way, since this is an entirely optional optimization
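
To make the "entirely optional" point concrete, here is a hypothetical illustration of the optional-hint pattern; RowCountHintAware is an invented name, and whether the actual Scan API uses a default method like this is not shown in this thread.

```java
// Hypothetical sketch: an optional hint can default to a no-op, so implementations
// that cannot make use of it (e.g. non-REST catalogs) simply ignore it and no
// gating is required at the call site.
interface RowCountHintAware<ThisT> {
  @SuppressWarnings("unchecked")
  default ThisT minRowsRequested(long numRows) {
    return (ThisT) this; // ignore the hint by default; implementations may override
  }
}
```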

Contributor

Makes sense. Can we add a test for a metadata table read, since it would trigger a different scan object? WDYT?

Contributor Author

It would trigger a BaseMetadataTableScan, which also extends BaseScan, so this code should cover scans against both normal and metadata tables. Let me add a test for this.

@nastra nastra force-pushed the limit-pushdown-from-spark branch from 501993f to daa72f5 on November 24, 2025 13:37
Comment on lines +158 to +163
assertThat(sql("SELECT * FROM %s LIMIT 1", tableName)).containsExactly(first);
assertThat(sql("SELECT * FROM %s LIMIT 2", tableName)).containsExactly(first, second);
assertThat(sql("SELECT * FROM %s LIMIT 3", tableName)).containsExactly(first, second, third);
Contributor

My understanding is that this would succeed even without this change. How about inspecting the plan and extracting the scan node to check whether the limit percolated all the way down to it?

LogicalPlan logicalPlan = limitedDf.queryExecution().optimizedPlan();
Optional<Integer> limit =
        JavaConverters.asJavaCollection(logicalPlan.collectLeaves()).stream()
            .flatMap(
                plan -> {
                  if (!(plan instanceof DataSourceV2ScanRelation)) {
                    return Stream.empty();
                  }

                  DataSourceV2ScanRelation scanRelation = (DataSourceV2ScanRelation) plan;
                  if (!(scanRelation.scan() instanceof SparkBatchQueryScan)) {
                    return Stream.empty();
                  }

                  SparkBatchQueryScan batchQueryScan = (SparkBatchQueryScan) scanRelation.scan();
                  return Stream.ofNullable(batchQueryScan.pushedLimit());
                })
            .findFirst();

    return limit.orElse(null);

credits: https://github.com/apache/iceberg/pull/10943/files#diff-5de7aa01c6be719c3085c1f0416ed700fb78358103c8b26c6e3ecb9f063c04dbR270

Contributor Author

> My understanding is that this would succeed even without this change.

Yes, that's absolutely correct. The LIMIT is applied by Spark even without any of the changes introduced in this PR.

> How about inspecting the plan and extracting the scan node to check whether the limit percolated all the way down to it?

I'm not sure about this, since this would effectively test that the limit is actually pushed down to the SparkScan, which means we're just making sure that implementing SupportsPushDownLimit has the right effect of pushing the limit. In the context of #10943 this check made sense, since that PR was adding a flag to enable/disable limit pushdown, but here we're always pushing it down, and I've been going back and forth on the best way to properly test this.

Let me think a bit more about what an appropriate way of testing this would be

Contributor Author

I have added some tests to TestFilteredScan where we simulate the LIMIT pushdown and make sure that it properly gets passed down to the TableScanContext for different types of scans.

Contributor

I agree with how the new tests are set up; there's a clean separation between testing what actually gets pushed down (which verifies we're building the scans correctly) and an expectation based on the result of the pushdown.

@huaxingao
Contributor

I think we can probably combine this PR with #13451. After this PR is merged, we can proceed with #13451 to early-stop planning when the accumulated estimated rows ≥ minRowsRequested.
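
As a rough illustration of the early stop described here (a sketch of the idea in #13451 under assumptions, not that PR's implementation; the per-file record count is used below as a stand-in for whatever row estimate that PR relies on):

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.iceberg.FileScanTask;

// Hypothetical sketch: stop accumulating planned tasks once the estimated row
// count covers the min-rows-requested hint.
class EarlyStopPlanning {
  static List<FileScanTask> limitTasks(Iterable<FileScanTask> tasks, long minRowsRequested) {
    List<FileScanTask> planned = new ArrayList<>();
    long estimatedRows = 0;
    for (FileScanTask task : tasks) {
      planned.add(task);
      // File-level record count as an estimate; not exact once row filters apply.
      estimatedRows += task.file().recordCount();
      if (minRowsRequested > 0 && estimatedRows >= minRowsRequested) {
        break; // enough estimated rows to satisfy the pushed LIMIT
      }
    }
    return planned;
  }
}
```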

Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Overall, this looks good to me, just some minor comments. Thanks @nastra!

Comment on lines +347 to +348
// verify CoW scan
assertThat(builder.buildCopyOnWriteScan())
Contributor

What about making sure it's pushed for distributed planning?

Contributor

OK nice, it looks like all these tests are parameterized on planning mode, which includes distributed.

Contributor

Though for this particular test for metadata tables, buildCopyOnWriteScan/buildMergeOnReadScan wouldn't be relevant right?

Contributor Author

Yeah, distributed and local planning are both already handled by the test parameterization. For the metadata tables I just wanted to make sure that the limit is properly pushed down to the TableScanContext, independent of whether buildCopyOnWriteScan/buildMergeOnReadScan actually makes sense on a metadata table, since technically these are callable from a user's perspective.


/**
* Create a new scan that returns files with at least the given number of rows. This is used as a
* hint and is entirely optional in order to not have to return more rows than necessary. It is
Contributor

Minor: since it's a code comment, I think we could simplify the last two sentences into one:

This may return fewer rows if the scan does not contain that many, or it may return more than requested.

Contributor Author

updated

@nastra nastra requested a review from singhpk234 December 2, 2025 08:38
@nastra nastra force-pushed the limit-pushdown-from-spark branch from a81000b to c26d174 on December 2, 2025 08:39