fix: Make scheduledFuture thread-safe in HttpNativeExecutionTaskResultFetcher #26649

xin-zhang2 · 2025-11-18T18:01:52Z

Description

As discussed in #26550 (comment), we will use synchronized methods to fix the potential thread-safety issues in HttpNativeExecutionTaskResultFetcher.

Motivation and Context

Impact

Test Plan

Contributor checklist

Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
Documented new properties (with its default value), SQL syntax, functions, or other functionality.
If release notes are required, they follow the release notes guidelines.
Adequate tests were added if applicable.
CI passed.
If adding new dependencies, verified they have an OpenSSF Scorecard score of 5.0 or higher (or obtained explicit TSC approval for lower scores).

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== NO RELEASE NOTE ==

sourcery-ai · 2025-11-18T18:01:59Z

Reviewer's Guide

This PR makes scheduledFuture access in HttpNativeExecutionTaskResultFetcher thread-safe by synchronizing core methods, removes the unused completed flag, and cleans up redundant completion logic in result fetching.

Class diagram for updated HttpNativeExecutionTaskResultFetcher thread safety

classDiagram
    class HttpNativeExecutionTaskResultFetcher {
        -Object taskHasResult
        -AtomicReference<Throwable> lastException
        -ScheduledFuture<?> scheduledFuture
        -long token
        +synchronized void start()
        +synchronized void stop(boolean success)
        +boolean hasPage()
        -synchronized void throwIfFailed()
        -void doGetResults()
        -synchronized void onSuccess(PageBufferClient.PagesResponse pagesResponse)
    }
    class ScheduledFuture {
    }
    HttpNativeExecutionTaskResultFetcher --> ScheduledFuture
    class AtomicReference {
    }
    HttpNativeExecutionTaskResultFetcher --> AtomicReference
    class PageBufferClient {
    }
    class PagesResponse {
    }
    HttpNativeExecutionTaskResultFetcher --> PagesResponse
    PageBufferClient --> PagesResponse

File-Level Changes

| Change | Details | Files |
| --- | --- | --- |
| Synchronized scheduledFuture management and removed completed flag for thread safety | Annotated scheduledFuture with @GuardedBy("this"); made start(), stop(), throwIfFailed(), and onSuccess() synchronized; removed volatile completed field and its usage/checks; eliminated redundant completion check in doGetResults | `presto-spark-base/src/main/java/com/facebook/presto/spark/execution/nativeprocess/HttpNativeExecutionTaskResultFetcher.java` |

sourcery-ai

Hey there - I've reviewed your changes and they look great!

Prompt for AI Agents

Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `presto-spark-base/src/main/java/com/facebook/presto/spark/execution/nativeprocess/HttpNativeExecutionTaskResultFetcher.java:133` </location>
<code_context>
     }

-    private void throwIfFailed()
+    private synchronized void throwIfFailed()
     {
         if (scheduledFuture != null && scheduledFuture.isCancelled() && lastException.get() != null) {
</code_context>

<issue_to_address>
**suggestion:** Synchronizing 'throwIfFailed' may be unnecessary if only reading atomic references.

As 'lastException' is an AtomicReference and only reads are performed, you may be able to remove synchronization unless specific ordering is required.

```suggestion
    private void throwIfFailed()
```
</issue_to_address>

### Comment 2
<location> `presto-spark-base/src/main/java/com/facebook/presto/spark/execution/nativeprocess/HttpNativeExecutionTaskResultFetcher.java:158` </location>
<code_context>
     }

-    private void onSuccess(PageBufferClient.PagesResponse pagesResponse)
+    private synchronized void onSuccess(PageBufferClient.PagesResponse pagesResponse)
     {
         List<SerializedPage> pages = pagesResponse.getPages();
</code_context>

<issue_to_address>
**suggestion (performance):** Synchronizing 'onSuccess' may block other operations; review for performance impact.

If 'onSuccess' is invoked often or handles large batches, method-level synchronization may cause delays. Evaluate if finer-grained synchronization is possible.

Suggested implementation:

```java
    private void onSuccess(PageBufferClient.PagesResponse pagesResponse)
    {
        List<SerializedPage> pages = pagesResponse.getPages();
        long bytes = 0;
        // Synchronize only the critical section that modifies shared state
        synchronized (this) {
            token = nextToken;
        }
        if (pagesResponse.isClientComplete()) {
            workerClient.abortResultsAsync(taskId);
            synchronized (this) {
                if (scheduledFuture != null) {
                    scheduledFuture.cancel(false);
                }
            }
        }

```

You should review all shared mutable state accessed in `onSuccess` and ensure only those modifications are synchronized. If other variables (e.g., `token`, `scheduledFuture`) are shared across threads, synchronize their access as shown. If more shared state is present in the full method, wrap only those assignments in `synchronized` blocks. This will minimize blocking and improve performance.
</issue_to_address>
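
The finer-grained locking suggested in Comment 2 can be sketched as below. This is a minimal illustration, not the actual fetcher: `NarrowLockSketch`, the `String` pages, the `pages` queue, and `pollPage` are all hypothetical stand-ins. The point it shows is that the fetcher's own monitor is released before the `taskHasResult` monitor is taken, so the two locks are never held at the same time.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch: names and types are assumptions, not the real Presto classes.
class NarrowLockSketch
{
    private final Object taskHasResult = new Object();
    private final Queue<String> pages = new ArrayDeque<>(); // guarded by taskHasResult
    private long token; // guarded by this

    void onSuccess(String page, long nextToken)
    {
        // Update fetcher-private state under the fetcher's own monitor only.
        synchronized (this) {
            token = nextToken;
        }
        // The fetcher monitor is already released here, so the locks are never nested.
        synchronized (taskHasResult) {
            pages.add(page);
            taskHasResult.notifyAll();
        }
    }

    synchronized long getToken()
    {
        return token;
    }

    // Waits up to timeoutMillis for a page; returns null on timeout or interrupt.
    String pollPage(long timeoutMillis)
    {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        synchronized (taskHasResult) {
            try {
                while (pages.isEmpty()) {
                    long remaining = deadline - System.currentTimeMillis();
                    if (remaining <= 0) {
                        return null;
                    }
                    taskHasResult.wait(remaining);
                }
                return pages.remove();
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
    }
}
```

The trade-off is that the token update and the page publication are no longer one atomic step, so this only fits if no reader depends on seeing both change together.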

tanjialiang · 2025-11-19T22:15:35Z

Thanks @xin-zhang2 for the improvement. Left some comments.

xin-zhang2 · 2025-11-20T10:30:34Z

> Left some comments.

Thanks @tanjialiang, but I don’t seem to see the comments. Would you mind checking again?

hantangwangd · 2025-11-20T18:38:43Z

.../com/facebook/presto/spark/execution/nativeprocess/HttpNativeExecutionTaskResultFetcher.java

    }

-    private void onSuccess(PageBufferClient.PagesResponse pagesResponse)
+    private synchronized void onSuccess(PageBufferClient.PagesResponse pagesResponse)


@xin-zhang2 thanks for this change.

I have a bit of concern about the synchronized(taskHasResult) {...} block in this method. Since taskHasResult is a shared object used beyond this class, we need to ensure this won't cause deadlocks.

For example, if another thread holds the lock on taskHasResult and then attempts to call HttpNativeExecutionTaskResultFetcher.close(), this could lead to a deadlock situation. Any thoughts? PLMK if I have misunderstood anything. @xin-zhang2 @tanjialiang

tanjialiang · 2025-11-26

It is a fair concern, but this should be all right. Outside this file, call sites only wait on the taskHasResult object; only this file is responsible for notifying. That means every synchronized(taskHasResult) block outside this file will, and should, call taskHasResult.wait() inside the block, which releases the lock. As long as they don't call onSuccess or onFailure between entering synchronized(taskHasResult) and calling taskHasResult.wait(), we should be good, and we have no such cases in the codebase.
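
The point about wait() releasing the lock can be checked with a small sketch. Everything here is hypothetical (`WaitReleasesMonitorDemo`, the `hasResult` flag, the five-second timeout); only the `taskHasResult` name is borrowed from the discussion. The waiter starts the notifier while it still holds the monitor, so the notifier can only get in once `wait()` has released it.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the wait/notify handshake described above.
class WaitReleasesMonitorDemo
{
    // Returns true if the notifier managed to enter the monitor while the waiter was waiting.
    static boolean notifierGotIn()
    {
        Object taskHasResult = new Object();
        boolean[] hasResult = {false}; // guarded by taskHasResult

        Thread notifier = new Thread(() -> {
            // Blocks here until the waiter calls wait(), which releases the monitor.
            synchronized (taskHasResult) {
                hasResult[0] = true;
                taskHasResult.notifyAll();
            }
        }, "notifier");
        notifier.setDaemon(true);

        synchronized (taskHasResult) {
            notifier.start(); // started while this thread still owns the monitor
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            try {
                // Loop guards against spurious wakeups.
                while (!hasResult[0]) {
                    long remainingMillis = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                    if (remainingMillis <= 0) {
                        return false;
                    }
                    taskHasResult.wait(remainingMillis);
                }
                return true;
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args)
    {
        System.out.println("notifier entered the monitor: " + notifierGotIn());
    }
}
```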

hantangwangd · 2025-11-30

Thanks for the explanation. You are right that the taskHasResult lock alone isn't a problem, since we use it correctly. The issue I'm concerned about arises from introducing a second lock in the current context.

And in fact, after checking the code in detail, I believe that a deadlock may occur between methods PrestoSparkNativeTaskExecutorFactory.computeNext() and HttpNativeExecutionTaskResultFetcher.doGetResults(). The two methods try to acquire and hold the two different synchronized locks in conflicting orders. By adding a delay (Thread.sleep(10000)) between their acquisition of the two locks, we should be able to reproduce the deadlock. The jstack information is as follows:

Found one Java-level deadlock:
=============================
"Executor task launch worker-0":
  waiting to lock monitor 0x000076af0c0c83f0 (object 0x00000004563ae070, a com.facebook.presto.spark.execution.nativeprocess.HttpNativeExecutionTaskResultFetcher),
  which is held by "presto-spark-scheduled-executor-3"
"presto-spark-scheduled-executor-3":
  waiting to lock monitor 0x000076af5c162fd0 (object 0x00000004563adfd0, a java.lang.Object),
  which is held by "Executor task launch worker-0"

Java stack information for the threads listed above:
===================================================
"Executor task launch worker-0":
    at com.facebook.presto.spark.execution.nativeprocess.HttpNativeExecutionTaskResultFetcher.throwIfFailed(HttpNativeExecutionTaskResultFetcher.java:135)
    - waiting to lock <0x00000004563ae070> (a com.facebook.presto.spark.execution.nativeprocess.HttpNativeExecutionTaskResultFetcher)
    at com.facebook.presto.spark.execution.nativeprocess.HttpNativeExecutionTaskResultFetcher.hasPage(HttpNativeExecutionTaskResultFetcher.java:129)
    at com.facebook.presto.spark.execution.task.NativeExecutionTask.hasResult(NativeExecutionTask.java:152)
    at com.facebook.presto.spark.execution.task.PrestoSparkNativeTaskExecutorFactory$PrestoSparkNativeTaskOutputIterator.computeNext(PrestoSparkNativeTaskExecutorFactory.java:623)
    - locked <0x00000004563adfd0> (a java.lang.Object)
    at com.facebook.presto.spark.execution.task.PrestoSparkNativeTaskExecutorFactory$PrestoSparkNativeTaskOutputIterator.hasNext(PrestoSparkNativeTaskExecutorFactory.java:578)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
    at scala.collection.AbstractIterator.to(Iterator.scala:1336)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
    at org.apache.spark.rdd.AsyncRDDActions$$anonfun$collectAsync$1$$anonfun$apply$7.apply(AsyncRDDActions.scala:60)
    at org.apache.spark.rdd.AsyncRDDActions$$anonfun$collectAsync$1$$anonfun$apply$7.apply(AsyncRDDActions.scala:60)
    at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1976)
    at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1976)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1136)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:635)
    at java.lang.Thread.run([email protected]/Thread.java:842)
"presto-spark-scheduled-executor-3":
    at com.facebook.presto.spark.execution.nativeprocess.HttpNativeExecutionTaskResultFetcher.onSuccess(HttpNativeExecutionTaskResultFetcher.java:197)
    - waiting to lock <0x00000004563adfd0> (a java.lang.Object)
    - locked <0x00000004563ae070> (a com.facebook.presto.spark.execution.nativeprocess.HttpNativeExecutionTaskResultFetcher)
    at com.facebook.presto.spark.execution.nativeprocess.HttpNativeExecutionTaskResultFetcher.doGetResults(HttpNativeExecutionTaskResultFetcher.java:151)
    at com.facebook.presto.spark.execution.nativeprocess.HttpNativeExecutionTaskResultFetcher$$Lambda$3881/0x000076b2413c7628.run(Unknown Source)
    at java.util.concurrent.Executors$RunnableAdapter.call([email protected]/Executors.java:539)
    at java.util.concurrent.FutureTask.runAndReset$$$capture([email protected]/FutureTask.java:305)
    at java.util.concurrent.FutureTask.runAndReset([email protected]/FutureTask.java)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run([email protected]/ScheduledThreadPoolExecutor.java:305)
    at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1136)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:635)
    at java.lang.Thread.run([email protected]/Thread.java:842)

Found 1 deadlock.
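
The lock-order inversion in this dump can be reproduced in isolation. The sketch below is a stand-alone illustration, not code from the PR: `LockOrderDemo`, `fetcherMonitor`, and the thread names are hypothetical, with `fetcherMonitor` standing in for the fetcher's `this` monitor and `taskHasResult` for the shared notification object. A latch plays the role of the `Thread.sleep` delay, and `ThreadMXBean.findMonitorDeadlockedThreads()` reports the cycle much as jstack does.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: two monitors taken in conflicting vs. consistent order.
class LockOrderDemo
{
    // Returns true if the two threads end up deadlocked on the two monitors.
    static boolean deadlocks(boolean consistentOrder)
    {
        Object fetcherMonitor = new Object();
        Object taskHasResult = new Object();
        CountDownLatch firstLockHeld = new CountDownLatch(2);

        // Like computeNext(): taskHasResult first, then the fetcher monitor.
        Thread consumer = daemon("consumer", () -> lockInOrder(taskHasResult, fetcherMonitor, firstLockHeld));
        // Like synchronized onSuccess(): fetcher monitor first, unless the order is made consistent.
        Object producerFirst = consistentOrder ? taskHasResult : fetcherMonitor;
        Object producerSecond = consistentOrder ? fetcherMonitor : taskHasResult;
        Thread producer = daemon("producer", () -> lockInOrder(producerFirst, producerSecond, firstLockHeld));
        consumer.start();
        producer.start();

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(10);
        while (System.nanoTime() < deadline) {
            if (!consumer.isAlive() && !producer.isAlive()) {
                return false; // both finished, so no deadlock
            }
            long[] ids = threads.findMonitorDeadlockedThreads();
            if (ids != null && contains(ids, consumer.getId()) && contains(ids, producer.getId())) {
                return true;
            }
            try {
                Thread.sleep(20);
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    private static void lockInOrder(Object first, Object second, CountDownLatch firstLockHeld)
    {
        synchronized (first) {
            firstLockHeld.countDown();
            try {
                // Give the other thread time to take its first lock; times out if it cannot.
                firstLockHeld.await(1, TimeUnit.SECONDS);
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // Critical section that needs both locks.
            }
        }
    }

    private static Thread daemon(String name, Runnable body)
    {
        Thread thread = new Thread(body, name);
        thread.setDaemon(true); // lets the JVM exit even if the thread stays deadlocked
        return thread;
    }

    private static boolean contains(long[] ids, long id)
    {
        for (long candidate : ids) {
            if (candidate == id) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args)
    {
        System.out.println("inverted order deadlocks: " + deadlocks(false));
        System.out.println("consistent order deadlocks: " + deadlocks(true));
    }
}
```

Taking both monitors in the same order everywhere, or never holding one while acquiring the other, removes the cycle; which of these fits the fetcher is a design decision for the PR.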

tanjialiang · 2025-11-19T22:12:32Z

.../com/facebook/presto/spark/execution/nativeprocess/HttpNativeExecutionTaskResultFetcher.java

        if (pagesResponse.isClientComplete()) {
-            completed = true;
            workerClient.abortResultsAsync(taskId);
            if (scheduledFuture != null) {


This should not be null at this point, right? Could we enforce a non-null check here instead?

tanjialiang · 2025-11-19T22:13:31Z

.../com/facebook/presto/spark/execution/nativeprocess/HttpNativeExecutionTaskResultFetcher.java

-    public void stop(boolean success)
+    public synchronized void stop(boolean success)
    {
        if (scheduledFuture != null) {


Ditto. Should we do a check instead? We should enforce that this class is started before stop() is called.
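
A minimal sketch of the stricter check suggested here, under stated assumptions: `StartStopSketch`, the no-op task, the 100 ms period, and the local `checkState` helper (standing in for a precondition utility) are all hypothetical. Instead of silently skipping a null `scheduledFuture`, `stop()` fails fast when it is called before `start()`.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: enforces start() before stop() instead of tolerating null.
class StartStopSketch
{
    private final ScheduledExecutorService executor;
    private ScheduledFuture<?> scheduledFuture; // guarded by this

    StartStopSketch(ScheduledExecutorService executor)
    {
        this.executor = executor;
    }

    synchronized void start()
    {
        checkState(scheduledFuture == null, "already started");
        // Placeholder task; the real fetcher would poll for results here.
        scheduledFuture = executor.scheduleAtFixedRate(() -> {}, 0, 100, TimeUnit.MILLISECONDS);
    }

    synchronized void stop()
    {
        checkState(scheduledFuture != null, "stop() called before start()");
        scheduledFuture.cancel(false);
    }

    synchronized boolean isCancelled()
    {
        return scheduledFuture != null && scheduledFuture.isCancelled();
    }

    private static void checkState(boolean condition, String message)
    {
        if (!condition) {
            throw new IllegalStateException(message);
        }
    }
}
```

Whether failing fast is acceptable depends on whether any caller legitimately stops a fetcher that was never started, for example during cleanup after a failed start.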

tanjialiang · 2025-11-19T22:14:14Z

.../com/facebook/presto/spark/execution/nativeprocess/HttpNativeExecutionTaskResultFetcher.java

-    private void throwIfFailed()
+    private synchronized void throwIfFailed()
    {
        if (scheduledFuture != null && scheduledFuture.isCancelled() && lastException.get() != null) {


tanjialiang · 2025-11-26T08:29:01Z

> > Left some comments.
>
> Thanks @tanjialiang, but I don’t seem to see the comments. Would you mind checking again?

Sorry, these comments were still pending. I've published them now.

fix: Make scheduledFuture thread-safe in HttpNativeExecutionTaskResul…

9a16926

…tFetcher

xin-zhang2 requested review from a team and shrinidhijoshi as code owners November 18, 2025 18:01

prestodb-ci added the from:IBM PR from IBM label Nov 18, 2025

prestodb-ci requested a review from a team November 18, 2025 18:01

prestodb-ci requested review from Dilli-Babu-Godari and Joe-Abraham and removed request for a team November 18, 2025 18:01

sourcery-ai bot reviewed Nov 18, 2025

xin-zhang2 requested review from hantangwangd and tanjialiang November 18, 2025 18:09

hantangwangd reviewed Nov 20, 2025

tanjialiang reviewed Nov 26, 2025

hantangwangd Nov 30, 2025
