1996fanrui
Member

Backport #26931 to 2.1

[FLINK-38267] Only call channel state rescaling logic for exchange with channel state to avoid UnsupportedOperationException

… exchange with channel state to avoid UnsupportedOperationException
@flinkbot
Collaborator

flinkbot commented Aug 25, 2025

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@1996fanrui
Member Author

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=69481&view=logs&j=5c8e7682-d68f-54d1-16a2-a09310218a49&t=9d734c8c-6253-55e6-3bce-47e7cdf68ac4&l=8852

The CI timed out on 2.1. After analysis, the timeout only happens in UnalignedCheckpointRescaleWithMixedExchangesITCase#createMultiInputDAG, and I am sure it is a test issue rather than a production code issue.

The reason is: the Co-Map task has two input exchanges. One is rebalance; the other one:

  • is forward when source2Parallelism is equal to the parallelism of Co-Map
  • is rebalance when source2Parallelism is not equal to the parallelism of Co-Map

All parallelisms are generated randomly, and the timeout only happens in the forward case. In that case the Co-Map task[1] disables unaligned checkpoints for all inputs, so there are no inflight buffers during the checkpoint. But our test expects waitForCheckpointWithInflightBuffers.

This means the FLINK-38267 bug only happens for tasks with multiple outputs. For multi-input tasks, unaligned checkpoints are disabled for all exchanges if any exchange disallows unaligned checkpoints, so there is no need to distinguish a specific gate.

So I'd like to either delete UnalignedCheckpointRescaleWithMixedExchangesITCase#createMultiInputDAG or introduce a rebalance after forwardedStream to prevent the forward exchange for Co-Map.
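
To make the reasoning above concrete, here is a minimal plain-Java sketch of the rule as described in this comment. It is not Flink code: the class `MixedExchangeModel`, the `Exchange` enum, and both methods are hypothetical names invented for illustration, and the model only assumes what the comment states (forward is chosen when parallelisms match, and one forward input disables unaligned checkpoints for the whole multi-input task).

```java
import java.util.List;

public class MixedExchangeModel {

    /** Hypothetical exchange types, named after the ones discussed in this thread. */
    public enum Exchange { FORWARD, REBALANCE }

    /**
     * Models how the second input of Co-Map gets its exchange in the test DAG:
     * forward only when the parallelisms match and no explicit rebalance is requested.
     */
    public static Exchange chooseExchange(
            int upstreamParallelism, int downstreamParallelism, boolean explicitRebalance) {
        if (explicitRebalance) {
            return Exchange.REBALANCE;
        }
        return upstreamParallelism == downstreamParallelism
                ? Exchange.FORWARD
                : Exchange.REBALANCE;
    }

    /**
     * Models the multi-input rule from the comment: unaligned checkpoints stay
     * enabled only if no input exchange is forward.
     */
    public static boolean unalignedEnabledForTask(List<Exchange> inputs) {
        return inputs.stream().noneMatch(e -> e == Exchange.FORWARD);
    }

    public static void main(String[] args) {
        int coMapParallelism = 4;
        Exchange input1 = Exchange.REBALANCE;

        // Randomly generated parallelism happens to match: second input becomes forward.
        Exchange input2 = chooseExchange(4, coMapParallelism, false);
        System.out.println(unalignedEnabledForTask(List.of(input1, input2))); // false

        // Proposed fix: an explicit rebalance keeps the second input rebalance.
        Exchange fixedInput2 = chooseExchange(4, coMapParallelism, true);
        System.out.println(unalignedEnabledForTask(List.of(input1, fixedInput2))); // true
    }
}
```

In this model, the first case is the one where the test would wait forever for inflight buffers, and the second case corresponds to adding a rebalance after forwardedStream.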

cc @gaborgsomogyi

[1]

@gaborgsomogyi
Contributor

Thanks, I intend to have a look tomorrow morning.

Contributor

@gaborgsomogyi gaborgsomogyi left a comment

LGTM.

…tRescaleWithMixedExchangesITCase.testRescaleFromUnalignedCheckpoint

When a task has multiple inputs, unaligned checkpoints are disabled for all inputs as soon as one input exchange does not support unaligned checkpoints. This results in no inflight buffers, but UnalignedCheckpointRescaleWithMixedExchangesITCase.testRescaleFromUnalignedCheckpoint always waits for a checkpoint with inflight buffers.

Explicitly specifying rebalance can avoid the forward exchange.
Member Author

@1996fanrui 1996fanrui left a comment

Thanks @gaborgsomogyi for the review, merging...

@1996fanrui 1996fanrui merged commit 90762ab into apache:release-2.1 Aug 29, 2025
@1996fanrui 1996fanrui deleted the 38267/2.1 branch August 29, 2025 10:38