Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor: remove confusing update_plan_from_children call from EnforceSorting #14650

Merged
merged 3 commits into from
Feb 15, 2025

Conversation

xudong963
Copy link
Member

@xudong963 xudong963 commented Feb 13, 2025

Which issue does this PR close?

  • Closes #.

Rationale for this change

update_sort_ctx_children only updates the data field of the context nodes, which tracks whether nodes are connected to a SortExec. update_plan_from_children() is about updating the actual execution plan structure.
The data field updates in update_sort_ctx_children don't actually require any changes to the underlying execution plan.
Therefore, the call to update_plan_from_children() in update_sort_ctx_children appears unnecessary.

So remove it to avoid confusion.

What changes are included in this PR?

remove unnecessary update_plan_from_children call

Are these changes tested?

Covered by existing tests

Are there any user-facing changes?

@github-actions github-actions bot added the optimizer Optimizer rules label Feb 13, 2025
@xudong963
Copy link
Member Author

This will be the first PR that I'm improving the EnforceSorting, I'll try to make the improvement as small as possible to reduce the review burden and avoid regression.

@xudong963 xudong963 marked this pull request as draft February 13, 2025 16:05
alamb
alamb previously approved these changes Feb 13, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for starting to work on this code @xudong963

BTW I believe @wiedld is also plans to work to make this code better, see

Maybe you can work together to improve things

@wiedld
Copy link
Contributor

wiedld commented Feb 14, 2025

Thank you for starting to work on this code @xudong963

BTW I believe @wiedld is also plans to work to make this code better, see

Maybe you can work together to improve things

Happy to work together @xudong963 .

Right now I'm working on:

  • put in a fix for this bug (and let reviewers correct the approach 😆 ).
  • I'm in the process of refactoring the EnforceDistribution tests since:
    • those tests are still tightly coupled with EnforceSorting from when they were a single optimization.
    • we have already demonstrated that this optimizer pass is not idempotent.
    • make it a bit easier to understand and iterate upon

I can make sure to add you as a reviewers on those. Please lmk if/how you want to coordinate.

@wiedld
Copy link
Contributor

wiedld commented Feb 14, 2025

Also, it looks like the failing CI tests are related to the sort enforcement. Such as not removing unnecessary sorts in some of the children.

@github-actions github-actions bot added the physical-expr Physical Expressions label Feb 14, 2025
@xudong963
Copy link
Member Author

Also, it looks like the failing CI tests are related to the sort enforcement. Such as not removing unnecessary sorts in some of the children.

The problem is that the code still depends on the update_plan_from_children in update_sort_ctx_children, even if children aren't changed in update_sort_ctx_children. -- This is making things weird.

I did more updates:

  1. Rename the update_plan_from_children to update_sort_ctx_children_data to show what the method does to update data.
  2. Call update_plan_from_children after the children of PlanContext is changed.
  3. After creating the PlanContext, if you can't guarantee they(PlanContext's plan's children's Execution and PlanContext's children's Execution) are consistent, call update_plan_from_children to sync.

@xudong963 xudong963 requested a review from alamb February 14, 2025 06:29
@xudong963
Copy link
Member Author

I can make sure to add you as a reviewers on those. Please lmk if/how you want to coordinate.

Sure, please add me.

@xudong963 xudong963 changed the title Minor: remove unnecessary update_plan_from_children call Minor: remove confusing update_plan_from_children call Feb 14, 2025
@alamb alamb changed the title Minor: remove confusing update_plan_from_children call Minor: remove confusing update_plan_from_children call from EnforceSorting Feb 14, 2025
@alamb alamb marked this pull request as ready for review February 14, 2025 14:47
@alamb
Copy link
Contributor

alamb commented Feb 14, 2025

@wiedld can you please review this PR?

@alamb alamb dismissed their stale review February 14, 2025 15:03

Stale review

Copy link
Contributor

@wiedld wiedld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I also made a draft PR with added docs, which references exactly why this PR is ok. I'll update conflicts & merge that in after this PR.

Note that we could have also change the meaning of the function to update_ctx_and_sync_children. Either way works.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @xudong963 and @wiedld -- it is really neat to see some others start working on this code. I expect it will get even better than it already is!

@@ -45,7 +45,7 @@ use itertools::izip;
pub type OrderPreservationContext = PlanContext<bool>;

/// Updates order-preservation data for all children of the given node.
pub fn update_children(opc: &mut OrderPreservationContext) {
pub fn update_children_data(opc: &mut OrderPreservationContext) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found the methods names hard to follow -- for example update_children_data actually has a OrderPreservationContext 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I changed to update_order_preservation_ctx_children_data, longer but clearer

@@ -42,6 +42,8 @@ impl DynTreeNode for dyn ExecutionPlan {
/// A node object beneficial for writing optimizer rules, encapsulating an [`ExecutionPlan`] node with a payload.
/// Since there are two ways to access child plans—directly from the plan and through child nodes—it's recommended
/// to perform mutable operations via [`Self::update_plan_from_children`].
/// After update `children`, please do the sync updating for `plan`'s children.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could find some way to avoid having to remember to do this 🤔

If we have to remember to call a function it seems likely either our future selves or others will forget to do so

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe one way would be to make these fields not pub and then control access

so like add a method so the only way to update children woudl be

impl PlanContext {
  fn update_children(&mut self, new_children: Vec<Self>) -> {
    self.children = new_children;
    self.update_child_data()
}

🤔

To be clear I am talking about some future PR

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I've tried this way to make the operation "atomic", but I found it hard to change, most places acquire mut self for PlanContext.

I have another way to avoid the mind burden. Currrently, we have two parallel tree structures:

  • The ExecutionPlan tree (through plan field and its children)
  • The PlanContext tree (through children field)

I'm trying to make them single.

@alamb
Copy link
Contributor

alamb commented Feb 14, 2025

FYI @ozankabak and @berkaysynnada

@xudong963
Copy link
Member Author

I plan to merge the PR, and then @wiedld can continue to his PR.

I also made a draft PR with added docs, which references exactly why this PR is ok. I'll update conflicts & merge that in after this PR.

@xudong963 xudong963 merged commit 9681c3b into apache:main Feb 15, 2025
24 checks passed
@xudong963 xudong963 deleted the remove_unnecessary_call branch February 15, 2025 15:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
optimizer Optimizer rules physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants