Skip to content

fix(backend): improve uniqueness constraint query #6271

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 3 commits into
base: stable
Choose a base branch
from

Conversation

fatih-acar
Copy link
Contributor

@fatih-acar fatih-acar commented Apr 10, 2025

This should avoid processing useless rows when we have relationship only constraints.
This especially improves the IPAM use-case, where the uniqueness constraint is the (address/prefix, namespace) tuple. Without this change, the query would check for nodes having that same address and fetch ALL nodes present in the same namespace. With this change, the query is now FIRST checking nodes having that same address and THEN checking if these previous nodes are in the same namespace.

I'm not sure if our test coverage is enough to correctly test this change, a thorough review along with additional unit tests may be required...

EDIT: the "OPTIONAL MATCH" fix was caught only by this test: backend/tests/unit/core/constraint_validators/test_node_grouped_uniqueness.py::TestNodeGroupedUniquenessConstraint::test_subset_hfid_violated so I'm afraid of missing more unit tests to cover all cases...

@github-actions github-actions bot added the group/backend Issue related to the backend (API Server, Git Agent) label Apr 10, 2025
@fatih-acar fatih-acar force-pushed the fac-perf-improve-uniqueness-constraint-ipam branch from da8c525 to 9afab1f Compare April 10, 2025 13:07
Copy link

codspeed-hq bot commented Apr 10, 2025

CodSpeed Performance Report

Merging #6271 will not alter performance

Comparing fac-perf-improve-uniqueness-constraint-ipam (03bbc11) with stable (4853eb8)

Summary

✅ 10 untouched benchmarks

@fatih-acar fatih-acar force-pushed the fac-node-attr-perf-improvement branch from 3e8083f to 9d8175b Compare April 25, 2025 07:50
@fatih-acar fatih-acar force-pushed the fac-perf-improve-uniqueness-constraint-ipam branch 2 times, most recently from c99e4d2 to fb6a039 Compare April 25, 2025 13:54
@fatih-acar fatih-acar marked this pull request as ready for review April 25, 2025 14:22
@fatih-acar fatih-acar requested a review from a team as a code owner April 25, 2025 14:22
@fatih-acar
Copy link
Contributor Author

@ajtmccarty I'd like some help to test this change, I'm not sure if we have enough unit tests to cover all cases. I'm especially afraid of cases when we have multiple uniqueness constraints tuples using the same attributes.

@fatih-acar fatih-acar force-pushed the fac-node-attr-perf-improvement branch from 699288c to 688bd82 Compare April 25, 2025 17:39
@fatih-acar fatih-acar requested a review from a team as a code owner April 25, 2025 17:39
Base automatically changed from fac-node-attr-perf-improvement to stable April 28, 2025 18:38
Copy link
Contributor

@ajtmccarty ajtmccarty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a commit with some more unit tests that I think cover the cases you are worried about, but let me know if you think others are required

fatih-acar and others added 2 commits April 29, 2025 13:57
This should avoid processing useless rows when we have relationship only
constraints.
This especially improves the IPAM use-case, where the uniqueness
constraint is the (address/prefix, namespace) tuple.
Without this change, the query would check for nodes having that same
address and fetch ALL nodes present in the same namespace.
With this change, the query is now FIRST checking nodes having that same
address and THEN checking if these previous nodes are in the same namespace.

Signed-off-by: Fatih Acar <[email protected]>
@fatih-acar fatih-acar force-pushed the fac-perf-improve-uniqueness-constraint-ipam branch from 4158ed5 to 7dbcf43 Compare April 29, 2025 11:58
Copy link
Contributor

@ajtmccarty ajtmccarty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a commit to correctly support matching paths where there are no attribute-level matches and 1 or more relationship-level matches

@@ -103,48 +103,77 @@ async def query_init(self, db: InfrahubDatabase, **kwargs: Any) -> None: # noqa
)

attr_paths_subquery = """
MATCH attr_path = (start_node:%(node_kind)s)-[:HAS_ATTRIBUTE]->(attr:Attribute)-[r:HAS_VALUE]->(attr_value:AttributeValue)
OPTIONAL MATCH attr_path = (attr_start_node:%(node_kind)s)-[:HAS_ATTRIBUTE]->(attr:Attribute)-[r:HAS_VALUE]->(attr_value:AttributeValue)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added the OPTIONAL here to account for cases where there are no attribute-level matches but there are relationship-level matches

""" % {"node_kind": self.query_request.kind}

relationship_attr_paths_with_value_subquery = """
MATCH rel_path = (start_node:%(node_kind)s)-[:IS_RELATED]-(relationship_node:Relationship)-[:IS_RELATED]-(related_n:Node)-[:HAS_ATTRIBUTE]->(rel_attr:Attribute)-[:HAS_VALUE]->(rel_attr_value:AttributeValue)
OPTIONAL MATCH rel_path = (rel_start_node:%(node_kind)s)-[:IS_RELATED]-(relationship_node:Relationship)-[:IS_RELATED]-(related_n:Node)-[:HAS_ATTRIBUTE]->(rel_attr:Attribute)-[:HAS_VALUE]->(rel_attr_value:AttributeValue)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed the variable name on this line and 120 b/c if the attr_start_node is null in the first MATCH statement, then these later MATCHs will never return any results while they use the same variable name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is embarassing... that was the whole point of the change: reusing the same start_node we used to match attribute values, so that it does not have to traverse a lot of nodes when looking for rel_path without value...

We probably need to build different queries for each case then... I'm not sure we can do a single query that can handle all cases.

}
WITH DISTINCT start_node, potential_path, rel_identifier, potential_attr, potential_attr_value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added DISTINCT here b/c I believe that the initial subquery that identifies attr_start_node, rel_start_node, and rel_only_start_node can return duplicates

select_subqueries_str = "UNION".join(select_subqueries)
select_subqueries_str = "".join(select_subqueries)
return_subqueries_str = ", ".join(returned_attributes)
filter_subqueries_str = "UNION".join(filter_subqueries)

# ruff: noqa: E501
query = """
// get attributes for node and its relationships
CALL {
%(select_subqueries_str)s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having all of the initial OPTIONAL MATCH statements within this single CALL subquery means that we can get duplicated rows in the results
for example, if the attr_paths_subquery matches 5 paths and the relationship_only_attr_paths_subquery matches 5 paths, then we will get 25 rows returned, one for each combination of attr and rel path
I assume that this single CALL subquery instead of 1-3 separate CALL subqueries improves performance b/c we are usually operating on a pretty small number of rows, but just wanted to make sure we're all on the same page for how this works

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, having a single CALL subquery allows reusing the start_node we used when filtering attribute values so that we filter early.
The change you've done above makes this useless now though 🤔

@fatih-acar fatih-acar marked this pull request as draft May 2, 2025 12:03
@fatih-acar
Copy link
Contributor Author

This PR is on hold after internal discussions, another approach is required to solve this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
group/backend Issue related to the backend (API Server, Git Agent)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants