fix(backend): improve uniqueness constraint query #6271

fatih-acar · 2025-04-10T13:06:22Z

This should avoid processing unnecessary rows when we have relationship-only constraints.
This especially improves the IPAM use case, where the uniqueness constraint is the (address/prefix, namespace) tuple. Without this change, the query would match nodes with the same address and also fetch ALL nodes in the same namespace. With this change, the query FIRST finds nodes with the same address and THEN checks whether those nodes are in the same namespace.
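To make the difference concrete, here is a rough pure-Python model of the two query shapes. All names (IPNode, violations_fetch_all, violations_filter_first) are hypothetical, linear scans stand in for graph lookups, and this is not the Cypher in query.py. rows_touched approximates how many rows each shape carries forward:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IPNode:
    # Hypothetical stand-in for an IPAM address node and its namespace relationship.
    node_id: str
    address: str
    namespace: str


def violations_fetch_all(nodes, address, namespace):
    """Old shape (model only): collect address matches AND every node in the namespace, then intersect."""
    same_address = {n.node_id for n in nodes if n.address == address}
    same_namespace = {n.node_id for n in nodes if n.namespace == namespace}
    rows_touched = len(same_address) + len(same_namespace)
    return same_address & same_namespace, rows_touched


def violations_filter_first(nodes, address, namespace):
    """New shape (model only): find address matches first, then check just those for the namespace."""
    same_address = [n for n in nodes if n.address == address]
    rows_touched = len(same_address)
    matches = {n.node_id for n in same_address if n.namespace == namespace}
    return matches, rows_touched


# 1000 distinct addresses in one namespace, plus one duplicate of 10.0.0.1/32.
nodes = [IPNode(f"n{i}", f"10.0.{i // 256}.{i % 256}/32", "default") for i in range(1000)]
nodes.append(IPNode("dup", "10.0.0.1/32", "default"))

old_matches, old_rows = violations_fetch_all(nodes, "10.0.0.1/32", "default")
new_matches, new_rows = violations_filter_first(nodes, "10.0.0.1/32", "default")
print(old_matches == new_matches, old_rows, new_rows)  # → True 1003 2
```

Both shapes find the same violating pair; in this model the old shape's row count grows with the size of the namespace, while the new one only grows with the number of nodes sharing the address.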

I'm not sure our test coverage is enough to properly test this change; a thorough review along with additional unit tests may be required...

EDIT: the "OPTIONAL MATCH" fix was caught only by this test: backend/tests/unit/core/constraint_validators/test_node_grouped_uniqueness.py::TestNodeGroupedUniquenessConstraint::test_subset_hfid_violated, so I'm afraid we are missing unit tests for other cases...

codspeed-hq · 2025-04-10T13:16:51Z

CodSpeed Performance Report

Merging #6271 will not alter performance

Comparing fac-perf-improve-uniqueness-constraint-ipam (03bbc11) with stable (4853eb8)

Summary

✅ 10 untouched benchmarks

fatih-acar · 2025-04-25T14:23:21Z

@ajtmccarty I'd like some help testing this change; I'm not sure we have enough unit tests to cover all cases. I'm especially worried about cases where multiple uniqueness constraint tuples use the same attributes.

ajtmccarty

I pushed a commit with some more unit tests that I think cover the cases you are worried about, but let me know if you think others are required.

Signed-off-by: Fatih Acar <[email protected]>

ajtmccarty

Added a commit to correctly support matching paths where there are no attribute-level matches and one or more relationship-level matches.

ajtmccarty · 2025-04-29T14:05:16Z

backend/infrahub/core/validators/uniqueness/query.py

@@ -103,48 +103,77 @@ async def query_init(self, db: InfrahubDatabase, **kwargs: Any) -> None:  # noqa
        )

        attr_paths_subquery = """
-        MATCH attr_path = (start_node:%(node_kind)s)-[:HAS_ATTRIBUTE]->(attr:Attribute)-[r:HAS_VALUE]->(attr_value:AttributeValue)
+        OPTIONAL MATCH attr_path = (attr_start_node:%(node_kind)s)-[:HAS_ATTRIBUTE]->(attr:Attribute)-[r:HAS_VALUE]->(attr_value:AttributeValue)


Added OPTIONAL here to account for cases where there are no attribute-level matches but there are relationship-level matches.
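A toy model of why OPTIONAL is needed here. The helpers match and optional_match are hypothetical, and plain dicts stand in for Cypher rows, so this only approximates Neo4j semantics: a MATCH with no results drops the row, which also discards any relationship-level match that comes later, while OPTIONAL MATCH keeps the row with nulls.

```python
def match(rows, expand):
    """MATCH-like step: rows with no expansion are dropped."""
    return [{**row, **found} for row in rows for found in expand(row)]


def optional_match(rows, expand, new_vars):
    """OPTIONAL MATCH-like step: rows with no expansion survive with the new variables set to None."""
    out = []
    for row in rows:
        found = expand(row)
        if found:
            out.extend({**row, **f} for f in found)
        else:
            out.append({**row, **dict.fromkeys(new_vars)})
    return out


def no_attr_match(row):
    # No other node shares the attribute value.
    return []


def rel_match(row):
    # One node shares the related node (e.g. the same namespace).
    return [{"rel_start_node": "node-1"}]


strict = match(match([{}], no_attr_match), rel_match)
lenient = optional_match(
    optional_match([{}], no_attr_match, ["attr_start_node", "attr_path"]),
    rel_match,
    ["rel_start_node"],
)
print(strict)   # → []
print(lenient)  # → [{'attr_start_node': None, 'attr_path': None, 'rel_start_node': 'node-1'}]
```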

ajtmccarty · 2025-04-29T14:06:12Z

backend/infrahub/core/validators/uniqueness/query.py

        """ % {"node_kind": self.query_request.kind}

        relationship_attr_paths_with_value_subquery = """
-        MATCH rel_path = (start_node:%(node_kind)s)-[:IS_RELATED]-(relationship_node:Relationship)-[:IS_RELATED]-(related_n:Node)-[:HAS_ATTRIBUTE]->(rel_attr:Attribute)-[:HAS_VALUE]->(rel_attr_value:AttributeValue)
+        OPTIONAL MATCH rel_path = (rel_start_node:%(node_kind)s)-[:IS_RELATED]-(relationship_node:Relationship)-[:IS_RELATED]-(related_n:Node)-[:HAS_ATTRIBUTE]->(rel_attr:Attribute)-[:HAS_VALUE]->(rel_attr_value:AttributeValue)


Changed the variable name on this line and on line 120 because if attr_start_node is null after the first MATCH statement, the later MATCH statements will never return any results as long as they use the same variable name.

This is embarrassing... that was the whole point of the change: reusing the same start_node we used to match attribute values, so that the query does not have to traverse a lot of nodes when looking for rel_path without a value...

We probably need to build different queries for each case then... I'm not sure we can do a single query that can handle all cases.
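The variable-name problem in this thread can be modeled the same way. The edge list and expand_rel helper are hypothetical; the model mirrors the Cypher rule that a pattern variable already bound to null cannot match any node. Reusing start_node after an empty attribute match finds nothing, while a fresh name such as rel_start_node finds every related node, giving up the early filtering that a real start_node binding provides:

```python
# Hypothetical relationship edges: (start node, related node).
REL_EDGES = [("node-1", "ns-default"), ("node-2", "ns-default")]


def expand_rel(row, start_var):
    """Expand (start_var)-[:IS_RELATED]-(related_n), honoring an existing binding of start_var."""
    results = []
    for src, dst in REL_EDGES:
        # If start_var is already bound (even to None), only that node may be used.
        # None never equals a real node id, so a null binding matches nothing.
        if start_var in row and row[start_var] != src:
            continue
        results.append({**row, start_var: src, "related_n": dst})
    return results


# Row left behind by an OPTIONAL MATCH on attributes that found nothing.
row = {"start_node": None}

print(expand_rel(row, "start_node"))                         # → []
print(len(expand_rel(row, "rel_start_node")))                # → 2
print(len(expand_rel({"start_node": "node-1"}, "start_node")))  # → 1
```

The last line shows the benefit the original change was after: when start_node is bound to a real node, the traversal is restricted to that node.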

ajtmccarty · 2025-04-29T14:06:56Z

backend/infrahub/core/validators/uniqueness/query.py

        }
+        WITH DISTINCT start_node, potential_path, rel_identifier, potential_attr, potential_attr_value


Added DISTINCT here because I believe the initial subquery that identifies attr_start_node, rel_start_node, and rel_only_start_node can return duplicates.
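Roughly what WITH DISTINCT does at this step, sketched in Python. The with_distinct helper and the row values are hypothetical: once the per-subquery start-node variables are projected away, rows that differ only in those variables collapse into one.

```python
COLUMNS = ("start_node", "potential_path", "rel_identifier", "potential_attr", "potential_attr_value")


def with_distinct(rows, columns=COLUMNS):
    """Project each row onto `columns` and keep the first occurrence of each projected tuple."""
    unique = dict.fromkeys(tuple(row.get(c) for c in columns) for row in rows)
    return [dict(zip(columns, key)) for key in unique]


base = {
    "start_node": "n1",
    "potential_path": "p1",
    "rel_identifier": None,
    "potential_attr": "address",
    "potential_attr_value": "10.0.0.1/32",
}
rows = [
    {**base, "attr_start_node": "n1", "rel_start_node": None},
    {**base, "attr_start_node": "n1", "rel_start_node": "n1"},  # same projection as above
    {**base, "start_node": "n2", "attr_start_node": "n2", "rel_start_node": None},
]

distinct_rows = with_distinct(rows)
print(len(distinct_rows))  # → 2
```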

ajtmccarty · 2025-04-29T14:10:32Z

backend/infrahub/core/validators/uniqueness/query.py

-        select_subqueries_str = "UNION".join(select_subqueries)
+        select_subqueries_str = "".join(select_subqueries)
+        return_subqueries_str = ", ".join(returned_attributes)
+        filter_subqueries_str = "UNION".join(filter_subqueries)

        # ruff: noqa: E501
        query = """
        // get attributes for node and its relationships
        CALL {
            %(select_subqueries_str)s


Having all of the initial OPTIONAL MATCH statements within this single CALL subquery means that we can get duplicated rows in the results.
For example, if attr_paths_subquery matches 5 paths and relationship_only_attr_paths_subquery matches 5 paths, we get 25 rows, one for each combination of attribute path and relationship path.
I assume that using this single CALL subquery instead of 1-3 separate CALL subqueries improves performance because we are usually operating on a fairly small number of rows, but I just wanted to make sure we're all on the same page about how this works.

You're right: having a single CALL subquery lets us reuse the start_node we used when filtering attribute values, so that we filter early.
The change you made above makes this useless now, though 🤔
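The row counts discussed in this thread can be checked with a small Python model. Path names are placeholders and real query rows carry more columns. Consecutive OPTIONAL MATCH clauses inside one CALL block behave like a cross product, whereas the previous "UNION".join(select_subqueries) shape roughly produced one row per matched path in each branch:

```python
from itertools import product

attr_paths = [f"attr_path_{i}" for i in range(5)]
rel_only_paths = [f"rel_only_path_{i}" for i in range(5)]

# Single CALL block ("".join(select_subqueries)): every attribute path is paired
# with every relationship-only path.
single_call_rows = list(product(attr_paths, rel_only_paths))

# Separate subqueries joined with UNION: each branch contributes only its own rows.
union_rows = [("attr", p) for p in attr_paths] + [("rel_only", p) for p in rel_only_paths]

print(len(single_call_rows), len(union_rows))  # → 25 10
```

The product stays cheap while each subquery matches only a handful of paths, which is the assumption stated in the review comment; it grows multiplicatively if several subqueries each match many paths.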

fatih-acar · 2025-05-02T12:04:03Z

This PR is on hold after internal discussions; another approach is required to solve this.

github-actions bot added the group/backend Issue related to the backend (API Server, Git Agent) label Apr 10, 2025

fatih-acar force-pushed the fac-perf-improve-uniqueness-constraint-ipam branch from da8c525 to 9afab1f April 10, 2025 13:07

fatih-acar force-pushed the fac-node-attr-perf-improvement branch from 3e8083f to 9d8175b April 25, 2025 07:50

fatih-acar force-pushed the fac-perf-improve-uniqueness-constraint-ipam branch 2 times, most recently from c99e4d2 to fb6a039 April 25, 2025 13:54

fatih-acar marked this pull request as ready for review April 25, 2025 14:22

fatih-acar requested a review from a team as a code owner April 25, 2025 14:22

fatih-acar force-pushed the fac-node-attr-perf-improvement branch from 699288c to 688bd82 April 25, 2025 17:39

fatih-acar requested a review from a team as a code owner April 25, 2025 17:39

Base automatically changed from fac-node-attr-perf-improvement to stable April 28, 2025 18:38

ajtmccarty approved these changes Apr 28, 2025

fatih-acar and others added 2 commits April 29, 2025 13:57

add more unit tests

7dbcf43

fatih-acar force-pushed the fac-perf-improve-uniqueness-constraint-ipam branch from 4158ed5 to 7dbcf43 April 29, 2025 11:58

handle queries where no attrs match, but relationships do

03bbc11

ajtmccarty reviewed Apr 29, 2025

fatih-acar marked this pull request as draft May 2, 2025 12:03
