Skip to content

handle migrated kind nodes in diff and merge logic #6401

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 20 commits into
base: ajtm-05082025-rel-query-updates
Choose a base branch
from

Conversation

ajtmccarty
Copy link
Contributor

@ajtmccarty ajtmccarty commented May 2, 2025

IFC-1452

TODOs

  • add test for node migrated and then deleted
  • add test for two node kind migrations on the same nodes on a branch
  • add issue for adding test for inheritance migrations

updates to the diff and merge logic to support nodes that have had their kind or namespace update on a branch
this problematic workflow is

  • make a branch
  • update the name or namespace of a schema on the branch. this runs a migration that creates new nodes with the same UUID as existing nodes, but with a new kind
  • merge the branch

previously, the diff calculation queries completely ignored these duplicated nodes, so they were not part of the diff or the merge logic at all. the only reason no one recognized the problem was that the schema migrations run during the merge operation would create a third node with the same UUID on the default branch. this kind of worked, but the database was in an unexpected state and would only get more unexpected as time continued

the basic solution I've implemented in this PR is to track which nodes are part of a kind migration as we calculate the diff and then merge the migrated-kind nodes using their own special query at the end of the diff merge logic. we cannot keep the migrated-kind nodes completely separate from the regular updates, b/c a node can have both kinds of changes (eg I update a relationship on a node, then I update the node schema's name, then I update an attribute on that node).

@github-actions github-actions bot added the group/backend Issue related to the backend (API Server, Git Agent) label May 2, 2025
Copy link

codspeed-hq bot commented May 2, 2025

CodSpeed Performance Report

Merging #6401 will not alter performance

Comparing ajtm-05022024-diff-kind-migrations (4638dea) with ajtm-05082025-rel-query-updates (5694b2d)

Summary

✅ 10 untouched benchmarks

Base automatically changed from ajtm-04242025-migrated-kind-diff-bug to stable May 6, 2025 13:38
@ajtmccarty ajtmccarty force-pushed the ajtm-05022024-diff-kind-migrations branch from 8e81395 to 2e7329d Compare May 6, 2025 13:44
@ajtmccarty ajtmccarty force-pushed the ajtm-05022024-diff-kind-migrations branch from 2e7329d to 8ece551 Compare May 6, 2025 20:28
@ajtmccarty ajtmccarty changed the base branch from stable to ajtm-05062025-rel-create-migrated-kind May 6, 2025 20:29
Base automatically changed from ajtm-05062025-rel-create-migrated-kind to stable May 8, 2025 06:08
@ajtmccarty ajtmccarty changed the base branch from stable to ajtm-05082025-rel-query-updates May 9, 2025 16:13
@@ -69,6 +70,56 @@ async def _run_diff_calculation_query(
has_more_data = last_result.get_as_type("has_more_data", bool)
offset += limit

async def _apply_kind_migrated_nodes(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

runs a new cypher query to determine which nodes are part of a node kind/inheritance migration and sets is_node_kind_migation = True on all of those nodes
this will be run any time we generate or update a diff and is_node_kind_migration will stay True on a given node across updates once it is set

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are all updates to use NodeIdentifier for uniqueness instead of the node's UUID b/c the UUID will not actually be unique in the case of a node kind/inheritance migration

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes to support using the db_id instead of labels for node uniqueness

@@ -53,6 +59,11 @@ async def merge_graph(self, at: Timestamp) -> EnrichedDiffRoot:
)
log.info(f"Diff {latest_diff.uuid} retrieved")
batch_num = 0
migrated_kinds_id_map = {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this map tells the node merge queries which node with a given UUID is the new one and which nodes are part of a node kind migration on the branch

)
await merge_properties_query.execute(db=self.db)
log.info(f"Batch #{batch_num} merged")
batch_num += 1
migrated_kind_uuids = {n.identifier.uuid for n in enriched_diff.nodes if n.is_node_kind_migration}
if migrated_kind_uuids:
migrated_merge_query = await DiffMergeMigratedKindsQuery.init(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the new specialized query to merge nodes with migrated kinds

self.add_to_query(query=query)


class DiffMergeMigratedKindsQuery(Query):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new query to merge nodes with a kind migration
when a node's kind or inheritance is migrated, we create a new node with the same UUID and an updated kind property and/or labels, then add "deleted" edges for all edges linked to the old node and "active" edges to the new node
this query just copies the latest edges linked to any node with a given UUID from the source to the target branch, unless they already exist, in which case they are ignored

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes to save is_node_kind_migration property and uniquely identify nodes using UUID/database ID instead of just UUID

diff_node.identifier.kind = database_path.node_kind
diff_node.db_id = database_path.node_db_id
diff_node.from_time = database_path.node_changed_at
diff_node.status = database_path.node_status
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was necessary when we were combining diffs for a node with an updated kind into 1, but we don't do that any longer

AND ALL(
r in [r_node, r_prop]
WHERE r.from < $to_time AND r.branch = top_diff_rel.branch
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apparently using all is less efficient than just doing the same conditional multiple times if you can

has_more_data: bool


class DiffMigratedKindNodesQuery(DiffCalculationQuery):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new lightweight (well, pretty lightweight) query to identify which nodes are part of a kind migration

@ajtmccarty ajtmccarty marked this pull request as ready for review May 9, 2025 22:18
@ajtmccarty ajtmccarty requested a review from a team as a code owner May 9, 2025 22:18
@ajtmccarty ajtmccarty changed the title WIP on including migrated kinds in diff handle migrated kind nodes in diff and merge logic May 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
group/backend Issue related to the backend (API Server, Git Agent)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant