-
Notifications
You must be signed in to change notification settings - Fork 25
handle migrated kind nodes in diff and merge logic #6401
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: ajtm-05082025-rel-query-updates
Are you sure you want to change the base?
handle migrated kind nodes in diff and merge logic #6401
Conversation
CodSpeed Performance ReportMerging #6401 will not alter performanceComparing Summary
|
8e81395
to
2e7329d
Compare
migrated kind/inheritance
2e7329d
to
8ece551
Compare
@@ -69,6 +70,56 @@ async def _run_diff_calculation_query( | |||
has_more_data = last_result.get_as_type("has_more_data", bool) | |||
offset += limit | |||
|
|||
async def _apply_kind_migrated_nodes( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
runs a new cypher query to determine which nodes are part of a node kind/inheritance migration and sets is_node_kind_migation = True
on all of those nodes
this will be run any time we generate or update a diff and is_node_kind_migration
will stay True
on a given node across updates once it is set
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are all updates to use NodeIdentifier
for uniqueness instead of the node's UUID b/c the UUID will not actually be unique in the case of a node kind/inheritance migration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes to support using the db_id
instead of labels
for node uniqueness
@@ -53,6 +59,11 @@ async def merge_graph(self, at: Timestamp) -> EnrichedDiffRoot: | |||
) | |||
log.info(f"Diff {latest_diff.uuid} retrieved") | |||
batch_num = 0 | |||
migrated_kinds_id_map = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this map tells the node merge queries which node with a given UUID is the new one and which nodes are part of a node kind migration on the branch
) | ||
await merge_properties_query.execute(db=self.db) | ||
log.info(f"Batch #{batch_num} merged") | ||
batch_num += 1 | ||
migrated_kind_uuids = {n.identifier.uuid for n in enriched_diff.nodes if n.is_node_kind_migration} | ||
if migrated_kind_uuids: | ||
migrated_merge_query = await DiffMergeMigratedKindsQuery.init( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the new specialized query to merge nodes with migrated kinds
self.add_to_query(query=query) | ||
|
||
|
||
class DiffMergeMigratedKindsQuery(Query): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new query to merge nodes with a kind migration
when a node's kind or inheritance is migrated, we create a new node with the same UUID and an updated kind
property and/or labels, then add "deleted"
edges for all edges linked to the old node and "active"
edges to the new node
this query just copies the latest edges linked to any node with a given UUID from the source to the target branch, unless they already exist, in which case they are ignored
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes to save is_node_kind_migration
property and uniquely identify nodes using UUID/database ID instead of just UUID
diff_node.identifier.kind = database_path.node_kind | ||
diff_node.db_id = database_path.node_db_id | ||
diff_node.from_time = database_path.node_changed_at | ||
diff_node.status = database_path.node_status |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was necessary when we were combining diffs for a node with an updated kind into 1, but we don't do that any longer
AND ALL( | ||
r in [r_node, r_prop] | ||
WHERE r.from < $to_time AND r.branch = top_diff_rel.branch | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
apparently using all
is less efficient than just doing the same conditional multiple times if you can
has_more_data: bool | ||
|
||
|
||
class DiffMigratedKindNodesQuery(DiffCalculationQuery): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new lightweight (well, pretty lightweight) query to identify which nodes are part of a kind migration
IFC-1452
TODOs
updates to the diff and merge logic to support nodes that have had their kind or namespace update on a branch
this problematic workflow is
previously, the diff calculation queries completely ignored these duplicated nodes, so they were not part of the diff or the merge logic at all. the only reason no one recognized the problem was that the schema migrations run during the merge operation would create a third node with the same UUID on the default branch. this kind of worked, but the database was in an unexpected state and would only get more unexpected as time continued
the basic solution I've implemented in this PR is to track which nodes are part of a kind migration as we calculate the diff and then merge the migrated-kind nodes using their own special query at the end of the diff merge logic. we cannot keep the migrated-kind nodes completely separate from the regular updates, b/c a node can have both kinds of changes (eg I update a relationship on a node, then I update the node schema's name, then I update an attribute on that node).