| 1 | += Versioning |
| 2 | +:description: See which graph data model versioning options are commonly used with Neo4j. |
| 3 | + |
| 4 | +Every time you xref:data-modeling/graph-model-refactoring.adoc[refactor] your data model, you create new versions of it. |
| 5 | +Tracking changes in the data structure or showing a current and past value can be valuable for auditing purposes, trend analysis, etc. |
| 6 | +This page gives an overview of the different ways you could model data in order to keep track of changes over time. |
| 7 | + |
| 8 | +== Versioning of entities |
| 9 | + |
| 10 | +You can keep track of changes in data by versioning relevant entities. |
| 11 | +This strategy is useful when you need to: |
| 12 | + |
| 13 | +* Access the many versions of specific entities (nodes, for instance) in a graph (e.g. the different names a product has had over time). |
| 14 | +* Retrieve the latest version only (e.g. the current name of a product). |
| 15 | + |
| 16 | +image::versioned-entities.svg[Example of a graph showing different versions of an entity which had its property value changed over time,width=300,role=popup] |
| 17 | + |
| 18 | +With entity versioning: |
| 19 | + |
| 20 | +* The entity `Product` is linked to its different versions by an explicit relationship. |
| 21 | +* The entity `Product` is immutable. |
| 22 | +Only the properties that are stored in the different versions (`State` nodes) change. |
| 23 | +* The `LATEST` relationship links the entity `Product` to its most recent version (`State`), which also happens to be version 2 (`V2`). |
| 24 | + |
| 25 | +=== Pros and cons |
| 26 | + |
| 27 | +[cols="<,<",options="header"] |
| 28 | +|=== |
| 29 | +| **Pros** |
| 30 | +| **Cons** |
| 31 | + |
| 32 | +| Simple in terms of modeling, querying, and maintenance. |
| 33 | +| Updating a node requires deleting the `LATEST` relationship and creating a new relationship between the entity and its latest version. |
| 34 | + |
| 35 | +| Explicit for end users without any transformation. |
| 36 | +| Can be limited if not using other versioning patterns, as it can be hard to know which version you want to retrieve if it’s not the latest. |
| 37 | +|=== |
| 38 | + |
| 39 | +=== Query examples |
| 40 | + |
| 41 | +These are examples of common queries that are useful with the entity versioning strategy: |
| 42 | + |
| 43 | +.Get the name of version 2 of the `Product` with the id '1' |
| 44 | +[source,cypher] |
| 45 | +-- |
| 46 | +MATCH (:Product {id:1})-[:V2]->(s:State) |
| 47 | +RETURN s.name |
| 48 | +-- |
| 49 | + |
| 50 | +.Get the name of the latest version of a `Product` with the id '1' |
| 51 | +[source,cypher] |
| 52 | +-- |
| 53 | +MATCH (:Product {id:1})-[:LATEST]->(s:State) |
| 54 | +RETURN s.name |
| 55 | +-- |
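| | + |
| | +The update step listed under the cons above (re-pointing `LATEST` whenever a new version is added) can be sketched as a single write query. |
| | +This is a minimal sketch, not a definitive implementation: the `V3` relationship type and the new name are illustrative assumptions that continue the `V2` example. |
| | + |
| | +.Add a new version to the `Product` with the id '1' (sketch) |
| | +[source,cypher] |
| | +-- |
| | +// Remove the old LATEST pointer; the State it pointed to stays in the graph |
| | +MATCH (p:Product {id: 1})-[latest:LATEST]->(:State) |
| | +DELETE latest |
| | +// Hypothetical new version: the V3 type and the name value are assumptions |
| | +CREATE (p)-[:V3]->(s:State {name: "New product name"}) |
| | +CREATE (p)-[:LATEST]->(s) |
| | +RETURN s.name |
| | +-- |
| | + |
| | +Running both steps in one query keeps the entity from being left without a `LATEST` relationship between the deletion and the creation. |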
| 56 | + |
| 57 | +== Time-based versioning of entities |
| 58 | + |
| 59 | +A variation of entity versioning is a time-based approach. |
| 60 | +It is useful when you are interested in: |
| 61 | + |
| 62 | +* *Graph snapshot* by retrieving all valid elements (nodes and relationships) of the graph at a specific point in time (e.g. which products are available on Monday, 12 June 2023). |
| 63 | +* *Graph difference* by comparing two graph snapshots taken at different timestamps (e.g. which nodes are added, which are deleted, and which remain the same). |
| 64 | +* *Temporal traversal* by traversing only the elements (nodes or relationships) of the graph that are valid at a specific point in time, in order to find the chronological sequence of relationships which connect time-based events (e.g. a bike sharing graph with trip relationships between stations as nodes). |
| 65 | +* *Graph history* by modeling the history of data changes. |
| 66 | + |
| 67 | +image::time-based-entities.svg[Example graph of time-based versioning of entities,width=400,role=popup] |
| 68 | + |
| 69 | +With time-based versioning of entities: |
| 70 | + |
| 71 | +* Each element has dedicated `validFrom`/`validTo` time properties. |
| 72 | +* Nodes can only share a relationship if their validity timespans overlap. |
| 73 | +* Duplication of information is possible. |
| 74 | +* Complete history of the graph is usable. |
| 75 | + |
| 76 | +=== Pros and cons |
| 77 | + |
| 78 | +[cols="<,<",options="header"] |
| 79 | +|=== |
| 80 | +| **Pros** |
| 81 | +| **Cons** |
| 82 | + |
| 83 | +| Every element has a well-defined time interval in which the element is valid. |
| 84 | +| If the state of a node changes, the node has to be duplicated and a new valid time interval should be assigned. |
| 85 | + |
| 86 | +| States are bound to the specific element (no additional relationship required). |
| 87 | +| Updating a node requires creating a new relationship that connects to the new node/state and assigning a new validity interval to that relationship. |
| 88 | + |
| 89 | +| Aggregation of all elements (or only valid ones at a certain time) is possible. |
| 90 | +| Duplication of data cannot be avoided. |
| 91 | +|=== |
| 92 | + |
| 93 | +=== Query examples |
| 94 | + |
| 95 | +These are examples of common queries that are useful with the time-based entity versioning strategy: |
| 96 | + |
| 97 | +.Get the current price of the `Product` Rice Cooker |
| 98 | +[source,cypher] |
| 99 | +-- |
| 100 | +MATCH (p:Product) |
| | +// Assumption: an open-ended validity period is stored as a missing (null) validTo |
| 101 | +WHERE p.name = "Rice Cooker" AND p.validTo IS NULL |
| 102 | +RETURN p.price |
| 103 | +-- |
| 104 | + |
| 105 | +.Get the price of the `Product` Rice Cooker in November |
| 106 | +[source,cypher] |
| 107 | +-- |
| 108 | +MATCH (p:Product) |
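| 109 | +WHERE p.name = "Rice Cooker" |
| | +// Assumptions: $date is a query parameter holding a point in time in November, |
| | +// and an open-ended validity period is stored as a missing (null) validTo |
| 110 | +AND datetime(p.validFrom) <= datetime($date) |
| | +AND (p.validTo IS NULL OR datetime($date) <= datetime(p.validTo)) |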
| 111 | +RETURN p.price |
| 112 | +-- |
| 113 | + |
| 114 | +.Get the current product catalogue and the prices |
| 115 | +[source,cypher] |
| 116 | +-- |
| 117 | +MATCH ()-[r:HAS_PRODUCT]->(p) |
| | +// Assumption: an open-ended validity period is stored as a missing (null) validTo |
| 118 | +WHERE r.validTo IS NULL |
| 119 | +RETURN p.name, p.price |
| 120 | +-- |
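| | + |
| | +The duplication described under the cons above (a state change creates a new node with its own validity interval) can be sketched as follows. |
| | +This is a sketch under stated assumptions rather than a definitive implementation: it assumes that the currently valid node has no `validTo` value, and the new price is made up for illustration. |
| | + |
| | +.Change the price of the `Product` Rice Cooker (sketch) |
| | +[source,cypher] |
| | +-- |
| | +// Find the currently valid version (assumption: no validTo means still valid) |
| | +MATCH (previous:Product {name: "Rice Cooker"}) |
| | +WHERE previous.validTo IS NULL |
| | +WITH previous, datetime() AS now |
| | +// Close the validity interval of the old version |
| | +SET previous.validTo = now |
| | +// Duplicate the node with the changed value (59.99 is an illustrative price) |
| | +CREATE (updated:Product {name: previous.name, price: 59.99, validFrom: now}) |
| | +RETURN updated.price |
| | +-- |
| | + |
| | +Relationships such as `HAS_PRODUCT` are not handled here: each one would also need its validity interval closed and a new relationship created to the new node. |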
| 121 | + |
| 122 | +== Linked list |
| 123 | + |
| 124 | +A linked list is another modeling strategy that can be useful when the sequence of objects matters. |
| 125 | + |
| 126 | +Linked lists are useful when: |
| 127 | + |
| 128 | +* The order of events is of interest, e.g. getting the order of transactions executed on a bank account. |
| 129 | +* You need the previous and next elements in a list, based on the relationship between them (e.g. which song is next on a playlist, or which action to undo in a text document). |
| 130 | + |
| 131 | +image::linked-list-versioning.svg[Example graph showing a linked list model design being used for versioning,width=400,role=popup] |
| 132 | + |
| 133 | +With a linked list: |
| 134 | + |
| 135 | +* The entity `Product` is linked to the first element of the sequence, and can be linked to the last one. |
| 136 | +* As with the xref:#_versioning_of_entities[versioning of entities], the entity `Product` is also immutable here. |
| 137 | +* Each element of the sequence is linked to the next one through a `NEXT` relationship. |
| 138 | + |
| 139 | +=== Pros and cons |
| 140 | + |
| 141 | +[cols="<,<",options="header"] |
| 142 | +|=== |
| 143 | +| **Pros** |
| 144 | +| **Cons** |
| 145 | + |
| 146 | +| Efficient by using relationships to get the next/previous element. |
| 147 | +| Limited to very specific use cases without using other versioning patterns. |
| 148 | + |
| 149 | +| Simple modeling and maintenance. |
| 150 | +| Difficult to find a specific version which is not the first or the last. |
| 151 | + |
| 152 | +| Explicit for end users. |
| 153 | +| |
| 154 | +|=== |
| 155 | + |
| 156 | +=== Query examples |
| 157 | + |
| 158 | +These are examples of common queries that are useful with the linked-list versioning strategy: |
| 159 | + |
| 160 | +.Get the next name of the product named “Professional chair” |
| 161 | +[source,cypher] |
| 162 | +-- |
| 163 | +MATCH (:State {name: "Professional chair"})-[:NEXT]->(s:State) |
| 164 | +RETURN s.name |
| 165 | +-- |
| 166 | + |
| 167 | +.Get the previous name of the product with the id '1' |
| 168 | +[source,cypher] |
| 169 | +-- |
| 170 | +MATCH (:Product {id:1})-[:LAST]->(:State)<-[:NEXT]-(s:State) |
| 171 | +RETURN s.name |
| 172 | +-- |
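| | + |
| | +Appending a new element to the end of the list can be sketched in the same style. |
| | +This is a minimal sketch that assumes the `Product` is linked to the last element through the `LAST` relationship used in the query above; the new name is an illustrative value. |
| | + |
| | +.Append a new name to the list of the product with the id '1' (sketch) |
| | +[source,cypher] |
| | +-- |
| | +// Find the current end of the list and drop the old LAST pointer |
| | +MATCH (p:Product {id: 1})-[last:LAST]->(tail:State) |
| | +DELETE last |
| | +// Chain the new State after the old tail and make it the new last element |
| | +CREATE (tail)-[:NEXT]->(s:State {name: "New product name"}) |
| | +CREATE (p)-[:LAST]->(s) |
| | +RETURN s.name |
| | +-- |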
| 173 | + |
| 174 | +== Timeline tree |
| 175 | + |
| 176 | +As mentioned in xref:data-modeling/modeling-designs.adoc[Modeling designs], the timeline tree is a common modeling design. |
| 177 | +It can be a useful strategy when you want to track changes over time. |
| 178 | +In this example, the timeline structure spans from years to days, and the rest of the non-time data nodes are the nodes that contain the important pieces of data in the graph: |
| 179 | + |
| 180 | +image::timeline-tree.svg[Graph with two different timelines divided into years, months, and days, and the purchases connected to these dates,width=600,role=popup] |
| 181 | + |
| 182 | +=== Query examples |
| 183 | + |
| 184 | +To find all purchases that happened in a given time period, such as every purchase in December 2012, navigate the timeline tree from 2012 to December, and then fetch everything connected to the leaf nodes (nodes with no descendants) under that branch: |
| 185 | + |
| 186 | +[source,cypher] |
| 187 | +-- |
| 188 | +MATCH (root:Timeline)-[:IN_YEAR]->(year:Year {value:2012})-[:IN_MONTH]->(month:Month {value:12}) |
| 189 | +WITH month |
| 190 | +MATCH (month)-[:ON_DAY]->(day) |
| 191 | +MATCH (purchase:Purchase)-[:OCCURRED]->(day) |
| 192 | +RETURN purchase |
| 193 | +-- |
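| | + |
| | +The same traversal can also feed an aggregation, for example to count the purchases on each day of that month. |
| | +This is a sketch only: it assumes that day nodes carry a `value` property in the same way as the `Year` and `Month` nodes do. |
| | + |
| | +[source,cypher] |
| | +-- |
| | +// Walk from the year to the month, then to each day under that branch |
| | +MATCH (:Year {value: 2012})-[:IN_MONTH]->(:Month {value: 12})-[:ON_DAY]->(day) |
| | +MATCH (purchase:Purchase)-[:OCCURRED]->(day) |
| | +// Assumption: day.value holds the day of the month |
| | +RETURN day.value AS dayOfMonth, count(purchase) AS purchases |
| | +ORDER BY dayOfMonth |
| | +-- |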
| 194 | + |
| 195 | +== Combined approach |
| 196 | + |
| 197 | +Some complex use cases require a combination of two or more of the previously mentioned modeling techniques, since each has advantages and disadvantages. |
| 198 | + |
| 199 | +The right combination depends on the specific use case. |
| 200 | +Factors such as query times and the frequency of transactions should be considered as well. |
| 201 | + |
| 202 | +image::combined-approach.svg[Example graph of a more complex approach to versioning combining timeline tree, versioned entities, and more,width=600,role=popup] |
| 203 | + |