|
| 1 | +--- |
| 2 | +navTitle: Column-Level Conflict Resolution |
| 3 | +title: Column-Level Conflict Detection |
| 4 | +originalFilePath: column-level-conflicts.md |
| 5 | + |
| 6 | +--- |
| 7 | + |
| 8 | +By default, conflicts are resolved at row level. That is, when changes |
| 9 | +from two nodes conflict, we pick either the local or remote tuple and |
| 10 | +discard the other one. For example, we may compare commit timestamps for |
| 11 | +the two conflicting changes and keep the newer one. This ensures that all |
| 12 | +nodes converge to the same result, and establishes commit-order-like |
| 13 | +semantics on the whole cluster. |
| 14 | + |
| 15 | +However, in some cases it may be appropriate to resolve conflicts at |
| 16 | +the column-level rather than the row-level. |
| 17 | + |
| 18 | +Consider a simple example, where we have a table "t" with two integer |
| 19 | +columns "a" and "b", and a single row `(1,1)`. Assume that on one node |
| 20 | +we execute: |
| 21 | + |
| 22 | +```sql |
| 23 | +UPDATE t SET a = 100 |
| 24 | +``` |
| 25 | + |
| 26 | +...while on another node we concurrently (before receiving the preceding |
| 27 | +`UPDATE`) execute: |
| 28 | + |
| 29 | +```sql |
| 30 | +UPDATE t SET b = 100 |
| 31 | +``` |
| 32 | + |
| 33 | +This results in an `UPDATE-UPDATE` conflict. With the `update_if_newer` |
| 34 | +conflict resolution, we compare the commit timestamps and keep the new |
| 35 | +row version. Assuming the second node committed last, we end up with |
| 36 | +`(1,100)`, effectively discarding the change to column "a". |
| 37 | + |
| 38 | +For many use cases this is the desired and expected behaviour, but for |
| 39 | +some this may be an issue - consider for example a multi-node cluster |
| 40 | +where each part of the application is connected to a different node, |
| 41 | +updating a dedicated subset of columns in a shared table. In that case, |
| 42 | +the different components may step on each other's toes, overwriting |
| 43 | +their changes. |
| 44 | + |
| 45 | +For such use cases, it may be more appropriate to resolve conflicts on |
| 46 | +a given table at the column-level. To achieve that, BDR will track |
| 47 | +the timestamp of the last change for each column separately, and use that |
| 48 | +to pick the most recent value (essentially `update_if_newer`). |
| 49 | + |
| 50 | +Applied to the previous example, we'll end up with `(100,100)` on both |
| 51 | +nodes, despite neither of the nodes ever seeing such a row. |
| 52 | + |
| 53 | +When thinking about column-level conflict resolution, it may be useful |
| 54 | +to see tables as vertically partitioned, so that each update affects |
| 55 | +data in only one slice. This eliminates conflicts between changes to |
| 56 | +different subsets of columns. In fact, vertical partitioning may even |
| 57 | +be a practical alternative to column-level conflict resolution. |
| 58 | + |
| 59 | +Column-level conflict resolution requires the table to have `REPLICA |
| 60 | +IDENTITY FULL`. The `bdr.alter_table_conflict_detection` function does check |
| 61 | +that, and will fail with an error otherwise. |
| 62 | + |
| 63 | +## Enabling and Disabling Column-Level Conflict Resolution |
| 64 | + |
| 65 | +The Column-Level Conflict Resolution is managed by the |
| 66 | +[bdr.alter_table_conflict_detection()](conflicts#bdralter_table_conflict_detection) |
| 67 | +function. |
| 68 | + |
| 69 | +### Example |
| 70 | + |
| 71 | +To illustrate how the `bdr.alter_table_conflict_detection()` is used, consider |
| 72 | +this example that creates a trivial table `test_table` and then enable |
| 73 | +column-level conflict resolution on it: |
| 74 | + |
| 75 | +```sql |
| 76 | +db=# CREATE TABLE my_app.test_table (id SERIAL PRIMARY KEY, val INT); |
| 77 | +CREATE TABLE |
| 78 | + |
| 79 | +db=# ALTER TABLE my_app.test_table REPLICA IDENTITY FULL; |
| 80 | +ALTER TABLE |
| 81 | + |
| 82 | +db=# SELECT bdr.alter_table_conflict_detection( |
| 83 | +db(# 'my_app.test_table'::regclass, 'column_modify_timestamp', 'cts'); |
| 84 | + alter_table_conflict_detection |
| 85 | +-------------------------------- |
| 86 | + t |
| 87 | + |
| 88 | +db=# \d my_app.test_table |
| 89 | +``` |
| 90 | + |
| 91 | +You will see that the function adds a new `cts` column (as specified in |
| 92 | +the function call), but it also created two triggers ( `BEFORE INSERT` |
| 93 | +and `BEFORE UPDATE` ) that are responsible for maintaining timestamps |
| 94 | +in the new column before each change. |
| 95 | + |
| 96 | +Also worth mentioning is that the new column specifies `NOT NULL` |
| 97 | +with a default value, which means that `ALTER TABLE ... ADD COLUMN` |
| 98 | +does not perform a table rewrite. |
| 99 | + |
| 100 | +*Note*: We discourage using columns with the `bdr.column_timestamps` data type |
| 101 | +for other purposes as it may have various negative effects |
| 102 | +(it switches the table to column-level conflict resolution, which will |
| 103 | +not work correctly without the triggers etc.). |
| 104 | + |
| 105 | +### Listing Table with Column-Level Conflict Resolution |
| 106 | + |
| 107 | +Tables having column-level conflict resolution enabled can be listed |
| 108 | +with the following query, which detects the presence of a column of |
| 109 | +type `bdr.column_timestamp`: |
| 110 | + |
| 111 | +```sql |
| 112 | +SELECT nc.nspname, c.relname |
| 113 | +FROM pg_attribute a |
| 114 | +JOIN (pg_class c JOIN pg_namespace nc ON c.relnamespace = nc.oid) |
| 115 | + ON a.attrelid = c.oid |
| 116 | +JOIN (pg_type t JOIN pg_namespace nt ON t.typnamespace = nt.oid) |
| 117 | + ON a.atttypid = t.oid |
| 118 | +WHERE NOT pg_is_other_temp_schema(nc.oid) |
| 119 | + AND nt.nspname = 'bdr' |
| 120 | + AND t.typname = 'column_timestamps' |
| 121 | + AND NOT a.attisdropped |
| 122 | + AND c.relkind IN ('r', 'v', 'f', 'p'); |
| 123 | +``` |
| 124 | + |
| 125 | +### bdr.column_timestamps_create |
| 126 | + |
| 127 | +This function creates column-level conflict resolution. This is called within |
| 128 | +`column_timestamp_enable`. |
| 129 | + |
| 130 | +#### Synopsis |
| 131 | + |
| 132 | +```sql |
| 133 | +bdr.column_timestamps_create(p_source cstring, p_timestamp timestampstz) |
| 134 | +``` |
| 135 | + |
| 136 | +#### Parameters |
| 137 | + |
| 138 | +- `p_source` - The two options are 'current' or 'commit'. |
| 139 | +- `p_timestamp` - Timestamp is dependent on the source chosen: if 'commit', |
| 140 | + then TIMESTAMP_SOURCE_COMMIT; if 'current', then TIMESTAMP_SOURCE_CURRENT. |
| 141 | + |
| 142 | +## DDL Locking |
| 143 | + |
| 144 | +When enabling or disabling column timestamps on a table, the code uses |
| 145 | +DDL locking to ensure that there are no pending changes from before the |
| 146 | +switch, to ensure we only see conflicts with either timestamps in both |
| 147 | +tuples or neither of them. Otherwise, the code might unexpectedly see |
| 148 | +timestamps in the local tuple and NULL in the remote one. It also |
| 149 | +ensures that the changes are resolved the same way (column-level or |
| 150 | +row-level) on all nodes. |
| 151 | + |
| 152 | +## Current vs Commit Timestamp |
| 153 | + |
| 154 | +An important question is what timestamp to assign to modified columns. |
| 155 | + |
| 156 | +By default, the timestamp assigned to modified columns is the current |
| 157 | +timestamp, as if obtained from `clock_timestamp`. This is simple, and |
| 158 | +for many cases it is perfectly correct (e.g. when the conflicting rows |
| 159 | +modify non-overlapping subsets of columns). |
| 160 | + |
| 161 | +It may however have various unexpected effects: |
| 162 | + |
| 163 | +- The timestamp changes during statement execution, so if an `UPDATE` |
| 164 | + affects multiple rows, each will get a slightly different timestamp. |
| 165 | + This means that the effects of concurrent changes may get "mixed" in various |
| 166 | + ways (depending on how exactly the changes performed on different |
| 167 | + nodes interleave). |
| 168 | + |
| 169 | +- The timestamp is unrelated to the commit timestamp, and using it to |
| 170 | + resolve conflicts means that the result is not equivalent to the commit order, |
| 171 | + which means it likely is not serializable. |
| 172 | + |
| 173 | +Note: We may add statement and transaction timestamps in the future, |
| 174 | +which would address issues with mixing effects of concurrent statements or |
| 175 | +transactions. Still, neither of these options can ever produce results |
| 176 | +equivalent to commit order. |
| 177 | + |
| 178 | +It is possible to also use the actual commit timestamp, although this |
| 179 | +feature is currently considered experimental. To use the commit timestamp, |
| 180 | +set the last parameter to `true` when enabling column-level conflict |
| 181 | +resolution: |
| 182 | + |
| 183 | +```sql |
| 184 | +SELECT bdr.column_timestamps_enable('test_table'::regclass, 'cts', true); |
| 185 | +``` |
| 186 | + |
| 187 | +This can also be disabled using `bdr.column_timestamps_disable`. |
| 188 | + |
| 189 | +Commit timestamps currently have a couple of restrictions that are |
| 190 | +explained in the "Limitations" section. |
| 191 | + |
| 192 | +## Inspecting Column Timestamps |
| 193 | + |
| 194 | +The column storing timestamps for modified columns is maintained |
| 195 | +automatically by triggers, and must not be modified directly. It may |
| 196 | +be useful to inspect the current timestamps value, for example while |
| 197 | +investigating how a particular conflict was resolved. |
| 198 | + |
| 199 | +There are three functions for this purpose: |
| 200 | + |
| 201 | +- `bdr.column_timestamps_to_text(bdr.column_timestamps)` |
| 202 | + |
| 203 | + This function returns a human-readable representation of the timestamp mapping, and |
| 204 | + is used when casting the value to `text`: |
| 205 | + |
| 206 | +```sql |
| 207 | +db=# select cts::text from test_table; |
| 208 | + cts |
| 209 | +----------------------------------------------------------------------------------------------------- |
| 210 | + {source: current, default: 2018-09-23 19:24:52.118583+02, map: [2 : 2018-09-23 19:25:02.590677+02]} |
| 211 | +(1 row) |
| 212 | + |
| 213 | +``` |
| 214 | + |
| 215 | +- `bdr.column_timestamps_to_jsonb(bdr.column_timestamps)` |
| 216 | + |
| 217 | + This function turns a JSONB representation of the timestamps mapping, and is used |
| 218 | + when casting the value to `jsonb`: |
| 219 | + |
| 220 | +```sql |
| 221 | +db=# select jsonb_pretty(cts::jsonb) from test_table; |
| 222 | + jsonb_pretty |
| 223 | +--------------------------------------------------- |
| 224 | + { + |
| 225 | + "map": { + |
| 226 | + "2": "2018-09-23T19:24:52.118583+02:00" + |
| 227 | + }, + |
| 228 | + "source": "current", + |
| 229 | + "default": "2018-09-23T19:24:52.118583+02:00"+ |
| 230 | + } |
| 231 | +(1 row) |
| 232 | +``` |
| 233 | + |
| 234 | +- `bdr.column_timestamps_resolve(bdr.column_timestamps, xid)` |
| 235 | + |
| 236 | + This function updates the mapping with the commit timestamp for the attributes modified |
| 237 | + by the most recent transaction (if it already committed). This only |
| 238 | + matters when using the commit timestamp. For example in this case, the last |
| 239 | + transaction updated the second attribute (with `attnum = 2`): |
| 240 | + |
| 241 | +```sql |
| 242 | +test=# select cts::jsonb from test_table; |
| 243 | + cts |
| 244 | +---------------------------------------------------------------------------------------------------------------------------------------- |
| 245 | + {"map": {"2": "2018-09-23T19:29:55.581823+02:00"}, "source": "commit", "default": "2018-09-23T19:29:55.581823+02:00", "modified": [2]} |
| 246 | +(1 row) |
| 247 | + |
| 248 | +db=# select bdr.column_timestamps_resolve(cts, xmin)::jsonb from test_table; |
| 249 | + column_timestamps_resolve |
| 250 | +----------------------------------------------------------------------------------------------------------------------- |
| 251 | + {"map": {"2": "2018-09-23T19:29:55.581823+02:00"}, "source": "commit", "default": "2018-09-23T19:29:55.581823+02:00"} |
| 252 | +(1 row) |
| 253 | +``` |
| 254 | + |
| 255 | +## Handling column conflicts using CRDT Data Types |
| 256 | + |
| 257 | +By default, column-level conflict resolution simply picks the value with |
| 258 | +a higher timestamp and discards the other one. It is however possible to |
| 259 | +reconcile the conflict in different (more elaborate) ways, for example |
| 260 | +using CRDT types that allow "merging" the conflicting values without |
| 261 | +discarding any information. |
| 262 | + |
| 263 | +While pglogical does not include any such data types, it allows adding |
| 264 | +them separately and registering them in a catalog `crdt_handlers`. Aside |
| 265 | +from the usual data type functions (input/output, ...) each CRDT type |
| 266 | +has to implement a merge function, which takes exactly three arguments |
| 267 | +(local value, old remote value, new remote value) and produces a value |
| 268 | +merging information from those three values. |
| 269 | + |
| 270 | +## Limitations |
| 271 | + |
| 272 | +- The attributes modified by an `UPDATE` are determined by comparing the |
| 273 | + old and new row in a trigger. This means that if the attribute does |
| 274 | + not change a value, it will not be detected as modified even if it is |
| 275 | + explicitly set. For example, `UPDATE t SET a = a` will not mark `a` as |
| 276 | + modified for any row. Similarly, `UPDATE t SET a = 1` will not mark |
| 277 | + `a` as modified for rows that are already set to `1`. |
| 278 | + |
| 279 | +- For `INSERT` statements, we do not have any old row to compare the new |
| 280 | + one to, so we consider all attributes to be modified and assign them |
| 281 | + a new timestamp. This applies even for columns that were not included |
| 282 | + in the `INSERT` statement and received default values. We could detect |
| 283 | + which attributes have a default value, but it is not possible to decide if |
| 284 | + it was included automatically or specified explicitly by the user. |
| 285 | + |
| 286 | + This effectively means column-level conflict resolution does not work |
| 287 | + for `INSERT-INSERT` conflicts (even if the `INSERT` statements specify |
| 288 | + different subsets of columns, because the newer row will have all |
| 289 | + timestamps newer than the older one). |
| 290 | + |
| 291 | +- By treating the columns independently, it is easy to violate constraints |
| 292 | + in a way that would not be possible when all changes happen on the same |
| 293 | + node. Consider for example a table like this: |
| 294 | + |
| 295 | +```sql |
| 296 | +CREATE TABLE t (id INT PRIMARY KEY, a INT, b INT, CHECK (a > b)); |
| 297 | +INSERT INTO t VALUES (1, 1000, 1); |
| 298 | +``` |
| 299 | + |
| 300 | +...and assume one node does: |
| 301 | + |
| 302 | +```sql |
| 303 | +UPDATE t SET a = 100; |
| 304 | +``` |
| 305 | + |
| 306 | +...while another node does concurrently: |
| 307 | + |
| 308 | +```sql |
| 309 | +UPDATE t SET b = 500; |
| 310 | +``` |
| 311 | + |
| 312 | + Each of those updates is valid when executed on the initial row, and |
| 313 | + so will pass on each node. But when replicating to the other node, |
| 314 | + the resulting row violates the `CHECK (A > b)` constraint, and the |
| 315 | + replication will stop until the issue is resolved manually. |
| 316 | + |
| 317 | +- The column storing timestamp mapping is managed automatically. Do not |
| 318 | + specify or override the value in your queries, as it may result in |
| 319 | + unpredictable effects (we do ignore the value where possible anyway). |
| 320 | + |
| 321 | +- The timestamp mapping is maintained by triggers, but the order in which |
| 322 | + triggers execute does matter. So if you have custom triggers that modify |
| 323 | + tuples and are executed after the `pgl_clcd_` triggers, the modified |
| 324 | + columns will not be detected correctly. |
| 325 | + |
| 326 | +- When using regular timestamps to order changes/commits, it is possible |
| 327 | + that the conflicting changes have exactly the same timestamp (because |
| 328 | + two or more nodes happened to generate the same timestamp). This risk |
| 329 | + is not unique to column-level conflict resolution, as it may happen |
| 330 | + even for regular row-level conflict resolution, and we use node id as a |
| 331 | + tie-breaker in this situation (the higher node id wins), which ensures that |
| 332 | + same changes are applied on all nodes. |
| 333 | + |
| 334 | +- It is possible that there is a clock skew between different nodes. While it |
| 335 | + may induce somewhat unexpected behavior (discarding seemingly newer |
| 336 | + changes because the timestamps are inverted), clock skew between nodes can |
| 337 | + be managed using the parameters `bdr.maximum_clock_skew` and |
| 338 | + `bdr.maximum_clock_skew_action`. |
| 339 | + |
| 340 | +- The underlying pglogical subscription must not discard any changes, |
| 341 | + which could easily cause divergent errors (particularly for |
| 342 | + CRDT types). The subscriptions must have `ignore_redundant_updates` |
| 343 | + set to false (which is the default). |
| 344 | + |
| 345 | + Existing groups created with non-default value for `ignore_redundant_updates` |
| 346 | + can be altered like this: |
| 347 | + |
| 348 | +```sql |
| 349 | +SELECT bdr.alter_node_group_config('group', ignore_redundant_updates := false); |
| 350 | +``` |
0 commit comments