Protocol question: multiple logical files pointing to the same data file path #4021
Replies: 4 comments 15 replies
-
I thought the same, but the protocol forbids it: “it is illegal for the same path to be occur twice with different dvIds within each set of add or remove actions.”
-
Yes, this is legal, though likely not desirable except in cases where the DVs are different and non-overlapping. The reason there are no constraints like this across commits is so that metadata-only updates are possible. So it's actually even legal to have:
However, the total here is not 10, it's 5, because those are the same files, so snapshot replay should pick the latter one. From a data perspective it doesn't even matter which one is picked, since they point to the same data, but they could have different metadata, so query execution might end up looking different depending on which one is included in a snapshot.
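To make the replay behaviour described above concrete, here is a minimal sketch in Python. It is not Delta's implementation: the `Add` type, the `replay` and `visible_rows` helpers, the `"dv-1"` identifier, and the 5-record example are all hypothetical, and remove actions are ignored entirely. It assumes add actions are reconciled by the pair (path, DV id), with the add from the latest commit winning.

```python
from typing import NamedTuple, Optional


class Add(NamedTuple):
    """Hypothetical, simplified stand-in for an add action."""
    path: str
    num_records: int
    dv_id: Optional[str] = None   # None means no deletion vector
    dv_cardinality: int = 0       # rows masked out by the deletion vector


def replay(commits):
    """Walk commits in order; the latest add per (path, dv_id) key wins."""
    active = {}
    for actions in commits:
        for add in actions:
            active[(add.path, add.dv_id)] = add
    return list(active.values())


def visible_rows(files):
    """Rows a query would see: records minus rows hidden by each DV."""
    return sum(f.num_records - f.dv_cardinality for f in files)


# Assumed illustration: the same path with the same (absent) DV is added
# in two commits. Both adds share one key, so only the later one survives.
metadata_only = [
    [Add("a.parquet", 5)],
    [Add("a.parquet", 5)],
]

# The table from the question: same path, different DVs, separate commits.
# Under the keying assumed here these are two distinct logical files.
different_dvs = [
    [Add("a.parquet", 10)],
    [Add("a.parquet", 10, dv_id="dv-1", dv_cardinality=2)],
]

print(visible_rows(replay(metadata_only)))  # 5
print(visible_rows(replay(different_dvs)))  # 18
```

Keying on the pair rather than on the path alone is what lets a later commit replace an earlier add for metadata-only changes while still treating adds with different DVs as separate logical files.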
-
@larsk-db
-
I have a Delta table with 2 versions:
Add txn: path = "a.parquet" numRecords = 10 deletionVector = null
Add txn: path = "a.parquet" numRecords = 10 deletionVector = (..., cardinality = 2)
Please note that both transactions point to the same physical path ("a.parquet"), without any remove transaction.
From my understanding of the Delta protocol, since these are 2 separate logical files residing in two different versions, this describes a legal Delta table that, when queried, should return 18 rows.
Could you please confirm my understanding?