[Protocol Change] Add the In-Commit Timestamps spec change proposal (#2599)

* add the in-commit timestamps spec change proposal

* add changes to the cdc section, specify timestamp as of algorithm that considers delta.inCommitTimestampEnablementTimestamp

* use long instead of timestamp, fix time travel section

* remove references to UTC

* fix table formatting and add positional info

* refine terminology around time

* add reference to DESCRIBE HISTORY, replace Unix time with Unix epoch

* update rfc proposal list
dhruvarya-db authored Feb 8, 2024
1 parent e7959fe commit 5545f28
Showing 2 changed files with 64 additions and 2 deletions.
3 changes: 1 addition & 2 deletions protocol_rfcs/README.md
@@ -18,8 +18,7 @@ Here is the history of all the RFCs propose/accepted/rejected since Feb 6, 2024,

| Date proposed | RFC file | Github issue | RFC title |
|:-|:-|:-|:-|
| 2023-02-05 | [my_table_feature.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/my_table_feature.md) | https://github.com/delta-io/delta/issues/XXXX | My Table Feature | <!-- remove this when adding the first RFC -->
|...|||||
| 2023-02-02 | [in-commit-timestamps.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/in-commit-timestamps.md) | https://github.com/delta-io/delta/issues/2532 | In-Commit Timestamps |

### Accepted RFCs

63 changes: 63 additions & 0 deletions protocol_rfcs/in-commit-timestamps.md
@@ -0,0 +1,63 @@
# In-Commit Timestamps

This RFC proposes a new Writer table feature called In-Commit Timestamps. When enabled, commit metadata includes a monotonically increasing timestamp that allows for reliable TIMESTAMP AS OF time travel even if filesystem operations change a commit file's modification timestamp.

**For further discussion of this protocol change, please refer to the GitHub issue: https://github.com/delta-io/delta/issues/2532**

--------


### Commit Provenance Information
> ***Change to existing section***
A delta file can optionally contain additional provenance information about what higher-level operation was being performed as well as who executed it.

Implementations are free to store any valid JSON [object literal](https://www.w3schools.com/js/js_json_objects.asp) as the `commitInfo` action <ins>unless some table feature (e.g. [In-Commit Timestamps](#in-commit-timestamps)) imposes additional requirements on the data</ins>.

<ins>When In-Commit Timestamps are enabled, writers are required to include a `commitInfo` action with every commit, which must include the `inCommitTimestamp` field.</ins>

#### Reader Requirements for AddCDCFile
> ***Change to existing section***
...
3. Change data readers should return the following extra columns:

Field Name | Data Type | Description
-|-|-
_commit_version|`Long`| The table version containing the change. This can be derived from the name of the Delta log file that contains actions.
_commit_timestamp|`Timestamp`| The timestamp associated when the commit was created. ~~This can be derived from the file modification time of the Delta log file that contains actions.~~ <ins>Depending on whether [In-Commit Timestamps](#in-commit-timestamps) are enabled, this is derived from either the `inCommitTimestamp` field of the `commitInfo` action of the version's Delta log, or from the Delta log's file modification time.</ins>

# In-Commit Timestamps
> ***New Section after the [Clustered Table](#clustered-table) section***
The In-Commit Timestamps writer feature strongly associates a monotonically increasing timestamp with each commit by storing it in the commit's metadata.

Enablement:
- The table must be on Writer Version 7.
- The feature `inCommitTimestamps` must exist in the table `protocol`'s `writerFeatures`.
- The table property `delta.enableInCommitTimestamps` must be set to `true`.
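
The three enablement conditions above can be sketched as a single check. This is a minimal illustration, not part of the proposal: the function name and the dictionary shapes are assumptions (a `protocol` action with `minWriterVersion` and `writerFeatures` keys, and table properties as a string-to-string map).

```python
def in_commit_timestamps_enabled(protocol: dict, table_properties: dict) -> bool:
    """Return True only when all three enablement conditions hold.

    Hypothetical helper: `protocol` is assumed to mirror the table's
    `protocol` action, `table_properties` the table's property map.
    """
    # Condition 1: the table is on Writer Version 7.
    on_writer_v7 = protocol.get("minWriterVersion") == 7
    # Condition 2: the feature is listed in writerFeatures.
    has_feature = "inCommitTimestamps" in (protocol.get("writerFeatures") or [])
    # Condition 3: the table property is set to true (stored as a string).
    prop_value = table_properties.get("delta.enableInCommitTimestamps", "false")
    prop_true = str(prop_value).lower() == "true"
    return on_writer_v7 and has_feature and prop_true
```

Checking all three conditions together matters because listing the feature in `writerFeatures` alone does not turn it on; the table property must also be `true`.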

## Writer Requirements for In-Commit Timestamps

When In-Commit Timestamps are enabled:
1. Writers must write the `commitInfo` (see [Commit Provenance Information](#commit-provenance-information)) action in the commit.
2. The `commitInfo` action must be the first action in the commit.
3. The `commitInfo` action must include a field named `inCommitTimestamp`, of type `long` (see [Primitive Types](#primitive-types)), which represents the time (in milliseconds since the Unix epoch) when the commit is considered to have succeeded. It is the larger of two values:
   - The time, in milliseconds since the Unix epoch, at which the writer attempted the commit
   - One millisecond later than the previous commit's `inCommitTimestamp`
4. If the table has commits from a period when this feature was not enabled, provenance information around when this feature was enabled must be tracked in table properties:
   - The property `delta.inCommitTimestampEnablementVersion` must be used to track the version of the table when this feature was enabled.
   - The property `delta.inCommitTimestampEnablementTimestamp` must be the same as the `inCommitTimestamp` of the commit when this feature was enabled.
5. The `inCommitTimestamp` of the commit that enables this feature must be greater than the file modification time of the immediately preceding commit.
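
As a rough sketch of rules 2 and 3, a writer might compute the timestamp and place `commitInfo` first as shown below. The function and parameter names are hypothetical, and actions are assumed to be plain dictionaries keyed by action type.

```python
import time
from typing import Optional


def next_in_commit_timestamp(prev_timestamp_ms: Optional[int],
                             attempt_time_ms: Optional[int] = None) -> int:
    """Return the larger of the attempt time and prev_timestamp_ms + 1.

    Hypothetical helper. `prev_timestamp_ms` is the previous commit's
    inCommitTimestamp, or None when there is no previous commit.
    """
    if attempt_time_ms is None:
        # Wall-clock time in milliseconds since the Unix epoch.
        attempt_time_ms = time.time_ns() // 1_000_000
    if prev_timestamp_ms is None:
        return attempt_time_ms
    return max(attempt_time_ms, prev_timestamp_ms + 1)


def with_commit_info_first(actions: list, in_commit_timestamp_ms: int) -> list:
    """Return the actions with a commitInfo action in first position.

    Sketch only: any existing commitInfo action is dropped and replaced.
    """
    commit_info = {"commitInfo": {"inCommitTimestamp": in_commit_timestamp_ms}}
    return [commit_info] + [a for a in actions if "commitInfo" not in a]
```

For the commit that enables the feature, one option (an assumption, not stated in the proposal) is to pass the preceding commit's file modification time as `prev_timestamp_ms`, which yields a value strictly greater than that modification time, as rule 5 requires.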

## Recommendations for Readers of Tables with In-Commit Timestamps

For tables with In-Commit Timestamps enabled, readers should use the `inCommitTimestamp` as the commit timestamp for operations like time travel and [`DESCRIBE HISTORY`](https://docs.delta.io/latest/delta-utility.html#retrieve-delta-table-history).
If a table has commits from a period before In-Commit Timestamps were enabled, the table properties `delta.inCommitTimestampEnablementVersion` and `delta.inCommitTimestampEnablementTimestamp` are set and can be used to identify commits that don't have `inCommitTimestamp`.
To correctly determine the commit timestamp for these tables, readers can use the following rules:
1. For commits with version >= `delta.inCommitTimestampEnablementVersion`, readers should use the `inCommitTimestamp` field of the `commitInfo` action.
2. For commits with version < `delta.inCommitTimestampEnablementVersion`, readers should use the file modification timestamp.
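
The two rules above can be sketched as follows. This is an illustration for a table that has the feature enabled; the function name and arguments are hypothetical, and `enablement_version=None` is taken to mean that `delta.inCommitTimestampEnablementVersion` is not set, so every commit carries `inCommitTimestamp`.

```python
from typing import Optional


def resolve_commit_timestamp(version: int,
                             file_mtime_ms: int,
                             in_commit_timestamp_ms: Optional[int] = None,
                             enablement_version: Optional[int] = None) -> int:
    """Pick the commit timestamp (ms since the Unix epoch) for one commit.

    Hypothetical helper; assumes In-Commit Timestamps are enabled on the table.
    """
    # Rule 2: commits written before enablement use the file modification time.
    if enablement_version is not None and version < enablement_version:
        return file_mtime_ms
    # Rule 1: commits at or after enablement must carry inCommitTimestamp.
    if in_commit_timestamp_ms is None:
        raise ValueError(f"commit {version} is missing inCommitTimestamp")
    return in_commit_timestamp_ms
```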

Furthermore, when attempting timestamp-based time travel where table state must be fetched as of `timestamp X`, readers should use the following rules:
1. If `timestamp X` >= `delta.inCommitTimestampEnablementTimestamp`, only table versions >= `delta.inCommitTimestampEnablementVersion` should be considered for the query.
2. Otherwise, only table versions less than `delta.inCommitTimestampEnablementVersion` should be considered for the query.
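
A minimal sketch of this version-range selection, under the simplifying assumptions that versions run from 0 to `latest_version` and that all names here are hypothetical:

```python
from typing import Optional


def candidate_versions(timestamp_ms: int,
                       latest_version: int,
                       enablement_version: Optional[int] = None,
                       enablement_timestamp_ms: Optional[int] = None) -> range:
    """Return the table versions to search for a time travel query.

    Hypothetical helper. When the enablement properties are not set,
    every version is a candidate.
    """
    if enablement_version is None or enablement_timestamp_ms is None:
        return range(0, latest_version + 1)
    if timestamp_ms >= enablement_timestamp_ms:
        # Rule 1: only versions at or after enablement are considered.
        return range(enablement_version, latest_version + 1)
    # Rule 2: only versions before enablement are considered.
    return range(0, enablement_version)
```

Splitting the search this way keeps each lookup within a single timestamp source: `inCommitTimestamp` values on one side of the enablement version, file modification times on the other.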
