Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spec: Add implementation note on current-snapshot-id #12334

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions format/spec.md
Original file line number Diff line number Diff line change
Expand Up @@ -1754,6 +1754,14 @@ Snapshot summary can include metrics fields to track numeric stats of the snapsh
| **`engine-name`** | "spark" | Name of the engine that created the snapshot |
| **`engine-version`** | "3.5.4" | Version of the engine that created the snapshot |

### Assignment of Snapshot IDs and `current-snapshot-id`

Writers should produce positive values for snapshot ids in a manner that minimizes the probability of id collisions and should verify the id does not conflict with existing snapshots. Producing snapshot ids based on timestamps alone is not recommended as it increases the potential for collisions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is generating a positive value a requirement here? It seems that it might be more of than an implementation note that "-1" should be reserved because it has special meaning prior to V3?

Copy link
Contributor

@danielcweeks danielcweeks Feb 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's nothing in the spec that that explicitly prohibits negative values, but the approach by the reference implementation intentionally avoids negative snapshot ids. There is commentary on this in #57 and #2506, so I think there is an expectation even if not explicitly called out.

It's not entirely clear why other than negative snapshots is somewhat awkward. Originally, it was just System.currentTimeMillis() (which resulted in collisions), so I think just historically positive.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, so I think -1 can't just be an implementation detail with a specific version modifier saying it is ok to produce. if we don't think negative numbers break anything then I think it seems fine to leave as an implementation detail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember it ever being System.currentTimeMillis. Was it really?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the original was just current millis, but it had issues (particularly with tests having collisions).


The reference implementation uses a type 4 uuid and XORs the 4 most significant bytes with the 4 least significant bytes then ANDs with the maximum long value to arrive at a pseudo-random snapshot id with a low probability of collision.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The reference implementation uses a type 4 uuid and XORs the 4 most significant bytes with the 4 least significant bytes then ANDs with the maximum long value to arrive at a pseudo-random snapshot id with a low probability of collision.
The reference Java implementation uses a type 4 uuid and XORs the 4 most significant bytes with the 4 least significant bytes then ANDs with the maximum long value to arrive at a pseudo-random snapshot id with a low probability of collision.


The reference Java implementation writes `-1` for "no current snapshot" with V1 and V2 tables and considers this equivalent to omitted or `null`. This has never been formalized in the spec but for compatibility, other implementations can accept `-1` as `null`. Java will no longer write `-1` and will use `null` for "no current snapshot" for all tables with a version greater than or equal to V3.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Still have the confusion as #12334 (comment) , are these two sentence referring to the same implementations? To me, the language suggest they are separate ones ?


## Appendix G: Geospatial Notes

The Geometry and Geography class hierarchy and its Well-known text (WKT) and Well-known binary (WKB) serializations (ISO supporting XY, XYZ, XYM, XYZM) are defined by [OpenGIS Implementation Specification for Geographic information – Simple feature access – Part 1: Common architecture](https://portal.ogc.org/files/?artifact_id=25355), from [OGC (Open Geospatial Consortium)](https://www.ogc.org/standard/sfa/).
Expand Down