Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spec: Support geo type #10981
Spec: Support geo type #10981
Changes from all commits
3f11f00
b7a49a4
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does
Custom CRS
mean ? Is it any non-default CRS value ?Can values of CRS be of format format
$authorithy:$identifier
similar to that of default value ?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes,
$authority:$identifier
is the intent.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is some discrepency in the description here. CRS values are mentioned to be specified in two format:
The default value of
OGC:CRS84
doesn't follow the pattern. IfOGC:CRS84
is the identifier itself, then should be mentioend assrid:OGC:CRS84
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@redblackcoder
OGC:CRS84
is not an identifier. It is corresponding to the following projjson string: https://github.com/opengeospatial/geoparquet/blob/main/format-specs/geoparquet.md#ogccrs84-detailsIt is also equivalent to srid:4326 but flip the axis order to lon/lat
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't SRID database specific? From the open specification, it is like following:
SRID
,AUTH_NAME
,AUTH_SRID
,SRTEXT
OGC:CRS84
corresponds toAUTH_NAME
=OGC
andAUTH_SRID
=CRS84
Not sure what does SRID:4326 mean from Auth_Name and Auth_Srid perspective.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does
SRID:4326
meanEPSG:4326
? Is it common to assume EPSG as the default authority for the codes?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not exactly. SRID usually means "the primary key in the context of those data". In a spatial database, this is the primary key of the
spatial_ref_sys
table. Theoretically, this is independent of EPSG codes. However, it is a common practice to use the same numerical values for convenience.For mapping SRID to EPSG code in a spatial database, we need to look at the other columns. If and only if the value of
AUTH_NAME
isEPSG
, then the value ofAUTH_SRID
is the EPSG code. But we could also have theIAU
value in the authority column, in which case the authority code is for some astronomical body (Moon, Mars, Jupiter, etc.).Note that even when
SRID
= 4326,AUTH_NAME
= "EPSG" andAUTH_SRID
= 4326, we do not really have "SRID:4326" = "EPSG:4326" because the authoritative definition of the latter has (latitude, longitude) axis order. Any other axis order is not EPSG:4326, but rather some derivative of EPSG:4326. Therefore,SRID:4326
should rather be understood as "related to EPSG:4326" instead of "same as EPSG:4326".There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation.
In absense of any spec around
spatial_ref_sys
table for iceberg, should the spec stick withAUTH_NAME:AUTH_SRID
as the identifier when usingsrid
type as the custom CRS? Currently, it is skipping theAUTH_NAME
from the identifier and assumingAUTH_SRID
to be unique across all.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jiayuasu Why is there a need to flip the axis order? According to projjson spec you linked:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stefankandic In Iceberg Geo spec (as well as Parquet Geo spec), we explicitly enforce the x/y axis order to be
lon/lat
order and discard the axis order specified in the CRS. This is a common practice in the industry including Apache Sedona, GeoPandas, GeoArrow, ... This is to make sure that the tools can always have some basic operations on the the x/y values without looking into the CRS. Parsing CRS is a challenging task.With that in mind, OGC:CRS84 fits better as a default value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that it may be safer to only said that the default is OGC:CRS84, without saying that it is equivalent to EPSG:4326 with axis order overridden. Overriding the axis order is deprecated by OGC directive 14 because it is ambiguous. It works for the most common CRS, but is an impediment to interoperability with less common CRS such as South-orientated or polar projections. The practice of overriding axis order in a specification was established in old standards such as GeoTIFF and Simple Features. Then, it became a self-sustaining problem, with some (fortunately not all) newer standards citing Simple Features as a precedent, despite OGC directive 14 explicitly asking us to stop doing so.
Note that the problem is not the desire to have (x, y) axis order. The problem is in the way to achieve that goal, when that order is enforced by the specification. OGC directive 14 proposes a choice between 3 other ways to obtain the same result without the ambiguity of the "specification overrides authoritative definition" approach.
In the case of OGC:CRS84, that definition is non-ambiguous regarding axis order, so it is okay as a default value. OGC:CRS84 is ambiguous regarding the datum however, which means that (longitude, latitude) coordinates in OGC:CRS84 have an uncertainty of about 2 meters. I think this uncertainty would have been worth a note, and I think that the sentence "EPSG:4326 and OGC:CRS84 are equivalent with respect to this specification" should be omitted in order to be more quiet about this deprecated practice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do
lower_bounds
andupper_bounds
mean the bounding box for the geospatial data points in the file? Can it be expanded to explain that bounds are based on a bounding box and these values are corners of the bounding box. Maybe this can be added to the Appendix.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like above, this table covers arbitrary
geometry
andgeography
objects. I think it is fine to use WKT here and not have a separate way to store points, but we should be aware that the implication is that WKT is required for any JSON representation.