-
Notifications
You must be signed in to change notification settings - Fork 22
Geopackage reader #174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Geopackage reader #174
Conversation
|
Hmm. Interesting. Is it finished? Is there are anything that I could help with? |
No this will be closed most likely, but I may rescope this for a WKB Geometry implementation to better handle the well-known geometry (probably will close this still and open a new PR to avoid confusion.) I also may spin off my work on this to contribute to the JuliaGeo/GeoPackage.jl there. |
It does work for test cases except for e2e test and haven't implemented checks for empty geopackage, But there is a something missing that is causing invalid geopackage files in the writer |
|
@jph6366 thanks for working on this. Your PR evolved a lot since the last time I saw it. I'd be happy to provide additional feedback if you plan to keep working on it. The first thing I would suggest here is to leave the ArchGDAL.jl code intact for now. We can work on the GPKG backend without removing the GDAL fallback. After the PR is merged, I can take care of deprecating the fallback and removing it from the codebase. |
@juliohm I would greatly appreciate feedback. I'm two weeks into my Data Science program and I have been practicing and reading and learning a lot more about Julia. Your feedback would be immensely valuable!
Yes I should have left this out of the PR. I actually couldn't get my ArchGDAL to precompile on my Ubuntu Desktop, but I have a fresh new university laptop that precompiled all the dependencies yesterday without trouble (on Windows). I actually was planning on doing a rewrite this weekend (and I have a holiday Monday as well), so you're reply is well-timed! |
|
That is really good to hear, @jph6366! 🙂 I will share a first review tonight (BRT time zone) or tomorrow. |
juliohm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jph6366 attached you can find a first round of review. There are many improvements to make before we can continue reviewing the code. Below are some general comments:
- Please remove the geopackage.org file from the PR
- Please undo the changes related to ArchGDAL.jl
- Try to avoid underscores in variable names
- Try to avoid type annotations
Will comb through for various edges cases (empty, 3D, etc.) once the writer is gpkg spec-compliant implementation
src/extra/gpkg/read.jl
Outdated
| # Licensed under the MIT License. See LICENSE in the project root. | ||
| # ------------------------------------------------------------------ | ||
|
|
||
| const DBInterface = SQLite.DBInterface |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this should be in gpkg.jl or in GeoIO.jl (under import SQLite)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add DBInterface.jl to Project.toml with an appropriate compat entry. We can then add an explicit import in GeoIO.jl
Update 09/09/2025: Next commit (in maybe a few days) will contain some code for The status for the gpkg writer is still not compliant (enough) to pass the E2E test. It writes valid Standard GeoPackage Binary Header, but it does not handle the WKB Geometry encoding correctly. Still no cases for empty GeoPackage contents and need to test on Geometries with Z values Currently testing using a GADM gpkg file for the US Virgin Islands (a geopackage with 3 tables all with MultiPolygon types) until the writer begins to create valid, readable geopackage files that seamlessly work with other GIS softwares (w/ gpkg support). So I imagine the writer also may be incorrect for other more simple features. Once the status of the GPKG I/O implementation is complete, then I will consider adding functionality for gpkg_extensions table, R-Tree spatial indexes (write-only), and then (aspatial) attributes. There is also the consideration for non-linear geometry type support and/or creation of custom WKB geometry types/format |
|
Thank you for the update @jph6366 ! I am traveling with limited internet connection, but will try to share more review comments when I find WiFi. |
…d import DBInterface
juliohm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more suggestions.
src/extra/gpkg/read.jl
Outdated
| # query SQLite.Table given *.gpkg filename; optionally specify quantity of tables and features query | ||
| # default behavior for select statement is limit result to 1 feature geometry row and corresponding attributes row | ||
| # limited to only 1 feature table to select geopackage binary geometry from. | ||
| function gpkgread(fname; ntables=1, nfeatures=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try to follow the convention adopted with the layer option. We don't want to support loading multiple tables in a single call. This is bad design of these file formats and specs.
| geomindex = findfirst(==(Symbol(geomcolumn)), keys(row)) | ||
| values = map(keys(row)[[begin:(geomindex - 1); (geomindex + 1):end]]) do key | ||
| key, getproperty(row, key) | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this code be simplified in terms of the named tuple snippet I shared?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as long we test against duplicate fieldnames then I don't think we can simplfy this to check for the Symbol(geomcolumn)
src/extra/gpkg/read.jl
Outdated
| headerlen = iszero(envelope) ? 8 : 8 + 8 * 2 * (envelope + 1) | ||
|
|
||
| # Skip reading the double[] envelope and start reading Well-Known Binary geometry | ||
| seek(buff, headerlen) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you called read a couple of times to consume the IOBuffer. Aren't you counting the same bytes twice when you add them up in headerlen and then call seek?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I am. I'm swapping to using skip and starting from byte[4] given headerlen
src/extra/gpkg/wkb.jl
Outdated
| if _haszextent(wkbtypebits) | ||
| wkbtype = iszero(wkbtypebits & 0x80000000) ? wkbtypebits - 1000 : wkbtypebits & 0x7FFFFFFF | ||
| wkb2single(buff, crs, wkbtype, true, wkbbswap) | ||
| else | ||
| wkb2single(buff, crs, wkbtypebits, false, wkbbswap) | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please check if the spec allows mixing geometries with and without z coordinates on the same multi-geometry? We don't want to support this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove this I don't think this would happen, this is me overfitting the problem because its technically possible
src/extra/gpkg/read.jl
Outdated
| AND object_type IS NOT NULL | ||
| AND g.srs_id = srs.srs_id | ||
| AND g.srs_id = c.srs_id | ||
| AND g.z IN (0, 1, 2) | ||
| AND g.m = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please clarify why we need to add these conditions here when the result is only used to retrieve metadata like srsid, org, tablename, etc.?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see Geometry Columns Table Requirements comments above or read reqs at https://www.geopackage.org/spec/#:~:text=2.1.5.1.2.%20Table%20Data%20Values
# Requirement 26: The srs_id value in a gpkg_geometry_columns table row
# SHALL be an srs_id column value from the gpkg_spatial_ref_sys table.
#
# Requirement 27: The z value in a gpkg_geometry_columns table row SHALL be one
# of 0, 1, or 2.
#
# Requirement 28: The m value in a gpkg_geometry_columns table row SHALL be one
# of 0, 1, or 2.
#
# Requirement 146: The srs_id value in a gpkg_geometry_columns table row
# SHALL match the srs_id column value from the corresponding row in the
# gpkg_contents table.
# According to https://www.geopackage.org/spec/#r16
# Values of the gpkg_contents table srs_id column
# SHALL reference values in the gpkg_spatial_ref_sys table srs_id column
# According to https://www.geopackage.org/spec/#r18
# The gpkg_contents table SHALL contain a row
# with a lowercase data_type column value of "features"
# for each vector features user data table or view.
I'll remove what I think we can get away with
src/extra/gpkg/read.jl
Outdated
| # gpkg_contents table. | ||
| function gpkgextract(db; layer=1) | ||
| # get the first (and only) feature table returned in sqlite query results | ||
| metadata = first( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the result only has a single entry, then it is appropriate to use only instead of first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought so too initially but only returns
SQLite.Row{false}:
:tablename missing
:geomcolumn missing
:srsid missing
:org missing
:orgcoordsysid missingeach row is only valid when called and for some reason only doesn't do it?
src/extra/gpkg/wkb.jl
Outdated
| # supports wkbGeometryType according to | ||
| # SFSQL 1.1 high-bit flag 0x80000000 (Z) | ||
| # SFSQL 1.2 offset of 1000 (Z) | ||
| sfsql11 = iszero(typebits & 0x80000000) | ||
| sfsql12 = typebits > 1000 | ||
| if !sfsql11 || sfsql12 | ||
| wkbtype = sfsql11 ? typebits - 1000 : typebits & 0x7FFFFFFF | ||
| zextent = true | ||
| else | ||
| wkbtype = typebits | ||
| zextent = false | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I refactored this code in an attempt to understand what is going on. The logic is a bit confusing, could you please explain in simple words?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sfsql11 checks if there is no high-bit-flag in the wkbtype bits, so the negating that condition returns true if its a wkbtype is a extended WKB type
sfsql12 checks the wkbtype bits is a number greater than the standard range of wkbtype (1-7), specifically ISO WKB type which is (1001-1007)
we want to set the wkbtypeZ to a standard WKB wkbtype and store zextent to indicate the Z-dimension for return the correction dimensions in wkb2point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to accept EWKB or ISO flavoured WKB basically, but we prefer ISO for maximum compatiability.
src/extra/gpkg/wkb.jl
Outdated
| y = byteswap(read(buff, Float64)) | ||
| x = byteswap(read(buff, Float64)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please double check this order? Are you aware that LatLon is "(y, x)" and the other CRS are "(x, y)"? Appreciate it if you could read the other opened PR with care to make sure that you are not introducing trivial bugs.
src/extra/gpkg/wkb.jl
Outdated
| x = byteswap(read(buff, Float64)) | ||
| if zextent | ||
| z = byteswap(read(buff, Float64)) | ||
| return x, y, z |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is certainly not correct. This function should only return valid Point objects. I really recommend reading the other PR where we already sorted out a clean function to read buffers into points.
| if srsid == 0 || srsid == 4326 | ||
| crs = isone(metadata.zextent) ? LatLonAlt{WGS84Latest} : LatLon{WGS84Latest} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this in the spec? I couldn't find from the link in the comment.
| geomindex = findfirst(==(Symbol(geomcolumn)), keys(row)) | ||
| values = map(keys(row)[[begin:(geomindex - 1); (geomindex + 1):end]]) do key | ||
| key, getproperty(row, key) | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I asked a few times without response: can we use a simpler namedtuple syntax here? Shared an example in a previous comment...
| skiplen = iszero(envelope) ? 4 : 4 + 8 * 2 * (envelope + 1) | ||
|
|
||
| # Skip reading the double[] envelope and start reading Well-Known Binary geometry | ||
| skip(buff, skiplen) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Asked a question here that was not addressed again...
src/extra/gpkg/read.jl
Outdated
| if iszero(srs_id) | ||
| crs = LatLon{WGS84Latest} | ||
| elseif isone(abs(srs_id)) | ||
| crs = zextent ? Cartesian{NoDatum,3} : Cartesian{NoDatum,2} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(W/ BRANCH) 1.094199 seconds (628.18 k allocations: 31.932 MiB, 5.21% gc time, 99.20% compilation time)
(MERGED BRANCH) 1.068783 seconds (627.90 k allocations: 31.946 MiB, 5.72% gc time, 99.35% compilation time)
Tested on gpkg file with EPSG{27700}
Not sure if these metrics are entirely accurate (being ran in a jupyter nb) I will attempt set up a proper benchmark once I have the wkb.jl completed.
src/extra/gpkg/read.jl
Outdated
|
|
||
| # Requirement 6: PRAGMA integrity_check returns a single row with the value 'ok' | ||
| # Requirement 7: PRAGMA foreign_key_check (w/ no parameter value) returns an empty result set | ||
| if (DBInterface.execute(db, "PRAGMA integrity_check;") |> first |> only != "ok") || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the returned result row iterator only supports a single-pass so we can't retrieve without using iterate(x) here.
first returns the
SQLite.Row{false}:
:integrity_check "ok"
only gets the only element in the collection which is "ok"
src/extra/gpkg/read.jl
Outdated
| # gpkg_contents table. | ||
| function gpkgextract(db; layer=1) | ||
| # get the first (and only) feature table returned in sqlite query results | ||
| metadata = first( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought so too initially but only returns
SQLite.Row{false}:
:tablename missing
:geomcolumn missing
:srsid missing
:org missing
:orgcoordsysid missingeach row is only valid when called and for some reason only doesn't do it?
src/extra/gpkg/wkb.jl
Outdated
| # supports wkbGeometryType according to | ||
| # SFSQL 1.1 high-bit flag 0x80000000 (Z) | ||
| # SFSQL 1.2 offset of 1000 (Z) | ||
| sfsql11 = iszero(typebits & 0x80000000) | ||
| sfsql12 = typebits > 1000 | ||
| if !sfsql11 || sfsql12 | ||
| wkbtype = sfsql11 ? typebits - 1000 : typebits & 0x7FFFFFFF | ||
| zextent = true | ||
| else | ||
| wkbtype = typebits | ||
| zextent = false | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sfsql11 checks if there is no high-bit-flag in the wkbtype bits, so the negating that condition returns true if its a wkbtype is a extended WKB type
sfsql12 checks the wkbtype bits is a number greater than the standard range of wkbtype (1-7), specifically ISO WKB type which is (1001-1007)
we want to set the wkbtypeZ to a standard WKB wkbtype and store zextent to indicate the Z-dimension for return the correction dimensions in wkb2point
src/extra/gpkg/wkb.jl
Outdated
| # supports wkbGeometryType according to | ||
| # SFSQL 1.1 high-bit flag 0x80000000 (Z) | ||
| # SFSQL 1.2 offset of 1000 (Z) | ||
| sfsql11 = iszero(typebits & 0x80000000) | ||
| sfsql12 = typebits > 1000 | ||
| if !sfsql11 || sfsql12 | ||
| wkbtype = sfsql11 ? typebits - 1000 : typebits & 0x7FFFFFFF | ||
| zextent = true | ||
| else | ||
| wkbtype = typebits | ||
| zextent = false | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to accept EWKB or ISO flavoured WKB basically, but we prefer ISO for maximum compatiability.
| geomindex = findfirst(==(Symbol(geomcolumn)), keys(row)) | ||
| values = map(keys(row)[[begin:(geomindex - 1); (geomindex + 1):end]]) do key | ||
| key, getproperty(row, key) | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as long we test against duplicate fieldnames then I don't think we can simplfy this to check for the Symbol(geomcolumn)
src/extra/gpkg/read.jl
Outdated
| headerlen = iszero(envelope) ? 8 : 8 + 8 * 2 * (envelope + 1) | ||
|
|
||
| # Skip reading the double[] envelope and start reading Well-Known Binary geometry | ||
| seek(buff, headerlen) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I am. I'm swapping to using skip and starting from byte[4] given headerlen
Quick Summary
I am looking to add my contributions to help get this GeoIO feature merged in with compliance to the GPkg implementation and to be interoperable with GDALs reader and writer for GPkg also be able to seamlessly connect with Desktop QGIS & ArcGIS. I logged maybe ~25-30 hours into my contribution but it does need to be organized and divided into concise files, as well as more cleanup for optimizing to read more like native Julia code. There is also featured code that is outside of the scope of this feature request, such as "attributes" tables, and gpkg_extensions
Long-winded introduction
I am interetested in picking up Julia as a programming language so I began studying and reading the Julia documentation in June (2025) and practicing leetcode and advent of code problems to learn its quirks. I am also applying for my MS in Data Science, so I've become invested in the Geospatial Data Science with Julia (since I come from a professional geospatial software engineering background).
I also read up on SQLite's documentation, Julia Databases, and of course some of the JuliaEarth & JuliaGeo repositories and their source code. I wish I had discovered this community and Julia sooner!
With this, my ground work of rote memorization is done, so I can consciously answer questions about GPkg and relevant toolings.
You can view my monolith ORG file of all my notes here
https://jph6366.github.io/filing-cabinet/TOP-DOWN/Kanban/GeoPackage.html
Implementation Plan
load implementation consists of SQLite.jl simple SELECT statements that convert
results in a /geotable = georef(table, geoms)/ call to combine these two objects into a GeoTable
the save implementation consists of SQLite CREATE TABLE statements
Remarks
Walkthrough of Implementation
I don't use any sort of CoPilot or AI inside my editor, but I will use a LLM as a pair programmer or a domain-specific search engine. Often to generate code to explore alternative options and that is not usable code usually. This means there is no 'AI-generated' code in this PR.
https://github.com/OSGeo/gdal/tree/master/ogr/ogrsf_frmts/gpkg
reading the geopackage binary header
modeling my wkbGeometryTypes after GDAL
writing geometry from WKB functions
Optimized code to use Julia conventions such as IOBuffer and hoisting and making algorithmic changes
For writing the GPkg followed the sequence of operations from GDAL's gpkg Create function
also followed conventions mentioned in SQLite.jl to write concise code loading in Tables.jl compatible formats
basically just rewrote the original from_wkb (from reader) functions into to_wkb functions
Warmup
parse bytes and SQLite database header
execute PRAGMA integrity_check and foreign_key_check
read gpkg_spatial_ref_sys
shall contain at minimum
extension for WKT for Coordinate Reference Systems
read gpkg_contents
read gpkg_geometry_column
GEOMETRY
subtypes are
subtypes are
subtype is
subtype is
subtypes are
subtype is
subtype is
MULTIPOLYGON
/Non-linear Geometry types not really commonly supported/
given /column_name/ and /table_name/ declared SQL type of the feature table geometry is specified by /geometry_type_name/ column also specifying the /srs_id/ of feature table
feature table geometries are stored in geopackagebinary format
read blob to infer header and geometry
read gpkg_extensions table
Happy Testing
a. R-Tree Spatial Indexes (write)
b. WKT for SRS (write)
simple features
non-spatial
Z-dimension Coordinate{{x,y},z}
non-trivial amount of vector features
UnHappy Testing
E2E Test
a. check if the GeoTables are equivalent
Reading