Discussions towards better stability of core pieces of geoarrow-rs #1018

Closed
kylebarron opened this issue Apr 3, 2025 · 3 comments

@kylebarron
Member

kylebarron commented Apr 3, 2025

From #1015 (comment)

I think there are a few problems with geoarrow-rs:

  • There's a wide variety of code at different stages of production-readiness.
  • Partly due to me trying to do too much, struggling a bit with how best to model GeoArrow in general, and also learning Rust through this whole process, there's a whole lot of code that is decidedly not production-ready.
  • Because it's all one Rust crate, geoarrow, there are no clear lines between what is (closer to) production-ready and what is not.

I think a way to break through this impasse is to select relatively small, well-defined subsets of GeoArrow functionality and break them out into subcrates. For one, this forces more thought about public APIs, because across crates you can't access any pub(crate) items. It lets us more clearly document which subsets we expect to be more stable and tested. And external users like yourself can start to build on only those pieces without even bringing in the dependencies for the full geoarrow crate.
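To make the pub(crate) point concrete, here is a minimal sketch. All names (`core_types`, `algorithms`, `PointType`, `Dimension`, `coords_per_point`) are hypothetical and are not the geoarrow-rs API; the two modules stand in for code that currently lives side by side in one crate.

```rust
// Hypothetical sketch: none of these names come from geoarrow-rs.
mod core_types {
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Dimension {
        XY,
        XYZ,
    }

    #[derive(Debug, Clone)]
    pub struct PointType {
        // Visible everywhere inside this crate, invisible to other crates.
        pub(crate) dimension: Dimension,
    }

    impl PointType {
        pub fn new(dimension: Dimension) -> Self {
            Self { dimension }
        }

        // The public accessor: the only way a downstream crate could
        // read the dimension.
        pub fn dimension(&self) -> Dimension {
            self.dimension
        }
    }
}

mod algorithms {
    use super::core_types::{Dimension, PointType};

    // Same crate, so reading the pub(crate) field directly compiles.
    // If `algorithms` were split into its own crate, this field access
    // would stop compiling and would have to become
    // `point_type.dimension()`.
    pub fn coords_per_point(point_type: &PointType) -> usize {
        match point_type.dimension {
            Dimension::XY => 2,
            Dimension::XYZ => 3,
        }
    }
}
```

Calling `algorithms::coords_per_point(&core_types::PointType::new(core_types::Dimension::XYZ))` returns 3 here. The interesting part is what happens after a split: every such direct field access becomes a compile error, which forces a decision about whether the accessor belongs in the public API.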

On a spectrum from more stable to less stable:

  • Core types conforming to the spec, like what is now in geoarrow-schema
  • "primitive" Array layouts like Point/LineString etc
  • "complex" Array layouts like Geometry and GeometryCollection
  • Array builders
  • Conversions between GeoArrow memory and geo, WKB, and WKT
  • Reading/writing Parquet
  • Reading/writing FlatGeobuf
  • Chunked arrays (should maybe remove)
  • Table concept (should probably remove)
  • Conversions between GeoArrow memory and geos
  • Geometry operations using geo
  • Casting
  • Geometry operations using geos
  • Reading/writing other geo formats
  • Reading/writing to PostGIS

Is there a well-defined subset of this project that you think you would use if it were more stable? Is there a piece that you're interested in that we could work on together to make stable?

Originally posted by @kylebarron in #1015 (comment)

cc @paleolimbot

@kylebarron kylebarron changed the title Discussions towards better stability of core pieces of GeoArrow Discussions towards better stability of core pieces of geoarrow-rs Apr 3, 2025
@paleolimbot
Contributor

First, the whole geoarrow crate (and the ecosystem adoption it's largely behind) is awesome and any of my gripes should be taken with a grain of salt. The least useful thing I'll say is that all of these things are things that eventually should be enabled!

I think the absolutely essential bits are geoarrow-schema (mostly done!), iterating over arrays via geo-traits (pretty sure this is somewhere), and building arrays from buffers plus validation (substantially easier than an arbitrary builder, I think).
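As a rough illustration of what "build by buffer + validate" could look like, here is a self-contained sketch with no dependencies. The types `LineStringBuffers` and `ValidationError` are hypothetical and not part of geoarrow-rs; the sketch assumes interleaved XY coordinates and, as a simplification, requires the first offset to be zero (no sliced arrays).

```rust
// Hypothetical sketch: not geoarrow-rs types. Assumes interleaved XY
// coordinates and unsliced offsets (first offset must be 0).
#[derive(Debug, PartialEq)]
pub enum ValidationError {
    OddCoordCount(usize),
    EmptyOffsets,
    FirstOffsetNotZero(i32),
    DecreasingOffsets { index: usize },
    LastOffsetMismatch { last: i32, num_coords: usize },
}

#[derive(Debug)]
pub struct LineStringBuffers {
    coords: Vec<f64>,       // x0, y0, x1, y1, ...
    geom_offsets: Vec<i32>, // n + 1 offsets, counted in coordinates
}

impl LineStringBuffers {
    /// Takes ownership of already-filled buffers and checks that they
    /// are mutually consistent before handing back a usable value.
    pub fn try_new(
        coords: Vec<f64>,
        geom_offsets: Vec<i32>,
    ) -> Result<Self, ValidationError> {
        if coords.len() % 2 != 0 {
            return Err(ValidationError::OddCoordCount(coords.len()));
        }
        let num_coords = coords.len() / 2;

        let first = *geom_offsets.first().ok_or(ValidationError::EmptyOffsets)?;
        if first != 0 {
            return Err(ValidationError::FirstOffsetNotZero(first));
        }
        for (i, pair) in geom_offsets.windows(2).enumerate() {
            if pair[1] < pair[0] {
                return Err(ValidationError::DecreasingOffsets { index: i + 1 });
            }
        }
        // Offsets start at 0 and never decrease, so `last` is non-negative.
        let last = geom_offsets[geom_offsets.len() - 1];
        if last as usize != num_coords {
            return Err(ValidationError::LastOffsetMismatch { last, num_coords });
        }
        Ok(Self { coords, geom_offsets })
    }

    /// Number of line strings.
    pub fn len(&self) -> usize {
        self.geom_offsets.len() - 1
    }

    /// Interleaved coordinates of the i-th line string.
    pub fn line_string(&self, i: usize) -> &[f64] {
        let start = self.geom_offsets[i] as usize * 2;
        let end = self.geom_offsets[i + 1] as usize * 2;
        &self.coords[start..end]
    }
}
```

For example, `LineStringBuffers::try_new(vec![0.0, 0.0, 1.0, 1.0], vec![0, 2])` succeeds with one line string, while passing offsets `vec![0, 3]` for the same coordinates is rejected with `LastOffsetMismatch`. All the checking happens once, up front, which is a large part of why this path is simpler than a general incremental builder.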

I'm definitely happy to contribute some of these pieces although I'm not exactly sure of the timeline. I'm always happy to review, though!

@paleolimbot
Contributor

One thing that may be worth considering is building the pieces up (e.g., geoarrow-schema, geoarrow-array, etc.) without refactoring geoarrow as you go. That would allow breaking changes if they're needed to scale back the scope and perhaps be a bit more fun (but maybe it's not bad to refactor as we go!)

@kylebarron
Member Author

As described in #1097, the old geoarrow crate is being refactored into a monorepo of smaller crates.

I think this issue can be closed now.


2 participants