Orfeo backscatter: preserve sparse partitioner #1018

Open
jdries opened this issue Jan 27, 2025 · 3 comments

jdries commented Jan 27, 2025

The V2 code path of orfeo backscatter results in an RDD with a default partitioner:

tile_layer = geopyspark.TiledRasterLayer.from_numpy_rdd(

As a result, point extraction jobs perform poorly, which would not happen if a sparse partitioner were preserved.

The V1 code path does have some support for sparse partitioners:

result = p.applySparseSpacetimePartitioner(tile_layer.srdd.rdd(),
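
To illustrate why the partitioner matters for point extraction, here is a toy sketch in plain Python. It is not the geopyspark or openeo-geotrellis implementation: the grid size is borrowed from the key bounds reported later in this thread, while the occupied tiles, the partition count, and both helper functions are made up for illustration.

```python
from collections import Counter

# Grid size taken from the key bounds in this thread (cols 0..36, rows 0..39).
COLS, ROWS = 37, 40
NUM_PARTITIONS = 4  # hypothetical partition count

# Hypothetical tiles that actually contain extraction points.
occupied = [(3, 5), (3, 6), (20, 11), (36, 39)]


def dense_partition(key, num_partitions=NUM_PARTITIONS):
    """Toy 'default' scheme: index every cell of the bounding grid,
    then split that index range evenly over the partitions."""
    col, row = key
    index = row * COLS + col
    return index * num_partitions // (COLS * ROWS)


def build_sparse_lookup(keys, num_partitions=NUM_PARTITIONS):
    """Toy 'sparse' scheme: only keys that hold data are indexed,
    and they are spread round-robin over the partitions."""
    return {key: i % num_partitions for i, key in enumerate(sorted(keys))}


dense_load = Counter(dense_partition(key) for key in occupied)
sparse_lookup = build_sparse_lookup(occupied)
sparse_load = Counter(sparse_lookup[key] for key in occupied)

print("dense load per partition: ", [dense_load[p] for p in range(NUM_PARTITIONS)])
print("sparse load per partition:", [sparse_load[p] for p in range(NUM_PARTITIONS)])
```

In this toy setup the dense scheme leaves one partition empty and puts two tiles in another, while the sparse scheme spreads the four occupied tiles evenly. The real partitioners are more involved, so treat this only as intuition.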

@VictorVerhaert
This might also explain why the point extractions for EUGW cost more than anticipated.

@jdries jdries self-assigned this Jan 27, 2025
jdries added a commit to Open-EO/openeo-geotrellis-extensions that referenced this issue Jan 27, 2025
jdries added a commit to Open-EO/openeo-geotrellis-extensions that referenced this issue Jan 28, 2025
jdries added a commit to Open-EO/openeo-geotrellis-extensions that referenced this issue Jan 28, 2025
jdries added a commit to Open-EO/openeo-geotrellis-extensions that referenced this issue Jan 28, 2025
jdries added a commit that referenced this issue Jan 28, 2025
* sar_backscatter: try to preserve partitioner
#1018

* sar_backscatter: print partitioner in test, may need an assert
#1018

* sar_backscatter: print partitioner in test, may need an assert
#1018

* sar_backscatter: should now get partitioner from scala
#1018

* sar_backscatter: should now get partitioner from scala
#1018

* sar_backscatter: add assert on partitioner
#1018

jdries commented Jan 29, 2025

The fix is available on staging. For the weed job, costs went from ~900 to ~100.

It is the aggregate_temporal step for Sentinel-1 that is now more efficient:
BEFORE:
aggregate_temporal results in 53280 keys, using partitioner index: org.openeo.geotrelliscommon.package$SpaceTimeByMonthPartitioner$@3a993035 with bounds KeyBounds(SpaceTimeKey(0,0,1704067200000),SpaceTimeKey(36,39,1734739200000))

AFTER:

aggregate_temporal results in 2628 keys, using partitioner index: SparseSpaceTimePartitioner 1620 true with bounds KeyBounds(SpaceTimeKey(0,0,1704067200000),SpaceTimeKey(36,39,1734739200000))

This is probably because aggregate_temporal inserts empty tiles wherever it expects data.
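
The key counts in the two log lines can be checked against the key bounds with a little arithmetic. The sketch below uses only numbers copied from the logs. The one assumption, labeled in the code, is that every tile gets the same number of time steps; the thread does not state the interval length, so the count of 36 is inferred, not confirmed.

```python
from datetime import datetime, timezone

# Key bounds copied from the log lines: SpaceTimeKey(col, row, instant in ms).
min_key = (0, 0, 1704067200000)
max_key = (36, 39, 1734739200000)

keys_before = 53280  # BEFORE log line (SpaceTimeByMonthPartitioner)
keys_after = 2628    # AFTER log line (SparseSpaceTimePartitioner)

cols = max_key[0] - min_key[0] + 1
rows = max_key[1] - min_key[1] + 1
spatial_cells = cols * rows  # every tile position inside the bounds

# Assumption: aggregate_temporal emits the same number of time steps per tile.
time_steps = keys_before // spatial_cells
tiles_after = keys_after // time_steps

start = datetime.fromtimestamp(min_key[2] / 1000, tz=timezone.utc)
end = datetime.fromtimestamp(max_key[2] / 1000, tz=timezone.utc)

print(f"grid: {cols} x {rows} = {spatial_cells} tile positions")
print(f"time range: {start.date()} .. {end.date()}")
print(f"before: {spatial_cells} tiles x {time_steps} time steps = {spatial_cells * time_steps} keys")
print(f"after:  {tiles_after} tiles x {time_steps} time steps = {tiles_after * time_steps} keys")
print(f"reduction factor: {keys_before / keys_after:.1f}")
```

Under that assumption, the BEFORE count equals the full 37 x 40 grid times 36 time steps, which is consistent with empty tiles being inserted for every position inside the bounds. The AFTER count corresponds to 73 tile positions, about 5% of the grid.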

@VictorVerhaert
So if I understand correctly, this affects all process graphs (PGs) that use sar_backscatter followed by aggregate_temporal, not only point extractions?
