-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add docs for accessing cogs with terra #125
base: staging
Are you sure you want to change the base?
Add docs for accessing cogs with terra #125
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
View / edit / reply to this conversation on ReviewNB wildintellect commented on 2025-01-14T00:59:03Z Be clear the NO SIGN is particular to this dataset that's in a public bucket. Should we also add
|
View / edit / reply to this conversation on ReviewNB wildintellect commented on 2025-01-14T00:59:04Z this is an interesting approach, nice because it operates with only 1 file, but a little weird at the same time in terms of arbitrarily calculating some neighboring extents based on the image.
I wonder if it's better to have a known bbox and two images to pull partials. I don't really want to complicate it by adding STAC (which we have some example to borrow from)
Maybe this would make sense with a plot of the full extent that has a box from the crop extents to be used. |
I agree the 1/100th square is pretty arbitrary. Perhaps it would make sense to build out and draw geojson for the extents to grab and the geotiff tiles to motivate the role of tiling in reducing necessary requests? |
I also realized that this example only shows AWS, we probably need a note that this method is valid for all "vsi" methods in gdal, and that the |
I included a little note about VSI and a reference to the docs that explain such filesystems. I've also replaced the 1/100th (arbitrary) read with a couple of reads that specifically articulate the impact of internal tile structure on how (and how many) requests get made |
So far, there's no |
View / edit / reply to this conversation on ReviewNB zacdezgeo commented on 2025-01-15T16:17:26Z maybe we can link the https://guide.cloudnativegeo.org/ in the introduction? |
View / edit / reply to this conversation on ReviewNB jsignell commented on 2025-01-15T18:14:17Z I think in the very first sentence you should say what COGs are (a file format for storing raster data with a 2D aligned grid?)
The key benefits section feels like some double counting. I think you should remove "Subset Access" and "Cloud-Native Efficiency" those feel like more the mechanisms that enable "Cost Savings" and "Fast Workflows". Also don't COGs enable pyramiding? That feels like it could be mentioned somehow as well. moradology commented on 2025-01-15T19:55:36Z Good call. I even explicitly use overviews later on so I should probably acknowledge that |
View / edit / reply to this conversation on ReviewNB jsignell commented on 2025-01-15T18:14:18Z This is kind of a meta comment, but I tend to glaze over inline comments in notebooks. Maybe just put these each in their own cell with markdown cells describing what is happening.
moradology commented on 2025-01-15T19:54:49Z Wondered about that. They're so nice for getting the information right up next to the place where it has context but I think I do the same... |
View / edit / reply to this conversation on ReviewNB jsignell commented on 2025-01-15T18:14:18Z The tile information is block size and layout right? Statistics are block level right? Might be useful to make that explicit.
|
View / edit / reply to this conversation on ReviewNB jsignell commented on 2025-01-15T18:14:19Z what are these axes? I guess they are meters in EPSG 9807 or something, but good to make it clear |
View / edit / reply to this conversation on ReviewNB jsignell commented on 2025-01-15T18:14:20Z Just a question I had in reading this: can you read the header for the COG without downloading the file?
Also: is there a way to prove that only one tile was downloaded? wildintellect commented on 2025-01-15T21:52:42Z
moradology commented on 2025-01-22T16:17:51Z In the python docs, I have some custom logging that demonstrates the byte-ranges being read. Unfortunately, not so easily done with terra's use of GDAL |
View / edit / reply to this conversation on ReviewNB jsignell commented on 2025-01-15T18:14:21Z Line #6. block_size_y <- block_size_pixels * r_res[2] Do you think it would be clearer to just give a bounding box without showing how it is created? If we think about a real world scenario it doesn't matter that people know how to create the box. Actually maybe the whole code cell here should just be hidden since the visual serves a didactic purpose. Same with the other plot that shows gridlines. moradology commented on 2025-01-15T20:06:31Z I think I disagree on this point because it might be good to show how these APIs for pixel and geospatial coordinates work. I'll change things over if your take here is seconded though - I can totally see your point wildintellect commented on 2025-01-16T00:10:50Z the question is who is the target audience? I think the initial target is data users who almost always just need a bbox (or polygon) of data. I'm ok hiding some of that magic from them, maybe a collapsed cell block is the right answer. |
Wondered about that. They're so nice for getting the information right up next to the place where it has context but I think I do the same... View entire conversation on ReviewNB |
Good call. I even explicitly use overviews later on so I should probably acknowledge that View entire conversation on ReviewNB |
I think I disagree on this point because it might be good to show how these APIs for pixel and geospatial coordinates work. I'll change things over if your take here is seconded though - I can totally see your point View entire conversation on ReviewNB |
View entire conversation on ReviewNB |
the question is who is the target audience? I think the initial target is data users who almost always just need a bbox (or polygon) of data. I'm ok hiding some of that magic from them, maybe a collapsed cell block is the right answer. View entire conversation on ReviewNB |
In the python docs, I have some custom logging that demonstrates the byte-ranges being read. Unfortunately, not so easily done with terra's use of GDAL View entire conversation on ReviewNB |
Additional thought: is it worth adding/possible to add a TL;DR section for this notebook similar to the Python version? It might also be worth linking and mentioning the Python version in this notebook and vice versa. Other than that I don't think I have any additional comments - I really like the detail and explanations in both notebooks! |
View / edit / reply to this conversation on ReviewNB wildintellect commented on 2025-02-25T23:09:21Z Link to STARS documentation https://r-spatial.github.io/stars/ |
@@ -0,0 +1,526 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rephrase this:
>In this notebook, we will explore how terra
interacts with COGs and how aligning data requests with internal tile boundaries improves performance. While it is generally unnecessary to think about tiling explicitly, understanding how COGs structure data can help in designing efficient workflows.
to:
>In this notebook, we will explore how terra
interacts with COGs. While it is generally unnecessary to think about tiling explicitly, this notebook aligns data requests with internal tile boundaries to improve performance. Understanding how COGs structure data can help in designing efficient workflows.
Reply via ReviewNB
@@ -0,0 +1,526 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's note this is equivalent to using a random bounding box that intersects the tile.
Reply via ReviewNB
@@ -0,0 +1,526 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -0,0 +1,526 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check if specifying driver makes it faster (this is the case with geopandas and sf)
Reply via ReviewNB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like in the case of geopandas
, the default is fiona
backing vs pyogrio
. In the case of terra
, it appears as though the GDAL wrapping is handled directly in terra source (https://github.com/rspatial/terra/blob/master/src/gdalio.cpp) and so probably doesn't admit of any speedup of that sort. I also couldn't find anything in the documentation to suggest it, so I'm thinking there's no speed up to be had in this fashion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
? pyogrio should be default in recent geopandas - but I'm not talking about fiona vs pyogrio. I'm talking about the step where GDAL guesses the format. If you pass a driver it doesn't have to guess.
@@ -0,0 +1,526 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should switch to either using terra::describe or sf::gdal_utils("info") which are R installable and do not rely on a system install of gdal
Reply via ReviewNB
This PR adds some documentation about accessing data within cogs in R with
terra
without loading the whole file. Some (as yet unclear) combination of terra and the r kernel's behaviors prevented the kind of byte-range logging in #124, so this short guide sticks to demonstrating the API and describing its cloud-optimized behaviors