Skip to content

Commit 07acaf2

Browse files
committed
Add some notes
1 parent e82c3e7 commit 07acaf2

File tree

1 file changed

+121
-0
lines changed

1 file changed

+121
-0
lines changed

design_notes/raster_index.md

Lines changed: 121 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,121 @@
1+
# Design for a RasterIndex
2+
3+
## TL;DR
4+
5+
1. We propose designing a RasterIndex that can handle many ways of expressing a raster → model space transformation.
6+
2. We propose that this RasterIndex _record_ the information for this transformation (e.g. `GeoTransform`) *internally* and remove that information from the dataset when constructed.
7+
3. Since the information is recorded internally, a user must intentionally write the transformation back to the dataset, destroying the index while doing so, and then write to disk.
8+
9+
## Goals:
10+
11+
### UX goals:
12+
13+
1. Make it easy to read a GeoTIFF with CRS and raster -> model space transformation information in to Xarray with appropriate indexes. There are at least two indexes: one associated with the CRS; and one with the transformation.
14+
2. The raster ↔ model transformation information can be ambiguous, so an explicit API should be provided.
15+
1. [RPCs](http://geotiff.maptools.org/rpc_prop.html):
16+
> The RPC model in a GeoTIFF file is supplementary to all other GeoTIFF tags and not directly related. That is, it is possible to have a conventional set of GeoTIFF tags (such as a tiepoint + pixel scale + projected coordinate system description) along with the RPCCoefficientTag. The RPCCoefficientTag is always describing a transformation to WGS84, regardless of what geographic coordinate system might be described in the coordinate system description tags of the GeoTIFF file. It is also possible to have only the RPCCoefficientTag tag and no other GeoTIFF tags.
17+
2. [GeoTransform is not in the GeoTIFF standard](https://docs.ogc.org/is/19-008r4/19-008r4.html). Instead that uses ModelTiepointTag, ModelPixelScaleTag, ModelTransformationTag.
18+
3. [GCPs are ambiguous](https://gdal.org/en/stable/user/raster_data_model.html#gcps):
19+
> The GDAL data model does not imply a transformation mechanism that must be generated from the GCPs … this is left to the application. However 1st to 5th order polynomials are common.
20+
4. [And for extra fun](https://gdal.org/en/stable/user/raster_data_model.html#gcps):
21+
> Normally a dataset will contain either an affine geotransform, GCPs or neither. It is uncommon to have both, and it is undefined which is authoritative.
22+
23+
### Index design goals:
24+
25+
1. The CRS Index allows us to assert CRS compliance during alignment. This is provided by XProj.
26+
2. The transform index should allow us to:
27+
1. Do accurate alignment in pixel space unaffected by floating-point inaccuracies in model-space;
28+
2. All possible transforms have **offsets** which means they need to be kept up-to-date during slicing.
29+
3. Allow extracting metadata necessary to accurately represent the information on disk.
30+
31+
## Some complications
32+
33+
### Handling projections
34+
1. There are at least 5 ways to record a raster ⇒ model space transformation.
35+
2. Information for these transforms may be stored in many places depending on the reading library:
36+
1. `ds.spatial_ref.attrs`(rioxarray stores GeoTransform, gcps here)
37+
2. `ds.band_data.attrs` (rioxarray stores TIEPOINTS here)
38+
3. `ds.attrs` possibly since this would be the most direct way to map TIFF tags
39+
4. It seems possible to store RPCs as either arrays or as attrs.
40+
41+
We'd like a design that is extensible to handle all 5 (or more) cases; again suggesting an explicit lower-level API.
42+
43+
### Composing with CRSIndex
44+
45+
[One unanswered question so far is](https://github.com/benbovy/xproj/issues/22#issuecomment-2789459387)
46+
> I think the major decision is "who handles the spatial_ref / grid mapping variable". Should it be exclusively handled by xproj.CRSIndex? Or should it be bound to a geospatial index such as rasterix.RasterIndex, xvec.GeometryIndex, xoak.S2PointIndex, etc.?
47+
48+
i.e. does the Index wrap CRSIndex too, or compose with it.
49+
Some points:
50+
1. reprojection requires handling both the transform and the CRS.
51+
2. the EDR-like selection API is similar.
52+
53+
Importantly, GDAL chooses to write GeoTransform to the `grid_mapping` variable in netCDF which _also_ records CRS information.
54+
55+
This gives us two options:
56+
1. RasterIndex wraps CRSIndex too and handles everything.
57+
2. RasterIndex extracts projection information, however it is stored (e.g. GeoTransform), and tracks it internally. Any functionality that relies on both the transformation and CRS will need to be built as an accessor layer.
58+
59+
Below is a proposal for (2).
60+
61+
## Proposal for transform index
62+
63+
We design RasterIndex as a wrapper around **one** of many transform based indexes:
64+
1. AffineTransformIndex ↔ GeoTransform
65+
2. ModelTransformationIndex ↔ ModelTransformationTag
66+
3. ModelTiepointScaleIndex ↔ ModelTiepointTag + ModelPixelScaleTag
67+
4. GCPIndex ↔ Ground Control Points
68+
5. RPCIndex ↔ Rational Polynomial Coefficients
69+
70+
Each of the wrapped index has an associated transform:
71+
```python
72+
@dataclass
73+
class RasterTransform:
74+
rpc: RPC | None # rpcs
75+
tiepoint_scale : ModelTiepointAndScale | None # ModelTiepointTag, ModelPixelScaleTag
76+
gcps: GroundControlPoints | None
77+
transformation : ModelTransformation | None
78+
geotransform : Affine | None # GeoTransform
79+
80+
def from_geotransform(attrs: dict) -> Self:
81+
...
82+
83+
def from_tiepoints(attrs: dict) -> Self:
84+
...
85+
86+
def from_gcps(gcps: ?) -> Self:
87+
...
88+
89+
def from_rpcs(?) -> Self:
90+
...
91+
```
92+
93+
### Read-time
94+
These transforms are constructed by **popping** the relevant information from a user-provided source.
95+
```python
96+
transform = rasterix.RasterTransform.from_geotransform(ds.spatial_ref.attrs)
97+
# transform = rasterix.RasterTransform.from_tiepoints(ds.band_data.attrs)
98+
```
99+
100+
By **popping** the information, the transformation is stored _only_ on the index, and must be rewritten back to the dataset by the user when writing to the disk. This is the RioXarray pattern.
101+
102+
Once a transform is constructed, we could do
103+
```python
104+
index = RasterIndex.from_transform(transform, dims_and_sizes=...)
105+
ds = ds.assign_coords(xr.Coordinates.from_xindex(index))
106+
ds = ds.set_xindex("spatial_ref", xproj.CRSIndex)
107+
```
108+
### Write-time
109+
110+
Before write, we must write the transform (similar to rioxarray) and _destroy_ the RasterIndex instance.
111+
This seems almost required since there are many ways of recording the transformation information on the dataset; and many of these approaches might be used in the same dataset.
112+
```python
113+
# encodes the internal RasterTransform in attributes of the `to` variable
114+
# destroys the RasterIndex
115+
ds = ds.rasterix.write_transform(to: Hashable | rasterix.SELF, formats=["geotransform", "tiepoint"])
116+
ds.rio.to_raster()
117+
```
118+
Here:
119+
1. `SELF` could mean write to the object (Dataset | DataArray) attrs
120+
2. `formats` allow you to record the same information in multiple ways
121+
3. `.rio.write_transform` could just dispatch to this method.

0 commit comments

Comments
 (0)