Skip to content

Python bindings: Add Dataset.ReadAsMaskedArray #12172

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 28, 2025

Conversation

dbaston
Copy link
Member

@dbaston dbaston commented Apr 18, 2025

Resolves #12005

xoff=xoff, yoff=yoff,
win_xsize=xsize, win_ysize=ysize,
buf_xsize=buf_xsize, buf_ysize=buf_ysize,
resample_alg=gdalconst.GRIORA_Mode) != 255
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps indicate in the doc that when downsampling the mask resampling algorithm is Mode (ie majority) ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documented this and also changed behavior of Band.ReadAsMaskedArray to match.

@dbaston dbaston marked this pull request as draft April 18, 2025 16:10
@dbaston
Copy link
Member Author

dbaston commented Apr 18, 2025

Hmm, I'm not sure how my tests were passing locally before. Right now I'm puzzled by

(Pdb) band.GetMaskBand().ReadAsArray()
array([[  0, 255],
       [  0,   0]], dtype=uint8)
(Pdb) band.GetMaskBand().ReadAsArray(buf_xsize=1, buf_ysize=1, resample_alg=gdal.GRIORA_Mode)
array([[255]], dtype=uint8)

@rouault
Copy link
Member

rouault commented Apr 18, 2025

@dbaston
Copy link
Member Author

dbaston commented Apr 18, 2025

I don't think GRA_Mode downscaling works for NoData mask bands. If I'm following the code right, it performs a downscaled read from the parent band (using GRA_Mode, and excluding NoData pixels) and then sets all valid pixels to 255. Which is different from performing a full resolution read on the parent, setting valid pixels to 255, and then downscaling.

@rouault
Copy link
Member

rouault commented Apr 19, 2025

I don't think GRA_Mode downscaling works for NoData mask bands.

yes the logic I pointed out where the mask band of the mask band is the mask band itself should probably be modified for Mode, and just get the GetMaskBand() of the mask band, which would be a AllValid mask band

@dbaston
Copy link
Member Author

dbaston commented Apr 21, 2025

I believe this is due to: https://github.com/OSGeo/gdal/blob/master/gcore/overview.cpp#L4453 / https://github.com/OSGeo/gdal/blob/master/gcore/overview.cpp#L5798

I don't see either of these lines hit in this case. The issue I see is

eErr = m_poParent->RasterIO(
GF_Read, nXOff, nYOff, nXSize, nYSize, pTemp, nBufXSize, nBufYSize,
eWrkDT, nWrkDTSize, static_cast<GSpacing>(nBufXSize) * nWrkDTSize,
psExtraArg);

where a downscaled read is performed from the parent band, and the result reclassified to 0/255. To get a correct result for GRIORA_Mode, I think we need to read from the parent band at full resolution, reclassify, and then downscale.

All this said, the current behavior of GRIORA_Mode is essentially GRIORA_Max and not necessarily undesirable as a default.

@dbaston dbaston force-pushed the python-dataset-read-as-masked-array branch from ffd3c4e to 3198ece Compare April 28, 2025 13:57
@dbaston
Copy link
Member Author

dbaston commented Apr 28, 2025

@rouault any thoughts on me marking test_numpy_rw_masked_array_mask_downsampling as xfail, then proceeding along these lines in a separate PR?

I think we need to read from the parent band at full resolution, reclassify, and then downscale

@rouault
Copy link
Member

rouault commented Apr 28, 2025

then proceeding along these lines in a separate PR?

I think we need to read from the parent band at full resolution, reclassify, and then downscale

Only for mode ? Otherwise I'm slightly concerned that if using a different downscaling algorithm for the mask than the actualband, there could be situations where this would lead to the downscaled mask indicating a pixel as valid whereas the actual downscaled band would have determined it not to be valid. In particular, for cubic or lanczos, there are subtelties regarding that determination (cf overview.cpp lines 3447-3471 (horizontal pass), 3654-3684 (vertical pass))

@dbaston
Copy link
Member Author

dbaston commented Apr 28, 2025

Only for mode ?

It's hard for me to picture what cubic resampling of a binary mask should yield, so it wouldn't bother me to keep the status quo there.

I'm approaching this from the expectation that resampling a mask band should yield the same results as resampling an ordinary band with the same values (0 or 255). Is that the wrong way to look at it?

@rouault
Copy link
Member

rouault commented Apr 28, 2025

Is that the wrong way to look at it?

I'm not sure what the "right" way is. I'm just underlying that the overview resampling code (also used by RasterIO() for non-nearest resampling) uses the mask band as its own mask band, ignoring 0 values. Both approaches (current GDAL one or the one you suggest) have their pros and cons depending use cases (if you want to be "optimistic" or "pessimistic" regarding pixel validity). Perhaps we should have a way for the user to specify which it prefers, but that would further complicates things...

@dbaston dbaston force-pushed the python-dataset-read-as-masked-array branch from 519632d to 4f495bc Compare April 28, 2025 15:32
@dbaston
Copy link
Member Author

dbaston commented Apr 28, 2025

I've exposed a mask_resample_alg argument in both ReadAsMaskedArray() methods, which I think is a resonable addition: I can imagine a user wanting to read pixels with GRIORA_Average and have the mask resampled with GRIORA_Mode, GRIORA_Min, or GRIORA_Max (I realize these last two don't currently exist).

This also sidesteps questions about the behavior of those algorithms for now.

@dbaston dbaston marked this pull request as ready for review April 28, 2025 16:03
@coveralls
Copy link
Collaborator

Coverage Status

coverage: 70.754% (+0.01%) from 70.74%
when pulling 4f495bc on dbaston:python-dataset-read-as-masked-array
into 4aa3135 on OSGeo:master.

@rouault rouault merged commit adcb70a into OSGeo:master Apr 28, 2025
37 checks passed
@rouault rouault added this to the 3.11.0 milestone Apr 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Python: Add Dataset.ReadAsMaskedArray
3 participants