Analysis of Inverted NDWI Masks and Batch Processing Limitation #85
Replies: 3 comments 3 replies
-
I really appreciate the effort you put into understanding the codebase.
1. About ndwi_labels.py using only one image
Yes, that is intentional.
Initially, I had very limited data (the sample_data folder). At that stage,
the goal was to generate an initial ground truth mask from a single scene
and then create shapefiles from it. That shapefile was later used as ground
truth for further steps.
So ndwi_labels.py was designed as a single-image script (more precisely, the data limitation meant only that one good image was usable). It intentionally processes 268898_0369619_2016-10-15_0e14_BGRN_SR_clip.tif because the other images in the sample_data folder are incomplete.
Input → sample_data
Output → results_ndwi_labels
(Make sure you correctly understand which folder is the script's input and which is its output.)
2. About spatial intersection / skipped buffers
Yes, your understanding is correct. The script loops over the points in the shapefile, not over multiple images. For each point, it creates a buffer and checks whether that buffer overlaps the image. If it does not overlap, the point is skipped. So if many points fall outside the image boundary, they are simply skipped, which is why you may see limited output.
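To illustrate, the skip logic is roughly equivalent to this pure-Python sketch (the function names and the square-buffer simplification are illustrative, not the actual code in the repository):

```python
# Hypothetical sketch of the per-point skip logic described above.
# Each point gets a buffer; the point is kept only if that buffer
# overlaps the image extent, otherwise it is skipped.

def buffer_bounds(x, y, radius):
    """Axis-aligned bounding box of a square buffer around a point."""
    return (x - radius, y - radius, x + radius, y + radius)

def overlaps(a, b):
    """True if two (minx, miny, maxx, maxy) boxes intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

image_bounds = (0.0, 0.0, 100.0, 100.0)   # assumed raster extent
points = [(50.0, 50.0), (500.0, 500.0)]   # second point lies outside the scene

kept = [p for p in points
        if overlaps(buffer_bounds(*p, radius=10.0), image_bounds)]
print(kept)  # only the in-scene point survives; the other is skipped
```

So with a shapefile whose points mostly fall outside a given scene, most iterations take the skip branch and the output stays sparse, exactly as described.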
3. About inverted NDWI masks and Otsu
Right now, there is no special step to remove or ignore the no-data margins before applying Otsu. Because of that, the no-data borders can affect the threshold value. When that happens, the model may classify land as the high-value class and water as the low-value class, which looks like an inversion. But that is not our concern right now, so don't spend more time on the border or on why the mask looks slightly inverted, although you can mention this point in the proposal.
So this is more of a preprocessing limitation rather than an actual bug in
the code.
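If you do mention it in the proposal, the mechanism is easy to demonstrate with a small self-contained numpy sketch (synthetic NDWI values and a from-scratch Otsu, not our pipeline code): excluding the zero-valued no-data margin before thresholding moves the cut back into the land/water gap.

```python
import numpy as np

def otsu_threshold(values, bins=256):
    """From-scratch Otsu: choose the cut that maximizes between-class
    variance; pixels with value > t form the high class."""
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    total = hist.sum()
    sum_all = float((hist * centers).sum())
    w0 = sum0 = 0.0
    best_t, best_var = edges[1], -1.0
    for i in range(bins - 1):
        w0 += hist[i]
        sum0 += hist[i] * centers[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, edges[i + 1]
    return best_t

# Synthetic scene: land and water NDWI values plus a zero-valued no-data margin.
rng = np.random.default_rng(0)
ndwi = np.zeros((100, 100))                              # 0.0 = no-data margin
ndwi[20:80, 20:50] = rng.uniform(-0.5, -0.3, (60, 30))   # land
ndwi[20:80, 50:80] = rng.uniform(0.5, 0.7, (60, 30))     # water

nodata = ndwi == 0.0
t_biased = otsu_threshold(ndwi.ravel())   # margin included: spike at 0 pulls the cut
t_clean = otsu_threshold(ndwi[~nodata])   # margin excluded: cut sits in the land/water gap
```

In this toy scene the margin only shifts the threshold upward; with other value distributions the shift can flip which side Otsu treats as the high class, which is the apparent inversion discussed above.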
Regards,
Ritika
…On Sun, Mar 1, 2026 at 2:16 PM pupaknightKG ***@***.***> wrote:
*Subject:* Addressing the Batch Processing Limitation in 'ndwi_labels.py'
and analysis of Inverted NDWI Masks.
*Addressing to:* @fwitmer <https://github.com/fwitmer> , @Ritika-K7
<https://github.com/Ritika-K7>
*Current Progress:*
1. Environment setup and pipeline verification: With a successful
Conda environment set up, I've run the 'ndwi_labels.py' script on
sample_data\PlanetLabs, successfully generating the NDWI binarized masks
and summary PNGs, and visualized the Otsu thresholding outputs.
*Issue Observation:*
1. *'ndwi_labels.py' Limitation:* After running the script, I got a
single output TIFF (among other files). Even after altering the order of
the image files, the processed output was unchanged. Auditing the
data-loading utilities, I came across the line
'*get_image_path(config, index=0)*'
inside a helper function in 'load_config.py' and found that it is
hardcoded to a single image. I suspect the lack of change when
reordering images is due to the spatial-intersection logic: the points in
the current shapefile likely only overlap with a specific scene, causing
the buffer loop to skip processing for other images.
2. *Inverted NDWI Masks:* At first I didn't know which colours represent
water and land, but after analysing the outputs I realised the binary
masks are inverted (land is rendered as the high-value class). This
appears to be due to Otsu's thresholding being biased by the 'no-data'
borders in the clipped PlanetLabs scenes, which shifts the bimodal
distribution and essentially inverts the classification of land and water.
Screenshot.2026-02-27.212016.png (view on web)
<https://github.com/user-attachments/assets/6f80da2b-1624-4679-bf0e-afc313ebf9c3>
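As a thought for the batch refactor, the hardcoded index could eventually be replaced by iterating all scenes. A hypothetical sketch (the helper name, glob pattern, and folder layout are my assumptions, not the repository's actual API):

```python
from pathlib import Path

def iter_image_paths(input_dir, pattern="*_SR_clip.tif"):
    """Yield every matching scene in the input folder, in sorted order,
    instead of returning only a single hardcoded index.
    (Illustrative stand-in for get_image_path(config, index=0);
    names and the glob pattern are assumptions.)"""
    yield from sorted(Path(input_dir).glob(pattern))

# Usage sketch: process every scene rather than only index 0
# for tif in iter_image_paths("sample_data/PlanetLabs"):
#     process_scene(tif)   # hypothetical per-scene function
```

Each scene would still go through the same point-buffer intersection check, so scenes with no overlapping points would simply produce no output.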
*Help needed / Query:*
1. Although I might have my finger on what is causing the "no change in
output files", I would like a proper understanding of the situation from
the mentors' perspective.
2. Does the current pipeline have a standard preprocessing step to mask
out the margins before thresholding, or is the plan for GSoC 2026 to let
the U-Net architecture handle these artifacts through spatial context?
3. Would you suggest refactoring the batch-processing loops to prepare
the infrastructure for the DL models, and opening a PR for it? Or is
there a specific data-preprocessing task you'd prefer me to look at first?
I will be sharing my progress as I dive further into the code. Apologies
to anyone who finds my late joining insincere. Looking forward to working
with Alaska in and beyond #GSoC2026.
Sincere regards,
Soumya
------------------------------
*About Me:* My name is Soumya R. Sahoo, from India. I'm currently
pursuing an AI/ML specialization, and I'm eyeing two projects under
Alaska and another under Project Mesa in #GSoC2026.
-
Subject: Synthesis of Pipeline Observations and GSoC Proposal Progress.
Addressing to: @fwitmer , @Ritika-K7
Acknowledgement: Thanks for the detailed clarification in your last response. Your insights into the intentional single-image scope of 'ndwi_labels.py' and the spectral bias introduced by the no-data margins helped.
Current Progress: Regarding the timeline, I have spent the last two weeks split between this project and another from the Alaska project ecosystem. Approximately one week was dedicated to a technical audit of the 'CoastlineExtraction' codebase, with the remainder spent researching meteorological and optical data fusion for the 'WildfirePrediction' project. Although the tasks were distinct, the parallel engagement sharpened my proficiency with the shared geospatial concepts and frameworks, specifically optimizing xarray operations and managing multi-dimensional arrays. I hope this explains the gap since my last update.
I am currently finalizing the formal proposal and technical roadmap based on these milestones. I will share my proposal draft with you within the next 24 hours and discuss the implementation details further. I'm open to any suggestions in the meantime.
Soumya.
About Me: My name is Soumya R. Sahoo, from India. I'm currently pursuing an AI/ML specialization, and I'm eyeing two projects under Alaska and another under Project Mesa in #GSoC2026.
-
Addressing to: @fwitmer , @Ritika-K7
Update: I have successfully mailed my proposal draft for review to both mentors' designated mailing addresses. Kindly review the draft and share your suggestions so I can submit the proposal on the official GSoC website at the earliest. If either mentor does not receive the mail, please let me know.
Soumya