Analysis of Inverted NDWI Masks and Batch Processing Limitation #85
Replies: 3 comments 3 replies
-
I really appreciate the effort you put into understanding the codebase.
1. About ndwi_labels.py using only one image
Yes, that is intentional.
Initially, I had very limited data (the sample_data folder). At that stage,
the goal was to generate an initial ground truth mask from a single scene
and then create shapefiles from it. That shapefile was later used as ground
truth for further steps.
So ndwi_labels.py was designed as a single-image script (more precisely, the data limitation meant only that one good image was usable). It intentionally processes 268898_0369619_2016-10-15_0e14_BGRN_SR_clip.tif because the other images in the sample_data folder are incomplete.
Input → sample_data
Output → results_ndwi_labels
(Make sure you correctly understand which folder is the script's input and which is its output.)
2. About spatial intersection / skipped buffers
Yes, your understanding is correct. The script loops over the points in the shapefile, not over multiple images. For each point, it creates a buffer and checks whether that buffer overlaps the image. If it does not overlap, the point is skipped. So if many points fall outside the image boundary, they are simply skipped, which is why you may see limited output.
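To illustrate, the skip logic is roughly equivalent to this pure-Python sketch (the function names and the square-buffer simplification are illustrative, not the actual code in the repository):

```python
# Hypothetical sketch of the per-point skip logic described above.
# Each point gets a buffer; the point is kept only if that buffer
# overlaps the image extent, otherwise it is skipped.

def buffer_bounds(x, y, radius):
    """Axis-aligned bounding box of a square buffer around a point."""
    return (x - radius, y - radius, x + radius, y + radius)

def overlaps(a, b):
    """True if two (minx, miny, maxx, maxy) boxes intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

image_bounds = (0.0, 0.0, 100.0, 100.0)   # assumed raster extent
points = [(50.0, 50.0), (500.0, 500.0)]   # second point lies outside the scene

kept = [p for p in points
        if overlaps(buffer_bounds(*p, radius=10.0), image_bounds)]
print(kept)  # only the in-scene point survives; the other is skipped
```

So with a shapefile whose points mostly fall outside a given scene, most iterations take the skip branch and the output stays sparse, exactly as described.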
3. About inverted NDWI masks and Otsu
Right now, there is no special step to remove or ignore the no-data margins before applying Otsu. Because of that, the no-data borders can affect the threshold value. When that happens, the model may classify land as the high-value class and water as the low-value class, which looks like an inversion. But that is not our concern right now, so don't spend more time on the border or on why the mask looks slightly inverted, although you can mention this point in the proposal.
So this is more of a preprocessing limitation rather than an actual bug in
the code.
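If you do mention it in the proposal, the mechanism is easy to demonstrate with a small self-contained numpy sketch (synthetic NDWI values and a from-scratch Otsu, not our pipeline code): excluding the zero-valued no-data margin before thresholding moves the cut back into the land/water gap.

```python
import numpy as np

def otsu_threshold(values, bins=256):
    """From-scratch Otsu: choose the cut that maximizes between-class
    variance; pixels with value > t form the high class."""
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    total = hist.sum()
    sum_all = float((hist * centers).sum())
    w0 = sum0 = 0.0
    best_t, best_var = edges[1], -1.0
    for i in range(bins - 1):
        w0 += hist[i]
        sum0 += hist[i] * centers[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, edges[i + 1]
    return best_t

# Synthetic scene: land and water NDWI values plus a zero-valued no-data margin.
rng = np.random.default_rng(0)
ndwi = np.zeros((100, 100))                              # 0.0 = no-data margin
ndwi[20:80, 20:50] = rng.uniform(-0.5, -0.3, (60, 30))   # land
ndwi[20:80, 50:80] = rng.uniform(0.5, 0.7, (60, 30))     # water

nodata = ndwi == 0.0
t_biased = otsu_threshold(ndwi.ravel())   # margin included: spike at 0 pulls the cut
t_clean = otsu_threshold(ndwi[~nodata])   # margin excluded: cut sits in the land/water gap
```

In this toy scene the margin only shifts the threshold upward; with other value distributions the shift can flip which side Otsu treats as the high class, which is the apparent inversion discussed above.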
Regards,
Ritika
…On Sun, Mar 1, 2026 at 2:16 PM pupaknightKG ***@***.***> wrote:
*Subject:* Addressing the Batch Processing Limitation in 'ndwi_labels.py'
and analysis of Inverted NDWI Masks.
*Addressing to:* @fwitmer <https://github.com/fwitmer> , @Ritika-K7
<https://github.com/Ritika-K7>
*Current Progress:*
1. Environment setup and pipeline verification: With a successful
Conda environment set up, I've run the 'ndwi_labels.py' script on
sample_data\PlanetLabs, successfully generating the NDWI binarized masks
and summary PNGs, and visualized the Otsu thresholding outputs.
*Issue Observation:*
1. *'ndwi_labels.py' Limitation:* After running the script, I got a
single output TIFF (among other files). Even after altering the order of
the image files, the processed output was unchanged. Auditing the
data-loading utilities, I came across the line
'*get_image_path(config, index=0)*'
inside a helper function in 'load_config.py' and found that it is
hardcoded to a single image. I suspect the lack of change when
reordering images is due to the spatial-intersection logic: the points in
the current shapefile likely only overlap with a specific scene, causing
the buffer loop to skip processing for other images.
2. *Inverted NDWI Masks:* At first I didn't know which colours represent
water and land, but after analysing the outputs I realised the binary
masks are inverted (land is rendered as the high-value class). This
appears to be due to Otsu's thresholding being biased by the 'no-data'
borders in the clipped PlanetLabs scenes, which shifts the bimodal
distribution and essentially inverts the classification of land and water.
Screenshot.2026-02-27.212016.png (view on web)
<https://github.com/user-attachments/assets/6f80da2b-1624-4679-bf0e-afc313ebf9c3>
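As a thought for the batch refactor, the hardcoded index could eventually be replaced by iterating all scenes. A hypothetical sketch (the helper name, glob pattern, and folder layout are my assumptions, not the repository's actual API):

```python
from pathlib import Path

def iter_image_paths(input_dir, pattern="*_SR_clip.tif"):
    """Yield every matching scene in the input folder, in sorted order,
    instead of returning only a single hardcoded index.
    (Illustrative stand-in for get_image_path(config, index=0);
    names and the glob pattern are assumptions.)"""
    yield from sorted(Path(input_dir).glob(pattern))

# Usage sketch: process every scene rather than only index 0
# for tif in iter_image_paths("sample_data/PlanetLabs"):
#     process_scene(tif)   # hypothetical per-scene function
```

Each scene would still go through the same point-buffer intersection check, so scenes with no overlapping points would simply produce no output.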
*Help needed / Query:*
1. Although I might have my finger on what is causing the "no change in
output files", I would like a proper understanding of the situation from
the mentors' perspective.
2. Does the current pipeline have a standard preprocessing step to mask
out the margins before thresholding, or is the plan for GSoC 2026 to let
the U-Net architecture handle these artifacts through spatial context?
3. Would you suggest refactoring the batch-processing loops to prepare
the infrastructure for the DL models, and opening a PR for it? Or is
there a specific data-preprocessing task you'd prefer me to look at first?
I will be sharing my progress as I dive further into the code. Apologies
to anyone who finds my late joining insincere. Looking forward to working
with Alaska in and beyond #GSoC2026.
Sincere regards,
Soumya
------------------------------
*About Me:* My name is Soumya R. Sahoo, from India. I'm currently
pursuing an AI/ML specialization, and I'm eyeing two projects under
Alaska and another under Project Mesa in #GSoC2026.
-
Subject: Synthesis of Pipeline Observations and GSoC Proposal Progress.
Addressing to: @fwitmer , @Ritika-K7
Acknowledgement: Thanks for the detailed clarification in your last response. Your insights into the intentional single-image scope of 'ndwi_labels.py' and the spectral bias introduced by the no-data margins helped.
Current Progress: Regarding the timeline, I have spent the last two weeks split between this project and another from the Alaska project ecosystem. Approximately one week was dedicated to a technical audit of the 'CoastlineExtraction' codebase, with the remainder spent researching meteorological and optical data fusion for the 'WildfirePrediction' project. Although the tasks were distinct, the parallel engagement sharpened my proficiency with the shared geospatial concepts and frameworks, specifically optimizing xarray operations and managing multi-dimensional arrays. I hope this explains the gap since my last update.
I am currently finalizing the formal proposal and technical roadmap based on these milestones. I will share my proposal draft with you within the next 24 hours and discuss the implementation details further. I'm open to any suggestions in the meantime.
Soumya.
About Me: My name is Soumya R. Sahoo, from India. I'm currently pursuing an AI/ML specialization, and I'm eyeing two projects under Alaska and another under Project Mesa in #GSoC2026.
-
Addressing to: @fwitmer , @Ritika-K7
Update: I have successfully mailed my proposal draft for review to both mentors' designated mailing addresses. Kindly review the draft and share your suggestions so I can submit the proposal on the official GSoC website at the earliest. If either mentor does not receive the mail, please let me know.
Soumya