
SNAPHU Killed Error Causes Missing scenes in output timeseries (Only 15/34 scenes showing) #104

Open
pbrotoisworo opened this issue Jan 25, 2025 · 4 comments



pbrotoisworo commented Jan 25, 2025

EDIT: Updated the title to reflect the underlying SNAPHU issue (out of memory).

Hi,

I've managed to run an analysis using 34 Sentinel-1 images. However, around 50% of the data is missing when I use a single-reference network.
I put in 34 SLC images, but the output timeseries and network contain only 15 data points. Is that normal? The network is still nicely distributed, but the temporal resolution is lower than expected due to the missing data.

[image attachment]

I've checked the output of stackSentinel.py and the missing dates are there in the output folders such as merged/interferograms, baselines, coreg_secondarys, etc.

I've inspected the slcStack.h5 file and the "slc" key has shape (34, 1029, 5864), which I assume means all 34 Sentinel-1 scenes were ingested.

Input stackSentinel code

stackSentinel.py -s /mnt/e/data/insar-highways/demak \
--workflow interferogram \
--working_directory /mnt/e/data/insar-highways/demak_v5 \
-n 1 --bbox "-6.980585 -6.896600 110.435772 110.636444" \
-o /mnt/e/data/insar-highways/demak_v5/orbits \
-a /mnt/e/data/insar-highways/demak_v5/auxfiles \
-d /mnt/e/data/insar-highways/demak_v5/dem/dem.geo \
-V False \
-z 4 \
-r 20

Then I ran MiaplPy with miaplpyApp.py demak.cfg --dir /mnt/e/data/insar-highways/demak_v5/miaplpy, using the cfg below.

################
miaplpy.load.processor      = isce  #[isce,snap,gamma,roipac], auto for isceTops
miaplpy.load.updateMode     = no  #[yes / no], auto for yes, skip re-loading if HDF5 files are complete
miaplpy.load.compression    = auto  #[gzip / lzf / no], auto for no.
miaplpy.load.autoPath       = no    # [yes, no] auto for no
        
		
miaplpy.load.slcFile        = /mnt/e/data/insar-highways/demak_v5/merged/SLC/*/*.slc.full  #[path2slc_file]
##---------for ISCE only:
miaplpy.load.metaFile       = /mnt/e/data/insar-highways/demak_v5/reference/IW*.xml
miaplpy.load.baselineDir    = /mnt/e/data/insar-highways/demak_v5/baselines
##---------geometry datasets:
miaplpy.load.demFile          = /mnt/e/data/insar-highways/demak_v5/merged/geom_reference/hgt.rdr.full
miaplpy.load.lookupYFile      = /mnt/e/data/insar-highways/demak_v5/merged/geom_reference/lat.rdr.full
miaplpy.load.lookupXFile      = /mnt/e/data/insar-highways/demak_v5/merged/geom_reference/lon.rdr.full
miaplpy.load.incAngleFile     = /mnt/e/data/insar-highways/demak_v5/merged/geom_reference/los.rdr.full
miaplpy.load.azAngleFile      = /mnt/e/data/insar-highways/demak_v5/merged/geom_reference/los.rdr.full
miaplpy.load.shadowMaskFile   = /mnt/e/data/insar-highways/demak_v5/merged/geom_reference/shadowMask.rdr.full
##---------miaplpy.load.waterMaskFile    = /mnt/e/data/insar-highways/demak_v4/water_mask/swbdLat_S08_S06_Lon_E110_E111.wbd
##---------interferogram datasets:
miaplpy.load.unwFile        = /mnt/e/data/insar-highways/demak_v5/miaplpy/inverted/interferograms_single_reference/*/*fine*.unw
miaplpy.load.corFile        = /mnt/e/data/insar-highways/demak_v5/miaplpy/inverted/interferograms_single_reference/*/*fine*.cor
miaplpy.load.connCompFile   = /mnt/e/data/insar-highways/demak_v5/miaplpy/inverted/interferograms_single_reference/*/*.unw.conncomp
        
##---------subset (optional):
## if both yx and lalo are specified, use lalo option unless a) no lookup file AND b) dataset is in radar coord
miaplpy.subset.lalo         = -6.980585:-6.896600,110.435772:110.636444

# MiaplPy options 
miaplpy.multiprocessing.numProcessor   = 10
miaplpy.interferograms.type = single_reference

## Mintpy options
mintpy.compute.cluster     = local  # if dask is not available, set this option to no 
mintpy.compute.numWorker   = 4

mintpy.reference.lalo     = -6.9062397501293855, 110.62864532047873
mintpy.troposphericDelay.method = no


pbrotoisworo commented Jan 25, 2025

Just an update. I fixed the error.

I saw Killed printed multiple times in the step 5 unwrap_ifgram output, which I think means my computer ran out of memory. Because of the many SNAPHU failures, the downstream processing then assumed there were only 15 datasets.

Checking my run_05_miaplpy_unwrap_ifgram file, I see it launches a lot of commands at the same time: 20 run commands followed by a wait, then another 13 run commands before the last wait. I rewrote the file so there is a wait after every 4 SNAPHU commands. I'm not sure which parameter controlled this in the cfg, maybe miaplpy.compute.numCores; I set it to 20 because I have 20 CPU cores.
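
A minimal sketch of that regrouping, assuming the run file is a flat list of unwrap commands separated by wait lines (the helper below is illustrative, not part of MiaplPy):

    # Hypothetical helper: rewrite a run file so at most `batch_size` SNAPHU
    # commands execute between consecutive `wait` barriers.
    def regroup_run_file(path, batch_size=4):
        with open(path) as f:
            commands = [line for line in f if line.strip() and line.strip() != 'wait']
        with open(path, 'w') as f:
            for i, cmd in enumerate(commands, start=1):
                f.write(cmd)
                if i % batch_size == 0:
                    f.write('wait\n')
            if len(commands) % batch_size != 0:
                f.write('wait\n')

    # Example usage (path assumed from this run):
    regroup_run_file('run_05_miaplpy_unwrap_ifgram', batch_size=4)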

I ran it again with no problem. I then had to delete numInvIfgram.h5, timeseries.h5, and temporalCoherence.h5 due to a dataset-size mismatch in later steps, but the resulting output is good.

My thoughts on this for the project team:

  1. Could there be a dedicated parameter for the number of parallel SNAPHU jobs? I was fine-tuning the cfg file for the phase-linking step, since it takes so long, and wanted to maximize CPU usage. But if I'm understanding correctly, the same parameter also set the number of parallel SNAPHU jobs, which led to the unsafe process terminations.
  2. There should be a way to safely catch the SNAPHU out-of-memory error in unwrap_ifgram. Currently it doesn't raise an exception, so the rest of MiaplPy keeps running and treats the result as valid even though 50% of the dataset is missing (see the sketch below).
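
A minimal sketch of what such a check could look like, using subprocess to detect a killed SNAPHU run (the command string and helper name are placeholders, not the exact call unwrap_ifgram.py builds):

    import signal
    import subprocess

    def run_snaphu_checked(cmd):
        """Run a SNAPHU command and fail loudly if it is killed or errors out."""
        result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        if result.returncode == -signal.SIGKILL:
            # A negative return code means the process died from a signal;
            # SIGKILL is what the Linux OOM killer sends.
            raise MemoryError(f"SNAPHU was killed (likely out of memory): {cmd}")
        if result.returncode != 0:
            raise RuntimeError(f"SNAPHU failed ({result.returncode}): {result.stderr}")
        return result.stdout

    # Placeholder command; the real one is assembled by unwrap_ifgram.py
    run_snaphu_checked("snaphu -f snaphu.conf")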

[image attachment]


mirzaees commented Feb 6, 2025

Yes, you are right. I am planning to use a Python version of SNAPHU and fix this issue in the near future.

pbrotoisworo changed the title from "Missing scenes in output timeseries (Only 15/34 scenes showing)" to "SNAPHU Killed Error Causes Missing scenes in output timeseries (Only 15/34 scenes showing)" on Feb 9, 2025

codeautopilot bot commented Feb 9, 2025

Potential Solution

The plan to solve the bug involves addressing the memory-management issues during the SNAPHU unwrapping process, which are likely causing the "Killed" error through excessive memory usage. By optimizing the configuration settings for SNAPHU, implementing memory checks, and enhancing error handling, we can prevent the process from being terminated unexpectedly and ensure all scenes are processed correctly.

What is Causing This Bug?

The bug is primarily caused by the SNAPHU unwrapping process consuming more memory than is available, leading to the process being killed by the operating system. This is likely due to the size of the interferograms being processed and the configuration settings not being optimized for the available system resources. Additionally, the lack of memory checks and detailed error handling in the scripts contributes to the issue.

Code

  1. Optimize SNAPHU Configuration: Adjust the SNAPHU configuration parameters to better match the available system resources. This may involve reducing the number of tiles or adjusting other parameters to reduce memory usage.

    # Example of adjusting the SNAPHU-related configuration (illustrative key
    # names; in MiaplPy the tile count and maximum discontinuity are exposed
    # as the --num_tiles and --max_discontinuity options of unwrap_ifgram.py)
    snaphu_config = {
        'NLOOKSRANGE': 1,
        'NLOOKSAZ': 1,
        'TILEDIR': '/path/to/tiledir',
        'NUM_TILES': 4,           # reduce the number of tiles if memory is limited
        'MAX_DISCONTINUITY': 10,  # adjust based on dataset characteristics
    }
  2. Implement Memory Checks: Add checks to ensure that sufficient memory is available before starting the SNAPHU process.

    import psutil
    
    def check_memory_availability(required_memory_gb):
        available_memory_gb = psutil.virtual_memory().available / (1024 ** 3)
        if available_memory_gb < required_memory_gb:
            raise MemoryError(f"Insufficient memory: {available_memory_gb} GB available, {required_memory_gb} GB required.")
    
    # Example usage
    check_memory_availability(8)  # Check if at least 8 GB of memory is available
  3. Enhanced Error Handling: Improve error handling to catch and report memory-related errors more gracefully.

    try:
        # Run the SNAPHU process (run_snaphu_process is a placeholder for the
        # actual unwrapping call)
        run_snaphu_process()
    except MemoryError as e:
        print(f"Memory error encountered: {e}")
        # Additional logging or cleanup actions
    except RuntimeError as e:
        print(f"Runtime error encountered: {e}")
        # Additional logging or cleanup actions
  4. Logging and Monitoring: Enhance logging to provide more detailed information about the execution of each step, including memory usage and any errors encountered.

    import logging

    import psutil  # required for virtual_memory() below

    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger(__name__)

    def log_memory_usage():
        memory_info = psutil.virtual_memory()
        logger.info(f"Memory usage: {memory_info.percent}% used, {memory_info.available / (1024 ** 3):.2f} GB available")

    # Example usage
    log_memory_usage()

How to Replicate the Bug

  1. Prepare a dataset of 34 Sentinel-1 SLC images and configure the processing pipeline as described in the user input.
  2. Run the stackSentinel.py script with the specified parameters to generate the necessary input files.
  3. Execute the miaplpyApp.py script using the provided configuration file.
  4. Observe the output timeseries and network for missing scenes, and check for any "Killed" error messages in the logs.

By following these steps, the bug should be replicated, allowing for further testing and validation of the proposed solution.


Files used for this task:

Changes on src/miaplpy/unwrap_ifgram.py

Analysis of src/miaplpy/unwrap_ifgram.py

Overview

The file unwrap_ifgram.py is responsible for unwrapping interferograms using the SNAPHU software. The script includes several functions and classes that manage the configuration and execution of the SNAPHU unwrapping process. The main class, Snaphu, handles the setup and execution of the unwrapping, including determining whether to split the process into tiles and managing the configuration files.

Potential Causes of the Bug

  1. Memory Management:

    • The error message "Killed" typically indicates that the process was terminated by the operating system, often due to excessive memory usage. The script does not appear to have explicit memory management or optimization strategies, which could lead to high memory consumption, especially when processing large datasets or multiple tiles.
  2. Tile Management:

    • The script includes logic to determine whether the unwrapping should be split into tiles (unwrap_tile method). If the number of tiles is not optimally configured, it could lead to inefficient memory usage. The calculation of y_tile and x_tile might not be optimal for the given dataset size.
  3. Configuration File Handling:

    • The configuration for SNAPHU is dynamically generated and written to a file. If the configuration parameters (e.g., NLOOKSRANGE, NLOOKSAZ, TILEDIR) are not set correctly, it could lead to inefficient processing and increased memory usage.
  4. Error Handling:

    • The script raises a RuntimeError if SNAPHU returns an error. However, it does not provide detailed logging or handling for memory-specific errors, which could help diagnose the issue.

Recommendations

  1. Optimize Memory Usage:

    • Implement memory profiling to identify bottlenecks and optimize memory usage. Consider using memory-efficient data structures or processing techniques.
  2. Tile Configuration:

    • Review and optimize the logic for determining the number of tiles (get_nproc_tile). Ensure that the tile size and number are appropriate for the dataset size and available system memory (see the sketch after this list).
  3. Configuration Parameters:

    • Verify that the configuration parameters for SNAPHU are set optimally for the dataset and system. Consider allowing user input for critical parameters that affect memory usage.
  4. Enhanced Logging and Error Handling:

    • Implement detailed logging to capture memory usage statistics and SNAPHU output. Enhance error handling to provide more informative messages, particularly for memory-related issues.
  5. Testing with Smaller Datasets:

    • Test the unwrapping process with smaller datasets to ensure that the logic and configuration are correct before scaling up to larger datasets.
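
As an illustration of the tile-configuration recommendation above, a minimal sketch of deriving a tile layout from the interferogram dimensions (the helper name and per-tile pixel budget are assumptions; NTILEROW, NTILECOL, and NPROC are standard SNAPHU tile-mode config keys):

    import math

    def suggest_tile_layout(length, width, max_pixels_per_tile=2_000_000, max_procs=4):
        """Hypothetical helper: suggest NTILEROW/NTILECOL/NPROC for SNAPHU tile mode.

        Caps the pixel count per tile so each SNAPHU sub-process keeps a
        modest memory footprint; tune max_pixels_per_tile for your machine.
        """
        n_tiles = max(1, math.ceil(length * width / max_pixels_per_tile))
        # Split tiles roughly evenly between rows and columns.
        ntilerow = max(1, int(math.sqrt(n_tiles)))
        ntilecol = max(1, math.ceil(n_tiles / ntilerow))
        nproc = min(max_procs, ntilerow * ntilecol)
        return {'NTILEROW': ntilerow, 'NTILECOL': ntilecol, 'NPROC': nproc}

    # Example: full-resolution dimensions reported in this issue (1029 x 5864)
    print(suggest_tile_layout(1029, 5864))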

By addressing these areas, the script can be improved to handle large datasets more efficiently and reduce the likelihood of memory-related errors.

Changes on src/miaplpy/miaplpyApp.py

Analysis of miaplpyApp.py

The miaplpyApp.py script is a part of a larger workflow for processing InSAR data using MiaplPy. The script orchestrates various steps in the processing pipeline, including loading data, generating interferograms, unwrapping them, and performing time series analysis. The script is designed to be flexible, allowing for different configurations and processing options through a configuration file.

Key Observations

  1. Workflow Steps: The script defines a series of steps (STEP_LIST) that are executed in sequence. These steps include loading data, phase linking, generating interferograms, unwrapping interferograms, and more.

  2. Configuration Management: The script uses a configuration file to manage various parameters and settings. This includes paths to input data, processing options, and output directories.

  3. Integration with SNAPHU: The unwrapping step (run_unwrap) is particularly relevant to the issue at hand. This step involves calling an external script (unwrap_ifgram.py) to perform phase unwrapping using SNAPHU. The script constructs command-line arguments for SNAPHU based on the configuration settings.

  4. Resource Management: The script attempts to manage computational resources by determining the number of CPU cores available and adjusting the number of tasks accordingly. However, there is no explicit memory management or checks for available memory before executing memory-intensive tasks like unwrapping.

  5. Error Handling: There is limited error handling in the script. If SNAPHU runs out of memory, it may simply terminate with a "Killed" message, which is consistent with the user's reported issue.

Potential Causes of the Bug

  • Memory Usage: The SNAPHU process may be consuming more memory than is available, leading to the process being killed by the operating system. This could be due to the size of the interferograms or the number of tiles being processed simultaneously.

  • Configuration Settings: The configuration settings for SNAPHU, such as the number of tiles (--num_tiles) and the maximum discontinuity (--max_discontinuity), may not be optimal for the available system resources.

  • Lack of Memory Checks: The script does not perform any checks on available system memory before starting the SNAPHU process. Implementing such checks could prevent the process from being killed unexpectedly.

Recommendations

  1. Optimize SNAPHU Configuration: Review and adjust the SNAPHU configuration settings to better match the available system resources. This may involve reducing the number of tiles or adjusting other parameters to reduce memory usage.

  2. Implement Memory Checks: Add checks to ensure that sufficient memory is available before starting the SNAPHU process. This could involve querying the system for available memory and adjusting the processing parameters accordingly (see the sketch after this list).

  3. Error Handling: Improve error handling to catch and report memory-related errors more gracefully. This could involve capturing the output of the SNAPHU process and checking for specific error messages.

  4. Logging and Monitoring: Enhance logging to provide more detailed information about the execution of each step, including memory usage and any errors encountered. This would aid in diagnosing issues more effectively.
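
Building on the memory-check recommendation above, a minimal sketch of sizing the number of concurrent unwrapping jobs from the currently available memory (the per-job memory estimate and helper name are assumptions, not existing MiaplPy options):

    import os

    import psutil

    def max_concurrent_unwrap_jobs(est_mem_per_job_gb=4.0, reserve_gb=2.0):
        """Hypothetical helper: cap parallel SNAPHU jobs by available memory.

        est_mem_per_job_gb is an assumed per-interferogram footprint; tune it
        for the actual interferogram size and tiling settings.
        """
        available_gb = psutil.virtual_memory().available / (1024 ** 3)
        by_memory = int((available_gb - reserve_gb) // est_mem_per_job_gb)
        by_cpu = os.cpu_count() or 1
        return max(1, min(by_memory, by_cpu))

    # Example: choose a job count instead of reusing the phase-linking core count
    num_unwrap_jobs = max_concurrent_unwrap_jobs()
    print(f"Launching at most {num_unwrap_jobs} SNAPHU jobs in parallel")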

By addressing these areas, the likelihood of encountering out-of-memory errors during the SNAPHU process can be reduced, potentially resolving the issue of missing scenes in the output timeseries.

Changes on src/miaplpy/objects/slcStack.py

Analysis of src/miaplpy/objects/slcStack.py

The slcStack.py file is responsible for handling the loading and processing of Sentinel-1 SLC (Single Look Complex) data into a format suitable for time-series analysis. The main class, slcStackDict, manages a collection of SLCs, while the slcStack class handles the reading and writing of these datasets to and from HDF5 files.

Key Functions and Potential Issues

  1. Data Loading and Processing:

    • The slcStackDict class initializes with a dictionary of SLC pairs, which are then processed to extract metadata, size, and date information.
    • The write2hdf5 method writes the processed SLC data into an HDF5 file, ensuring that the data is correctly structured and metadata is included.
  2. Date and Scene Management:

    • The get_date_list method retrieves a sorted list of dates from the SLC pairs, which is crucial for ensuring all scenes are accounted for in the time-series.
    • The write2hdf5 method includes a section for creating a dataset of dates, which should match the number of SLCs processed.
  3. Potential Causes for Missing Scenes:

    • Incomplete Data Handling: If the pairsDict does not contain all expected SLC pairs, some scenes might be missing from the output. This could be due to an error in how the pairs are generated or filtered.
    • HDF5 Writing Issues: The write2hdf5 method might not correctly write all datasets if there are issues with the input data or if the method encounters an error during execution.
    • Metadata Misalignment: If the metadata does not correctly reflect the number of scenes or their dates, this could lead to discrepancies in the output.
  4. Memory Management:

    • The file does not explicitly handle memory management, which could be a concern given the large size of SLC datasets. This might indirectly affect the processing if the system runs out of memory, leading to incomplete data processing.

Recommendations

  • Verify Input Data: Ensure that the pairsDict contains all expected SLC pairs and that they are correctly processed. This might involve checking the input data source and any filtering logic applied before this point.
  • Error Handling: Implement additional error handling in the write2hdf5 method to catch and log any issues that occur during the writing process.
  • Memory Usage: Consider optimizing memory usage, especially if the system is prone to running out of memory during processing. This might involve processing data in smaller chunks or using more efficient data structures (see the sketch below).
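
As one way to realize the chunked-processing suggestion above, a minimal sketch of streaming SLCs into an HDF5 stack date by date with h5py (the dataset layout and dtype are assumptions, not the actual slcStack schema):

    import h5py
    import numpy as np

    def write_slc_stack_chunked(out_file, slc_readers, length, width):
        """Hypothetical helper: stream SLCs into an HDF5 stack one date at a time.

        slc_readers is a list of callables, each returning one (length, width)
        complex64 array when invoked, so only one acquisition is held in memory.
        """
        num_dates = len(slc_readers)
        with h5py.File(out_file, 'w') as f:
            dset = f.create_dataset(
                'slc',
                shape=(num_dates, length, width),
                dtype=np.complex64,
                chunks=(1, min(512, length), min(512, width)),  # small per-date chunks
            )
            for i, read_slc in enumerate(slc_readers):
                dset[i] = read_slc()  # load, write, then release this date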

Overall, the file appears to be correctly structured for its intended purpose, but careful attention should be paid to the input data and the handling of potential errors during processing.

This comment was generated by AI. Information provided may be incorrect.



pbrotoisworo (Author) commented:

Thanks @mirzaees. I've updated the Issue title to reflect the actual issue.
