Richardson Lucy Parallelization V2 #274
base: develop
Conversation
Currently, the DC2 (existing) and Parallel (new) data interfaces can be used interchangeably for serial code, and they produce the same output. However, the latter must be used for parallel code.
Thanks @avalluvan! I think it's a great improvement with respect to V1. I still need to look at the code in detail, but I read your description and checked the files changed. A few first impressions:
It’s good that you open the PR so we can start the review, but I’d wait for these two limitations to be resolved before merging.
On point 1, I have updated the code to migrate most of the parallelization features, and I have resolved the merge conflicts in the response handling code. I had added a few comments to parts of the imaging code that took me a while to figure out, to make it easier to read. Do you want me to remove those? The tutorial notebooks were probably modified quite a bit, and I would not want to commit those changes to the develop branch. Do you think we should wait until the dataIF code is modified for DC3 and handles FullDetectorResponse objects properly? The current pull request adds a feature for parallel execution on top of the existing DC2 imaging code, and I think it could be merged as an iterative update.
Thanks, @avalluvan. I added some comments on your RL codes directly.
By the way, I noticed that you changed some classes which are probably not related to the RL parallelization itself, for example FullDetectorResponse, SpacecraftFile, and PointSourceResponse. I am concerned that reviewing these different issues simultaneously may easily lead to mistakes. So, is it possible to separate them from this PR? Then we can review this PR more easily.
        # expected count histograms
        self.expectation_list = self.calc_expectation_list(model = self.initial_model, dict_bkg_norm = self.dict_bkg_norm)
        logger.info("The expected count histograms were calculated with the initial model map.")
Is it possible to keep these lines? To use the updated model for the likelihood calculation, I wanted to perform the expected count calculation at the initialization and post-processing steps and skip it in the Estep.
I can undo these changes. Do you plan on moving this to Estep() in the future / removing Estep() altogether?
@@ -66,16 +70,26 @@ def __init__(self, initial_model, dataset, mask, parameter):
        else:
            os.makedirs(self.save_results_directory)
I understand that RL needs to know whether it is being executed on the master node and needs this kind of parameter. I would suggest preparing two parameters instead, something like:
- self.parallel_computation = True / False
- self.master_node = True / False
I want to prepare a parameter that explicitly tells whether the computation is parallel or not. I will add some suggestions regarding these changes on other lines.
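For illustration, a minimal sketch of how these two flags could be derived from an mpi4py communicator (the flag names follow the suggestion above; everything else is an assumption, not code from this PR):

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD  # would be None (or never created) in a purely serial run

# Explicit flags as suggested above; a missing or size-1 communicator means serial execution.
parallel_computation = (comm is not None) and (comm.Get_size() > 1)
master_node = (comm is None) or (comm.Get_rank() == 0)
```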
One of the ideas we discussed in a previous meeting was to let the program directly infer whether it was being run in serial or parallel mode. In fact, the suggested flag variables were what I used in the initial V2 pull request code. Do you recommend making this modification, i.e., inferring self.parallel_computation in image_deconvolution.py or in RichardsonLucy.py? The issue with inferring this in the image deconvolution class is: what happens when we have multiple input datasets ([dataset1, dataset2, ...])? Each dataset will have its own "sub_comm" object.
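For comparison, a sketch of what the inference could look like if it lived in the image deconvolution class, under the assumption that each dataset may carry an optional sub_comm attribute (the helper name is hypothetical):

```python
def infer_parallel_computation(datasets):
    """Return True if any dataset carries an MPI sub-communicator with more than one rank.

    `sub_comm` is the per-dataset communicator mentioned above; None means a serial dataset.
    """
    for dataset in datasets:
        sub_comm = getattr(dataset, "sub_comm", None)
        if sub_comm is not None and sub_comm.Get_size() > 1:
            return True
    return False
```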
I do not understand why this file is showing up in this pull request.
Reviewed all changes. All files except point_source_injector.ipynb are intact.
It looks like the unit tests are failing because …
Thanks @avalluvan. I haven't checked all of this yet, but about this:
However, mpi4py is a special case, because it needs to have the MPI backend installed, which I don't think you can do with pip (I used conda). One option is …
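Not necessarily the fix intended here, but one common way to keep imports (and therefore unit tests) working when mpi4py or its MPI backend is unavailable is a guarded import, sketched below:

```python
# Fall back to serial mode when mpi4py (or an MPI backend) is not installed.
try:
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
except ImportError:
    MPI = None
    comm = None  # downstream code can check `comm is None` and stay serial
```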
…keleton from RichardsonLucy.py
… custom data types
…interface and main script to test the implementation
…ithParallelSupport
…gle node execution
Create new RLparallelscript.py with MPI porting capabilities. Update dataIFWithParallelSupport.py to cull unnecessary for loops.
Fixed bugs with summed_exposure_map (needs to be summed across processes) and dict_bkg_norm (was only being updated on the MASTER node); a sketch illustrating this follows the commit list below.
…pports parallel execution with a simple change to DataIF. Next task is to generalize DataIF
…as been removed. Bug fixed.
…llel. Three instances (all pertaining to saving results) remain in the RichardsonLucy class.
…formations
- Added polarization module
- Included util.py with functions for generating meshgrids, projections, and angle transformations
- Created __init__.py to expose key functions from the util module

- Updated Orthographic and Stereographic conventions to directly accept SkyCoord objects for source direction and ref_vector.
- Removed separate SC-to-celestial and celestial-to-SC transformation functions; consolidated into a single general transformation function.
- Introduced a base PolarizationConvention class with OrthographicConvention and StereographicConvention as child classes.
- Simplified reference vector handling, ensuring consistent frame transformations.
- Added unit tests to verify the correctness of the new implementation and transformations.

- Updated the code to use astropy.coordinates.Angle for input and output of angles, allowing for arbitrary units instead of only radians.
- Normalized the source direction vector to ensure orthographic projection equations work correctly regardless of the vector's length.
- Removed the unnecessary project() function from util.py to clean up the code.
- Updated unit tests to accommodate changes in angle handling.

- Corrected lines 55-57 to divide by norm_source**2 for accurate normalization.
- Modified to ensure the polarization angle is returned within the range [0, pi] instead of [-pi, pi].
…on and master_node flags
…mage deconvolution config file.
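As a minimal illustration of the fix described in the summed_exposure_map / dict_bkg_norm commit above (array sizes and dictionary contents are placeholders, not the PR's actual code):

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD

# Each rank holds a partial exposure map; the full map is the element-wise sum over all ranks.
local_exposure_map = np.zeros(3072)                  # placeholder size
summed_exposure_map = np.empty_like(local_exposure_map)
comm.Allreduce(local_exposure_map, summed_exposure_map, op=MPI.SUM)

# The background normalizations are updated only on the master rank,
# so they must be broadcast to all other ranks afterwards.
dict_bkg_norm = {"albedo": 1.0} if comm.Get_rank() == 0 else None
dict_bkg_norm = comm.bcast(dict_bkg_norm, root=0)
```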
Force-pushed from 17aae47 to 9532fbf
Force-pushed from 8618bec to e0ea997
Based on feedback that I received on version 1 of RL parallelization, I have incorporated a new setup.
- RichardsonLucy.py
  - comm: handles all MPI communication if an MPI descriptor is passed as an argument during initialization
- DataInterfaceWithParallelSupport.py
  - comm object
  - dataset returned by this new module works exactly the same way as the DataInterfaceDC2 module. Pass it to image_deconvolution through ImageDeconvolution.set_dataset([dataset])
  - histpy.Histogram … if they exist. Multiple instances of object reconstruction were required.
- RLparallelscript.py
  - Run with mpiexec -n <number of processes> python RLparallelscript.py
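A hedged sketch of the intended workflow, based only on the module and method names listed above (the constructor signatures are assumptions, so the cosipy-specific calls are left as comments):

```python
# Launch with: mpiexec -n <number of processes> python RLparallelscript.py
from mpi4py import MPI

comm = MPI.COMM_WORLD

# Hypothetical calls mirroring the description above; exact signatures may differ:
#   dataset = DataInterfaceWithParallelSupport(..., comm=comm)   # parallel-aware data interface
#   image_deconvolution.set_dataset([dataset])                   # same call as with DataInterfaceDC2
#   results = image_deconvolution.run_deconvolution()            # RichardsonLucy uses comm internally

print(f"rank {comm.Get_rank()} of {comm.Get_size()} ready")
if comm.Get_rank() == 0:
    print("master rank: results would be gathered and saved here")
```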