Richardson Lucy Parallelization V2 #274
base: develop
Conversation
Currently, the DC2 (existing) and Parallel (new) data interfaces can be used interchangeably for serial code, and they produce the same output. However, the latter must be used for parallel code.
Thanks @avalluvan! I think it's a great improvement with respect to V1. I still need to look at the code in detail, but I read your description and checked the files changed. A few first impressions:
It’s good that you open the PR so we can start the review, but I’d wait for these two limitations to be resolved before merging.
On point 1, I have updated the code to migrate most of the parallelization features, and I have resolved the merge conflicts in the response handling code. I had added a few comments to parts of the imaging code that took me a while to figure out, to make it easier to read. Do you want me to remove those? The tutorial notebooks were probably modified quite a bit, and I would not want to commit those changes to the develop branch. Do you think we should wait until the dataIF code is modified for DC3 and handles FullDetectorResponse objects properly? The current pull request adds a feature for parallel execution on top of the existing DC2 imaging code, and I think it could be merged as an iterative update.
Thanks, @avalluvan. I added some comments on your RL codes directly.
By the way, I noticed that you changed some classes which are probably not related to the RL parallelization itself, for example FullDetectorResponse, SpacecraftFile, and PointSourceResponse. I am concerned that reviewing these different issues simultaneously may easily lead to mistakes. So, is it possible to separate them from this PR? Then we can review this PR more easily.
        # expected count histograms
        self.expectation_list = self.calc_expectation_list(model = self.initial_model, dict_bkg_norm = self.dict_bkg_norm)
        logger.info("The expected count histograms were calculated with the initial model map.")
Is it possible to keep these lines? To use the updated model for the likelihood calculation, I wanted to perform the expected count calculation at the initialization and post-processing steps and skip it in the Estep.
I can undo these changes. Do you plan on moving this to Estep() in the future / removing Estep() altogether?
@@ -66,16 +70,26 @@ def __init__(self, initial_model, dataset, mask, parameter):
        else:
            os.makedirs(self.save_results_directory)
I understand that RL needs to know whether it is being executed on the master node and needs this kind of parameter. I would suggest preparing two parameters instead, something like:
- self.parallel_computation = True / False
- self.master_node = True / False
I want to prepare a parameter that explicitly tells whether the computation is parallel or not. I will add some suggestions regarding these changes on other lines.
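For illustration, a minimal sketch of how these two flags could be derived from an mpi4py communicator (the flag names follow the suggestion above; everything else is an assumption, not code from this PR):

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD  # would be None (or never created) in a purely serial run

# Explicit flags as suggested above; a missing or size-1 communicator means serial execution.
parallel_computation = (comm is not None) and (comm.Get_size() > 1)
master_node = (comm is None) or (comm.Get_rank() == 0)
```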
One of the ideas we discussed in a previous meeting was to let the program directly infer whether it was being run in serial or parallel mode. In fact, the suggested flag variables were what I used in the initial V2 pull request code. Do you recommend making this modification, i.e., inferring self.parallel_computation in image_deconvolution.py or in RichardsonLucy.py? The issue with inferring this in the image deconvolution class is: what happens when we have multiple input datasets ([dataset1, dataset2, ...])? Each dataset will have its own "sub_comm" object.
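For comparison, a sketch of what the inference could look like if it lived in the image deconvolution class, under the assumption that each dataset may carry an optional sub_comm attribute (the helper name is hypothetical):

```python
def infer_parallel_computation(datasets):
    """Return True if any dataset carries an MPI sub-communicator with more than one rank.

    `sub_comm` is the per-dataset communicator mentioned above; None means a serial dataset.
    """
    for dataset in datasets:
        sub_comm = getattr(dataset, "sub_comm", None)
        if sub_comm is not None and sub_comm.Get_size() > 1:
            return True
    return False
```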
I do not understand why this file is showing up in this pull request.
Reviewed all changes. All files except point_source_injector.ipynb are intact.
It looks like the unit tests are failing because …
Thanks @avalluvan. I haven't checked all of this yet, but about this:
However, mpi4py is a special case, because it needs to have the MPI backend installed, which I don't think you can do with pip (I used conda). One option is …
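Not necessarily the fix intended here, but one common way to keep imports (and therefore unit tests) working when mpi4py or its MPI backend is unavailable is a guarded import, sketched below:

```python
# Fall back to serial mode when mpi4py (or an MPI backend) is not installed.
try:
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
except ImportError:
    MPI = None
    comm = None  # downstream code can check `comm is None` and stay serial
```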
…keleton from RichardsonLucy.py
… custom data types
…interface and main script to test the implementation
…ithParallelSupport
…gle node execution
Create new RLparallelscript.py with MPI porting capabilities. Update dataIFWithParallelSupport.py to cull unnecessary for loops.
Fixed bugs with summed_exposure_map (needs to be summed across processes) and dict_bkg_norm (was only being updated on the MASTER node); a sketch illustrating this follows the commit list below.
…pports parallel execution with a simple change to DataIF. Next task is to generalize DataIF
…as been removed. Bug fixed.
…llel. Three instances (all pertaining to saving results) remain in the RichardsonLucy class.
…formations
- Added polarization module
- Included util.py with functions for generating meshgrids, projections, and angle transformations
- Created __init__.py to expose key functions from the util module

- Updated Orthographic and Stereographic conventions to directly accept SkyCoord objects for source direction and ref_vector.
- Removed separate SC-to-celestial and celestial-to-SC transformation functions; consolidated into a single general transformation function.
- Introduced a base PolarizationConvention class with OrthographicConvention and StereographicConvention as child classes.
- Simplified reference vector handling, ensuring consistent frame transformations.
- Added unit tests to verify the correctness of the new implementation and transformations.

- Updated the code to use astropy.coordinates.Angle for input and output of angles, allowing for arbitrary units instead of only radians.
- Normalized the source direction vector to ensure orthographic projection equations work correctly regardless of the vector's length.
- Removed the unnecessary project() function from util.py to clean up the code.
- Updated unit tests to accommodate changes in angle handling.

- Corrected lines 55-57 to divide by norm_source**2 for accurate normalization.
- Modified to ensure the polarization angle is returned within the range [0, pi] instead of [-pi, pi].
…on and master_node flags
…mage deconvolution config file.
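As a minimal illustration of the fix described in the summed_exposure_map / dict_bkg_norm commit above (array sizes and dictionary contents are placeholders, not the PR's actual code):

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD

# Each rank holds a partial exposure map; the full map is the element-wise sum over all ranks.
local_exposure_map = np.zeros(3072)                  # placeholder size
summed_exposure_map = np.empty_like(local_exposure_map)
comm.Allreduce(local_exposure_map, summed_exposure_map, op=MPI.SUM)

# The background normalizations are updated only on the master rank,
# so they must be broadcast to all other ranks afterwards.
dict_bkg_norm = {"albedo": 1.0} if comm.Get_rank() == 0 else None
dict_bkg_norm = comm.bcast(dict_bkg_norm, root=0)
```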
Force-pushed from 17aae47 to 9532fbf
Force-pushed from 8618bec to e0ea997
Based on feedback that I received on version 1 of RL parallelization, I have incorporated a new setup.
- RichardsonLucy.py
  - comm: handles all MPI communication if an MPI descriptor is passed as an argument during initialization
- DataInterfaceWithParallelSupport.py
  - comm object
  - dataset returned by this new module works exactly the same way as the DataInterfaceDC2 module. Pass it to image_deconvolution through ImageDeconvolution.set_dataset([dataset])
  - histpy.Histogram … if they exist. Multiple instances of object reconstruction were required.
- RLparallelscript.py
  - Run with mpiexec -n <number of processes> python RLparallelscript.py
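A hedged sketch of the intended workflow, based only on the module and method names listed above (the constructor signatures are assumptions, so the cosipy-specific calls are left as comments):

```python
# Launch with: mpiexec -n <number of processes> python RLparallelscript.py
from mpi4py import MPI

comm = MPI.COMM_WORLD

# Hypothetical calls mirroring the description above; exact signatures may differ:
#   dataset = DataInterfaceWithParallelSupport(..., comm=comm)   # parallel-aware data interface
#   image_deconvolution.set_dataset([dataset])                   # same call as with DataInterfaceDC2
#   results = image_deconvolution.run_deconvolution()            # RichardsonLucy uses comm internally

print(f"rank {comm.Get_rank()} of {comm.Get_size()} ready")
if comm.Get_rank() == 0:
    print("master rank: results would be gathered and saved here")
```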