
zernike3d: performance, debug, and further documentation #104

Open
geoffwoollard opened this issue Dec 19, 2024 · 1 comment
Labels: enhancement (New feature or request)

@geoffwoollard (Collaborator)

@DavidHerreros I did some tests on a set of 80 smoothed gt maps against themselves. Results looked good, so I merged the zernike3d map-to-map distance code into dev.

However, I did notice one issue in the results: if you look at the first and last rows, the trend is different. Should I try increasing numProjections, or perhaps some other parameter? What might be going on here? Might it be an issue with the symmetry of the motion? The maps aren't out of order, since this trend does not exist in the L2 distance. Why does it show up row-wise but not column-wise?
#100 (comment)
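One quick way to narrow this down is to check how asymmetric the distance matrix actually is: a map-to-map distance of a set against itself should be (near-)symmetric, so a trend that appears row-wise but not column-wise should show up as structure in |D - D.T|. A minimal diagnostic sketch, assuming the 80 x 80 zernike3d distances are saved as a .npy array (the file name below is hypothetical):

```python
import numpy as np

# Hypothetical file name; point this at wherever the distance matrix is saved.
D = np.load("zernike3d_distances.npy")  # expected shape (80, 80)

# For maps compared against themselves, D should be (near-)symmetric.
asym = np.abs(D - D.T)
print(f"max  |D - D.T| = {asym.max():.4g}")
print(f"mean |D - D.T| = {asym.mean():.4g}")

# Rank rows by how much they disagree with their corresponding columns;
# the first/last rows showing a different trend should float to the top.
row_score = asym.sum(axis=1)
print("most asymmetric rows:", np.argsort(row_score)[::-1][:5])
```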

It would also help to have some documentation on what gets written to disk inside the tmp directory (and what can be re-used across runs for the same submission or gt maps, such as the hdf5 model or the reference/target .npy data).

Finally, if I have a large machine with X nodes and 128 cores per node, what value of thr in the config do you suggest, and how much memory should I request?

I requested these resources:

```bash
#!/bin/bash
#SBATCH --job-name=zernike
#SBATCH --output=slurm/logs/%j.out
#SBATCH --error=slurm/logs/%j.err
#SBATCH --partition=ccb
#SBATCH -C rome
#SBATCH --time=99:00:00
```

For the 80 x 80 map-to-map distance computation, these are the post-mortem stats after the job completed:

```
> seff 4235706
Job ID: 4235706
Cluster: rusty
User/Group: gwoollard/gwoollard
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 128
CPU Utilized: 2-02:57:21
CPU Efficiency: 20.90% of 10-03:50:24 core-walltime
Job Wall-clock time: 01:54:18
Memory Utilized: 21.01 GB
Memory Efficiency: 2.10% of 1000.00 GB
```

In general, the full computation will be over 3861 x 80 maps.
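For planning the full run, here is a back-of-the-envelope scaling estimate from the seff numbers above (this assumes cost scales linearly with the number of map pairs, which is an assumption, not a measurement):

```python
# Linear-scaling estimate from the 80 x 80 job's seff output above.
pairs_done = 80 * 80                      # completed job
pairs_full = 3861 * 80                    # planned computation
wall_hours = 1 + 54 / 60 + 18 / 3600      # 01:54:18 wall-clock

scale = pairs_full / pairs_done           # ~48.3x more pairs
print(f"scale factor: {scale:.1f}x")
print(f"estimated wall-clock: {wall_hours * scale:.0f} h on one 128-core node")
# ~92 h. Memory (21 GB of 1000 GB used) is not the bottleneck here;
# the 20.9% CPU efficiency suggests the thread count is worth tuning first.
```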

@geoffwoollard added the enhancement (New feature or request) label on Dec 19, 2024
@DavidHerreros (Collaborator)

Hi @geoffwoollard,

> @DavidHerreros I did some tests on a set of 80 smoothed gt maps against themselves. Results looked good, so I merged the zernike3d map-to-map distance code into dev.
>
> However, I did notice one issue in the results: if you look at the first and last rows, the trend is different. Should I try increasing numProjections, or perhaps some other parameter? What might be going on here? Might it be an issue with the symmetry of the motion? The maps aren't out of order, since this trend does not exist in the L2 distance. Why does it show up row-wise but not column-wise? #100 (comment)

I might have found the bug behind the row issue in the distance matrix, but it needs some additional testing. Do you think it would be possible to run the fix on your data to check that everything works as expected? You will need to switch to the devel branch in Flexutils-Toolkit to get the fix.

> It would also help to have some documentation on what gets written to disk inside the tmp directory (and what can be re-used across runs for the same submission or gt maps, such as the hdf5 model or the reference/target .npy data).

This sounds good! Do you mind if I open an issue to keep track of this enhancement? I will work on it and make a PR as soon as the documentation is ready.

> Finally, if I have a large machine with X nodes and 128 cores per node, what value of thr in the config do you suggest, and how much memory should I request?

Right now, the threads should not be very memory-hungry, as they are only used to generate the projections needed by Zernike3Deep in parallel. In the future, I was thinking of parallelizing the Zernike3Deep computations so we can have several "threads" on the GPU, which might lead to a significant increase in performance at the cost of larger memory consumption. When that is implemented, it will probably make sense to have two different thr parameters to control the two parallelizations and better isolate possible memory issues.

I would say that the maximum number of cores the projection step will use depends on the number of projections to generate. Since the projections are generated in parallel, when the numProjections parameter is set to 80, the maximum number of threads the program can use is also 80. In your case, you should get similar performance whether thr is set to 80 or 128. To save resources in the future, we could modify the call so that the number of threads used is min(thr, numProjections); a sketch of that capping logic follows.
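A minimal sketch of that capping, assuming the projections are dispatched with a standard Python multiprocessing pool (the function and argument names here are hypothetical, not the actual Flexutils-Toolkit API):

```python
from multiprocessing import Pool

def generate_projection(idx):
    # Hypothetical stand-in for the per-projection work.
    return idx

def generate_projections(num_projections, thr):
    # Cap the pool size: workers beyond numProjections would sit idle,
    # so min(thr, numProjections) saves cores without losing throughput.
    n_workers = min(thr, num_projections)
    with Pool(processes=n_workers) as pool:
        return pool.map(generate_projection, range(num_projections))

if __name__ == "__main__":
    generate_projections(num_projections=80, thr=128)  # uses 80 workers
```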

> I requested these resources:
>
> ```bash
> #!/bin/bash
> #SBATCH --job-name=zernike
> #SBATCH --output=slurm/logs/%j.out
> #SBATCH --error=slurm/logs/%j.err
> #SBATCH --partition=ccb
> #SBATCH -C rome
> #SBATCH --time=99:00:00
> ```
>
> For the 80 x 80 map-to-map distance computation, these are the post-mortem stats after the job completed:
>
> ```
> > seff 4235706
> Job ID: 4235706
> Cluster: rusty
> User/Group: gwoollard/gwoollard
> State: COMPLETED (exit code 0)
> Nodes: 1
> Cores per node: 128
> CPU Utilized: 2-02:57:21
> CPU Efficiency: 20.90% of 10-03:50:24 core-walltime
> Job Wall-clock time: 01:54:18
> Memory Utilized: 21.01 GB
> Memory Efficiency: 2.10% of 1000.00 GB
> ```
>
> In general, the full computation will be over 3861 x 80 maps.
