
zernike3d: performance, debug, and further documentation #104

Open
geoffwoollard opened this issue Dec 19, 2024 · 1 comment
Labels: enhancement (New feature or request)

@geoffwoollard (Collaborator)

@DavidHerreros I did some tests on a set of 80 smoothed gt maps against themselves. Results looked good, so I merged the zernike3d map-to-map distance code into dev.

However, I did notice one issue in the results: if you look at the first and last rows, the trend is different. Should I try increasing numProjections, or perhaps some other parameter? What might be going on here? Might it be an issue with the symmetry of the motion? The maps aren't out of order, since this trend does not exist in the L2 distance. Why does it show up row-wise but not column-wise?
#100 (comment)
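One quick way to narrow this down is to check how asymmetric the distance matrix actually is: a map-to-map distance of a set against itself should be (near-)symmetric, so a trend that appears row-wise but not column-wise should show up as structure in |D - D.T|. A minimal diagnostic sketch, assuming the 80 x 80 zernike3d distances are saved as a .npy array (the file name below is hypothetical):

```python
import numpy as np

# Hypothetical file name; point this at wherever the distance matrix is saved.
D = np.load("zernike3d_distances.npy")  # expected shape (80, 80)

# For maps compared against themselves, D should be (near-)symmetric.
asym = np.abs(D - D.T)
print(f"max  |D - D.T| = {asym.max():.4g}")
print(f"mean |D - D.T| = {asym.mean():.4g}")

# Rank rows by how much they disagree with their corresponding columns;
# the first/last rows showing a different trend should float to the top.
row_score = asym.sum(axis=1)
print("most asymmetric rows:", np.argsort(row_score)[::-1][:5])
```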

It would also help to have some documentation on what gets written to disk inside the tmp directory (and what can be re-used across runs for the same submission or gt maps, such as the hdf5 model or the reference/target .npy data).

Finally, if I have a large machine with X nodes and 128 cores per node, what value of thr in the config do you suggest, and how much memory should I request?

I requested these resources:

```bash
#!/bin/bash
#SBATCH --job-name=zernike
#SBATCH --output=slurm/logs/%j.out
#SBATCH --error=slurm/logs/%j.err
#SBATCH --partition=ccb
#SBATCH -C rome
#SBATCH --time=99:00:00
```

For the 80 x 80 map-to-map distance computation, these are the post-mortem stats after the job completed:

```
> seff 4235706
Job ID: 4235706
Cluster: rusty
User/Group: gwoollard/gwoollard
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 128
CPU Utilized: 2-02:57:21
CPU Efficiency: 20.90% of 10-03:50:24 core-walltime
Job Wall-clock time: 01:54:18
Memory Utilized: 21.01 GB
Memory Efficiency: 2.10% of 1000.00 GB
```

In general, the full computation will be over 3861 x 80 maps.
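For planning the full run, here is a back-of-the-envelope scaling estimate from the seff numbers above (this assumes cost scales linearly with the number of map pairs, which is an assumption, not a measurement):

```python
# Linear-scaling estimate from the 80 x 80 job's seff output above.
pairs_done = 80 * 80                      # completed job
pairs_full = 3861 * 80                    # planned computation
wall_hours = 1 + 54 / 60 + 18 / 3600      # 01:54:18 wall-clock

scale = pairs_full / pairs_done           # ~48.3x more pairs
print(f"scale factor: {scale:.1f}x")
print(f"estimated wall-clock: {wall_hours * scale:.0f} h on one 128-core node")
# ~92 h. Memory (21 GB of 1000 GB used) is not the bottleneck here;
# the 20.9% CPU efficiency suggests the thread count is worth tuning first.
```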

@geoffwoollard added the enhancement (New feature or request) label on Dec 19, 2024
@DavidHerreros (Collaborator)

Hi @geoffwoollard,

> @DavidHerreros I did some tests on a set of 80 smoothed gt maps against themselves. Results looked good, so I merged the zernike3d map-to-map distance code into dev.
>
> However, I did notice one issue in the results: if you look at the first and last rows, the trend is different. Should I try increasing numProjections, or perhaps some other parameter? What might be going on here? Might it be an issue with the symmetry of the motion? The maps aren't out of order, since this trend does not exist in the L2 distance. Why does it show up row-wise but not column-wise? #100 (comment)

I might have found the bug behind the row issue in the distance matrix, but it needs some additional testing. Do you think it would be possible to run the fix on your data to check that everything works as expected? You will need to switch to the devel branch in Flexutils-Toolkit to get the fix.

> It would also help to have some documentation on what gets written to disk inside the tmp directory (and what can be re-used across runs for the same submission or gt maps, such as the hdf5 model or the reference/target .npy data).

This sounds good! Do you mind if I open an issue to keep track of this enhancement? I will work on it and make a PR as soon as the documentation is ready.

> Finally, if I have a large machine with X nodes and 128 cores per node, what value of thr in the config do you suggest, and how much memory should I request?

Right now, the threads should not be very memory-hungry, as they are only used to generate the projections needed by Zernike3Deep in parallel. In the future, I was thinking of parallelizing the Zernike3Deep computations so we can have several "threads" on the GPU, which might lead to a significant increase in performance at the cost of larger memory consumption. When that is implemented, it will probably make sense to have two different thr parameters to control the two parallelizations and better isolate possible memory issues.

I would say that the maximum number of cores the projection step will use depends on the number of projections to generate. Since the projections are generated in parallel, when the numProjections parameter is set to 80, the maximum number of threads the program can use is also 80. In your case, you should get similar performance whether thr is set to 80 or 128. To save resources in the future, we could modify the call so that the number of threads used is min(thr, numProjections); a sketch of that capping logic follows.
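A minimal sketch of that capping, assuming the projections are dispatched with a standard Python multiprocessing pool (the function and argument names here are hypothetical, not the actual Flexutils-Toolkit API):

```python
from multiprocessing import Pool

def generate_projection(idx):
    # Hypothetical stand-in for the per-projection work.
    return idx

def generate_projections(num_projections, thr):
    # Cap the pool size: workers beyond numProjections would sit idle,
    # so min(thr, numProjections) saves cores without losing throughput.
    n_workers = min(thr, num_projections)
    with Pool(processes=n_workers) as pool:
        return pool.map(generate_projection, range(num_projections))

if __name__ == "__main__":
    generate_projections(num_projections=80, thr=128)  # uses 80 workers
```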

> I requested these resources:
>
> ```bash
> #!/bin/bash
> #SBATCH --job-name=zernike
> #SBATCH --output=slurm/logs/%j.out
> #SBATCH --error=slurm/logs/%j.err
> #SBATCH --partition=ccb
> #SBATCH -C rome
> #SBATCH --time=99:00:00
> ```
>
> For the 80 x 80 map-to-map distance computation, these are the post-mortem stats after the job completed:
>
> ```
> > seff 4235706
> Job ID: 4235706
> Cluster: rusty
> User/Group: gwoollard/gwoollard
> State: COMPLETED (exit code 0)
> Nodes: 1
> Cores per node: 128
> CPU Utilized: 2-02:57:21
> CPU Efficiency: 20.90% of 10-03:50:24 core-walltime
> Job Wall-clock time: 01:54:18
> Memory Utilized: 21.01 GB
> Memory Efficiency: 2.10% of 1000.00 GB
> ```
>
> In general, the full computation will be over 3861 x 80 maps.
