Description
CPU resource utilization metrics for MPI jobs on SLURM systems appear to be inaccurate. I suspect this is because psutil
is not identifying the correct child processes of the main job:
executor _processes dict_values([<Popen: returncode: None args: ['mpiexec', '-n', '20', 'fds', 'office_atria_...>])
child psutil.Process(pid=2546114, name='srun', status='sleeping', started='14:29:20')
child psutil.Process(pid=2546115, name='srun', status='sleeping', started='14:29:20')
I would expect to see one FDS child process for each of the 20 cores the job runs across; instead, only srun processes appear.
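A possible explanation, which is an assumption and not confirmed in this report: under SLURM, mpiexec typically launches its ranks through srun, and the rank processes are then started by the SLURM daemons (slurmd/slurmstepd) on the allocated nodes. If so, they would not be descendants of srun in the process tree that psutil walks, so no amount of recursion from the launcher would find them. The minimal sketch below lists what psutil can actually see under a launcher PID and counts how many of the expected ranks are missing. It assumes psutil is installed; `visible_descendants`, `count_by_name` and `missing_ranks` are hypothetical helper names, not part of any existing tool.

```python
"""Sketch: check which processes psutil can see below a launcher PID.

Assumes the third-party psutil package is installed. All helper names
here are hypothetical, for illustration only.
"""
from collections import Counter
from typing import Iterable, List, Tuple


def visible_descendants(pid: int) -> List[Tuple[int, str]]:
    """Return (pid, name) for every descendant of `pid` that psutil can see."""
    import psutil  # imported lazily so the pure helpers below work without psutil

    result = []
    for child in psutil.Process(pid).children(recursive=True):
        try:
            result.append((child.pid, child.name()))
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            # The process exited, or cannot be inspected, between listing and querying.
            continue
    return result


def count_by_name(procs: Iterable[Tuple[int, str]]) -> Counter:
    """Count processes by name, counting each PID only once."""
    seen = set()
    counts: Counter = Counter()
    for pid, name in procs:
        if pid not in seen:
            seen.add(pid)
            counts[name] += 1
    return counts


def missing_ranks(procs: Iterable[Tuple[int, str]], rank_name: str, n_ranks: int) -> int:
    """Number of the expected `n_ranks` processes named `rank_name` that are not visible."""
    return max(0, n_ranks - count_by_name(procs)[rank_name])
```

For the job above, calling `missing_ranks(visible_descendants(pid), "fds", 20)` with the PID of the mpiexec process would, if the explanation holds, report all 20 ranks as missing, since only srun processes are visible. Counting by unique PID also makes repeated entries for the same process (such as PID 2546115 above) harmless.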
...