I think I have found an interesting bug. I am doing some ML training on CSD3, and because of the size of the models and datasets the GPU runs out of memory and the run crashes. However, the run instance does not stop: it is still shown as running on Simvue, and the run continues to be maintained on the GPU within the cluster.
I think this is a Python process using GPUs (i.e. the client isn't monitoring an external application).
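
To make the expected behaviour concrete, here is a minimal sketch, assuming the crash surfaces as a Python exception inside the training process. `StandInRun`, `train_step`, and `tracked_training` are hypothetical names used only for illustration; this is not the Simvue client API, and the out-of-memory failure is simulated with a plain `RuntimeError` so no GPU is needed.

```python
class StandInRun:
    """Hypothetical stand-in for a tracked run; NOT the Simvue API."""

    def __init__(self):
        self.status = "created"

    def start(self):
        self.status = "running"

    def close(self, status="completed"):
        self.status = status


def train_step():
    # Simulated GPU out-of-memory failure (assumption: the real crash
    # reaches Python as an exception).
    raise RuntimeError("CUDA out of memory (simulated)")


def tracked_training(run, step):
    """Run one step and make sure the run never stays 'running' on a crash."""
    run.start()
    try:
        step()
    except RuntimeError:
        # Record the failure before re-raising so the run is not left open.
        run.close(status="failed")
        raise
    else:
        run.close(status="completed")


run = StandInRun()
try:
    tracked_training(run, train_step)
except RuntimeError as exc:
    print(f"training crashed: {exc}")

print(run.status)
```

In this sketch the run ends up as `failed` rather than `running` after the crash, which is the outcome I would expect to see on the server side as well.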