Client gets stuck if GPU runs out of memory #507

@alahiff

Description

Message from Vignesh:

I think I have chanced upon a very interesting bug. I am doing some ML training on CSD3, and because of the size of the models and datasets I end up without sufficient memory on the GPU and the run crashes. However, the run instance does not stop: it is still shown as running on Simvue, and the run is kept alive on the GPU within the cluster.

I think this is a Python process using GPUs (i.e. the client isn't monitoring an external application).
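
To make the symptom easier to discuss, here is a minimal sketch of one possible mechanism; the cause has not been confirmed in this issue. It assumes the client keeps a background heartbeat thread alive while the run is active. `HeartbeatRun`, `train_step` and `run_training` are hypothetical names invented for illustration and are not the Simvue API, and `MemoryError` stands in for whatever exception the ML framework raises on GPU out-of-memory. If such a thread is not stopped when the training code raises, the Python process cannot exit and the run would keep reporting as running. Stopping it in `__exit__` avoids that:

```python
import threading
import time


class HeartbeatRun:
    """Hypothetical stand-in for a tracking client (NOT the Simvue API)."""

    def __init__(self, interval=0.05):
        self.status = "created"
        self.beats = 0
        self._interval = interval
        self._stop = threading.Event()
        # Non-daemon thread: the interpreter waits for it before exiting.
        self._thread = threading.Thread(target=self._heartbeat)

    def _heartbeat(self):
        # Tick until the stop event is set.
        while not self._stop.wait(self._interval):
            self.beats += 1

    def __enter__(self):
        self.status = "running"
        self._thread.start()
        return self

    def __exit__(self, exc_type, exc, tb):
        # Always stop the heartbeat, even when the body raised.
        self._stop.set()
        self._thread.join()
        self.status = "failed" if exc_type is not None else "completed"
        return False  # do not swallow the exception


def train_step():
    # Assumption: simulate the GPU out-of-memory crash with MemoryError.
    raise MemoryError("simulated GPU out-of-memory")


def run_training():
    run = HeartbeatRun()
    try:
        with run:
            time.sleep(0.12)  # let a few heartbeats happen first
            train_step()
    except MemoryError:
        pass  # a real script would log the error and exit non-zero
    return run
```

In this sketch the run ends with status `failed` and the heartbeat thread is no longer alive after the simulated crash. Without the `self._stop.set()` call in `__exit__`, the thread would loop forever and the process would hang, which matches the behaviour reported above.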

Labels

bug: Something isn't working
