backgroundWorker keeps dying at epoch 0 #56

Open
AndrewForresterGit opened this issue Aug 6, 2024 · 3 comments

@AndrewForresterGit

I keep getting the same error at epoch 0. I've tried debugging, and the only potential cause I've found is that dist.is_initialized() returns False. Could this be the cause? If so, how would I fix it, and if not, what else could be causing the error?

2024-08-05 11:23:25.063796: unpacking dataset...
2024-08-05 11:23:35.208095: unpacking done...
2024-08-05 11:23:35.208965: do_dummy_2d_data_aug: False
2024-08-05 11:23:35.220142: Unable to plot network architecture:
2024-08-05 11:23:35.220288: No module named 'hiddenlayer'
2024-08-05 11:23:35.227630:
2024-08-05 11:23:35.227772: Epoch 0
2024-08-05 11:23:35.227940: Current learning rate: 0.01
using pin_memory on device 0
Exception in thread Thread-4 (results_loop):
Traceback (most recent call last):
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v3/Compiler/gcccore/python/3.10.13/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v3/Compiler/gcccore/python/3.10.13/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/home/anfor306/venvs/projet-Umamba/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 125, in results_loop
    raise e
  File "/home/anfor306/venvs/projet-Umamba/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 103, in results_loop
    raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the "
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message
Traceback (most recent call last):
  File "/home/anfor306/venvs/projet-Umamba/bin/nnUNetv2_train", line 33, in <module>
    sys.exit(load_entry_point('nnunetv2', 'console_scripts', 'nnUNetv2_train')())
  File "/lustre06/project/6092638/anfor306/U-Mamba/umamba/nnunetv2/run/run_training.py", line 268, in run_training_entry
    run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights,
  File "/lustre06/project/6092638/anfor306/U-Mamba/umamba/nnunetv2/run/run_training.py", line 204, in run_training
    nnunet_trainer.run_training()
  File "/lustre06/project/6092638/anfor306/U-Mamba/umamba/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 1258, in run_training
    train_outputs.append(self.train_step(next(self.dataloader_train)))
  File "/home/anfor306/venvs/projet-Umamba/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 196, in __next__
    item = self.__get_next_item()
  File "/home/anfor306/venvs/projet-Umamba/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 181, in __get_next_item
    raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the "
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message
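
Note: dist.is_initialized() returning False is expected here. As far as I can tell, nnU-Net v2 only calls dist.init_process_group() when training with more than one GPU, so a False value alone should not kill the background workers. A minimal sketch (assuming only that PyTorch is installed) to confirm what the job actually sees:

import torch
import torch.distributed as dist

# False for plain single-GPU (non-DDP) runs; this by itself does not
# crash the batchgenerators workers.
print("CUDA available:   ", torch.cuda.is_available())
print("dist available:   ", dist.is_available())
print("dist initialized: ", dist.is_initialized())

If I remember correctly, setting the environment variable nnUNet_n_proc_DA=0 disables the multiprocessing data augmentation, so the real exception is raised in the main process instead of being hidden inside a dead worker.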
@AndrewForresterGit changed the title from "backgroundWorker keeps dying at epoch 1" to "backgroundWorker keeps dying at epoch 0" on Aug 6, 2024
@AyacodeYa

Hi, I solved a similar problem, but mine was caused by the version of causal_conv1d. I fixed it with the following commands:
git clone https://github.com/Dao-AILab/causal-conv1d.git
cd causal-conv1d
git checkout v1.1.1.post2
CAUSAL_CONV1D_FORCE_BUILD=TRUE pip install .
nnUNetv2_train your_dataset_ID 2d all -tr nnUNetTrainerUMambaEnc -num_gpus 1
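
If you go this route, it is worth confirming that the pinned build is the one Python actually imports (the expected version string below is an assumption based on the tag checked out above):

# Verify that the causal_conv1d on sys.path is the pinned build.
import causal_conv1d
print(causal_conv1d.__version__)  # should report 1.1.1.post2 after the steps above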

@Younger330

Hello, when running git checkout v1.1.1.post2 I get the error

error: pathspec 'v1.1.1.post2' did not match any file(s) known to git

Do you have any suggestions?
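
One possible cause, not confirmed in this thread: the tag may not be present in your local clone (for example if it was cloned with --depth 1), or it may simply not exist under that name. Running git fetch --tags inside the causal-conv1d checkout and then git tag -l will show which tags are actually available.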

@AndrewForresterGit
Author

My problem was solved by switching from A100 to V100 GPUs.
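
A plausible explanation, though it is not confirmed in this thread: prebuilt causal-conv1d / mamba-ssm binaries are compiled for specific GPU architectures, so a kernel built against one compute capability can fail on another. A quick check of what the job actually gets:

import torch

# A100 reports compute capability (8, 0); V100 reports (7, 0).
# A mismatch between the architecture the extension was built for and
# the GPU the job lands on is one plausible reason the same code ran
# on V100 but not A100 (an assumption, not confirmed here).
print(torch.cuda.get_device_name(0))
print(torch.cuda.get_device_capability(0))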
