I keep getting the same error during the first epoch (Epoch 0 in the log below). While debugging, the only thing I found that might be related is that dist.is_initialized() returns False. Could this be the cause? If so, how would I fix it? If not, what else could be the cause?
2024-08-05 11:23:25.063796: unpacking dataset...
2024-08-05 11:23:35.208095: unpacking done...
2024-08-05 11:23:35.208965: do_dummy_2d_data_aug: False
2024-08-05 11:23:35.220142: Unable to plot network architecture:
2024-08-05 11:23:35.220288: No module named 'hiddenlayer'
2024-08-05 11:23:35.227630:
2024-08-05 11:23:35.227772: Epoch 0
2024-08-05 11:23:35.227940: Current learning rate: 0.01
using pin_memory on device 0
Exception in thread Thread-4 (results_loop):
Traceback (most recent call last):
File "/cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v3/Compiler/gcccore/python/3.10.13/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v3/Compiler/gcccore/python/3.10.13/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/home/anfor306/venvs/projet-Umamba/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 125, in results_loop
raise e
File "/home/anfor306/venvs/projet-Umamba/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 103, in results_loop
raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the "
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message
Traceback (most recent call last):
File "/home/anfor306/venvs/projet-Umamba/bin/nnUNetv2_train", line 33, in <module>
sys.exit(load_entry_point('nnunetv2', 'console_scripts', 'nnUNetv2_train')())
File "/lustre06/project/6092638/anfor306/U-Mamba/umamba/nnunetv2/run/run_training.py", line 268, in run_training_entry
run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights,
File "/lustre06/project/6092638/anfor306/U-Mamba/umamba/nnunetv2/run/run_training.py", line 204, in run_training
nnunet_trainer.run_training()
File "/lustre06/project/6092638/anfor306/U-Mamba/umamba/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 1258, in run_training
train_outputs.append(self.train_step(next(self.dataloader_train)))
File "/home/anfor306/venvs/projet-Umamba/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 196, in __next__
item = self.__get_next_item()
File "/home/anfor306/venvs/projet-Umamba/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 181, in __get_next_item
raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the "
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message
AndrewForresterGit changed the title from "backgroundWorker keeps dying at epoch 1" to "backgroundWorker keeps dying at epoch 0" on Aug 6, 2024
Hi, I solved a similar problem. In my case the cause was the installed version of causal_conv1d. I fixed it by building v1.1.1.post2 from source and then rerunning training with the following commands:
git clone https://github.com/Dao-AILab/causal-conv1d.git
cd causal-conv1d
git checkout v1.1.1.post2
CAUSAL_CONV1D_FORCE_BUILD=TRUE pip install .
nnUNetv2_train your_dataset_ID 2d all -tr nnUNetTrainerUMambaEnc -num_gpus 1
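To confirm which build is active in the virtual environment after reinstalling, a quick check along these lines may help. This is only a sketch: the distribution name causal-conv1d is assumed from the repository name, and installed_version is a hypothetical helper, not part of nnU-Net or U-Mamba.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "causal-conv1d" is an assumed distribution name; adjust it if pip lists it differently.
    print("causal-conv1d:", installed_version("causal-conv1d"))
```

Run it with the same Python interpreter that nnUNetv2_train uses and compare the printed version against the tag you checked out. If it prints None, the package is not installed in that environment.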