RuntimeError: One or more background workers are no longer alive. #54
I'm running into the same problem. Have you managed to solve it?
I'm running into the same problem; have you solved it? I run `CUDA_VISIBLE_DEVICES=1 nnUNetv2_train 11 3d_fullres 0`, and the log shows `Using device: cuda:0` ... `2024-07-28 01:12:55.969214: do_dummy_2d_data_aug: True`
I have solved my problem. I suspect many different issues ultimately end in "One or more background workers...", so try to trace the actual error in the traceback above that message. In my case the fix was reinstalling the required packages. You could also try Python 3.10, since I saw that the authors recommend that version.
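The recommendation above can be checked before reinstalling anything. The following is just an illustrative sketch (the version tuple and messages are not part of U-Mamba), assuming Python 3.10 is indeed the version the authors recommend:

```python
import sys

# The U-Mamba authors reportedly recommend Python 3.10; warn early if the
# running interpreter differs, before reinstalling the dependencies.
recommended = (3, 10)
current = sys.version_info[:2]

if current != recommended:
    print(f"Python {current[0]}.{current[1]} detected; "
          f"{recommended[0]}.{recommended[1]} is the recommended version")
else:
    print("Python version matches the recommendation")
```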
Hi everyone, maybe this can help you: #56 (comment)
Hi all, when I start training in a Windows environment, I get the error below. Even though I have tried the solution from MIC-DKFZ/nnUNet#1343 in the original nnUNet, setting the environment variable `OMP_NUM_THREADS=1`, the problem is still not solved. Thank you in advance for your help!
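For reference, the `OMP_NUM_THREADS=1` workaround only takes effect if the variable is set before the training process (and its OpenMP runtime) starts. A minimal sketch of applying it programmatically, assuming training is launched from Python rather than via the console script; the commented-out entry point is taken from the traceback below and shown only as an illustration:

```python
import os

# OMP_NUM_THREADS must be set before torch (and its OpenMP runtime) is
# imported, otherwise the limit is silently ignored by already-started threads.
os.environ["OMP_NUM_THREADS"] = "1"

# Illustrative launch; normally training is started via the nnUNetv2_train
# console script instead:
# from nnunetv2.run.run_training import run_training_entry
# run_training_entry()

print(os.environ["OMP_NUM_THREADS"])
```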
```
This is the configuration used by this training:
Configuration name: 2d
{'data_identifier': 'nnUNetPlans_2d', 'preprocessor_name': 'DefaultPreprocessor', 'batch_size': 14, 'patch_size': [512, 448], 'median_image_size_in_voxels': [512.0, 512.0], 'spacing': [0.7958984971046448, 0.7958984971046448], 'normalization_schemes': ['CTNormalization'], 'use_mask_for_norm': [False], 'UNet_class_name': 'PlainConvUNet', 'UNet_base_num_features': 32, 'n_conv_per_stage_encoder': [2, 2, 2, 2, 1, 1, 1], 'n_conv_per_stage_decoder': [2, 2, 2, 2, 1, 1], 'num_pool_per_axis': [6, 6], 'pool_op_kernel_sizes': [[1, 1], [2, 2], [2, 2], [2, 2], [2, 2], [2, 2], [2, 2]], 'conv_kernel_sizes': [[3, 3], [3, 3], [3, 3], [3, 3], [3, 3], [3, 3], [3, 3]], 'unet_max_num_features': 512, 'resampling_fn_data': 'resample_data_or_seg_to_shape', 'resampling_fn_seg': 'resample_data_or_seg_to_shape', 'resampling_fn_data_kwargs': {'is_seg': False, 'order': 3, 'order_z': 0, 'force_separate_z': None}, 'resampling_fn_seg_kwargs': {'is_seg': True, 'order': 1, 'order_z': 0, 'force_separate_z': None}, 'resampling_fn_probabilities': 'resample_data_or_seg_to_shape', 'resampling_fn_probabilities_kwargs': {'is_seg': False, 'order': 1, 'order_z': 0, 'force_separate_z': None}, 'batch_dice': True}
These are the global plan.json settings:
{'dataset_name': 'Dataset701_AbdomenCT', 'plans_name': 'nnUNetPlans', 'original_median_spacing_after_transp': [2.5, 0.7958984971046448, 0.7958984971046448], 'original_median_shape_after_transp': [97, 512, 512], 'image_reader_writer': 'SimpleITKIO', 'transpose_forward': [0, 1, 2], 'transpose_backward': [0, 1, 2], 'experiment_planner_used': 'ExperimentPlanner', 'label_manager': 'LabelManager', 'foreground_intensity_properties_per_channel': {'0': {'max': 3071.0, 'mean': 97.29691314697266, 'median': 118.0, 'min': -1024.0, 'percentile_00_5': -958.0, 'percentile_99_5': 270.0, 'std': 137.85003662109375}}}
2024-07-24 17:20:43.049483: unpacking dataset...
2024-07-24 17:20:43.598747: unpacking done...
2024-07-24 17:20:43.599747: do_dummy_2d_data_aug: False
2024-07-24 17:20:43.666747: Unable to plot network architecture:
2024-07-24 17:20:43.666747: No module named 'hiddenlayer'
2024-07-24 17:20:43.759725:
2024-07-24 17:20:43.760716: Epoch 0
2024-07-24 17:20:43.761715: Current learning rate: 0.01
using pin_memory on device 0
Traceback (most recent call last):
  File "\\?\C:\ProgramData\Anaconda3\envs\umamba\Scripts\nnUNetv2_train-script.py", line 33, in <module>
    sys.exit(load_entry_point('nnunetv2', 'console_scripts', 'nnUNetv2_train')())
  File "f:\u-mamba-main\umamba\nnunetv2\run\run_training.py", line 268, in run_training_entry
    run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights,
  File "f:\u-mamba-main\umamba\nnunetv2\run\run_training.py", line 204, in run_training
    nnunet_trainer.run_training()
  File "f:\u-mamba-main\umamba\nnunetv2\training\nnUNetTrainer\nnUNetTrainer.py", line 1258, in run_training
    train_outputs.append(self.train_step(next(self.dataloader_train)))
  File "f:\u-mamba-main\umamba\nnunetv2\training\nnUNetTrainer\nnUNetTrainer.py", line 900, in train_step
    output = self.network(data)
  File "C:\ProgramData\Anaconda3\envs\umamba\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\umamba\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "f:\u-mamba-main\umamba\nnunetv2\nets\UMambaBot_2d.py", line 432, in forward
    skips[-1] = self.mamba_layer(skips[-1])
  File "C:\ProgramData\Anaconda3\envs\umamba\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\umamba\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\umamba\lib\site-packages\torch\amp\autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "f:\u-mamba-main\umamba\nnunetv2\nets\UMambaBot_2d.py", line 61, in forward
    x_mamba = self.mamba(x_norm)
  File "C:\ProgramData\Anaconda3\envs\umamba\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\umamba\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\umamba\lib\site-packages\mamba_ssm\modules\mamba_simple.py", line 146, in forward
    out = mamba_inner_fn(
  File "C:\ProgramData\Anaconda3\envs\umamba\lib\site-packages\mamba_ssm\ops\selective_scan_interface.py", line 317, in mamba_inner_fn
    return MambaInnerFn.apply(xz, conv1d_weight, conv1d_bias, x_proj_weight, delta_proj_weight,
  File "C:\ProgramData\Anaconda3\envs\umamba\lib\site-packages\torch\autograd\function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "C:\ProgramData\Anaconda3\envs\umamba\lib\site-packages\torch\cuda\amp\autocast_mode.py", line 113, in decorate_fwd
    return fwd(*args, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\umamba\lib\site-packages\mamba_ssm\ops\selective_scan_interface.py", line 187, in forward
    conv1d_out = causal_conv1d_cuda.causal_conv1d_fwd(
TypeError: causal_conv1d_fwd(): incompatible function arguments. The following argument types are supported:
    1. (arg0: torch.Tensor, arg1: torch.Tensor, arg2: Optional[torch.Tensor], arg3: Optional[torch.Tensor], arg4: bool) -> torch.Tensor
Invoked with: tensor([[[-0.3531, -0.3256, -0.5120, ..., -0.3845, -0.3780, -0.2731],
         [-0.1226,  0.0515,  0.0443, ..., -0.0484, -0.0954,  0.2243],
         [ 0.2591,  0.4765,  0.4899, ...,  0.2762,  0.2085,  0.1601],
         ...,
         [-0.4706,  0.0122, -0.0670, ..., -0.6855, -1.0694, -0.7547],
         [ 0.2710,  0.6020,  0.5813, ...,  0.0339,  0.0822,  0.5069],
         [-0.0817,  0.1549,  0.1879, ..., -0.1216, -0.4358, -0.3873]],
       tensor([-0.0066, -0.3897,  0.1920, ...,  0.1256, -0.0983, -0.4903],
       device='cuda:0', requires_grad=True), None, None, None, True
Exception in thread Thread-4 (results_loop):
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\umamba\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\ProgramData\Anaconda3\envs\umamba\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "C:\ProgramData\Anaconda3\envs\umamba\lib\site-packages\batchgenerators\dataloading\nondet_multi_threaded_augmenter.py", line 125, in results_loop
    raise e
  File "C:\ProgramData\Anaconda3\envs\umamba\lib\site-packages\batchgenerators\dataloading\nondet_multi_threaded_augmenter.py", line 103, in results_loop
    raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the "
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message
```
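The `TypeError` from `causal_conv1d_fwd()` in the traceback above usually indicates a version mismatch: `mamba_ssm` is calling the compiled `causal-conv1d` extension with a different argument list than the installed build exposes. A small diagnostic sketch (the PyPI distribution names are assumed) to print the installed versions before reinstalling a compatible pair:

```python
from importlib import metadata

def pkg_version(name: str) -> str:
    """Return the installed version of a distribution, or a marker if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"

# Distribution names assumed for the packages involved in the traceback.
for pkg in ("torch", "mamba-ssm", "causal-conv1d"):
    print(f"{pkg}: {pkg_version(pkg)}")
```

If the reported versions disagree with the combination the U-Mamba README pins, reinstalling that pinned pair is the first thing to try.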