returned non-zero exit status 1. RuntimeError: _th_or not supported on CUDAType for Bool #1172
Comments
I have the same problem. Please let me know if you manage to solve it. Thank you.
Me too. The same problem has confused me for many days.
Traceback (most recent call last):
I've solved the problem and started training. To fix it, reinstall with PyTorch upgraded to 1.3; the corresponding CUDA toolkit and driver also need to be updated, but that's fairly quick.
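For reference, a quick way to confirm which PyTorch build and CUDA toolkit the environment actually ends up with after such an upgrade (a minimal check, independent of this repo):

import torch

print(torch.__version__)          # should report 1.3.x after the upgrade
print(torch.version.cuda)         # CUDA toolkit this PyTorch build was compiled against
print(torch.cuda.is_available())  # confirms the installed driver still works with it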
But I can't update CUDA. What should I do if I have to stay on CUDA 9?
You can try changing the buggy code in a Python file (for example, a few lines of code in loss.py). In practice, though, an operating system can have several CUDA versions installed side by side, and you can switch between them by changing the symbolic link.
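If editing the code is the route taken, here is a minimal, self-contained sketch of that idea for the line the traceback below points at (loss.py line 111). It only illustrates one possible workaround, casting the boolean masks to uint8 before the bitwise OR; it is not the exact patch from #1182, and the mask values here are made up for the example:

import torch

# Hypothetical stand-ins for the two sampler masks in loss.py; in the real
# code they come from the positive/negative sampler, here they are invented.
scores = torch.tensor([0.9, 0.1, 0.8, 0.0])
pos_inds_img = scores > 0.5   # positive-sample mask
neg_inds_img = scores == 0.0  # negative-sample mask

# The original line, torch.nonzero(pos_inds_img | neg_inds_img).squeeze(1),
# trips over the missing Bool OR kernel on older CUDA builds; casting both
# masks to uint8 first sidesteps it and yields the same indices.
img_sampled_inds = torch.nonzero(
    pos_inds_img.to(torch.uint8) | neg_inds_img.to(torch.uint8)
).squeeze(1)
print(img_sampled_inds)  # tensor([0, 2, 3])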
Thanks, but I can't change the driver (for some reason), and that's the newest version of CUDA I can use.
https://github.com/facebookresearch/maskrcnn-benchmark/issues/1172#issuecomment-562123695
I ran into this problem with torch_nightly==1.0, and I fixed it in #1182.
Have you solved this problem without upgrading the version of PyTorch?
Just replace all
@hongfz16 The solution you described works for me, thank you very much.
Traceback (most recent call last):
File "/media/ai/fcb4c527-7fb0-41df-b900-75a0d4a92991/lq/maskrcnn-benchmark/tools/train_net.py", line 201, in
main()
File "/media/ai/fcb4c527-7fb0-41df-b900-75a0d4a92991/lq/maskrcnn-benchmark/tools/train_net.py", line 194, in main
model = train(cfg, args.local_rank, args.distributed)
File "/media/ai/fcb4c527-7fb0-41df-b900-75a0d4a92991/lq/maskrcnn-benchmark/tools/train_net.py", line 94, in train
arguments,
File "/media/ai/fcb4c527-7fb0-41df-b900-75a0d4a92991/lq/maskrcnn-benchmark/maskrcnn_benchmark/engine/trainer.py", line 72, in do_train
for iteration, (images, targets, _) in enumerate(data_loader, start_iter):
File "/home/ai/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 582, in next
return self._process_next_batch(batch)
File "/home/ai/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 608, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
IndexError: Traceback (most recent call last):
File "/home/ai/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/ai/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/ai/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/utils/data/dataset.py", line 85, in getitem
return self.datasets[dataset_idx][sample_idx]
File "/media/ai/fcb4c527-7fb0-41df-b900-75a0d4a92991/lq/maskrcnn-benchmark/maskrcnn_benchmark/data/datasets/coco.py", line 94, in getitem
target = target.clip_to_image(remove_empty=True)
File "/media/ai/fcb4c527-7fb0-41df-b900-75a0d4a92991/lq/maskrcnn-benchmark/maskrcnn_benchmark/structures/bounding_box.py", line 223, in clip_to_image
return self[keep]
File "/media/ai/fcb4c527-7fb0-41df-b900-75a0d4a92991/lq/maskrcnn-benchmark/maskrcnn_benchmark/structures/bounding_box.py", line 208, in getitem
bbox.add_field(k, v[item])
File "/media/ai/fcb4c527-7fb0-41df-b900-75a0d4a92991/lq/maskrcnn-benchmark/maskrcnn_benchmark/structures/segmentation_mask.py", line 555, in getitem
selected_instances = self.instances.getitem(item)
File "/media/ai/fcb4c527-7fb0-41df-b900-75a0d4a92991/lq/maskrcnn-benchmark/maskrcnn_benchmark/structures/segmentation_mask.py", line 464, in getitem
selected_polygons.append(self.polygons[i])
IndexError: list index out of range
index created!
Traceback (most recent call last):
File "/media/ai/fcb4c527-7fb0-41df-b900-75a0d4a92991/lq/maskrcnn-benchmark/tools/train_net.py", line 201, in
main()
File "/media/ai/fcb4c527-7fb0-41df-b900-75a0d4a92991/lq/maskrcnn-benchmark/tools/train_net.py", line 194, in main
model = train(cfg, args.local_rank, args.distributed)
File "/media/ai/fcb4c527-7fb0-41df-b900-75a0d4a92991/lq/maskrcnn-benchmark/tools/train_net.py", line 94, in train
arguments,
File "/media/ai/fcb4c527-7fb0-41df-b900-75a0d4a92991/lq/maskrcnn-benchmark/maskrcnn_benchmark/engine/trainer.py", line 84, in do_train
loss_dict = model(images, targets)
File "/home/ai/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/home/ai/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 376, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/home/ai/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/home/ai/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/apex-0.1-py3.7-linux-x86_64.egg/apex/amp/_initialize.py", line 197, in new_fwd
**applier(kwargs, input_caster))
File "/media/ai/fcb4c527-7fb0-41df-b900-75a0d4a92991/lq/maskrcnn-benchmark/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 52, in forward
x, result, detector_losses = self.roi_heads(features, proposals, targets)
File "/home/ai/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/media/ai/fcb4c527-7fb0-41df-b900-75a0d4a92991/lq/maskrcnn-benchmark/maskrcnn_benchmark/modeling/roi_heads/roi_heads.py", line 26, in forward
x, detections, loss_box = self.box(features, proposals, targets)
File "/home/ai/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/media/ai/fcb4c527-7fb0-41df-b900-75a0d4a92991/lq/maskrcnn-benchmark/maskrcnn_benchmark/modeling/roi_heads/box_head/box_head.py", line 43, in forward
proposals = self.loss_evaluator.subsample(proposals, targets)
File "/media/ai/fcb4c527-7fb0-41df-b900-75a0d4a92991/lq/maskrcnn-benchmark/maskrcnn_benchmark/modeling/roi_heads/box_head/loss.py", line 111, in subsample
img_sampled_inds = torch.nonzero(pos_inds_img | neg_inds_img).squeeze(1)
RuntimeError: _th_or not supported on CUDAType for Bool
Traceback (most recent call last):
File "/home/ai/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/home/ai/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/ai/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/distributed/launch.py", line 235, in
main()
File "/home/ai/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/distributed/launch.py", line 231, in main
cmd=process.args)
subprocess.CalledProcessError: Command '['/home/ai/anaconda3/envs/maskrcnn_benchmark/bin/python', '-u', '/media/ai/fcb4c527-7fb0-41df-b900-75a0d4a92991/lq/maskrcnn-benchmark/tools/train_net.py', '--local_rank=0', '--config-file', '/media/ai/fcb4c527-7fb0-41df-b900-75a0d4a92991/lq/maskrcnn-benchmark/configs/e2e_mask_rcnn_X_101_32x8d_FPN_1x.yaml']' returned non-zero exit status 1.
Environment
PyTorch version: 1.1.0
Is debug build: No
CUDA used to build PyTorch: 9.0.176
OS: Ubuntu 16.04.4 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
CMake version: version 3.5.1
Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 9.0.176
GPU models and configuration:
GPU 0: GeForce GTX 1080 Ti
GPU 1: GeForce GTX 1080 Ti
GPU 2: GeForce GTX 1080 Ti
GPU 3: GeForce GTX 1080 Ti
Nvidia driver version: 390.25
cuDNN version: Could not collect
Versions of relevant libraries:
[pip3] numpy==1.15.0
[pip3] numpydoc==0.7.0
[pip3] torch==1.1.0
[pip3] torchfile==0.1.0
[pip3] torchnet==0.0.5.1
[pip3] torchvision==0.3.0
[conda] blas 1.0 mkl
[conda] mkl 2019.4 243
[conda] mkl-service 2.3.0 py37he904b0f_0
[conda] mkl_fft 1.0.15 py37ha843d7b_0
[conda] mkl_random 1.1.0 py37hd6b4f25_0
[conda] pytorch 1.1.0 py3.7_cuda9.0.176_cudnn7.5.1_0 pytorch
[conda] pytorch-nightly 1.0.0.dev20190328 py3.7_cuda9.0.176_cudnn7.4.2_0 pytorch
[conda] torchvision 0.3.0 py37_cu9.0.176_1 pytorch
How can I fix this?