
cuDNN Error: CUDNN_STATUS_BAD_PARAM while training #2367

Open

divid3d opened this issue Dec 13, 2020 · 6 comments

divid3d commented Dec 13, 2020

While training, at the moment the mAP calculation should start, I get an assertion error:

(next mAP calculation at 1900 iterations)
 1900: 185.299484, 158.218445 avg loss, 0.001000 rate, 6.550167 seconds, 121600 images, 16.111683 hours left
4
cuDNN Error: CUDNN_STATUS_BAD_PARAM: File exists
darknet: ./src/utils.c:331: error: Assertion 0 failed.


Lerseb commented Dec 20, 2020

I have the same issue.
I noticed it happens when I have 2 classes.
Right now I am training without the -map flag and it is training fine.
It just isn't validating against the 20% of images in your test.txt.
I don't know if it is related to darknet or cuDNN.
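
A possible workaround along these lines (just a sketch, assuming the AlexeyAB fork, since -map and these logs come from it, and with placeholder file names): train without -map, then compute mAP separately from a saved checkpoint with the map subcommand. The same cuDNN error may still appear there, but at least training itself keeps running.

# train without the in-training mAP pass (placeholder paths)
./darknet detector train dataset/obj.data dataset/obj.cfg yolov4.conv.137 -dont_show
# evaluate mAP afterwards from a saved checkpoint
./darknet detector map dataset/obj.data dataset/obj.cfg backup/obj_last.weights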


niemiaszek commented Dec 20, 2020

EDIT: Didn't notice this is the pjreddie repo. Opening an issue on the AlexeyAB fork instead.
Got the same issue with a 1080 Ti, built with ZED 3.2 and OpenCV 4.5 with CUDA. Training for only 1 class.
Driver Version: 455.32.00, CUDA Version: 11.1, cuDNN 8.0.4
The issue didn't happen on CUDA 10.1 / Ubuntu 18 on the same setup without ZED.
/darknet$ git log
commit 14b196d

Crash on map:
(next mAP calculation at 1000 iterations)
1000: 0.831169, 0.674900 avg loss, 0.001000 rate, 4.804868 seconds, 64000 images, 3.878937 hours left
Resizing to initial size: 416 x 416 try to allocate additional workspace_size = 89.65 MB
CUDA allocate done!

calculation mAP (mean average precision)...
Detection layer: 139 - type = 28
Detection layer: 150 - type = 28
Detection layer: 161 - type = 28
4
cuDNN status Error in: file: /home/patryk/darknet/src/convolutional_kernels.cu : () : line: 533 : build time: Nov 7 2020 - 09:24:25

cuDNN Error: CUDNN_STATUS_BAD_PARAM
cuDNN Error: CUDNN_STATUS_BAD_PARAM: Resource temporarily unavailable
./szkolenie.sh: line 2: 36330 Segmentation fault (core dumped) ./darknet detector train dataset/obj.data dataset/op14.cfg yolov4.conv.137 -map

@Remco-Terwal-Bose

Was this ever understood and/or addressed?

command:
darknet detector train yolov3-ambulance-setup.data yolov3-ambulance-train.cfg ./darknet53.conv.74 -dont_show -map 2> train_log.txt

GPU: Nvidia 2080 SUPER, OpenCV 4.5.5 with CUDA, training for 1 class
Driver version 512.15
CUDA version: 11.6
cuDNN version 8.3

(next mAP calculation at 100 iterations)
100: 0.637962, 1.110619 avg loss, 0.001000 rate, 4.220000 seconds, 6400 images, 1.343723 hours left
Resizing to initial size: 416 x 416 try to allocate additional workspace_size = 154.24 MB
CUDA allocate done!

calculation mAP (mean average precision)...
Detection layer: 82 - type = 28
Detection layer: 94 - type = 28
Detection layer: 106 - type = 28

cuDNN status Error in: file: C:\darknet\src\convolutional_kernels.cu : forward_convolutional_layer_gpu() : line: 555 : build time: Apr 8 2022 - 13:14:27

cuDNN Error: CUDNN_STATUS_BAD_PARAM

@niemiaszek

@Remco-Terwal-Bose see the issue mentioned above, or here

@Rizama03

This is my error message; everything crashes just when it's about to calculate the mAP:

(next mAP calculation at 1000 iterations)
 1000/10000: loss=14.6 hours left=4.9
 1000: 14.551660, 16.706905 avg loss, 0.002610 rate, 1.315968 seconds, 64000 images, 4.874600 hours left
4
Darknet error location: ./src/convolutional_kernels.cu, forward_convolutional_layer_gpu(), line #541
cuDNN Error: CUDNN_STATUS_BAD_PARAM: Succes

@PriyankaIITI

I also encountered the error when using -map in the command:
"cuDNN status Error in: file: ./src/convolutional_kernels.cu function: forward_convolutional_layer_gpu() line: 541 cuDNN Error: CUDNN_STATUS_BAD_PARAM Darknet error location: ./src/convolutional_kernels.cu, forward_convolutional_layer_gpu(), line #541"

Here are the solutions I found:

  1. Downgrade CUDA (link)
  2. Use subdivisions=64 (link; see the sketch after this list)
  3. Another comment (link)
  4. export CUDA_VISIBLE_DEVICES=0 (link)
  5. GPU architecture (link)
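
For item 2, a minimal sketch of the relevant [net] section (the values other than subdivisions are just common defaults, not taken from this thread): with batch=64 and subdivisions=64 each mini-batch is a single image, which reduces GPU memory use per forward/backward pass.

[net]
batch=64
subdivisions=64
width=416
height=416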

But I was able to run with the following Makefile settings:
GPU=1
CUDNN=0
CUDNN_HALF=0
OPENCV=1

I don't know if this is the correct way to do it, though.
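
For reference, the usual rebuild after changing those Makefile flags (standard darknet build steps, nothing specific to this issue):

# from the darknet repo root, after editing the Makefile
make clean
make -j$(nproc)

With CUDNN=0, convolutions fall back to darknet's own CUDA kernels instead of cuDNN, so this sidesteps the failing cuDNN call at the cost of slower training.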

@AlexeyAB
