[BUG] When I used the ImageNet Training Script to train my model, an unknown error occurred. #2367
              
stone-cloud asked this question in Q&A (Unanswered)
@stone-cloud The model is probably returning a tuple or list instead of a single prediction tensor. You need to modify the train script so it works with models that return lists, tuples, dicts, etc.
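As a rough illustration of the suggestion above, here is a minimal sketch of a helper that reduces a model's forward output to a single prediction object before it reaches the loss. The function name `select_logits` and the defaults `index=0` and `key="out"` are hypothetical and not taken from the train script or the original report; which element actually holds the classification logits depends on the model.

```python
from collections.abc import Mapping


def select_logits(output, index=0, key="out"):
    """Return one prediction object from a model's forward output.

    `index` and `key` are illustrative defaults (assumptions); point them
    at wherever your own model places its class logits.
    """
    if isinstance(output, Mapping):
        # Dict-style output, e.g. {"out": logits, "aux": aux_logits}
        return output[key]
    if isinstance(output, (tuple, list)):
        # Tuple/list output, e.g. (logits, aux_logits)
        return output[index]
    # Already a single object (e.g. one tensor): pass it through unchanged.
    return output


# Where it might sit inside a training step (names here are assumed):
#   output = model(input)
#   output = select_logits(output)
#   loss = loss_fn(output, target)
```

If the extra outputs are auxiliary heads that should also contribute to the loss, dropping them like this is not enough; the loss computation itself would need to handle them.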
Describe the bug
When I used the ImageNet training script to train my model, the following error occurred. However, my model file worked fine when I trained other models (segmentation models). This issue has troubled me for a long time, and I haven't found a detailed solution.
To Reproduce
Steps to reproduce the behavior:
My process:
Expected behavior
This is strange, and I couldn't find a solution to the same problem. I suspect that distributed training has split the data across processes, but I don't understand why the outputs haven't been merged.
Screenshots

Desktop (please complete the following information):
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0