
[Bug]: 推理和转换ONNX时报错 #24

Open
m00nLi opened this issue Sep 26, 2024 · 13 comments
Labels
bug Something isn't working

Comments

m00nLi commented Sep 26, 2024

Bug

Training works fine, but running inference with inference.py or converting to ONNX fails with the following error:

Traceback (most recent call last):
  File "/home/code/Relation-DETR/inference.py", line 165, in <module>
    inference()
  File "/home/user/code/Relation-DETR/inference.py", line 99, in inference
    model = Config(args.model_config).model.eval()
  File "/home/code/Relation-DETR/util/lazy_load.py", line 35, in __init__
    mod = importlib.import_module(module_name)
  File "/home/user/miniconda3/envs/detr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/code/Relation-DETR/configs/train_config.py", line 45, in <module>
    optimizer = optim.AdamW(lr=learning_rate, weight_decay=1e-4, betas=(0.9, 0.999))
TypeError: AdamW.__init__() missing 1 required positional argument: 'params'
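For context, the traceback shows the training config calling optim.AdamW without its required params argument, which only works if something else injects the model parameters later. A minimal sketch of the usual deferred-optimizer pattern (using functools.partial; this is an illustration of the failure mode, not necessarily how Relation-DETR's lazy config works internally):

```python
import functools

import torch
from torch import nn, optim

# Calling optim.AdamW(lr=..., ...) directly raises TypeError, because `params`
# is a required positional argument. A config meant for lazy loading instead
# defers construction until the model's parameters are available:
optimizer_factory = functools.partial(
    optim.AdamW, lr=1e-4, weight_decay=1e-4, betas=(0.9, 0.999)
)

model = nn.Linear(4, 2)  # stand-in for the real detector
optimizer = optimizer_factory(model.parameters())
```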

Environment

-------------------------------  ---------------------------------------------------------------------------------------
sys.platform                     linux
Python                           3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0]
numpy                            1.24.4
PyTorch                          2.4.0+cu121 @/home/user/miniconda3/envs/detr/lib/python3.10/site-packages/torch
PyTorch debug build              False
torch._C._GLIBCXX_USE_CXX11_ABI  False
GPU available                    Yes
GPU 0,1,2,3,4,5,6,7              NVIDIA H800 (arch=9.0)
Driver version                   560.35.03
CUDA_HOME                        /usr/local/cuda-12.4
Pillow                           10.4.0
torchvision                      0.19.0+cu121 @/home/user/miniconda3/envs/detr/lib/python3.10/site-packages/torchvision
torchvision arch flags           5.0, 6.0, 7.0, 7.5, 8.0, 8.6, 9.0
fvcore                           0.1.5.post20221221
iopath                           0.1.10
cv2                              4.10.0
-------------------------------  ---------------------------------------------------------------------------------------

Additional information

No response

@m00nLi m00nLi added the bug Something isn't working label Sep 26, 2024
xiuqhou (Owner) commented Sep 26, 2024

model_config should point to a model config rather than a training config, e.g.:
configs/relation_detr/relation_detr_resnet50_800_1333.py

m00nLi (Author) commented Sep 26, 2024

> model_config should point to a model config rather than a training config, e.g.: configs/relation_detr/relation_detr_resnet50_800_1333.py

OK, switching to the model config fixed it.

m00nLi (Author) commented Sep 27, 2024

Exporting to ONNX fails with:

torch.onnx.errors.UnsupportedOperatorError: Exporting the operator 'aten::_upsample_bilinear2d_aa' to ONNX opset version 17 is not supported. Please feel free to request support or submit a pull request on PyTorch GitHub: https://github.com/pytorch/pytorch/issues.

ONNX versions

onnx                              1.16.2
onnxruntime                       1.16.0
onnxsim                           0.4.36
rapidocr-onnxruntime              1.3.24

xiuqhou (Owner) commented Sep 27, 2024

Hi @m00nLi, this happens because PyTorch cannot export anti-aliased resize to ONNX.
Please set antialias to False on line 75 of models/detectors/base_detector.py and export again.

SEU-ZWW commented Sep 27, 2024

> Hi @m00nLi, this happens because PyTorch cannot export anti-aliased resize to ONNX. Please set antialias to False on line 75 of models/detectors/base_detector.py and export again.

After setting it to False I still get the same error; opset 11 does not work either.

xiuqhou (Owner) commented Sep 28, 2024

Please check whether the error mentions aten::_upsample_bilinear2d_aa or aten::_upsample_bilinear2d. PyTorch currently cannot export any of the _aa (anti-aliased) operators, whereas setting antialias to False switches to the ONNX operator without the _aa suffix. That operator may be unsupported at opset 11, but it is supported at opset 17 — could you try opset 17?

SEU-ZWW commented Sep 29, 2024

> Please check whether the error mentions aten::_upsample_bilinear2d_aa or aten::_upsample_bilinear2d. PyTorch currently cannot export any of the _aa (anti-aliased) operators, whereas setting antialias to False switches to the ONNX operator without the _aa suffix. That operator may be unsupported at opset 11, but it is supported at opset 17 — could you try opset 17?

The error is about the operator with the _aa suffix. Is there currently any way to convert the .pt model to ONNX? A workaround would be much appreciated, thanks.

xiuqhou (Owner) commented Sep 30, 2024

Hi @SEU-ZWW, if the error still mentions the _aa operator, then antialias is still True. Please double-check your settings and debug to see why setting antialias to False did not take effect.

m00nLi (Author) commented Oct 22, 2024

My ONNX export succeeded with antialias=False, opset=17, onnxruntime==1.19.2.
What preprocessing should be applied to images when running detection with the ONNX model? I looked at ONNXDetector in tools/pytorch2onnx.py, and preprocessing the input image like this does produce detections:

import cv2
import numpy as np
import torch

cv_img = cv2.imread(image_path)
cv_img = cv2.cvtColor(cv_img, cv2.COLOR_BGR2RGB)
cv_img = cv_img.astype(np.float32)
cv_img /= 255.0
mean = np.array((0.485, 0.456, 0.406), dtype=np.float32)
std = np.array((0.229, 0.224, 0.225), dtype=np.float32)
# cv_img = (cv_img - mean) / std  # adding this line yields no detections
cv_img = cv_img.transpose(2, 0, 1)  # h, w, c => c, h, w
input_tensor = torch.from_numpy(cv_img)
result = onnx_detector([input_tensor])

But the training-time data pipeline also subtracts the mean and divides by the standard deviation, doesn't it? If I add that step, no objects are detected. What is the correct preprocessing?

xiuqhou (Owner) commented Oct 22, 2024

For convenience, I integrated the inference-time normalization into the model as an nn.Module, so no extra normalization is needed at inference. Just read the image in CHW (RGB) format, convert it to float, and feed it to the model. Your original preprocessing is correct.
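The integrated normalization described above can be sketched as an nn.Module that holds the ImageNet mean/std as buffers, so it is exported as part of the graph (a hypothetical reconstruction for illustration, not the repo's exact code):

```python
import torch
from torch import nn

class Normalize(nn.Module):
    """Normalization baked into the model: callers only pass float RGB CHW
    tensors in [0, 1]; the mean/std step travels with the exported graph."""

    def __init__(self, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
        super().__init__()
        # Buffers (not parameters) so they are saved/exported but not trained.
        self.register_buffer("mean", torch.tensor(mean).view(3, 1, 1))
        self.register_buffer("std", torch.tensor(std).view(3, 1, 1))

    def forward(self, x):
        return (x - self.mean) / self.std
```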

m00nLi (Author) commented Oct 22, 2024

> For convenience, I integrated the inference-time normalization into the model as an nn.Module, so no extra normalization is needed at inference. Just read the image in CHW (RGB) format, convert it to float, and feed it to the model. Your original preprocessing is correct.

OK, thanks.

m00nLi (Author) commented Oct 24, 2024

> For convenience, I integrated the inference-time normalization into the model as an nn.Module, so no extra normalization is needed at inference. Just read the image in CHW (RGB) format, convert it to float, and feed it to the model. Your original preprocessing is correct.

How can I remove this resize + normalization operator when exporting to ONNX? I already set self.eval_transform = None in models/detectors/base_detector.py, and according to Netron the operator is gone, but I still only get detections when I preprocess the image as before (doing the normalization myself yields no detections).

class BaseDetector(nn.Module):
    def __init__(self, min_size=None, max_size=None, size_divisible=32):
        ...
        # self.eval_transform = nn.Sequential(*eval_transform)
        self.eval_transform = None

Also, testing on my own dataset (about 200k images), Relation-DETR (FocalNet-L) does beat DINO (Swin-L) by 3 points on mAP50 and mAP50-95, and recall is also 3 points higher, but precision is 7 points lower. That suggests Relation-DETR leans toward recall at the cost of many false positives. Any ideas for improving this?

xiuqhou (Owner) commented Oct 26, 2024

That is strange: eval_transform only converts the image to float and normalizes it, so setting it to None should disable the normalization. You could debug the torch.onnx.export call in pytorch2onnx.py to confirm the normalization is really removed at export time, and also make sure the input image dtype is float.

Detection involves both classification and regression, so performance is mainly measured with mAP and AR. AP is the average of precision over different recall levels, i.e. the area under the PR curve, and mAP is the mean AP over classes. So I don't quite understand why your precision trends differently from mAP: scoring higher than DINO on mAP but lower on precision is odd. I suggest first visualizing the results to check whether there really are many false positives. If there are, you can filter some of them out with a score threshold, or refine the object queries to represent objects more precisely.
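The score-threshold filtering suggested above can be sketched as follows (names and tensor layout are illustrative; the real detector's output format may differ):

```python
import torch

def filter_by_score(boxes, scores, labels, score_thr=0.5):
    """Keep only detections whose confidence is at or above score_thr,
    trading a little recall for fewer false positives."""
    keep = scores >= score_thr
    return boxes[keep], scores[keep], labels[keep]

# Usage with dummy detector outputs:
boxes = torch.rand(5, 4)                      # xyxy boxes
scores = torch.tensor([0.9, 0.2, 0.6, 0.4, 0.8])
labels = torch.zeros(5, dtype=torch.long)
boxes, scores, labels = filter_by_score(boxes, scores, labels, 0.5)
```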
