The performance gap #59

MandyMo · 2019-10-09T02:30:43Z

I have downloaded the pretrained model 'hrnet_w48_cityscapes_cls19_1024x2048_ohem_trainvalset.pth' and evaluated it on the Cityscapes test set. The results are below:
https://www.cityscapes-dataset.com/anonymous-results/?id=500275d541b67dd462afa9235b0fbe188e2fdd304e26d8f239a52a8bbfa2fb0d
The mIoU (76.6%) is far behind the reported result (81.6%).

I don't know why. Is something wrong?

sunke123 · 2019-10-09T08:20:56Z

Because this model was trained with pytorch-0.4.1, you should also run the test with pytorch-0.4.1.
BN behaves differently between pytorch-0.4.1 and pytorch-1.1, which results in worse performance.

MandyMo · 2019-10-09T08:46:38Z

Thank you.

MandyMo · 2019-10-09T16:58:08Z

I have tested the model with PyTorch 0.4.1 and PyTorch 1.1.
I randomly picked several images, shown below; the left image is the result from PyTorch 0.4.1 and the right image is the result from PyTorch 1.1.

Strangely, the output from PyTorch 0.4.1 is identical to its counterpart from PyTorch 1.1.

Are there any details that I have missed?
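
A side-by-side visual check can hide small differences, so it may help to compare the two outputs numerically. The following is a minimal sketch, assuming each prediction has already been decoded into a 2-D grid of integer class ids; the function name `label_map_disagreement` and the toy inputs are hypothetical and are not part of the HRNet code.

```python
def label_map_disagreement(pred_a, pred_b):
    """Return the fraction of pixels whose class id differs between two label maps.

    Both inputs are 2-D lists (rows of integer class ids) with identical shape.
    """
    if len(pred_a) != len(pred_b):
        raise ValueError("label maps have different heights")
    total = 0
    differing = 0
    for row_a, row_b in zip(pred_a, pred_b):
        if len(row_a) != len(row_b):
            raise ValueError("label maps have different widths")
        for a, b in zip(row_a, row_b):
            total += 1
            if a != b:
                differing += 1
    return differing / total if total else 0.0


# Toy 2x3 predictions standing in for the outputs of the two PyTorch versions.
out_v041 = [[0, 1, 1], [2, 2, 0]]
out_v11 = [[0, 1, 1], [2, 0, 0]]
print(label_map_disagreement(out_v041, out_v11))
```

A result of exactly 0.0 on real predictions would suggest that the two environments agree pixel for pixel, and that the gap comes from somewhere else in the evaluation pipeline.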

sunke123 · 2019-10-10T03:25:18Z

Could you provide your testing settings and test the model on the val set first?

MandyMo · 2019-10-11T07:46:11Z

I didn't test the model on the val set first. The test code I used is below:

sunke123 · 2019-10-13T03:42:52Z

Yes.
Because submissions to the test server are limited, you can test the model on the val set. Then we can look into the problem together.

sunke123 · 2019-10-13T04:56:20Z

@MandyMo
I have tested the model with PyTorch 0.4.1. You should get an mIoU of 91.97 on val.

MandyMo · 2019-10-14T03:38:27Z

Thank you, I will evaluate it on the val set.

MandyMo · 2019-10-21T09:36:26Z

I am sorry to trouble you again! I have evaluated the model on the val set with PyTorch 0.4.1 (Windows), but I can't reach the 91.97 mIoU.
So I picked several images from the val set. Could you share your evaluation results on the following five images?

frankfurt_000000_000294_gtFine_labelIds.png

frankfurt_000000_000576_gtFine_labelIds.png

frankfurt_000000_001016_gtFine_labelIds.png

frankfurt_000000_001236_gtFine_labelIds.png

frankfurt_000000_001751_gtFine_labelIds.png
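
To compare numbers on a handful of images like these, both sides need to compute mIoU the same way. The following is a minimal sketch of one common definition (per-class intersection over union from a confusion matrix, averaged over the classes that occur). It assumes that ground truth and predictions are flattened lists in the same class-id space, and that 255 marks ignored pixels; both are assumptions, not something the thread confirms about the repository's evaluation code.

```python
def confusion_matrix(gt, pred, num_classes, ignore_label=255):
    """Count (ground truth, prediction) pairs; rows are ground truth, columns are predictions."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if g == ignore_label:  # assumed ignore id; such pixels are skipped entirely
            continue
        matrix[g][p] += 1
    return matrix


def mean_iou(matrix):
    """Average IoU over the classes that appear in the ground truth or the predictions."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp
        fp = sum(matrix[r][c] for r in range(n)) - tp
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with 3 classes and one ignored pixel.
gt = [0, 0, 1, 1, 255, 2]
pred = [0, 1, 1, 1, 0, 2]
print(mean_iou(confusion_matrix(gt, pred, num_classes=3)))
```

Accumulating one confusion matrix over all images and averaging per-image mIoU values generally give different numbers, so it is worth agreeing on which one is being reported.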

huangfuts · 2019-11-05T05:51:05Z

@MandyMo
Hello, I'm trying the HRNet code as well. But when running "python tools/train.py --cfg experiments/cityscapes/seg_hrnet_w48_train_512x1024_sgd_lr1e-2_wd5e-4_bs_12_epoch484.yaml", I hit the error "ninja: build stopped: subcommand failed.". Did you encounter the same problem? If not, could you give me some suggestions? My PyTorch is 1.1.0 and my CUDA is 10.0.
In addition, how did you install the "ninja" package? With pip install ninja? If that's OK, I would really like to get in touch with you. My email: 1072319209qq.com. Thanks a lot!

huangfuts · 2019-11-06T08:25:12Z

@MandyMo
Thanks very much for your help! The problem "ninja: build stopped: subcommand failed." has been solved.
Wish you all the best~~~

welleast · 2019-12-04T08:44:15Z

@MandyMo: have you re-produced the results?

MandyMo · 2019-12-04T09:57:18Z

> @MandyMo: have you re-produced the results?

No, I didn't reproduce the reported results.

welleast · 2019-12-11T03:21:43Z

@MandyMo Did you use the same settings? What are your results?

ajithvcoder · 2021-01-24T05:02:19Z

@huangfuts Could you tell me how you solved it? My mail id is [email protected]

StuvX · 2021-07-01T02:55:29Z

I suspect this is an error in how the model is loaded. You need to explicitly map the model weights to your device; otherwise, any weights that are not mapped will be randomly initialized.

See here for further info: https://pytorch.org/tutorials/intermediate/ddp_tutorial.html (this is my guess for now, I am working through these issues)
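
One way to test this suspicion is to compare the parameter names the model expects with the names stored in the checkpoint before loading. The following is a minimal sketch on plain lists of names. The helpers `strip_prefix` and `compare_keys`, the `"model."` prefix, and the toy key names are all hypothetical. With PyTorch, the two lists would come from `model.state_dict().keys()` and from the keys of `torch.load(path, map_location="cpu")`.

```python
def strip_prefix(keys, prefix):
    """Remove a leading prefix (for example one added by a wrapper module) where present."""
    return [k[len(prefix):] if k.startswith(prefix) else k for k in keys]


def compare_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected): names the model needs but the checkpoint lacks, and vice versa."""
    model_set, ckpt_set = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_set - ckpt_set)
    unexpected = sorted(ckpt_set - model_set)
    return missing, unexpected


# Toy names; a real checkpoint would have many more entries.
model_keys = ["conv1.weight", "bn1.weight", "bn1.running_mean"]
ckpt_keys = ["model.conv1.weight", "model.bn1.weight", "model.bn1.running_mean"]

# Without handling the prefix, every parameter counts as missing.
print(compare_keys(model_keys, ckpt_keys))

# After stripping the assumed prefix, the names line up.
print(compare_keys(model_keys, strip_prefix(ckpt_keys, "model.")))
```

If the loading code filters checkpoint entries by name, any parameter reported as missing here would silently keep its initial value, which would be consistent with a drop in mIoU.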

MandyMo closed this as completed Oct 9, 2019

MandyMo reopened this Oct 9, 2019

sngyo mentioned this issue Dec 27, 2019

HR-Net accuracy is poor axinc-ai/ailia-models#22

Closed

YijianLiu mentioned this issue Jan 2, 2020

RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/THC/generated/../THCReduceAll.cuh:317 terminate called after throwing an instance of 'at::Error' #50

Closed
