
Image size difference between training and testing #76

Open
AtsukiOsanai opened this issue Dec 13, 2019 · 4 comments

@AtsukiOsanai

Thank you for providing this nice repository.
I'd like to ask about the image sizes used during training and testing on the Cityscapes dataset.
For Cityscapes training, you use (512, 1024) cropped images.
For single-scale testing, however, inference is done on the whole image, i.e. (1024, 2048).
I found that some works employ sliding inference at test time using the training crop size.
(https://github.com/junfu1115/DANet/blob/master/encoding/models/base.py#L78-L179)
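(For reference, here is a minimal sketch of what I mean by sliding inference; `model`, the crop size and the stride are placeholders for illustration, not code from this repository.)

```python
import math
import torch
import torch.nn.functional as F

@torch.no_grad()
def sliding_inference(model, image, crop_size=(512, 1024), stride=(341, 683), num_classes=19):
    """Average logits from overlapping crops that tile the full image."""
    _, _, H, W = image.shape
    ch, cw = crop_size
    rows = max(math.ceil((H - ch) / stride[0]), 0) + 1
    cols = max(math.ceil((W - cw) / stride[1]), 0) + 1
    logits = image.new_zeros((1, num_classes, H, W))
    count = image.new_zeros((1, 1, H, W))
    for r in range(rows):
        for c in range(cols):
            top = min(r * stride[0], max(H - ch, 0))
            left = min(c * stride[1], max(W - cw, 0))
            bottom, right = min(top + ch, H), min(left + cw, W)
            crop = image[:, :, top:bottom, left:right]
            out = model(crop)  # (1, num_classes, h, w)
            if out.shape[-2:] != crop.shape[-2:]:
                out = F.interpolate(out, size=crop.shape[-2:],
                                    mode='bilinear', align_corners=True)
            logits[:, :, top:bottom, left:right] += out
            count[:, :, top:bottom, left:right] += 1
    return logits / count  # average over the overlap counts
```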

So, my questions are:

  • Why do you use the whole image as is?
  • Have you checked the influence of this difference?

Thanks.
@sunke123
Member

If we used whole images to train the network, the batch size would be too small, e.g. 1 image/GPU, which has a negative influence on BN and makes training unstable.

I think DANet uses sliding inference during testing, not training.
We feed the whole image into the network for single-scale testing so that the comparison with other methods is fair.
You can try multi-scale testing if you want to use sliding inference.
Sliding inference does bring gains, but it also costs inference speed.
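(Roughly, multi-scale testing means something like the sketch below: run the network on several rescaled copies of the image, resize the logits back, and average them. This is only an illustrative sketch with a generic `model` and arbitrary scales, not the exact testing code of this repository.)

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def multi_scale_inference(model, image, scales=(0.5, 0.75, 1.0, 1.25, 1.5), num_classes=19):
    """Average per-pixel logits over several rescaled copies of the input."""
    _, _, H, W = image.shape
    avg_logits = image.new_zeros((1, num_classes, H, W))
    for s in scales:
        size = (int(H * s), int(W * s))
        scaled = F.interpolate(image, size=size, mode='bilinear', align_corners=False)
        out = model(scaled)                    # (1, num_classes, h', w')
        out = F.interpolate(out, size=(H, W),  # resize logits back to full resolution
                            mode='bilinear', align_corners=False)
        avg_logits += out
    return avg_logits / len(scales)
```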

@AtsukiOsanai
Author

Thank you for answering my question.

Maybe I did not express myself clearly.
I understand the importance of batch size in semantic segmentation
and keep in mind to set it to more than 8 when training.

You are correct about the DANet pipeline.
As DANet does, I train an FCN with random (769, 769) cropping,
and I measure scores for both sliding inference with (769, 769) patches
and whole-image inference at (1024, 2048).
With the sliding method, I get 70% mIoU on the Cityscapes dataset.
(Training ran for only 40 epochs, which leads to a lower score.)
However, whole-image inference achieves only 65% mIoU.
I can't understand why whole-image inference fails to produce good predictions.

Do you have any insight into this?
Does it depend on the network architecture?
I really hope to reduce my testing time.
Thanks.

@sunke123
Member

The sliding method is often used in image processing, such as image de-blocking and deblurring, and leads to better performance. It crops the image into many overlapping patches, which also increases the inference time.

HRNet with multi-scale testing (including the sliding process) improves mIoU by about 1%–1.5%. I have no idea why there is a 5% performance gap; it's too large. I'm not sure whether the architecture causes this problem. You can try another network, such as HRNet or PSPNet.

If you want to reduce testing time, you can concatenate the cropped image patches along the batch axis.
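(A rough sketch of this idea, assuming all crops have the same size so they can be stacked into one batch; `model` and the crop/stride values are placeholders.)

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def batched_crop_inference(model, image, crop_size=(769, 769), stride=(513, 513), num_classes=19):
    """Run all sliding-window crops in a single forward pass by stacking them
    along the batch dimension, then scatter the logits back onto the full image."""
    _, _, H, W = image.shape
    ch, cw = crop_size
    tops = sorted({min(t, H - ch) for t in range(0, H - ch + stride[0], stride[0])})
    lefts = sorted({min(l, W - cw) for l in range(0, W - cw + stride[1], stride[1])})
    boxes = [(t, l) for t in tops for l in lefts]
    crops = torch.cat([image[:, :, t:t + ch, l:l + cw] for (t, l) in boxes], dim=0)
    out = model(crops)  # (num_crops, num_classes, h, w)
    if out.shape[-2:] != (ch, cw):
        out = F.interpolate(out, size=(ch, cw), mode='bilinear', align_corners=False)
    logits = image.new_zeros((1, num_classes, H, W))
    count = image.new_zeros((1, 1, H, W))
    for i, (t, l) in enumerate(boxes):
        logits[:, :, t:t + ch, l:l + cw] += out[i:i + 1]
        count[:, :, t:t + ch, l:l + cw] += 1
    return logits / count
```

If memory is tight, the stacked crops can also be split into smaller chunks with `torch.split` and processed a few at a time.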

@AtsukiOsanai
Author

Accumulating images along the batch axis is a great idea for me.
This should only be used when the input images all have the same size, such as Cityscapes, right?

I compared my code with yours and found a difference.
I apply torch.nn.functional.interpolate with align_corners=True to control the image size,
whereas you use cv2.resize.
Maybe align_corners=False is equivalent to cv2.resize, so I will check whole-image inference again with align_corners=False.
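(As a quick sanity check, I plan to compare the two resizing paths with something like the sketch below; the shapes are arbitrary and this is only my own test script, not code from this repository.)

```python
import cv2
import numpy as np
import torch
import torch.nn.functional as F

x = np.random.rand(64, 128, 3).astype(np.float32)  # (H, W, C)
target_hw = (256, 512)                              # (height, width)

# OpenCV expects dsize as (width, height).
ref = cv2.resize(x, (target_hw[1], target_hw[0]), interpolation=cv2.INTER_LINEAR)

t = torch.from_numpy(x).permute(2, 0, 1).unsqueeze(0)  # (1, C, H, W)
for ac in (False, True):
    y = F.interpolate(t, size=target_hw, mode='bilinear', align_corners=ac)
    y = y.squeeze(0).permute(1, 2, 0).numpy()
    print(f"align_corners={ac}: max abs diff vs cv2.resize = {np.abs(y - ref).max():.6f}")
```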

After sharing my update, I will close this issue.
Thanks.
