
Image size difference between training and testing #76

Open
AtsukiOsanai opened this issue Dec 13, 2019 · 4 comments

@AtsukiOsanai

Thank you for providing this nice repository.
I'd like to ask about the image sizes used during training and testing on the Cityscapes dataset.
For Cityscapes training, you use (512, 1024) cropped images.
For single-scale testing, however, inference is done on the whole image, i.e. (1024, 2048).
I found that some works employ sliding inference at test time using the training crop size.
(https://github.com/junfu1115/DANet/blob/master/encoding/models/base.py#L78-L179)
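(For reference, here is a minimal sketch of what I mean by sliding inference; `model`, the crop size and the stride are placeholders for illustration, not code from this repository.)

```python
import math
import torch
import torch.nn.functional as F

@torch.no_grad()
def sliding_inference(model, image, crop_size=(512, 1024), stride=(341, 683), num_classes=19):
    """Average logits from overlapping crops that tile the full image."""
    _, _, H, W = image.shape
    ch, cw = crop_size
    rows = max(math.ceil((H - ch) / stride[0]), 0) + 1
    cols = max(math.ceil((W - cw) / stride[1]), 0) + 1
    logits = image.new_zeros((1, num_classes, H, W))
    count = image.new_zeros((1, 1, H, W))
    for r in range(rows):
        for c in range(cols):
            top = min(r * stride[0], max(H - ch, 0))
            left = min(c * stride[1], max(W - cw, 0))
            bottom, right = min(top + ch, H), min(left + cw, W)
            crop = image[:, :, top:bottom, left:right]
            out = model(crop)  # (1, num_classes, h, w)
            if out.shape[-2:] != crop.shape[-2:]:
                out = F.interpolate(out, size=crop.shape[-2:],
                                    mode='bilinear', align_corners=True)
            logits[:, :, top:bottom, left:right] += out
            count[:, :, top:bottom, left:right] += 1
    return logits / count  # average over the overlap counts
```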

So, my questions are:

  • Why do you use the whole image as is?
  • Have you checked the influence of this difference?

Thanks.
@sunke123
Member

If we used whole images to train the network, the batch size would be too small, e.g. 1 image/GPU, which has a negative influence on BN and makes training unstable.

I think DANet uses sliding inference during testing, not training.
We feed the whole image into the network for single-scale testing so that the comparison with other methods is fair.
You can try multi-scale testing if you want to use sliding inference.
Sliding inference does bring gains, but it also costs inference speed.
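(Roughly, multi-scale testing means something like the sketch below: run the network on several rescaled copies of the image, resize the logits back, and average them. This is only an illustrative sketch with a generic `model` and arbitrary scales, not the exact testing code of this repository.)

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def multi_scale_inference(model, image, scales=(0.5, 0.75, 1.0, 1.25, 1.5), num_classes=19):
    """Average per-pixel logits over several rescaled copies of the input."""
    _, _, H, W = image.shape
    avg_logits = image.new_zeros((1, num_classes, H, W))
    for s in scales:
        size = (int(H * s), int(W * s))
        scaled = F.interpolate(image, size=size, mode='bilinear', align_corners=False)
        out = model(scaled)                    # (1, num_classes, h', w')
        out = F.interpolate(out, size=(H, W),  # resize logits back to full resolution
                            mode='bilinear', align_corners=False)
        avg_logits += out
    return avg_logits / len(scales)
```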

@AtsukiOsanai
Author

Thank you for answering my question.

Maybe I did not express myself clearly.
I understand the importance of batch size in semantic segmentation
and keep in mind to set it to more than 8 when training.

You are correct about the DANet pipeline.
As DANet does, I train an FCN with random (769, 769) cropping,
and I measure scores for both sliding inference with (769, 769) patches
and whole-image inference at (1024, 2048).
With the sliding method, I get 70% mIoU on the Cityscapes dataset.
(Training ran for only 40 epochs, which leads to a lower score.)
However, whole-image inference achieves only 65% mIoU.
I can't understand why whole-image inference fails to produce good predictions.

Do you have any insight into this?
Does it depend on the network architecture?
I really hope to reduce my testing time.
Thanks.

@sunke123
Member

The sliding method is often used in image processing, such as image de-blocking and deblurring, and leads to better performance. It crops the image into many overlapping patches, which also increases the inference time.

HRNet with multi-scale testing (including the sliding process) improves mIoU by about 1%–1.5%. I have no idea why there is a 5% performance gap; it's too large. I'm not sure whether the architecture causes this problem. You can try another network, such as HRNet or PSPNet.

If you want to reduce testing time, you can concatenate the cropped image patches along the batch axis.
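(A rough sketch of this idea, assuming all crops have the same size so they can be stacked into one batch; `model` and the crop/stride values are placeholders.)

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def batched_crop_inference(model, image, crop_size=(769, 769), stride=(513, 513), num_classes=19):
    """Run all sliding-window crops in a single forward pass by stacking them
    along the batch dimension, then scatter the logits back onto the full image."""
    _, _, H, W = image.shape
    ch, cw = crop_size
    tops = sorted({min(t, H - ch) for t in range(0, H - ch + stride[0], stride[0])})
    lefts = sorted({min(l, W - cw) for l in range(0, W - cw + stride[1], stride[1])})
    boxes = [(t, l) for t in tops for l in lefts]
    crops = torch.cat([image[:, :, t:t + ch, l:l + cw] for (t, l) in boxes], dim=0)
    out = model(crops)  # (num_crops, num_classes, h, w)
    if out.shape[-2:] != (ch, cw):
        out = F.interpolate(out, size=(ch, cw), mode='bilinear', align_corners=False)
    logits = image.new_zeros((1, num_classes, H, W))
    count = image.new_zeros((1, 1, H, W))
    for i, (t, l) in enumerate(boxes):
        logits[:, :, t:t + ch, l:l + cw] += out[i:i + 1]
        count[:, :, t:t + ch, l:l + cw] += 1
    return logits / count
```

If memory is tight, the stacked crops can also be split into smaller chunks with `torch.split` and processed a few at a time.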

@AtsukiOsanai
Author

Accumulating images along the batch axis is a great idea for me.
This should only be used when the input images all have the same size, such as Cityscapes, right?

I compared my code with yours and found a difference.
I apply torch.nn.functional.interpolate with align_corners=True to control the image size,
whereas you use cv2.resize.
Maybe align_corners=False is equivalent to cv2.resize, so I will check whole-image inference again with align_corners=False.
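(As a quick sanity check, I plan to compare the two resizing paths with something like the sketch below; the shapes are arbitrary and this is only my own test script, not code from this repository.)

```python
import cv2
import numpy as np
import torch
import torch.nn.functional as F

x = np.random.rand(64, 128, 3).astype(np.float32)  # (H, W, C)
target_hw = (256, 512)                              # (height, width)

# OpenCV expects dsize as (width, height).
ref = cv2.resize(x, (target_hw[1], target_hw[0]), interpolation=cv2.INTER_LINEAR)

t = torch.from_numpy(x).permute(2, 0, 1).unsqueeze(0)  # (1, C, H, W)
for ac in (False, True):
    y = F.interpolate(t, size=target_hw, mode='bilinear', align_corners=ac)
    y = y.squeeze(0).permute(1, 2, 0).numpy()
    print(f"align_corners={ac}: max abs diff vs cv2.resize = {np.abs(y - ref).max():.6f}")
```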

After sharing my update, I will close this issue.
Thanks.
