Fix loading dataset #35

Open
wants to merge 6 commits into
base: main

@TrefoIV TrefoIV commented Dec 22, 2022

This PR fixes dataset loading, where images were discarded incorrectly (datasets.py#L87). The original code modifies the list that the for loop is iterating over, which leads to unwanted behavior.
Dataset loading now behaves slightly differently: it first loads all the annotation files and, for each one, loads the corresponding image file named in the annotation's filename tag.
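
The pitfall itself is a general Python one and can be reproduced outside the repository. Below is a minimal sketch, not the actual code at datasets.py#L87: the file names, the `annotated` set, and both function names are hypothetical and only illustrate why removing items from a list while looping over it gives wrong results.

```python
def filter_buggy(image_names, annotated):
    """Remove un-annotated images while iterating the same list (buggy)."""
    names = list(image_names)
    for name in names:
        if name not in annotated:
            # Removing an item shifts the later items left,
            # so the loop silently skips the element that follows.
            names.remove(name)
    return names


def filter_fixed(image_names, annotated):
    """Build a new list instead of mutating the one being iterated."""
    return [name for name in image_names if name in annotated]


# Hypothetical data: only a.jpg and d.jpg have annotations.
images = ["a.jpg", "b.jpg", "c.jpg", "d.jpg"]
annotated = {"a.jpg", "d.jpg"}

print(filter_buggy(images, annotated))  # ['a.jpg', 'c.jpg', 'd.jpg']
print(filter_fixed(images, annotated))  # ['a.jpg', 'd.jpg']
```

In the buggy version `c.jpg` is never checked, because removing `b.jpg` moves it into the position the loop has already passed.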

In addition, I've added a parameter to discard negative examples (images without bounding boxes/annotations) from the dataset. The default is to discard them, as in the original code. If negative examples are kept, a single-pixel bounding box is added for each one.
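
The annotation-first loading and the discard option could look roughly like the sketch below. It is an illustration under assumptions, not the code in this PR: it assumes Pascal VOC-style XML, the function names are hypothetical, and the coordinates of the single-pixel box `(0, 0, 1, 1)` are a guess since the comment does not specify them.

```python
import xml.etree.ElementTree as ET

# Assumed placeholder for a negative example: a one-pixel box at the origin.
SINGLE_PIXEL_BOX = (0, 0, 1, 1)


def parse_annotation(xml_text):
    """Return (filename, boxes) from a Pascal VOC-style annotation string."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        if bndbox is None:
            continue
        boxes.append(tuple(
            int(float(bndbox.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        ))
    return filename, boxes


def build_samples(xml_texts, discard_negative=True):
    """Start from the annotations and take the image name from each one."""
    samples = []
    for xml_text in xml_texts:
        filename, boxes = parse_annotation(xml_text)
        if not boxes:
            if discard_negative:
                continue  # default: drop images without annotations
            boxes = [SINGLE_PIXEL_BOX]  # keep the image with a dummy box
        samples.append((filename, boxes))
    return samples


POSITIVE = """<annotation><filename>cat.jpg</filename>
<object><name>cat</name><bndbox>
<xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax>
</bndbox></object></annotation>"""
NEGATIVE = "<annotation><filename>empty.jpg</filename></annotation>"

default_samples = build_samples([POSITIVE, NEGATIVE])
all_samples = build_samples([POSITIVE, NEGATIVE], discard_negative=False)
print(default_samples)
print(all_samples)
```

With the default setting only `cat.jpg` is returned; with `discard_negative=False`, `empty.jpg` is also returned and carries the placeholder box.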

@sovit-123
Owner

Hello @TrefoIV, thank you for your contribution. I will certainly take a closer look at your code.
In the meantime, I see that you have added code to handle negative examples. We already have code for this at datasets.py#L299, and I think it is still there after your changes.
Can you please let me know whether your code provides any benefit over the existing handling of negative examples?
I am open to suggestions.

@TrefoIV
Author

TrefoIV commented Jan 9, 2023

Hi, thank you for the answer, and sorry for my late reply.
I looked at the code at datasets.py#L299. I have to admit that I worked on an earlier version of your code, where this handling of negative examples was not present. I then applied my changes on this branch starting from your updated main branch, so I didn't notice that this piece of code had been added.
I was wondering why your code at datasets.py#L299 doesn't add a label for the "dummy" bounding box added for negative examples (the one with all zeros). I had labeled it "__background__" because I had trouble with it (using Faster RCNN_resnet50_fpn). However, I ran a test with your code and it seems to work well, so I removed the unnecessary code I had added.
I left intact the part that loads the images and labels, as well as the "discard_negative" parameter.

@sovit-123
Owner

@TrefoIV
Thanks for your contribution.
At first glance, your code looks pretty good. I will do a test run to double-check everything, but it may be 1 or 2 days before I can test it. If any errors or issues arise, I will let you know.

@sovit-123
Owner

@TrefoIV
I tested your branch on my local system, but I think there are still some issues with the code. I cannot point to the exact line in datasets.py that causes them, so I am laying out some points here that may help you debug:

  • In the main branch, the dataset preparation code does not discard files that have an empty bndbox in the XML file.
  • But if no XML file is present for an image, then the image is discarded.

Maybe you can create a dummy scenario to replicate this and debug.
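
One way to set up such a dummy scenario is sketched below. It is only an assumption about how the two points above translate into a test fixture: the file names, the helper names, the pairing of image and XML by file stem, and the reading of "empty bndbox" as an annotation without any object are all hypothetical and do not come from the repository.

```python
import tempfile
from pathlib import Path

# Annotation without any object, standing in for an "empty bndbox" file.
EMPTY_XML = "<annotation><filename>{name}</filename></annotation>"
BOX_XML = (
    "<annotation><filename>{name}</filename><object><name>cat</name>"
    "<bndbox><xmin>1</xmin><ymin>2</ymin><xmax>30</xmax><ymax>40</ymax>"
    "</bndbox></object></annotation>"
)


def make_dummy_dataset(root):
    """Create three placeholder images; only two get an XML annotation."""
    root = Path(root)
    for name in ("with_box.jpg", "empty_box.jpg", "no_xml.jpg"):
        (root / name).touch()
    (root / "with_box.xml").write_text(BOX_XML.format(name="with_box.jpg"))
    (root / "empty_box.xml").write_text(EMPTY_XML.format(name="empty_box.jpg"))


def expected_kept_images(root):
    """Images the behavior described above should keep: any with an XML file."""
    root = Path(root)
    return sorted(
        image.name
        for image in root.glob("*.jpg")
        if image.with_suffix(".xml").exists()
    )


with tempfile.TemporaryDirectory() as tmp:
    make_dummy_dataset(tmp)
    kept = expected_kept_images(tmp)

print(kept)  # ['empty_box.jpg', 'with_box.jpg']
```

Pointing the dataset class from the branch at a directory like this and comparing the images it returns against `kept` should show whether `no_xml.jpg` is dropped and `empty_box.jpg` is retained.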

2 participants