-
Notifications
You must be signed in to change notification settings - Fork 19.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make ImageDataGenerator generator repeatable #7613
Conversation
Following keras-team#3751. I ran more tests to see if ImageDataGenerator generator can transform images and masks in the same way. It can for NumpyArrayIterator but not for DirectoryIterator. So I added seeds to fix this and now it can transform images and masks together. I have unit tests to prove this, and I can add them to the PR if desired?
Could someone give this a review? I found a issue in keras_retinanet that would be fixed by this, so it's still an issue. It wont take long I just added |
I think these changes would need a unit test that demonstrates the problem that is fixed to have any chance of being merged. |
I don't think resetting the random number generator to the seed for every invocation of the RNG is a good idea. You're effectively reducing it to generate one random number and repeating it a lot of times. The proper way to solve the problem (at-least for keras-retinanet) I think is to split the generation and usage of random transformations into separate steps. Then the random transformation can be used multiple times on different things (like the image and the bounding boxes). |
@de-vri-es That's true, my fix is not ideal, and your proposal would be much cleaner. But it would take much longer and it seems like no one is keen to spend much time on this part of keras. Which is fair enough, perhaps it could be split into a separate package to shift the burden of keras maintainers. @ahundt In the original posts I offered unit tests, but I deleted them since there was no interest. I could look into writing them again, but is there any interest in merging this fix, or would it be a waste of my time? |
@wassname Yeah you're right it probably wouldn't be merged considering it repeats values and would thus not actually be random if a seed is specified. I think a solution like the one @de-vri-es suggested would be necessary or something like SegDataGenerator as I mentioned in the jaccard distance PR. For other readers' reference: SegDataGenerator from Keras-FCN transforms both an image and a mask. A usage example is available in the Keras-FCN's train.py file. |
I agree this was just a quick fix, sounds like we all want to leave it until there is a better solution. Yeah SegDataGenerator is pretty nice for segmentation, I've used it and found it easy to edit too. Might be worth adapting for keras_retinanet too, thanks for the suggestion. |
We want ImageDataGenerator to be repeatable so we can 1) have repeatable research 2) so we can transform a mask and image in parallel (like in the keras docs).
In PR #3751 I made it mostly repeatable but I recently ran more tests and found problems. It's not repeatable when using flow_from_directory, and it's not repeatable if the mask and image have different number of channels.
This PR fixes that by adding
np.random.seed(seed)
where needed.I have two unit tests I can add if desired?