Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make ImageDataGenerator generator repeatable #7613

Closed
wants to merge 8 commits into from

Conversation

wassname
Copy link
Contributor

@wassname wassname commented Aug 12, 2017

We want ImageDataGenerator to be repeatable so we can 1) have repeatable research 2) so we can transform a mask and image in parallel (like in the keras docs).

In PR #3751 I made it mostly repeatable but I recently ran more tests and found problems. It's not repeatable when using flow_from_directory, and it's not repeatable if the mask and image have different number of channels.

This PR fixes that by adding np.random.seed(seed) where needed.

I have two unit tests I can add if desired?

Following keras-team#3751. I ran more tests to see if ImageDataGenerator generator can transform images and masks in the same way. It can for NumpyArrayIterator but not for DirectoryIterator. 

So I added seeds to fix this and now it can transform images and masks together. I have unit tests to prove this, and I can add them to the PR if desired?
@wassname wassname closed this Aug 12, 2017
@wassname wassname reopened this Aug 12, 2017
@wassname
Copy link
Contributor Author

wassname commented Dec 4, 2017

Could someone give this a review? I found a issue in keras_retinanet that would be fixed by this, so it's still an issue. It wont take long I just added np.random.seed(seed) where needed.

@ahundt
Copy link
Contributor

ahundt commented Dec 4, 2017

I think these changes would need a unit test that demonstrates the problem that is fixed to have any chance of being merged.

@de-vri-es
Copy link
Contributor

de-vri-es commented Dec 5, 2017

I don't think resetting the random number generator to the seed for every invocation of the RNG is a good idea. You're effectively reducing it to generate one random number and repeating it a lot of times.

The proper way to solve the problem (at-least for keras-retinanet) I think is to split the generation and usage of random transformations into separate steps. Then the random transformation can be used multiple times on different things (like the image and the bounding boxes).

@wassname
Copy link
Contributor Author

wassname commented Dec 5, 2017

@de-vri-es That's true, my fix is not ideal, and your proposal would be much cleaner. But it would take much longer and it seems like no one is keen to spend much time on this part of keras. Which is fair enough, perhaps it could be split into a separate package to shift the burden of keras maintainers.

@ahundt In the original posts I offered unit tests, but I deleted them since there was no interest. I could look into writing them again, but is there any interest in merging this fix, or would it be a waste of my time?

@ahundt
Copy link
Contributor

ahundt commented Dec 5, 2017

@wassname Yeah you're right it probably wouldn't be merged considering it repeats values and would thus not actually be random if a seed is specified.

I think a solution like the one @de-vri-es suggested would be necessary or something like SegDataGenerator as I mentioned in the jaccard distance PR.

For other readers' reference: SegDataGenerator from Keras-FCN transforms both an image and a mask. A usage example is available in the Keras-FCN's train.py file.

@wassname
Copy link
Contributor Author

wassname commented Dec 5, 2017

I agree this was just a quick fix, sounds like we all want to leave it until there is a better solution.

Yeah SegDataGenerator is pretty nice for segmentation, I've used it and found it easy to edit too. Might be worth adapting for keras_retinanet too, thanks for the suggestion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants