clarification #3
Comments
@CharlesShang I cannot find any particular issues with the repo. If you need help in a particular direction, let me know.
I must have misunderstood some details in your paper.
@parhartanvir
Sorry for the delayed reply, I just got back from a vacation.
@CharlesShang: Thanks for your effort to implement this very nice work!
As far as I understand, the mask branch predicts a binary segmentation mask for each object class, so there is no need for a background mask.
@CharlesShang, I believe there should not be a background class for the mask. That is because there are K binary masks, one for each of the K classes. Having a background class for the Faster R-CNN / region proposal part makes sense, but since the mask loss is not computed across classes, a background mask is not needed. As for training, I think what you are saying is right, i.e. forward pass the images, add/average the gradients, then do the backward pass. I apologize, I haven't gone through the FPN paper yet. I'll go through it and see if I can help.
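A minimal NumPy sketch of that per-class mask loss, assuming (N, K, H, W) mask logits and a ground-truth class index per RoI; the function name and shapes are illustrative, not the repo's actual API:

```python
import numpy as np

def mask_loss(mask_logits, gt_masks, gt_classes):
    """Per-RoI mask loss: K binary masks are predicted per RoI (one per
    foreground class, no background mask), and the sigmoid cross-entropy
    is computed only on the mask of the RoI's ground-truth class, so the
    classes do not compete with each other.

    mask_logits: (N, K, H, W) raw logits, one HxW mask per class
    gt_masks:    (N, H, W)    binary ground-truth masks in {0, 1}
    gt_classes:  (N,)         ground-truth class index per RoI, in [0, K)
    """
    n = mask_logits.shape[0]
    # pick the predicted mask of the ground-truth class for each RoI
    selected = mask_logits[np.arange(n), gt_classes]            # (N, H, W)
    # numerically stable element-wise sigmoid cross-entropy with logits
    loss = (np.maximum(selected, 0) - selected * gt_masks
            + np.log1p(np.exp(-np.abs(selected))))
    return loss.mean()
```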
@parhartanvir For consistency, I'll adopt this.
I have gone over the FPN paper. I think just one RPN is OK: the anchor_target_layer takes P2 through P5 as inputs, generates anchors for each level, merges them together, and randomly samples from the merged set; a normal proposal_layer and proposal_target_layer follow. For the RoI heads, assign each RoI of width w and height h (on the input image to the network) to the level Pk of the feature pyramid by eqn. (1) of the paper. Using four RPNs followed by four heads is not elegant and is time-consuming, and it is hard to trade off the four parts.
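A minimal sketch of that RoI-to-level assignment (eqn. (1) of the FPN paper, k = floor(k0 + log2(sqrt(wh)/224))); the function name and the P2..P5 bounds are assumptions for illustration:

```python
import numpy as np

def assign_rois_to_levels(rois, k0=4, k_min=2, k_max=5, canonical=224.0):
    """Assign each RoI (x1, y1, x2, y2, in input-image coordinates) to a
    pyramid level P_k via k = floor(k0 + log2(sqrt(w * h) / 224)),
    clipped to the available levels P2..P5.

    rois: (N, 4) array of boxes on the input image.
    Returns an (N,) array of level indices in [k_min, k_max].
    """
    w = rois[:, 2] - rois[:, 0] + 1.0
    h = rois[:, 3] - rois[:, 1] + 1.0
    k = np.floor(k0 + np.log2(np.sqrt(w * h) / canonical))
    return np.clip(k, k_min, k_max).astype(np.int32)
```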
Hi Charles,
Thank you for your interest and implementing Mask R-CNN!
I would like to clarify some descriptions in your Readme (which may suggest a misunderstanding of our work):
"The original work involves two stages, a pyramid Faster-RCNN for object detection and another network (with the same structure) for instance level segmentation."
This is not true. In our original work, object detection and instance segmentation are done in one stage. They run in parallel, as two tasks of a multi-task learning network.
I hope this will ease your effort of a correct reproduction.
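For illustration only, a hypothetical sketch of the parallel, single-stage head structure described above (all names and shapes are made up, and the branches are stubbed out; this is not the paper's or the repo's code):

```python
import numpy as np

def box_branch(roi_features, num_classes=81):
    # stand-in for the detection head: class scores and box deltas per RoI
    n = roi_features.shape[0]
    return np.zeros((n, num_classes)), np.zeros((n, num_classes * 4))

def mask_branch(roi_features, num_classes=81, mask_size=28):
    # stand-in for the mask head: one binary mask (logits) per class per RoI
    n = roi_features.shape[0]
    return np.zeros((n, num_classes, mask_size, mask_size))

def mask_rcnn_heads(roi_features):
    # Both branches consume the same RoI features and run in parallel;
    # detection is not a separate first network that feeds segmentation.
    cls_scores, box_deltas = box_branch(roi_features)
    mask_logits = mask_branch(roi_features)
    return cls_scores, box_deltas, mask_logits

# The multi-task loss is the sum L = L_cls + L_box + L_mask,
# optimized jointly in a single stage.
```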