Extract features from bounding boxes #665
Conversation
Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign up at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need the corporate CLA signed. If you have received this in error or have any questions, please contact us at [email protected]. Thanks!
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!
Hi @TheShadow29 Thanks for the PR.
@botcs sounds good. Actually, #164 does two things at once (if I have understood correctly): (i) during the forward pass, it retains the image proposals as well as the image features. This PR instead (ii) requires the ground-truth boxes to be given first and then uses those boxes to retrieve the image features. There are two advantages to this approach. The only downside is that (ii) would take a bit more time to get the features (I don't have timing comparisons, but I would guess around 1.5x slower). However, this is usually a one-time process, so getting better features might be worth the extra processing time. Let me know what you think. Thank you for your patience.
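Roughly, a minimal sketch of approach (ii) — not the exact utility functions in this PR; `extract_box_features` is an illustrative name, and I assume a `GeneralizedRCNN` model from this repo with boxes already in the transformed image's coordinates:

```python
import torch
from maskrcnn_benchmark.structures.image_list import to_image_list
from maskrcnn_benchmark.structures.bounding_box import BoxList

@torch.no_grad()
def extract_box_features(model, image_tensor, gt_boxes):
    """image_tensor: a transformed CHW image tensor; gt_boxes: [N, 4] xyxy boxes
    in the transformed image's coordinates, on the same device as the model."""
    model.eval()
    images = to_image_list(image_tensor, size_divisible=32)  # pad for FPN strides
    features = model.backbone(images.tensors)                # FPN feature maps
    h, w = images.image_sizes[0]                             # stored as (height, width)
    proposals = [BoxList(gt_boxes, (w, h), mode="xyxy")]     # BoxList expects (w, h)
    # Pool RoI features and run the box head's FC layers
    # (1024-d per box for the R-50 FPN config).
    return model.roi_heads.box.feature_extractor(features, proposals)
```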
Thanks for your effort. But when I extracted the features using bboxes (shape [13, 4]), I got features of shape [15, 1024], so the number of feature rows does not match the number of boxes.
@kangkang59812 Thanks for checking it out. Which network are you using? I think I tested with the ResNet-50 FPN Mask R-CNN architecture. My guess is that some changes were made to the repo (the PR was made quite some time back) and this PR would need to be updated.
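A quick shape check along these lines (reusing the hypothetical `extract_box_features` from the sketch above) would catch this kind of mismatch:

```python
feats = extract_box_features(model, image_tensor, gt_boxes)
assert feats.shape[0] == gt_boxes.shape[0], (
    f"got {feats.shape[0]} feature rows for {gt_boxes.shape[0]} boxes")
```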
@TheShadow29 |
@kangkang59812 Awesome. Thanks for confirming.
Hi @TheShadow29, I used your implementation to do retrieval, but it gets worse performance than using a ResNet-50 pre-trained on ImageNet (without further training) to extract features after detection. Maybe the bbox has to be resized according to the image padding? Edit: it seems it is not necessary to pad the boxes, as said in https://github.com/facebookresearch/maskrcnn-benchmark/issues/965#issuecomment-510926086, so I don't know why the performance is worse.
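For reference, a minimal sketch of the coordinate handling in question: padding does not move box coordinates, but boxes given in original-image coordinates would still need rescaling to the resized input the network sees. `raw_boxes`, `orig_size`, and `new_size` are assumed placeholders, with sizes as (width, height):

```python
from maskrcnn_benchmark.structures.bounding_box import BoxList

boxes = BoxList(raw_boxes, orig_size, mode="xyxy")  # boxes on the original image
boxes = boxes.resize(new_size)  # rescale to the transformed image; padding needs no fix
```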
@simaiden Could you briefly explain your retrieval setup? It doesn't seem to be the same as object detection.
I use Mask R-CNN to detect clothes in an image, and after that I get a feature vector from the cropped region. The region becomes an input to a ResNet-50, and then I do global average pooling on some layer to get the feature vector. With this approach I get good results, but not with the RoI features from your implementation.
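A minimal sketch of that baseline, assuming torchvision's ImageNet-pretrained ResNet-50 and a box in pixel coordinates on the original PIL image; replacing `fc` with an identity leaves the global-average-pooled 2048-d vector:

```python
import torch
import torchvision.transforms as T
from torchvision.models import resnet50

backbone = resnet50(pretrained=True)
backbone.fc = torch.nn.Identity()  # keep the 2048-d global-average-pooled vector
backbone.eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def crop_feature(pil_image, box):
    x1, y1, x2, y2 = box
    crop = pil_image.crop((x1, y1, x2, y2))           # cut out the detected region
    return backbone(preprocess(crop).unsqueeze(0)).squeeze(0)  # shape [2048]
```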
@simaiden Could you verify it for object detection on COCO? There might be a few more things happening under the hood in your use case.
Hi. First, thanks for the amazing repository.
Features extracted from a detection network are often used in other tasks (like VQA). The code shows how to extract the features given the bounding boxes. For now, I have just added some utility functions to demo/predictor.py. This possibly solves #164 with minor changes. Currently, I am not sure how to test whether everything is correct. A sanity check I have done is to re-classify the extracted boxes, and the results seem to be consistent.
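A sketch of that sanity check, assuming the features come from the box head's feature extractor under the R-50 FPN configuration; `reclassify` is an illustrative name, not a function in this PR:

```python
import torch

@torch.no_grad()
def reclassify(model, box_features):
    # Run the box head's classifier on the extracted per-box features and
    # return the argmax class per box, to compare against the original labels.
    class_logits, _ = model.roi_heads.box.predictor(box_features)
    return class_logits.softmax(dim=-1).argmax(dim=-1)
```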
Thanks