When inquiring about the relative positional relationship of items, the returned results are always incorrect. #21

Open
zmf2022 opened this issue Apr 1, 2025 · 7 comments

Comments

@zmf2022

zmf2022 commented Apr 1, 2025

No description provided.

@RussRobin
Collaborator

Hi @zmf2022,

Thank you for your interest in our work. Could you share the RGB image, depth map, and text prompt you used so we can assist you further? Greatly appreciated!

@zmf2022
Author

zmf2022 commented Apr 1, 2025

Image

Image

@zmf2022
Author

zmf2022 commented Apr 1, 2025

The question: "orange and apple, which one is closest to me?"

@RussRobin
Collaborator

Let's first test whether the model can recognize the objects. Can you try these three questions in sequence?
- What is the bounding box of the object: orange?
- What is the bounding box of the object: apple?
- Which is closer to the camera, orange or apple?
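
The three-question check above can be generated for any pair of objects. This is a minimal sketch: `build_probe_questions` is a hypothetical helper, not part of SpatialBot, and it only builds the prompt strings. How they are sent to the model depends on your own inference script.

```python
from typing import List


def build_probe_questions(first: str, second: str) -> List[str]:
    """Return the probing sequence: locate each object, then compare depth.

    Hypothetical helper for illustration; it only formats strings.
    """
    return [
        f"What is the bounding box of the object: {first}?",
        f"What is the bounding box of the object: {second}?",
        f"Which is closer to the camera, {first} or {second}?",
    ]


questions = build_probe_questions("orange", "apple")
for question in questions:
    print(question)
```

If the first two answers give implausible boxes, the failure is likely in object recognition rather than in depth reasoning.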

@zmf2022
Author

zmf2022 commented Apr 1, 2025

OK, I will try. Thanks!

@ThePassedWind

> Let's first test whether the model can recognize the objects. Can you try these three questions in sequence?
> - What is the bounding box of the object: orange?
> - What is the bounding box of the object: apple?
> - Which is closer to the camera, orange or apple?

Can I input masks and semantic labels instead of asking questions like "What is the bounding box of the object?"

@RussRobin
Collaborator

You can tell the model the bounding boxes of objects you are interested in.

SpatialBot may have failed on your original question for two reasons:

  1. It may not recognize the objects because they are small, so adding bounding boxes or masks would help.
  2. It may confuse 'me' with the people in your image. If you replace 'me' with 'camera', SpatialBot may be able to answer correctly.

Hope it makes sense to you!
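
Both suggestions amount to rewriting the prompt before it reaches the model. The sketch below is only an illustration: `rewrite_viewer_reference` and `add_box_hints` are hypothetical helpers, and the `[x1, y1, x2, y2]` hint wording is an assumption, not a format confirmed in this thread. Check what coordinate format your SpatialBot setup expects.

```python
import re
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) layout


def rewrite_viewer_reference(question: str) -> str:
    """Replace the standalone word 'me' with 'the camera'."""
    return re.sub(r"\bme\b", "the camera", question, flags=re.IGNORECASE)


def add_box_hints(question: str, boxes: Dict[str, Box]) -> str:
    """Prepend one bounding-box hint per object (hypothetical wording)."""
    hints = "; ".join(
        f"{name} is at bounding box [{x1}, {y1}, {x2}, {y2}]"
        for name, (x1, y1, x2, y2) in boxes.items()
    )
    return f"{hints}. {question}" if hints else question


original = "orange and apple, which one is closest to me?"
rewritten = rewrite_viewer_reference(original)
# The box values here are made-up placeholders, not taken from the images.
prompt = add_box_hints(rewritten, {"orange": (10, 20, 30, 40)})
print(prompt)
```

The word-boundary pattern leaves words such as "some" or "camera" untouched, so only the standalone pronoun is replaced.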

3 participants