
feat: add camera vision to look around #84

Open
flov wants to merge 2 commits into Thokoop:main from flov:look-around

Conversation

@flov commented Mar 13, 2026

This PR adds camera support to the billy-b-assistant. Right now it can look around and describe what it sees.
GPTARS inspired me to try this; in the future I would like to get Billy to play a round of chess with me via the camera.

Summary

  • Adds look_around tool that lets Billy take a photo and describe what he sees using GPT-4o-mini
    vision
  • Billy has no innate visual capability — the tool is the only way he can describe his surroundings,
    enforced via system prompt
  • Camera capture supports Pi Camera Module (via picamera2), USB webcams, and MacBook FaceTime camera
    (via OpenCV) for dev/mockfish use
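Since the tool takes no arguments and just captures the current view, its declaration to the model is presumably a zero-parameter function-calling schema. A hypothetical sketch (the actual schema registered in base_tools may differ):

```python
# Hypothetical sketch of the look_around tool declaration for the OpenAI
# function-calling interface; names and description text are illustrative.
LOOK_AROUND_TOOL = {
    "type": "function",
    "name": "look_around",
    "description": (
        "Take a photo with the camera and describe what is visible. "
        "This is Billy's only source of visual information."
    ),
    "parameters": {
        "type": "object",
        "properties": {},  # no arguments: the tool always captures the current view
        "required": [],
    },
}
```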

New files

  • core/camera.py — frame capture (picamera2 → OpenCV fallback) + GPT-4o-mini vision API call
  • test/list-cameras.py — utility to enumerate camera devices and pick the right CAMERA_DEVICE
    index
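The picamera2 → OpenCV fallback described above could be selected roughly like this (a sketch under the PR's stated fallback order; the actual core/camera.py may be structured differently):

```python
def pick_backend():
    """Return the first available capture backend, in the PR's fallback order."""
    try:
        import picamera2  # noqa: F401 -- Pi Camera Module path
        return "picamera2"
    except ImportError:
        pass
    try:
        import cv2  # noqa: F401 -- USB webcam / MacBook FaceTime path
        return "opencv"
    except ImportError:
        return None  # no capture backend installed
```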

Configuration

CAMERA_ENABLED=true   # default: false
CAMERA_DEVICE=0       # camera index, use test/list-cameras.py to find the right one
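One way these two variables might be parsed (a hypothetical helper; the project may read its .env differently):

```python
def read_camera_config(env):
    """Parse CAMERA_ENABLED / CAMERA_DEVICE from an environment mapping.

    Pass os.environ in production; a plain dict works for testing.
    """
    enabled = env.get("CAMERA_ENABLED", "false").strip().lower() == "true"
    device = int(env.get("CAMERA_DEVICE", "0"))
    return enabled, device
```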

Test plan

- Run python test/list-cameras.py to confirm cameras are detected
- Set CAMERA_ENABLED=true and CAMERA_DEVICE=<correct index> in .env
- Start a session and ask "what do you see?" — Billy should call look_around and describe the scene
- Confirm Billy does NOT hallucinate a scene when camera is disabled (CAMERA_ENABLED=false)
- Test on Raspberry Pi with Pi Camera Module (picamera2 path)
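The enumeration step presumably probes a range of device indices and reports which ones open. A sketch of that idea with an injectable probe so it can be exercised without hardware (the real test/list-cameras.py may differ):

```python
def find_cameras(max_index=5, probe=None):
    """Return indices in [0, max_index) that open as camera devices.

    `probe` defaults to an OpenCV check; inject a fake callable to test
    the scan logic without any camera attached.
    """
    if probe is None:
        import cv2

        def probe(index):
            cap = cv2.VideoCapture(index)
            ok = cap.isOpened()
            cap.release()
            return ok

    return [i for i in range(max_index) if probe(i)]
```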

@Thokoop (Owner) commented Mar 13, 2026

Hi @flov , thanks for your contribution.

My billy hangs on the toilet wall, so I didn't want to add a camera myself, haha. But it would be a fun optional upgrade!

Could you update your branch by merging my recent changes from main first? I have done some refactoring recently that split up the session.py file.
Please also set the destination branch to my dev branch; that way I can first merge it to test before releasing it as a main version.

I will definitely try it out, I will order a picamera but in the meantime also test with a usb cam.

I think we can also hook it directly into the 'normal' realtime session, since the gpt-realtime models support image inputs as well:

import json  # needed for json.dumps below

# "ws" is assumed to be an already-open WebSocket connection to the realtime API
base64_image = "<BASE64_IMAGE_BYTES>"

message_event = {
    "type": "conversation.item.create",
    "item": {
        "type": "message",
        "role": "user",
        "content": [
            {
                "type": "input_text",
                "text": "Tell me what you see in this image."
            },
            {
                "type": "input_image",
                "image_url": f"data:image/jpeg;base64,{base64_image}",
                "detail": "high",
            }
        ],
    },
}

ws.send(json.dumps(message_event))

response_event = {
    "type": "response.create",
    "response": {
        "output_modalities": ["text", "audio"]
    },
}

ws.send(json.dumps(response_event))
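To feed a captured frame into the snippet above, the raw JPEG bytes have to be base64-encoded into a data URL. A minimal sketch (hypothetical helper name, stdlib only):

```python
import base64


def to_image_url(jpeg_bytes):
    """Wrap raw JPEG bytes in the data-URL form used by the input_image content part."""
    encoded = base64.b64encode(jpeg_bytes).decode("ascii")
    return f"data:image/jpeg;base64,{encoded}"
```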

@flov (Author) commented Mar 14, 2026

haha, yes, having a fish watch you while you're on the toilet is indeed a bit creepy 😂
I will refactor the code and rebase it onto your dev branch.
I've been testing it on my MacBook Pro with MOCKFISH=true, and it works incredibly well: he can tell me exactly what I'm wearing, what's visible in the background, etc.
But my Billy is still way too friendly. A friendly fish is not funny. I want a fish that roasts me based on my looks and makes sarcastic jokes about them.

@flov (Author) commented Mar 14, 2026

btw, I've tried to play chess with him using the python-chess library, but it didn't work so well. He kept making illegal chess moves :D. I don't think ChatGPT is good at playing chess. It should be possible by integrating Stockfish, but that's not trivial.

flov added 2 commits March 14, 2026 19:58
Billy can now take a photo and describe what he sees. Adds:
- core/camera.py: capture via picamera2 (Pi) or OpenCV (Mac/USB)
- look_around tool registered in base_tools and handled in session.py
- CAMERA_ENABLED / CAMERA_DEVICE config vars
- test/list-cameras.py helper to identify camera device indices
- README section documenting setup and usage