TSR

HackIllinois 2025 submission

The Idea

What if we could build a robot that, given any natural-language query, could detect and retrieve the matching object?

Implementation

We use MobileCLIP, Apple's CLIP-style image-text model optimized for mobile hardware such as our robot, to encode each text query. This runs in parallel with a video buffer, from which we encode frames at intervals using the same model. We compute the cosine similarity between each frame embedding and the embeddings of several input text queries, and if the frame matches one of them, we use computer vision to navigate to the object.
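The matching step described above can be sketched in plain Python. This is a minimal illustration, not the code in this repo: the embeddings are toy 3-dimensional vectors standing in for real MobileCLIP outputs, and the function names and the `threshold` value of 0.25 are assumptions chosen for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def best_query_match(image_embedding, text_embeddings, threshold=0.25):
    """Return (query, score) for the best-scoring query, or None.

    text_embeddings maps each query string to its embedding.
    threshold is a hypothetical cutoff; a real value needs tuning.
    """
    best_query, best_score = None, -1.0
    for query, text_embedding in text_embeddings.items():
        score = cosine_similarity(image_embedding, text_embedding)
        if score > best_score:
            best_query, best_score = query, score
    if best_query is None or best_score < threshold:
        return None
    return best_query, best_score


# Toy vectors standing in for encoded text queries and one encoded frame.
text_embeddings = {
    "a red ball": [1.0, 0.0, 0.0],
    "a water bottle": [0.0, 1.0, 0.0],
}
frame_embedding = [0.9, 0.1, 0.0]
match = best_query_match(frame_embedding, text_embeddings)
```

Here `match` holds the winning query and its score, or `None` when no query clears the cutoff, which is the point where the robot would keep scanning instead of driving.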

Future Steps

The pathfinding algorithm to the object is very bad. Maybe we can take the image encoding and compute a similarity score for each MobileNet bounding box, then run something like SORT on the object's bounding box
Voice commands
Using hardware acceleration for CLIP, need to figure out how to do this with smth like
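The bounding-box idea in the first future step could look roughly like the sketch below. It is untested on the robot and rests on several assumptions: some detector has already produced the boxes, each box crop has already been encoded into an embedding, and the helper names (`pick_target_box`, `horizontal_offset`) are hypothetical. Tracking with SORT is not shown.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_target_box(query_embedding, boxes, crop_embeddings):
    """Return (box, score) for the crop most similar to the query.

    boxes: list of (x_min, y_min, x_max, y_max) in pixels.
    crop_embeddings: one embedding per box, in the same order.
    Returns None when there are no boxes.
    """
    if not boxes:
        return None
    scores = [cosine_similarity(query_embedding, e) for e in crop_embeddings]
    best_index = max(range(len(boxes)), key=lambda i: scores[i])
    return boxes[best_index], scores[best_index]


def horizontal_offset(box, frame_width):
    """Box center relative to the frame center, normalized to [-1, 1].

    Negative means the object is left of center, positive means right.
    """
    x_min, _, x_max, _ = box
    center_x = (x_min + x_max) / 2.0
    half_width = frame_width / 2.0
    return (center_x - half_width) / half_width


# Toy data: two detector boxes and 2-dimensional stand-in embeddings.
boxes = [(0, 0, 100, 100), (400, 100, 600, 300)]
crop_embeddings = [[0.0, 1.0], [1.0, 0.0]]
query_embedding = [1.0, 0.0]

target_box, target_score = pick_target_box(query_embedding, boxes, crop_embeddings)
offset = horizontal_offset(target_box, frame_width=640)
```

The sign and size of `offset` could feed a simple steering rule, which is one possible replacement for the current pathfinding.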

How to run

Set up a virtual environment and activate it

/ml-mobileclip/demo.py for the object detector demo
/ml-mobileclip/robot-movement.py fo

Hardware

Runs on a Raspberry Pi 4 with the Freenove Car Motor Shield (I2C). Link to full kit here

AdityaJain1030/TSR
