  1. We’re building a tiny educational AI browser agent in Node.js. It should stay minimal and easy to explain. The agent will observe the page with screenshots, use OpenAI's computer use tool (via the Responses API) to decide the next action, and use Playwright to execute it. Let’s start by creating an empty npm project.
  2. Add the dependencies we need: playwright, openai, dotenv, and minimist.
  3. Create browser.js and add a helper that launches a visible Chromium browser and opens a new page. Pick a fixed window size, because we’ll want to use the same size later with the OpenAI computer tool. (See the browser.js sketch after this list.)
  4. In browser.js, add a helper that takes a screenshot of the current page and returns it as base64.
  5. Create index.js. Parse a --url flag with minimist, default it to https://bank-demo.loadmill.com/, launch the browser, and open that URL right away. (See the index.js sketch after this list.)
  6. In index.js, add a tiny terminal prompt loop with readline so I can type tasks until I enter exit.
  7. Create actions.js with one function that executes a single action on the Playwright page. Start with click, type, keypress, and scroll. (See the actions.js sketch after this list.)
  8. Connect index.js to actions.js with a temporary manual mode. Let me type simple action commands in the terminal, parse them, and execute them on the page so we can confirm browser control works before adding the model. At the end, print a very short note with a few example commands I can try. (See the command-parsing sketch after this list.)
  9. Create openai.js, load the OpenAI API key from .env, and create a .env file with a placeholder OPENAI_API_KEY= value so I can fill it in before the next step. Keep this step minimal so I can confirm the setup before we call the API.
  10. In openai.js, add a small helper that calls the OpenAI Responses API with the computer tool. Send the current task and the latest browser screenshot, use the same screen size as browser.js, and include a short instruction telling the model to keep acting until the task is actually complete and only return done at the end. Follow the official OpenAI computer-use guide for the request shape. (See the openai.js sketch after this list.)
  11. Update index.js so one user task runs a single observe-decide-act step: take a screenshot, send it to openai.js, print the returned action, and execute it. If the model returns a normal message instead of an action, print that clearly too.
  12. Turn that into a loop. After each action, take a fresh screenshot and send it back so the model can continue from the new page state. If the model returns multiple actions in one step, execute them in order. Stop when the model returns done or when you hit a small step limit. (See the agent-loop sketch after this list.)
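
The sketches below are rough illustrations of what each group of steps might produce, not the project's actual code; file names, helper names, and the 1280x800 viewport are assumptions. First, a minimal browser.js for steps 3-4, assuming CommonJS modules (the npm default):

```js
// browser.js - launch a visible Chromium window and expose a base64 screenshot helper.
const { chromium } = require('playwright');

// Fixed viewport so the same dimensions can be reused with the OpenAI computer tool later.
const WIDTH = 1280;
const HEIGHT = 800;

async function launchBrowser() {
  const browser = await chromium.launch({ headless: false });
  const page = await browser.newPage({ viewport: { width: WIDTH, height: HEIGHT } });
  return { browser, page };
}

async function screenshotBase64(page) {
  const buffer = await page.screenshot();
  return buffer.toString('base64');
}

module.exports = { launchBrowser, screenshotBase64, WIDTH, HEIGHT };
```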
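
For steps 5-6, a sketch of the index.js entry point; the task> prompt text and the names imported from the browser.js sketch are assumptions, and readline/promises assumes a recent Node version (18+):

```js
// index.js - parse --url, open the page, then read tasks from the terminal until "exit".
const minimist = require('minimist');
const readline = require('readline/promises');
const { launchBrowser } = require('./browser');

async function main() {
  const args = minimist(process.argv.slice(2));
  const url = args.url || 'https://bank-demo.loadmill.com/';

  const { browser, page } = await launchBrowser();
  await page.goto(url);

  const rl = readline.createInterface({ input: process.stdin, output: process.stdout });
  while (true) {
    const task = (await rl.question('task> ')).trim();
    if (task === 'exit') break;
    console.log(`Got task: ${task}`); // later steps replace this with the agent loop
  }

  rl.close();
  await browser.close();
}

main();
```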
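
For step 7, a sketch of actions.js. The action field names (x, y, text, keys, scroll_x, scroll_y) mirror the action shapes described in the OpenAI computer-use guide, but verify the exact format against the guide:

```js
// actions.js - execute one action object on the Playwright page.
// The model tends to report keys in uppercase; map the common ones to Playwright key names.
const KEY_MAP = { ENTER: 'Enter', TAB: 'Tab', ESC: 'Escape', BACKSPACE: 'Backspace', SPACE: ' ' };

async function executeAction(page, action) {
  switch (action.type) {
    case 'click':
      await page.mouse.click(action.x, action.y, { button: action.button || 'left' });
      break;
    case 'type':
      await page.keyboard.type(action.text);
      break;
    case 'keypress':
      for (const key of action.keys || []) {
        await page.keyboard.press(KEY_MAP[key.toUpperCase()] || key);
      }
      break;
    case 'scroll':
      await page.mouse.move(action.x, action.y);
      await page.mouse.wheel(action.scroll_x || 0, action.scroll_y || 0);
      break;
    default:
      console.warn(`Unsupported action type: ${action.type}`);
  }
}

module.exports = { executeAction };
```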
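
For step 8's temporary manual mode, one way to turn terminal commands into those action objects; the command syntax here is invented for illustration:

```js
// Manual mode: turn a line like "click 200 300" into an action object for executeAction.
function parseManualCommand(line) {
  const [cmd, ...rest] = line.trim().split(/\s+/);
  switch (cmd) {
    case 'click':
      return { type: 'click', x: Number(rest[0]), y: Number(rest[1]) };
    case 'type':
      return { type: 'type', text: rest.join(' ') };
    case 'press':
      return { type: 'keypress', keys: rest };
    case 'scroll':
      return { type: 'scroll', x: 0, y: 0, scroll_x: Number(rest[0]) || 0, scroll_y: Number(rest[1]) || 0 };
    default:
      return null; // unknown command; later this is where the model takes over
  }
}

module.exports = { parseManualCommand };
```

Example commands to print at the end of the run: click 200 300, type hello, press ENTER, scroll 0 400.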
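
For steps 9-10, a sketch of openai.js. The request shape (the computer_use_preview tool, input_text/input_image content parts, previous_response_id, computer_call_output, and truncation: 'auto') follows the official computer-use guide as I understand it, but treat the exact field names as something to check against that guide; WIDTH and HEIGHT come from the browser.js sketch above:

```js
// openai.js - call the Responses API with the computer tool.
require('dotenv').config();
const OpenAI = require('openai');
const { WIDTH, HEIGHT } = require('./browser');

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const COMPUTER_TOOL = {
  type: 'computer_use_preview',
  display_width: WIDTH,
  display_height: HEIGHT,
  environment: 'browser',
};

const INSTRUCTIONS =
  'Keep acting until the task is actually complete; only say you are done at the very end.';

// First call for a task: send the task text plus the current screenshot.
async function startTask(task, screenshotBase64) {
  return client.responses.create({
    model: 'computer-use-preview',
    tools: [COMPUTER_TOOL],
    input: [
      {
        role: 'user',
        content: [
          { type: 'input_text', text: `${INSTRUCTIONS}\n\nTask: ${task}` },
          { type: 'input_image', image_url: `data:image/png;base64,${screenshotBase64}` },
        ],
      },
    ],
    truncation: 'auto',
  });
}

// Follow-up call after executing an action: return a fresh screenshot for that computer_call.
async function sendScreenshot(previousResponseId, callId, screenshotBase64) {
  return client.responses.create({
    model: 'computer-use-preview',
    previous_response_id: previousResponseId,
    tools: [COMPUTER_TOOL],
    input: [
      {
        call_id: callId,
        type: 'computer_call_output',
        output: {
          type: 'computer_screenshot',
          image_url: `data:image/png;base64,${screenshotBase64}`,
        },
      },
    ],
    truncation: 'auto',
  });
}

module.exports = { startTask, sendScreenshot };
```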
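
Finally, for steps 11-12, a sketch of the observe-decide-act loop, called from the readline loop where the placeholder console.log was. All helper names come from the sketches above and are assumptions:

```js
// Agent loop in index.js: observe, decide, act, repeat until done or a step limit.
const { screenshotBase64 } = require('./browser');
const { executeAction } = require('./actions');
const { startTask, sendScreenshot } = require('./openai');

const MAX_STEPS = 15; // small safety limit

async function runTask(page, task) {
  let response = await startTask(task, await screenshotBase64(page));

  for (let step = 0; step < MAX_STEPS; step++) {
    // Print any plain-text messages from the model.
    for (const item of response.output) {
      if (item.type === 'message') {
        console.log('model:', item.content.map((part) => part.text).join(' '));
      }
    }

    const calls = response.output.filter((item) => item.type === 'computer_call');
    if (calls.length === 0) break; // no more actions: the model considers the task done

    // Execute every returned action in order.
    for (const call of calls) {
      console.log('action:', JSON.stringify(call.action));
      await executeAction(page, call.action);
    }

    // Observe the new page state and hand it back to the model.
    const lastCall = calls[calls.length - 1];
    response = await sendScreenshot(response.id, lastCall.call_id, await screenshotBase64(page));
  }
}

module.exports = { runTask };
```

The computer-use guide also describes safety checks (pending_safety_checks on a computer_call) that a real loop should acknowledge before continuing; that handling is left out of this sketch for brevity.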