- We’re building a tiny educational AI browser agent in Node.js. It should stay minimal and easy to explain. The agent will observe the page with screenshots, use OpenAI 4.5 with the new computer tool capabilities to decide the next action, and use Playwright to execute it. Let’s start by creating an empty npm project.
- Add the dependencies we need: `playwright`, `openai`, `dotenv`, and `minimist`.
- Create `browser.js` and add a helper that launches a visible Chromium browser and opens a new page. Pick a fixed window size, because we’ll want to use the same size later with the OpenAI computer tool.
- In `browser.js`, add a helper that takes a screenshot of the current page and returns it as base64.
- Create `index.js`. Parse a `--url` flag with `minimist`, default it to `https://bank-demo.loadmill.com/`, launch the browser, and open that URL right away.
- In `index.js`, add a tiny terminal prompt loop with `readline` so I can type tasks until I enter `exit`.
- Create `actions.js` with one function that executes a single action on the Playwright page. Start with `click`, `type`, `keypress`, and `scroll`.
- Connect `index.js` to `actions.js` with a temporary manual mode. Let me type simple action commands in the terminal, parse them, and execute them on the page so we can confirm browser control works before adding the model. At the end, print a very short note with a few example commands I can try.
- Create `openai.js`, load the OpenAI API key from `.env`, and create a `.env` file with a placeholder `OPENAI_API_KEY=` value so I can fill it in before the next step. Keep this step minimal so I can confirm the setup before we call the API.
- In `openai.js`, add a small helper that calls the OpenAI Responses API with the computer tool. Send the current task and the latest browser screenshot, use the same screen size as in `browser.js`, and include a short instruction telling the model to keep acting until the task is actually complete and only return `done` at the end. Follow the official OpenAI computer-use guide for the request shape.
- Update `index.js` so one user task runs a single observe-decide-act step: take a screenshot, send it to `openai.js`, print the returned action, and execute it. If the model returns a normal message instead of an action, print that clearly too.
- Turn that into a loop. After each action, take a fresh screenshot and send it back so the model can continue from the new page state. If the model returns multiple actions in one step, execute them in order. Stop when the model returns `done` or when you hit a small step limit.
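For the two `browser.js` steps, a minimal sketch of what the helpers might look like. It assumes CommonJS modules and a 1024 by 768 viewport; that size is an arbitrary choice, and any fixed size works as long as the computer tool later receives the same numbers. The names `VIEWPORT`, `launchBrowser`, and `screenshotBase64` are hypothetical.

```javascript
// browser.js: launch a visible Chromium window and capture screenshots.
// Assumes Playwright's Chromium build is installed (npx playwright install chromium).

// Fixed viewport (hypothetical 1024x768). The computer tool must be given the
// same display size so the model's click coordinates map 1:1 onto the page.
const VIEWPORT = { width: 1024, height: 768 };

// Launch a headed (visible) Chromium and open one page at VIEWPORT size.
async function launchBrowser() {
  // Required lazily so this file can be loaded (e.g. to read VIEWPORT)
  // even before `npm install` has pulled in Playwright.
  const { chromium } = require("playwright");
  const browser = await chromium.launch({ headless: false });
  const page = await browser.newPage({ viewport: VIEWPORT });
  return { browser, page };
}

// Capture the visible viewport as a PNG and return it as a base64 string.
async function screenshotBase64(page) {
  const buffer = await page.screenshot({ type: "png" }); // resolves to a Buffer
  return buffer.toString("base64");
}

module.exports = { VIEWPORT, launchBrowser, screenshotBase64 };
```

Setting the size through `viewport` rather than only through window flags keeps the screenshot dimensions predictable, which is what the model actually reasons about.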
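For the first two `index.js` steps (the `--url` flag and the prompt loop), a sketch under the same CommonJS assumption. `minimist` and the launch helper are required inside `main()` so the prompt loop can be tried on its own; the export name `launchBrowser` from `./browser` is a hypothetical name for the `browser.js` launch helper, assumed to resolve to `{ browser, page }`.

```javascript
// index.js (first steps): read --url, open it, then prompt for tasks until "exit".
const readline = require("node:readline");

const DEFAULT_URL = "https://bank-demo.loadmill.com/";

// Ask for tasks line by line; stop on "exit" or when the input stream ends.
async function promptLoop(onTask, input = process.stdin, output = process.stdout) {
  const rl = readline.createInterface({ input, output });
  rl.setPrompt("task> ");
  rl.prompt();
  // Breaking out of a for-await over readline closes the interface automatically.
  for await (const line of rl) {
    const task = line.trim();
    if (task === "exit") break;
    if (task) await onTask(task); // blank lines are ignored
    rl.prompt();
  }
}

async function main() {
  const minimist = require("minimist");
  const argv = minimist(process.argv.slice(2), {
    string: ["url"],
    default: { url: DEFAULT_URL },
  });
  // Hypothetical export name for the browser.js launch helper.
  const { launchBrowser } = require("./browser");
  const { browser, page } = await launchBrowser();
  await page.goto(argv.url);
  await promptLoop(async (task) => console.log(`(no agent yet) task: ${task}`));
  await browser.close();
}

module.exports = { DEFAULT_URL, promptLoop, main };
```

The real `index.js` would end with `main().catch(console.error);`. It is left out here so that loading the file does not immediately launch a browser. Run it as `node index.js --url https://example.com/`, or without the flag to get the demo bank.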
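For the `actions.js` step, a sketch of a single executor over Playwright's mouse and keyboard. The action field names (`x`, `y`, `button`, `text`, `keys`, `scroll_x`, `scroll_y`) are an assumption modelled on the action objects in OpenAI's computer-use examples, and the key-name table is a partial, hypothetical mapping; check both against the current guide. A no-op `screenshot` case is included defensively because the loop re-captures the page after every step anyway.

```javascript
// actions.js: execute one action on a Playwright page.
// ASSUMPTION: action fields mirror OpenAI's computer-use examples; verify in the guide.

// Partial, hypothetical map from the model's key names to Playwright key names.
const KEY_MAP = {
  enter: "Enter", return: "Enter", tab: "Tab", esc: "Escape", escape: "Escape",
  space: " ", backspace: "Backspace", delete: "Delete",
  ctrl: "Control", control: "Control", shift: "Shift", alt: "Alt",
  cmd: "Meta", meta: "Meta",
  arrowup: "ArrowUp", arrowdown: "ArrowDown", arrowleft: "ArrowLeft", arrowright: "ArrowRight",
};

// Unknown names (e.g. single letters) are passed through unchanged.
function toPlaywrightKey(key) {
  return KEY_MAP[String(key).toLowerCase()] ?? key;
}

async function executeAction(page, action) {
  switch (action.type) {
    case "click": {
      // Playwright accepts "left" | "right" | "middle"; map "wheel" to "middle".
      const button =
        action.button === "right" ? "right" : action.button === "wheel" ? "middle" : "left";
      await page.mouse.click(action.x, action.y, { button });
      break;
    }
    case "type":
      await page.keyboard.type(action.text);
      break;
    case "keypress":
      // Press the keys together as one chord, e.g. ["CTRL", "A"] becomes "Control+A".
      await page.keyboard.press(action.keys.map(toPlaywrightKey).join("+"));
      break;
    case "scroll":
      // Wheel events scroll whatever is under the mouse, so move there first.
      await page.mouse.move(action.x, action.y);
      await page.mouse.wheel(action.scroll_x ?? 0, action.scroll_y ?? 0);
      break;
    case "screenshot":
      // Nothing to do: the caller takes a fresh screenshot after every step.
      break;
    default:
      throw new Error(`Unsupported action type: ${action.type}`);
  }
}

module.exports = { executeAction, toPlaywrightKey };
```

Throwing on unknown types keeps failures loud while only four actions are implemented; new cases can be added as the model starts emitting them.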
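For the temporary manual mode, a sketch of a parser that turns one terminal line into an action object shaped like the `click`/`type`/`keypress`/`scroll` actions the model will later send (field names assumed, as above). The command syntax, the `MANUAL_HELP` text, and the fixed scroll point at the centre of an assumed 1024 by 768 viewport are all invented for this sketch.

```javascript
// Manual mode: parse one typed command into an action object, or null if invalid.
// Hypothetical syntax:
//   click <x> <y>        type <text...>
//   keypress <key>...    scroll <dy>

const MANUAL_HELP = [
  "Manual mode examples:",
  "  click 200 150",
  "  type hello world",
  "  keypress Enter",
  "  keypress ctrl a",
  "  scroll 400",
].join("\n");

// Scroll at the centre of an assumed 1024x768 viewport.
const SCROLL_POINT = { x: 512, y: 384 };

function parseCommand(line) {
  const trimmed = line.trim();
  const [cmd, ...rest] = trimmed.split(/\s+/);
  switch (cmd) {
    case "click": {
      const [x, y] = rest.map(Number);
      if (!Number.isFinite(x) || !Number.isFinite(y)) return null;
      return { type: "click", x, y, button: "left" };
    }
    case "type": {
      // Keep the rest of the line verbatim, including inner spaces.
      const text = trimmed.slice("type".length).trimStart();
      return text ? { type: "type", text } : null;
    }
    case "keypress":
      return rest.length ? { type: "keypress", keys: rest } : null;
    case "scroll": {
      const dy = Number(rest[0]);
      if (!Number.isFinite(dy)) return null;
      return { type: "scroll", ...SCROLL_POINT, scroll_x: 0, scroll_y: dy };
    }
    default:
      return null; // unknown command or empty line
  }
}

module.exports = { MANUAL_HELP, parseCommand };
```

In `index.js`, the prompt-loop callback would pass each line through `parseCommand`, hand non-null results to the action executor, and print `MANUAL_HELP` when the result is `null`.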
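For the `openai.js` setup step, a minimal sketch that creates the placeholder `.env` if it is missing, reads the key through `dotenv`, and fails with a clear message when the key is still empty. The helper names `ensureEnvFile`, `loadApiKey`, and `createClient` are hypothetical; `dotenv.config()` and the `OpenAI` client constructor are the packages' documented entry points.

```javascript
// openai.js (setup step): load OPENAI_API_KEY from .env and check it is filled in.
const fs = require("node:fs");

// Create a .env with an empty placeholder, unless one already exists.
// Returns true if the file was created.
function ensureEnvFile(path = ".env") {
  if (fs.existsSync(path)) return false;
  fs.writeFileSync(path, "OPENAI_API_KEY=\n");
  return true;
}

// Read the key from an env object; throw a helpful error if it is blank.
function loadApiKey(env = process.env) {
  const key = (env.OPENAI_API_KEY ?? "").trim();
  if (!key) {
    throw new Error("OPENAI_API_KEY is empty. Fill it in .env before calling the API.");
  }
  return key;
}

// Build the client only when needed, so this file loads without network access.
function createClient() {
  require("dotenv").config(); // copies .env entries into process.env
  const { OpenAI } = require("openai");
  return new OpenAI({ apiKey: loadApiKey() });
}

module.exports = { ensureEnvFile, loadApiKey, createClient };
```

Calling `ensureEnvFile()` once, then `loadApiKey()` after `dotenv` has run, is enough to confirm the setup before any request is sent.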
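For the Responses API step, a sketch that separates building the request from parsing the reply, so both can be checked without a network call. The request shape (the `computer_use_preview` tool type, `display_width`/`display_height`, `environment: "browser"`, `computer_call_output` items chained with `previous_response_id`) and the default model name `computer-use-preview` follow OpenAI's computer-use guide as I understand it; treat them as assumptions and swap in whatever the current guide lists for the model you use. The `OPENAI_MODEL` override, the instruction wording, and the `{ responseId, actions, message, done }` result shape are hypothetical.

```javascript
// openai.js: one observe-decide call to the Responses API with the computer tool.
// ASSUMPTION: model, tool type, and field names follow the computer-use guide's
// request shape; check the current guide before relying on them.

const DISPLAY = { width: 1024, height: 768 }; // must match the browser.js viewport
const MODEL = process.env.OPENAI_MODEL || "computer-use-preview"; // hypothetical override

const INSTRUCTIONS =
  "You control a web browser to complete the user's task. Keep taking actions " +
  "until the task is actually complete. Only when it is complete, reply with " +
  "the single word: done";

const COMPUTER_TOOL = {
  type: "computer_use_preview",
  display_width: DISPLAY.width,
  display_height: DISPLAY.height,
  environment: "browser",
};

const toDataUrl = (b64) => `data:image/png;base64,${b64}`;

// First turn: task text plus screenshot. Later turns: the fresh screenshot is
// returned as the output of each pending computer_call.
function buildRequest({ task, screenshot, previousResponseId, callIds = [] }) {
  const base = {
    model: MODEL,
    instructions: INSTRUCTIONS, // resent each turn; not carried over by previous_response_id
    tools: [COMPUTER_TOOL],
    truncation: "auto",
  };
  if (!previousResponseId) {
    return {
      ...base,
      input: [{
        role: "user",
        content: [
          { type: "input_text", text: task },
          { type: "input_image", image_url: toDataUrl(screenshot) },
        ],
      }],
    };
  }
  return {
    ...base,
    previous_response_id: previousResponseId,
    input: callIds.map((callId) => ({
      type: "computer_call_output",
      call_id: callId,
      output: { type: "computer_screenshot", image_url: toDataUrl(screenshot) },
    })),
  };
}

// Split the reply into computer actions (in order) and any plain message text.
function parseResponse(response) {
  const calls = response.output.filter((item) => item.type === "computer_call");
  const message = response.output
    .filter((item) => item.type === "message")
    .flatMap((item) => item.content)
    .filter((part) => part.type === "output_text")
    .map((part) => part.text)
    .join("\n")
    .trim();
  return {
    responseId: response.id,
    actions: calls.map((call) => ({ callId: call.call_id, action: call.action })),
    message,
    done: calls.length === 0 && /^done[.!]?$/i.test(message),
  };
}

// `client` is an OpenAI client instance (or anything with responses.create).
async function decide(client, args) {
  const response = await client.responses.create(buildRequest(args));
  return parseResponse(response);
}

module.exports = { DISPLAY, INSTRUCTIONS, buildRequest, parseResponse, decide };
```

This sketch does not handle `pending_safety_checks` on computer calls; the computer-use guide describes how to acknowledge them before continuing.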
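For the last two steps, a sketch of the observe-decide-act loop. To keep it independent of how the other files are written, the screenshot helper, the model call, and the action executor are passed in as functions. `decide` is assumed to resolve to a hypothetical `{ responseId, actions: [{ callId, action }], message, done }` object, and `MAX_STEPS = 15` is an arbitrary small limit.

```javascript
// index.js (agent loop): observe -> decide -> act until "done" or a step limit.
// Wiring in the real index.js would look roughly like:
//   runTask(task, {
//     screenshot: () => <browser.js screenshot helper>(page),
//     decide: (args) => <openai.js helper>(client, args),
//     execute: (action) => <actions.js executor>(page, action),
//   })

const MAX_STEPS = 15; // small safety limit so a confused model cannot loop forever

async function runTask(task, { screenshot, decide, execute, log = console.log }, maxSteps = MAX_STEPS) {
  let previousResponseId;
  let callIds = [];
  for (let step = 1; step <= maxSteps; step++) {
    // Observe: always send a fresh screenshot of the current page state.
    const image = await screenshot();
    // Decide: the first call sends the task; later calls continue the same chain.
    const result = await decide({ task, screenshot: image, previousResponseId, callIds });
    previousResponseId = result.responseId;

    if (result.done) {
      log(`[step ${step}] model says: done`);
      return "done";
    }
    if (result.actions.length === 0) {
      // A plain message (a question, a refusal, a status note) instead of an action.
      log(`[step ${step}] model message: ${result.message || "(empty)"}`);
      return "message";
    }
    // Act: run every returned action in order before the next screenshot.
    for (const { action } of result.actions) {
      log(`[step ${step}] action: ${JSON.stringify(action)}`);
      await execute(action);
    }
    callIds = result.actions.map((a) => a.callId);
  }
  log(`Stopped after ${maxSteps} steps without "done".`);
  return "step-limit";
}

module.exports = { MAX_STEPS, runTask };
```

Returning a short outcome string (`"done"`, `"message"`, or `"step-limit"`) leaves the prompt loop in charge of what happens next, for example asking for the next task. The single-step version from the earlier bullet is the same code with `maxSteps` set to 1.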