Living list of things we know we want to do but haven't done yet. Each item should explain why it matters, not just what to change, so that future contributors (or future-us) can decide whether the entry is still relevant without re-deriving the analysis.
Status: Partially resolved. Compact prompt routing is implemented in both Chrome and Firefox as an explicit per-provider opt-in. The remaining question is prompt quality and model-tier selection, not browser parity or dead code.
Where the code lives:
- Compact prompt bodies: `src/chrome/src/agent/tools.js` and `src/firefox/src/agent/tools.js`, `SYSTEM_PROMPT_ACT_COMPACT`
- Full ACT prompt bodies: same files, `SYSTEM_PROMPT_ACT`
- Dispatch: `src/chrome/src/agent/agent.js` and `src/firefox/src/agent/agent.js`, where `_getActPrompt()` routes Act mode to compact prompts when the active provider has `useCompactPrompt`.
- Provider opt-in: `BaseLLMProvider.useCompactPrompt` getter + per-provider config (`openai.js`, `llamacpp.js`, inherited by compatible OpenAI-style local providers).
The actual contradiction:
The compact prompt was introduced for small models (~7B–13B). Two stated reasons:
- Small models have shorter effective attention windows — info from the front of a 30K prompt may not influence late-conversation decisions.
- Their context windows are smaller (often 8K–32K) so a 7K prompt eats a lot.
But small models also need more direction, not less:
- Their reasoning is shallower — they can't infer "I shouldn't re-download" from "scratchpad facts"; you have to literally tell them.
- They pattern-match more than they reason — examples help more than abstract rules.
- They need scaffolding (do A, then B, then C) where larger models can plan A→B→C themselves.
So "less prompt" pulls one way and "more explicit guidance" pulls the other.
What the compact prompt actually cuts (and why this is the wrong cut):
- All worked examples (e.g. UI-vs-API has 5 examples in full, 0 in compact).
- Whole sections judged "edge cases small models won't encounter": IFRAMES, the `/allow-api` override, extended FORMS reasoning.
- Replaces multi-paragraph rules with single-sentence imperatives.
The "drop examples to save tokens" choice is exactly backwards: examples are how small models get unstuck. Removing nuance and reasoning while keeping bare imperatives gives the small model orders without the gradient information needed to follow them.
The 27B trace evidence:
webbrain-trace-qwen3.6-27b-run_1777441198379_v1rqkk.json — qwen3.6-27b on llama.cpp. Asked to upload dist/*.zip to a v5.1.0 GitHub release. Re-downloaded the same files three times because each auto-screenshot pushed the original download_files result out of recent attention, and the model re-derived "I need to fetch the files" from current visual state. Pattern-matched on intent, not on prior tool history. This is the failure mode small-model compactness was meant to address — and yet the compact prompt would have made it worse by stripping the SCRATCHPAD section that says explicitly to pin download paths.
Per-step input tokens for that run: 21K -> 21K -> 28K -> 30K -> 40K (auto-screenshot growth, not summarization growth). The model paid the tax of the full prompt (~7.4K) AND lost track of state. The previous "everyone gets full prompt" decision was the right local fix.
What an actual resolution would look like:
Three tiers, not a binary:
| Tier | Models | Prompt shape | Approx size |
|---|---|---|---|
| Frontier | Sonnet, Opus, GPT-4o, Gemini Pro | Trim worked examples, keep rules. Trust their planning. | ~3K tokens |
| Mid | Llama 70B, Qwen 35B, GPT-4o-mini | Full rules + 1-2 examples per rule. | ~5K tokens |
| Small | 7B–30B local (qwen3.6-27b, etc.) | Full rules + many examples + simpler imperative vocabulary, + extra failure-mode reminders. Larger, not smaller, than current full prompt. | ~6K-7K tokens |
Per-model-class prompt selection wired through _getActPrompt(). Tier inferred from provider config (useCompactPrompt is the wrong axis; it should be tier: 'frontier' | 'mid' | 'small').
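
A minimal sketch of what tier-based selection could look like. The `tier` field on the provider object, the `getActPromptForTier` name, and the placeholder prompt strings are all assumptions for illustration; the real `_getActPrompt()` signature is not shown in this document.

```javascript
// Tier names taken from the proposal above.
const TIERS = Object.freeze(['frontier', 'mid', 'small']);

// Placeholder prompt bodies; the real ones would live in tools.js.
const PROMPTS = Object.freeze({
  frontier: 'SYSTEM_PROMPT_ACT_FRONTIER (placeholder)',
  mid: 'SYSTEM_PROMPT_ACT (placeholder)',
  small: 'SYSTEM_PROMPT_ACT_SMALL (placeholder)',
});

// Hypothetical stand-in for tier-aware dispatch. A missing or unknown
// tier falls back to the mid-tier (current full) prompt.
function getActPromptForTier(provider) {
  const tier = provider && TIERS.includes(provider.tier) ? provider.tier : 'mid';
  return PROMPTS[tier];
}
```

Falling back to `mid` keeps today's "everyone gets the full prompt" behaviour for any provider that has not been classified yet.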
Why this is on the TODO list and not in flight:
- Requires picking the tier per model rather than per-provider, which means a model→tier mapping (or a heuristic).
- Examples need to be written deliberately, not extracted from the existing full prompt.
- Full-prompt defaults work for frontier-skewed users (the dominant cohort), so the urgency is on the small-model end which is also where local-host iteration is hardest to test.
Concrete next steps when picking this up:
- Define the tier enum and a `getTier()` method on each provider class. Default flagship hosted models to `frontier`, OpenAI/Anthropic configs with non-flagship model names to `mid`, and llama.cpp / lmstudio / ollama to `small`.
- Author `SYSTEM_PROMPT_ACT_FRONTIER` (trimmed) and `SYSTEM_PROMPT_ACT_SMALL` (expanded). Keep `SYSTEM_PROMPT_ACT` as the mid-tier default.
- Replace the current compact/full dispatch in `_getActPrompt()` with tier-based routing.
- Re-run the qwen3.6-27b trace scenario and verify the small-tier prompt prevents the re-download loop.
- Token-budget the prompt against each model's context window so prompt + first turn fits.
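
The tier heuristic and the budget check from the steps above might be sketched as follows. The provider id strings, the name patterns, and the `replyReserve` default are assumptions, not existing project values.

```javascript
// Hypothetical provider ids for the local runtimes named above.
const LOCAL_PROVIDERS = new Set(['llamacpp', 'lmstudio', 'ollama']);

// Assumed name fragments for the frontier row of the tier table.
const FRONTIER_PATTERNS = [/sonnet/i, /opus/i, /^gpt-4o$/i, /gemini.*pro/i];

function inferTier(providerId, modelName) {
  // The provider wins over the model name: a local runtime can report any label.
  if (LOCAL_PROVIDERS.has(providerId)) return 'small';
  const name = String(modelName || '');
  return FRONTIER_PATTERNS.some((re) => re.test(name)) ? 'frontier' : 'mid';
}

// True when the prompt plus the first turn fits, with room left for the reply.
function fitsContext(promptTokens, firstTurnTokens, contextWindow, replyReserve = 1024) {
  return promptTokens + firstTurnTokens + replyReserve <= contextWindow;
}
```

Checking the provider first means a run labeled `gpt-4o` but served by lmstudio is still treated as `small`. For the budget side, the ~7.4K full prompt with an assumed 500-token first turn does not fit an 8K window under this reserve: `fitsContext(7400, 500, 8192)` is `false`.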
The Firefox build is meaningfully weaker than Chrome (already noted in the README's "Known Issues"). Some gaps are platform-real (no CDP, no Manifest V3 service worker), but several are just unported features. Worth ticking off one at a time:
- `upload_file`: not yet in Firefox. The dispatcher path exists for downloads but not for uploads. Likely a few hours of work; WebExtensions has the same `<input type="file">` mechanics.
- Conversation persistence across background restarts: Chrome persists per-tab chats to `chrome.storage.session`; Firefox keeps them in-memory only. This is why the scratchpad port deliberately skips the `_persist` call. Real fix would persist via `browser.storage.session` + restore on background page reload.
- `full_page_screenshot`: Chrome uses CDP `captureBeyondViewport`; Firefox would need `tabs.captureFullPage` or a scroll-and-stitch fallback. Lower priority.
- `shadow_dom_query`: CDP-dependent. Hardest port; may not be worth it until a concrete user case emerges.
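
For the persistence item, a rough sketch of the save/restore pair. It assumes chats are held in a `Map` keyed by numeric tab id and uses a hypothetical storage key; the storage area is passed in so the same code could take `browser.storage.session` or a test double.

```javascript
// Hypothetical storage key; the real key name is not given in this document.
const CHATS_KEY = 'tabChats';

// `area` is any object with promise-based get/set, e.g. browser.storage.session.
async function persistChats(area, chatsByTab) {
  // Map -> plain object, because extension storage holds JSON-like values.
  await area.set({ [CHATS_KEY]: Object.fromEntries(chatsByTab) });
}

async function restoreChats(area) {
  const stored = await area.get(CHATS_KEY);
  const obj = stored[CHATS_KEY] || {};
  // Object keys come back as strings; convert tab ids back to numbers.
  return new Map(Object.entries(obj).map(([id, chat]) => [Number(id), chat]));
}
```

`restoreChats` would run once when the background page loads, and `persistChats` after each turn; how often to write is a separate design choice not covered here.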
Recently closed Firefox parity items:
- Firefox now has `downloads` permission and `download_files`; the old singular `download_file` TODO is obsolete because the tool surface was consolidated on plural `download_files`.
- Firefox Ask mode can access the accessibility tree again (10.0.2).
kind: "tool" events previously stored data.step === null even though the
surrounding llm_request / llm_response events carried the right step. Fixed
by threading the loop's steps counter through _executeToolBatch (new step
parameter) to the trace.recordToolCall call in both the Chrome and Firefox
agent loops. Tool rows in the Traces Compare view now carry their step number.
That trace (`webbrain-trace-gpt-4o-run_1777328860857_tb4voc.json`; the model was labeled gpt-4o but the provider was lmstudio, so a local model in disguise) showed two recurring patterns that the LISTINGS & PAGINATION prompt addition (commit already landed) directly targets:
- Re-fetched `?sd=2` three times in a row across two different tools (`research_url` ×2, `fetch_url` ×1) without ever extracting an item from any of them.
- Hit `get_accessibility_tree({filter:"all"})` overflow twice with different `maxChars` values, never switching to a different tool.
The prompt rules now name these failures explicitly. Worth re-running the same prompt on a fresh trace once a small-tier prompt exists to see whether the rules alone fix it or whether the model still ignores them at small parameter counts.
The root manifest (manifest.json) is not equivalent to the actual Chrome
extension manifest under src/chrome/manifest.json. The root manifest points
at src/background.js and injects only src/content/content.js, while the real
Chrome code lives under src/chrome/src/ and also needs
accessibility-tree.js, CDP helpers, offscreen fetch, and the fuller permission
set.
This makes the "Load unpacked" path easy to get wrong. The root README needs to
state unambiguously whether developers should load `src/chrome/` or a
generated release directory.
Partially fixed already: npm run build:zip now creates deterministic Chrome
and Firefox submission zips from HEAD:src/<browser> into dist/, so release
zips no longer depend on ad hoc PowerShell/archive behavior. The remaining work
is the development install story and the misleading root manifest.
Concrete next steps:
- Decide whether root `manifest.json` should be deleted, generated, or made a thin redirect-free copy of the Chrome manifest.
- Consider adding `build:chrome`, `build:firefox`, and `build:all` scripts that produce unpacked development directories in addition to the existing zips.
- Update the README quick-start instructions to point at the canonical load-unpacked directory.
Chrome currently requests broad permissions up front: debugger, downloads,
unlimitedStorage, offscreen, privateNetworkAccess, broad host permissions,
and connect-src *. Most of these map to real features, but the initial install
surface is large.
Store review and user trust would improve if sensitive capabilities were grouped by feature and requested and explained at the moment they are needed, where the browser APIs permit it.
Partially fixed already: docs/security-model.md now includes a permission risk
table and SECURITY.md points to the detailed security model. The remaining
work is staging/optionality and in-product explanations.
Concrete next steps:
- Keep the permission-to-feature table current for Chrome and Firefox.
- Identify which permissions can be optional or triggered by an explicit enablement path.
- Add UI copy for high-risk capabilities: debugger control, downloads, all-site access, local/private-network LLM access.
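
One possible shape for the "request when needed" path, sketched against an injected object that looks like `chrome.permissions` (promise-returning `contains` and `request`). The copy strings and helper names are hypothetical, and which permissions can actually be declared optional still has to be checked against the browser documentation.

```javascript
// Hypothetical in-product copy, keyed by permission name.
const PERMISSION_COPY = {
  downloads: 'Needed to save files the agent fetches for you.',
};

// Text to show next to the button that triggers the browser prompt.
function explainPermission(name) {
  return PERMISSION_COPY[name] || `This feature needs the "${name}" permission.`;
}

// Returns true if the permission is already granted or the user grants it now.
// `permissionsApi` is injected so the logic can run outside a browser.
async function ensurePermission(permissionsApi, name) {
  const query = { permissions: [name] };
  if (await permissionsApi.contains(query)) return true;
  return permissionsApi.request(query);
}
```

Browsers generally require `permissions.request` to run inside a user gesture, and some are strict about work done before the call, so in practice the explanation would be shown first and the request fired directly from the confirm button's click handler.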
`src/chrome/src/ui/settings.js` accepts `WB_AUTH_TOKEN` from a window `message`
event and writes the token into extension storage, then auto-configures the
WebBrain Cloud provider. The handler should validate the sender before trusting
the payload.
Concrete next steps:
- Require `event.origin === 'https://auth.webbrain.one'`.
- Track the auth popup/tab/window that was opened and require `event.source` to match when the platform makes that reliable.
- Validate payload shape before storing: token non-empty string, email string, default model string or absent.
- Consider a one-time nonce/state value so an unrelated page cannot spoof the completion message.
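
The checks above could be combined into one pure validator, roughly as below. The payload field names (`type`, `nonce`, `token`, `email`, `defaultModel`) and the `expected` object are assumptions for illustration; only the origin string comes from this document.

```javascript
const AUTH_ORIGIN = 'https://auth.webbrain.one';

function isNonEmptyString(v) {
  return typeof v === 'string' && v.length > 0;
}

// Returns the sanitized payload, or null when the message must be ignored.
// `expected` = { source, nonce }, captured when the auth window was opened.
function validateAuthMessage(event, expected) {
  if (event.origin !== AUTH_ORIGIN) return null;
  // Only enforce the source check when a window reference was recorded.
  if (expected.source && event.source !== expected.source) return null;
  const d = event.data;
  if (!d || typeof d !== 'object' || d.type !== 'WB_AUTH_TOKEN') return null;
  // A nonce must have been issued, and the message must echo it back.
  if (!isNonEmptyString(expected.nonce) || d.nonce !== expected.nonce) return null;
  if (!isNonEmptyString(d.token)) return null;
  if (typeof d.email !== 'string') return null;
  if (d.defaultModel !== undefined && typeof d.defaultModel !== 'string') return null;
  // Copy only the known fields so extra properties are never stored.
  return { token: d.token, email: d.email, defaultModel: d.defaultModel };
}
```

The `message` listener would call this first and write to extension storage only when the result is non-null.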
The Chrome and Firefox source trees are mostly mirrored but not shared. Many
files differ across agent, tools, providers, network, trace, and ui.
Some differences are platform-real, but the current layout makes accidental
parity regressions likely.
Concrete next steps:
- Extract browser-neutral code into a shared module tree, e.g. provider logic, prompt/tool definitions, adapters, trace formatting, and pure helpers.
- Keep browser-specific APIs behind small platform adapters (`chrome.scripting` vs `browser.tabs.executeScript`, CDP vs non-CDP, side panel vs sidebar).
- Add a parity check that fails when shared files are changed in one browser tree but not the other, until the common module extraction exists.
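
The parity check from the last step might reduce to a small pure function like this. The list of mirrored files and the way changed paths are gathered (for example from a diff) are assumptions; the two tree roots follow the paths used elsewhere in this document.

```javascript
// Tree roots, matching the layout described above.
const TREES = { chrome: 'src/chrome/src/', firefox: 'src/firefox/src/' };

// `changedPaths`: repo-relative paths touched by a change.
// `mirrored`: tree-relative paths expected to stay in sync across browsers.
// Returns the mirrored files that were changed in exactly one tree.
function findParityDrift(changedPaths, mirrored) {
  const changed = new Set(changedPaths);
  return mirrored.filter(
    (rel) => changed.has(TREES.chrome + rel) !== changed.has(TREES.firefox + rel)
  );
}
```

A CI step could fail whenever the returned list is non-empty, with an allowlist for files that are known to differ for platform reasons.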
test/run.js duplicates pieces of Agent logic in LoopDetectorShim because
agent.js imports browser-only modules. That means tests can pass while the real
agent implementation drifts.
Concrete next steps:
- Move loop detection, coordinate-click bucketing, image budget sizing, and other pure logic into browser-free modules.
- Import those modules directly from both `agent.js` and `test/run.js`.
- Add regression tests for the text tool-call parser and context trimming, since both are high-impact agent reliability code.
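
To illustrate what a browser-free module could look like, here is a deliberately simple loop detector. It is not the logic currently in `agent.js` or `LoopDetectorShim` (neither is shown in this document); the class shape and the "N identical calls in a row" rule are assumptions.

```javascript
// Browser-free sketch: flags N identical consecutive tool calls.
class LoopDetector {
  constructor(threshold = 3) {
    this.threshold = threshold;
    this.lastKey = null;
    this.count = 0;
  }

  // Returns true once the same call has been seen `threshold` times in a row.
  // Note: JSON.stringify is key-order sensitive, so {a,b} and {b,a} differ.
  record(name, args) {
    const key = name + ':' + JSON.stringify(args);
    this.count = key === this.lastKey ? this.count + 1 : 1;
    this.lastKey = key;
    return this.count >= this.threshold;
  }
}
```

Because the class touches no browser API, the agent and the test runner could both import the same file, which removes the drift risk described above.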
The original gap — OpenAICompatibleProvider.chat() applying options.extraBody
while chatStream() did not — is already resolved: both methods now apply
extraBody in src/chrome/src/providers/openai.js and the Firefox copy. The
remaining (lower-priority) work is breadth:
Concrete next steps:
- Add provider-level tests or small request-shape probes for OpenAI-compatible, llama.cpp, and Anthropic providers.
- Document which provider-specific request fields are intentionally supported.
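
A request-shape probe for the `extraBody` case could be as small as the sketch below. `buildRequestBody` is a hypothetical helper standing in for whatever `chat()` and `chatStream()` do internally, and the merge order is a design assumption.

```javascript
// Hypothetical request builder shared by the streaming and non-streaming paths.
function buildRequestBody(model, messages, options = {}, stream = false) {
  const extra = options.extraBody || {};
  // extraBody may override model/messages in this sketch, but `stream` is
  // written last so a stray extraBody.stream cannot flip the transport mode.
  return { model, messages, ...extra, stream };
}

// Probe: both paths must carry the same extraBody fields.
function extraBodyApplied(options) {
  const plain = buildRequestBody('m', [], options, false);
  const streamed = buildRequestBody('m', [], options, true);
  return Object.keys(options.extraBody || {})
    .filter((k) => k !== 'stream')
    .every((k) => plain[k] === options.extraBody[k] && streamed[k] === options.extraBody[k]);
}
```

A provider-level test would replace `buildRequestBody` with a capture of the body each real method sends, then run the same comparison.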