v2: Agent orchestration — Hive side (Phases 0–5 + notifications, prompts, React surface) #175
Merged
…tatus
- agent_status Select on Hive Task (specs/v2 §4.1 full option list; empty = not agent-managed; separate from the kanban status, which is untouched).
- bwh_hive/bwh_hive/agent_api.py::report_agent_status: box -> Hive callback, gated by _assert_agent_caller (Agent Bot role); sets agent_status and appends a Hive Task Comment. Type-annotated per require_type_annotated_api_methods.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011C9jzHhfxDqtU13uWtazZm
Assigning a Hive Task to the Agent user now auto-provisions a BenchSpace box, tracks it on the task, and drives agent_status via one validated state machine.
- Data model: Hive Task agent_* fields (agent_section), Hive Project agent settings, Hive Settings orchestration/BenchSpace/callback/Telegram fields.
- orchestrator/ package: BenchSpaceClient, build_boot_env (MMDS §3 + CONTROL_TOKEN), provision_for_task (enqueued, guarded, concurrency cap), set_agent_status (§4.2 transition table + per-actor authorization), dispatch, deprovision stub.
- Trigger via the ToDo doc-event (assign_to.add doesn't fire Hive Task on_update).
- agent_api.py: full §5.1 callback surface, each guarded by Agent-Bot role AND task-assigned-to-agent; status changes route through set_agent_status.
- install.py: ensure Agent Bot role + agent@hive.local user + Hive Member.
- Minimal desk UX (hive_task.js): status indicator + Open Code/Site/PR.
E2e-validated cross-site against local benchspace (real HTTP, token auth). See specs/v2/progress.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011C9jzHhfxDqtU13uWtazZm
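The "transition table + per-actor authorization" idea behind set_agent_status can be sketched as a lookup keyed on (from, to) edges. This is only an illustration: it lists just the edges mentioned in this PR (the real §4.2 table is larger), and the Actor enum, check_transition, InvalidTransition, and the actor assigned to each edge are assumptions, not the shipped code.

```python
from enum import Enum


class Actor(Enum):
    BOX = "box"        # callback coming from the agent box
    HUMAN = "human"    # reviewer acting in Hive
    SYSTEM = "system"  # orchestrator / watchdog


class InvalidTransition(Exception):
    """Raised when an edge is undefined or the actor may not take it."""


# (current, target) -> actors allowed to take that edge.
# Partial and assumed: only edges named in this PR are listed.
TRANSITIONS = {
    ("Spec Created", "Spec Approved"): {Actor.HUMAN},
    ("Spec Approved", "Implementing"): {Actor.SYSTEM},
    ("Implementing", "Spec Approved"): {Actor.SYSTEM},  # revert on failed dispatch
    ("PR Ready", "Changes Requested"): {Actor.HUMAN},
    ("Changes Requested", "Implementing"): {Actor.SYSTEM},
    ("PR Ready", "Merged"): {Actor.HUMAN},
    ("Failed", "Queued"): {Actor.HUMAN},  # retry edge
}


def check_transition(current: str, target: str, actor: Actor) -> str:
    """Return the target status if the edge exists and the actor is allowed."""
    allowed = TRANSITIONS.get((current, target))
    if allowed is None:
        raise InvalidTransition(f"{current!r} -> {target!r} is not a defined edge")
    if actor not in allowed:
        raise InvalidTransition(f"{actor.value} may not move {current!r} -> {target!r}")
    return target
```

Keeping both checks in one function mirrors the single choke point described above: every caller gets the same validation regardless of where the status change originates.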
provision_for_task failure paths set Failed + a comment but left agent_last_error empty (found during live-server e2e when a missing agent-v16 template returned 417). Add a _fail() helper that writes agent_last_error before the Failed transition (specs/v2 §B.5).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011C9jzHhfxDqtU13uWtazZm
Close the human↔box loop around the spec. A reviewer approves a Spec Created
spec in Hive, which advances the task to Spec Approved → Implementing and
dispatches POST {control_url}/implement/start to the box.
- Hive Task.approve_spec(note): human action → set_agent_status(Spec Approved).
Guard rejects the Agent bot by identity (not by role, since Administrator implicitly
holds all roles) and requires write permission.
- service._react: entering Spec Approved enqueues start_implementation_for_task
(dispatch is HTTP, must not block the desk request).
- service.start_implementation_for_task: idempotent; flips to Implementing then
dispatches; on failure reverts to Spec Approved + agent_last_error (Phase 5
watchdog retries).
- hive_task.js: Approve Spec button on Spec Created.
Validated live on pms.localhost (8/8): approve→dispatch, failure→revert,
idempotency, bot-blocked, invalid human transition rejected.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011C9jzHhfxDqtU13uWtazZm
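The flip-then-dispatch-then-revert behaviour described for start_implementation_for_task can be sketched without Frappe as below. The task is a plain dict and dispatch is an injected callable standing in for the HTTP POST; both, along with the function name and return value, are assumptions made for illustration.

```python
from typing import Callable


def start_implementation(task: dict, dispatch: Callable[[str], None]) -> bool:
    """Return True only if this call actually dispatched to the box."""
    # Idempotency: anything other than Spec Approved means the job already
    # ran (or the task moved on), so a duplicate enqueue is a no-op.
    if task["agent_status"] != "Spec Approved":
        return False

    # Flip first, so a concurrent duplicate job sees Implementing and exits.
    task["agent_status"] = "Implementing"
    try:
        dispatch(f"{task['control_url']}/implement/start")
    except Exception as exc:
        # Revert so a later retry (the watchdog, per the commit) can pick it up.
        task["agent_status"] = "Spec Approved"
        task["agent_last_error"] = str(exc)
        return False
    return True
```

Running this in a background job rather than inline is what keeps the slow HTTP call out of the desk request.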
request_agent_changes (PR Ready → Changes Requested → Implementing + sync dispatch /changes/apply, rollback on failure) and mark_agent_merged (PR Ready → Merged) as Hive Task doctype methods, guarded by the identity reviewer check; PR Ready desk buttons. Merged stays terminal → existing _react deprovision (00-architecture §4.2). Orchestration live-validated on pms.localhost (11/11).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011C9jzHhfxDqtU13uWtazZm
…cy cap
- deprovision_for_task: idempotent single sink; clears agent_control_token only on a successful teardown (retain-on-failure → watchdog retries). Merged/Cancelled tear down immediately; Failed defers to the grace sweep.
- reconcile_agent_tasks watchdog on a new */10 cron: phase timeouts (sparing a cap-blocked Queued), vanished-box reconcile, idle/grace/orphan sweeps, FIFO queue drain.
- max_concurrent_agent_boxes cap (default 5) + provisioning/spec/implement timeouts + idle/failed-grace Hive Settings fields.
- cancel/retry/tear-down-now reviewer-gated desk actions + hive_task.js buttons; Failed→Queued retry edge; provision_for_task unassign-race guard.
- build_boot_env emits SPEC_RUN_TIMEOUT/IMPL_RUN_TIMEOUT (derived from the watchdog budgets).
Built + syntax/ruff-clean; live e2e on the test server is the open item.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011C9jzHhfxDqtU13uWtazZm
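A rough sketch of how the concurrency cap and the FIFO queue drain could interact. The set of statuses counted as "holding a box", the queued_at field, and the drain_queue name are assumptions for illustration; only the default cap of 5 comes from the commit.

```python
# Assumed: statuses in which a task is considered to occupy a box.
ACTIVE_STATES = {
    "Provisioning", "Spec Created", "Spec Approved",
    "Implementing", "PR Ready", "Changes Requested",
}


def drain_queue(tasks: list[dict], max_boxes: int = 5) -> list[str]:
    """Return the names of Queued tasks to provision now, oldest first."""
    active = sum(1 for t in tasks if t["agent_status"] in ACTIVE_STATES)
    free_slots = max(0, max_boxes - active)
    queued = sorted(
        (t for t in tasks if t["agent_status"] == "Queued"),
        key=lambda t: t["queued_at"],  # hypothetical enqueue timestamp
    )
    return [t["name"] for t in queued[:free_slots]]
```

A task left in Queued by this function is blocked by the cap rather than stuck, which is why the watchdog's phase timeouts spare it.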
…_down marker
Live validation surfaced a real teardown bug: Frappe keeps Password values in `__Auth`
(the doc column is a permanent `*****` placeholder), so deprovision_for_task's
`db_set("agent_control_token", None)` neither destroyed the secret nor gave a queryable
"torn down" signal — the watchdog's terminal-teardown sweep would re-enqueue teardown for
every historical terminal task forever.
- deprovision_for_task now calls remove_encrypted_password to actually destroy the token,
and sets a new agent_box_torn_down Check field on a successful teardown (and no-ops when
already torn down).
- watchdog pass D filters on agent_box_torn_down=0 instead of the (unreliable) token column.
- retry_agent_task resets agent_box_torn_down so a re-provisioned box is tear-down-able.
Validated end-to-end on a real Firecracker VM (teardown-on-merge: VM + all routes removed,
token cleared, audit retained) plus out-of-band-kill, orphan sweep, idempotent deprovision.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011C9jzHhfxDqtU13uWtazZm
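The fix above boils down to two things: destroy the secret in the store where it actually lives, and record teardown in an explicit, queryable flag. A Frappe-free sketch follows, in which the secrets dict stands in for the password store and destroy_box for the BenchSpace call; those two, the function names, and the terminal set used here are assumptions.

```python
from typing import Callable

# Assumed for this sketch: the statuses that tear down immediately.
TERMINAL_STATES = {"Merged", "Cancelled"}


def deprovision(task: dict, destroy_box: Callable[[str], None], secrets: dict) -> bool:
    """Tear the box down once; return True only when teardown happened now."""
    if task.get("agent_box_torn_down"):
        return False  # already torn down: no-op
    try:
        destroy_box(task["name"])
    except Exception:
        return False  # keep token and flag untouched so a later sweep retries
    secrets.pop(task["name"], None)   # actually destroy the stored token
    task["agent_box_torn_down"] = 1   # explicit "torn down" signal
    return True


def needs_teardown(tasks: list[dict]) -> list[str]:
    """Watchdog filter: terminal tasks whose box has not been torn down yet."""
    return [
        t["name"] for t in tasks
        if t["agent_status"] in TERMINAL_STATES and not t.get("agent_box_torn_down")
    ]
```

Because the filter reads the flag rather than the token column, historical terminal tasks drop out of the sweep after their first successful teardown.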
…allbacks
New self-contained bwh_hive/bwh_hive/notifications/ package turns box-callback
transitions into Telegram alerts behind a channel abstraction (07-notifications.md):
- events.py: EventType, frozen picklable NotificationEvent + from_task (reads
agent_code_url/agent_site_url/pr_link + task_url), strict MarkdownV2 escaper,
markdown + plain-text renderers with graceful link degradation.
- base.py: NotificationChannel ABC.
- channels/telegram.py: the only sending channel (HTTPS sendMessage, timeout=10,
captures the Telegram 4xx body on failure). channels/{email,frappe_log}.py are
real subclasses, is_enabled()->False in v2 so fan-out stays honest.
- dispatcher.py: notify() — defensive, kill-switch on notifications_enabled,
CHATTY_EVENTS gate for optional events, one enqueue_after_commit job per enabled
channel; _deliver() fully isolated (log_error, never resurfaces).
- agent_api.py: the four callbacks emit (SPEC_CREATED/PR_READY/FAILED/PROVISIONING)
only on an actual transition (prev != target), so retries don't re-alert.
- service._notify stays a documented no-op (event-driven, not transition-driven).
Verified: rendering/escaper 20/20 + dispatcher 13/13 (bench venv) + live isolation
7/7 on pms.localhost against the real Telegram API (bad token -> one Error Log, no
raise; kill-switch -> clean no-op). Real @BotFather send is the one manual leg left.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011C9jzHhfxDqtU13uWtazZm
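The dispatcher's guarantees (kill-switch, per-channel isolation, alerting only on a real transition) can be sketched synchronously as below. The real notify() enqueues one job per channel after commit; this version delivers inline to stay self-contained. The send method, the log_error parameter, and should_emit are hypothetical names; is_enabled and NotificationChannel come from the commit.

```python
from abc import ABC, abstractmethod
from typing import Callable


class NotificationChannel(ABC):
    @abstractmethod
    def is_enabled(self) -> bool: ...

    @abstractmethod
    def send(self, event: str) -> None: ...  # hypothetical method name


def should_emit(prev: str, target: str) -> bool:
    """Only a real status change alerts, so callback retries stay silent."""
    return prev != target


def notify(
    event: str,
    channels: list[NotificationChannel],
    enabled: bool = True,
    log_error: Callable[[str], None] = print,
) -> int:
    """Fan out to enabled channels; return how many deliveries succeeded."""
    if not enabled:  # kill-switch: clean no-op
        return 0
    delivered = 0
    for channel in channels:
        if not channel.is_enabled():
            continue
        try:
            channel.send(event)
            delivered += 1
        except Exception as exc:
            # Isolation: a failing channel is logged and never re-raised.
            log_error(f"{type(channel).__name__}: {exc}")
    return delivered
```

Swallowing the exception here is deliberate: a Telegram outage must not fail the box callback that triggered the alert.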
…abels
Live @bwh_hive_bot send surfaced a real bug: the SPEC_CREATED message failed
with Telegram 400 "can't parse entities: Character '(' is reserved" because the
link label "Review spec (code)" carries unescaped parens. MarkdownV2 reserves
those inside the link *text* too, not just in dynamic interpolations — the other
four events sent fine only because their labels have no reserved chars.
Fix: _link() now runs the label through escape_md2 (the URL part still escapes only `\` and
`)`). Re-sent live: SPEC_CREATED now gets a 2xx, zero Error Logs. Unit suite updated
(20/20) for the now-escaped label.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011C9jzHhfxDqtU13uWtazZm
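The escaping rule behind this fix, as a minimal sketch following Telegram's MarkdownV2 rules: the reserved characters need a leading backslash in ordinary text and in link labels, while inside the `(...)` URL part only `\` and `)` are escaped. The function names echo the commit (escape_md2, _link), but the bodies are an illustration and not the shipped implementation.

```python
import re

# Characters MarkdownV2 reserves in ordinary text (and in link labels),
# plus the backslash itself so literal backslashes survive.
_MD2_RESERVED = re.compile(r"([_*\[\]()~`>#+\-=|{}.!\\])")


def escape_md2(text: str) -> str:
    """Prefix every reserved MarkdownV2 character with a backslash."""
    return _MD2_RESERVED.sub(r"\\\1", text)


def link(label: str, url: str) -> str:
    """Render an inline link: full escaping for the label, minimal for the URL."""
    safe_url = url.replace("\\", "\\\\").replace(")", "\\)")
    return f"[{escape_md2(label)}]({safe_url})"


print(link("Review spec (code)", "https://example.com/spec"))
# prints: [Review spec \(code\)](https://example.com/spec)
```

Escaping the label unconditionally means static labels and interpolated values go through the same path, so a future label containing a reserved character cannot reintroduce the 400.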
…evel
The spec/implement/changes prompts were baked into the box image (SKILLS_REPO
file -> shipped default). Make them editable in Hive, resolved server-side and
delivered to the box via get_task — no MMDS size limit, no image rebuild to
change a prompt.
- Hive Settings: agent_spec_prompt / agent_implement_prompt / agent_changes_prompt
(Code, "Agent Prompts (global defaults)" section).
- Hive Project: matching override fields ("Agent Prompt Overrides", depends_on
agent_enabled).
- agent_api.resolve_prompts(project): project override -> global default -> omit
(blank/whitespace falls through so the box uses its built-in). get_task now
returns a `prompts` dict consumed by control-plane/agent.py.
Verified live on pms.localhost (12/12 resolution) + box precedence isolated (6/6).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011C9jzHhfxDqtU13uWtazZm
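The resolution order (project override, then global default, then omit) is simple to sketch with plain dicts standing in for the Hive Project and Hive Settings documents. The two-argument signature, the assumption that the project fields share the global field names, and the keys of the returned dict are all illustrative; the commit only states that the project fields "match" and that get_task returns a `prompts` dict.

```python
PROMPT_KINDS = ("spec", "implement", "changes")


def resolve_prompts(project: dict, settings: dict) -> dict:
    """Return only the prompts that resolve to non-blank text."""
    resolved = {}
    for kind in PROMPT_KINDS:
        field = f"agent_{kind}_prompt"
        # Precedence: project override first, then the global default.
        for source in (project, settings):
            value = (source.get(field) or "").strip()
            if value:
                resolved[kind] = value
                break
        # Neither set: omit the key so the box falls back to its built-in.
    return resolved
```

Treating whitespace-only values as blank avoids the case where an accidentally saved space in an override silently replaces the global prompt with nothing.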
Bring the v2 agent lifecycle into the product UI — until now it lived only in
the Frappe desk. Frontend + 8 thin whitelisted api.py wrappers; no change to the
callback API, state machine, box control plane, or any doctype schema.
- api.py: agent_{approve_spec,request_changes,mark_merged,retry,cancel,
teardown_now,handoff} + resolved_prompts. Each wraps the existing Hive Task
doctype method, which re-asserts _assert_agent_reviewer (no new trust boundary).
- Surface 1 — AgentPanel in TaskDetailSheet: status badge + caption, deep links,
Failed error callout, state-gated reviewer actions + Hand-to-agent.
- Surface 2 — global Agent settings tab in SettingsDialog (Hive Settings).
- Surface 3 — per-project Agent tab on ProjectDetailPage (Hive Project) with a
View-global popover for inherited prompt defaults.
- types.ts + lib/agent.ts + reusable settings/agent-fields.tsx.
Password fields are write-only; a blank submit preserves the stored secret.
yarn build green, new files lint-clean, wrappers verified whitelisted on
pms.localhost. Request Changes is offered only at PR Ready (backend transition
is PR-Ready-only) — corrects the spec's surface-1 action list.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011C9jzHhfxDqtU13uWtazZm
Replace the planned revalidate-while-open poll with true realtime — the task
panel, comment feed, and project board update live as the agent progresses.
Server (bwh_hive):
- service._publish_agent_update: publish_realtime("hive_agent_update", {task,
project, agent_status}, after_commit=True) from set_agent_status — the single
transition choke point, so every status change (box callbacks, human actions,
watchdog) is covered. No room/user -> site broadcast to all Desk users.
- append_agent_log: publish_realtime("hive_agent_log", {task}) per log line.
- Minimal payloads; the client refetches to pick up urls/pr_link/last_error set
in the same transaction.
Client (frontend):
- FrappeProvider gets siteName (window.site_name, injected via index.html; host
fallback) + dev-only socketPort — required for the Frappe v15+ socket.
- hooks/useAgentEvents.ts: useAgentTaskEvents / useAgentProjectEvents over
useFrappeEventListener, filtered by task/project.
- TaskDetailSheet refetches the task on hive_agent_update; TaskCommentsSection
refetches the feed on hive_agent_log; ProjectDetailPage refetches the board.
Stable SWR mutate fns as callbacks so subscriptions aren't churned.
yarn build green; publish verified on pms.localhost (1 event, room=all,
after_commit). No schema change, no migrate.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011C9jzHhfxDqtU13uWtazZm
Hive-side implementation of the Doppio Box v2 agent platform: turning a Hive Task assigned to the Agent bot into an autonomous Firecracker box that specs → implements → opens a PR → iterates on review → tears down. Pairs with BenchSpace PR #74. Specs live in `benchspace/specs/v2/`.

What's in here
- `orchestrator/` package (BenchSpace REST client, `build_boot_env`, `set_agent_status` state machine), assign-to-Agent trigger (via ToDo doc-event), Agent bot auto-provisioning.
- `agent_api.py`: `report_agent_status`, `set_spec_ready`, `set_pr_ready`, `report_agent_error`, `append_agent_log`, `get_task`, guarded by Agent-Bot-role + task-assignment checks.
- `approve_spec` → Spec Approved → enqueued dispatch `/implement/start`.
- `request_agent_changes` (round-trips comments to `/changes/apply`) + `mark_agent_merged`.
- `reconcile_agent_tasks` watchdog (`*/10` cron), concurrency cap + FIFO queue drain, retry/cancel/teardown-now, `agent_box_torn_down` marker.
- `get_task.prompts`.
- `api.py` wrappers, realtime socket.io push so all three update live.

Validation
All phases e2e-validated on `pms.localhost` + against real Firecracker VMs on the test server (see `benchspace/specs/v2/progress.md`). Migrations are additive; the whole surface is gated behind `agent_orchestration_enabled` (Hive Settings) and is inert until enabled.

Deploy
Deploys via Frappe Cloud on merge. Post-deploy: `bench migrate` (auto-creates the Agent bot), rebuild the `frontend/` (`yarn build`), confirm the scheduler runs the reconcile cron, then configure Hive Settings (BenchSpace API creds, callback keys, `default_agent_template_slug`, Telegram) before flipping `agent_orchestration_enabled`.

🤖 Generated with Claude Code