
v2: Agent orchestration — Hive side (Phases 0–5 + notifications, prompts, React surface)#175

Merged
NagariaHussain merged 12 commits into develop from v2 on Jul 1, 2026

@NagariaHussain


Hive-side implementation of the Doppio Box v2 agent platform — turning a Hive Task assigned to the Agent bot into an autonomous Firecracker box that specs → implements → opens a PR → iterates on review → tears down. Pairs with BenchSpace PR #74. Specs live in benchspace/specs/v2/.

What's in here

  • Phase 1 — data model & orchestration: agent fields on Hive Task/Project/Settings, the orchestrator/ package (BenchSpace REST client, build_boot_env, set_agent_status state machine), assign-to-Agent trigger (via ToDo doc-event), Agent bot auto-provisioning.
  • Phase 0/1 callback API (agent_api.py): report_agent_status, set_spec_ready, set_pr_ready, report_agent_error, append_agent_log, get_task — guarded by Agent-Bot-role + task-assignment checks.
  • Phase 3 — spec flow: approve_spec → Spec Approved → enqueued dispatch of /implement/start.
  • Phase 4 — PR review loop: request_agent_changes (round-trips comments to /changes/apply) + mark_agent_merged.
  • Phase 5 — lifecycle/cost: idempotent teardown sink, reconcile_agent_tasks watchdog (*/10 cron), concurrency cap + FIFO queue drain, retry/cancel/teardown-now, agent_box_torn_down marker.
  • ⊕ Notifications: OO Telegram dispatcher wired into the four box callbacks (MarkdownV2, after-commit enqueue, kill-switch).
  • ⊕ Configurable prompts: spec/implement/changes prompt editors at global (Hive Settings) + per-project level, resolved via get_task.prompts.
  • ⊕ React frontend agent surface (spec 09): per-task AgentPanel, global Agent settings tab, per-project Agent tab, 8 thin api.py wrappers, realtime socket.io push so all three update live.

Validation

All phases e2e-validated on pms.localhost + against real Firecracker VMs on the test server (see benchspace/specs/v2/progress.md). Migrations are additive; the whole surface is gated behind agent_orchestration_enabled (Hive Settings) — inert until enabled.

Deploy

Deploys via Frappe Cloud on merge. Post-deploy: bench migrate (auto-creates the Agent bot), rebuild the frontend/ (yarn build), confirm the scheduler runs the reconcile cron, then configure Hive Settings (BenchSpace API creds, callback keys, default_agent_template_slug, Telegram) before flipping agent_orchestration_enabled.

🤖 Generated with Claude Code

NagariaHussain and others added 12 commits June 30, 2026 15:32
…tatus

- agent_status Select on Hive Task (specs/v2 §4.1 full option list; empty =
  not agent-managed; separate from the kanban status, which is untouched)
- bwh_hive/bwh_hive/agent_api.py::report_agent_status — box -> Hive callback,
  gated by _assert_agent_caller (Agent Bot role); sets agent_status and appends
  a Hive Task Comment. Type-annotated per require_type_annotated_api_methods.
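
A minimal sketch of this kind of callback guard, assuming plain in-memory stand-ins for the Frappe session and the Hive Task. It also includes the task-assigned-to-agent check that the Phase 1 commit adds. `Caller`, `Task` and `assert_agent_caller` are hypothetical names, not the bwh_hive code.

```python
from dataclasses import dataclass, field

AGENT_ROLE = "Agent Bot"
AGENT_USER = "agent@hive.local"  # bot user provisioned by install.py (per this PR)


@dataclass
class Caller:
    """Hypothetical stand-in for the session user making the callback."""
    user: str
    roles: set = field(default_factory=set)


@dataclass
class Task:
    """Hypothetical stand-in for a Hive Task and its assignees."""
    name: str
    assigned_to: set = field(default_factory=set)


def assert_agent_caller(caller: Caller, task: Task) -> None:
    """Raise PermissionError unless the caller holds the Agent Bot role
    AND the task is assigned to the agent user; both checks must pass."""
    if AGENT_ROLE not in caller.roles:
        raise PermissionError(f"{caller.user} does not hold the {AGENT_ROLE} role")
    if AGENT_USER not in task.assigned_to:
        raise PermissionError(f"{task.name} is not assigned to {AGENT_USER}")
```

In this sketch, requiring both conditions means a valid bot credential alone cannot write agent state onto an arbitrary task.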

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011C9jzHhfxDqtU13uWtazZm
Assigning a Hive Task to the Agent user now auto-provisions a BenchSpace box,
tracks it on the task, and drives agent_status via one validated state machine.

- Data model: Hive Task agent_* fields (agent_section), Hive Project agent
  settings, Hive Settings orchestration/BenchSpace/callback/Telegram fields.
- orchestrator/ package: BenchSpaceClient, build_boot_env (MMDS §3 + CONTROL_TOKEN),
  provision_for_task (enqueued, guarded, concurrency cap), set_agent_status
  (§4.2 transition table + per-actor authorization), dispatch, deprovision stub.
- Trigger via the ToDo doc-event (assign_to.add doesn't fire Hive Task on_update).
- agent_api.py: full §5.1 callback surface, each guarded by Agent-Bot role AND
  task-assigned-to-agent; status changes route through set_agent_status.
- install.py: ensure Agent Bot role + agent@hive.local user + Hive Member.
- Minimal desk UX (hive_task.js): status indicator + Open Code/Site/PR.

E2e-validated cross-site against local benchspace (real HTTP, token auth).
See specs/v2/progress.md.
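
The transition table plus per-actor authorization could look roughly like this. The edges below are only the ones named in this PR's commit messages, and the `human`/`box`/`system` actor split is an assumption; the authoritative table is specs/v2 §4.2.

```python
from typing import Dict, FrozenSet, Tuple

HUMAN, BOX, SYSTEM = "human", "box", "system"

# Illustrative subset of edges mentioned in this PR; NOT the full §4.2 table.
TRANSITIONS: Dict[Tuple[str, str], FrozenSet[str]] = {
    ("Queued", "Provisioning"): frozenset({SYSTEM}),
    ("Provisioning", "Spec Created"): frozenset({BOX}),
    ("Spec Created", "Spec Approved"): frozenset({HUMAN}),
    ("Spec Approved", "Implementing"): frozenset({SYSTEM}),
    ("Implementing", "Spec Approved"): frozenset({SYSTEM}),  # dispatch failed: revert
    ("Implementing", "PR Ready"): frozenset({BOX}),
    ("PR Ready", "Changes Requested"): frozenset({HUMAN}),
    ("Changes Requested", "Implementing"): frozenset({SYSTEM}),
    ("PR Ready", "Merged"): frozenset({HUMAN}),
    ("Failed", "Queued"): frozenset({HUMAN}),  # retry edge
}
TERMINAL = frozenset({"Merged", "Cancelled"})


def check_transition(current: str, target: str, actor: str) -> None:
    """Raise ValueError for an illegal edge, PermissionError for a legal
    edge attempted by the wrong kind of actor."""
    if current in TERMINAL:
        raise ValueError(f"{current!r} is terminal")
    if target == "Failed":
        allowed = frozenset({BOX, SYSTEM})  # any live phase may fail
    elif target == "Cancelled":
        allowed = frozenset({HUMAN})  # reviewer cancel from any live phase
    else:
        allowed = TRANSITIONS.get((current, target))
        if allowed is None:
            raise ValueError(f"illegal transition {current!r} -> {target!r}")
    if actor not in allowed:
        raise PermissionError(f"{actor} may not move {current!r} -> {target!r}")
```

Separating "illegal edge" (ValueError) from "wrong actor" (PermissionError) keeps a rejected bot call distinguishable from a genuinely invalid transition.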

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011C9jzHhfxDqtU13uWtazZm
provision_for_task failure paths set Failed + a comment but left
agent_last_error empty (found during live-server e2e when a missing
agent-v16 template returned 417). Add a _fail() helper that writes
agent_last_error before the Failed transition (specs/v2 §B.5).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011C9jzHhfxDqtU13uWtazZm
Close the human↔box loop around the spec. A reviewer approves a Spec Created
spec in Hive, which advances the task to Spec Approved → Implementing and
dispatches POST {control_url}/implement/start to the box.

- Hive Task.approve_spec(note): human action → set_agent_status(Spec Approved).
  Guard checks Agent-bot identity (not role — Administrator implicitly holds all
  roles), plus write permission.
- service._react: entering Spec Approved enqueues start_implementation_for_task
  (dispatch is HTTP, must not block the desk request).
- service.start_implementation_for_task: idempotent; flips to Implementing then
  dispatches; on failure reverts to Spec Approved + agent_last_error (Phase 5
  watchdog retries).
- hive_task.js: Approve Spec button on Spec Created.

Validated live on pms.localhost (8/8): approve→dispatch, failure→revert,
idempotency, bot-blocked, invalid human transition rejected.
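
A sketch of the idempotent flip-then-dispatch-then-revert pattern described above, with a hypothetical in-memory `Task` and an injected `dispatch` callable in place of the HTTP call. A real Frappe job would likely also need a row lock or a conditional update to make the check-and-flip atomic; this sketch ignores concurrency.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    agent_status: str
    agent_last_error: str = ""


def start_implementation_for_task(task: Task, dispatch: Callable[[], None]) -> bool:
    """Return True if this call dispatched /implement/start, False otherwise."""
    if task.agent_status != "Spec Approved":
        return False  # idempotent: a duplicate or late job is a no-op
    task.agent_status = "Implementing"  # flip first so a second job sees it and stops
    try:
        dispatch()  # e.g. POST {control_url}/implement/start
    except Exception as exc:
        task.agent_status = "Spec Approved"  # revert; the watchdog can retry later
        task.agent_last_error = f"/implement/start dispatch failed: {exc}"
        return False
    return True
```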

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011C9jzHhfxDqtU13uWtazZm
request_agent_changes (PR Ready → Changes Requested → Implementing + sync
dispatch /changes/apply, rollback on failure) and mark_agent_merged
(PR Ready → Merged) as Hive Task doctype methods, guarded by the identity-based
reviewer check, plus PR Ready desk buttons. Merged stays terminal, so it reaches the
existing _react deprovision (00-architecture §4.2).

Orchestration live-validated on pms.localhost (11/11).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011C9jzHhfxDqtU13uWtazZm
…cy cap

- deprovision_for_task: idempotent single sink; clears agent_control_token only on a
  successful teardown (retain-on-failure → watchdog retries). Merged/Cancelled tear down
  immediately; Failed defers to the grace sweep.
- reconcile_agent_tasks watchdog on a new */10 cron: phase timeouts (sparing a cap-blocked
  Queued), vanished-box reconcile, idle/grace/orphan sweeps, FIFO queue drain.
- max_concurrent_agent_boxes cap (default 5) + provisioning/spec/implement timeouts +
  idle/failed-grace Hive Settings fields.
- cancel/retry/tear-down-now reviewer-gated desk actions + hive_task.js buttons;
  Failed→Queued retry edge; provision_for_task unassign-race guard.
- build_boot_env emits SPEC_RUN_TIMEOUT/IMPL_RUN_TIMEOUT (derived from the watchdog budgets).

Built + syntax/ruff-clean; live e2e on the test server is the open item.
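
The FIFO queue drain under max_concurrent_agent_boxes might be sketched as follows. Which statuses count as holding a live box is an assumption (the PR does not list them), and `drain_queue` is a hypothetical helper, not the watchdog's actual code.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

# Assumption: statuses that hold a live box and therefore count against the cap.
BOX_HOLDING = frozenset({"Provisioning", "Spec Created", "Spec Approved",
                         "Implementing", "PR Ready", "Changes Requested"})


@dataclass
class Task:
    name: str
    agent_status: str
    queued_at: datetime


def drain_queue(tasks: List[Task], max_concurrent_agent_boxes: int = 5) -> List[str]:
    """Return the names of Queued tasks to provision now, oldest first,
    without exceeding the concurrency cap."""
    active = sum(1 for t in tasks if t.agent_status in BOX_HOLDING)
    free_slots = max(0, max_concurrent_agent_boxes - active)
    queued = sorted((t for t in tasks if t.agent_status == "Queued"),
                    key=lambda t: t.queued_at)
    return [t.name for t in queued[:free_slots]]
```

A Queued task that gets no slot simply waits for the next */10 pass, which is why the phase-timeout check has to spare cap-blocked Queued tasks.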

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011C9jzHhfxDqtU13uWtazZm
…_down marker

Live validation surfaced a real teardown bug: Frappe keeps Password values in `__Auth`
(the doc column is a permanent `*****` placeholder), so deprovision_for_task's
`db_set("agent_control_token", None)` neither destroyed the secret nor gave a queryable
"torn down" signal — the watchdog's terminal-teardown sweep would re-enqueue teardown for
every historical terminal task forever.

- deprovision_for_task now calls remove_encrypted_password to actually destroy the token,
  and sets a new agent_box_torn_down Check field on a successful teardown (and no-ops when
  already torn down).
- watchdog pass D filters on agent_box_torn_down=0 instead of the (unreliable) token column.
- retry_agent_task resets agent_box_torn_down so a re-provisioned box is tear-down-able.

Validated end-to-end on a real Firecracker VM (teardown-on-merge: VM + all routes removed,
token cleared, audit retained) plus out-of-band-kill, orphan sweep, idempotent deprovision.
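
A sketch of the fixed teardown sink and the watchdog filter, using a boolean in place of the encrypted secret. Per the commit message the real code destroys the token with `remove_encrypted_password`; here that is only a comment, and `pending_terminal_teardowns` is a hypothetical name for pass D.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    name: str
    agent_status: str
    agent_box_torn_down: bool = False
    has_control_token: bool = True  # stands in for the secret stored in __Auth


def deprovision_for_task(task: Task, teardown: Callable[[], None]) -> str:
    if task.agent_box_torn_down:
        return "noop"  # already done: repeated calls are harmless
    try:
        teardown()  # e.g. ask BenchSpace to destroy the VM and its routes
    except Exception:
        return "retry"  # token kept and marker unset, so the watchdog retries
    task.has_control_token = False   # real code: remove_encrypted_password(...)
    task.agent_box_torn_down = True  # queryable "torn down" signal
    return "torn_down"


def pending_terminal_teardowns(tasks: List[Task]) -> List[str]:
    """Watchdog filter: terminal tasks whose box is not yet torn down.
    Filters on the marker, not on the placeholder-only token column."""
    return [t.name for t in tasks
            if t.agent_status in ("Merged", "Cancelled") and not t.agent_box_torn_down]
```

The marker is set only after a successful teardown, so a failed attempt stays visible to the sweep and a completed one drops out of it permanently.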

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011C9jzHhfxDqtU13uWtazZm
…allbacks

New self-contained bwh_hive/bwh_hive/notifications/ package turns box-callback
transitions into Telegram alerts behind a channel abstraction (07-notifications.md):

- events.py: EventType, frozen picklable NotificationEvent + from_task (reads
  agent_code_url/agent_site_url/pr_link + task_url), strict MarkdownV2 escaper,
  markdown + plain-text renderers with graceful link degradation.
- base.py: NotificationChannel ABC.
- channels/telegram.py: the only sending channel (HTTPS sendMessage, timeout=10,
  captures the Telegram 4xx body on failure). channels/{email,frappe_log}.py are
  real subclasses, is_enabled()->False in v2 so fan-out stays honest.
- dispatcher.py: notify() — defensive, kill-switch on notifications_enabled,
  CHATTY_EVENTS gate for optional events, one enqueue_after_commit job per enabled
  channel; _deliver() fully isolated (log_error, never resurfaces).
- agent_api.py: the four callbacks emit (SPEC_CREATED/PR_READY/FAILED/PROVISIONING)
  only on an actual transition (prev != target), so retries don't re-alert.
- service._notify stays a documented no-op (event-driven, not transition-driven).

Verified: rendering/escaper 20/20 + dispatcher 13/13 (bench venv) + live isolation
7/7 on pms.localhost against the real Telegram API (bad token -> one Error Log, no
raise; kill-switch -> clean no-op). Real @Botfather send is the one manual leg left.
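
The dispatcher's gating and isolation rules can be sketched as below. `enqueue_after_commit` is injected as a plain callable, the channel `send` signature is simplified, and the membership of `CHATTY_EVENTS` is a guess; none of this is the bwh_hive module itself.

```python
import logging
from abc import ABC, abstractmethod
from enum import Enum
from typing import Callable, Iterable

log = logging.getLogger("hive.notifications")


class EventType(Enum):
    PROVISIONING = "provisioning"
    SPEC_CREATED = "spec_created"
    PR_READY = "pr_ready"
    FAILED = "failed"


# Assumption: the PR does not say which events are "chatty"; PROVISIONING is a guess.
CHATTY_EVENTS = frozenset({EventType.PROVISIONING})


class NotificationChannel(ABC):
    @abstractmethod
    def is_enabled(self) -> bool: ...

    @abstractmethod
    def send(self, event: EventType, task: str) -> None: ...


def _deliver(channel: NotificationChannel, event: EventType, task: str) -> None:
    try:
        channel.send(event, task)
    except Exception:
        # stands in for frappe.log_error: record the failure, never re-raise
        log.exception("delivery via %s failed", type(channel).__name__)


def notify(event: EventType, task: str, channels: Iterable[NotificationChannel], *,
           notifications_enabled: bool, chatty_enabled: bool,
           enqueue_after_commit: Callable[[Callable[[], None]], None]) -> int:
    """Queue one delivery job per enabled channel; return the job count."""
    if not notifications_enabled:
        return 0  # kill-switch
    if event in CHATTY_EVENTS and not chatty_enabled:
        return 0
    jobs = 0
    for channel in channels:
        if channel.is_enabled():
            # default arg binds this channel; a bare closure would see only the last one
            enqueue_after_commit(lambda c=channel: _deliver(c, event, task))
            jobs += 1
    return jobs


def on_box_callback(prev_status: str, target_status: str, event: EventType,
                    task: str, **kwargs) -> int:
    if prev_status == target_status:
        return 0  # a retried callback is not a new transition: no re-alert
    return notify(event, task, **kwargs)
```

Deferring the send until after commit means a rolled-back transition never produces an alert, and a failing channel only costs one logged error.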

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011C9jzHhfxDqtU13uWtazZm
…abels

Live @bwh_hive_bot send surfaced a real bug: the SPEC_CREATED message failed
with Telegram 400 "can't parse entities: Character '(' is reserved" because the
link label "Review spec (code)" carries unescaped parens. MarkdownV2 reserves
those inside the link *text* too, not just in dynamic interpolations — the other
four events sent fine only because their labels have no reserved chars.

Fix: _link() now runs the label through escape_md2 (the URL part still escapes `\`
and `)`). Re-sent live: SPEC_CREATED now gets a 2xx, zero Error Logs. Unit suite
updated (20/20) for the now-escaped label.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011C9jzHhfxDqtU13uWtazZm
…evel

The spec/implement/changes prompts were baked into the box image (SKILLS_REPO
file -> shipped default). Make them editable in Hive, resolved server-side and
delivered to the box via get_task — no MMDS size limit, no image rebuild to
change a prompt.

- Hive Settings: agent_spec_prompt / agent_implement_prompt / agent_changes_prompt
  (Code, "Agent Prompts (global defaults)" section).
- Hive Project: matching override fields ("Agent Prompt Overrides", depends_on
  agent_enabled).
- agent_api.resolve_prompts(project): project override -> global default -> omit
  (blank/whitespace falls through so the box uses its built-in). get_task now
  returns a `prompts` dict consumed by control-plane/agent.py.

Verified live on pms.localhost (12/12 resolution) + box precedence isolated (6/6).
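
The precedence, sketched as a pure function over two mappings. It assumes the project override fields share the global field names and that the returned dict is keyed `spec`/`implement`/`changes`; both are assumptions, not the confirmed get_task payload shape.

```python
from typing import Dict, Mapping, Optional

PROMPT_KINDS = ("spec", "implement", "changes")


def _usable(value: Optional[str]) -> Optional[str]:
    """Treat None, empty, and whitespace-only values as 'not set'."""
    return value if value and value.strip() else None


def resolve_prompts(project: Mapping[str, Optional[str]],
                    settings: Mapping[str, Optional[str]]) -> Dict[str, str]:
    prompts: Dict[str, str] = {}
    for kind in PROMPT_KINDS:
        field = f"agent_{kind}_prompt"
        # project override -> global default -> omit
        value = _usable(project.get(field)) or _usable(settings.get(field))
        if value is not None:
            prompts[kind] = value  # an omitted key means "box uses its built-in"
    return prompts
```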

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011C9jzHhfxDqtU13uWtazZm
Bring the v2 agent lifecycle into the product UI — until now it lived only in
the Frappe desk. Frontend + 8 thin whitelisted api.py wrappers; no change to the
callback API, state machine, box control plane, or any doctype schema.

- api.py: agent_{approve_spec,request_changes,mark_merged,retry,cancel,
  teardown_now,handoff} + resolved_prompts. Each wraps the existing Hive Task
  doctype method, which re-asserts _assert_agent_reviewer (no new trust boundary).
- Surface 1 — AgentPanel in TaskDetailSheet: status badge + caption, deep links,
  Failed error callout, state-gated reviewer actions + Hand-to-agent.
- Surface 2 — global Agent settings tab in SettingsDialog (Hive Settings).
- Surface 3 — per-project Agent tab on ProjectDetailPage (Hive Project) with a
  View-global popover for inherited prompt defaults.
- types.ts + lib/agent.ts + reusable settings/agent-fields.tsx.

Password fields are write-only; a blank submit preserves the stored secret.
yarn build green, new files lint-clean, wrappers verified whitelisted on
pms.localhost. Request Changes is offered only at PR Ready (backend transition
is PR-Ready-only) — corrects the spec's surface-1 action list.
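
The write-only rule amounts to dropping blank password values from the update payload. It is sketched in Python for consistency with the other examples, although in this PR the form lives in the React frontend; `build_settings_update` and the field names in the test are hypothetical.

```python
from typing import Any, Dict, FrozenSet, Mapping


def build_settings_update(form: Mapping[str, Any],
                          password_fields: FrozenSet[str]) -> Dict[str, Any]:
    """Drop blank password fields so submitting the form never overwrites a stored secret."""
    update: Dict[str, Any] = {}
    for key, value in form.items():
        if key in password_fields and value in (None, ""):
            continue  # write-only: blank means "keep what is stored"
        update[key] = value
    return update
```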

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011C9jzHhfxDqtU13uWtazZm
Replace the planned revalidate-while-open poll with true realtime — the task
panel, comment feed, and project board update live as the agent progresses.

Server (bwh_hive):
- service._publish_agent_update: publish_realtime("hive_agent_update", {task,
  project, agent_status}, after_commit=True) from set_agent_status — the single
  transition choke point, so every status change (box callbacks, human actions,
  watchdog) is covered. No room/user -> site broadcast to all Desk users.
- append_agent_log: publish_realtime("hive_agent_log", {task}) per log line.
- Minimal payloads; the client refetches to pick up urls/pr_link/last_error set
  in the same transaction.

Client (frontend):
- FrappeProvider gets siteName (window.site_name, injected via index.html; host
  fallback) + dev-only socketPort — required for the Frappe v15+ socket.
- hooks/useAgentEvents.ts: useAgentTaskEvents / useAgentProjectEvents over
  useFrappeEventListener, filtered by task/project.
- TaskDetailSheet refetches the task on hive_agent_update; TaskCommentsSection
  refetches the feed on hive_agent_log; ProjectDetailPage refetches the board.
  Stable SWR mutate fns as callbacks so subscriptions aren't churned.

yarn build green; publish verified on pms.localhost (1 event, room=all,
after_commit). No schema change, no migrate.
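
A toy model of the two halves: after-commit publishing holds events until the transaction commits (and drops them on rollback), and each client listener filters the site-wide broadcast by task. `ToyTransaction` and `task_listener` are hypothetical; the real server calls publish_realtime and the client uses useFrappeEventListener in TypeScript.

```python
from typing import Any, Callable, Dict, List, Tuple

Payload = Dict[str, Any]


class ToyTransaction:
    """Toy model of after_commit publishing (not frappe.publish_realtime itself)."""

    def __init__(self, send: Callable[[str, Payload], None]):
        self._send = send
        self._pending: List[Tuple[str, Payload]] = []

    def publish_realtime(self, event: str, payload: Payload, after_commit: bool = False) -> None:
        if after_commit:
            self._pending.append((event, payload))  # held until the write is durable
        else:
            self._send(event, payload)

    def commit(self) -> None:
        pending, self._pending = self._pending, []
        for event, payload in pending:
            self._send(event, payload)

    def rollback(self) -> None:
        self._pending.clear()  # the status change never happened, so neither does the event


def task_listener(task_name: str, refetch: Callable[[], None]) -> Callable[[Payload], None]:
    """Client-side filter: ignore broadcasts for other tasks, refetch on ours."""
    def on_event(payload: Payload) -> None:
        if payload.get("task") == task_name:
            refetch()  # minimal payload, so pull urls/pr_link/last_error fresh
    return on_event
```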

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011C9jzHhfxDqtU13uWtazZm
@NagariaHussain NagariaHussain merged commit 0224beb into develop Jul 1, 2026
2 of 4 checks passed