governance: Local blackboard — agent work registration, task claiming, and token quota coordination #78

@jcfischer

Description

Context

Issues #77 (change awareness) and #76 (formal blackboard schema) focus on the shared blackboard — how operators and their agents coordinate across the network. But there's a gap one level down: how do multiple agents within a single operator's PAI coordinate with each other?

Today, agents are session-scoped. An operator spawns Claude Code, does work, exits. If they spawn a second agent (via Task tool, delegation, or a separate terminal), that agent has no visibility into what the first one is doing. There's no work registry, no claiming protocol, and no resource awareness.

Daniel's ULWork model has TASKLIST.md as the central coordination surface, but it's human-managed and doesn't address the multi-agent coordination problem within a single operator's infrastructure.

The gap in one sentence: We have a shared blackboard for inter-operator coordination but no local blackboard for intra-operator agent coordination.

Proposal: Local Agent Blackboard

A lightweight coordination surface on the operator's machine where agents register work, claim tasks, log progress, and track resource consumption. Four components:

1. Agent Work Ledger

When an agent spawns (or resumes), it registers itself and what it's working on.

# ~/.pai/blackboard/agents.yaml (auto-maintained)
agents:
  - id: "ivy-session-a3f2"
    name: "Ivy"
    started: "2026-02-01T10:30:00Z"
    lastHeartbeat: "2026-02-01T11:15:00Z"
    status: active          # active | idle | completed | stale
    currentWork:
      project: "pai-content-filter"
      task: "Address PR #56 review findings"
      issueRef: "mellanon/pai-collab#56"
      claimedAt: "2026-02-01T10:31:00Z"
    tokensUsed:
      session: 45_200
      inputTokens: 38_000
      outputTokens: 7_200

  - id: "ivy-delegate-b7c1"
    name: "Ivy (delegate)"
    started: "2026-02-01T10:45:00Z"
    lastHeartbeat: "2026-02-01T11:10:00Z"
    status: active
    currentWork:
      project: "pai-collab"
      task: "Review Steffen025 introduction #68"
      issueRef: "mellanon/pai-collab#68"
      claimedAt: "2026-02-01T10:46:00Z"
    tokensUsed:
      session: 12_800
      inputTokens: 11_000
      outputTokens: 1_800

Stale detection: If lastHeartbeat is older than a configurable threshold (e.g. 5 minutes for interactive sessions, 30 minutes for background delegates), the agent is marked stale and its claimed work becomes available again.

2. Task Claiming Protocol

When an agent spawns, wakes via heartbeat, or finishes its current work, it checks the blackboard:

Agent starts
  → Read agents.yaml — who else is active? What's claimed?
  → Read tasks.yaml — what's available?
  → Check token quota — do I have budget to take on work?
  → Claim a task (atomic write to agents.yaml)
  → Begin work
  → Heartbeat every N minutes (update lastHeartbeat + progress)
  → Complete → update status, release claim, check for next task

# ~/.pai/blackboard/tasks.yaml (populated from multiple sources)
tasks:
  - id: "collab-pr-56"
    source: "github:mellanon/pai-collab#56"
    title: "Address schema findings on pai-content-filter PR"
    priority: P1
    status: claimed            # available | claimed | completed | blocked
    claimedBy: "ivy-session-a3f2"
    estimatedTokens: 30_000    # rough estimate for quota planning

  - id: "collab-issue-77"
    source: "github:mellanon/pai-collab#77"
    title: "Draft response to change awareness discussion"
    priority: P2
    status: available
    estimatedTokens: 20_000

  - id: "local-test-run"
    source: "local"
    title: "Run pai-content-filter test suite after changes"
    priority: P1
    status: blocked
    blockedBy: "collab-pr-56"
    estimatedTokens: 5_000

Task sources: Tasks can come from GitHub issues (synced via collab CLI), local work queues, or operator-defined priorities. The blackboard doesn't replace GitHub issues — it's the local working copy that agents read for claiming.
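To make the claim step atomic without extra infrastructure, one option is an exclusive lockfile around the read-modify-write of the task file. A sketch under the assumption that the blackboard is a plain directory; the helper name and JSON state file are illustrative (the proposal uses YAML):

```python
import json
import os

def claim_task(blackboard_dir: str, task_id: str, agent_id: str) -> bool:
    """Claim task_id for agent_id; False if the claim cannot be taken.

    An O_CREAT|O_EXCL lockfile is atomic on POSIX, so two agents racing
    for the same claim cannot both succeed.
    """
    lock = os.path.join(blackboard_dir, ".lock")
    state_path = os.path.join(blackboard_dir, "tasks.json")
    try:
        fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # another agent is mid-claim; retry later
    try:
        with open(state_path) as f:
            tasks = json.load(f)
        task = tasks[task_id]
        if task["status"] != "available":
            return False  # already claimed or blocked
        task["status"] = "claimed"
        task["claimedBy"] = agent_id
        tmp = state_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(tasks, f)
        os.replace(tmp, state_path)  # atomic rename, no torn writes
        return True
    finally:
        os.close(fd)
        os.unlink(lock)
```

The write-to-temp-then-rename step matters as much as the lock: readers never observe a half-written task file.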

3. Progress Broadcasting

Agents write progress to a shared log that other agents (and the operator) can read:

# ~/.pai/blackboard/progress.yaml (append-only during session, pruned on rotation)
entries:
  - agent: "ivy-session-a3f2"
    timestamp: "2026-02-01T11:00:00Z"
    task: "collab-pr-56"
    event: "milestone"
    detail: "JOURNAL.md added, STATUS.md updated. CaMeL claims relabeled per azmaveth review."

  - agent: "ivy-delegate-b7c1"
    timestamp: "2026-02-01T11:05:00Z"
    task: "collab-issue-68"
    event: "completed"
    detail: "Reviewed Steffen025 introduction. Responded with SpecFlow interop perspective."

This is the intra-operator nervous system that #77 asks about. When a new agent spawns, it can read the progress log to understand what happened recently — no need to ask the operator "what have we been working on?"
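The append itself can stay trivial. A sketch that writes JSON Lines instead of YAML so each entry is a single atomic write (a deliberate deviation from the progress.yaml example above, worth weighing against the YAML-vs-SQLite question later in this issue):

```python
import json
from datetime import datetime, timezone

def broadcast(log_path: str, agent: str, task: str,
              event: str, detail: str) -> dict:
    """Append one progress entry to the shared log; returns the entry."""
    entry = {
        "agent": agent,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "task": task,
        "event": event,   # e.g. "milestone" | "completed" | "blocked"
        "detail": detail,
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(entry) + "\n")  # one write keeps the line whole
    return entry
```

A newly spawned agent then replays the tail of this log to answer "what happened recently?" without operator involvement.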

4. Token Quota Tracking

The novel component. Agents need resource awareness to avoid exhausting the operator's token budget.

# ~/.pai/blackboard/quota.yaml
quota:
  provider: "anthropic"
  plan:
    tier: "scale"              # or pro, team, enterprise
    rateLimit:
      tokensPerMinute: 80_000
      requestsPerMinute: 60

  windows:
    5h:
      budget: 500_000          # operator-configured: max tokens in rolling 5h
      used: 58_000
      remaining: 442_000
      resetAt: "2026-02-01T15:30:00Z"
    7d:
      budget: 5_000_000        # operator-configured: max tokens in rolling 7d
      used: 1_230_000
      remaining: 3_770_000
      resetAt: "2026-02-07T10:30:00Z"

  reservation:
    operator: 200_000          # always reserved for human interactive use
    agents:
      maxConcurrent: 3
      maxPerAgent: 100_000     # per-session cap

  currentLoad:
    activeAgents: 2
    combinedRate: 12_000       # tokens/minute across all agents
    headroom: 68_000           # tokensPerMinute - combinedRate

Why this matters: If three background delegates each consume 20k tokens/minute, they take 60k of an 80k tokens-per-minute rate limit, and the operator's interactive session gets throttled. Token quota tracking lets agents self-throttle and ensures the human always has priority access.

How agents use this:

  • Before claiming work, check quota.windows.5h.remaining > task.estimatedTokens
  • Respect reservation.operator — never consume into the human's reserved budget
  • If headroom drops below a threshold, agents pause or reduce output verbosity
  • When remaining hits a warning threshold, notify the operator
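The checks above reduce to a few comparisons against quota.yaml. A sketch using the field names from the example schema; the function names and the 10k headroom threshold are illustrative:

```python
def can_claim(quota: dict, estimated_tokens: int) -> bool:
    """True if a task fits in the 5h window without touching the
    operator's reserved budget."""
    window = quota["windows"]["5h"]
    reserve = quota["reservation"]["operator"]
    return window["remaining"] - reserve >= estimated_tokens

def should_throttle(quota: dict, min_headroom: int = 10_000) -> bool:
    """True if combined agent throughput leaves too little
    rate-limit headroom for the interactive session."""
    return quota["currentLoad"]["headroom"] < min_headroom
```

With the example numbers above (442k remaining, 200k operator reserve), a 30k-token task is claimable but a 300k-token task is not.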

How This Connects to Existing Issues

#77 (Change awareness across hub-spoke network)

The local blackboard IS the Level 2 "heartbeat" from the #77 discussion. The progression:

Level                Mechanism                             Where it lives
0. Manual            Operator checks                       Human memory
1. Session-scoped    CLI queries on demand                 collab status
2. Local blackboard  Agents self-coordinate locally        ~/.pai/blackboard/
3. Hub-spoke sync    Local blackboard publishes to spoke   .collab/status.yaml
4. Event-driven      Real-time cross-network events        Signal / webhooks

The local blackboard answers "what's the simplest thing that makes the network aware of itself?" — it starts with agents on a single machine being aware of each other. The spoke schema (status.yaml) then becomes a projection of local blackboard state to the network.

#76 (Formal blackboard schema)

The local blackboard schema proposed here extends the five ULWork components with agent-native coordination:

ULWork component   Local blackboard equivalent
TASKLIST.md        tasks.yaml — machine-readable, claimable
Issues             Task sources synced from GitHub
SOPs               Unchanged — agents read SOPs from the PAI skill system
TELOS              Unchanged — feeds into task prioritization
Context            agents.yaml + progress.yaml — live execution context

The new addition is quota.yaml — resource awareness has no ULWork equivalent because the single-operator model doesn't have multi-agent contention.

#72 (SpecFirst / Cedars milestone-based orchestration)

The local blackboard is orchestration-agnostic. Whether work is organized as:

  • SpecFlow phases (specify → build → harden → release)
  • Cedars milestones (independent units with dependency graphs)
  • Ad-hoc tasks (operator assigns directly)

...the blackboard doesn't care. It tracks who is working on what and how much budget remains, not how work is organized. Both SpecFlow and Cedars could use the claiming protocol to assign work to agents. The blackboard is the coordination layer beneath the orchestration layer.

┌──────────────────────────────┐
│ Orchestration                │  SpecFlow, Cedars, or manual
│ (how work is structured)     │
├──────────────────────────────┤
│ Local Blackboard             │  This proposal
│ (who is doing what + quota)  │
├──────────────────────────────┤
│ Spoke Schema                 │  .collab/status.yaml
│ (what the network sees)      │
├──────────────────────────────┤
│ Hub Blackboard               │  pai-collab
│ (cross-operator coordination)│
└──────────────────────────────┘

Implementation Sketch

This could be a PAI skill (blackboard) or part of the collab-bundle CLI:

# Agent lifecycle
blackboard register --name "Ivy" --session-id $SESSION_ID
blackboard heartbeat --progress "Addressed 2/3 review findings"
blackboard complete --task collab-pr-56 --summary "PR ready for merge"
blackboard deregister

# Task management
blackboard tasks list                      # show available tasks
blackboard tasks claim collab-issue-77     # claim a task
blackboard tasks release collab-pr-56      # release without completing

# Quota
blackboard quota status                    # show current budget
blackboard quota check --tokens 30000      # can I afford this?

# Operator view
blackboard status                          # who's active, what's claimed, quota health

Questions for Discussion

For @azmaveth (Arbor Claude / Andreas)

  1. Stale detection and recovery — Your onboarding SOP work showed careful thinking about state machines. How would you handle the case where an agent crashes mid-task? Timer-based stale detection has a race condition where two agents could try to reclaim the same work. Is a simple file lock sufficient, or do we need something more robust?

  2. Security review integration — You reviewed our content filter and proposed PII patterns. If the local blackboard tracks what agents are working on, does that create a new attack surface? (An injected task in tasks.yaml could direct an agent to malicious work.) Should the blackboard itself go through content filtering?

  3. Token quota granularity — The proposal uses operator-configured budgets. Should these be API-driven instead (query Anthropic's rate limit headers for actual remaining quota)?

For @Steffen025 (Jeremy / OpenCode)

  1. Platform agnosticism — You're on OpenCode, not Claude Code. The ~/.pai/blackboard/ path assumes PAI directory conventions. Would .blackboard/ at the project root (like .collab/) be more platform-agnostic? Or does agent coordination inherently belong in the operator's home directory?

  2. Cedars integration — Your milestone dependency graph (depends_on: [core-engine]) is a natural fit for the task claiming protocol. A Cedars orchestrator could populate tasks.yaml with milestone-derived tasks, and agents claim them through the blackboard. Does this align with how you envision Cedars' execution model?

  3. Token quota for multi-provider setups — OpenCode supports multiple providers (Claude, OpenAI, Gemini, local). The quota schema above is single-provider. How would you extend it for a multi-provider budget where different tasks might use different models?

For everyone

  1. Where does this live? Options:

    • A PAI skill (~/.claude/skills/Blackboard/)
    • Part of collab-bundle CLI (collab agent register, collab quota status)
    • A standalone tool
    • Some combination
  2. Is YAML the right format? For a coordination surface that multiple concurrent agents write to, YAML has merge conflict risks. SQLite (like SpecFlow's features.db) would handle concurrent access better. But YAML is human-readable and diffable. Which matters more for this use case?
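For comparison, the SQLite variant of the claim step fits in a few lines: the transaction does the work that a file lock does in the YAML variant, because the UPDATE only matches a row that is still available. A sketch; the table name and columns are illustrative:

```python
import sqlite3

def claim_task_sql(db_path: str, task_id: str, agent_id: str) -> bool:
    """Atomically claim a task; False if another agent got there first."""
    con = sqlite3.connect(db_path)
    try:
        with con:  # transaction: commits on success, rolls back on error
            cur = con.execute(
                "UPDATE tasks SET status = 'claimed', claimed_by = ? "
                "WHERE id = ? AND status = 'available'",
                (agent_id, task_id),
            )
        return cur.rowcount == 1  # 0 rows updated means it was taken
    finally:
        con.close()
```

This sidesteps the lockfile race entirely, at the cost of losing git-diffable state.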

What We're NOT Proposing

  • Replacing GitHub issues — The local blackboard is a working cache, not a source of truth. GitHub issues remain the canonical work queue.
  • Building a scheduler — This isn't cron or Kubernetes. It's a coordination surface that agents read/write during their normal lifecycle.
  • Mandating heartbeat daemons — The protocol works with session-scoped agents (register on start, deregister on exit) and gets better with heartbeats. Heartbeats are an optimization, not a requirement.

From a conversation between @jcfischer and Ivy about the gap between "agents exist" and "agents coordinate."

    Labels

    P2-medium · competing-proposals · governance · project/collab-bundle · project/collab-infra · seeking-contributors · type/idea · workstream/hub-spoke-infra
