hummbl-dev · Apr 19, 2026
@@ -0,0 +1,3 @@
+{"timestamp": "2026-04-19T15:08:34.257267+00:00", "repo": "arbiter", "score": 89.0, "grade": "CERTIFIED", "findings": 51, "loc": 17379, "dimensions": {"code": 94.7, "governance": 80.5, "dependencies": 100.0, "vitality": 75.0}, "record_hash": "43bce85687ac64eb4c3ff9a4464327896c1770036528920fbb14bad965d382ce", "prev_hash": ""}
+{"timestamp": "2026-04-19T15:33:53.324807+00:00", "repo": "agent-governance-demo", "score": 76.5, "grade": "PROVISIONAL", "findings": 5, "loc": 997, "dimensions": {"code": 91.6, "governance": 57.2, "dependencies": 100.0, "vitality": 40.0}, "record_hash": "f7ab4e514711127ed28530f36330d1dc8773414174254c990c1efdb18a238987", "prev_hash": "43bce85687ac64eb4c3ff9a4464327896c1770036528920fbb14bad965d382ce"}
+{"timestamp": "2026-04-19T15:34:12.175720+00:00", "repo": "agent-governance-demo", "score": 86.8, "grade": "CERTIFIED", "findings": 1, "loc": 994, "dimensions": {"code": 98.6, "governance": 85.9, "dependencies": 100.0, "vitality": 40.0}, "record_hash": "30f6bf0187a0ebf32cf1f9fda6a1769cb3da7a254f83699902f5021754c22729", "prev_hash": "f7ab4e514711127ed28530f36330d1dc8773414174254c990c1efdb18a238987"}
@@ -0,0 +1,220 @@
+---
+packet-version: 1.0
+from: claude-code
+to: gemini
+type: DISPATCH
+task-id: governance-beyond-artifacts
+priority: HIGH
+execution-mode: side_effecting
+authorized-by: human (Reuben, 2026-04-19)
+---
+
+## Context
+
+Arbiter is a deterministic code quality + governance scoring CLI at `/Users/others/PROJECTS/arbiter/`.
+Install: `pip install -e ".[analyzers]"` from repo root.
+Run tests: `PYTHONPATH=src python -m pytest tests/ -v`
+
+We ran Arbiter against 201 open-source repos and published the results. An ARCANA peer review
+(7 analytical lenses) identified a structural weakness: the governance scorer measures artifact
+*presence*, not governance *practice*. This is the Goodhart/Scott problem — file-presence checks
+are trivially gameable and miss informal governance that actually works.
+
+This handoff authorizes Gemini to build two new scoring modules that move Arbiter toward
+measuring practice, not just artifacts.
+
+## Finding
+
+**Current governance scorer** (`src/arbiter/governance_score.py`): 10 binary file-presence checks.
+`(repo_path / "SECURITY.md").exists()` → 15 points. No content analysis. No history analysis.
+Same score whether SECURITY.md says "email us" or describes a funded bug bounty with SLA.
+
+**Structural gap 1 — Content quality**: Files exist but quality is unmeasured.
+**Structural gap 2 — Temporal/vitality**: Point-in-time snapshot; gameable by adding files today.
+
+**Existing foundation**: `src/arbiter/git_historian.py` already walks git log via subprocess
+(stdlib only, no git library). `walk_commits()` returns `CommitInfo` with hash, author, timestamp,
+files_changed, loc_added, loc_removed. Gemini builds ON this, not from scratch.
+
+## Recommended Action
+
+### Sprint 1: Governance Quality Scorer (file content analysis)
+
+**Create**: `src/arbiter/governance_quality.py`
+
+Score the *content* of governance files, not just their existence. Pure local filesystem reads,
+stdlib only, no network.
+
+Scoring targets (all heuristic/regex, not NLP):
+
+**SECURITY.md quality** (0–15 pts):
+- Has a contact method (email, URL, form) → +5
+- Mentions a response timeline ("within 48 hours", "5 business days") → +5
+- Has a disclosure process (public vs private, CVE process) → +5
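
The three SECURITY.md signals above could be checked with plain regexes. A minimal sketch, assuming hypothetical patterns and a hypothetical helper name (`score_security_text`); the handoff specifies the signals and point values, not the expressions:

```python
import re

# Hypothetical patterns: the spec names the signals, not the regexes.
CONTACT_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+|https?://\S+", re.IGNORECASE)
TIMELINE_RE = re.compile(
    r"\b(?:within\s+)?\d+\s+(?:business\s+|working\s+)?(?:hours?|days?|weeks?)\b",
    re.IGNORECASE,
)
DISCLOSURE_RE = re.compile(
    r"\b(?:disclos\w+|CVE|embargo|private(?:ly)?\s+report\w*)\b", re.IGNORECASE
)

CHECKS = (
    ("contact method", CONTACT_RE),
    ("response timeline", TIMELINE_RE),
    ("disclosure process", DISCLOSURE_RE),
)


def score_security_text(text: str) -> tuple[int, list[str]]:
    """Return (points out of 15, human-readable findings) for SECURITY.md text."""
    points = 0
    details = []
    for label, pattern in CHECKS:
        if pattern.search(text):
            points += 5
            details.append(f"SECURITY.md: {label} found")
        else:
            details.append(f"SECURITY.md: no {label}")
    return points, details
```

An empty or missing file can be passed as `""`, which scores 0 rather than raising. These heuristics are still gameable by keyword stuffing; they only raise the cost compared with a bare existence check.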
+
+**CONTRIBUTING.md quality** (0–15 pts):
+- Describes how to run tests → +5
+- Describes PR/review process → +5
+- Has a code style or linting section → +5
+
+**CI workflow quality** (0–15 pts) — parse `.github/workflows/*.yml`:
+- Runs on PR (not just push to main) → +5
+- Has more than one job (test + lint, or matrix) → +5
+- References a coverage or test command → +5
+
+**README quality** (0–10 pts):
+- Length > 500 chars (already partially checked in governance_score.py) → baseline, no extra points
+- Has installation instructions (pip install, npm install, cargo add) → +5
+- Has usage example or code block → +5
+
+**Output dataclass**:
+```python
+@dataclass
+class GovernanceQualityReport:
+    security_score: float       # 0-15
+    contributing_score: float   # 0-15
+    ci_quality_score: float     # 0-15
+    readme_score: float         # 0-10
+    total: float                # raw sum is 0-55; reported normalized to 0-100
+    details: list[str]          # human-readable findings
+```
+
+**Integration point**: `governance_score.py` calls `score_governance_quality(repo_path)` and
+blends it into the governance dimension. Suggested weighting within governance:
+- Artifacts sub-score (current 10 checks): 50%
+- Quality sub-score (new): 50%
+
+### Sprint 2: Git Vitality Scorer (history-based governance signals)
+
+**Create**: `src/arbiter/git_vitality.py`
+
+Use the existing `git_historian.walk_commits()` to extract governance-relevant signals from
+commit history. Addresses the Goodhart vulnerability: a repo that added all governance files
+last week scores differently from one that's had them for 3 years with active contributors.
+
+**Signals to compute**:
+
+**Bus factor** (0–25 pts): count unique committers in last 90 days
+- 1 committer → 5 pts (high concentration risk)
+- 2–3 committers → 15 pts
+- 4+ committers → 25 pts
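
The bus-factor tiers can be sketched as below. The input shape, `(author, timestamp)` pairs, is a simplification for illustration; the real `CommitInfo` fields may be accessed differently. The zero-committer case is not listed in the spec, so scoring it 0 is an assumption:

```python
from datetime import datetime, timedelta, timezone


def bus_factor_points(commits, now: datetime, window_days: int = 90):
    """Return (unique committers in the window, points out of 25).

    `commits` is an iterable of (author, timestamp) pairs (hypothetical shape).
    """
    cutoff = now - timedelta(days=window_days)
    authors = {author for author, timestamp in commits if timestamp >= cutoff}
    count = len(authors)
    if count >= 4:
        points = 25
    elif count >= 2:
        points = 15
    elif count == 1:
        points = 5
    else:
        points = 0  # assumption: no commits in the window earns nothing
    return count, points


if __name__ == "__main__":
    now = datetime(2026, 4, 19, tzinfo=timezone.utc)
    sample = [("alice", now - timedelta(days=3)), ("bob", now - timedelta(days=120))]
    print(bus_factor_points(sample, now))  # bob falls outside the 90-day window
```

Passing `now` explicitly keeps the function deterministic, which matters for a CLI described as deterministic and for the tests.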
+
+**Commit recency** (0–25 pts): days since last commit
+- 0–30 days → 25 pts
+- 31–90 days → 15 pts
+- 91–180 days → 8 pts
+- More than 180 days → 0 pts (effectively unmaintained)
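
A sketch of the recency tiers, treating day 180 as still inside the 8-point band so the ranges do not overlap. Function names are hypothetical:

```python
from datetime import datetime, timezone


def days_since(last_commit: datetime, now: datetime) -> int:
    """Whole days between the last commit and `now`."""
    return (now - last_commit).days


def recency_points(days_since_commit: int) -> int:
    """Map days since the last commit onto the 0-25 recency tiers."""
    if days_since_commit <= 30:
        return 25
    if days_since_commit <= 90:
        return 15
    if days_since_commit <= 180:
        return 8
    return 0  # more than 180 days: effectively unmaintained
```

Both datetimes should be timezone-aware (or both naive); mixing the two raises `TypeError` on subtraction.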
+
+**Release cadence** (0–25 pts): call `git tag --sort=-creatordate` via subprocess
+- Has ≥ 1 tag → 10 pts
+- Has ≥ 3 tags → 20 pts
+- Tags follow SemVer pattern → +5 pts
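
A sketch of the release-cadence signal. Two points are assumptions: the SemVer bonus is granted only when every tag matches, and an optional leading `v` is accepted (strict SemVer has no prefix, but `v1.2.3` is a common tag convention). Helper names are hypothetical:

```python
import re
import subprocess

# Assumption: optional "v" prefix, then MAJOR.MINOR.PATCH with optional
# pre-release and build metadata.
SEMVER_RE = re.compile(
    r"^v?\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$"
)


def list_tags(repo_path: str) -> list[str]:
    """Return tag names, newest first; an empty list if git fails."""
    result = subprocess.run(
        ["git", "-C", repo_path, "tag", "--sort=-creatordate"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def release_points(tags: list[str]) -> int:
    """Map a tag list onto the 0-25 release-cadence tiers."""
    if len(tags) >= 3:
        points = 20
    elif len(tags) >= 1:
        points = 10
    else:
        return 0
    if all(SEMVER_RE.match(tag) for tag in tags):
        points += 5  # assumption: bonus requires every tag to match
    return points
```

Keeping `release_points` separate from the subprocess call lets the tier logic be tested without a git repository.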
+
+**Signed commit ratio** (0–25 pts): percentage of commits with a `Signed-off-by` trailer in the message (DCO sign-off, not a cryptographic signature)
+- >75% → 25 pts (DCO genuinely enforced)
+- 25–75% → 15 pts
+- >0% and <25% → 5 pts
+- 0% → 0 pts (DCO artifact exists but nothing is actually signed)
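
A sketch of the sign-off ratio. Note that the `CommitInfo` fields listed earlier (hash, author, timestamp, files_changed, loc_added, loc_removed) do not include the commit message, so this takes plain message strings; how those are obtained is left open. The trailer regex and function names are assumptions:

```python
import re

# Assumption: a DCO trailer looks like "Signed-off-by: Name <email>" on its own line.
SIGNOFF_RE = re.compile(r"^Signed-off-by: .+ <.+>$", re.MULTILINE)


def signoff_ratio(messages: list[str]) -> float:
    """Fraction (0.0-1.0) of commit messages carrying a Signed-off-by trailer."""
    if not messages:
        return 0.0
    signed = sum(1 for message in messages if SIGNOFF_RE.search(message))
    return signed / len(messages)


def signoff_points(ratio: float) -> int:
    """Map the sign-off ratio onto the 0-25 tiers."""
    if ratio > 0.75:
        return 25
    if ratio >= 0.25:
        return 15
    if ratio > 0.0:
        return 5
    return 0  # nothing is signed
```

A `Signed-off-by` trailer only records that the author added the line; it says nothing about GPG or SSH signature verification.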
+
+**Output dataclass**:
+```python
+@dataclass
+class GitVitalityReport:
+    bus_factor: int             # unique committers, 90 days
+    days_since_commit: int
+    release_count: int
+    signed_commit_ratio: float  # 0.0–1.0
+    score: float                # 0–100
+    details: list[str]
+```
+
+**Integration point**: Add `git_vitality` as an optional 4th scoring dimension in `scoring.py`.
+Weight suggestion when vitality is available: Code (45%) + Governance (25%) + Deps (15%) + Vitality (15%).
+When git history is unavailable (shallow clone or no commits), fall back to the existing 3-dimension weights.
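
A sketch of the four-dimension composite with fallback. The 45/25/15/15 weights come from the suggestion above; applied to the three ledger records at the top of this diff they reproduce the recorded scores (89.0, 76.5, 86.8) when rounded to one decimal. The existing 3-dimension weights are not stated in this packet, so they are passed in as a parameter rather than guessed:

```python
WEIGHTS_WITH_VITALITY = {
    "code": 0.45,
    "governance": 0.25,
    "dependencies": 0.15,
    "vitality": 0.15,
}


def composite_score(dimensions: dict, fallback_weights: dict) -> float:
    """Weighted sum over dimensions; uses `fallback_weights` when vitality is None.

    `fallback_weights` stands in for the existing 3-dimension weights, which
    this sketch does not know.
    """
    if dimensions.get("vitality") is None:
        weights = fallback_weights
    else:
        weights = WEIGHTS_WITH_VITALITY
    total = sum(weight * dimensions[name] for name, weight in weights.items())
    return round(total, 1)
```

Treating `None` as "unavailable" keeps a genuinely low vitality score (for example 0.0) distinct from missing history.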
+
+## File Map
+
+```
+src/arbiter/
+  governance_score.py       # MODIFY: call quality scorer, blend into governance dim
+  governance_quality.py     # CREATE: Sprint 1
+  git_vitality.py           # CREATE: Sprint 2
+  scoring.py                # MODIFY: add vitality dimension (optional)
+
+tests/
+  test_governance_quality.py  # CREATE: Sprint 1 tests
+  test_git_vitality.py        # CREATE: Sprint 2 tests
+```
+
+## Evidence
+
+- ARCANA review findings: Scott lens (metis erasure), Measurement lens (Goodhart HIGH), Foucault lens (artifact-vs-practice)
+- `src/arbiter/governance_score.py` lines 62–227: all checks are `Path.exists()` booleans
+- `src/arbiter/git_historian.py`: existing walk_commits() foundation for Sprint 2
+- `src/arbiter/dep_score.py`: reference pattern for dataclass + scoring function structure
+
+## Tests to Add
+
+Sprint 1 (governance_quality.py):
+- `test_security_md_with_contact_scores_higher_than_empty`
+- `test_contributing_md_with_test_instructions_gets_full_marks`
+- `test_ci_workflow_pr_trigger_detected`
+- `test_missing_files_score_zero_not_error`
+- `test_quality_score_normalized_to_100`
+
+Sprint 2 (git_vitality.py):
+- `test_single_committer_scores_low_bus_factor`
+- `test_recent_commit_scores_max_recency`
+- `test_semver_tags_detected`
+- `test_signed_commit_ratio_computed`
+- `test_shallow_clone_degrades_gracefully` (no git history → score=None, not crash)
+
+## Verification Criteria
+
+- All new tests pass: `PYTHONPATH=src python -m pytest tests/test_governance_quality.py tests/test_git_vitality.py -v`
+- Full test suite green: `PYTHONPATH=src python -m pytest tests/ -v`
+- Self-grade passes: `arbiter score . --fail-under 85`
+- No new third-party imports (stdlib + existing arbiter deps only)
+- Both new modules have module-level docstrings explaining what they measure vs. what they don't
+
+## Constraints
+
+- **Stdlib only**: no new third-party imports. Use `re`, `pathlib`, `subprocess`, `dataclasses`, `datetime`, and other stdlib modules only.
+- **Branch**: `feat/gemini/governance-beyond-artifacts`
+- **Bus identity**: `gemini` (no variants, no parentheticals)
+- **Commit format**: Conventional Commits (`feat:`, `test:`, `fix:`)
+- **Soft limit**: 500 LOC / 10 files per PR
+- **TDD**: write failing tests first, then implement
+- **Closeout packet required** — final STATUS must include: artifact paths, test count delta,
+  self-grade score before/after, open questions deferred, caveats
+- **No DESIGN.md unless explicitly requested**
+- **No modifications to**: `.github/`, `.claude/`, `docs/blog/`, `docs/CERTIFICATION_REPORT.md`
+
+## Grading Criteria (Claude will audit this PR)
+
+Gemini will be graded on:
+1. Correct file paths (all under `src/arbiter/` and `tests/`)
+2. TDD discipline (tests written before implementation, or simultaneous)
+3. Stdlib-only compliance (no new third-party imports)
+4. Graceful degradation (missing files → score 0; shallow clones or no git history → score=None; neither crashes)
+5. Closeout packet completeness
+6. Self-grade score maintained above 85
+
+## Session Start Protocol (mandatory first 5 commands)
+
+```bash
+# 1. Confirm correct repo
+ls /Users/others/PROJECTS/arbiter/src/arbiter/
+
+# 2. Confirm worktree state
+git -C /Users/others/PROJECTS/arbiter status --short
+
+# 3. Create branch
+git -C /Users/others/PROJECTS/arbiter checkout -b feat/gemini/governance-beyond-artifacts
+
+# 4. Confirm working directory
+pwd
+
+# 5. Run existing tests to establish baseline
+cd /Users/others/PROJECTS/arbiter && PYTHONPATH=src python -m pytest tests/ -q --tb=no
+```
@@ -0,0 +1,166 @@
+---
+packet-version: 1.0
+from: claude-code
+to: gemini
+type: FOLLOW-UP
+task-id: governance-beyond-artifacts
+priority: HIGH
+execution-mode: side_effecting
+authorized-by: human (Reuben, 2026-04-19)
+---
+
+## Context
+
+Sprint 1+2 files already exist on branch `feat/gemini/governance-beyond-artifacts` from a prior
+partial session. You are resuming, not starting. All 686 tests pass. Nothing is committed yet.
+
+## Current State (verified 2026-04-19 ~16:00 ET)
+
+**Untracked (Sprint 1+2 deliverables — your work):**
+```
+src/arbiter/governance_quality.py    183 lines
+src/arbiter/git_vitality.py          144 lines
+tests/test_governance_quality.py      76 lines
+tests/test_git_vitality.py            73 lines
+```
+
+**Modified unstaged (integration work — your work):**
+```
+src/arbiter/__main__.py         +12 / -2
+src/arbiter/certify.py          +23 / -3
+src/arbiter/governance_score.py +23 / -4
+tests/test_certify.py           +8
+tests/test_governance_score.py  +20 / -7
+```
+
+**Test result**: 686 passed, 0 failed, 0 errors (56s).
+
+## Required Actions
+
+### Step 1 — Session start protocol (mandatory)
+
+```bash
+# 1. Confirm branch
+git -C /Users/others/PROJECTS/arbiter branch --show-current
+# Expected: feat/gemini/governance-beyond-artifacts
+
+# 2. Confirm state
+git -C /Users/others/PROJECTS/arbiter status --short
+
+# 3. Run baseline tests
+cd /Users/others/PROJECTS/arbiter && PYTHONPATH=src python -m pytest tests/ -q --tb=no
+# Expected: 686 passed
+
+# 4. Bus post
+# gemini → all STATUS "Resuming governance-beyond-artifacts: 686 tests green, starting self-grade + PR"
+```
+
+### Step 2 — Self-grade
+
+```bash
+cd /Users/others/PROJECTS/arbiter && arbiter score . 2>/dev/null || \
+  PYTHONPATH=src python -m arbiter score /Users/others/PROJECTS/arbiter
+```
+
+Record score before and after your changes. Must be ≥ 85 to pass audit.
+
+### Step 3 — Review integration diffs
+
+Before committing, verify the 5 modified files are correct:
+
+```bash
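+git -C /Users/others/PROJECTS/arbiter diff src/arbiter/governance_score.py
+git -C /Users/others/PROJECTS/arbiter diff src/arbiter/__main__.py
+git -C /Users/others/PROJECTS/arbiter diff src/arbiter/certify.py
+git -C /Users/others/PROJECTS/arbiter diff tests/test_certify.py
+git -C /Users/others/PROJECTS/arbiter diff tests/test_governance_score.py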
+```
+
+Verify:
+- `governance_score.py` calls `score_governance_quality()` and blends result (50/50 artifacts/quality)
+- `__main__.py` exposes the new dimensions in CLI output
+- `certify.py` includes vitality dimension when git history is available
+- No new third-party imports in any file (`import re`, `import pathlib`, `import subprocess`, `import dataclasses` are all fine)
+
+### Step 4 — Commit (two-commit strategy)
+
+```bash
+cd /Users/others/PROJECTS/arbiter
+
+# Commit 1: Sprint 1 — governance quality scorer
+git add src/arbiter/governance_quality.py tests/test_governance_quality.py \
+        src/arbiter/governance_score.py tests/test_governance_score.py
+git commit -m "feat(governance): add content-quality scoring to governance dimension
+
+Adds governance_quality.py that scores SECURITY.md, CONTRIBUTING.md,
+CI workflows, and README content via regex (not just existence checks).
+Addresses Goodhart/Scott finding from ARCANA peer review: file presence
+was trivially gameable; content heuristics are harder to game.
+
+Blends quality sub-score (50%) with artifact sub-score (50%) in governance_score.py."
+
+# Commit 2: Sprint 2 — git vitality scorer
+git add src/arbiter/git_vitality.py tests/test_git_vitality.py \
+        src/arbiter/certify.py src/arbiter/__main__.py tests/test_certify.py
+git commit -m "feat(vitality): add git history vitality dimension to scoring
+
+Adds git_vitality.py that scores bus factor, commit recency, release
+cadence, and signed-commit ratio from git log. Addresses temporal
+Goodhart vulnerability: repos cannot game history by adding files today.
+
+Vitality is an optional 4th dimension (15% weight) when git history
+is available. Degrades gracefully on shallow clones."
+```
+
+### Step 5 — Push and open PR
+
+```bash
+git -C /Users/others/PROJECTS/arbiter push origin feat/gemini/governance-beyond-artifacts
+gh pr create \
+  --repo "$(git -C /Users/others/PROJECTS/arbiter remote get-url origin | sed -E 's#^.*github\.com[:/]##; s#\.git$##')" \
+  --title "feat(arbiter): governance quality + git vitality scoring (Sprint 1+2)" \
+  --body "..."
+```
+
+PR body must include:
+- What changed and why (ARCANA peer review finding)
+- Test count before/after
+- Self-grade before/after
+- Closeout packet (see Step 6)
+
+### Step 6 — Closeout packet (mandatory, post to bus)
+
+Bus post format:
+```
+gemini → all STATUS "governance-beyond-artifacts COMPLETE.
+ARTIFACTS: governance_quality.py (183L), git_vitality.py (144L), 4 test files.
+TESTS: 686 total (10 new), 0 failures.
+SELF-GRADE: before=<N>, after=<N>.
+PR: <url>.
+OPEN: <any deferred questions>.
+CAVEATS: <any uncertainty>.
+SOURCES: ARCANA peer review (Scott/Measurement/Foucault lenses), git_historian.py foundation."
+```
+
+## Verification Criteria (Claude audit checklist)
+
+- [ ] `PYTHONPATH=src python -m pytest tests/ -v` → all 686+ pass
+- [ ] `git diff --name-only <base-branch>...HEAD` shows only expected files
+- [ ] `governance_quality.py` has no `import requests` / `import numpy` / any non-stdlib
+- [ ] `git_vitality.py` uses only `subprocess`, `re`, `dataclasses`, `datetime`
+- [ ] `governance_score.py` blends quality 50/50 (not replaces)
+- [ ] Self-grade ≥ 85
+- [ ] Closeout packet posted to bus
+- [ ] PR title follows Conventional Commits
+- [ ] No modifications to `.github/`, `.claude/`, `docs/blog/`, `docs/CERTIFICATION_REPORT.md`
+
+## Constraints
+
+- **Stdlib only** — no new third-party imports
+- **Branch**: `feat/gemini/governance-beyond-artifacts` (already checked out)
+- **Bus identity**: `gemini` (no parentheticals)
+- **GEMINI_SESSION=true** — scope gate enforces this
+- Do NOT modify `docs/CERTIFICATION_REPORT.md`, `docs/blog/`, `.github/`
+
+---
+
+*This packet supersedes the dispatch spec at `docs/GEMINI_HANDOFF.md` for current session state.
+GEMINI_HANDOFF.md remains authoritative for Sprint 1+2 requirements.*