Wave 2: ollama-backfill 104 seed passports #4
Merged
Backfills every archived package with a structured passport generated by hermes3:8b (local Ollama, 4.7GB Q4_0) using schema-constrained JSON output. All entries are marked `ingest.manualReview=true` so humans can verify the LLM's category, tag, and pattern assignments at their own pace. The schema contract is what ships; the per-seed content is the opening bid.

How it runs:
- `scripts/seed-backfill.mjs` (`pnpm seed:backfill`) iterates `packages/*`, builds a 1-6KB corpus per package (`package.json` + truncated README + up to 3 source files, preferring entrypoints), and calls Ollama's `/api/generate` with the passport partial schema as the `format` constraint.
- The LLM fills the narrow subset it can infer: title, description, taxonomy (category + tags from the registry), technical (kind + programming languages), discovery (oneLiner + whyItMatters), patterns with registry-enforced categories, agentCapsule insight, and confidence.
- The script merges the LLM output with deterministic defaults (id, version from `package.json`, license, consolidation date 2026-04-08, lifecycle state=dormant, codeRepository URL, author, ingest provenance) and validates against both the partial (post-LLM) and full (post-merge) schemas.

Prompt calibration:
- A three-package calibration pass (voice-soundboard, deltamind, mcpt) revealed two issues, both fixed before the full run: voice-and-sound routing (fixed by adding category-routing hints) and language-tag hallucination (fixed by adding "only claim languages verifiable from file extensions or package.json").
- After the full run, 5 packages showed prompt-hint contamination: the model echoed the "Source file extensions observed:" grounding line into its oneLiner. That hint was removed from the visible prompt and the 5 were retried. claude-memories required one hand-edit because its README contained instruction-like text that kept leaking.

Results:
- 104 passports, 100% schema-valid.
- Confidence histogram: 16 at >=0.95, 67 at 0.85-0.94, 21 at 0.70-0.84, 0 below 0.70.
- Category distribution: developer-tools 56, desktop-apps 12, voice-and-sound 6, ml-and-training 5, vscode-extensions 5, crypto-and-provenance 5, governance-and-policy 4, typing-and-input 3, mouse-and-cursor 2, websketch 2, games-and-creative 1, suites-and-infrastructure 1, original-archive 2.
- Health block auto-computed from git + filesystem: 90 have tests, 104 have a README, 101 have a LICENSE, 104 are fresh (<=90 days since consolidation commits).

Derived artifacts regenerated:
- `site/src/data/seeds.json` (104 seeds)
- `README.md` category tables (between GENERATED markers)
- `llms.txt` at repo root (104 seeds, grouped by category)

Review workflow:
- `pnpm seed:doctor` lists all 104 under "Flagged for manual review". This is expected for Wave 2; set `manualReview=false` on each passport as it is verified.
- One contamination risk remains: roughly 10% of oneLiners in the `llms.txt` sample show obvious weaknesses (tautologies, README-fragment leaks). Fixing them incrementally is cheaper than re-running all 104.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
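The corpus step can be sketched as below. This is a minimal illustration, not the contents of `scripts/seed-backfill.mjs`: the character limits, the entrypoint stems, the source-extension filter, and the in-memory `files` map are all assumptions. Only the overall recipe (package.json, a truncated README, up to 3 source files preferring entrypoints, capped near 6KB) comes from the description above.

```javascript
// Hypothetical limits: the PR only says "1-6KB corpus per package".
const MAX_CORPUS_CHARS = 6000; // characters stand in for bytes (exact for ASCII text)
const MAX_README_CHARS = 2000;
const MAX_SOURCE_FILES = 3;
const MAX_SOURCE_CHARS = 1200;

// Assumed entrypoint stems, tried in this order before any other source file.
const ENTRYPOINT_STEMS = ["src/index", "index", "src/main", "main", "src/cli", "cli"];
const SOURCE_EXT = /\.(m?[jt]sx?|py|rs|go)$/;

// Lower rank sorts first: known entrypoints by position, everything else after them.
function entrypointRank(path) {
  const i = ENTRYPOINT_STEMS.indexOf(path.replace(SOURCE_EXT, ""));
  return i === -1 ? ENTRYPOINT_STEMS.length : i;
}

/**
 * Build the prompt corpus for one package.
 * `files` is a hypothetical in-memory map of relative path -> file text,
 * so the sketch stays independent of how the real script reads the disk.
 */
function buildCorpus(files) {
  const parts = [];
  if (files["package.json"]) {
    parts.push("## package.json\n" + files["package.json"]);
  }
  if (files["README.md"]) {
    parts.push("## README.md\n" + files["README.md"].slice(0, MAX_README_CHARS));
  }
  const sources = Object.keys(files)
    .filter((p) => SOURCE_EXT.test(p))
    .sort((a, b) => entrypointRank(a) - entrypointRank(b) || a.localeCompare(b))
    .slice(0, MAX_SOURCE_FILES);
  for (const p of sources) {
    parts.push(`## ${p}\n` + files[p].slice(0, MAX_SOURCE_CHARS));
  }
  // Hard cap on the whole corpus, whatever the per-section limits let through.
  return parts.join("\n\n").slice(0, MAX_CORPUS_CHARS);
}
```

Sorting by entrypoint rank before truncating to three files means a package with many modules still shows the model its front door first, which is the part most likely to explain what the package does.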
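The `/api/generate` call with a schema as `format` might look like the following sketch. `model`, `prompt`, `format`, `stream`, and `options` are fields of Ollama's generate endpoint, and a non-streamed reply carries the generated text in `response`. The default port, `temperature: 0`, and the helper names `buildGenerateRequest` and `generatePartial` are assumptions for illustration.

```javascript
// Ollama's default local endpoint; adjust if the server listens elsewhere.
const OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate";

// Request body for /api/generate. Passing a JSON schema as `format` asks Ollama
// to constrain the reply to that shape (structured outputs, recent Ollama versions).
function buildGenerateRequest(prompt, partialSchema, model = "hermes3:8b") {
  return {
    model,
    prompt,
    format: partialSchema,
    stream: false, // one JSON reply instead of a stream of chunks
    options: { temperature: 0 }, // assumption: favor repeatable output
  };
}

// Call the local model and parse its schema-shaped reply.
async function generatePartial(prompt, partialSchema) {
  const res = await fetch(OLLAMA_GENERATE_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildGenerateRequest(prompt, partialSchema)),
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const data = await res.json();
  // With stream: false, the generated text arrives in `response` as a string.
  return JSON.parse(data.response);
}
```

Validating the parsed object against the partial schema, as the description says the script does, would sit between this call and the merge with deterministic defaults.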
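A sketch of the merge step, assuming the LLM output and `package.json` are already parsed objects. The defaults mirror the list above, but the exact field names (`consolidatedAt`, `generator`), the `tree/main/packages/<dir>` repository URL shape, and the helper `mergePassport` itself are hypothetical; schema validation is left out.

```javascript
const CONSOLIDATION_DATE = "2026-04-08"; // from the PR description

/**
 * Merge LLM-inferred fields with deterministic defaults.
 * Field names beyond those listed in the PR (e.g. consolidatedAt, generator)
 * are hypothetical; the real passport schema may differ.
 */
function mergePassport(llm, dir, pkg, { repoBase, model }) {
  return {
    ...llm, // title, description, taxonomy, technical, discovery, patterns, ...
    // Deterministic fields come after the spread so the model can never override them.
    id: dir,
    version: pkg.version,
    license: pkg.license ?? null,
    author: pkg.author ?? null,
    codeRepository: `${repoBase}/tree/main/packages/${dir}`,
    lifecycle: { state: "dormant", consolidatedAt: CONSOLIDATION_DATE },
    ingest: { generator: model, manualReview: true },
  };
}
```

Spreading the LLM output first and writing deterministic fields afterwards is the design choice that matters: a hallucinated `id` or `version` is overwritten rather than shipped.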
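The weak oneLiners described above (prompt echoes, README-fragment leaks, tautologies) lend themselves to a cheap lint pass before human review. The heuristics below are assumptions of this sketch, not checks the PR describes running; only the echoed `Source file extensions observed:` line comes from the text, and `discovery.oneLiner` follows the field names listed above.

```javascript
// Prompt lines the model has echoed back before (from the PR description).
const PROMPT_ECHOES = ["Source file extensions observed:"];

// Markdown that suggests a README fragment leaked in: headings, fences, images, links.
const README_FRAGMENT = /^#{1,6}\s|`{3}|!\[|\]\(/;

// Lowercase and collapse punctuation so "Deltamind." and "deltamind" compare equal.
const normalize = (s) => s.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();

/**
 * Heuristic flags for a passport's oneLiner. These checks are illustrative
 * assumptions, not the PR's tooling; they only surface candidates for review.
 */
function oneLinerIssues(passport) {
  const line = passport.discovery?.oneLiner ?? "";
  const issues = [];
  if (PROMPT_ECHOES.some((marker) => line.includes(marker))) issues.push("prompt-echo");
  if (README_FRAGMENT.test(line)) issues.push("readme-fragment");
  // Tautological: the oneLiner just restates the title.
  if (passport.title && normalize(line) === normalize(passport.title)) issues.push("tautological");
  return issues;
}
```

False positives are tolerable here because every passport still goes through manual review; the lint only helps decide which passports to look at first.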
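The Results numbers are plain aggregations over the merged passports. A sketch, assuming confidence is a number in [0, 1] and categories are strings; the bucket labels copy the ones above, and `confidenceHistogram` and `tally` are hypothetical names.

```javascript
// Bucket edges mirror the PR's histogram: >=0.95, 0.85-0.94, 0.70-0.84, below 0.70.
function confidenceHistogram(scores) {
  const buckets = { ">=0.95": 0, "0.85-0.94": 0, "0.70-0.84": 0, "<0.70": 0 };
  for (const c of scores) {
    if (c >= 0.95) buckets[">=0.95"]++;
    else if (c >= 0.85) buckets["0.85-0.94"]++;
    else if (c >= 0.7) buckets["0.70-0.84"]++;
    else buckets["<0.70"]++;
  }
  return buckets;
}

// Count occurrences and sort by descending count (ties keep first-seen order).
function tally(values) {
  const counts = new Map();
  for (const v of values) counts.set(v, (counts.get(v) ?? 0) + 1);
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}
```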
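The health block can be computed from a file listing plus the last commit date. In this sketch, `lastCommitISO` could come from `git log -1 --format=%cI -- <package dir>` (`%cI` is git's strict ISO 8601 committer date); the file-name patterns, the output field names, and measuring age against a caller-supplied "now" are assumptions.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

/**
 * Compute a health block from a package's file list and last commit date.
 * `files` holds paths relative to the package root. The patterns and field
 * names are assumptions, not the PR's exact rules.
 */
function healthBlock({ files, lastCommitISO }, nowISO, maxAgeDays = 90) {
  const has = (re) => files.some((f) => re.test(f));
  const ageDays = (Date.parse(nowISO) - Date.parse(lastCommitISO)) / DAY_MS;
  return {
    // A tests/ or __tests__/ directory, or *.test.* / *.spec.* JS/TS files.
    hasTests: has(/(^|\/)(tests?|__tests__)\/|\.(test|spec)\.[cm]?[jt]sx?$/),
    hasReadme: has(/^readme(\.md)?$/i),
    hasLicense: has(/^licen[cs]e(\.md|\.txt)?$/i),
    fresh: ageDays <= maxAgeDays,
  };
}
```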
Summary
Data-only PR: 104 new `packages/*/passport.json` files generated by hermes3:8b via the local Ollama HTTP API with JSON-schema-constrained output. Only new code: `scripts/seed-backfill.mjs` and `scripts/backfill-report.json`. Regenerated derived artifacts: `site/src/data/seeds.json`, `README.md` category tables, and `llms.txt`.

Results
Review workflow
All 104 are flagged `ingest.manualReview = true`. `pnpm seed:doctor` surfaces them as the review queue. Clear the flag on each as you verify.

🤖 Generated with Claude Code
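Clearing the flag is a one-field change per passport. A minimal sketch, assuming passports are loaded as objects with `id` and `ingest.manualReview`; `reviewQueue` and `markReviewed` are hypothetical helpers, not how `pnpm seed:doctor` is implemented.

```javascript
// IDs still awaiting human review, sorted for a stable listing.
function reviewQueue(passports) {
  return passports
    .filter((p) => p.ingest?.manualReview === true)
    .map((p) => p.id)
    .sort();
}

// Return a copy with the flag cleared, leaving the rest of `ingest` intact.
function markReviewed(passport) {
  return { ...passport, ingest: { ...passport.ingest, manualReview: false } };
}
```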