Sync harbor-adapter-creator and harbor-task-creator with upstream changes (2026-04-19)#6

Open
benediktstroebl wants to merge 2 commits into main from sync/harbor-upstream-2026-04-19
@benediktstroebl
Collaborator

Syncs two skills with upstream changes merged into harbor-framework/harbor in the last 48 hours, primarily from commit 9ad34d5 (Split adapter tutorial to human/ai, update registry handling) and related adapter PRs.

Skills updated

harbor-adapter-creator

  • Directory structure: Updated from flat layout to src layout. Adapter code now lives under src/<adapter_name>/ with adapter.py, main.py, and task-template/ (was template/). Added pyproject.toml, .python-version, __init__.py to the structure diagram.
  • Run command: Updated from python3 adapters/mybenchmark/run_adapter.py to uv run python -m <adapter_name>.main, matching the new package-based layout.
  • task.toml: Added required [task] section with name field. The upstream fix (commit 9ad34d5) explicitly documents that name must be under [task], not at the TOML top level — tasks without it cannot be registered. Added naming requirements guidance.
  • parity_experiment.json: Renamed original_trials/harbor_trials to original_runs/harbor_runs throughout. Added new required fields: adapted_benchmark_size, parity_benchmark_size, number_of_runs. Updated the field reference table and example. The validator on main now enforces *_runs naming.
  • adapter_metadata.json: Added added_agents and parity_unmatching_agents fields to harbor_adapter entries. Updated parity_costs format to include $ prefix.
  • Reference adapters: Added scenario table covering cooperbench (multi-agent workflow via messaging/sidecars) and featurebench (GPU tasks, comprehensive Docker + Modal example), plus adebench, evoeval, bixbench, financeagent, medagentbench.
  • GPU tasks: Added dedicated section pointing to featurebench as the canonical GPU adapter example.
  • Gotchas: Added entries for missing [task] section and unstable task names.
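
The parity_experiment.json rename described above could be applied to an existing file roughly as follows. This is a minimal sketch, not the upstream validator: the key names come from this description, but the shape of `old_entry` (an `agent` field and a `metrics` list) is hypothetical, and the real schema may nest these fields differently.

```python
import json

# Old -> new key names, as listed in this PR description.
RENAMED_KEYS = {
    "original_trials": "original_runs",
    "harbor_trials": "harbor_runs",
}
# Fields this description lists as newly required.
REQUIRED_FIELDS = ("adapted_benchmark_size", "parity_benchmark_size", "number_of_runs")


def rename_keys(node):
    """Recursively rename *_trials keys to *_runs in nested dicts and lists."""
    if isinstance(node, dict):
        return {RENAMED_KEYS.get(k, k): rename_keys(v) for k, v in node.items()}
    if isinstance(node, list):
        return [rename_keys(item) for item in node]
    return node


def missing_required(entry):
    """Return the newly required fields that are absent from one entry (a dict)."""
    return [field for field in REQUIRED_FIELDS if field not in entry]


# Hypothetical entry for illustration only; not copied from a real adapter.
old_entry = {
    "agent": "example-agent",
    "metrics": [{"original_trials": [0.5, 0.6], "harbor_trials": [0.55, 0.6]}],
}
new_entry = rename_keys(old_entry)
print(json.dumps(new_entry, indent=2))
print("still missing:", missing_required(new_entry))
```

The rename is recursive because this description does not say at which level the `*_trials` keys sit; the required-field check only looks at the top level of the entry, which is also an assumption.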

harbor-task-creator

  • task.toml: Added [task] section to the minimal example with name, description, and keywords fields. This matches the upstream fix for the task.toml format where name must be under [task], not at the top level.
  • Keywords guidance: Added "Always populate keywords" note — pick 3–8 lowercase tokens covering domain, verifier style, and hardware. Leaving it empty makes the task invisible to registry search.
  • Step 9: Update README.md: Added new section documenting what to put in README before publishing (agent description, environment, verifier, layout, run commands).
  • Common Pitfalls: Added rows for keywords = [], missing [task] section, and stub README.

Skills not updated

harbor-cli: No confirmed CLI command changes in the 48-hour window that would affect the reference. Other merged PRs (llmsr-bench adapter, ScienceAgentBench adapter, RExBench Modal support) add new adapters but do not change CLI commands or flags.
