feat: safety boundary, experiment logging, and evaluation funnel (Phase 1) #3
Open
physics91 wants to merge 12 commits into
Define the AgentWorkflow / EvaluatorContract Protocols to abstract the boundary between the editable harness and the fixed adapter. AST-based tests verify agent.py compatibility without pulling in heavy runtime dependencies.
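As a rough illustration of the AST-based approach: parsing agent.py with Python's `ast` module reveals which names it binds at module level without executing any of its imports, so heavy dependencies never load. The required names (`AutoAgent`, `run_task`) and the helper functions below are hypothetical; the PR does not show its actual test code.

```python
import ast

# Hypothetical: top-level names the harness is assumed to keep exposing.
REQUIRED_NAMES = ("AutoAgent", "run_task")


def top_level_names(tree: ast.Module) -> set[str]:
    """Collect names bound at module level by def/class/import/assignment."""
    names: set[str] = set()
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.Import):
            for alias in node.names:
                # `import a.b` binds `a`; `import a.b as c` binds `c`.
                names.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                names.add(alias.asname or alias.name)
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    names.add(target.id)
    return names


def missing_names(source: str) -> list[str]:
    """Parse (never import) the source and report required names it lacks."""
    found = top_level_names(ast.parse(source))
    return sorted(name for name in REQUIRED_NAMES if name not in found)
```

Because the source is only parsed, a file that imports an uninstalled SDK still passes as long as the required names are bound, and a name re-exported via `from adapter import AutoAgent` counts the same as a local class.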
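Move the Harbor integration code (to_atif, AutoAgent) into a separate adapter.py, physically separating the editable harness (agent.py) from the fixed adapter. agent.py re-exports them at the bottom of the file, preserving compatibility with Harbor's agent:AutoAgent entry point.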
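A self-contained sketch of the bottom-of-file re-export, combined with the lazy import mentioned in the follow-up commit notes. It writes two hypothetical stand-in modules to a temporary directory; their contents are placeholders, not the real agent.py or adapter.py, and `demo` is an illustrative name.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical stand-ins for agent.py (editable) and adapter.py (fixed).
AGENT_SRC = textwrap.dedent('''
    def run_task(task):
        """Editable harness logic (placeholder)."""
        return f"done:{task}", 0

    # Re-export at the bottom so the `agent:AutoAgent` entry point still resolves.
    from adapter import AutoAgent, to_atif  # noqa: E402,F401
''')

ADAPTER_SRC = textwrap.dedent('''
    def to_atif(result):
        """Placeholder conversion; the real ATIF format is not shown here."""
        output, exit_code = result
        return {"output": output, "exit_code": exit_code}

    class AutoAgent:
        def run(self, task):
            # Lazy import: `agent` is resolved at call time, so importing
            # adapter on its own never re-enters a half-initialized module.
            from agent import run_task
            return to_atif(run_task(task))
''')


def demo():
    """Import adapter standalone first, then confirm agent re-exports it."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "agent.py").write_text(AGENT_SRC)
        Path(tmp, "adapter.py").write_text(ADAPTER_SRC)
        for name in ("agent", "adapter"):
            sys.modules.pop(name, None)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            adapter = importlib.import_module("adapter")
            result = adapter.AutoAgent().run("t1")
            agent = importlib.import_module("agent")
            return result, agent.AutoAgent is adapter.AutoAgent
        finally:
            sys.path.remove(tmp)
            for name in ("agent", "adapter"):
                sys.modules.pop(name, None)
```

If adapter.py instead imported agent at module level, importing adapter on its own would execute agent.py, whose bottom-of-file re-export would then ask the still-initializing adapter module for AutoAgent and fail with an ImportError.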
- adapter.py: lazy import to avoid circular dependency on standalone import
- adapter.py: track pending tool calls by call_id (dict) instead of single slot
- preflight.py: exact root-relative path matching to avoid false positives
- contracts.py: fix run_task return type to tuple[Any, int]
- tests: add test for nested __init__.py not being flagged as fixed file
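The call_id change can be pictured with a small tracker. The class and method names are hypothetical; the point is that results are matched to calls by call_id, so two calls issued before either result returns are both preserved, even when the results arrive out of order.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCallTracker:
    """Hypothetical helper: pairs tool results with their calls via call_id."""

    pending: dict[str, dict[str, Any]] = field(default_factory=dict)
    steps: list[dict[str, Any]] = field(default_factory=list)

    def on_call(self, call_id: str, name: str, arguments: dict[str, Any]) -> None:
        # A single "last call" slot would be overwritten by a second call
        # issued before the first result arrives; a dict keeps both.
        self.pending[call_id] = {"tool": name, "arguments": arguments}

    def on_result(self, call_id: str, output: Any) -> None:
        call = self.pending.pop(call_id, None)
        if call is None:
            raise KeyError(f"result for unknown call_id {call_id!r}")
        self.steps.append({"call_id": call_id, **call, "output": output})
```

Anything still left in `pending` at the end of a run is a call that never received a result, which is easy to report separately.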
- preflight: broaden forbidden patterns (from/import variants, __import__)
- run_eval.sh: add writable tmpfs for /task, /logs, /app/output
- tests: add cases for from importlib and __import__ bypass
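The PR does not say whether preflight.py matches forbidden patterns with regular expressions or by walking the syntax tree; the sketch below uses an AST walk as one way to cover the `import`, `from ... import`, and `__import__` forms listed above. The deny-list contents and the function name are hypothetical.

```python
import ast

# Hypothetical deny-list; the real preflight list is not given in this PR.
FORBIDDEN_MODULES = {"importlib", "subprocess"}


def find_forbidden(source: str) -> list[tuple[int, str]]:
    """Return sorted (line, description) pairs for each forbidden import form."""
    hits: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # import importlib / import importlib.util as u
            for alias in node.names:
                if alias.name.split(".")[0] in FORBIDDEN_MODULES:
                    hits.append((node.lineno, f"import {alias.name}"))
        elif isinstance(node, ast.ImportFrom):
            # from importlib import import_module (absolute imports only)
            if node.level == 0 and node.module and node.module.split(".")[0] in FORBIDDEN_MODULES:
                hits.append((node.lineno, f"from {node.module} import ..."))
        elif isinstance(node, ast.Call):
            # __import__("x") or builtins.__import__("x")
            func = node.func
            if (isinstance(func, ast.Name) and func.id == "__import__") or (
                isinstance(func, ast.Attribute) and func.attr == "__import__"
            ):
                hits.append((node.lineno, "__import__() call"))
    return sorted(hits)
```

A static scan like this can still be evaded by dynamic lookups (for example, building the attribute name as a string), so it complements rather than replaces the container isolation described in this PR.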
ijlu referenced this pull request in ijlu/autoagent on Apr 25, 2026
The position_health_log schema has settlement_result and settlement_pnl_cents columns described as "filled post-settlement for learning", but nothing ever filled them. Rows stayed NULL forever, starving the bandit that learns which health-band decisions held winners vs. losers. record_settlements now runs a scoped UPDATE per settled ticker so that every prior per-cycle health snapshot carries the eventual outcome. A NULL guard in the WHERE clause keeps this idempotent across multiple settlements.

Audit note: the other four Phase 1 step #3 learning sinks (calibration, timing_patterns, edge_convergence, loss_postmortems) are already fully populated from alpha_backtest via bot/learning/populate_from_alpha.py and from the legacy trade.py writers, so no change is needed there. Step #4 (Platt in get_independent_estimate) and step #6 (directional shadow evaluator with KXBTC/KXETH/KXHIGHDEN blocks) were likewise already wired. position_health was the one real gap.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
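A sketch of the NULL-guarded, per-ticker UPDATE using sqlite3. Only the table name and the ticker, settlement_result, and settlement_pnl_cents columns come from the commit message; the other columns, the `backfill_settlement` helper, and the use of SQLite are assumptions, and the actual record_settlements code is not shown here.

```python
import sqlite3

# Minimal stand-in schema: id, cycle and health_band are hypothetical columns.
SCHEMA = """
CREATE TABLE position_health_log (
    id INTEGER PRIMARY KEY,
    ticker TEXT NOT NULL,
    cycle INTEGER NOT NULL,
    health_band TEXT,
    settlement_result TEXT,
    settlement_pnl_cents INTEGER
)
"""


def backfill_settlement(conn: sqlite3.Connection, ticker: str,
                        result: str, pnl_cents: int) -> int:
    """Stamp every unsettled snapshot of `ticker` with its final outcome.

    The `settlement_result IS NULL` guard means a repeated call for the same
    ticker updates nothing, so the backfill is idempotent.
    """
    cur = conn.execute(
        """
        UPDATE position_health_log
           SET settlement_result = ?, settlement_pnl_cents = ?
         WHERE ticker = ? AND settlement_result IS NULL
        """,
        (result, pnl_cents, ticker),
    )
    conn.commit()
    return cur.rowcount  # number of snapshots stamped by this call
```

Scoping the WHERE clause to one ticker keeps other open positions untouched, and the guard ensures a second settlement pass for the same ticker leaves the first recorded outcome in place.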
Summary
- AgentWorkflow / EvaluatorContract Protocol definitions: the harness↔adapter interface contract
- Code below FIXED ADAPTER BOUNDARY extracted into a separate module; circular dependency resolved with a lazy import; multiple tool calls tracked by call_id (dict)
- --read-only + --network=none + writable tmpfs paths + seccomp
- experiments.jsonl ATIF sidecar index: links to Harbor ATIF trajectories via trace_id/trajectory_uri

Motivation
Implements Phase 1 of the AutoAgent improvement plan derived from a six-round, multi-perspective debate (Claude Code vs. Codex).
Replaces comment-based soft guardrails with physical isolation (Docker read-only filesystem, network isolation) plus a preflight policy gate.
Test plan
python -m pytest tests/ -v