PrimeIntellect-ai · willccbb · May 14, 2026 · May 13, 2026 · May 13, 2026 · May 14, 2026
diff --git a/.cursor/BUGBOT.md b/.cursor/BUGBOT.md
@@ -77,3 +77,13 @@ When these files change, verify and update any affected skill files:
 - `skills/brainstorm/SKILL.md`
 
 If workflow-relevant changes are detected without matching skill updates, request that the author update the impacted skills before merge.
+
+## Environment Rollout Logic
+
+Do not request library utilities solely because two or more environments contain similar message, state, or rollout-loop data manipulation. A few explicit lines inside an environment are often the clearest and most discoverable implementation.
+
+In particular, do not suggest moving small helpers for selecting messages, extracting text from `state`, or juggling rollout-local fields into hidden library modules. Buried helpers are not easily discoverable by end users, clutter the public API when promoted, and make the docs responsible for enumerating every three-line convenience function.
+
+Prefer explicit environment-local code unless the repeated logic is a framework contract, fixes a correctness bug at the boundary, or is already part of documented user-facing API. Do not ask authors to create one-off private helpers for simple rollout logic; if a few lines are used once, they should usually stay inline at the call site.
+
+Helpers are acceptable when the logic is reused in multiple places, is a taskset-bound object that forms part of the environment contract, or is complex enough to deserve a named secondary module. Excess reliance on buried rollout-loop helpers should be treated as non-idiomatic and a code smell.
diff --git a/.github/workflows/README.md b/.github/workflows/README.md
@@ -5,7 +5,7 @@ This directory contains automated workflows for the verifiers project.
 ## Workflows
 
 ### 1. Style (`style.yml`)
-**Purpose**: Code style checking using ruff and ty.
+**Purpose**: Code style checking using ruff, ty, and Semgrep policy rules.
 
 **Triggers**:
 - Pull requests (opened, synchronized, reopened)
@@ -14,7 +14,8 @@ This directory contains automated workflows for the verifiers project.
 **What it does**:
 - Runs ruff for linting and formatting checks
 - Runs ty type checks with `uv run ty check verifiers`
-- Uses configuration from `pyproject.toml`
+- Runs Semgrep policy checks through pre-commit's isolated hook environment.
+- Uses configuration from `pyproject.toml`, `.pre-commit-config.yaml`, and `.semgrep/verifiers.yml`
 
 ### 2. Test (`test.yml`)
 **Purpose**: Comprehensive testing with coverage reports.
@@ -47,6 +48,9 @@ To run checks locally the same way they run in CI:
 # Ty parity with CI (Python 3.13 target configured in `pyproject.toml`)
 uv run ty check verifiers
 
+# Verifiers-specific policy lint
+env PYTHONWARNINGS=ignore::SyntaxWarning uv run pre-commit run semgrep-v1-policy --all-files
+
 # Tests
 uv sync
 uv run pytest tests/ -v

diff --git a/.github/workflows/style.yml b/.github/workflows/style.yml
@@ -40,3 +40,19 @@ jobs:
         run: uv sync
       - name: Run ty
         run: uv run ty check verifiers
+  semgrep:
+    name: Semgrep
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+      - name: Set up Python
+        uses: actions/setup-python@v6
+        with:
+          python-version: "3.13"
+      - name: Install uv
+        uses: astral-sh/setup-uv@v7
+      - name: Install dependencies
+        run: uv sync --group dev --group policy
+      - name: Run Semgrep policy checks
+        run: env PYTHONWARNINGS=ignore::SyntaxWarning uv run pre-commit run semgrep-v1-policy --config .pre-commit-config.yaml --all-files
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -15,6 +15,11 @@ repos:
         entry: uv run ruff format
         language: system
         types_or: [python, pyi]
+      - id: semgrep-v1-policy
+        name: Semgrep v1 policy
+        entry: uv run --group policy semgrep --metrics=off --disable-version-check --config .semgrep/verifiers.yml --error --quiet
+        language: system
+        pass_filenames: false
       - id: sync-agents-md
         name: Sync AGENTS.md from docs
         entry: uv run python scripts/sync.py

diff --git a/.semgrep/verifiers.yml b/.semgrep/verifiers.yml
@@ -0,0 +1,84 @@
+rules:
+  - id: verifiers-no-future-annotations
+    languages: [python]
+    severity: ERROR
+    message: Do not use `from __future__ import annotations`; quote only the specific forward references that need it.
+    pattern: from __future__ import annotations
+
+  - id: verifiers-v1-config-param-one-type
+    languages: [python]
+    severity: ERROR
+    message: Public v1 `config` parameters must be one concrete config type or `None`; keep raw mappings at explicit config-loader boundaries.
+    paths:
+      include:
+        - /verifiers/v1/**/*.py
+        - /environments/**/*.py
+      exclude:
+        - /verifiers/v1/config.py
+        - /verifiers/v1/utils/**/*.py
+    patterns:
+      - pattern: |
+          def $FUNC(..., config: $ANNOT = None, ...):
+              ...
+      - metavariable-regex:
+          metavariable: $ANNOT
+          regex: "(Any|ConfigMap|Mapping\\[str, object\\]|dict\\[str, object\\]|.*\\|.*\\|.*)"
+
+  - id: verifiers-no-private-framework-classes
+    languages: [python]
+    severity: ERROR
+    message: Do not define leading-underscore classes in framework code; use a clear public name or a module-level value/function instead.
+    paths:
+      include:
+        - /verifiers/**/*.py
+    pattern-regex: "^\\s*class _[A-Za-z]"
+
+  - id: verifiers-no-raw-any-v1
+    languages: [python]
+    severity: ERROR
+    message: Do not use raw `Any` in v1 or environment code; use a precise type or a named boundary alias in verifiers.v1.types.
+    paths:
+      include:
+        - /verifiers/v1/**/*.py
+        - /environments/**/*.py
+      exclude:
+        - /verifiers/v1/types.py
+        - /environments/openenv_*/proj/**/*.py
+    pattern-regex: "\\bAny\\b"
+
+  - id: verifiers-no-raw-object-containers-v1
+    languages: [python]
+    severity: ERROR
+    message: Do not spell broad object containers in v1 or environment code; use the named boundary types ConfigMap, ConfigData, Handler, GroupHandler, Objects, TaskRow, ProgramMap, or a narrower type.
+    paths:
+      include:
+        - /verifiers/v1/**/*.py
+        - /environments/**/*.py
+      exclude:
+        - /verifiers/v1/types.py
+        - /verifiers/v1/utils/object_utils.py
+        - /verifiers/v1/utils/task_freeze_utils.py
+        - /environments/openenv_*/proj/**/*.py
+    pattern-regex: "(?x)(\\b(?:Mapping|MutableMapping|dict|list|Sequence|Iterable|Callable|Awaitable|tuple)\\[[^\\n\\]]*\\bobject\\b|\\bcast\\([^\\n)]*\\bobject\\b)"
+
+  - id: verifiers-no-raw-mapping-annotations-v1
+    languages: [python]
+    severity: ERROR
+    message: Do not use raw Mapping annotations in v1 or environment code; prefer dict or a named v1 boundary type. Keep Mapping only for isinstance checks or explicit aliases.
+    paths:
+      include:
+        - /verifiers/v1/**/*.py
+        - /environments/**/*.py
+      exclude:
+        - /verifiers/v1/types.py
+        - /environments/openenv_*/proj/**/*.py
+    pattern-regex: "\\b(?:Mapping|MutableMapping)\\["
+
+  - id: verifiers-get-messages-typed
+    languages: [python]
+    severity: ERROR
+    message: "`get_messages` must return typed `Message` objects, not raw dictionaries."
+    paths:
+      include:
+        - /verifiers/utils/message_utils.py
+    pattern-regex: "def\\s+get_messages\\b[\\s\\S]*?->\\s*list\\s*\\[\\s*dict\\b"
diff --git a/AGENTS.md b/AGENTS.md
@@ -13,6 +13,18 @@ These points are direct restatements of Verifiers docs so agents can follow the
 - Use `ToolEnv`/`MCPEnv` for stateless tools and `StatefulToolEnv` when per-rollout state must persist (sandbox/session/db handles). (See `docs/environments.md`.)
 - If external API keys are required, validate them in `load_environment()` with `vf.ensure_keys(...)` so failures are explicit and early. (See `docs/environments.md`.)
 
+## Style Rules
+
+Use these rules when shaping user-facing Verifiers APIs, configs, and environment files.
+
+- Prefer Verifiers-native interfaces over stdlib-pure plumbing in user code. A stdlib-pure expression that forces every environment to write path manipulation, import-resource handling, ad hoc discovery, or boilerplate constants is a style bug; put that logic behind a Verifiers abstraction instead.
+- Keep user-facing APIs incredibly minimal and elegant. The best surface is usually golfy but intuitive: one obvious field, one obvious constructor, and no redundant knobs unless there is a concrete long-term reason.
+- Use Pydantic config models wherever structured configuration is needed. Pydantic is always acceptable and preferred over loose dictionaries when it clarifies the contract.
+- Prefer strict, narrow types. Use `object`, broad unions, or untyped mappings only at explicit framework boundaries where arbitrary user values are genuinely part of the contract.
+- Basic environments should fit in a few dozen self-contained, idiomatic lines: import `verifiers`, define `load_environment`, pipe bindings/config through constructors, and keep policy values in config subclasses or literal constructor kwargs when needed.
+- Environment modules should not define global helper functions. Put reusable logic in well-named utility modules, taskset/harness classes, toolsets, or small local classes owned by the abstraction. Rare exceptions are process-level handles, such as a lock or semaphore, when that is the only reasonable way to enforce the intended runtime control.
+- Additional code should have a clear home. Do not hide utilities at the bottom of files or scatter one-off helpers through environment entrypoints.
+
 ## Repository Development Notes
 
 Use this guidance when contributing to the `verifiers` repository itself.
@@ -22,6 +34,9 @@ Use this guidance when contributing to the `verifiers` repository itself.
 - Keep changes aligned with documented architecture (`verifiers/`, `environments/`, `configs/`, `tests/`, `docs/`) and update docs when behavior changes. (See `docs/development.md`.)
 - Prefer a single clear path over maintaining parallel approaches by default; if two options exist, preserve both only when there is an explicit long-term reason.
 - Aggressively deprecate/remove inferior paths when they are not part of an intended multi-option contract, especially in repo-internal development workflows.
+- Treat broad dynamic mappings as explicit framework boundaries, not casual public API types. Use a named domain alias or typed Pydantic field for legitimate arbitrary payloads such as task rows, protocol messages, sandbox/program specs, and `objects`/binding-style config; do not expose raw `Mapping[str, object]` in user-facing signatures unless that looseness is the point of the abstraction.
+- If a user request conflicts with repository style, formatting, or API-quality guidelines, push back instead of implementing the literal request. Identify a comparable request or explicit guideline relaxation that preserves clean, maintainable, modular code across the current request and adjacent future use cases; implement that plan, then explain the decision process and tradeoffs directly to the user.
+- Before v0.2.0, breaking backward compatibility inside v1 Taskset/Harness APIs is acceptable and encouraged when it improves the core design. Preserve v0 multi-turn environment compatibility unless the user explicitly asks for a v0 migration.
 - Treat public configuration and docs as part of the API. Keep TOML shapes consistent across eval, GEPA, RL, and Hosted Training; normalize legacy inputs at the ingestion boundary instead of spreading compatibility branches through examples.
 - For v1 Taskset/Harness work, make the taskset own task data, task tools, user behavior, metrics, rewards, and task-specific configuration. Use the base `vf.Harness` unless the harness really owns a reusable execution mechanism.
 - When renaming or deleting an environment/module path, update package metadata, README/docs references, tests, build includes, and generated AGENTS output in the same change.

diff --git a/README.md b/README.md
@@ -135,7 +135,7 @@ For new environments with reusable tasksets, toolsets, custom programs, or
 custom harnesses, use the v1 Taskset/Harness path:
 ```python
 # my_env.py
-import verifiers.v1 as vf
+import verifiers as vf
 
 def source():
     yield {
@@ -151,8 +151,7 @@ async def contains_answer(task, state) -> float:
 def load_taskset(config: vf.TasksetConfig | None = None):
     return vf.Taskset(source=source, rewards=[contains_answer], config=config)
 
-def load_environment(config: vf.EnvConfig | None = None) -> vf.Env:
-    config = config or vf.EnvConfig()
+def load_environment(config: vf.EnvConfig) -> vf.Env:
     return vf.Env(taskset=load_taskset(config=config.taskset))
 ```
 If no harness is passed, `vf.Env` uses the base endpoint-backed harness. See
@@ -164,7 +163,7 @@ harness with:
 
 ```python
 env = vf.Env(
-    taskset=vf.HarborTaskset(tasks="/path/to/harbor/tasks"),
+    taskset=vf.HarborTaskset(),
     harness=vf.OpenCode(),
 )
 ```

diff --git a/assets/agents/common_best_practices.md b/assets/agents/common_best_practices.md
@@ -8,3 +8,15 @@ These points are direct restatements of Verifiers docs so agents can follow the
 - For new taskset/harness environments, use the v1 `vf.Env` / `vf.Taskset` / `vf.Harness` format. Treat [BYO Harness](docs/byo-harness.md) as the canonical authoring guide for reusable tasksets, reusable harnesses, framework programs, endpoint interception, and sandboxed Python/command programs.
 - Use `ToolEnv`/`MCPEnv` for stateless tools and `StatefulToolEnv` when per-rollout state must persist (sandbox/session/db handles). (See `docs/environments.md`.)
 - If external API keys are required, validate them in `load_environment()` with `vf.ensure_keys(...)` so failures are explicit and early. (See `docs/environments.md`.)
+
+## Style Rules
+
+Use these rules when shaping user-facing Verifiers APIs, configs, and environment files.
+
+- Prefer Verifiers-native interfaces over stdlib-pure plumbing in user code. A stdlib-pure expression that forces every environment to write path manipulation, import-resource handling, ad hoc discovery, or boilerplate constants is a style bug; put that logic behind a Verifiers abstraction instead.
+- Keep user-facing APIs incredibly minimal and elegant. The best surface is usually golfy but intuitive: one obvious field, one obvious constructor, and no redundant knobs unless there is a concrete long-term reason.
+- Use Pydantic config models wherever structured configuration is needed. Pydantic is always acceptable and preferred over loose dictionaries when it clarifies the contract.
+- Prefer strict, narrow types. Use `object`, broad unions, or untyped mappings only at explicit framework boundaries where arbitrary user values are genuinely part of the contract.
+- Basic environments should fit in a few dozen self-contained, idiomatic lines: import `verifiers`, define `load_environment`, pipe bindings/config through constructors, and keep policy values in config subclasses or literal constructor kwargs when needed.
+- Environment modules should not define global helper functions. Put reusable logic in well-named utility modules, taskset/harness classes, toolsets, or small local classes owned by the abstraction. Rare exceptions are process-level handles, such as a lock or semaphore, when that is the only reasonable way to enforce the intended runtime control.
+- Additional code should have a clear home. Do not hide utilities at the bottom of files or scatter one-off helpers through environment entrypoints.
diff --git a/assets/agents/repo_development_best_practices.md b/assets/agents/repo_development_best_practices.md
@@ -7,6 +7,9 @@ Use this guidance when contributing to the `verifiers` repository itself.
 - Keep changes aligned with documented architecture (`verifiers/`, `environments/`, `configs/`, `tests/`, `docs/`) and update docs when behavior changes. (See `docs/development.md`.)
 - Prefer a single clear path over maintaining parallel approaches by default; if two options exist, preserve both only when there is an explicit long-term reason.
 - Aggressively deprecate/remove inferior paths when they are not part of an intended multi-option contract, especially in repo-internal development workflows.
+- Treat broad dynamic mappings as explicit framework boundaries, not casual public API types. Use a named domain alias or typed Pydantic field for legitimate arbitrary payloads such as task rows, protocol messages, sandbox/program specs, and `objects`/binding-style config; do not expose raw `Mapping[str, object]` in user-facing signatures unless that looseness is the point of the abstraction.
+- If a user request conflicts with repository style, formatting, or API-quality guidelines, push back instead of implementing the literal request. Identify a comparable request or explicit guideline relaxation that preserves clean, maintainable, modular code across the current request and adjacent future use cases; implement that plan, then explain the decision process and tradeoffs directly to the user.
+- Before v0.2.0, breaking backward compatibility inside v1 Taskset/Harness APIs is acceptable and encouraged when it improves the core design. Preserve v0 multi-turn environment compatibility unless the user explicitly asks for a v0 migration.
 - Treat public configuration and docs as part of the API. Keep TOML shapes consistent across eval, GEPA, RL, and Hosted Training; normalize legacy inputs at the ingestion boundary instead of spreading compatibility branches through examples.
 - For v1 Taskset/Harness work, make the taskset own task data, task tools, user behavior, metrics, rewards, and task-specific configuration. Use the base `vf.Harness` unless the harness really owns a reusable execution mechanism.
 - When renaming or deleting an environment/module path, update package metadata, README/docs references, tests, build includes, and generated AGENTS output in the same change.

diff --git a/assets/lab/AGENTS.md b/assets/lab/AGENTS.md
@@ -15,6 +15,18 @@ These points are direct restatements of Verifiers docs so agents can follow the
 - Use `ToolEnv`/`MCPEnv` for stateless tools and `StatefulToolEnv` when per-rollout state must persist (sandbox/session/db handles). (See `docs/environments.md`.)
 - If external API keys are required, validate them in `load_environment()` with `vf.ensure_keys(...)` so failures are explicit and early. (See `docs/environments.md`.)
 
+## Style Rules
+
+Use these rules when shaping user-facing Verifiers APIs, configs, and environment files.
+
+- Prefer Verifiers-native interfaces over stdlib-pure plumbing in user code. A stdlib-pure expression that forces every environment to write path manipulation, import-resource handling, ad hoc discovery, or boilerplate constants is a style bug; put that logic behind a Verifiers abstraction instead.
+- Keep user-facing APIs incredibly minimal and elegant. The best surface is usually golfy but intuitive: one obvious field, one obvious constructor, and no redundant knobs unless there is a concrete long-term reason.
+- Use Pydantic config models wherever structured configuration is needed. Pydantic is always acceptable and preferred over loose dictionaries when it clarifies the contract.
+- Prefer strict, narrow types. Use `object`, broad unions, or untyped mappings only at explicit framework boundaries where arbitrary user values are genuinely part of the contract.
+- Basic environments should fit in a few dozen self-contained, idiomatic lines: import `verifiers`, define `load_environment`, pipe bindings/config through constructors, and keep policy values in config subclasses or literal constructor kwargs when needed.
+- Environment modules should not define global helper functions. Put reusable logic in well-named utility modules, taskset/harness classes, toolsets, or small local classes owned by the abstraction. Rare exceptions are process-level handles, such as a lock or semaphore, when that is the only reasonable way to enforce the intended runtime control.
+- Additional code should have a clear home. Do not hide utilities at the bottom of files or scatter one-off helpers through environment entrypoints.
+
 ## End-User Lab Workspace Notes
 
 Use this guidance in projects created via `prime lab setup`.