Merged
10 changes: 10 additions & 0 deletions .cursor/BUGBOT.md
@@ -77,3 +77,13 @@ When these files change, verify and update any affected skill files:
- `skills/brainstorm/SKILL.md`

If workflow-relevant changes are detected without matching skill updates, request that the author update the impacted skills before merge.

## Environment Rollout Logic

Do not request library utilities solely because two or more environments contain similar message, state, or rollout-loop data manipulation. A few explicit lines inside an environment are often the clearest and most discoverable implementation.

In particular, do not suggest moving small helpers for selecting messages, extracting text from `state`, or juggling rollout-local fields into hidden library modules. Buried helpers are not easily discoverable by end users, clutter the public API when promoted, and make the docs responsible for enumerating every three-line convenience function.

Prefer explicit environment-local code unless the repeated logic is a framework contract, fixes a correctness bug at the boundary, or is already part of the documented user-facing API. Do not ask authors to create one-off private helpers for simple rollout logic; if a few lines are used once, they should usually stay inline at the call site.

Helpers are acceptable when the logic is reused in multiple places, belongs to a taskset-bound object that forms part of the environment contract, or is complex enough to deserve a named secondary module. Excessive reliance on buried rollout-loop helpers should be treated as non-idiomatic and flagged as a code smell.
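As a hypothetical illustration of the inline style described above, the reward below reads the last assistant reply with two explicit lines instead of calling a library helper. `Message`, `State`, the `completion` field, the `answer` key, and the `last_assistant_text` helper named in the comment are stand-ins assumed for this sketch, not the actual Verifiers types or API.

```python
import asyncio
from dataclasses import dataclass, field


# Hypothetical stand-ins for the rollout objects; real Verifiers types differ.
@dataclass
class Message:
    role: str
    content: str


@dataclass
class State:
    completion: list[Message] = field(default_factory=list)


async def answer_in_last_reply(task: dict[str, str], state: State) -> float:
    # Two explicit lines at the call site instead of a buried library helper
    # (e.g. a hypothetical `last_assistant_text(state)`).
    replies = [m.content for m in state.completion if m.role == "assistant"]
    last_reply = replies[-1] if replies else ""
    return 1.0 if task["answer"] in last_reply else 0.0
```

A reviewer can see exactly which message is inspected without opening another module, which is the discoverability argument made above.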
8 changes: 6 additions & 2 deletions .github/workflows/README.md
@@ -5,7 +5,7 @@ This directory contains automated workflows for the verifiers project.
## Workflows

### 1. Style (`style.yml`)
**Purpose**: Code style checking using ruff and ty.
**Purpose**: Code style checking using ruff, ty, and Semgrep policy rules.

**Triggers**:
- Pull requests (opened, synchronized, reopened)
@@ -14,7 +14,8 @@ This directory contains automated workflows for the verifiers project.
**What it does**:
- Runs ruff for linting and formatting checks
- Runs ty type checks with `uv run ty check verifiers`
- Uses configuration from `pyproject.toml`
- Runs Semgrep policy checks via the `semgrep-v1-policy` pre-commit hook, which invokes Semgrep with `uv run --group policy`
- Uses configuration from `pyproject.toml`, `.pre-commit-config.yaml`, and `.semgrep/verifiers.yml`

### 2. Test (`test.yml`)
**Purpose**: Comprehensive testing with coverage reports.
@@ -47,6 +48,9 @@ To run checks locally the same way they run in CI:
# Ty parity with CI (Python 3.13 target configured in `pyproject.toml`)
uv run ty check verifiers

# Verifiers-specific policy lint
env PYTHONWARNINGS=ignore::SyntaxWarning uv run pre-commit run semgrep-v1-policy --all-files

# Tests
uv sync
uv run pytest tests/ -v
16 changes: 16 additions & 0 deletions .github/workflows/style.yml
@@ -40,3 +40,19 @@ jobs:
run: uv sync
- name: Run ty
run: uv run ty check verifiers
semgrep:
name: Semgrep
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v6
with:
python-version: "3.13"
- name: Install uv
uses: astral-sh/setup-uv@v7
- name: Install dependencies
run: uv sync --group dev --group policy
- name: Run Semgrep policy checks
run: env PYTHONWARNINGS=ignore::SyntaxWarning uv run pre-commit run semgrep-v1-policy --config .pre-commit-config.yaml --all-files
5 changes: 5 additions & 0 deletions .pre-commit-config.yaml
@@ -15,6 +15,11 @@ repos:
entry: uv run ruff format
language: system
types_or: [python, pyi]
- id: semgrep-v1-policy
name: Semgrep v1 policy
entry: uv run --group policy semgrep --metrics=off --disable-version-check --config .semgrep/verifiers.yml --error --quiet
language: system
pass_filenames: false
- id: sync-agents-md
name: Sync AGENTS.md from docs
entry: uv run python scripts/sync.py
84 changes: 84 additions & 0 deletions .semgrep/verifiers.yml
@@ -0,0 +1,84 @@
rules:
  - id: verifiers-no-future-annotations
    languages: [python]
    severity: ERROR
    message: Do not use `from __future__ import annotations`; quote only the specific forward references that need it.
    pattern: from __future__ import annotations

  - id: verifiers-v1-config-param-one-type
    languages: [python]
    severity: ERROR
    message: Public v1 `config` parameters must be one concrete config type or `None`; keep raw mappings at explicit config-loader boundaries.
    paths:
      include:
        - /verifiers/v1/**/*.py
        - /environments/**/*.py
      exclude:
        - /verifiers/v1/config.py
        - /verifiers/v1/utils/**/*.py
    patterns:
      - pattern: |
          def $FUNC(..., config: $ANNOT = None, ...):
              ...
      - metavariable-regex:
          metavariable: $ANNOT
          regex: "(Any|ConfigMap|Mapping\\[str, object\\]|dict\\[str, object\\]|.*\\|.*\\|.*)"

  - id: verifiers-no-private-framework-classes
    languages: [python]
    severity: ERROR
    message: Do not define leading-underscore classes in framework code; use a clear public name or a module-level value/function instead.
    paths:
      include:
        - /verifiers/**/*.py
    pattern-regex: "^\\s*class _[A-Za-z]"

  - id: verifiers-no-raw-any-v1
    languages: [python]
    severity: ERROR
    message: Do not use raw `Any` in v1 or environment code; use a precise type or a named boundary alias in verifiers.v1.types.
    paths:
      include:
        - /verifiers/v1/**/*.py
        - /environments/**/*.py
      exclude:
        - /verifiers/v1/types.py
        - /environments/openenv_*/proj/**/*.py
    pattern-regex: "\\bAny\\b"

  - id: verifiers-no-raw-object-containers-v1
    languages: [python]
    severity: ERROR
    message: Do not spell broad object containers in v1 or environment code; use the named boundary types ConfigMap, ConfigData, Handler, GroupHandler, Objects, TaskRow, ProgramMap, or a narrower type.
    paths:
      include:
        - /verifiers/v1/**/*.py
        - /environments/**/*.py
      exclude:
        - /verifiers/v1/types.py
        - /verifiers/v1/utils/object_utils.py
        - /verifiers/v1/utils/task_freeze_utils.py
        - /environments/openenv_*/proj/**/*.py
    pattern-regex: "(?x)(\\b(?:Mapping|MutableMapping|dict|list|Sequence|Iterable|Callable|Awaitable|tuple)\\[[^\\n\\]]*\\bobject\\b|\\bcast\\([^\\n)]*\\bobject\\b)"

  - id: verifiers-no-raw-mapping-annotations-v1
    languages: [python]
    severity: ERROR
    message: Do not use raw Mapping annotations in v1 or environment code; prefer dict or a named v1 boundary type. Keep Mapping only for isinstance checks or explicit aliases.
    paths:
      include:
        - /verifiers/v1/**/*.py
        - /environments/**/*.py
      exclude:
        - /verifiers/v1/types.py
        - /environments/openenv_*/proj/**/*.py
    pattern-regex: "\\b(?:Mapping|MutableMapping)\\["

  - id: verifiers-get-messages-typed
    languages: [python]
    severity: ERROR
    message: "`get_messages` must return typed `Message` objects, not raw dictionaries."
    paths:
      include:
        - /verifiers/utils/message_utils.py
    pattern-regex: "def\\s+get_messages\\b[\\s\\S]*?->\\s*list\\s*\\[\\s*dict\\b"
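The regexes in these rules can be exercised locally with Python's `re` module to see what they would flag. This is a sketch, not Semgrep itself: the patterns are transcribed from the YAML above with its double-quote escaping undone, `re.MULTILINE` is assumed to approximate how `pattern-regex` treats `^` in whole files, and the `metavariable-regex` check is assumed to be anchored at the start of the bound annotation text. The helper names `flags` and `flags_config_annotation` are hypothetical.

```python
import re

# Regexes transcribed from .semgrep/verifiers.yml ("\\b" in YAML is \b here).
PATTERN_REGEX_RULES = {
    "verifiers-no-private-framework-classes": r"^\s*class _[A-Za-z]",
    "verifiers-no-raw-any-v1": r"\bAny\b",
    "verifiers-no-raw-object-containers-v1": (
        r"(?x)(\b(?:Mapping|MutableMapping|dict|list|Sequence|Iterable"
        r"|Callable|Awaitable|tuple)\[[^\n\]]*\bobject\b"
        r"|\bcast\([^\n)]*\bobject\b)"
    ),
    "verifiers-no-raw-mapping-annotations-v1": r"\b(?:Mapping|MutableMapping)\[",
    "verifiers-get-messages-typed": (
        r"def\s+get_messages\b[\s\S]*?->\s*list\s*\[\s*dict\b"
    ),
}

# metavariable-regex applied to $ANNOT in `config: $ANNOT = None`.
CONFIG_ANNOT_REGEX = (
    r"(Any|ConfigMap|Mapping\[str, object\]|dict\[str, object\]|.*\|.*\|.*)"
)


def flags(rule_id: str, source: str) -> bool:
    # re.MULTILINE lets `^` anchor at every line start, which the
    # private-class rule needs in order to find indented class definitions.
    pattern = PATTERN_REGEX_RULES[rule_id]
    return re.search(pattern, source, re.MULTILINE) is not None


def flags_config_annotation(annotation: str) -> bool:
    # Assumed start-anchored match against the annotation's source text.
    return re.match(CONFIG_ANNOT_REGEX, annotation) is not None


print(flags("verifiers-no-raw-any-v1", "from typing import Any"))
print(flags_config_annotation("vf.EnvConfig | None"))
```

The config rule only trips on unions with at least two `|` separators, so `vf.EnvConfig | None` passes while a three-member union such as `vf.EnvConfig | dict[str, object] | None` is reported. One caveat of the lazy `[\s\S]*?` span in the `get_messages` rule: within a single file it can run past a correctly typed `get_messages` into a later function annotated `-> list[dict]`, so a false positive in `message_utils.py` is possible.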
15 changes: 15 additions & 0 deletions AGENTS.md
@@ -13,6 +13,18 @@ These points are direct restatements of Verifiers docs so agents can follow the
- Use `ToolEnv`/`MCPEnv` for stateless tools and `StatefulToolEnv` when per-rollout state must persist (sandbox/session/db handles). (See `docs/environments.md`.)
- If external API keys are required, validate them in `load_environment()` with `vf.ensure_keys(...)` so failures are explicit and early. (See `docs/environments.md`.)

## Style Rules

Use these rules when shaping user-facing Verifiers APIs, configs, and environment files.

- Prefer Verifiers-native interfaces over stdlib-pure plumbing in user code. A stdlib-pure expression that forces every environment to write path manipulation, import-resource handling, ad hoc discovery, or boilerplate constants is a style bug; put that logic behind a Verifiers abstraction instead.
- Keep user-facing APIs incredibly minimal and elegant. The best surface is usually golfy but intuitive: one obvious field, one obvious constructor, and no redundant knobs unless there is a concrete long-term reason.
- Use Pydantic config models wherever structured configuration is needed. Pydantic is always acceptable and preferred over loose dictionaries when it clarifies the contract.
- Prefer strict, narrow types. Use `object`, broad unions, or untyped mappings only at explicit framework boundaries where arbitrary user values are genuinely part of the contract.
- Basic environments should fit in a few dozen self-contained, idiomatic lines: import `verifiers`, define `load_environment`, pipe bindings/config through constructors, and keep policy values in config subclasses or literal constructor kwargs when needed.
- Environment modules should not define global helper functions. Put reusable logic in well-named utility modules, taskset/harness classes, toolsets, or small local classes owned by the abstraction. Rare exceptions are process-level handles, such as a lock or semaphore, when that is the only reasonable way to enforce the intended runtime control.
- Additional code should have a clear home. Do not hide utilities at the bottom of files or scatter one-off helpers through environment entrypoints.

## Repository Development Notes

Use this guidance when contributing to the `verifiers` repository itself.
@@ -22,6 +34,9 @@ Use this guidance when contributing to the `verifiers` repository itself.
- Keep changes aligned with documented architecture (`verifiers/`, `environments/`, `configs/`, `tests/`, `docs/`) and update docs when behavior changes. (See `docs/development.md`.)
- Prefer a single clear path over maintaining parallel approaches by default; if two options exist, preserve both only when there is an explicit long-term reason.
- Aggressively deprecate/remove inferior paths when they are not part of an intended multi-option contract, especially in repo-internal development workflows.
- Treat broad dynamic mappings as explicit framework boundaries, not casual public API types. Use a named domain alias or typed Pydantic field for legitimate arbitrary payloads such as task rows, protocol messages, sandbox/program specs, and `objects`/binding-style config; do not expose raw `Mapping[str, object]` in user-facing signatures unless that looseness is the point of the abstraction.
- If a user request conflicts with repository style, formatting, or API-quality guidelines, push back instead of implementing the literal request. Identify a comparable request or explicit guideline relaxation that preserves clean, maintainable, modular code across the current request and adjacent future use cases; implement that plan, then explain the decision process and tradeoffs directly to the user.
- Before v0.2.0, breaking backward compatibility inside v1 Taskset/Harness APIs is acceptable and encouraged when it improves the core design. Preserve v0 multi-turn environment compatibility unless the user explicitly asks for a v0 migration.
- Treat public configuration and docs as part of the API. Keep TOML shapes consistent across eval, GEPA, RL, and Hosted Training; normalize legacy inputs at the ingestion boundary instead of spreading compatibility branches through examples.
- For v1 Taskset/Harness work, make the taskset own task data, task tools, user behavior, metrics, rewards, and task-specific configuration. Use the base `vf.Harness` unless the harness really owns a reusable execution mechanism.
- When renaming or deleting an environment/module path, update package metadata, README/docs references, tests, build includes, and generated AGENTS output in the same change.
7 changes: 3 additions & 4 deletions README.md
@@ -135,7 +135,7 @@ For new environments with reusable tasksets, toolsets, custom programs, or
custom harnesses, use the v1 Taskset/Harness path:
```python
# my_env.py
import verifiers.v1 as vf
import verifiers as vf

def source():
yield {
@@ -151,8 +151,7 @@ async def contains_answer(task, state) -> float:
def load_taskset(config: vf.TasksetConfig | None = None):
return vf.Taskset(source=source, rewards=[contains_answer], config=config)

def load_environment(config: vf.EnvConfig | None = None) -> vf.Env:
config = config or vf.EnvConfig()
def load_environment(config: vf.EnvConfig) -> vf.Env:
return vf.Env(taskset=load_taskset(config=config.taskset))
```
If no harness is passed, `vf.Env` uses the base endpoint-backed harness. See
@@ -164,7 +163,7 @@ harness with:

```python
env = vf.Env(
taskset=vf.HarborTaskset(tasks="/path/to/harbor/tasks"),
taskset=vf.HarborTaskset(),
harness=vf.OpenCode(),
)
```
12 changes: 12 additions & 0 deletions assets/agents/common_best_practices.md
@@ -8,3 +8,15 @@ These points are direct restatements of Verifiers docs so agents can follow the
- For new taskset/harness environments, use the v1 `vf.Env` / `vf.Taskset` / `vf.Harness` format. Treat [BYO Harness](docs/byo-harness.md) as the canonical authoring guide for reusable tasksets, reusable harnesses, framework programs, endpoint interception, and sandboxed Python/command programs.
- Use `ToolEnv`/`MCPEnv` for stateless tools and `StatefulToolEnv` when per-rollout state must persist (sandbox/session/db handles). (See `docs/environments.md`.)
- If external API keys are required, validate them in `load_environment()` with `vf.ensure_keys(...)` so failures are explicit and early. (See `docs/environments.md`.)

## Style Rules

Use these rules when shaping user-facing Verifiers APIs, configs, and environment files.

- Prefer Verifiers-native interfaces over stdlib-pure plumbing in user code. A stdlib-pure expression that forces every environment to write path manipulation, import-resource handling, ad hoc discovery, or boilerplate constants is a style bug; put that logic behind a Verifiers abstraction instead.
- Keep user-facing APIs incredibly minimal and elegant. The best surface is usually golfy but intuitive: one obvious field, one obvious constructor, and no redundant knobs unless there is a concrete long-term reason.
- Use Pydantic config models wherever structured configuration is needed. Pydantic is always acceptable and preferred over loose dictionaries when it clarifies the contract.
- Prefer strict, narrow types. Use `object`, broad unions, or untyped mappings only at explicit framework boundaries where arbitrary user values are genuinely part of the contract.
- Basic environments should fit in a few dozen self-contained, idiomatic lines: import `verifiers`, define `load_environment`, pipe bindings/config through constructors, and keep policy values in config subclasses or literal constructor kwargs when needed.
- Environment modules should not define global helper functions. Put reusable logic in well-named utility modules, taskset/harness classes, toolsets, or small local classes owned by the abstraction. Rare exceptions are process-level handles, such as a lock or semaphore, when that is the only reasonable way to enforce the intended runtime control.
- Additional code should have a clear home. Do not hide utilities at the bottom of files or scatter one-off helpers through environment entrypoints.
3 changes: 3 additions & 0 deletions assets/agents/repo_development_best_practices.md
@@ -7,6 +7,9 @@ Use this guidance when contributing to the `verifiers` repository itself.
- Keep changes aligned with documented architecture (`verifiers/`, `environments/`, `configs/`, `tests/`, `docs/`) and update docs when behavior changes. (See `docs/development.md`.)
- Prefer a single clear path over maintaining parallel approaches by default; if two options exist, preserve both only when there is an explicit long-term reason.
- Aggressively deprecate/remove inferior paths when they are not part of an intended multi-option contract, especially in repo-internal development workflows.
- Treat broad dynamic mappings as explicit framework boundaries, not casual public API types. Use a named domain alias or typed Pydantic field for legitimate arbitrary payloads such as task rows, protocol messages, sandbox/program specs, and `objects`/binding-style config; do not expose raw `Mapping[str, object]` in user-facing signatures unless that looseness is the point of the abstraction.
- If a user request conflicts with repository style, formatting, or API-quality guidelines, push back instead of implementing the literal request. Identify a comparable request or explicit guideline relaxation that preserves clean, maintainable, modular code across the current request and adjacent future use cases; implement that plan, then explain the decision process and tradeoffs directly to the user.
- Before v0.2.0, breaking backward compatibility inside v1 Taskset/Harness APIs is acceptable and encouraged when it improves the core design. Preserve v0 multi-turn environment compatibility unless the user explicitly asks for a v0 migration.
- Treat public configuration and docs as part of the API. Keep TOML shapes consistent across eval, GEPA, RL, and Hosted Training; normalize legacy inputs at the ingestion boundary instead of spreading compatibility branches through examples.
- For v1 Taskset/Harness work, make the taskset own task data, task tools, user behavior, metrics, rewards, and task-specific configuration. Use the base `vf.Harness` unless the harness really owns a reusable execution mechanism.
- When renaming or deleting an environment/module path, update package metadata, README/docs references, tests, build includes, and generated AGENTS output in the same change.
12 changes: 12 additions & 0 deletions assets/lab/AGENTS.md
@@ -15,6 +15,18 @@ These points are direct restatements of Verifiers docs so agents can follow the
- Use `ToolEnv`/`MCPEnv` for stateless tools and `StatefulToolEnv` when per-rollout state must persist (sandbox/session/db handles). (See `docs/environments.md`.)
- If external API keys are required, validate them in `load_environment()` with `vf.ensure_keys(...)` so failures are explicit and early. (See `docs/environments.md`.)

## Style Rules

Use these rules when shaping user-facing Verifiers APIs, configs, and environment files.

- Prefer Verifiers-native interfaces over stdlib-pure plumbing in user code. A stdlib-pure expression that forces every environment to write path manipulation, import-resource handling, ad hoc discovery, or boilerplate constants is a style bug; put that logic behind a Verifiers abstraction instead.
- Keep user-facing APIs incredibly minimal and elegant. The best surface is usually golfy but intuitive: one obvious field, one obvious constructor, and no redundant knobs unless there is a concrete long-term reason.
- Use Pydantic config models wherever structured configuration is needed. Pydantic is always acceptable and preferred over loose dictionaries when it clarifies the contract.
- Prefer strict, narrow types. Use `object`, broad unions, or untyped mappings only at explicit framework boundaries where arbitrary user values are genuinely part of the contract.
- Basic environments should fit in a few dozen self-contained, idiomatic lines: import `verifiers`, define `load_environment`, pipe bindings/config through constructors, and keep policy values in config subclasses or literal constructor kwargs when needed.
- Environment modules should not define global helper functions. Put reusable logic in well-named utility modules, taskset/harness classes, toolsets, or small local classes owned by the abstraction. Rare exceptions are process-level handles, such as a lock or semaphore, when that is the only reasonable way to enforce the intended runtime control.
- Additional code should have a clear home. Do not hide utilities at the bottom of files or scatter one-off helpers through environment entrypoints.

## End-User Lab Workspace Notes

Use this guidance in projects created via `prime lab setup`.