Add tools to `human_cli` by tadamcz · Pull Request #3053 · UKGovernmentBEIS/inspect_ai

tadamcz · 2026-01-13T14:23:01Z

This PR adds an optional `tools: list[Tool]` parameter to `human_cli` that allows humans to call tools in the same way LMs can.

Motivation: to give humans and LMs the same tools, without having to maintain them in two different places. (Think of some complex custom tool; this isn't relevant for something like file editing tools)

Flagging my main design decisions (happy to make different choices on any of these):

- CLI input: convert to argparse, but also provide a raw JSON escape hatch
  - Named arguments for simple types (`task tool addition --x 12 --y 34`). Maps JSON Schema types → argparse for simple types: str, int, float, bool, and arrays of primitives.
  - JSON escape hatch for complex types (`task tool db_lookup --raw-json-escape-hatch '{"config": {"nested": true}}'`). Tools with complex parameter types (dicts, nested objects, unions) show a help message directing users to use `--raw-json-escape-hatch`. The JSON schema is displayed so users know what to pass.
- Nesting under the `task` keyword (i.e. `task tool db_lookup`, not `tool db_lookup`). This seemed simplest with the current setup.
- Result display: `ToolResult` can be str, int, images, audio, etc. Serialize types to string if possible; raise `NotImplementedError` for image/audio/video.
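
To make the schema-to-argparse decision above concrete, here is a minimal sketch of how such a mapping might look. It is not the PR's code: `PRIMITIVES`, `is_simple`, and `build_parser` are hypothetical names, and the schema dicts are hand-written stand-ins for whatever the tool definitions actually provide. In this sketch, arrays of booleans are treated as complex.

```python
import argparse
from typing import Optional

# Hypothetical mapping from JSON Schema primitive type names to the
# callables argparse uses for conversion (assumption, not the PR's table).
PRIMITIVES = {"string": str, "integer": int, "number": float}


def is_simple(schema: dict) -> bool:
    """True if the parameter can be expressed as a plain named argument."""
    kind = schema.get("type")
    if kind in PRIMITIVES or kind == "boolean":
        return True
    if kind == "array":
        return schema.get("items", {}).get("type") in PRIMITIVES
    return False


def build_parser(
    tool_name: str, properties: dict, required: list
) -> Optional[argparse.ArgumentParser]:
    """Return a parser for the tool, or None if any parameter is complex."""
    if not all(is_simple(schema) for schema in properties.values()):
        return None  # caller would point the user at the JSON escape hatch
    parser = argparse.ArgumentParser(prog=f"task tool {tool_name}")
    for name, schema in properties.items():
        flag = f"--{name}"
        kind = schema["type"]
        if kind == "boolean":
            # Booleans become presence/absence flags.
            parser.add_argument(flag, action="store_true")
        elif kind == "array":
            item_type = PRIMITIVES[schema["items"]["type"]]
            parser.add_argument(
                flag, nargs="*", type=item_type, required=name in required
            )
        else:
            parser.add_argument(
                flag, type=PRIMITIVES[kind], required=name in required
            )
    return parser


addition = {"x": {"type": "integer"}, "y": {"type": "integer"}}
parser = build_parser("addition", addition, required=["x", "y"])
args = vars(parser.parse_args(["--x", "12", "--y", "34"]))

db_lookup = {"config": {"type": "object"}}
complex_parser = build_parser("db_lookup", db_lookup, required=["config"])
```

Here `args` holds the two integers for `addition`, while `complex_parser` is `None` for `db_lookup`, which is the case where the help message would direct users to the escape hatch.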

This PR contains:


jjallaire · 2026-01-13T16:37:16Z

Thank you! Great idea :-)

The one thing I'd like to see improved is making the argument handling at the CLI a bit more flexible. Perhaps we could:

(1) Support positional arguments
(2) Support multiple named arguments separated by spaces, with YAML parsing of arguments (e.g. foo=true)

My reasoning is that having users write JSON will get kind of fiddly and error-prone.

Related: we recently added "centaur mode" to Inspect SWE to let a human_cli() solver have access to Claude Code or Codex CLI: https://meridianlabs-ai.github.io/inspect_swe/claude_code.html#centaur-mode

tadamcz · 2026-01-13T17:51:47Z

OK that makes sense, let me think about improving the CLI API then before opening this PR! :-)

~~Just in case it's easy for you to answer: any idea why my test_human_cli_with_tools hangs forever, but test_human_cli does not?~~ (nvm, figured it out thanks to @tbroadley!)


tadamcz · 2026-01-14T14:35:53Z

I've fixed the hanging test, now working on changing the CLI API

In argparse, an argument is either positional or named. You can't define one argument that accepts both `task tool addition 12 34` and `task tool addition --x 12 --y 34`. This is supported by libraries like `click`, but installing click and its dependencies in the human container doesn't seem worth it.
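
A small sketch of the limitation described above, using only the standard library. The two parsers and the `parses` helper are illustrative; the `addition` tool with parameters `x` and `y` is taken from the example in this thread.

```python
import argparse
import contextlib
import io


def parses(parser, argv):
    """Return the parsed arguments as a dict, or None if argparse rejects argv."""
    try:
        # argparse reports errors by printing usage to stderr and raising SystemExit.
        with contextlib.redirect_stderr(io.StringIO()):
            return vars(parser.parse_args(argv))
    except SystemExit:
        return None


# Each argument is declared either as named ("--x") ...
named = argparse.ArgumentParser(prog="task tool addition")
named.add_argument("--x", type=int, required=True)
named.add_argument("--y", type=int, required=True)

# ... or as positional ("x"); a single declaration cannot be both.
positional = argparse.ArgumentParser(prog="task tool addition")
positional.add_argument("x", type=int)
positional.add_argument("y", type=int)

named_ok = parses(named, ["--x", "12", "--y", "34"])
named_rejects_positional = parses(named, ["12", "34"])
positional_ok = parses(positional, ["12", "34"])
positional_rejects_named = parses(positional, ["--x", "12", "--y", "34"])
```

Each parser accepts only the calling style it was declared with, which is why the named-only approach means every tool argument has to be passed as `--name value`.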


tadamcz · 2026-01-14T19:33:20Z

Here's an attempt using argparse. There's a --raw-json-escape-hatch argument for passing raw JSON. Simple types are converted to argparse, but if a tool has any complex types, raw JSON must be used.

However, it doesn't support positional arguments; all arguments must be passed as named arguments.

@jjallaire how much do you care about positional arguments? My impression is that argparse forces each argument to be either positional or named, unlike Python functions where if I define def addition(x: int, y: int), x and y can be passed either as positional args or as kwargs.

(Separately, I think relying on CLI standards like argparse is preferable to introducing YAML parsing. Disagree?)

If at this point you'd prefer to take over the PR and do things differently I'd also be happy with that!

jjallaire · 2026-01-16T16:19:18Z

I'm fine with using CLI args (no positional). The YAML was just so that strings like "true" get properly typed (perhaps argparse already does that though?)

tadamcz · 2026-01-23T14:49:19Z

Got it!

argparse already handles integers properly when defined using parser.add_argument('--foo', type=int)

The standard way of handling booleans in argparse is by the presence/absence of flags, rather than by passing in True or False as strings. This is done with parser.add_argument('--foo', action='store_true'), i.e. if --foo is passed, set to Python True, else to Python False. This is what my PR already does here https://github.com/UKGovernmentBEIS/inspect_ai/pull/3053/files#diff-cf86947e84ea37fd3c9ff20195bad2baa196c90f72a77aeec1221fd6e65eaadbR364

The only case I can think of where this could be confusing is with a tool that has an optional boolean argument that defaults to True instead of False. Even in that case I would rather just do it the default argparse way, rather than doing a conversion with e.g. str2bool, or introducing --no-foo, which I think would be more surprising to most users. Users can always use the JSON escape hatch for these cases as well.
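
For readers less familiar with argparse, the behaviour discussed in this comment can be reproduced with a short standalone sketch. The argument names (`--count`, `--verbose`, `--flag`, `--cache`) are made up for illustration and do not come from the PR.

```python
import argparse

parser = argparse.ArgumentParser(prog="task tool example")
parser.add_argument("--count", type=int)  # the string "12" is converted to 12
parser.add_argument("--verbose", action="store_true")  # absent: False, present: True

absent = parser.parse_args(["--count", "12"])
present = parser.parse_args(["--count", "12", "--verbose"])

# Pitfall: type=bool calls bool() on the raw string, and any non-empty
# string (including "False") is truthy.
naive = argparse.ArgumentParser()
naive.add_argument("--flag", type=bool)
surprising = naive.parse_args(["--flag", "False"]).flag

# The confusing case: a boolean that defaults to True cannot be switched
# off with store_true alone, since passing the flag also yields True.
default_true = argparse.ArgumentParser()
default_true.add_argument("--cache", action="store_true", default=True)
without_flag = default_true.parse_args([]).cache
with_flag = default_true.parse_args(["--cache"]).cache
```

The `--no-foo` alternative mentioned above is what `argparse.BooleanOptionalAction` (Python 3.9+) generates; the sketch leaves it out because the comment argues against it.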

I think in any case what we are doing here (automatically generating code for a CLI interface) is always going to be pretty imperfect.

idavidrein · 2026-01-23T17:57:17Z

src/inspect_ai/agent/_human/install.py

+        escape_hatch_preparse = dedent("""
+    # Pre-parse for --raw-json-escape-hatch (bypasses argparse validation)
+    import json
+    ESCAPE_HATCH = "--raw-json-escape-hatch"


What's the motivation behind calling it --raw-json-escape-hatch vs. the shorter/easier to type --json or --raw-json?

tadamcz · 2026-01-23

The name is verbose on purpose: it is less likely to conflict with an existing tool argument.
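
The pre-parse step quoted in the review excerpt might work along these lines. This is a hedged sketch, not the code in `install.py`: only the `ESCAPE_HATCH` constant appears in the diff, while `split_escape_hatch` and its error messages are hypothetical.

```python
import json

ESCAPE_HATCH = "--raw-json-escape-hatch"


def split_escape_hatch(argv):
    """Return (payload, remaining_argv).

    payload is the decoded JSON object if the escape hatch flag is present,
    otherwise None. Hypothetical helper for illustration only.
    """
    if ESCAPE_HATCH not in argv:
        return None, list(argv)
    index = argv.index(ESCAPE_HATCH)
    if index + 1 >= len(argv):
        raise ValueError(f"{ESCAPE_HATCH} requires a JSON argument")
    payload = json.loads(argv[index + 1])
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object mapping parameter names to values")
    # Drop the flag and its value so the rest can go through argparse as usual.
    remaining = argv[:index] + argv[index + 2:]
    return payload, remaining


payload, rest = split_escape_hatch(
    ["db_lookup", ESCAPE_HATCH, '{"config": {"nested": true}}']
)
untouched, same = split_escape_hatch(["addition", "--x", "12", "--y", "34"])
```

Because the flag is looked up before argparse runs, a long and unusual name reduces the chance that a tool parameter is mistaken for the escape hatch.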

tadamcz added 4 commits January 12, 2026 17:02

initial work on human_cli tool calls

ac31591

use an explicit loop to satisfy mypy

e4b12d3

initial work on tests

0718e46

(not very DRY)

make e2e test stricter

2f8d349

tadamcz force-pushed the human-cli-tool-calls branch from 3855d31 to 2f8d349 Compare January 13, 2026 14:35

tadamcz added 3 commits January 14, 2026 13:02

Always call task start/submit to unblock eval thread (otherwise test hangs!)

2f5b681

don't need this final assertion

1d9bbbd

test: factor out helper

b59281f

tadamcz added 13 commits January 14, 2026 17:30

WIP: move from raw JSON to argparse

2b67b60

WIP: move from raw JSON to argparse

065721f

WIP: move from raw JSON to argparse

c5c3b23

test: Json escape hatch only applied to task tool, not to other commands

cd5eaab

remove pointless service-side code; argparse handles this

e0cb27a

add test for complex types

3e44b95

If a tool has any complex params, showing the simple ones is misleading since users must use --raw-json-escape-hatch for everything anyway

a7c3136

test for tool with no arguments

9d3ac73

move shared code into helper

bdcb931

if complex params, show full json schema in help

33c1dad

do json.loads on the CLI side; this is more consistent with how other HumanAgentCommands work

fb2fd92

tadamcz added 3 commits January 23, 2026 14:49

Merge branch 'main' into human-cli-tool-calls

26bff7a

ruff format

86b354e

remove repetition

ea35cd2

tadamcz added 3 commits January 23, 2026 15:03

comment

9257241

fix collision when a tool has an argument called name

4d09880

add test documenting boolean behaviour

21fa2c9

tadamcz marked this pull request as ready for review January 23, 2026 15:46

idavidrein reviewed Jan 23, 2026

tadamcz wants to merge 26 commits into UKGovernmentBEIS:main from tadamcz:human-cli-tool-calls

3 participants
