Add tools to human_cli #3053

Open
tadamcz wants to merge 26 commits into UKGovernmentBEIS:main from tadamcz:human-cli-tool-calls

Conversation

@tadamcz
Contributor

@tadamcz tadamcz commented Jan 13, 2026

This PR adds an optional tools: list[Tool] parameter to human_cli that allows humans to call tools in the same way LMs can.

Motivation: to give humans and LMs the same tools, without having to maintain them in two different places. (Think of some complex custom tool; this isn't relevant for something like file editing tools)

Flagging my main design decisions (happy to make different choices on any of these):

  • CLI input: convert to argparse, but also provide a raw JSON escape hatch
      1. Named arguments for simple types (task tool addition --x 12 --y 34)
         Maps JSON Schema types → argparse for simple types: str, int, float,
         bool, and arrays of primitives.
      2. JSON escape hatch for complex types (task tool db_lookup --raw-json-escape-hatch '{"config": {"nested": true}}')
         Tools with complex parameter types (dicts, nested objects, unions) show
         a help message directing users to use --raw-json-escape-hatch. The
         JSON schema is displayed so users know what to pass.

  • Nesting under task keyword (i.e. task tool db_lookup, not tool db_lookup)
    This seemed simplest with the current setup.

  • Result display
    ToolResult can be str, int, images, audio, etc. Serialize types to string if possible; raise NotImplementedError for image/audio/video.
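
The CLI input decision above can be sketched roughly as follows. This is an illustration only, not the PR's actual code: `build_parser`, `SIMPLE_TYPES`, and the exact schema shape are hypothetical names assumed for the example, and the real implementation may treat required parameters and defaults differently.

```python
import argparse
from typing import Any, Callable, Dict, Optional

# Hypothetical mapping from JSON Schema primitive types to argparse converters.
SIMPLE_TYPES: Dict[str, Callable[[str], Any]] = {
    "string": str,
    "integer": int,
    "number": float,
}


def _converter(prop: Dict[str, Any]) -> Optional[Callable[[str], Any]]:
    """Return a converter for a simple-typed property, else None."""
    kind = prop.get("type")
    return SIMPLE_TYPES.get(kind) if isinstance(kind, str) else None


def build_parser(tool_name: str, schema: Dict[str, Any]) -> Optional[argparse.ArgumentParser]:
    """Build a named-argument parser, or return None if any parameter is complex."""
    parser = argparse.ArgumentParser(prog=f"task tool {tool_name}")
    required = set(schema.get("required", []))
    for name, prop in schema.get("properties", {}).items():
        kind = prop.get("type")
        items = prop.get("items")
        if _converter(prop):
            parser.add_argument(f"--{name}", type=_converter(prop), required=name in required)
        elif kind == "boolean":
            # Presence of the flag means True, absence means False.
            parser.add_argument(f"--{name}", action="store_true")
        elif kind == "array" and isinstance(items, dict) and _converter(items):
            parser.add_argument(f"--{name}", nargs="*", type=_converter(items), required=name in required)
        else:
            # Dicts, nested objects, unions: the caller falls back to raw JSON.
            return None
    return parser


addition_schema = {
    "properties": {"x": {"type": "integer"}, "y": {"type": "integer"}},
    "required": ["x", "y"],
}
parser = build_parser("addition", addition_schema)
parsed = vars(parser.parse_args(["--x", "12", "--y", "34"]))
```

Returning `None` for any complex parameter keeps the two paths separate: a tool is either fully expressible as named arguments, or the user is directed to the JSON escape hatch.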

This PR contains:

  • New features
  • Changes to dev-tools e.g. CI config / github tooling
  • Docs
  • Bug fixes
  • Code refactor

@tadamcz tadamcz force-pushed the human-cli-tool-calls branch from 3855d31 to 2f8d349 Compare January 13, 2026 14:35
@jjallaire
Collaborator

Thank you! Great idea :-)

The one thing I'd like to see improved is making the argument handling at the CLI a bit more flexible. Perhaps we could:

(1) Support positional arguments
(2) Support multiple named arguments separated by spaces, with YAML parsing of arguments (e.g. foo=true)

My reasoning is that writing JSON by hand at the CLI is fiddly and error-prone for users.

Related: we recently added "centaur mode" to Inspect SWE to let a human_cli() solver have access to Claude Code or Codex CLI: https://meridianlabs-ai.github.io/inspect_swe/claude_code.html#centaur-mode

@tadamcz
Contributor Author

tadamcz commented Jan 13, 2026

OK that makes sense, let me think about improving the CLI API then before opening this PR! :-)

Just in case it's easy for you to answer: any idea why my test_human_cli_with_tools hangs forever, but test_human_cli does not? (nvm, figured it out thanks to @tbroadley!)

@tadamcz
Contributor Author

tadamcz commented Jan 14, 2026

I've fixed the hanging test, now working on changing the CLI API

In argparse, an argument is either positional or named.

You can't define one argument that accepts both `task tool addition 12 34` and `task tool addition --x 12 --y 34`.

This is supported by libraries like `click`, but installing click and its dependencies in the human container doesn't seem worth it.
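
A small sketch of the argparse behaviour described above, using only the standard library. The parser and the `mixing_is_rejected` helper are hypothetical and exist only for this illustration: each `add_argument` call declares either a positional name or one or more option strings, and passing both kinds in one call is rejected.

```python
import argparse

parser = argparse.ArgumentParser(prog="task tool addition")
parser.add_argument("--x", type=int, required=True)  # named only: --x 12
parser.add_argument("y", type=int)                   # positional only: 34

args = parser.parse_args(["--x", "12", "34"])


def mixing_is_rejected() -> bool:
    """Try to declare a single argument as both positional and named."""
    try:
        argparse.ArgumentParser().add_argument("x", "--x", type=int)
    except ValueError:
        # argparse requires every option string to start with a prefix character.
        return True
    return False
```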
@tadamcz
Contributor Author

tadamcz commented Jan 14, 2026

Here's an attempt using argparse. There's a --raw-json-escape-hatch argument for passing raw JSON. Simple types are converted to argparse arguments, but if a tool has complex types, raw JSON must be used.

However, it doesn't support positional arguments; all arguments must be passed as named arguments.

@jjallaire how much do you care about positional arguments? My impression is that argparse forces each argument to be either positional or named, unlike Python functions where, if I define def addition(x: int, y: int), x and y can be passed either as positional args or as kwargs.

(Separately, I think relying on CLI standards like argparse is preferable to introducing YAML parsing. Disagree?)

If at this point you'd prefer to take over the PR and do things differently I'd also be happy with that!

@jjallaire
Collaborator

I'm fine with using CLI args (no positional). The YAML was just so that strings like "true" get properly typed (perhaps argparse already does that though?)

@tadamcz
Contributor Author

tadamcz commented Jan 23, 2026

Got it!

argparse already handles integers properly when defined using parser.add_argument('--foo', type=int)

The standard way of handling booleans in argparse is by the presence/absence of flags, rather than by passing in True or False as strings. This is done with parser.add_argument('--foo', action='store_true'), i.e. if --foo is passed, set to Python True, else to Python False. This is what my PR already does here https://github.com/UKGovernmentBEIS/inspect_ai/pull/3053/files#diff-cf86947e84ea37fd3c9ff20195bad2baa196c90f72a77aeec1221fd6e65eaadbR364

The only case I can think of where this could be confusing is with a tool that has an optional boolean argument that defaults to True instead of False. Even in that case I would rather just do it the default argparse way, rather than doing a conversion with e.g. str2bool, or introducing --no-foo, which I aver would be more surprising to most users. Users can always use the JSON escape hatch for these cases as well.
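
To make the typing behaviour above concrete, here is a minimal standard-library sketch. The argument names (`--count`, `--verbose`, `--cache`) are made up for the example and are not taken from the PR. It also shows the awkward case mentioned above: with `store_true` and a default of True, no flag combination can produce False.

```python
import argparse

parser = argparse.ArgumentParser(prog="task tool example")
parser.add_argument("--count", type=int, default=0)      # "3" is converted to int 3
parser.add_argument("--verbose", action="store_true")    # flag present -> True, absent -> False
parser.add_argument("--cache", action="store_true", default=True)  # the confusing case

absent = parser.parse_args([])
present = parser.parse_args(["--count", "3", "--verbose", "--cache"])

# --cache is True whether or not the flag is passed, so a human cannot turn it
# off with named arguments; the JSON escape hatch would be needed instead.
cache_values = {absent.cache, present.cache}
```

For reference, the `--no-foo` style mentioned above is what the standard library's `argparse.BooleanOptionalAction` (Python 3.9+) generates; the comment argues against adopting it here.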

I think in any case what we are doing here (automatically generating code for a CLI interface) is always going to be pretty imperfect.

@tadamcz tadamcz marked this pull request as ready for review January 23, 2026 15:46
escape_hatch_preparse = dedent("""
# Pre-parse for --raw-json-escape-hatch (bypasses argparse validation)
import json
ESCAPE_HATCH = "--raw-json-escape-hatch"
Contributor

What's the motivation behind calling it --raw-json-escape-hatch vs. the shorter/easier to type --json or --raw-json?

Contributor Author

@tadamcz tadamcz Jan 23, 2026

The name is verbose on purpose: less likely to conflict with an existing tool argument.
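
A rough sketch of what pre-parsing for the escape hatch might look like, assuming the flag is followed by a single JSON object. Only the `ESCAPE_HATCH` constant comes from the diff excerpt above; `preparse_escape_hatch` and its error messages are hypothetical. Because the check runs before argparse sees the arguments, a tool parameter that happened to be called `json` or `raw_json` could not collide with it, which is the point of the verbose name.

```python
import json
from typing import Any, Dict, List, Optional

ESCAPE_HATCH = "--raw-json-escape-hatch"


def preparse_escape_hatch(argv: List[str]) -> Optional[Dict[str, Any]]:
    """Return the raw JSON arguments if the escape hatch is used, else None."""
    if ESCAPE_HATCH not in argv:
        return None  # fall through to normal argparse handling
    index = argv.index(ESCAPE_HATCH)
    if index + 1 >= len(argv):
        raise SystemExit(f"{ESCAPE_HATCH} requires a JSON object as its value")
    payload = json.loads(argv[index + 1])
    if not isinstance(payload, dict):
        raise SystemExit(f"{ESCAPE_HATCH} expects a JSON object of tool arguments")
    return payload


tool_args = preparse_escape_hatch(
    ["db_lookup", ESCAPE_HATCH, '{"config": {"nested": true}}']
)
```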

3 participants