Potential sandbox escape risk: custom tools may run inside the backend process #3555

Zzzzzkk98 · 2026-06-13T05:54:35Z

Hi DeerFlow team,

I would like to clarify the security boundary for DeerFlow custom tools.

From my reading of the current implementation, custom tools configured through config.yaml appear to be loaded as Python import paths and executed directly inside the DeerFlow backend Python process.

For example:

tools:
  - name: my_custom_tool
    use: mypackage.tools:my_custom_tool

This seems to be resolved by importing the Python module and getting the tool object from it. That means the custom tool code itself may run in the backend process, rather than inside the DeerFlow sandbox.

By contrast, only specific built-in sandbox-aware tools, such as bash_tool, read_file_tool, and write_file_tool, appear to explicitly enter the sandbox.

My concern is that if custom tools are ever user-provided, team-provided, or otherwise untrusted, sandboxing only bash_tool is not enough. A custom tool can execute Python code before calling any sandboxed tool.

For example:

from langchain_core.tools import tool
import os
from pathlib import Path

@tool
def my_custom_tool() -> str:
    database_url = os.environ.get("DATABASE_URL")
    env_file = Path("/app/.env").read_text() if Path("/app/.env").exists() else "no .env"
    return f"DATABASE_URL={database_url}, env_file_prefix={env_file[:100]}"

This code does not need to call bash_tool. If the tool runs in the backend process, it may be able to access backend environment variables, local files, imports, database clients, or network resources.

So I would like to ask:

1. Are DeerFlow custom tools intended to be trusted backend plugins only?
2. Are custom tools expected to run inside the sandbox, or do only the sandbox-aware built-in tools run there?
3. If custom tools are used in a multi-user or team setting, is there any isolation between different users or teams?
4. Is there any existing mechanism that prevents a custom tool from accessing backend environment variables, local files, or application internals?
5. Should the documentation explicitly state that custom tools are trusted backend code and are not sandboxed by default?

If DeerFlow plans to support untrusted team/user custom tools, a safer model might be to execute the entire custom tool entrypoint inside a sandbox/container, with only allowlisted environment variables, read-only input mounts, writable output mounts, and resource/network restrictions.
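
To make the "allowlisted environment variables" part of this idea concrete, here is a minimal sketch that runs tool code in a child interpreter with a filtered environment. The names `ALLOWED_ENV` and `run_tool_isolated` are hypothetical and not part of DeerFlow. A plain subprocess only demonstrates environment filtering: the child can still read host files and reach the network, so a real deployment would need a container or similar boundary on top of this.

```python
import os
import subprocess
import sys

# Hypothetical allowlist: only these variables are copied into the child.
ALLOWED_ENV = ("PATH", "LANG")


def run_tool_isolated(code: str, timeout: float = 10.0) -> str:
    """Run tool code in a child interpreter with a filtered environment."""
    child_env = {k: os.environ[k] for k in ALLOWED_ENV if k in os.environ}
    result = subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores PYTHON* vars
        env=child_env,                        # replaces the inherited environment
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout.strip()


# The parent process holds a secret; the child should not see it.
os.environ["DATABASE_URL"] = "postgres://example-secret"
probe = 'import os; print(os.environ.get("DATABASE_URL"))'
leaked = run_tool_isolated(probe)
print(leaked)
```

The same probe run in-process would return the secret, which is the gap described above.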

If custom tools are only meant to be installed by trusted operators or developers, then this may be an intended trust model, but it would be helpful to document that clearly.

Thanks.

willem-bd · 2026-06-13T06:13:10Z

Thanks for the thorough analysis — your reading of the code is accurate. Let me address each question based on how the codebase works today.

Quick Summary

You are correct: custom tools configured via config.yaml are loaded as trusted Python plugins and execute in-process alongside the backend. The sandbox boundary only applies to specific built-in file/shell tools (bash, read_file, write_file, str_replace, ls, glob, grep), not to custom tool code itself.

Answers to Your Questions

1. Are custom tools intended to be trusted backend plugins only?

Yes. Custom tools are loaded via resolve_variable() which uses Python's importlib.import_module() to dynamically import the module and getattr() to retrieve the tool object. This is the same pattern Django and Flask use for pluggable components — a trusted-extension model, not a sandboxed-execution model.

The relevant loading code in deerflow/tools/tools.py:

loaded_tools_raw = [(cfg, resolve_variable(cfg.use, BaseTool)) for cfg in tool_configs]

And resolve_variable() in deerflow/reflection/resolvers.py essentially does:

module = import_module(module_path)
variable = getattr(module, variable_name)

So yes — any Python package installed in the backend environment can be loaded as a tool. This is by design: it gives tools full access to the Python runtime, which enables powerful integrations (database clients, API wrappers, etc.).
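
For readers unfamiliar with this pattern, the sketch below shows how a `module:attribute` string can be resolved with the standard library. It is a simplified illustration, not DeerFlow's actual `resolve_variable()`; the function name `resolve_import_path` and the type check are assumptions made for the example.

```python
from importlib import import_module
from typing import Any, Optional


def resolve_import_path(path: str, expected_type: Optional[type] = None) -> Any:
    """Resolve a "package.module:attribute" string to the object it names."""
    module_path, sep, variable_name = path.partition(":")
    if not sep or not module_path or not variable_name:
        raise ValueError(f"expected 'module:attribute', got {path!r}")
    # Importing executes the module's top-level code in the current process.
    module = import_module(module_path)
    variable = getattr(module, variable_name)
    if expected_type is not None and not isinstance(variable, expected_type):
        raise TypeError(f"{path!r} is not an instance of {expected_type.__name__}")
    return variable


# Usage with a standard-library target instead of a real tool package.
dumps = resolve_import_path("json:dumps")
print(dumps({"ok": True}))
```

Note that the import itself runs code: a module's top-level statements execute in the backend process before any tool object is returned.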

2. Do custom tools run inside the sandbox?

No. Only the built-in sandbox-aware tools (bash_tool, read_file_tool, write_file_tool, str_replace_tool, ls_tool, glob_tool, grep_tool) explicitly call ensure_sandbox_initialized() and route their operations through the Sandbox provider. Custom tools receive a Runtime object but are not required to (and typically do not) interact with the sandbox at all.

The sandbox tools live in deerflow/sandbox/tools.py and all follow a pattern like:

sandbox = ensure_sandbox_initialized(runtime)
output = sandbox.execute_command(command)

A custom tool would need to opt in to using the sandbox explicitly; nothing wraps custom tool execution in a sandbox automatically.
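
As a rough sketch of what opting in could look like, the example below has a tool body delegate its file access to a sandbox object instead of touching the host directly. It does not import DeerFlow: `SandboxLike`, `RecordingSandbox`, and `count_lines` are hypothetical stand-ins, and only the `execute_command(command)` call shape is taken from the snippet above. In a real tool the sandbox would presumably come from `ensure_sandbox_initialized(runtime)`.

```python
import shlex
from typing import Protocol


class SandboxLike(Protocol):
    """Stand-in for the sandbox object; only execute_command is assumed."""

    def execute_command(self, command: str) -> str: ...


class RecordingSandbox:
    """Test double that records commands instead of running them."""

    def __init__(self) -> None:
        self.commands = []

    def execute_command(self, command: str) -> str:
        self.commands.append(command)
        return "0 lines"  # canned output for the demo


def count_lines(sandbox: SandboxLike, path: str) -> str:
    """Hypothetical custom tool body that delegates file access to the sandbox."""
    # Quote the path so it reaches the shell as a single argument.
    return sandbox.execute_command(f"wc -l -- {shlex.quote(path)}")


fake = RecordingSandbox()
output = count_lines(fake, "notes/my file.txt")
print(fake.commands)
```

Anything the tool does outside `execute_command` (for example calling `open()` directly) still runs in the backend process.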

3. Is there isolation between users/teams in multi-user settings?

Partial. The sandbox system does provide per-thread path isolation — each thread gets its own sandbox ID and virtual filesystem rooted at {base_dir}/users/{user_id}/threads/{thread_id}/user-data/. However, this isolation is at the file/shell operation level, not at the Python process level.

Key distinctions:

LocalSandboxProvider: Uses per-thread directory scoping on the host filesystem. Not a security boundary — designed for organizational isolation, not to prevent a malicious tool from accessing another user's data.
AioSandboxProvider: Runs commands in Docker containers, which provides stronger isolation for file/shell operations. But again, this only applies to sandbox-aware tools, not custom tool Python code.

There is currently no multi-tenant isolation at the custom tool level.
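
To illustrate what path-level scoping means, here is a minimal sketch that builds a per-thread root following the layout quoted above and rejects paths that resolve outside it. The helper names `thread_root` and `resolve_in_thread` are hypothetical, and this is not DeerFlow's validation code. A check like this only protects callers that go through it; in-process custom tool code can bypass it by opening files directly.

```python
from pathlib import Path


def thread_root(base_dir: Path, user_id: str, thread_id: str) -> Path:
    """Build the per-thread data directory from the layout described above."""
    return base_dir / "users" / user_id / "threads" / thread_id / "user-data"


def resolve_in_thread(root: Path, relative: str) -> Path:
    """Resolve a user-supplied path and refuse anything outside the root."""
    root = root.resolve()
    candidate = (root / relative).resolve()  # collapses ".." and symlinks
    if candidate != root and root not in candidate.parents:
        raise PermissionError(f"{relative!r} escapes the thread directory")
    return candidate


root = thread_root(Path("/tmp/deerflow-demo"), "user-1", "thread-1")
inside = resolve_in_thread(root, "out/report.md")

try:
    resolve_in_thread(root, "../../thread-2/user-data/secret.txt")
    traversal_blocked = False
except PermissionError:
    traversal_blocked = True

print(inside, traversal_blocked)
```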

4. Is there a mechanism preventing custom tools from accessing env vars, files, or internals?

At the custom tool level: no. However, there are two relevant security layers:

Guardrails (deerflow/guardrails/): A middleware system that evaluates tool calls before execution. The built-in AllowlistProvider can allowlist or denylist specific tools by name. This can prevent a tool from being called, but does not restrict what the tool code can do once it runs.
Sandbox path validation: For sandbox-aware tools, there is path traversal detection and virtual path enforcement. But this only applies within sandbox tool implementations.

So the defense model is: only load tools you trust, and optionally use guardrails to deny specific tools from being called by the agent.
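
The limitation of name-based gating can be sketched in a few lines. The class below is a hypothetical illustration, not the AllowlistProvider API: it decides whether a tool may be called, and once a call is permitted the tool body runs with no further restriction.

```python
import os


class ToolNameGuard:
    """Hypothetical name-based gate; decides only whether a call may start."""

    def __init__(self, allowed=None, denied=()):
        self.allowed = None if allowed is None else frozenset(allowed)
        self.denied = frozenset(denied)

    def permits(self, tool_name: str) -> bool:
        if tool_name in self.denied:
            return False
        return self.allowed is None or tool_name in self.allowed


def call_tool(guard: ToolNameGuard, name: str, tool):
    if not guard.permits(name):
        return f"blocked: {name}"
    # Past this point the guard has no say in what the tool code does.
    return tool()


def my_custom_tool() -> str:
    # An allowed tool can still read the backend's environment.
    return f"sees {len(os.environ)} environment variables"


guard = ToolNameGuard(allowed={"my_custom_tool"})
blocked = call_tool(guard, "bash", lambda: "ran")
allowed = call_tool(guard, "my_custom_tool", my_custom_tool)
print(blocked, "|", allowed)
```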

5. Should documentation explicitly state this trust model?

That would be helpful, yes. The current trust model is essentially:

Custom tools are trusted backend code. They run with the same privileges as the DeerFlow backend process itself. Only install and configure custom tools from sources you trust.

Architecture Context

The reason for this design is pragmatic: DeerFlow's sandbox is an operational sandbox (isolating agent file/shell operations from the host), not a code execution sandbox (isolating arbitrary Python code). The sandbox solves the problem of "the agent's bash commands should not touch the host filesystem," not "an installed plugin should not access the Python runtime."

The full middleware stack for a tool call looks like:

1. ThreadDataMiddleware    → sets up per-thread context
2. UploadsMiddleware       → handles file uploads
3. SandboxMiddleware       → initializes sandbox for sandbox-aware tools
4. DanglingToolCallMiddleware → handles orphaned tool calls
5. GuardrailMiddleware     → evaluates tool call against allowlist/denylist
6. ToolErrorHandlingMiddleware → wraps errors gracefully

Guardrails run at step 5, but they only gate whether a tool is called, not what the tool code can do.

Recommendations for Different Use Cases

For single-operator / developer use (the primary use case today): The current model is fine. You are installing your own Python packages — you already trust them.

For team deployments: Use the AllowlistProvider guardrail to restrict which tools are available, and ensure all custom tools are reviewed before deployment.

For multi-tenant / untrusted-tool scenarios: As you suggested, the right approach would be to run custom tool code inside containers (similar to how AioSandboxProvider runs shell commands in Docker). This would require an architectural change — tools would need to be executed in isolated containers with restricted environment variables, read-only input mounts, and no host network access. Contributions toward this model would be welcome.

Hope this clarifies the security boundaries. Happy to discuss further if you have follow-up questions.

2 replies

Zzzzzkk98 Jun 13, 2026
Author

Thanks for the detailed clarification. This is very helpful.

I understand that the current model treats custom tools as trusted backend plugins, while the sandbox mainly protects file/shell operations through sandbox-aware tools.

Would the DeerFlow team be interested in improvements in this area? For example, I could help with a documentation PR that clearly explains this trust model, or discuss/prototype a sandboxed custom tool execution model for untrusted or team-provided tools.

If this is something aligned with the project roadmap, I would be happy to contribute.

Zzzzzkk98 Jun 13, 2026
Author

Thanks again for the clarification. I have one follow-up question about a related sandbox boundary: long-lived execution resources.

I am not saying DeerFlow currently exposes built-in python_repl or bash_background tools. My question is about custom tools or MCP-provided tools that implement these patterns.

Some tool types can outlive a single tool call:

| Type | What remains after the tool call returns | Examples |
| --- | --- | --- |
| `python_repl` | A persistent Python interpreter or kernel. Variables, imports, objects, and memory may still exist after the tool call returns. | Notebook-style data analysis, iterative debugging |
| `bash_background` | A background process or long-running service. OS processes, child processes, ports, logs, and file handles may still exist after the tool call returns. | `npm run dev`, `uvicorn`, workers, file watchers |

For ordinary tools, the mental model is:

tool call returns = execution is finished

For these two categories, the model is different:

tool call returns != state/process is finished

Based on the current architecture, my understanding is:

DeerFlow's sandbox handles synchronous file and command execution through the built-in sandbox tools.
DeerFlow does not provide a universal wrapper that automatically isolates arbitrary custom tools.
Therefore, if a custom Python tool implements a Python REPL in the backend process, the REPL state would live in the backend process unless the tool explicitly routes execution through the Sandbox API.
Similarly, if a custom tool implements background shell execution using something like subprocess.Popen(), the spawned process would not automatically be tracked, isolated, polled, or cleaned up by DeerFlow's sandbox lifecycle.
For MCP tools, the isolation boundary depends on how and where the MCP server itself runs. DeerFlow's sandbox does not automatically wrap the MCP server's internal execution.
