Taint-flow analysis engine for the Codesteward platform.
Traces how untrusted input propagates through a codebase to dangerous operations and writes queryable TAINT_FLOW edges into the Codesteward graph.
codesteward-taint answers one critical question:
Can user-controlled input reach a dangerous operation without being sanitized first?
It analyzes your codebase at three progressively deeper levels:
| Level | Name | What it does |
|---|---|---|
| 1 | Call graph traversal | Follows the CALLS graph to find multi-hop paths from known sources to known sinks. Fast and language-agnostic — runs entirely as graph queries. |
| 2 | Intra-procedural propagation | Reads source code to verify that tainted values actually reach the sink through variable assignments, interpolation, and transformations. Eliminates Level 1 false positives. |
| 3 | Path-sensitive CFG analysis | Builds a control flow graph per function and tracks taint along individual execution branches. Catches sanitizers that exist in a function but are on a different branch than the tainted value — a class of vulnerability simpler tools miss entirely. |
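
As a rough illustration of the Level 1 idea, the sketch below runs a hop-bounded breadth-first search over a call graph held in a plain dictionary. It is not the engine's implementation, which runs as graph queries; the function `find_taint_paths`, the dictionary encoding, and the sample function names are all hypothetical.

```python
from collections import deque


def find_taint_paths(calls, sources, sinks, max_hops=8):
    """Return every cycle-free call path from a source to a sink within max_hops edges."""
    paths = []
    queue = deque([src] for src in sorted(sources))
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in sinks and len(path) > 1:
            paths.append(path)
            continue  # a sink ends the path
        if len(path) - 1 >= max_hops:
            continue  # hop budget exhausted
        for callee in calls.get(node, []):
            if callee not in path:  # skip recursive cycles
                queue.append(path + [callee])
    return paths


# Hypothetical call graph: function name -> functions it calls.
calls = {
    "get_user": ["load_user"],
    "load_user": ["build_query", "log_access"],
    "build_query": ["execute"],
    "log_access": [],
}

paths = find_taint_paths(calls, sources={"get_user"}, sinks={"execute"})
print(paths)  # [['get_user', 'load_user', 'build_query', 'execute']]
```

A path found this way only shows that a call chain exists. Whether the tainted value actually travels along it is what Levels 2 and 3 check.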
Results are written as TAINT_FLOW edges into the Neo4j graph, queryable alongside the structural edges (CALLS, GUARDED_BY, PROTECTED_BY) produced by codesteward-graph. This enables cross-concern queries like:
"Which unprotected endpoints have a taint path to a SQL sink?"
| Language | Frameworks |
|---|---|
| Python | FastAPI, Flask, Django |
| TypeScript / JavaScript | Express, NestJS |
| Java | Spring MVC |
| Go | net/http, Gin, Echo |
Additional frameworks can be added via custom catalog files — see catalogs/CATALOGS.md.
- codesteward-graph must have run first; codesteward-taint reads the CALLS graph it produces
- Neo4j 5.x (optional; --stub mode runs analysis without writing results)
- The repository must be accessible on disk (required for Level 2 and Level 3 source analysis)
Pre-built binaries are available on the releases page.
# macOS (Apple Silicon)
curl -fsSL https://github.com/bitkaio/codesteward-taint/releases/latest/download/codesteward-taint-darwin-arm64 \
-o /usr/local/bin/codesteward-taint
chmod +x /usr/local/bin/codesteward-taint
# macOS (Intel)
curl -fsSL https://github.com/bitkaio/codesteward-taint/releases/latest/download/codesteward-taint-darwin-amd64 \
-o /usr/local/bin/codesteward-taint
chmod +x /usr/local/bin/codesteward-taint
# Linux (amd64)
curl -fsSL https://github.com/bitkaio/codesteward-taint/releases/latest/download/codesteward-taint-linux-amd64 \
-o /usr/local/bin/codesteward-taint
chmod +x /usr/local/bin/codesteward-taint
# Windows (PowerShell)
Invoke-WebRequest -Uri https://github.com/bitkaio/codesteward-taint/releases/latest/download/codesteward-taint-windows-amd64.exe `
-OutFile "$Env:LOCALAPPDATA\Programs\codesteward-taint.exe"

Add %LOCALAPPDATA%\Programs to your PATH if it isn't already, then open a new terminal.
The codesteward-mcp Docker image ships with codesteward-taint pre-installed. No separate installation is needed when using Docker.
codesteward-taint --version

# 1. Build the codebase graph first (if not already done)
codesteward-mcp graph_rebuild --repo-path /path/to/your/repo
# 2. Run taint analysis
codesteward-taint \
--neo4j-uri bolt://localhost:7687 \
--neo4j-password your-password \
--repo-id my-repo \
--repo-path /path/to/your/repo \
--frameworks fastapi

Results are written to Neo4j and printed to stdout:
status: ok
duration_ms: 1240
levels_run: [1, 2, 3]
catalog_names: [fastapi, _common]
paths_unsafe: 3
paths_sanitized: 8
findings:
  - source_name: user_id
    source_file: src/routes/users.py
    source_line: 14
    sink_name: execute
    sink_file: src/db/queries.py
    sink_line: 88
    cwe: CWE-89
    hops: 3
    level: 3
    sanitized: false
  - source_name: search_query
    source_file: src/routes/search.py
    source_line: 31
    sink_name: subprocess.run
    sink_file: src/indexer/cli.py
    sink_line: 55
    cwe: CWE-78
    hops: 2
    level: 2
    sanitized: false

Once analysis has run, findings are queryable via the Codesteward MCP server:
codebase_graph_query(query_type="semantic", query="")
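
The findings in the sample report above can also be post-processed in a script. The sketch below groups unsanitized findings by CWE. It assumes the report has already been loaded into Python dictionaries, for instance by parsing `--output json`, and that the JSON output uses the same field names as the YAML shown above; that mapping is an assumption, and `unsanitized_by_cwe` is a hypothetical helper.

```python
from collections import defaultdict


def unsanitized_by_cwe(findings):
    """Group the sink locations of unsanitized findings by their CWE identifier."""
    grouped = defaultdict(list)
    for finding in findings:
        if not finding["sanitized"]:
            location = f"{finding['sink_file']}:{finding['sink_line']}"
            grouped[finding["cwe"]].append(location)
    return dict(grouped)


# Two findings copied from the sample report, reduced to the fields used here.
findings = [
    {"sink_file": "src/db/queries.py", "sink_line": 88, "cwe": "CWE-89", "sanitized": False},
    {"sink_file": "src/indexer/cli.py", "sink_line": 55, "cwe": "CWE-78", "sanitized": False},
]

summary = unsanitized_by_cwe(findings)
print(summary)
# {'CWE-89': ['src/db/queries.py:88'], 'CWE-78': ['src/indexer/cli.py:55']}
```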
codesteward-taint [flags]
Flags:
  --neo4j-uri string          Neo4j bolt URI (default: bolt://localhost:7687)
  --neo4j-user string         Neo4j username (default: neo4j)
  --neo4j-password string     Neo4j password
  --tenant-id string          Tenant namespace (default: local)
  --repo-id string            Repository identifier (required)
  --repo-path string          Absolute path to repository on disk (required)
  --frameworks string         Comma-separated catalog names, e.g. fastapi,express
                              Omit to auto-detect from languages in the graph
  --max-hops int              Maximum call depth for Level 1 traversal (default: 8)
  --level int                 Maximum analysis level: 1, 2, or 3 (default: 3)
  --persist-cfg bool          Write control flow graph nodes to Neo4j,
                              enabling path-level queries (default: false)
  --include-safe bool         Include sanitized paths in output (default: false)
  --catalog-dir string        Path to additional custom catalog YAML files
  --no-builtin-catalogs bool  Disable built-in catalogs; use only --catalog-dir
                              (default: false)
  --output string             Output format: yaml | json | text (default: yaml)
  --stub bool                 Analyze without writing results to Neo4j
  --log-level string          debug | info | warn | error (default: info)
  --version                   Print version and exit
# Run Level 1 only (call graph traversal)
codesteward-taint --repo-id my-repo --repo-path /path/to/repo --level 1
# Analyze without writing results to Neo4j
codesteward-taint --repo-id my-repo --repo-path /path/to/repo --stub
# Also write control flow graph nodes to Neo4j
codesteward-taint --repo-id my-repo --repo-path /path/to/repo --persist-cfg

With --persist-cfg, the control flow graph is written to Neo4j as BasicBlock nodes and CFG_EDGE relationships, enabling Cypher queries that traverse execution paths within functions, for example to determine which branch conditions must hold for a sink to be reachable.
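
To make the path-level idea concrete, here is a toy model of path-sensitive taint tracking over a control flow graph. It is a sketch under stated assumptions, not the engine's algorithm: the CFG is a hypothetical dictionary of block names, and the `source`, `sanitize`, and `sink` effect labels are invented for illustration. It shows a function where a sanitizer exists, but only on one of two branches leading to the sink.

```python
def unsafe_paths(cfg, effects, entry):
    """Walk every cycle-free path from entry and return those that hit a sink while tainted."""
    found = []
    stack = [(entry, [entry], False)]
    while stack:
        block, path, tainted = stack.pop()
        effect = effects.get(block)
        if effect == "source":
            tainted = True
        elif effect == "sanitize":
            tainted = False
        elif effect == "sink" and tainted:
            found.append(path)
        for successor in cfg.get(block, []):
            if successor not in path:  # ignore loop back-edges in this toy model
                stack.append((successor, path + [successor], tainted))
    return found


# Hypothetical CFG: the "clean" branch sanitizes, the "skip" branch does not.
cfg = {
    "entry": ["check"],
    "check": ["clean", "skip"],
    "clean": ["query"],
    "skip": ["query"],
    "query": [],
}
effects = {"entry": "source", "clean": "sanitize", "query": "sink"}

# A branch-insensitive check only asks whether a sanitizer exists anywhere.
has_sanitizer_somewhere = "sanitize" in effects.values()
result = unsafe_paths(cfg, effects, "entry")

print(has_sanitizer_somewhere)  # True
print(result)                   # [['entry', 'check', 'skip', 'query']]
```

The branch-insensitive check would call this function sanitized, while the per-path walk still reports the route through `skip`.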
Catalogs define which function names are treated as taint sources, sinks, and sanitizers. The built-in catalogs for supported frameworks are published in the catalogs/ directory — browse them to understand coverage or use them as templates.
For the full catalog schema, template, and usage examples see catalogs/CATALOGS.md.
When codesteward-taint is installed and on PATH, the Codesteward MCP server automatically registers a taint_analysis tool. AI agents can invoke it directly:
taint_analysis(repo_id="my-repo", frameworks=["fastapi"])
After it completes, codebase_graph_query(query_type="semantic") returns taint findings. Combined with referential queries, this enables prompts like:
"Find all API endpoints that have no auth guard and a taint path to a SQL sink."
which the agent resolves with two graph queries and no manual code reading.
Catalog contributions and bug reports are welcome — see CONTRIBUTING.md.
| Code | Meaning |
|---|---|
| 0 | Analysis completed (findings may or may not exist) |
| 1 | Fatal error (Neo4j unreachable, repo path not found, etc.) |
| 2 | Invalid arguments |
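
Because exit code 0 means only that analysis completed, a CI step that should fail on findings has to inspect the report as well. The sketch below is one way to do that. It uses only flags documented above (`--repo-id`, `--repo-path`, `--stub`, `--output json`), but it assumes that the JSON report is the only content on stdout and that it carries a `paths_unsafe` key like the YAML sample; both are assumptions, and `gate` and `run` are hypothetical helpers.

```python
import json
import subprocess

# Meanings taken from the exit code table above.
EXIT_MEANINGS = {1: "fatal error", 2: "invalid arguments"}


def gate(returncode, stdout):
    """Return (ok, message) for a CI step, given the process exit code and stdout."""
    if returncode != 0:
        return False, EXIT_MEANINGS.get(returncode, f"unexpected exit code {returncode}")
    report = json.loads(stdout)
    unsafe = report.get("paths_unsafe", 0)  # assumed key, mirrors the YAML sample
    if unsafe:
        return False, f"{unsafe} unsanitized taint path(s)"
    return True, "no unsanitized taint paths"


def run(repo_id, repo_path):
    """Invoke the binary in stub mode and apply the gate to its result."""
    proc = subprocess.run(
        ["codesteward-taint", "--repo-id", repo_id, "--repo-path", repo_path,
         "--stub", "--output", "json"],
        capture_output=True,
        text=True,
    )
    return gate(proc.returncode, proc.stdout)
```

Calling `run("my-repo", "/path/to/repo")` requires the binary on PATH; `gate` can be exercised on its own with captured output.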
Copyright (c) 2026 bitkaio LLC. All rights reserved.
codesteward-taint is proprietary software distributed as a compiled binary. The binary may be used freely as part of the Codesteward platform. Redistribution, reverse engineering, and modification are not permitted. See LICENSE for full terms.
