codesteward-taint

Taint-flow analysis engine for the Codesteward platform. Traces how untrusted input propagates through a codebase to dangerous operations and writes queryable TAINT_FLOW edges into the Codesteward graph.

What it does

codesteward-taint answers one critical question:

Can user-controlled input reach a dangerous operation without being sanitized first?

It analyzes your codebase at three progressively deeper levels:

Level	Name	What it does
1	Call graph traversal	Follows the `CALLS` graph to find multi-hop paths from known sources to known sinks. Fast and language-agnostic — runs entirely as graph queries.
2	Intra-procedural propagation	Reads source code to verify that tainted values actually reach the sink through variable assignments, interpolation, and transformations. Filters out Level 1 false positives, where a call path exists but no tainted value flows along it.
3	Path-sensitive CFG analysis	Builds a control flow graph per function and tracks taint along individual execution branches. Catches sanitizers that exist in a function but sit on a different branch from the one the tainted value takes, a class of vulnerability that path-insensitive analysis misses.
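
To make the Level 3 case concrete, here is a small hypothetical Python function. The function and helper names are illustrative and are not taken from any Codesteward catalog. A sanitizer is present in the function, but only on one branch, so a check that merely asks whether the function calls a sanitizer would wrongly treat the path as safe.

```python
def escape_sql_literal(value: str) -> str:
    """Toy sanitizer for illustration only: doubles single quotes."""
    return value.replace("'", "''")


def build_user_query(user_id: str, trusted_caller: bool) -> str:
    """Builds a SQL string; the sanitizer runs on only one branch."""
    if trusted_caller:
        # Tainted value flows to the sink unchanged on this branch.
        fragment = user_id
    else:
        # Sanitizer is applied here, but not on the branch above.
        fragment = escape_sql_literal(user_id)
    # Sink: string-built SQL (real code should use parameterized queries).
    return f"SELECT * FROM users WHERE id = '{fragment}'"


if __name__ == "__main__":
    payload = "1' OR '1'='1"
    print(build_user_query(payload, trusted_caller=False))
    print(build_user_query(payload, trusted_caller=True))
```

Only the `trusted_caller=True` branch carries the payload to the query unescaped, which is the kind of per-branch distinction path-sensitive analysis is meant to draw.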

Results are written as TAINT_FLOW edges into the Neo4j graph, queryable alongside the structural edges (CALLS, GUARDED_BY, PROTECTED_BY) produced by codesteward-graph. This enables cross-concern queries like:

"Which unprotected endpoints have a taint path to a SQL sink?"

Supported languages and frameworks

Language	Frameworks
Python	FastAPI, Flask, Django
TypeScript / JavaScript	Express, NestJS
Java	Spring MVC
Go	`net/http`, Gin, Echo

Additional frameworks can be added via custom catalog files — see catalogs/CATALOGS.md.

Requirements

codesteward-graph must have run first — codesteward-taint reads the CALLS graph it produces
Neo4j 5.x (optional — --stub mode runs analysis without writing results)
The repository must be accessible on disk (required for Level 2 and Level 3 source analysis)

Installation

Download binary

Pre-built binaries are available on the releases page.

# macOS (Apple Silicon)
curl -fsSL https://github.com/bitkaio/codesteward-taint/releases/latest/download/codesteward-taint-darwin-arm64 \
     -o /usr/local/bin/codesteward-taint
chmod +x /usr/local/bin/codesteward-taint

# macOS (Intel)
curl -fsSL https://github.com/bitkaio/codesteward-taint/releases/latest/download/codesteward-taint-darwin-amd64 \
     -o /usr/local/bin/codesteward-taint
chmod +x /usr/local/bin/codesteward-taint

# Linux (amd64)
curl -fsSL https://github.com/bitkaio/codesteward-taint/releases/latest/download/codesteward-taint-linux-amd64 \
     -o /usr/local/bin/codesteward-taint
chmod +x /usr/local/bin/codesteward-taint

# Windows (PowerShell)
Invoke-WebRequest -Uri https://github.com/bitkaio/codesteward-taint/releases/latest/download/codesteward-taint-windows-amd64.exe `
                  -OutFile "$Env:LOCALAPPDATA\Programs\codesteward-taint.exe"

Add %LOCALAPPDATA%\Programs to your PATH if it is not already there, then open a new terminal.

Docker

The codesteward-mcp Docker image ships with codesteward-taint pre-installed. No separate installation is needed when using Docker.

Verify

codesteward-taint --version

Quick start

# 1. Build the codebase graph first (if not already done)
codesteward-mcp graph_rebuild --repo-path /path/to/your/repo

# 2. Run taint analysis
codesteward-taint \
  --neo4j-uri      bolt://localhost:7687 \
  --neo4j-password your-password \
  --repo-id        my-repo \
  --repo-path      /path/to/your/repo \
  --frameworks     fastapi

Results are written to Neo4j and printed to stdout:

status: ok
duration_ms: 1240
levels_run: [1, 2, 3]
catalog_names: [fastapi, _common]
paths_unsafe: 3
paths_sanitized: 8
findings:
  - source_name: user_id
    source_file: src/routes/users.py
    source_line: 14
    sink_name: execute
    sink_file: src/db/queries.py
    sink_line: 88
    cwe: CWE-89
    hops: 3
    level: 3
    sanitized: false
  - source_name: search_query
    source_file: src/routes/search.py
    source_line: 31
    sink_name: subprocess.run
    sink_file: src/indexer/cli.py
    sink_line: 55
    cwe: CWE-78
    hops: 2
    level: 2
    sanitized: false
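
For scripting, the findings can be post-processed. The sketch below assumes that `--output json` emits an object with the same field names as the YAML shown above. That mapping is an assumption, so check real output before relying on it. The sample data is hand-written, not produced by the tool.

```python
import json
from collections import Counter


def unsafe_by_cwe(report_json: str) -> dict[str, int]:
    """Count unsanitized findings per CWE in a JSON report string."""
    report = json.loads(report_json)
    counts = Counter(
        finding["cwe"]
        for finding in report.get("findings", [])
        if not finding.get("sanitized", False)
    )
    return dict(counts)


if __name__ == "__main__":
    # Hand-written sample mirroring the YAML fields above (assumed shape).
    sample = json.dumps({
        "status": "ok",
        "findings": [
            {"sink_name": "execute", "cwe": "CWE-89", "sanitized": False},
            {"sink_name": "subprocess.run", "cwe": "CWE-78", "sanitized": False},
            {"sink_name": "execute", "cwe": "CWE-89", "sanitized": True},
        ],
    })
    print(unsafe_by_cwe(sample))
```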

Once analysis has run, findings are queryable via the Codesteward MCP server:

codebase_graph_query(query_type="semantic", query="")

Usage

codesteward-taint [flags]

Flags:
  --neo4j-uri            string   Neo4j bolt URI (default: bolt://localhost:7687)
  --neo4j-user           string   Neo4j username (default: neo4j)
  --neo4j-password       string   Neo4j password
  --tenant-id            string   Tenant namespace (default: local)
  --repo-id              string   Repository identifier (required)
  --repo-path            string   Absolute path to repository on disk (required)
  --frameworks           string   Comma-separated catalog names, e.g. fastapi,express
                                  Omit to auto-detect from languages in the graph
  --max-hops             int      Maximum call depth for Level 1 traversal (default: 8)
  --level                int      Maximum analysis level: 1, 2, or 3 (default: 3)
  --persist-cfg          bool     Write control flow graph nodes to Neo4j,
                                  enabling path-level queries (default: false)
  --include-safe         bool     Include sanitized paths in output (default: false)
  --catalog-dir          string   Path to additional custom catalog YAML files
  --no-builtin-catalogs  bool     Disable built-in catalogs; use only --catalog-dir
                                  (default: false)
  --output               string   Output format: yaml | json | text (default: yaml)
  --stub                 bool     Analyze without writing results to Neo4j
  --log-level            string   debug | info | warn | error (default: info)
  --version                       Print version and exit

Run only Level 1 (fast, graph-only)

codesteward-taint --repo-id my-repo --repo-path /path/to/repo --level 1
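
Conceptually, Level 1 is a bounded path search over the call graph. The following is a minimal in-memory sketch of that idea, not the engine's implementation: the adjacency dict and function names are made up, and `max_hops` mirrors the `--max-hops` flag only by analogy.

```python
from collections import deque


def find_paths(calls, sources, sinks, max_hops=8):
    """Breadth-first search for call paths from sources to sinks.

    `calls` maps a caller name to the list of functions it calls.
    A path with N edges counts as N hops; longer paths are dropped.
    """
    found = []
    queue = deque([src] for src in sources)
    while queue:
        path = queue.popleft()
        current = path[-1]
        if current in sinks and len(path) > 1:
            found.append(path)
            continue
        if len(path) - 1 >= max_hops:
            continue  # hop budget exhausted, do not expand further
        for callee in calls.get(current, []):
            if callee not in path:  # skip cycles
                queue.append(path + [callee])
    return found


if __name__ == "__main__":
    # Hypothetical call graph: route handler -> loader -> query -> sink.
    calls = {
        "get_user": ["load_user"],
        "load_user": ["run_query"],
        "run_query": ["execute"],
    }
    print(find_paths(calls, ["get_user"], {"execute"}))
    print(find_paths(calls, ["get_user"], {"execute"}, max_hops=2))
```

A search like this only shows that a call path exists. It says nothing about whether a tainted value actually travels along it, which is why the deeper levels read the source code.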

Dry run without Neo4j

codesteward-taint --repo-id my-repo --repo-path /path/to/repo --stub

Persist CFG for advanced path queries

codesteward-taint --repo-id my-repo --repo-path /path/to/repo --persist-cfg

With --persist-cfg, the control flow graph is written to Neo4j as BasicBlock nodes and CFG_EDGE relationships, enabling Cypher queries that traverse execution paths within functions — for example, determining what branch conditions must hold for a sink to be reachable.
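
The branch-condition question can be illustrated on a toy control flow graph held in memory. This is a sketch under assumptions: the dict layout, block names, and condition strings are hypothetical and do not describe the properties stored on BasicBlock nodes or CFG_EDGE relationships.

```python
def conditions_to_reach(cfg, entry, target):
    """Return, per path, the branch conditions that lead to `target`.

    `cfg` maps a block id to a list of (condition, successor) pairs;
    a condition of None marks an unconditional edge. Assumes the graph
    is acyclic (loops would need a visited set or a bound).
    """
    results = []

    def walk(block, conds):
        if block == target:
            results.append(conds)
            return
        for condition, successor in cfg.get(block, []):
            extended = conds + [condition] if condition else conds
            walk(successor, extended)

    walk(entry, [])
    return results


if __name__ == "__main__":
    # Hypothetical function: early exit on missing input, sink for admins.
    cfg = {
        "entry": [("user_id is None", "exit"), ("user_id is not None", "check")],
        "check": [("is_admin", "sink"), ("not is_admin", "exit")],
    }
    print(conditions_to_reach(cfg, "entry", "sink"))
```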

Custom catalogs

Catalogs define which function names are treated as taint sources, sinks, and sanitizers. The built-in catalogs for supported frameworks are published in the catalogs/ directory — browse them to understand coverage or use them as templates.

For the full catalog schema, template, and usage examples see catalogs/CATALOGS.md.

Integration with codesteward-mcp

When codesteward-taint is installed and on PATH, the Codesteward MCP server automatically registers a taint_analysis tool. AI agents can invoke it directly:

taint_analysis(repo_id="my-repo", frameworks=["fastapi"])

After it completes, codebase_graph_query(query_type="semantic") returns taint findings. Combined with referential queries, this enables prompts like:

"Find all API endpoints that have no auth guard and a taint path to a SQL sink."

which the agent resolves with two graph queries and no manual code reading.
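
The shape of that resolution can be sketched in memory as a join between two edge sets. The tuple fields, endpoint names, and guard names below are hypothetical stand-ins, not the Codesteward graph schema; in practice the same logic would run as graph queries over TAINT_FLOW and guard edges.

```python
from typing import NamedTuple


class TaintFlow(NamedTuple):
    source_fn: str  # function where tainted input enters
    sink_fn: str    # function containing the dangerous call
    cwe: str        # weakness class of the sink


def unprotected_sql_endpoints(
    taint_flows: list[TaintFlow],
    protected_by: dict[str, str],
) -> list[str]:
    """Endpoints with a CWE-89 taint path and no auth guard recorded."""
    hits = {
        flow.source_fn
        for flow in taint_flows
        if flow.cwe == "CWE-89" and flow.source_fn not in protected_by
    }
    return sorted(hits)


if __name__ == "__main__":
    flows = [
        TaintFlow("get_user", "run_query", "CWE-89"),
        TaintFlow("search", "run_indexer", "CWE-78"),
        TaintFlow("admin_report", "run_query", "CWE-89"),
    ]
    guards = {"admin_report": "require_admin"}
    print(unprotected_sql_endpoints(flows, guards))
```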

Contributing

Catalog contributions and bug reports are welcome — see CONTRIBUTING.md.

Exit codes

Code	Meaning
`0`	Analysis completed (findings may or may not exist)
`1`	Fatal error (Neo4j unreachable, repo path not found, etc.)
`2`	Invalid arguments
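
Because exit code 0 is returned whether or not findings exist, a CI gate cannot rely on the exit code alone. The wrapper below is a minimal sketch of one way to handle that. It assumes the default YAML output contains a `paths_unsafe:` line as in the Quick start sample, and the helper names are hypothetical.

```python
import subprocess

EXIT_MEANINGS = {
    0: "analysis completed",
    1: "fatal error",
    2: "invalid arguments",
}


def describe_exit(code: int) -> str:
    """Map an exit code from the table above to a short label."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_taint(repo_id: str, repo_path: str) -> tuple[int, str]:
    """Run the CLI in --stub mode and return (exit code, stdout)."""
    proc = subprocess.run(
        ["codesteward-taint", "--repo-id", repo_id,
         "--repo-path", repo_path, "--stub"],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout


def should_fail_build(code: int, stdout: str) -> bool:
    """Fail on errors, or on a nonzero paths_unsafe line in the output."""
    if code != 0:
        return True
    for line in stdout.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "paths_unsafe":
            return value.strip() not in ("", "0")
    return False
```

A CI step could call `run_taint(...)`, pass the result to `should_fail_build(...)`, and log `describe_exit(code)` when the run did not complete.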

License

codesteward-taint is proprietary software distributed as a compiled binary. The binary may be used freely as part of the Codesteward platform. Redistribution, reverse engineering, and modification are not permitted. See LICENSE for full terms.
