codesteward-taint

Codesteward — Govern • Verify • Evolve

Taint-flow analysis engine for the Codesteward platform. Traces how untrusted input propagates through a codebase to dangerous operations and writes queryable TAINT_FLOW edges into the Codesteward graph.

License: ELv2 Platform Languages


What it does

codesteward-taint answers one critical question:

Can user-controlled input reach a dangerous operation without being sanitized first?

It analyzes your codebase at three progressively deeper levels:

| Level | Name | What it does |
|-------|------|--------------|
| 1 | Call graph traversal | Follows the CALLS graph to find multi-hop paths from known sources to known sinks. Fast and language-agnostic: runs entirely as graph queries. |
| 2 | Intra-procedural propagation | Reads source code to verify that tainted values actually reach the sink through variable assignments, interpolation, and transformations. Eliminates Level 1 false positives. |
| 3 | Path-sensitive CFG analysis | Builds a control flow graph per function and tracks taint along individual execution branches. Catches sanitizers that exist in a function but are on a different branch than the tainted value, a class of vulnerability simpler tools miss entirely. |
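
To make Level 1 concrete, here is a minimal sketch of bounded multi-hop traversal over a call graph. It assumes a toy in-memory adjacency map rather than the Neo4j CALLS graph the tool actually queries; the function names, the `find_paths` helper, and the sample graph are all hypothetical, and the real engine may traverse the graph quite differently.

```python
from collections import deque

# Hypothetical call graph: caller -> callees. In the real tool this
# information lives in the CALLS edges produced by codesteward-graph.
CALLS = {
    "get_user": ["load_user"],
    "load_user": ["build_query", "log_access"],
    "build_query": ["execute"],
    "log_access": [],
    "execute": [],
}


def find_paths(calls, sources, sinks, max_hops=8):
    """Return every call path from a source to a sink within max_hops edges."""
    paths = []
    queue = deque([src] for src in sources)
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in sinks and len(path) > 1:
            paths.append(path)
            continue
        if len(path) - 1 >= max_hops:
            continue  # depth budget exhausted on this path
        for callee in calls.get(node, []):
            if callee not in path:  # skip cycles (recursion)
                queue.append(path + [callee])
    return paths


if __name__ == "__main__":
    for p in find_paths(CALLS, sources={"get_user"}, sinks={"execute"}):
        print(" -> ".join(p), f"({len(p) - 1} hops)")
```

A path found this way only shows that a call chain exists, not that the tainted value travels along it, which is why Levels 2 and 3 re-check each candidate against the source code.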

Results are written as TAINT_FLOW edges into the Neo4j graph, queryable alongside the structural edges (CALLS, GUARDED_BY, PROTECTED_BY) produced by codesteward-graph. This enables cross-concern queries like:

"Which unprotected endpoints have a taint path to a SQL sink?"


Supported languages and frameworks

| Language | Frameworks |
|----------|------------|
| Python | FastAPI, Flask, Django |
| TypeScript / JavaScript | Express, NestJS |
| Java | Spring MVC |
| Go | net/http, Gin, Echo |

Additional frameworks can be added via custom catalog files — see catalogs/CATALOGS.md.


Requirements

  • codesteward-graph must have run first — codesteward-taint reads the CALLS graph it produces
  • Neo4j 5.x (optional — --stub mode runs analysis without writing results)
  • The repository must be accessible on disk (required for Level 2 and Level 3 source analysis)

Installation

Download binary

Pre-built binaries are available on the releases page.

# macOS (Apple Silicon)
curl -fsSL https://github.com/bitkaio/codesteward-taint/releases/latest/download/codesteward-taint-darwin-arm64 \
     -o /usr/local/bin/codesteward-taint
chmod +x /usr/local/bin/codesteward-taint

# macOS (Intel)
curl -fsSL https://github.com/bitkaio/codesteward-taint/releases/latest/download/codesteward-taint-darwin-amd64 \
     -o /usr/local/bin/codesteward-taint
chmod +x /usr/local/bin/codesteward-taint

# Linux (amd64)
curl -fsSL https://github.com/bitkaio/codesteward-taint/releases/latest/download/codesteward-taint-linux-amd64 \
     -o /usr/local/bin/codesteward-taint
chmod +x /usr/local/bin/codesteward-taint

# Windows (PowerShell)
Invoke-WebRequest -Uri https://github.com/bitkaio/codesteward-taint/releases/latest/download/codesteward-taint-windows-amd64.exe `
                  -OutFile "$Env:LOCALAPPDATA\Programs\codesteward-taint.exe"

Add %LOCALAPPDATA%\Programs to your PATH if it is not already there, then open a new terminal so the change takes effect.

Docker

The codesteward-mcp Docker image ships with codesteward-taint pre-installed. No separate installation is needed when using Docker.

Verify

codesteward-taint --version

Quick start

# 1. Build the codebase graph first (if not already done)
codesteward-mcp graph_rebuild --repo-path /path/to/your/repo

# 2. Run taint analysis
codesteward-taint \
  --neo4j-uri      bolt://localhost:7687 \
  --neo4j-password your-password \
  --repo-id        my-repo \
  --repo-path      /path/to/your/repo \
  --frameworks     fastapi

Results are written to Neo4j and printed to stdout:

status: ok
duration_ms: 1240
levels_run: [1, 2, 3]
catalog_names: [fastapi, _common]
paths_unsafe: 3
paths_sanitized: 8
findings:
  - source_name: user_id
    source_file: src/routes/users.py
    source_line: 14
    sink_name: execute
    sink_file: src/db/queries.py
    sink_line: 88
    cwe: CWE-89
    hops: 3
    level: 3
    sanitized: false
  - source_name: search_query
    source_file: src/routes/search.py
    source_line: 31
    sink_name: subprocess.run
    sink_file: src/indexer/cli.py
    sink_line: 55
    cwe: CWE-78
    hops: 2
    level: 2
    sanitized: false
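
For post-processing, the report can be consumed by a script. The sketch below assumes that `--output json` emits the same keys as the YAML report shown above; that shape is an assumption, not something this README confirms, and `unsafe_by_cwe` is a hypothetical helper.

```python
import json
from collections import defaultdict

# Assumption: --output json emits the same keys as the YAML report above.
SAMPLE_REPORT = """
{
  "status": "ok",
  "paths_unsafe": 3,
  "paths_sanitized": 8,
  "findings": [
    {"source_name": "user_id", "source_file": "src/routes/users.py",
     "source_line": 14, "sink_name": "execute",
     "sink_file": "src/db/queries.py", "sink_line": 88,
     "cwe": "CWE-89", "hops": 3, "level": 3, "sanitized": false},
    {"source_name": "search_query", "source_file": "src/routes/search.py",
     "source_line": 31, "sink_name": "subprocess.run",
     "sink_file": "src/indexer/cli.py", "sink_line": 55,
     "cwe": "CWE-78", "hops": 2, "level": 2, "sanitized": false}
  ]
}
"""


def unsafe_by_cwe(report_json):
    """Group unsanitized findings by CWE as 'source -> sink' strings."""
    report = json.loads(report_json)
    grouped = defaultdict(list)
    for f in report.get("findings", []):
        if f.get("sanitized"):
            continue  # sanitized paths appear only with --include-safe
        source = f"{f['source_file']}:{f['source_line']} ({f['source_name']})"
        sink = f"{f['sink_file']}:{f['sink_line']} ({f['sink_name']})"
        grouped[f["cwe"]].append(f"{source} -> {sink}")
    return dict(grouped)


if __name__ == "__main__":
    for cwe, flows in sorted(unsafe_by_cwe(SAMPLE_REPORT).items()):
        print(cwe)
        for flow in flows:
            print("  " + flow)
```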

Once analysis has run, findings are queryable via the Codesteward MCP server:

codebase_graph_query(query_type="semantic", query="")

Usage

codesteward-taint [flags]

Flags:
  --neo4j-uri            string   Neo4j bolt URI (default: bolt://localhost:7687)
  --neo4j-user           string   Neo4j username (default: neo4j)
  --neo4j-password       string   Neo4j password
  --tenant-id            string   Tenant namespace (default: local)
  --repo-id              string   Repository identifier (required)
  --repo-path            string   Absolute path to repository on disk (required)
  --frameworks           string   Comma-separated catalog names, e.g. fastapi,express
                                  Omit to auto-detect from languages in the graph
  --max-hops             int      Maximum call depth for Level 1 traversal (default: 8)
  --level                int      Maximum analysis level: 1, 2, or 3 (default: 3)
  --persist-cfg          bool     Write control flow graph nodes to Neo4j,
                                  enabling path-level queries (default: false)
  --include-safe         bool     Include sanitized paths in output (default: false)
  --catalog-dir          string   Path to additional custom catalog YAML files
  --no-builtin-catalogs  bool     Disable built-in catalogs; use only --catalog-dir
                                  (default: false)
  --output               string   Output format: yaml | json | text (default: yaml)
  --stub                 bool     Analyze without writing results to Neo4j
  --log-level            string   debug | info | warn | error (default: info)
  --version                       Print version and exit
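
When the CLI is driven from a script, it can help to assemble the argument list in one place. The following is a minimal sketch that uses only flags from the listing above; `build_command` and `run` are hypothetical helper names, and passing boolean flags bare (for example `--stub` with no value) is assumed from the examples in this README.

```python
import subprocess


def build_command(repo_id, repo_path, frameworks=None, level=3,
                  output="yaml", stub=False):
    """Assemble an argv list using only flags from the listing above."""
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2, or 3")
    if output not in ("yaml", "json", "text"):
        raise ValueError("output must be yaml, json, or text")
    cmd = ["codesteward-taint",
           "--repo-id", repo_id,
           "--repo-path", repo_path,
           "--level", str(level),
           "--output", output]
    if frameworks:  # omit to let the tool auto-detect
        cmd += ["--frameworks", ",".join(frameworks)]
    if stub:  # assumed: boolean flags are passed bare
        cmd.append("--stub")
    return cmd


def run(cmd):
    """Run the analysis and return (exit code, stdout)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    print(" ".join(build_command("my-repo", "/path/to/repo",
                                 frameworks=["fastapi"], stub=True)))
```

Building the command as a list rather than a shell string avoids quoting problems when paths contain spaces.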

Run only Level 1 (fast, graph-only)

codesteward-taint --repo-id my-repo --repo-path /path/to/repo --level 1

Dry run without Neo4j

codesteward-taint --repo-id my-repo --repo-path /path/to/repo --stub

Persist CFG for advanced path queries

codesteward-taint --repo-id my-repo --repo-path /path/to/repo --persist-cfg

With --persist-cfg, the control flow graph is written to Neo4j as BasicBlock nodes and CFG_EDGE relationships, enabling Cypher queries that traverse execution paths within functions — for example, determining what branch conditions must hold for a sink to be reachable.
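
The idea behind path-level queries can be sketched on a toy control flow graph. The example below is an illustration only: the dictionary stands in for BasicBlock nodes and CFG_EDGE relationships, the block names and the `source` / `sanitize` / `sink` effects are hypothetical, and the real engine's representation is not documented here.

```python
# Hypothetical CFG for one function: block -> (effect, successors).
# Effects: "source" taints the value, "sanitize" cleans it, "sink" uses it.
CFG = {
    "entry":     ("source",   ["check"]),
    "check":     (None,       ["is_admin", "not_admin"]),
    "is_admin":  (None,       ["query"]),   # branch without a sanitizer
    "not_admin": ("sanitize", ["query"]),   # branch with a sanitizer
    "query":     ("sink",     []),
}


def function_has_sanitizer(cfg):
    """Path-insensitive check: is there a sanitizer anywhere in the function?"""
    return any(effect == "sanitize" for effect, _ in cfg.values())


def unsafe_paths(cfg, entry="entry"):
    """Enumerate acyclic paths; return those that reach a sink while tainted."""
    results = []
    stack = [([entry], False)]
    while stack:
        path, tainted = stack.pop()
        effect, successors = cfg[path[-1]]
        if effect == "source":
            tainted = True
        elif effect == "sanitize":
            tainted = False
        elif effect == "sink" and tainted:
            results.append(path)
        for nxt in successors:
            if nxt not in path:  # ignore loops in this sketch
                stack.append((path + [nxt], tainted))
    return results


if __name__ == "__main__":
    print("sanitizer present:", function_has_sanitizer(CFG))
    for p in unsafe_paths(CFG):
        print("unsafe path:", " -> ".join(p))
```

The function contains a sanitizer, so a path-insensitive check would treat it as safe, yet the `is_admin` branch still reaches the sink with tainted data. The blocks on that path are also the branch conditions that must hold for the sink to be reachable.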


Custom catalogs

Catalogs define which function names are treated as taint sources, sinks, and sanitizers. The built-in catalogs for supported frameworks are published in the catalogs/ directory — browse them to understand coverage or use them as templates.

For the full catalog schema, template, and usage examples see catalogs/CATALOGS.md.


Integration with codesteward-mcp

When codesteward-taint is installed and on PATH, the Codesteward MCP server automatically registers a taint_analysis tool. AI agents can invoke it directly:

taint_analysis(repo_id="my-repo", frameworks=["fastapi"])

After it completes, codebase_graph_query(query_type="semantic") returns taint findings. Combined with referential queries, this enables prompts like:

"Find all API endpoints that have no auth guard and a taint path to a SQL sink."

which the agent resolves with two graph queries and no manual code reading.
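
As a rough illustration of the two-query idea, the sketch below runs both lookups against a toy in-memory edge list and intersects the results. The edge tuples, endpoint names, and helper functions are hypothetical and heavily simplified; the real Neo4j schema (node labels, properties, and where TAINT_FLOW edges start and end) is not described in this README.

```python
# Toy edge list standing in for the graph: (start, relationship, end).
EDGES = [
    ("GET /users/{id}", "GUARDED_BY", "require_login"),
    ("GET /users/{id}", "TAINT_FLOW", "execute"),
    ("GET /search", "TAINT_FLOW", "execute"),
    ("GET /export", "TAINT_FLOW", "subprocess.run"),
    ("GET /health", "CALLS", "ping"),
]
ENDPOINTS = {"GET /users/{id}", "GET /search", "GET /export", "GET /health"}
SQL_SINKS = {"execute"}  # assumed classification of SQL sinks


def unguarded(endpoints, edges):
    """Query 1: endpoints with no GUARDED_BY or PROTECTED_BY edge."""
    guarded = {s for s, rel, _ in edges if rel in ("GUARDED_BY", "PROTECTED_BY")}
    return endpoints - guarded


def tainted_to(edges, sinks):
    """Query 2: start nodes with a TAINT_FLOW edge into one of the sinks."""
    return {s for s, rel, e in edges if rel == "TAINT_FLOW" and e in sinks}


if __name__ == "__main__":
    risky = unguarded(ENDPOINTS, EDGES) & tainted_to(EDGES, SQL_SINKS)
    print(sorted(risky))
```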


Contributing

Catalog contributions and bug reports are welcome — see CONTRIBUTING.md.


Exit codes

| Code | Meaning |
|------|---------|
| 0 | Analysis completed (findings may or may not exist) |
| 1 | Fatal error (Neo4j unreachable, repo path not found, etc.) |
| 2 | Invalid arguments |
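
Because exit code 0 is returned whether or not findings exist, a CI gate cannot rely on the exit code alone. Here is a minimal sketch of one way to combine it with the `paths_unsafe` count from the report; `ci_verdict` and the `max_unsafe` threshold are hypothetical and not part of the tool.

```python
def ci_verdict(returncode, paths_unsafe=0, max_unsafe=0):
    """Map the exit code plus the unsafe-path count to a CI decision."""
    if returncode == 2:
        return "error: invalid arguments"
    if returncode == 1:
        return "error: analysis failed"
    if returncode != 0:
        return f"error: unexpected exit code {returncode}"
    # Exit code 0 only means the analysis completed; inspect the findings.
    if paths_unsafe > max_unsafe:
        return f"fail: {paths_unsafe} unsafe path(s) found"
    return "pass"


if __name__ == "__main__":
    print(ci_verdict(0, paths_unsafe=3))
    print(ci_verdict(0, paths_unsafe=0))
    print(ci_verdict(1))
```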

License

Copyright (c) 2026 bitkaio LLC. All rights reserved.

codesteward-taint is proprietary software distributed as a compiled binary. The binary may be used freely as part of the Codesteward platform. Redistribution, reverse engineering, and modification are not permitted. See LICENSE for full terms.
