2 changes: 1 addition & 1 deletion Cargo.lock

40 changes: 40 additions & 0 deletions crates/ferro-airflow-dag-parser/CHANGELOG.md
@@ -8,6 +8,46 @@ public API require a major bump.

## [Unreleased]

## [1.0.1] - 2026-06-16

Security (recursion-DoS hardening). No public-API change — additive, fully
semver-compatible.

### Security
- **Closed a parser stack-overflow DoS (FP5).** Parsing attacker-controlled
Python could overflow the vendored `littrs-ruff-python-parser` 0.6.2
recursive-descent parser, aborting the host process with a `SIGSEGV`
(`catch_unwind` cannot intercept a guard-page fault). The previous
pre-screen capped only bracket nesting (32) and *consecutive* single-operator
runs (64), so the non-bracket recursion vectors — `not`/`await` keyword
chains, `~`/`-`/`+` runs, right-associative `a**b**c`, `a if b else …`
conditional / `lambda:` chains, deeply nested compound statements,
*mixed* prefix-operator chains (fuzz Finding 2, `crash-0665b68…`),
`yield`/`yield from` chains, and `async async … def` error-recovery chains —
slipped through and overflowed the parser. The last three were surfaced by
the adversarial design-review (Codex DD) convergence pass and closed by
counting `Yield`/`From`/`Async` in the lexer recursion metric.

The fix ports FerroAir's complete three-layer recursion guard
(`ferroair-dag-parser`, FA1) into `panic_safe.rs`: (1) an iterative bracket
pre-scan (cap 256), (2) a single real-tokenizer pass that bounds combined
expression recursion (`brackets + operator-run + per-line right-recursion +
indent`, cap 1024) and rejects PEP-750 t-strings (which the parser panics
on), and (3) execution of the parse **and** AST walk on a dedicated 128 MiB
stack so the numeric cap — not the caller's ~2 MiB stack — is the binding
limit. The recursive AST walkers (`collect_shift_edges`, `stringify_expr`,
`resolve_to_task_id`, …) additionally truncate past a 1024 depth so a deep
left-leaning `>>` / attribute / call chain that survives the parser cannot
overflow the walk either.

This is not a claim of bulletproof input handling — see
`dd-pack/11-known-limitations.md` for the honest residual (a single
left-leaning chain of hundreds of thousands of trailers in a multi-MB file
can still overflow on recursive AST construction/drop, bounded by the
128 MiB stack). The realistic FP5 parser-recursion shapes (each well under
4 KiB) are fully closed, with regression tests under `tests/stack_safety.rs`
and an adversarial design-review (Codex DD) convergence pass.
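
Layer (1) of the guard described in this entry, the iterative bracket pre-scan, can be sketched roughly as follows. This is an illustration only, not the code in `panic_safe.rs`: the function name `bracket_depth_exceeds` is hypothetical, and the sketch assumes a plain byte scan that also counts brackets inside string literals and comments, which a real pre-scan may treat differently.

```rust
/// Hypothetical helper: returns `true` when bracket nesting in `source`
/// goes deeper than `cap`.
///
/// The scan keeps a single counter and never recurses, so it cannot
/// overflow the stack no matter how deeply the input nests.
/// Simplification (assumption): brackets inside string literals and
/// comments are counted like any other bracket.
pub fn bracket_depth_exceeds(source: &str, cap: usize) -> bool {
    let mut depth: usize = 0;
    for byte in source.bytes() {
        match byte {
            b'(' | b'[' | b'{' => {
                depth += 1;
                if depth > cap {
                    // Reject before the recursive-descent parser sees it.
                    return true;
                }
            }
            // Saturate so stray closers in malformed input cannot underflow.
            b')' | b']' | b'}' => depth = depth.saturating_sub(1),
            _ => {}
        }
    }
    false
}
```

With the cap of 256 quoted above, a caller would reject the source up front when `bracket_depth_exceeds(source, 256)` returns `true`, and only then hand it to the tokenizer pass and the parser.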

## [1.0.0] - 2026-06-08

First semver-stable release; the public API is committed under semver.
2 changes: 1 addition & 1 deletion crates/ferro-airflow-dag-parser/Cargo.toml
@@ -1,7 +1,7 @@
# SPDX-License-Identifier: Apache-2.0
[package]
name = "ferro-airflow-dag-parser"
-version = "1.0.0"
+version = "1.0.1"
description = "Static AST-based extractor for Apache Airflow™ Python DAG files. Recovers dag_id, task_ids, dependencies, schedule, and dynamic-fallback markers without running the source. Extracted from the Ferro ecosystem."
categories = ["development-tools", "parser-implementations"]
keywords = ["airflow", "dag", "parser", "static-analysis", "python"]
38 changes: 32 additions & 6 deletions crates/ferro-airflow-dag-parser/src/dynamic_markers.rs
@@ -154,6 +154,17 @@ const TASK_DECORATORS: &[&str] = &[
///
/// Returns [`ParseError::Parse`] when ruff cannot parse the source.
pub fn detect_dynamic_markers(source: &str) -> Result<Vec<DynamicMarker>, ParseError> {
// Same stack-safety shield as the static extractor: the marker
// detector parses the source a second time and walks it with its own
// recursive `MarkerVisitor`, so it must run on the large dedicated
// stack behind the same pre-scan caps — a deeply nested DAG that
// survived the static path would otherwise crash here.
crate::panic_safe::shield_parser_panic("ruff-markers", source, || {
detect_dynamic_markers_impl(source)
})
}
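
The body of `shield_parser_panic` is not shown in this diff. As a rough sketch of the "large dedicated stack" idea the comment above refers to, the closure can be run on a scoped thread with an explicit stack size. The name `run_on_large_stack`, the thread name, and the `String` error type are all hypothetical; only the 128 MiB figure comes from the changelog.

```rust
use std::thread;

/// Stack size for the worker thread (128 MiB, the figure from the changelog).
const PARSE_STACK_BYTES: usize = 128 * 1024 * 1024;

/// Hypothetical helper: runs `work` on a dedicated thread with a large
/// stack and returns its result.
///
/// A panic inside `work` surfaces as `Err` through `join`. A genuine stack
/// overflow still aborts the process; the large stack only makes the
/// numeric pre-scan caps the binding limit instead of the caller's stack.
pub fn run_on_large_stack<T, F>(work: F) -> Result<T, String>
where
    F: FnOnce() -> T + Send,
    T: Send,
{
    thread::scope(|scope| {
        let handle = thread::Builder::new()
            .name("large-stack-parse".to_string())
            .stack_size(PARSE_STACK_BYTES)
            // Scoped spawn lets `work` borrow from the caller (e.g. `source`).
            .spawn_scoped(scope, work)
            .map_err(|e| format!("failed to spawn worker thread: {e}"))?;
        handle
            .join()
            .map_err(|_| "worker thread panicked".to_string())
    })
}
```

A scoped thread is used here so the closure can borrow `source` without cloning it; whether the crate does the same is an assumption.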

fn detect_dynamic_markers_impl(source: &str) -> Result<Vec<DynamicMarker>, ParseError> {
let parsed = parse_module_safely(source)?;
let module: &ModModule = parsed.syntax();
let line_index = LineIndex::new(source);
@@ -424,16 +435,28 @@ fn is_dag_callable(expr: &Expr) -> bool {
}
}

/// Stack-safety cap for the decorator-callee chain walk below. A
/// `@dag()()()…` / `@task()()()…` decorator builds a left-leaning call
/// chain the parser produces iteratively, so it is not bounded by the
/// lexer recursion cap; truncate past this depth (such a decorator is not
/// a `@dag` / `@task` decorator anyway). Mirrors `ruff_impl`'s
/// `MAX_WALK_DEPTH` so both decorator paths are bounded identically
/// (Codex DD R8, 2026-06-16).
const MAX_DECORATOR_CHAIN_DEPTH: usize = 1024;

fn match_dag_decorator(expr: &Expr) -> bool {
-fn inner(expr: &Expr) -> Option<&str> {
+fn inner(expr: &Expr, depth: usize) -> Option<&str> {
if depth > MAX_DECORATOR_CHAIN_DEPTH {
return None;
}
match expr {
Expr::Name(ExprName { id, .. }) => Some(id.as_str()),
Expr::Attribute(ExprAttribute { attr, .. }) => Some(attr.as_str()),
-Expr::Call(call) => inner(&call.func),
+Expr::Call(call) => inner(&call.func, depth + 1),
_ => None,
}
}
-matches!(inner(expr), Some(name) if DAG_DECORATOR_NAMES.contains(&name))
+matches!(inner(expr, 0), Some(name) if DAG_DECORATOR_NAMES.contains(&name))
}

fn callee_is_chain_helper(expr: &Expr) -> bool {
@@ -454,15 +477,18 @@ fn is_operator_constructor(expr: &Expr) -> bool {
}

fn is_task_decorator_call(call: &ExprCall) -> bool {
-fn inner(expr: &Expr) -> Option<&str> {
+fn inner(expr: &Expr, depth: usize) -> Option<&str> {
if depth > MAX_DECORATOR_CHAIN_DEPTH {
return None;
}
match expr {
Expr::Name(ExprName { id, .. }) => Some(id.as_str()),
Expr::Attribute(ExprAttribute { attr, .. }) => Some(attr.as_str()),
-Expr::Call(c) => inner(&c.func),
+Expr::Call(c) => inner(&c.func, depth + 1),
_ => None,
}
}
-matches!(inner(&call.func), Some(name) if TASK_DECORATORS.contains(&name))
+matches!(inner(&call.func, 0), Some(name) if TASK_DECORATORS.contains(&name))
}
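
Both `inner` helpers above bound the callee-chain walk with a depth counter. For comparison, the same bound can be written as a loop, which avoids recursion altogether. The sketch below uses a toy `Expr` enum, not ruff's AST types, and the names `callee_name` and `MAX_CHAIN_DEPTH` are hypothetical; it is meant to show the truncation behaviour, not to suggest the PR should have been written this way.

```rust
/// Toy stand-in for the parser's expression type (hypothetical; the real
/// code matches on ruff's `Expr`, `ExprName`, `ExprAttribute`, `ExprCall`).
pub enum Expr {
    Name(String),
    Attribute { value: Box<Expr>, attr: String },
    Call { func: Box<Expr> },
    Other,
}

/// Same numeric cap as `MAX_DECORATOR_CHAIN_DEPTH` in the diff.
pub const MAX_CHAIN_DEPTH: usize = 1024;

/// Follows `Call.func` links in a loop and returns the callee's trailing
/// name, or `None` once the chain is deeper than `MAX_CHAIN_DEPTH`.
///
/// The loop visits depths `0..=MAX_CHAIN_DEPTH`, which matches the
/// `depth > MAX_DECORATOR_CHAIN_DEPTH` check in the recursive version.
pub fn callee_name(expr: &Expr) -> Option<&str> {
    let mut current = expr;
    for _ in 0..=MAX_CHAIN_DEPTH {
        match current {
            Expr::Name(id) => return Some(id.as_str()),
            Expr::Attribute { attr, .. } => return Some(attr.as_str()),
            Expr::Call { func } => current = func,
            Expr::Other => return None,
        }
    }
    // Chain too deep: treat it as "not a recognised decorator".
    None
}
```

Under this sketch, a `dag()()…` chain with up to 1024 call layers still resolves to `dag`, and one more layer yields `None`, so an oversized chain is simply not treated as a `@dag` / `@task` decorator.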

/// Conservative: anything that is not `@task()` (zero-arg) is considered