backtest_engine

Numba JIT-accelerated trade simulator and validation toolkit. Turns Python for-loop backtests into 10-50x faster simulations.

What this is: A fast bar-based trade simulator that takes OHLC data + pre-computed signals and evaluates each trade independently. Includes automated bug detection and stage-gate validation.

What this is not: A portfolio backtester. There is no account balance tracking, multi-instrument position management, or order book simulation. If you need those, look at backtrader or vectorbt.

Features

Core (instrument-agnostic)

simulate_trades() — SL/TP/trailing/SAR trailing/limit entry/custom exit, all @njit kernels. Supports next-bar-open entry via open_prices in all modes
Indicators — SMA, ATR, Bollinger Bands, RCI, Parabolic SAR, expanding quantile, multi-timeframe mapping (all @njit)
Monte Carlo DD — 10,000-shuffle drawdown distribution, Kelly criterion, optimal risk sizing, prop firm DD check
StressTest — block bootstrap (preserves losing-streak autocorrelation), parameter degradation scenarios (win rate / RR / cost what-if)
Walk-Forward / CSCV — out-of-sample validation with IS/OOS ratio and Probability of Backtest Overfitting (PBO, Bailey 2015)
TradeResults — structured array with convenience metrics: profit factor, win rate, expectancy, Sharpe/Sortino, max drawdown, recovery factor
Utilities — CSV loading, OHLC resampling, higher-TF indicator mapping

Opinionated modules (FX defaults, fully customizable)

BrokerCost: measured spread + commission model. Ships with FX presets for 29 instruments (28 FX pairs + XAUUSD), but accepts any instrument via the constructor. Supports per-trade cost arrays for variable-spread modeling
BugGuard: 14 automated checks (IDs BG-01–BG-13, including BG-09b) that catch look-ahead bias, cost underestimation, fixed-cost misuse, spread filter violations, overfitting, and more. Data period estimation defaults to FX market hours but accepts explicit timestamps for any market
GateKeeper: fast-kill pipeline (Gate 0–4) that eliminates hopeless strategies early. Gate 0: BugGuard; Gate 1–2: PF/RF screening; Gate 3: WFA + CSCV overfitting checks; Gate 4: Monte Carlo DD. All thresholds are class variables, so you can override them for your asset class

Install

git clone https://github.com/bartonguestier1725-collab/backtest-engine.git
cd backtest-engine
uv venv .venv && source .venv/bin/activate
uv pip install -e '.[dev]'

Note: numba 0.63 requires NumPy < 2.4, so the project pins numpy>=2.2,<2.4.

Quick start

import numpy as np
from backtest_engine import simulate_trades, BrokerCost, LONG, MonteCarloDD, atr, load_ohlcv

# OHLC data as numpy arrays (see Utilities for load_ohlcv)
timestamps, open_, high, low, close, volume = load_ohlcv("data.csv")

# ATR-based SL/TP
atr_vals = atr(high, low, close, 14)

# Define signals
signal_bars = np.array([100, 250, 400], dtype=np.int32)
directions = np.array([LONG, LONG, LONG], dtype=np.int8)
sl_distances = atr_vals[signal_bars] * 1.5
tp_distances = atr_vals[signal_bars] * 3.0

# Per-trade cost (spread + commission, scaled to each trade's SL)
cost = BrokerCost.tradeview_ilc()
instruments = ["EURUSD"] * len(signal_bars)
entry_costs = cost.per_trade_cost(instruments, sl_distances)

# Run simulation (Grade A: both open_prices and entry_costs provided)
results = simulate_trades(
    high, low, close,
    signal_bars, directions, sl_distances, tp_distances,
    max_hold=100,
    be_trigger_pct=0.5,
    open_prices=open_,
    entry_costs=entry_costs,
)

# Results (pnl_r is already net of costs)
print(f"Quality: {results.quality.grade}")  # "A"
print(f"Win rate: {np.mean(results['pnl_r'] > 0) * 100:.1f}%")
print(f"Avg PnL: {np.mean(results['pnl_r']):.3f}R")
print(f"Avg cost: {np.mean(results['cost_r']):.4f}R")

# Monte Carlo DD analysis
mc = MonteCarloDD(results['pnl_r'], risk_pct=0.01)
mc.run()
print(f"95th DD: {mc.dd_percentile(95) * 100:.1f}%")

simulate_trades options

Parameter	Description
`exit_mode="rr"`	Fixed risk-reward: SL/TP/timeout/breakeven
`exit_mode="trailing"`	Trailing stop
`exit_mode="sar_trailing"`	Parabolic SAR-based trailing (with direction-aware Plan B)
`exit_mode="custom"`	Exit on external signal
`be_trigger_pct`	Move SL to entry price when profit reaches this fraction of TP (0 = disabled)
`retrace_pct`	Limit entry: wait for price to retrace this fraction of TP before filling (0 = market entry)
`retrace_timeout`	Max bars to wait for limit fill before cancelling
`trail_activation_r`	R-multiple required to activate trailing stop
`trail_distance_r`	Trailing stop distance in R-multiples
`open_prices`	Enter at next-bar open instead of signal-bar close (supported in all modes)
`entry_costs`	Per-trade cost array in R-units (e.g. from `BrokerCost.per_trade_cost()`). Subtracted from `pnl_r` post-simulation. `NO_FILL` trades automatically get zero cost
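
To make the `exit_mode="rr"` row concrete, here is a minimal pure-Python sketch of how a single long trade could be resolved bar by bar. It is not the library's kernel: the function name `simulate_long_rr`, the rule that SL wins when SL and TP are both touched in the same bar, and the mark-to-market exit at the close on timeout are all assumptions made for illustration.

```python
import numpy as np

def simulate_long_rr(high, low, close, entry_bar, entry_price, sl_dist, tp_dist, max_hold):
    """Resolve one long trade; returns (exit_bar, pnl_r). Illustrative only."""
    sl_price = entry_price - sl_dist
    tp_price = entry_price + tp_dist
    last_bar = min(entry_bar + max_hold, len(close) - 1)
    for i in range(entry_bar + 1, last_bar + 1):
        if low[i] <= sl_price:       # assumption: pessimistic, SL wins same-bar ties
            return i, -1.0
        if high[i] >= tp_price:
            return i, tp_dist / sl_dist
    # Timeout: exit at the close of the last allowed bar, expressed in R
    return last_bar, (close[last_bar] - entry_price) / sl_dist

high = np.array([1.0, 1.2, 1.5, 2.1])
low = np.array([1.0, 0.9, 1.1, 1.6])
close = np.array([1.0, 1.1, 1.4, 2.0])
exit_bar, pnl_r = simulate_long_rr(high, low, close, 0, 1.0, 0.5, 1.0, max_hold=10)
print(exit_bar, pnl_r)
```

Because each trade is resolved from its own entry bar without shared state, a loop like this is the kind of code that lends itself to JIT compilation.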

Pre-flight quality check

simulate_trades() inspects its inputs before running and assigns a quality grade:

Grade	Condition	Meaning
A	`entry_costs` AND `open_prices` both provided	All inputs provided
B	One of the two provided	One input missing
C	Neither provided	No costs + close entry — over-optimistic

Grade indicates whether inputs were provided, not whether they are correct. Use BugGuard (BG-02, BG-12) to validate cost accuracy.

Grade B/C emit a BacktestQualityWarning with details on what's missing. The simulation still runs — nothing is blocked.

# Grade C — warns you
results = simulate_trades(high, low, close, ...)
# BacktestQualityWarning: Backtest Quality: C
#   entry_costs:  NOT PROVIDED — costs will be 0 (use BrokerCost.per_trade_cost())
#   open_prices:  NOT PROVIDED — entry at signal-bar close (optimistic bias)

# Grade A — no warning
results = simulate_trades(..., open_prices=open_arr, entry_costs=cost_arr)

# Access quality info
print(results.quality.grade)  # "A"

# Suppress warnings
import warnings
from backtest_engine import BacktestQualityWarning
warnings.filterwarnings("ignore", category=BacktestQualityWarning)

# Or disable per call
results = simulate_trades(..., preflight=False)
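
The grading rule in the table is simple enough to sketch in a few lines. The helper `grade_inputs` and the `QualityWarning` class below are hypothetical stand-ins, not the library's own code; they only mirror the A/B/C conditions described above.

```python
import warnings

class QualityWarning(UserWarning):
    """Hypothetical stand-in for BacktestQualityWarning."""

def grade_inputs(entry_costs=None, open_prices=None):
    """Return "A", "B", or "C" depending on which optional inputs were passed."""
    provided = (entry_costs is not None) + (open_prices is not None)
    grade = {2: "A", 1: "B", 0: "C"}[provided]
    if grade != "A":
        # Warn but never block: the caller still gets a result
        warnings.warn(f"Backtest Quality: {grade}", QualityWarning)
    return grade

with warnings.catch_warnings():
    warnings.simplefilter("ignore", QualityWarning)
    print(grade_inputs(entry_costs=[0.05], open_prices=[1.1]))
    print(grade_inputs(entry_costs=[0.05]))
    print(grade_inputs())
```

Note that the grade only reflects presence: passing an array of zeros as `entry_costs` would still earn an A in this sketch, which matches the caveat that the grade does not validate correctness.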

Broker cost models

BrokerCost works with any instrument. Pass your own spreads, commissions, and pip definitions.

from backtest_engine import BrokerCost

# Custom broker — any instrument
cost = BrokerCost(
    spreads={"EURUSD": 0.00008, "USDJPY": 0.008, "BTCUSD": 15.0},
    commission_per_lot=4.0,
    pip_values={"EURUSD": 10.0, "USDJPY": 6.67, "BTCUSD": 1.0},
    pip_sizes={"EURUSD": 0.0001, "USDJPY": 0.01, "BTCUSD": 1.0},
)

# FX presets (29 instruments: 28 FX pairs + XAUUSD, with measured spreads)
cost = BrokerCost.tradeview_ilc()   # ECN (tight spread + $5 RT commission)
cost = BrokerCost.fundora()          # Prop firm (wider spread, no commission)

# Per-trade cost — pass to simulate_trades() for accurate cost modeling
instruments = ["EURUSD"] * len(sl_distances)
cost_array = cost.per_trade_cost(instruments, sl_distances)
results = simulate_trades(..., entry_costs=cost_array)
# results['pnl_r'] already has costs subtracted; results['cost_r'] shows each trade's cost

# Get all costs as a dict (useful for BugGuard)
expected = cost.cost_prices()  # {"EURUSD": 0.00007, "USDJPY": 0.014, ...}
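
The quick start describes the per-trade cost as "scaled to each trade's SL". One plausible reading, assumed here rather than taken from the library source, is that a round-trip cost in price units is divided by the trade's SL distance to give a cost in R. The helper name `cost_in_r` is hypothetical, and the 0.00007 figure is the EURUSD value from the comment above.

```python
import numpy as np

def cost_in_r(cost_price, sl_distances):
    """Convert a cost in price units to R-units, one value per trade."""
    sl = np.asarray(sl_distances, dtype=np.float64)
    return cost_price / sl

sl_distances = np.array([0.0007, 0.0014, 0.0028])   # tighter SL means higher cost in R
cost_r = cost_in_r(0.00007, sl_distances)
gross_pnl_r = np.array([2.0, -1.0, 2.0])
net_pnl_r = gross_pnl_r - cost_r                    # costs are subtracted after simulation
print(cost_r, net_pnl_r)
```

This is why a fixed cost in R across all trades can mislead: the same spread consumes a much larger share of a tight stop than of a wide one.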

BugGuard

Automatically checks for 14 known backtesting bugs before you trust any result.

The checks themselves are instrument-agnostic (look-ahead bias, bfill leak, same-bar reentry, etc.). The only FX-specific part is data period estimation — pass start_ts/end_ts to use calendar dates instead.

from backtest_engine import bug_guard, BrokerCost

report = bug_guard(
    source_path="my_strategy.py",
    signal_bars=signal_bars,
    entry_bars=results["entry_bar"],
    exit_bars=results["exit_bar"],
    spreads_used={"USDJPY": 0.017, "EURUSD": 0.000075},
    expected_costs=BrokerCost.tradeview_ilc().cost_prices(),
    resolution_minutes=5,
    n_bars=len(close),
    n_trades=len(signal_bars),
)
# BG-01: Look-ahead bias      BG-02: Cost underestimation  BG-04: bfill data leak
# BG-05: Coarse-bar SL/TP     BG-06: Short-period overfit  BG-07: Full-period quantile
# BG-08: Same-bar re-entry     BG-09: Close-price entry     BG-09b: Missing open_prices
# BG-12: Fixed-cost misuse    BG-13: Spread filter gap     ... and more

Individual checks can also be called standalone:

from backtest_engine import check_look_ahead, check_cost_registry, check_spread_filter

result = check_look_ahead(signal_bars, entry_bars)
print(f"{result.check_id}: {result.message}")

# Check if backtest signals respect a max spread constraint
spread_result = check_spread_filter(spreads_at_entry, max_spread=0.00020)
print(f"{spread_result.check_id}: {spread_result.message}")
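
To illustrate what checks of this kind look for, here is a hedged sketch of two of them using plain NumPy. The function names and the exact rules (an entry bar earlier than its signal bar counts as look-ahead; a spread above the cap counts as a filter violation) are assumptions for illustration and may differ from what `check_look_ahead` and `check_spread_filter` actually test.

```python
import numpy as np

def look_ahead_violations(signal_bars, entry_bars):
    """Indices of trades entered before their own signal bar."""
    return np.flatnonzero(np.asarray(entry_bars) < np.asarray(signal_bars))

def spread_filter_violations(spreads_at_entry, max_spread):
    """Indices of trades entered while the spread exceeded the cap."""
    return np.flatnonzero(np.asarray(spreads_at_entry) > max_spread)

signal_bars = np.array([100, 250, 400])
entry_bars = np.array([101, 249, 401])            # second trade enters too early
spreads_at_entry = np.array([0.00008, 0.00025, 0.00012])

print(look_ahead_violations(signal_bars, entry_bars).tolist())
print(spread_filter_violations(spreads_at_entry, 0.00020).tolist())
```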

GateKeeper

Stage-gate pipeline that kills hopeless strategies early. Default thresholds are tuned for FX — override them for your asset class.

from backtest_engine import GateKeeper, BrokerCost

gk = GateKeeper(
    strategy_name="SMA Crossover v3",
    n_bars=7022,
    bar_minutes=60,
    resolution_minutes=1,
    spreads_used=spreads,
    source_path=__file__,
    expected_costs=BrokerCost.tradeview_ilc().cost_prices(),
)

gk.gate0_validate()                    # BugGuard 14 checks + GK-00 input completeness
gk.gate1_quick(run_func, quick_params)  # ~20 combos → PF >= 1.05?
gk.gate2_screen(run_func, full_params)  # ~100 combos → PF >= 1.10, RF >= 1.5?
gk.gate3_validate(wfa_result, cscv_result)  # WFA OOS win rate + CSCV PBO
gk.gate4_montecarlo(mc)                     # MC drawdown pass rate >= 70%
gk.summary()

# Override thresholds for a different asset class (set these before running the gates)
gk.GATE1_MIN_PF = 1.10
gk.GATE2_MIN_RF = 2.0
gk.GATE3_MAX_PBO = 0.30
gk.GATE4_MIN_MC_PASS = 0.80

Gate details

Gate	Time	What it checks	Kill condition
Gate 0	1 min	BugGuard (14 checks) + input completeness	Any BG ERROR → RuntimeError
Gate 1	5 min	~20 param combos on full data	Best PF < 1.05
Gate 2	20 min	~100 param combos, PF + Recovery Factor	Best PF < 1.10 or RF < 1.5
Gate 3	30 min	WFA out-of-sample win rate + CSCV PBO	OOS win rate < 0.55 or PBO > 0.40
Gate 4	5 min	Monte Carlo DD pass rate + confidence percentile	DD@confidence > dd_limit or pass rate < 0.70

Gate 0 also emits a GK-00 WARN when key inputs (source_path, spreads_used, n_bars) are missing, because without them important BugGuard checks are silently skipped.
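
The fast-kill idea from the table can be sketched as an ordered list of kill conditions that stops at the first failure. The thresholds are the defaults listed above; the metric keys (`best_pf_quick`, `dd_at_conf`, and so on) and the function `first_failed_gate` are hypothetical names, and Gate 0 is omitted because it raises rather than compares a metric.

```python
# Kill conditions copied from the gate table; metric keys are hypothetical.
GATES = [
    ("Gate 1", lambda m: m["best_pf_quick"] < 1.05),
    ("Gate 2", lambda m: m["best_pf_full"] < 1.10 or m["rf"] < 1.5),
    ("Gate 3", lambda m: m["oos_win_rate"] < 0.55 or m["pbo"] > 0.40),
    ("Gate 4", lambda m: m["dd_at_conf"] > m["dd_limit"] or m["mc_pass_rate"] < 0.70),
]

def first_failed_gate(metrics):
    """Return the name of the first gate that kills the strategy, else None."""
    for name, is_killed in GATES:
        if is_killed(metrics):
            return name          # later, more expensive gates are never evaluated
    return None

# Only the Gate 1 metric exists yet, and it is already enough to stop
print(first_failed_gate({"best_pf_quick": 1.02}))
```

Ordering the gates from cheapest to most expensive means a weak strategy is discarded after minutes of compute instead of after a full walk-forward and Monte Carlo run.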

run_func

run_func takes a parameter dict and returns a metric dict:

def run_func(params: dict) -> dict | None:
    """Run a single backtest with the given params.
    Must return {'pf': ..., 'total_r': ..., 'n_trades': ..., 'max_dd_r': ...}
    or None if no trades.
    """

Gate 3: WFA + CSCV

Gate 3 uses results from WalkForward and CSCV to detect overfitting. CSCV is optional — pass None to skip the PBO check.

from backtest_engine import WalkForward, CSCV

# Run WFA
wf = WalkForward(n_bars=len(close), is_ratio=0.7, n_splits=5)
wfa_result = wf.run(param_grid, evaluate_fn)

# Run CSCV
cscv = CSCV(n_splits=10)
cscv_result = cscv.run(param_grid, evaluate_fn, n_bars=len(close))

# Feed into GateKeeper
gk.gate3_validate(wfa_result, cscv_result)
# → Checks: OOS win rate >= 0.55, PBO <= 0.40

Gate 4: Monte Carlo DD

Gate 4 validates that the strategy survives randomized trade-order scenarios.

from backtest_engine import MonteCarloDD

mc = MonteCarloDD(results['pnl_r'], n_sims=10_000, risk_pct=0.01, seed=42)
mc.run()

gk.gate4_montecarlo(mc, dd_limit=0.20, confidence=95.0)
# → Checks: DD@95% <= 20% AND fraction of sims with max DD < 20% >= 0.70
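
The two Gate 4 conditions can be reproduced with a small NumPy sketch of trade-order shuffling. This is an illustration under stated assumptions, not the `MonteCarloDD` implementation: equity is assumed to compound with a fixed fraction `risk_pct` risked per trade, and the function name `mc_max_drawdowns` is hypothetical.

```python
import numpy as np

def mc_max_drawdowns(pnl_r, n_sims=2000, risk_pct=0.01, seed=42):
    """Max drawdown (as a fraction of peak equity) for each shuffled trade order."""
    rng = np.random.default_rng(seed)
    pnl = np.asarray(pnl_r, dtype=np.float64)
    max_dd = np.empty(n_sims)
    for s in range(n_sims):
        order = rng.permutation(pnl)
        equity = np.cumprod(1.0 + risk_pct * order)             # assumption: compounding
        peak = np.maximum.accumulate(np.maximum(equity, 1.0))   # start equity = 1.0
        max_dd[s] = np.max(1.0 - equity / peak)
    return max_dd

rng = np.random.default_rng(0)
pnl_r = np.where(rng.random(200) < 0.45, 2.0, -1.0)   # synthetic trades for the demo
dd = mc_max_drawdowns(pnl_r)
dd_95 = np.percentile(dd, 95)
pass_rate = np.mean(dd < 0.20)
passed = bool(dd_95 <= 0.20 and pass_rate >= 0.70)
print(f"DD@95%: {dd_95:.1%}, pass rate: {pass_rate:.1%}, passed: {passed}")
```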

TradeResults metrics

simulate_trades() returns a TradeResults object (numpy structured array) with convenience properties:

results = simulate_trades(...)

print(f"PF: {results.profit_factor:.2f}")       # Gross profit / gross loss
print(f"Win rate: {results.win_rate:.1%}")       # Fraction of winning trades
print(f"Expectancy: {results.expectancy_r:.3f}R") # Mean PnL per trade
print(f"Geo mean: {results.geometric_mean_r:.4f}") # Geometric growth rate
print(f"Sharpe: {results.sharpe_r:.2f}")         # mean / std of pnl_r
print(f"Sortino: {results.sortino_r:.2f}")       # mean / downside_std
print(f"Max DD: {results.max_drawdown_r:.1f}R")  # Peak-to-trough in R
print(f"RF: {results.recovery_factor:.2f}")      # total_r / max_dd_r

All properties handle edge cases (empty results, no losses, no wins) and return plain floats.
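
As a rough guide to what such edge-case handling involves, the sketch below computes a few of the ratio metrics from a plain array. The function `summary_metrics`, the downside deviation formula (root mean square of the negative returns), and the fallback values for empty or degenerate inputs are assumptions; the library's own conventions may differ.

```python
import numpy as np

def summary_metrics(pnl_r):
    """Win rate, expectancy, Sharpe and Sortino from per-trade PnL in R."""
    pnl = np.asarray(pnl_r, dtype=np.float64)
    if pnl.size == 0:
        return {"win_rate": 0.0, "expectancy_r": 0.0, "sharpe_r": 0.0, "sortino_r": 0.0}
    mean = float(pnl.mean())
    std = float(pnl.std())
    # Assumed downside deviation: RMS of losses, winners counted as zero
    downside = float(np.sqrt(np.mean(np.minimum(pnl, 0.0) ** 2)))
    if downside > 0:
        sortino = mean / downside
    else:
        sortino = float("inf") if mean > 0 else 0.0   # no losing trades at all
    return {
        "win_rate": float(np.mean(pnl > 0)),
        "expectancy_r": mean,
        "sharpe_r": mean / std if std > 0 else 0.0,
        "sortino_r": sortino,
    }

print(summary_metrics([2.0, -1.0, 2.0, -1.0]))
```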

Monte Carlo & prop firm check

from backtest_engine import MonteCarloDD

mc = MonteCarloDD(pnl_after_costs, n_sims=10_000, risk_pct=0.01, seed=42)
mc.run()

print(f"DD 95th: {mc.dd_percentile(95)*100:.1f}%")
print(f"Kelly: {mc.kelly_fraction()*100:.1f}%")
print(f"Optimal risk: {mc.optimal_risk_pct(max_dd=0.15)*100:.2f}%")

# Prop firm drawdown check (any firm, any limits)
result = mc.prop_firm_check(max_dd_limit=0.04, total_dd_limit=0.08, confidence=95.0)
print(f"Pass: {result['pass']}, Max DD OK: {result['max_dd_ok']}")
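
For reference, the textbook Kelly fraction for a win/loss bet is f* = W - (1 - W) / R, where W is the win rate and R is the ratio of average win to average loss. The sketch below applies that formula to per-trade results in R; whether `kelly_fraction()` in the library uses exactly this estimator is an assumption, and the standalone function here is only illustrative.

```python
import numpy as np

def kelly_fraction(pnl_r):
    """Textbook Kelly fraction from per-trade PnL; clipped at 0 for losing systems."""
    pnl = np.asarray(pnl_r, dtype=np.float64)
    wins, losses = pnl[pnl > 0], pnl[pnl < 0]
    if wins.size == 0 or losses.size == 0:
        return 0.0                       # formula is undefined without both outcomes
    w = wins.size / pnl.size             # win rate
    payoff = wins.mean() / -losses.mean()  # average win / average loss
    return max(0.0, w - (1.0 - w) / payoff)

print(kelly_fraction([2.0, -1.0, 2.0, -1.0]))
```

Full Kelly sizing is usually far too aggressive for a drawdown-limited account, which is why a drawdown-constrained figure such as `optimal_risk_pct(max_dd=...)` is the more practical number.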

StressTest

Stress-test a strategy's robustness beyond simple Monte Carlo shuffling.

from backtest_engine import StressTest

st = StressTest(results['pnl_r'], n_sims=1000, seed=42)

# Block bootstrap — preserves losing-streak autocorrelation
bb = st.block_bootstrap(block_size=10)
print(f"Block bootstrap DD@95%: {bb['dd_95']*100:.1f}%")

# Parameter degradation — what-if scenarios
degraded = st.degrade(win_rate_delta=-0.05, rr_scale=0.90, cost_add_r=0.02)
# degraded is a modified pnl array you can feed into MonteCarloDD

# Run all scenarios at once
report = st.run_all(block_size=10)
# report["baseline"]           — standard MC shuffle
# report["block_bootstrap"]    — block bootstrap with autocorrelation
# report["degraded"]["wr_minus5"]     — win rate -5%
# report["degraded"]["rr_80pct"]      — reward:risk scaled to 80%
# report["degraded"]["cost_plus_01r"] — extra 0.1R cost per trade
# report["degraded"]["combined"]      — all degradations at once

Block bootstrap resamples consecutive blocks of trades instead of shuffling individual trades. This preserves the natural clustering of losing streaks and gives more realistic worst-case DD estimates.
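
A minimal sketch of the resampling step, assuming a moving-block bootstrap with overlapping blocks drawn uniformly at random. The function name `block_bootstrap_sample` is hypothetical and the library's `block_bootstrap()` may choose blocks differently.

```python
import numpy as np

def block_bootstrap_sample(pnl_r, block_size, rng):
    """Resample a trade sequence from random runs of consecutive trades."""
    pnl = np.asarray(pnl_r, dtype=np.float64)
    n = pnl.size
    n_blocks = -(-n // block_size)                        # ceiling division
    starts = rng.integers(0, n - block_size + 1, size=n_blocks)
    # Each row holds the indices of one block of consecutive trades
    idx = (starts[:, None] + np.arange(block_size)).ravel()[:n]
    return pnl[idx]

rng = np.random.default_rng(42)
pnl_r = np.arange(20, dtype=np.float64)   # trade index as PnL so blocks are visible
sample = block_bootstrap_sample(pnl_r, block_size=5, rng=rng)
print(sample.reshape(4, 5))
```

Within every row the values increase by one, showing that the original ordering inside each block survives the resampling; only the order of the blocks is randomized.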

Walk-Forward & CSCV

from backtest_engine import WalkForward, CSCV

# Walk-Forward analysis
wf = WalkForward(n_bars=len(close), is_ratio=0.7, n_splits=5, anchored=False)
wf_result = wf.run(param_grid, evaluate_fn)
print(f"OOS mean: {wf_result['oos_mean']:.3f}R")
print(f"IS/OOS ratio: {wf_result['is_oos_ratio']:.2f}")

# CSCV — Probability of Backtest Overfitting (Bailey 2015)
cscv = CSCV(n_splits=10)
cscv_result = cscv.run(param_grid, evaluate_fn, n_bars=len(close))
print(f"PBO: {cscv_result['pbo']:.1%}")  # < 0.5 = probably not overfit
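
To show what `is_ratio`, `n_splits`, and `anchored` could mean in terms of bar ranges, here is a sketch of one possible window layout. It assumes the out-of-sample segments are equal-length and contiguous, and that the in-sample window either rolls forward (fixed length) or is anchored at bar 0. The function `walk_forward_splits` is hypothetical and the actual `WalkForward` layout may differ.

```python
def walk_forward_splits(n_bars, is_ratio=0.7, n_splits=5, anchored=False):
    """Return (is_start, is_end, oos_start, oos_end) half-open bar ranges."""
    is_per_oos = is_ratio / (1.0 - is_ratio)          # IS length relative to OOS length
    oos_len = int(n_bars / (n_splits + is_per_oos))
    is_len = int(oos_len * is_per_oos)
    splits = []
    for k in range(n_splits):
        oos_start = is_len + k * oos_len
        is_start = 0 if anchored else oos_start - is_len
        splits.append((is_start, oos_start, oos_start, oos_start + oos_len))
    return splits

for split in walk_forward_splits(1000):
    print(split)
```

In every split the in-sample range ends exactly where the out-of-sample range begins, so parameters are always chosen on data that precedes the bars they are evaluated on.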

Utilities

from backtest_engine import atr, load_ohlcv, resample_ohlcv, map_higher_tf

# Load CSV → numpy arrays
timestamps, open_, high, low, close, volume = load_ohlcv("data.csv")

# Resample 5-min to 1-hour
ts_1h, o_1h, h_1h, l_1h, c_1h, v_1h = resample_ohlcv(
    timestamps, open_, high, low, close, volume, rule="1h"
)

# Map higher-TF indicator to lower-TF bars (uses last completed bar)
atr_1h = atr(h_1h, l_1h, c_1h, 14)
atr_mapped = map_higher_tf(timestamps, ts_1h, atr_1h)
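
The "last completed bar" rule is what keeps higher-timeframe values free of look-ahead. The sketch below shows one way to implement it with `np.searchsorted`, assuming timestamps are bar open times in seconds and the higher-timeframe bar length is known. The function `map_last_completed` and its `period` argument are hypothetical and not the signature of `map_higher_tf`.

```python
import numpy as np

def map_last_completed(ts_low, ts_high, values_high, period):
    """For each lower-TF bar, take the value of the last higher-TF bar that has closed."""
    close_times = np.asarray(ts_high) + period           # a bar closes one period after it opens
    idx = np.searchsorted(close_times, ts_low, side="right") - 1
    out = np.full(len(ts_low), np.nan)
    valid = idx >= 0                                     # no completed bar yet stays NaN
    out[valid] = np.asarray(values_high, dtype=np.float64)[idx[valid]]
    return out

ts_low = np.arange(0, 3 * 3600, 1800)       # 30-minute bars
ts_high = np.array([0, 3600, 7200])         # 1-hour bars
values = np.array([10.0, 20.0, 30.0])
mapped = map_last_completed(ts_low, ts_high, values, period=3600)
print(mapped)
```

The first two 30-minute bars receive NaN because no hourly bar has closed yet; using the still-forming hourly bar there would leak future information into the signal.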

FX defaults reference

The following FX-specific defaults ship with the library. They are used by BugGuard and GateKeeper when no custom values are provided.

Module	Default	How to override
`BugGuard`	Data period estimation uses FX hours (22 days/mo, 17h/day)	Pass `start_ts`/`end_ts` to `check_data_period()`
`BugGuard`	Fundora cost registry (29 pairs: 28 FX + XAUUSD)	Pass your own `expected_costs` dict
`GateKeeper`	PF >= 1.05/1.10, RF >= 1.5, trades >= 30/50, PBO <= 0.40, MC pass >= 0.70	Override class variables: `gk.GATE1_MIN_PF = 1.10`
`BrokerCost`	`tradeview_ilc()`, `fundora()` presets	Use the constructor directly with your own spreads

Tests

python -m pytest tests/ -v

244 tests, all passing.

License

GPL-3.0 — See LICENSE for details.
