A public, scriptable research repository for studying long-horizon index investing, dollar-cost averaging (DCA), drawdowns, recovery cycles, inflation-adjusted returns, valuation regimes, and lump-sum versus DCA tradeoffs.
This repository is designed for one purpose: to make long-run equity-index research reproducible, visual, and easy to extend.
Lump sum usually wins on average. DCA usually wins on survivability.
DCA is not mainly about maximizing return. It is about improving tolerance for bad starting points and making the plan easier to execute.
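The sketch below makes the lump-sum versus DCA tradeoff concrete. It runs on hand-made price paths, not repository data. It assumes a fixed total budget, equal contributions at each price point, and that uninvested cash earns nothing. The function names are illustrative and are not part of the `dca_backtest` package.

```python
from typing import Sequence


def lump_sum_value(prices: Sequence[float], total: float) -> float:
    """Invest the whole budget at the first price and hold until the last price."""
    shares = total / prices[0]
    return shares * prices[-1]


def dca_value(prices: Sequence[float], total: float) -> float:
    """Split the budget into equal contributions, one at each price point.

    Assumption: cash waiting to be invested earns nothing.
    """
    per_contribution = total / len(prices)
    shares = sum(per_contribution / p for p in prices)
    return shares * prices[-1]


if __name__ == "__main__":
    # Toy paths for illustration only.
    paths = {
        "steady rise": [100, 110, 120, 130, 140],
        "early crash": [100, 60, 70, 80, 90],
    }
    for name, prices in paths.items():
        ls = lump_sum_value(prices, 5_000)
        dca = dca_value(prices, 5_000)
        print(f"{name:12s} lump sum {ls:9.2f}   DCA {dca:9.2f}")
```

On the steadily rising path the lump sum ends ahead, because all of the capital is exposed to the rise from the start. On the early-crash path DCA ends ahead, because the later contributions buy cheaper shares.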
For most investors, SP500 is a better core allocation than an all-in NDX100 approach.
NDX100 historically compounded faster, but its drawdowns were severe enough to break many investors. SP500 was usually easier to hold through stress.
NDX100 is more timing-sensitive and more dependent on valuation and drawdown regime.
It can make sense as an aggressive satellite allocation, especially when accumulated gradually during weaker valuation or deeper drawdown periods.
DCA matters more for NDX100 than for SP500.
Not because NDX100 is weak over the long run, but because its volatility and drawdowns are severe enough that staged buying can materially improve survivability and increase tolerance for bad starting luck.
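The survivability argument above depends on drawdown depth and recovery time. Both can be measured with a running-peak calculation. The sketch below is minimal and assumes a plain list of closing prices in date order, with list indexes standing in for dates. The helper names are hypothetical and are not taken from this repository.

```python
from typing import List, Optional, Sequence, Tuple


def drawdown_series(prices: Sequence[float]) -> List[float]:
    """Fractional decline from the running peak at each point (0.0 at a new high)."""
    peak = float("-inf")
    out: List[float] = []
    for p in prices:
        peak = max(peak, p)
        out.append(p / peak - 1.0)
    return out


def max_drawdown_with_recovery(prices: Sequence[float]) -> Tuple[float, int, Optional[int]]:
    """Return (max drawdown, peak index, recovery index or None if never regained)."""
    if not prices:
        raise ValueError("empty price series")
    dd = drawdown_series(prices)
    trough = min(range(len(prices)), key=lambda i: dd[i])
    # The peak before the trough is the highest price up to and including the trough
    # (max() returns the first index at that level if there are ties).
    peak = max(range(trough + 1), key=lambda i: prices[i])
    recovery = next((i for i in range(trough, len(prices)) if prices[i] >= prices[peak]), None)
    return dd[trough], peak, recovery


def required_recovery_gain(drawdown: float) -> float:
    """Gain needed to regain the prior peak after a fractional drawdown (-0.5 -> 1.0)."""
    return 1.0 / (1.0 + drawdown) - 1.0
```

The recovery time is `recovery - peak` periods. The last helper shows why deep drawdowns punish so hard. A -50% loss needs a +100% gain to get back to the prior peak, and a -82.9% loss needs roughly +485%.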
Do not over-optimize contribution frequency.
Weekly, monthly, quarterly, and yearly schedules did not differ enough to justify complex execution. Monthly contributions are usually sufficient.
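One way to compare schedules fairly is to contribute on the first trading day of each week, month, quarter, or year. The sketch below is minimal and assumes you already have a list of trading dates. `monthly_anchor_dates` is only a guess at what an anchor-day option could mean: the first trading day on or after a given day of the month. It does not describe the CLI's actual `--monthly-anchor` behavior.

```python
from datetime import date
from typing import Callable, Dict, Hashable, List, Sequence

# Map each frequency to a key that identifies its calendar period.
PERIOD_KEYS: Dict[str, Callable[[date], Hashable]] = {
    "weekly": lambda d: tuple(d.isocalendar())[:2],  # (ISO year, ISO week number)
    "monthly": lambda d: (d.year, d.month),
    "quarterly": lambda d: (d.year, (d.month - 1) // 3),
    "yearly": lambda d: d.year,
}


def contribution_dates(trading_days: Sequence[date], frequency: str) -> List[date]:
    """Pick the first trading day of each period for the given frequency."""
    key = PERIOD_KEYS[frequency]
    chosen: List[date] = []
    seen = set()
    for d in sorted(trading_days):
        period = key(d)
        if period not in seen:
            seen.add(period)
            chosen.append(d)
    return chosen


def monthly_anchor_dates(trading_days: Sequence[date], anchor_day: int) -> List[date]:
    """Hypothetical anchor rule: first trading day on or after `anchor_day` in each month.

    Months with no trading day on or after the anchor are skipped.
    """
    chosen: List[date] = []
    seen = set()
    for d in sorted(trading_days):
        if d.day >= anchor_day and (d.year, d.month) not in seen:
            seen.add((d.year, d.month))
            chosen.append(d)
    return chosen
```

One reasonable design choice is to feed each schedule's dates into the same contribution loop with the same total budget per year. That keeps the frequency comparison apples to apples.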
High valuation is not a sell signal.
It is a warning that future returns may be lower than the most optimistic historical examples.
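The CAPE regime study groups start dates by starting valuation and then compares what happened afterward. The sketch below is minimal. It assumes you already have the starting CAPE and the subsequent forward CAGR over a fixed horizon for each start date. The bucket edges and the sample numbers in the checks are made up for illustration.

```python
import bisect
from collections import defaultdict
from statistics import median
from typing import Dict, List, Sequence, Tuple


def cape_bucket(value: float, edges: Sequence[float]) -> str:
    """Label a CAPE value using sorted bucket edges, e.g. [15, 25] -> '<15', '15-25', '>=25'."""
    i = bisect.bisect_right(edges, value)
    if i == 0:
        return f"<{edges[0]:g}"
    if i == len(edges):
        return f">={edges[-1]:g}"
    return f"{edges[i - 1]:g}-{edges[i]:g}"


def forward_returns_by_bucket(
    cape: Sequence[float],
    forward_cagr: Sequence[float],
    edges: Sequence[float],
) -> Dict[str, Tuple[int, float, float]]:
    """Return {bucket: (count, median forward CAGR, worst forward CAGR)}."""
    if len(cape) != len(forward_cagr):
        raise ValueError("CAPE and forward-return series must be aligned")
    groups: Dict[str, List[float]] = defaultdict(list)
    for c, r in zip(cape, forward_cagr):
        groups[cape_bucket(c, edges)].append(r)
    return {label: (len(rs), median(rs), min(rs)) for label, rs in groups.items()}
```

Each bucket reports the worst case next to the median. This follows the repository's own advice that the two should be read together.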
Small timing details matter less than people think.
Weekday choice, trading-day choice, and contribution frequency matter far less than asset selection, consistency, and the ability to stay invested during deep drawdowns.
The real edge is not clever timing. It is staying alive long enough for compounding to work.
This repository is a research tool and not investment advice.
This project currently includes research workflows for:
- Spot index path comparison
- Weekly and monthly DCA studies
- Buy-and-hold rolling-window studies
- Fixed weekday and monthly trading-day sensitivity
- Drawdown depth and recovery-cycle analysis
- Worst-start-date stress tests
- DCA frequency comparison
- Real-return analysis using CPI
- Wealth-path analysis for ongoing DCA
- CAPE regime analysis for future SP500 returns
- Lump sum versus DCA comparisons
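The rolling-window and worst-start-date studies listed above share one idea. They evaluate every possible start date for a given holding horizon and then summarize the distribution of outcomes. The buy-and-hold sketch below assumes an evenly spaced price series, for example month-end closes with `periods_per_year=12`. The function names are illustrative, not the package's API.

```python
from statistics import median
from typing import Dict, List, Sequence


def rolling_cagr(prices: Sequence[float], horizon_years: int, periods_per_year: int = 12) -> List[float]:
    """Annualized buy-and-hold return for every start index with a complete window."""
    steps = horizon_years * periods_per_year
    return [
        (prices[i + steps] / prices[i]) ** (1.0 / horizon_years) - 1.0
        for i in range(len(prices) - steps)
    ]


def summarize(cagrs: Sequence[float]) -> Dict[str, float]:
    """Positive-return rate, median CAGR, and worst CAGR across all windows."""
    if not cagrs:
        raise ValueError("history is shorter than the horizon")
    return {
        "windows": len(cagrs),
        "positive_rate": sum(c > 0 for c in cagrs) / len(cagrs),
        "median_cagr": median(cagrs),
        "worst_cagr": min(cagrs),
    }
```

The `worst_cagr` entry is the worst-start-date result for that horizon. Overlapping windows share most of their data, so the window count overstates how many independent observations the history actually contains.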
The core focus is SP500, NDX100, SPY, and QQQ, with local CSV support so research can be run against verified local datasets rather than fragile online endpoints.
This repo is built around questions that matter for real investors:
- How often does long-term index investing actually work?
- How much does the starting date matter?
- How painful are the worst drawdowns?
- How long do major crashes take to recover?
- Does DCA materially improve survival in bad regimes?
- Does contribution frequency matter enough to optimize?
- How much of nominal return survives inflation?
- Does starting valuation meaningfully shape long-run outcomes?
- When does lump sum beat DCA, and when does DCA protect the investor?
This repository is public, but local datasets are not committed.
- data/ is ignored by Git
- Put your own local datasets under data/processed/
- Generated reports and charts are written to data/reports/
- Presentation-ready images are copied into docs/assets/ for the public repository
Expected local CSV schema:
date,open,high,low,close,adj_close,volume,symbol
Typical local files:
data/processed/sp500_daily.csv
data/processed/ndx100_daily.csv
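Before running studies, it helps to check that a local file actually matches the schema above. The stdlib-only sketch below assumes ISO `YYYY-MM-DD` dates. It also allows a blank `adj_close`, which is an assumption on the grounds that spot-index files may not carry one. It is not the repository's `local-csv` provider.

```python
import csv
from datetime import date
from typing import Any, Dict, Iterable, List

EXPECTED_COLUMNS = ["date", "open", "high", "low", "close", "adj_close", "volume", "symbol"]


def parse_price_rows(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Validate the header, parse the fields used for backtests, and sort by date."""
    reader = csv.DictReader(lines)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for raw in reader:
        rows.append({
            "date": date.fromisoformat(raw["date"]),
            "close": float(raw["close"]),
            # Assumption: a blank adj_close means "not available" (e.g. a spot index).
            "adj_close": float(raw["adj_close"]) if raw["adj_close"] else None,
            "symbol": raw["symbol"],
        })
    rows.sort(key=lambda r: r["date"])
    dates = [r["date"] for r in rows]
    if len(dates) != len(set(dates)):
        raise ValueError("duplicate dates found")
    return rows


def load_local_csv(path: str) -> List[Dict[str, Any]]:
    """Open a local CSV file and parse it with parse_price_rows."""
    with open(path, newline="", encoding="utf-8") as fh:
        return parse_price_rows(fh)
```

A typical call would be `rows = load_local_csv("data/processed/sp500_daily.csv")`.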
git clone https://github.com/XiaokunDuan/index-dca-lab.git
cd index-dca-lab
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
Fetch ETF history through Yahoo Finance:
python -m dca_backtest.cli fetch \
--symbol SPY \
--start 1993-01-22 \
--end 2026-04-15
Run a basic local-CSV backtest:
python -m dca_backtest.cli backtest \
--provider local-csv \
--csv-path data/processed/sp500_daily.csv \
--symbol SP500 \
--start 1927-12-30 \
--end 2026-04-10 \
--frequency monthly \
--base-contribution 1000 \
--monthly-anchor 1
Run a rolling DCA study with progress bars:
python -m dca_backtest.cli study-dca \
--dataset SP500=data/processed/sp500_daily.csv \
--dataset NDX100=data/processed/ndx100_daily.csv \
--frequencies monthly weekly \
--horizons 5 10 15 20 30 \
--contribution 1000 \
--start 1985-10-01 \
--end 2026-04-10 \
--output-dir data/reports_aligned
Render the standard DCA report charts:
python -m dca_backtest.cli plot-dca \
--reports-dir data/reports_aligned \
--sp500-csv data/processed/sp500_daily.csv \
--ndx100-csv data/processed/ndx100_daily.csv \
--start 1985-10-01 \
--end 2026-04-10
Normalized SP500 versus NDX100 from a common start date:
Direct SP500 versus NDX100 spot-index comparison:
SP500 long-run history:
NDX100 long-run history:
Bridge from long index history to ETFs:
Positive-return rate by holding horizon:
Median versus worst rolling-window CAGR:
Full-period DCA summary:
SP500 versus NDX100 drawdown comparison:
NDX100 versus SPY drawdown comparison:
Recovery math after large losses:
Major recovery cycles, ranked by severity:
Recovery-cycle scatter plot:
The 2000 bubble peak through eventual recovery:
Buy-and-hold positive-return rate by horizon:
Buy-and-hold median versus worst CAGR:
Shared-horizon comparison between SP500 and NDX100:
SPY weekday sensitivity:
SPY monthly trading-day sensitivity:
Worst historical start dates: lump sum versus DCA:
Worst-start median outcome summary:
Positive-return rate across weekly, monthly, quarterly, and yearly schedules:
Median CAGR across DCA frequencies:
Full-period comparison of DCA frequencies:
Nominal versus real positive-return rates:
Nominal versus real median CAGR:
Nominal versus real long-run index growth:
SP500 monthly DCA wealth path:
NDX100 monthly DCA wealth path:
SP500 CAPE buckets versus future returns:
SP500 starting CAPE versus subsequent forward returns:
Median outcome:
Positive-return rate:
Worst-case outcome:
Real SP500 long-run growth from the actual historical series:
Why 8%, 9%, and 10% compound so differently:
Single-rate compounding intuition at 9%:
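The compounding charts rest on simple arithmetic. A one-point difference in annual return becomes a large gap because the difference itself compounds. The short sketch below is pure arithmetic and uses no market data.

```python
import math


def grow(rate: float, years: float, start: float = 1.0) -> float:
    """Value of `start` after compounding at `rate` once per year for `years` years."""
    return start * (1.0 + rate) ** years


def doubling_time(rate: float) -> float:
    """Years needed to double at a constant annual rate."""
    return math.log(2.0) / math.log1p(rate)


if __name__ == "__main__":
    horizons = [10, 20, 30, 40]
    print("rate  " + "".join(f"{h:>8d}y" for h in horizons) + "   doubling")
    for rate in (0.08, 0.09, 0.10):
        row = "".join(f"{grow(rate, h):9.2f}" for h in horizons)
        print(f"{rate:4.0%}  {row}   {doubling_time(rate):5.1f}y")
```

Over 30 years, 1 unit grows to about 10.06 at 8%, 13.27 at 9%, and 17.45 at 10%. At 9%, money doubles roughly every 8 years, which is consistent with the rule of 72.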
- SP500 and NDX100 both delivered strong long-run outcomes
- NDX100 historically compounded faster, but with much deeper drawdowns
- SP500 generally offered a better holding experience
- NDX100 experienced a drawdown of -82.9% after the 2000 bubble
- That episode took about 15.6 years from peak to full recovery
- SP500 suffered less in the same broad era and recovered materially earlier
- Lump sum often wins on median return because capital gets invested earlier
- DCA often improves positive-return odds and worst-case outcomes
- DCA is especially valuable when starting points are unlucky
- Weekly, monthly, quarterly, and yearly DCA do not produce radically different long-run results
- The investor’s ability to keep contributing matters more than optimizing the schedule
- Nominal returns overstate purchasing-power growth
- SP500 looks meaningfully weaker once returns are deflated by CPI
- NDX100 remains strong in real terms, but its path remains harsher
- Low starting CAPE regimes produced materially higher forward returns
- High CAPE regimes still worked over very long windows, but usually at lower rates
- This does not turn valuation into a precise timing tool, but it clearly affects forward return expectations
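The real-return findings above deflate nominal values by CPI. That means dividing each nominal value by the price level at the same date. The sketch below is minimal and assumes the price and CPI series are already aligned to the same dates, for example month-end index levels paired with that month's CPI. The helper names are illustrative.

```python
from typing import List, Sequence


def to_real(nominal: Sequence[float], cpi: Sequence[float]) -> List[float]:
    """Express each nominal value in purchasing power of the first period."""
    if len(nominal) != len(cpi):
        raise ValueError("nominal and CPI series must be aligned to the same dates")
    base = cpi[0]
    return [value * base / level for value, level in zip(nominal, cpi)]


def real_rate(nominal_rate: float, inflation_rate: float) -> float:
    """Exact (Fisher) conversion from a nominal to a real annual rate."""
    return (1.0 + nominal_rate) / (1.0 + inflation_rate) - 1.0
```

For example, a 10% nominal return with 3% inflation gives a real return of about 6.8%. That is slightly below the 7% that simple subtraction suggests.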
dca_backtest/ engine, providers, CLI, rolling-study logic, plotting helpers
scripts/ standalone research scripts for one-off studies and charts
tests/ unit tests
data/ local-only datasets and generated reports (ignored by Git)
docs/assets/ public chart assets embedded in the README
fetch
- Fetch and cache ETF history through Yahoo Finance
backtest
- Run a single DCA or DCA-plus-drawdown strategy
scan
- Run a threshold and multiplier grid scan for drawdown-buy logic
study-dca
- Run rolling-window DCA studies with terminal progress bars
plot-dca
- Generate standard study plots from rolling-window DCA outputs
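The `backtest` and `scan` commands refer to a DCA-plus-drawdown strategy and to a threshold and multiplier grid. The repository's exact rule is not spelled out here, so what follows is a hypothetical version. It contributes `base * multiplier` whenever the price is at least `threshold` below its running peak and `base` otherwise. Each grid cell is scored by final value per unit contributed. None of these names come from the CLI.

```python
from itertools import product
from typing import Dict, Sequence, Tuple


def drawdown_dca(
    prices: Sequence[float], base: float, threshold: float, multiplier: float
) -> Tuple[float, float]:
    """Hypothetical rule: return (total contributed, final value)."""
    peak = float("-inf")
    shares = 0.0
    invested = 0.0
    for p in prices:
        peak = max(peak, p)
        drawdown = p / peak - 1.0
        # Boost the contribution only when the price sits at least `threshold` below its peak.
        amount = base * multiplier if drawdown <= -threshold else base
        shares += amount / p
        invested += amount
    return invested, shares * prices[-1]


def scan(
    prices: Sequence[float],
    base: float,
    thresholds: Sequence[float],
    multipliers: Sequence[float],
) -> Dict[Tuple[float, float], float]:
    """Score every (threshold, multiplier) pair by final value per unit contributed."""
    results: Dict[Tuple[float, float], float] = {}
    for t, m in product(thresholds, multipliers):
        invested, value = drawdown_dca(prices, base, t, m)
        results[(t, m)] = value / invested
    return results
```

A larger multiplier also commits more cash, so value per unit contributed is only one possible score. A stricter comparison would hold total cash fixed across grid cells.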
Key scripts under scripts/:
- clean_excel_indices.py
- plot_sp500_vs_ndx100.py
- plot_sp500_vs_ndx100_drawdowns.py
- plot_drawdown_recovery_cycles.py
- plot_drawdown_recovery_math.py
- plot_buy_hold_win_rates.py
- plot_worst_start_study.py
- plot_dca_frequency_comparison.py
- plot_real_return_study.py
- plot_dca_wealth_paths.py
- plot_cape_forward_return_study.py
- plot_lump_sum_vs_dca_study.py
- plot_spy_schedule_curves.py
- plot_2000_recovery_compare.py
- Price-index studies are conservative relative to true total-return studies
- ETF adj_close captures splits and distributions but still does not reflect the full after-tax investor experience
- Real-return studies use CPI and therefore begin only where CPI coverage is available
- Some charts compare spot indexes and ETFs separately; those are related but not identical objects
- This is a research repository, not investment advice
- Results depend on data quality and return convention
- Long-run U.S. equity success should not be blindly generalized to every market
- Median outcomes and worst-case outcomes should always be read together
MIT