Index DCA Lab

A public, scriptable research repository for studying long-horizon index investing, dollar-cost averaging, drawdowns, recovery cycles, inflation-adjusted returns, valuation regimes, and lump-sum versus DCA tradeoffs.

This repository is designed for one purpose: make long-run equity-index research reproducible, visual, and easy to extend.

Practical Takeaways

What Actually Matters

Lump sum usually wins on average. DCA usually wins on survivability.

DCA is not mainly about maximizing return. It is about improving tolerance for bad starting points and making the plan easier to execute.

For most investors, SP500 is a better core allocation than an all-in NDX100 approach.

NDX100 historically compounded faster, but its drawdowns were severe enough to break many investors. SP500 was usually easier to hold through stress.

NDX100 is more timing-sensitive and more dependent on valuation and drawdown regime.

It can make sense as an aggressive satellite allocation, especially when accumulated gradually during weaker valuation or deeper drawdown periods.

DCA matters more for NDX100 than for SP500.

Not because NDX100 is weak over the long run, but because its volatility and drawdowns are severe enough that staged buying can materially improve survivability and increase tolerance for bad starting luck.

Do not over-optimize contribution frequency.

Weekly, monthly, quarterly, and yearly schedules did not differ enough to justify complex execution. Monthly contributions are usually sufficient.

High valuation is not a sell signal.

It is a warning that future returns may be lower than the most optimistic historical examples.

Small timing details matter less than people think.

Weekday choice, trading-day choice, and contribution frequency matter far less than asset selection, consistency, and the ability to stay invested during deep drawdowns.

The real edge is not clever timing. It is staying alive long enough for compounding to work.

This repository is a research tool and not investment advice.

What This Repository Covers

This project currently includes research workflows for:

Spot index path comparison
Weekly and monthly DCA studies
Buy-and-hold rolling-window studies
Fixed weekday and monthly trading-day sensitivity
Drawdown depth and recovery-cycle analysis
Worst-start-date stress tests
DCA frequency comparison
Real-return analysis using CPI
Wealth-path analysis for ongoing DCA
CAPE regime analysis for future SP500 returns
Lump sum versus DCA comparisons

The core focus is:

SP500
NDX100
SPY
QQQ

with local CSV support so research can be run against verified local datasets rather than fragile online endpoints.

Research Questions

This repo is built around questions that matter for real investors:

How often does long-term index investing actually work?
How much does the starting date matter?
How painful are the worst drawdowns?
How long do major crashes take to recover?
Does DCA materially improve survival in bad regimes?
Does contribution frequency matter enough to optimize?
How much of nominal return survives inflation?
Does starting valuation meaningfully shape long-run outcomes?
When does lump sum beat DCA, and when does DCA protect the investor?

Data Policy

This repository is public, but local datasets are not committed.

data/ is ignored by Git
Put your own local datasets under data/processed/
Generated reports and charts are written to data/reports/
Presentation-ready images are copied into docs/assets/ for the public repository

Expected local CSV schema:

date,open,high,low,close,adj_close,volume,symbol

Typical local files:

data/processed/sp500_daily.csv
data/processed/ndx100_daily.csv
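
Before running a study against a new local file, it can help to check the header against the schema above. The following is a minimal sketch using only the standard library; the function name `missing_columns` and the sample row are illustrative assumptions and are not part of the `dca_backtest` package.

```python
import csv
import io

# Column names copied from the expected local CSV schema above.
EXPECTED_COLUMNS = ["date", "open", "high", "low", "close", "adj_close", "volume", "symbol"]


def missing_columns(csv_text: str) -> list[str]:
    """Return the expected columns that are absent from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    present = {name.strip().lower() for name in header}
    return [name for name in EXPECTED_COLUMNS if name not in present]


# Hypothetical sample row, for illustration only (not real market data).
sample = (
    "date,open,high,low,close,adj_close,volume,symbol\n"
    "2020-01-02,1.0,2.0,0.5,1.5,1.5,100,SP500\n"
)
print(missing_columns(sample))
```

For a file on disk, the same check can be applied to the text returned by `open(path, newline="").read()`; an empty list means every expected column is present.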

Installation

git clone https://github.com/XiaokunDuan/index-dca-lab.git
cd index-dca-lab
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

Quick Start

Fetch ETF history through Yahoo Finance:

python -m dca_backtest.cli fetch \
  --symbol SPY \
  --start 1993-01-22 \
  --end 2026-04-15

Run a basic local-CSV backtest:

python -m dca_backtest.cli backtest \
  --provider local-csv \
  --csv-path data/processed/sp500_daily.csv \
  --symbol SP500 \
  --start 1927-12-30 \
  --end 2026-04-10 \
  --frequency monthly \
  --base-contribution 1000 \
  --monthly-anchor 1

Run a rolling DCA study with progress bars:

python -m dca_backtest.cli study-dca \
  --dataset SP500=data/processed/sp500_daily.csv \
  --dataset NDX100=data/processed/ndx100_daily.csv \
  --frequencies monthly weekly \
  --horizons 5 10 15 20 30 \
  --contribution 1000 \
  --start 1985-10-01 \
  --end 2026-04-10 \
  --output-dir data/reports_aligned

Render the standard DCA report charts:

python -m dca_backtest.cli plot-dca \
  --reports-dir data/reports_aligned \
  --sp500-csv data/processed/sp500_daily.csv \
  --ndx100-csv data/processed/ndx100_daily.csv \
  --start 1985-10-01 \
  --end 2026-04-10

Research Gallery

1. Spot Index Behavior

Normalized SP500 versus NDX100 from a common start date:

Direct SP500 versus NDX100 spot-index comparison:

SP500 long-run history:

NDX100 long-run history:

Bridge from long index history to ETFs:

2. DCA Rolling-Window Studies

Positive-return rate by holding horizon:

Median versus worst rolling-window CAGR:

Full-period DCA summary:

3. Drawdowns and Recovery Cycles

SP500 versus NDX100 drawdown comparison:

NDX100 versus SPY drawdown comparison:

Recovery math after large losses:

Major recovery cycles, ranked by severity:

Recovery-cycle scatter plot:

The 2000 bubble peak through eventual recovery:

4. Buy-and-Hold at Arbitrary Start Dates

Buy-and-hold positive-return rate by horizon:

Buy-and-hold median versus worst CAGR:

Shared-horizon comparison between SP500 and NDX100:
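
The positive-return rate behind charts like these can be sketched as follows: slide a fixed holding window across every possible start date and count how often the ending price exceeds the starting price. This is an illustrative sketch and not the repository's rolling-study code; the function name `rolling_positive_rate` and the toy price list are assumptions.

```python
def rolling_positive_rate(prices: list[float], window: int) -> float:
    """Share of start points whose price `window` steps later is higher."""
    if window <= 0 or len(prices) <= window:
        raise ValueError("need more prices than the holding window")
    outcomes = [prices[i + window] > prices[i] for i in range(len(prices) - window)]
    return sum(outcomes) / len(outcomes)


# Toy series, for illustration only (not real index data).
toy_prices = [100.0, 90.0, 95.0, 110.0, 105.0, 120.0]
print(rolling_positive_rate(toy_prices, window=2))  # prints 0.75
```

With daily data, a window of roughly 252 steps per year would approximate a one-year holding period; that trading-day count is a common convention, not something this repository specifies.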

5. Timing Sensitivity Inside the Calendar

SPY weekday sensitivity:

SPY monthly trading-day sensitivity:

6. Worst Start Dates

Worst historical start dates: lump sum versus DCA:

Worst-start median outcome summary:

7. DCA Frequency Comparison

Positive-return rate across weekly, monthly, quarterly, and yearly schedules:

Median CAGR across DCA frequencies:

Full-period comparison of DCA frequencies:

8. Inflation-Adjusted Real Returns

Nominal versus real positive-return rates:

Nominal versus real median CAGR:

Nominal versus real long-run index growth:

9. Wealth Paths Instead of Endpoints

SP500 monthly DCA wealth path:

NDX100 monthly DCA wealth path:

10. CAPE and Forward Returns

SP500 CAPE buckets versus future returns:

SP500 starting CAPE versus subsequent forward returns:

11. Lump Sum Versus DCA

Median outcome:

Positive-return rate:

Worst-case outcome:

12. Compounding Intuition

Real SP500 long-run growth from the actual historical series:

Why 8%, 9%, and 10% compound so differently:

Single-rate compounding intuition at 9%:
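
The gap between 8%, 9%, and 10% is easiest to see by compounding each rate over the same horizon. A minimal sketch, assuming a 30-year horizon and a starting value of 1 (both chosen here for illustration, not taken from the repository's charts):

```python
def future_value(rate: float, years: int, principal: float = 1.0) -> float:
    """Value of `principal` after compounding annually at `rate` for `years`."""
    return principal * (1.0 + rate) ** years


for rate in (0.08, 0.09, 0.10):
    print(f"{rate:.0%} for 30 years -> {future_value(rate, 30):.2f}x")
```

Under these assumptions the multiples come out to roughly 10.1x, 13.3x, and 17.4x, so a two-point difference in annual rate changes the ending value by more than 70%.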

Main Findings So Far

Long-horizon equity exposure is powerful, but the path matters

SP500 and NDX100 both delivered strong long-run outcomes
NDX100 historically compounded faster, but with much deeper drawdowns
SP500 generally offered a better holding experience

Drawdowns are survivability problems, not just temporary losses

NDX100 experienced a drawdown of -82.9% after the 2000 bubble peak
That episode took about 15.6 years from peak to full recovery
SP500 suffered less in the same broad era and recovered materially earlier
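
Two small calculations sit behind these drawdown figures: the deepest peak-to-trough decline in a price series, and the gain needed from the trough to get back to the prior peak. The sketch below is illustrative and is not the repository's implementation; the function names and the toy price list are assumptions.

```python
def max_drawdown(prices: list[float]) -> float:
    """Deepest peak-to-trough decline, returned as a negative fraction."""
    peak = prices[0]
    worst = 0.0
    for price in prices:
        peak = max(peak, price)                 # running high-water mark
        worst = min(worst, price / peak - 1.0)  # decline from that mark
    return worst


def gain_to_recover(drawdown: float) -> float:
    """Gain required from the trough to return to the prior peak."""
    return 1.0 / (1.0 + drawdown) - 1.0


# Toy series, for illustration only (not real index data).
toy_prices = [100.0, 120.0, 60.0, 90.0, 130.0]
print(max_drawdown(toy_prices))   # prints -0.5
print(gain_to_recover(-0.5))      # prints 1.0, i.e. a 100% gain
print(f"{gain_to_recover(-0.829):.0%}")
```

Applied to the -82.9% figure above, the required gain from the trough is about 485%, which is why deep drawdowns take disproportionately long to repair.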

DCA is usually not the return-maximizing choice, but it improves robustness

Lump sum often wins on median return because capital gets invested earlier
DCA often improves positive-return odds and worst-case outcomes
DCA is especially valuable when starting points are unlucky
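
The tradeoff can be illustrated with two toy price paths: one that rises steadily and one that crashes and then recovers. This is a simplified sketch, not the repository's study logic. It assumes equal contributions at every price point, no return on uninvested cash, and no fees; the function names and price paths are hypothetical.

```python
def lump_sum_value(prices: list[float], total: float) -> float:
    """Invest the full amount at the first price; value it at the last price."""
    return total / prices[0] * prices[-1]


def dca_value(prices: list[float], total: float) -> float:
    """Split the same total equally across every price; value at the last price."""
    per_period = total / len(prices)
    units = sum(per_period / price for price in prices)
    return units * prices[-1]


# Toy paths, for illustration only (not real index data).
rising = [100.0, 110.0, 120.0, 130.0]
crash_then_recover = [100.0, 60.0, 50.0, 100.0]

for name, path in (("rising", rising), ("crash_then_recover", crash_then_recover)):
    print(name, round(lump_sum_value(path, 1200.0), 2), round(dca_value(path, 1200.0), 2))
```

On the rising path the lump sum ends ahead because all capital is invested from the start. On the crash-and-recover path the lump sum only breaks even at 1200, while staged buying accumulates more units at lower prices and ends at 1700.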

Frequency matters less than consistency

Weekly, monthly, quarterly, and yearly DCA do not produce radically different long-run results
The investor’s ability to keep contributing matters more than optimizing the schedule

Inflation materially changes the story

Nominal returns overstate purchasing-power growth
SP500 looks meaningfully weaker once returns are deflated by CPI
NDX100 remains strong in real terms, but its path remains harsher
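
Deflating by CPI amounts to dividing the nominal growth multiple by the growth in the price level over the same period. A minimal sketch, assuming CPI is available as an index level at the start and end dates; the function names and the example numbers are illustrative and are not taken from the repository's results.

```python
def real_growth(nominal_growth: float, cpi_start: float, cpi_end: float) -> float:
    """Deflate a nominal growth multiple by the change in the CPI level."""
    return nominal_growth / (cpi_end / cpi_start)


def cagr(growth: float, years: float) -> float:
    """Annualized rate implied by a total growth multiple over `years`."""
    return growth ** (1.0 / years) - 1.0


# Hypothetical inputs: a 4x nominal gain over 20 years while CPI doubles.
nominal = 4.0
real = real_growth(nominal, cpi_start=100.0, cpi_end=200.0)
print(real)  # prints 2.0
print(f"nominal CAGR {cagr(nominal, 20):.2%}, real CAGR {cagr(real, 20):.2%}")
```

In this hypothetical case half of the nominal multiple is absorbed by inflation, so the real annualized rate is well below the nominal one.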

Starting valuation matters, especially over 10 to 20 years

Low starting CAPE regimes produced materially higher forward returns
High CAPE regimes still worked over very long windows, but usually at lower rates
This does not turn valuation into a precise timing tool, but it clearly affects forward return expectations

Repository Layout

dca_backtest/   engine, providers, CLI, rolling-study logic, plotting helpers
scripts/        standalone research scripts for one-off studies and charts
tests/          unit tests
data/           local-only datasets and generated reports (ignored by Git)
docs/assets/    public chart assets embedded in the README

Core CLI Commands

fetch

Fetch and cache ETF history through Yahoo Finance

backtest

Run a single DCA or DCA-plus-drawdown strategy

scan

Run a threshold and multiplier grid scan for drawdown-buy logic

study-dca

Run rolling-window DCA studies with terminal progress bars

plot-dca

Generate standard study plots from rolling-window DCA outputs

Core Research Scripts

Key scripts under scripts/:

clean_excel_indices.py
plot_sp500_vs_ndx100.py
plot_sp500_vs_ndx100_drawdowns.py
plot_drawdown_recovery_cycles.py
plot_drawdown_recovery_math.py
plot_buy_hold_win_rates.py
plot_worst_start_study.py
plot_dca_frequency_comparison.py
plot_real_return_study.py
plot_dca_wealth_paths.py
plot_cape_forward_return_study.py
plot_lump_sum_vs_dca_study.py
plot_spy_schedule_curves.py
plot_2000_recovery_compare.py

Reproducibility Notes

Price-index studies exclude dividends, so they are conservative relative to true total-return studies
ETF adj_close captures splits and distributions but is still not a full after-tax investor experience
Real-return studies use CPI and therefore begin only where CPI coverage is available
Some charts compare spot indexes and ETFs separately; those are related but not identical objects

Notes and Limitations

This is a research repository, not investment advice
Results depend on data quality and return convention
Long-run U.S. equity success should not be blindly generalized to every market
Median outcomes and worst-case outcomes should always be read together

License

MIT
