This module trains a supervised model to predict a company's Revenue Multiple.
By default, it parses and uses the numeric columns in the CSV (monetary amounts, percentages, counts, and so on). To prevent leakage, the loader excludes identifiers, links and free text, the target itself, and any column whose name suggests it directly encodes valuation, multiples, or price (e.g., Stock Price, Market Cap, Price/Sales, EV/Revenue, Shares Outstanding). Each column is parsed as dollars, a percentage, or a plain float, and sparse missing values are filled by median imputation.
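A minimal sketch of the parsing, leakage guard, and median imputation described above, using only the standard library. The regex patterns in `LEAKY_PATTERNS`, the set of missing-value tokens, and storing percentages as fractions are assumptions for illustration, not the loader's confirmed rules.

```python
import re
from statistics import median
from typing import List, Optional

# Hypothetical leakage patterns; the real loader's exclusion list may differ.
LEAKY_PATTERNS = [
    r"stock\s*price", r"market\s*cap", r"price\s*/\s*sales",
    r"ev\s*/\s*revenue", r"shares\s*outstanding", r"multiple", r"valuation",
]
MISSING_TOKENS = {"", "N/A", "NA", "-", "--"}  # assumed placeholders for missing cells


def is_leaky(column: str) -> bool:
    """True if the column name suggests it encodes valuation or price directly."""
    name = column.lower()
    return any(re.search(pattern, name) for pattern in LEAKY_PATTERNS)


def parse_value(raw: str) -> Optional[float]:
    """Parse '$1,234', '45%', or '3.2' into a float; missing or unparseable cells become None."""
    text = raw.strip().replace(",", "").replace("$", "")
    if text.upper() in MISSING_TOKENS:
        return None
    try:
        if text.endswith("%"):
            return float(text[:-1]) / 100.0  # assumption: percentages stored as fractions
        return float(text)
    except ValueError:
        return None


def impute_median(values: List[Optional[float]]) -> List[float]:
    """Replace None entries with the median of the observed values (needs at least one)."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]
```

A name-based filter like this only catches columns whose names reveal them, which is why it is paired with an explicit exclusion of the target.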
Data source: the included CSV file, "– A free database of all SaaS businesses listed on the U.S. stock exchanges NYSE and NASDAQ.csv".
- Auto-parse numeric features while guarding against leakage (see above).
- Hold-out split (default 80/20) with fixed seed for reproducibility.
- Model: GradientBoostingRegressor; optional Optuna K-fold tuning.
- Optional Monte Carlo-style (bootstrap) ensemble for robustness and uncertainty estimates.
- Metrics: MAE, RMSE, R², median AE, p90/p95 AE, MAPE; saved to JSON.
- Visuals: parity (actual vs predicted), residuals scatter, residuals histogram, top feature importances.
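The MC-style ensemble can be pictured as bootstrap aggregation: fit several models on resampled training rows, then report the mean and standard deviation of their predictions. This sketch uses a deliberately trivial stand-in learner (`fit_line`, hypothetical) so it runs with the standard library; in the script each member would presumably be a GradientBoostingRegressor, and the resampling details here are assumptions.

```python
import random
from statistics import mean, pstdev
from typing import Callable, List, Sequence, Tuple

Model = Callable[[float], float]


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Hypothetical stand-in learner: ordinary least squares on a single feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    # If every resampled x is identical, fall back to a flat line at the mean.
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def bootstrap_ensemble(xs: Sequence[float], ys: Sequence[float],
                       n_models: int = 5, seed: int = 42) -> List[Model]:
    """Fit n_models learners, each on a bootstrap resample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        rows = [rng.randrange(n) for _ in range(n)]
        models.append(fit_line([xs[i] for i in rows], [ys[i] for i in rows]))
    return models


def predict_with_uncertainty(models: List[Model], x: float) -> Tuple[float, float]:
    """Return the mean and population standard deviation of the members' predictions."""
    preds = [m(x) for m in models]
    return mean(preds), pstdev(preds)
```

Usage, loosely analogous to `--mc-ensemble 15 --mc-bootstrap`: `mean, std = predict_with_uncertainty(bootstrap_ensemble(xs, ys, n_models=15), x_new)`. With a single member the standard deviation is always zero, so the spread only becomes informative with several members.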
Install dependencies:
pip install -r requirements.txt
Reproduce the latest run:
python saas_revenue_multiple_model.py train \
--tune --trials 10 --cv-folds 5 \
--mc-ensemble 5 --mc-bootstrap \
--csv "– A free database of all SaaS businesses listed on the U.S. stock exchanges NYSE and NASDAQ.csv" \
--model-out saas_rev_multiple_model.pkl \
--plot saas_rev_multiple_fit.png \
--plot-style parity --plot-label-tickers --plot-color-by none --plot-size-by none
Outputs:
- saas_rev_multiple_model.pkl: serialized model
- Plots:
  - saas_rev_multiple_fit.png: parity (actual vs predicted)
  - saas_rev_multiple_fit_residuals_scatter.png: residuals scatter
  - saas_rev_multiple_fit_residuals_hist.png: residuals histogram
  - saas_rev_multiple_fit_feature_importance.png: top feature importances
- JSON trace: saas_model_report.json with metrics and config
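For reference, the metrics listed above can be computed and written to a JSON report roughly as follows. The percentile convention (linear interpolation, as in NumPy's default), expressing MAPE in percent, skipping zero targets in MAPE, and the `write_report` helper with its key names are all assumptions, not the script's confirmed behavior.

```python
import json
import math
from statistics import median
from typing import Dict, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Linear-interpolated percentile (the same convention as NumPy's default)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """MAE, RMSE, R^2, median AE, p90/p95 AE, and MAPE for one hold-out set."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    abs_err = [abs(e) for e in errors]
    n = len(errors)
    mean_y = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    nonzero = [(a, t) for a, t in zip(abs_err, y_true) if t != 0]  # MAPE skips zero targets
    return {
        "mae": sum(abs_err) / n,
        "rmse": math.sqrt(ss_res / n),
        "r2": 1.0 - ss_res / ss_tot,
        "median_ae": median(abs_err),
        "p90_ae": percentile(abs_err, 90),
        "p95_ae": percentile(abs_err, 95),
        "mape": 100.0 * sum(a / abs(t) for a, t in nonzero) / len(nonzero),
    }


def write_report(path: str, metrics: Dict[str, float], config: Dict) -> None:
    """Hypothetical helper: dump metrics plus the run config to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"metrics": metrics, "config": config}, fh, indent=2)
```

Storing the config next to the metrics means a report can be traced back to the exact flags that produced it.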
Predict by ticker (features auto-derived from CSV):
python saas_revenue_multiple_model.py predict \
--model saas_rev_multiple_model.pkl \
--ticker ADBE \
--csv "– A free database of all SaaS businesses listed on the U.S. stock exchanges NYSE and NASDAQ.csv"
Note: In auto-feature mode, prediction requires both --ticker and --csv; manually supplied feature values are disabled because the feature set depends on the parsed CSV schema.
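Conceptually, ticker-based prediction looks up the company's row in the CSV and rebuilds the feature vector in the column order used at training time. A minimal sketch, assuming the ticker lives in a column named `Ticker` and that the saved model carries its training feature names and medians; both are assumptions, and `features_for_ticker` is a hypothetical helper.

```python
import csv
import io
from typing import Dict, List, Optional


def parse_cell(raw: str) -> Optional[float]:
    """Strip '$' and ',' then parse; '%' values become fractions; blanks become None."""
    text = raw.strip().replace("$", "").replace(",", "")
    if not text:
        return None
    try:
        if text.endswith("%"):
            return float(text[:-1]) / 100.0
        return float(text)
    except ValueError:
        return None


def features_for_ticker(csv_text: str, ticker: str, feature_names: List[str],
                        medians: Dict[str, float], ticker_col: str = "Ticker") -> List[float]:
    """Return the ticker's features in training order, filling gaps with training medians."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if (row.get(ticker_col) or "").strip().upper() == ticker.upper():
            values = []
            for name in feature_names:
                value = parse_cell(row.get(name) or "")
                values.append(medians[name] if value is None else value)
            return values
    raise KeyError(f"ticker {ticker!r} not found in column {ticker_col!r}")
```

Keeping the training-time feature order is the key design point: a model fitted on one column order silently produces wrong predictions if fed another.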
Enable tuning with K-fold CV:
python saas_revenue_multiple_model.py train \
--tune --trials 50 --cv-folds 5 \
--csv "– A free database of all SaaS businesses listed on the U.S. stock exchanges NYSE and NASDAQ.csv" \
--model-out saas_rev_multiple_model.pkl \
--plot saas_rev_multiple_fit.png
This searches over n_estimators, learning_rate, max_depth, and subsample for GradientBoostingRegressor, minimizing the mean MAE across folds.
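The tuning objective is the mean MAE over K folds. The sketch below shows that objective with the standard library only, driven by plain random search as a stand-in for Optuna's sampler. The ranges in `SEARCH_SPACE` and the pluggable `fit_predict` callable are assumptions; in the script that callable would train a GradientBoostingRegressor with the sampled parameters.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical ranges; the script's actual Optuna search space may differ.
SEARCH_SPACE: Dict[str, Tuple[float, float]] = {
    "n_estimators": (100, 1000),   # integer range
    "learning_rate": (0.01, 0.3),  # float range
    "max_depth": (2, 6),           # integer range
    "subsample": (0.5, 1.0),       # float range
}

# fit_predict(params, X_train, y_train, X_val) -> predictions for X_val
FitPredict = Callable[[dict, list, list, list], List[float]]


def kfold_indices(n: int, k: int, seed: int = 42) -> List[Tuple[List[int], List[int]]]:
    """Shuffle row indices once, deal them into k folds, and pair each fold with the rest."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [([j for f in folds[:i] + folds[i + 1:] for j in f], folds[i]) for i in range(k)]


def cv_mean_mae(params: dict, X: Sequence, y: Sequence[float], fit_predict: FitPredict,
                k: int = 5, seed: int = 42) -> float:
    """Objective: mean absolute error averaged across the k validation folds."""
    fold_maes = []
    for train, val in kfold_indices(len(y), k, seed):
        preds = fit_predict(params, [X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in val])
        fold_maes.append(sum(abs(p - y[i]) for p, i in zip(preds, val)) / len(val))
    return sum(fold_maes) / len(fold_maes)


def random_search(X: Sequence, y: Sequence[float], fit_predict: FitPredict,
                  trials: int = 10, k: int = 5, seed: int = 42) -> Tuple[dict, float]:
    """Sample `trials` parameter sets and keep the one with the lowest CV mean MAE."""
    rng = random.Random(seed)
    best: Optional[Tuple[dict, float]] = None
    for _ in range(trials):
        params = {name: rng.randint(lo, hi) if isinstance(lo, int) else rng.uniform(lo, hi)
                  for name, (lo, hi) in SEARCH_SPACE.items()}
        score = cv_mean_mae(params, X, y, fit_predict, k, seed)
        if best is None or score < best[1]:
            best = (params, score)
    return best
```

Reusing the same fold seed in every trial keeps all parameter sets on identical splits, so score differences reflect the parameters rather than the partitioning.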
You can train an ensemble of models via bagging to estimate uncertainty:
python saas_revenue_multiple_model.py train \
--tune --trials 25 --cv-folds 5 \
--mc-ensemble 15 --mc-bootstrap \
--csv "– A free database of all SaaS businesses listed on the U.S. stock exchanges NYSE and NASDAQ.csv" \
--model-out saas_rev_multiple_model.pkl
At prediction time, request mean +/- std from the ensemble:
python saas_revenue_multiple_model.py predict \
--model saas_rev_multiple_model.pkl \
--ticker ADBE \
--csv "– A free database of all SaaS businesses listed on the U.S. stock exchanges NYSE and NASDAQ.csv" \
--uncertainty
Build a local SQLite DB of company fundamentals directly from SEC APIs.
- Install dependencies:
pip install -r requirements.txt
- Run the ETL:
python etl.py \
--tickers ADBE,MSFT \
--db saas_fundamentals.db \
--user-agent "[email protected]"
Or via a file:
python etl.py \
--tickers-file tickers.txt \
--db saas_fundamentals.db \
--user-agent "[email protected]"
- Creates or updates the SQLite DB per schema.sql, with tables companies, fundamentals, and filings.
- Pulls companyfacts and submissions per CIK; computes LQ (latest quarter), annualized, LTM (last twelve months), and YoY (year-over-year) metrics.
- Respects SEC rate limits and requires a polite User-Agent.
- Smart tag resolver per metric, with provenance and caching.
- Strict unit hygiene: monetary facts only in USD; share counts only in shares.
- Scale normalization using the SEC-provided scale (value × 10^scale).
- Robust report stamps derived from the exact revenue row used (with fallbacks).
- Margins computed only when periods align exactly; guardrails drop |margin| > 100%.
- Preference for quarterly forms (10-Q) for LQ; 10-K allowed for annual context or as a fallback.
- Traceability: the fundamentals_audit table logs source tag, unit, and scale per metric.
- Polite rate limiting with jitter and retries/backoff for NASDAQ/SEC endpoints.
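Several of the ETL rules above (unit hygiene, scale normalization, the margin guardrail, and LTM/YoY arithmetic) reduce to small functions. This is a hedged sketch under assumed inputs: the fact dictionaries with `val`, `unit`, `scale`, `start`, and `end` keys are a hypothetical shape, not the exact structure `etl.py` uses or the raw SEC companyfacts JSON.

```python
from typing import Dict, List, Optional

# Allowed unit per metric kind, mirroring the "USD only / shares only" rule.
ALLOWED_UNITS = {"monetary": "USD", "shares": "shares"}


def normalize(fact: Dict, kind: str) -> Optional[float]:
    """Return val x 10**scale, or None if the unit does not match the metric kind."""
    if fact.get("unit") != ALLOWED_UNITS[kind]:
        return None
    return fact["val"] * 10 ** fact.get("scale", 0)


def margin(numerator: Dict, denominator: Dict) -> Optional[float]:
    """Ratio of two monetary facts, computed only when both cover exactly the same period."""
    if (numerator["start"], numerator["end"]) != (denominator["start"], denominator["end"]):
        return None
    num = normalize(numerator, "monetary")
    den = normalize(denominator, "monetary")
    if num is None or not den:
        return None
    value = num / den
    return value if abs(value) <= 1.0 else None  # guardrail: drop |margin| > 100%


def ltm(quarterly_values: List[float]) -> Optional[float]:
    """Last-twelve-months total from the four most recent quarters (list is oldest first)."""
    return sum(quarterly_values[-4:]) if len(quarterly_values) >= 4 else None


def yoy(current: float, prior: float) -> Optional[float]:
    """Year-over-year growth as a fraction; undefined when the prior value is zero."""
    return (current - prior) / prior if prior else None
```

Returning None instead of raising lets a pipeline skip an implausible or mismatched value while still recording why in an audit table.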
Summarize or export the database:
python duck_analytics.py --db saas_fundamentals.db --summary
python duck_analytics.py --db saas_fundamentals.db --export fundamentals.parquet
Example ETL run over several tickers:
python etl.py --tickers AAPL,MSFT,AMZN,ADBE,NVDA --db saas_fundamentals.db --user-agent "[email protected]"
Notes:
- The script parses '$' signs and thousands-separator commas in cash/debt columns and '%' signs in margin columns.
- Rows with missing inputs/target are dropped before splitting.
- You can adjust the model choice or hyperparameters in saas_revenue_multiple_model.py.
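The row-dropping and hold-out steps can be sketched as follows: rows missing the target or any required input are removed first, then a seeded shuffle gives a reproducible 80/20 split. `drop_incomplete`, `holdout_split`, and the default seed are hypothetical; the script may instead rely on a library splitter such as scikit-learn's `train_test_split`.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Optional[float]]


def drop_incomplete(rows: Sequence[Row], target: str, required: Sequence[str]) -> List[Row]:
    """Keep only rows where the target and every required input are present."""
    needed = [target, *required]
    return [r for r in rows if all(r.get(c) is not None for c in needed)]


def holdout_split(rows: Sequence[Row], test_frac: float = 0.2,
                  seed: int = 42) -> Tuple[List[Row], List[Row]]:
    """Shuffle with a fixed seed, then cut off test_frac of the rows as the hold-out set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_frac)) if shuffled else 0
    return shuffled[n_test:], shuffled[:n_test]
```

Dropping rows before the split keeps the train and hold-out sets drawn from the same cleaned population, and the fixed seed makes repeated runs produce the same partition.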