GitHub - ruizmr/info664

SaaS Revenue Multiple Predictor

This module trains a supervised model to predict a company's Revenue Multiple.

By default, it automatically parses and uses numeric columns from the CSV (monetary, percentages, counts, etc.). To prevent leakage, the loader excludes identifiers, links/text, the target, and any column whose name suggests it encodes valuation/multiples/price directly (e.g., Stock Price, Market Cap, Price/Sales, EV/Revenue, Shares Outstanding, etc.). Columns are parsed as dollars, percentages, or floats, with median imputation for sparse missing values.
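The loader's behavior can be pictured with the minimal sketch below. It is not the code in saas_revenue_multiple_model.py. The exclusion patterns, the identifier column names, the 50% missing-value threshold, and the helper names (is_excluded, parse_value, select_features) are all assumptions chosen to mirror the description above. Rows are expected to be dicts like those produced by csv.DictReader.

```python
import re
import statistics
from typing import Dict, List, Optional

# Hypothetical name patterns for columns that encode valuation/price directly.
# The real loader's exclusion list may differ; these mirror the examples above.
LEAKY_PATTERNS = [
    r"price", r"market\s*cap", r"multiple", r"valuation",
    r"\bev\b", r"enterprise\s*value", r"shares\s*outstanding",
]
# Assumed identifier/link/text column names (lower-cased).
ID_COLUMNS = {"company", "ticker", "url", "website", "description"}


def is_excluded(column: str, target: str) -> bool:
    """True if a column is the target, an identifier, or looks like it leaks valuation."""
    name = column.strip().lower()
    if name == target.strip().lower() or name in ID_COLUMNS:
        return True
    return any(re.search(pattern, name) for pattern in LEAKY_PATTERNS)


def parse_value(raw: str) -> Optional[float]:
    """Parse '$1,234.5', '12.5%', or '42' into a float; return None if unparseable.

    Assumption: percentages stay in percent units ('12.5%' -> 12.5).
    """
    text = raw.strip().replace("$", "").replace(",", "")
    if text.endswith("%"):
        text = text[:-1]
    try:
        return float(text)
    except ValueError:
        return None


def select_features(rows: List[Dict[str, str]], target: str,
                    max_missing: float = 0.5) -> Dict[str, List[float]]:
    """Parse every non-excluded column, keep the sparsely missing ones, impute the median."""
    features: Dict[str, List[float]] = {}
    for column in rows[0]:
        if is_excluded(column, target):
            continue
        values = [parse_value(row.get(column) or "") for row in rows]
        observed = [v for v in values if v is not None]
        # Drop columns that are non-numeric or too sparse to impute (hypothetical threshold).
        if not observed or 1 - len(observed) / len(values) > max_missing:
            continue
        median = statistics.median(observed)
        features[column] = [median if v is None else v for v in values]
    return features
```

Because the exclusion test in this sketch runs on column names only, a valuation-style column would be caught only if its name matches one of the patterns.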

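Data source: the included CSV file, "– A free database of all SaaS businesses listed on the U.S. stock exchanges NYSE and NASDAQ.csv". The filename contains spaces and starts with an en dash, so quote it in full in shell commands.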

Approach

Auto-parse numeric features while guarding against leakage (see above).
Hold-out split (default 80/20) with fixed seed for reproducibility.
Model: GradientBoostingRegressor; optional Optuna K-fold tuning.
Optional MC-style ensemble for robustness/uncertainty.
Metrics: MAE, RMSE, R², median AE, p90/p95 AE, MAPE; saved to JSON.
Visuals: parity (actual vs predicted), residuals scatter, residuals histogram, top feature importances.
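For reference, the hold-out split and the listed error metrics can be computed as in this stdlib-only sketch. The function names, the seed of 42, reporting MAPE in percent, and skipping zero targets in MAPE are assumptions. The script itself may use scikit-learn's splitting and metric functions instead, and the field names in its JSON report may differ.

```python
import math
import random
import statistics
from typing import Dict, List, Sequence, Tuple


def holdout_split(n_rows: int, test_size: float = 0.2, seed: int = 42) -> Tuple[List[int], List[int]]:
    """Shuffle row indices with a fixed seed and return (train_idx, test_idx)."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_rows * test_size))
    return indices[n_test:], indices[:n_test]


def percentile(values: Sequence[float], q: float) -> float:
    """Linearly interpolated percentile (the same rule as NumPy's default)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def regression_report(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """MAE, RMSE, R^2, median AE, p90/p95 AE, and MAPE for one evaluation set."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    abs_err = [abs(e) for e in errors]
    mean_y = statistics.fmean(y_true)
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return {
        "mae": statistics.fmean(abs_err),
        "rmse": math.sqrt(ss_res / len(errors)),
        "r2": 1 - ss_res / ss_tot,
        "median_ae": statistics.median(abs_err),
        "p90_ae": percentile(abs_err, 90),
        "p95_ae": percentile(abs_err, 95),
        # MAPE in percent; zero targets are skipped to avoid division by zero (assumption).
        "mape": 100 * statistics.fmean(abs(e) / abs(t) for t, e in zip(y_true, errors) if t != 0),
    }
```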

Install

pip install -r requirements.txt

Train

# Reproduce latest run
python saas_revenue_multiple_model.py train \
  --tune --trials 10 --cv-folds 5 \
  --mc-ensemble 5 --mc-bootstrap \
  --csv "– A free database of all SaaS businesses listed on the U.S. stock exchanges NYSE and NASDAQ.csv" \
  --model-out saas_rev_multiple_model.pkl \
  --plot saas_rev_multiple_fit.png \
  --plot-style parity --plot-label-tickers --plot-color-by none --plot-size-by none

Outputs:

saas_rev_multiple_model.pkl: serialized model
Plots:
- saas_rev_multiple_fit.png: parity (actual vs predicted)
- saas_rev_multiple_fit_residuals_scatter.png: residuals scatter
- saas_rev_multiple_fit_residuals_hist.png: residuals histogram
- saas_rev_multiple_fit_feature_importance.png: top feature importances
JSON trace: saas_model_report.json with metrics and run configuration

Predict

Predict by ticker (features auto-derived from CSV):

python saas_revenue_multiple_model.py predict \
  --model saas_rev_multiple_model.pkl \
  --ticker ADBE \
  --csv "– A free database of all SaaS businesses listed on the U.S. stock exchanges NYSE and NASDAQ.csv"

Note: In auto-feature mode, prediction requires both --ticker and --csv. Manually supplied raw feature values are not accepted, because the feature set depends on the schema parsed from the CSV.
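Predicting by ticker amounts to finding the company's row in the CSV and rebuilding the feature vector in the same column order used at training time, which is why --csv is required. A sketch, assuming a ticker column literally named "Ticker" and a model artifact that stores the training feature order and per-column medians (both hypothetical, as are the function names):

```python
import csv
from typing import Dict, Iterable, List


def find_ticker_row(lines: Iterable[str], ticker: str, ticker_column: str = "Ticker") -> Dict[str, str]:
    """Return the CSV row whose ticker matches (case-insensitive); lines may be an open file."""
    for row in csv.DictReader(lines):
        if (row.get(ticker_column) or "").strip().upper() == ticker.strip().upper():
            return row
    raise KeyError(f"ticker {ticker!r} not found in CSV")


def feature_vector(row: Dict[str, str], feature_columns: List[str],
                   medians: Dict[str, float]) -> List[float]:
    """Build features in training-time column order; unparseable values fall back to medians."""
    vector = []
    for column in feature_columns:
        text = (row.get(column) or "").strip().replace("$", "").replace(",", "").rstrip("%")
        try:
            vector.append(float(text))
        except ValueError:
            vector.append(medians[column])
    return vector
```

Typical use would be opening the CSV with newline="" and passing the file object to find_ticker_row.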

Hyperparameter Tuning (Optuna + K-fold CV)

Enable tuning with K-fold CV:

python saas_revenue_multiple_model.py train \
  --tune --trials 50 --cv-folds 5 \
  --csv "– A free database of all SaaS businesses listed on the U.S. stock exchanges NYSE and NASDAQ.csv" \
  --model-out saas_rev_multiple_model.pkl \
  --plot saas_rev_multiple_fit.png

Tuning searches over n_estimators, learning_rate, max_depth, and subsample for GradientBoostingRegressor, minimizing the mean MAE across the K folds.

Monte Carlo-style Ensemble for Uncertainty

You can train an ensemble of models via bagging to estimate uncertainty:

python saas_revenue_multiple_model.py train \
  --tune --trials 25 --cv-folds 5 \
  --mc-ensemble 15 --mc-bootstrap \
  --csv "– A free database of all SaaS businesses listed on the U.S. stock exchanges NYSE and NASDAQ.csv" \
  --model-out saas_rev_multiple_model.pkl

At prediction time, add --uncertainty to report the ensemble mean ± standard deviation:

python saas_revenue_multiple_model.py predict \
  --model saas_rev_multiple_model.pkl \
  --ticker ADBE \
  --csv "– A free database of all SaaS businesses listed on the U.S. stock exchanges NYSE and NASDAQ.csv" \
  --uncertainty
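Conceptually, --mc-ensemble N with --mc-bootstrap fits N models on resampled training rows and reads uncertainty off the spread of their predictions. The sketch below assumes that interpretation. fit_bootstrap_ensemble, predict_with_uncertainty, and the toy MeanRegressor are hypothetical names, and any model with scikit-learn-style fit/predict can be plugged in through make_model.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple


class MeanRegressor:
    """Toy stand-in model (predicts the training mean) so the sketch runs without scikit-learn."""

    def fit(self, X, y):
        self.mean_ = statistics.fmean(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def fit_bootstrap_ensemble(make_model: Callable[[int], object], X: Sequence, y: Sequence,
                           n_models: int = 15, bootstrap: bool = True, seed: int = 42) -> List:
    """Fit n_models members; with bootstrap=True each sees rows drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)] if bootstrap else list(range(n))
        model = make_model(rng.randrange(2**31))  # each member gets its own seed
        model.fit([X[i] for i in idx], [y[i] for i in idx])
        models.append(model)
    return models


def predict_with_uncertainty(models: List, x: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, std) of member predictions for a single feature vector x."""
    preds = [float(m.predict([x])[0]) for m in models]
    std = statistics.stdev(preds) if len(preds) > 1 else 0.0
    return statistics.fmean(preds), std
```

With scikit-learn installed, make_model could be lambda s: GradientBoostingRegressor(random_state=s). The spread across bootstrap members reflects model variance only; it is not a calibrated prediction interval.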

Notes

SEC Fundamentals ETL

Build a local SQLite DB of company fundamentals directly from SEC APIs.

Quick start

Install deps

pip install -r requirements.txt

Run ETL

python etl.py \
  --tickers ADBE,MSFT \
  --db saas_fundamentals.db \
  --user-agent "[email protected]"

Or via a file:

python etl.py \
  --tickers-file tickers.txt \
  --db saas_fundamentals.db \
  --user-agent "[email protected]"
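SEC's data.sec.gov endpoints expect every request to carry a User-Agent that identifies the caller, which is what --user-agent supplies. The sketch below shows one way to fetch the submissions and companyfacts JSON politely. The URL patterns are SEC's public endpoints. The PoliteClient class, the 0.15 s minimum spacing (roughly 6 to 7 requests per second, under SEC's limit of 10 per second), and the retry policy are assumptions, not etl.py's actual settings. The fetch and sleep hooks exist so the retry logic can be exercised offline.

```python
import json
import random
import time
import urllib.error
import urllib.request
from typing import Callable, Tuple

SUBMISSIONS_URL = "https://data.sec.gov/submissions/CIK{cik:010d}.json"
COMPANYFACTS_URL = "https://data.sec.gov/api/xbrl/companyfacts/CIK{cik:010d}.json"
RETRYABLE = {429, 500, 502, 503, 504}


def urllib_fetch(request: urllib.request.Request, timeout: float = 30.0) -> Tuple[int, bytes]:
    """Perform the GET and return (status, body) instead of raising on HTTP errors."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        return err.code, err.read()


class PoliteClient:
    """Fixed User-Agent, minimum spacing with jitter, exponential backoff on 429/5xx."""

    def __init__(self, user_agent: str, min_interval: float = 0.15, max_retries: int = 4,
                 fetch: Callable = urllib_fetch, sleep: Callable = time.sleep):
        self.user_agent = user_agent
        self.min_interval = min_interval
        self.max_retries = max_retries
        self._fetch = fetch
        self._sleep = sleep
        self._last = 0.0

    def _throttle(self) -> None:
        wait = self.min_interval - (time.monotonic() - self._last)
        if wait > 0:
            self._sleep(wait + random.uniform(0, 0.05))  # jitter avoids lockstep bursts
        self._last = time.monotonic()

    def get_json(self, url: str) -> dict:
        request = urllib.request.Request(url, headers={"User-Agent": self.user_agent})
        for attempt in range(self.max_retries + 1):
            self._throttle()
            status, body = self._fetch(request)
            if status == 200:
                return json.loads(body)
            if status not in RETRYABLE or attempt == self.max_retries:
                raise RuntimeError(f"GET {url} failed with HTTP {status}")
            # Exponential backoff with jitter: about 1 s, 2 s, 4 s, ...
            self._sleep(2 ** attempt + random.uniform(0, 1))
        raise RuntimeError("max_retries must be non-negative")
```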

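Creates or updates a SQLite database following schema.sql, with tables companies, fundamentals, and filings.
Pulls the companyfacts and submissions data for each company's CIK and computes last-quarter (LQ), annualized, last-twelve-months (LTM), and year-over-year (YoY) metrics.
Respects SEC rate limits (SEC's fair-access policy allows at most 10 requests per second) and requires a descriptive User-Agent that includes a contact email address.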
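The derived metrics can be illustrated on a series of quarterly values. This sketch assumes that annualized means LQ × 4, that LTM is the sum of the last four contiguous quarters, and that YoY compares LQ with the quarter ending roughly one year earlier. The day-count windows and the QuarterValue/derive_metrics names are hypothetical, and quarterly amounts are assumed to be already isolated (no year-to-date figures).

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class QuarterValue:
    end: date     # fiscal quarter end date
    value: float  # three-month amount, e.g. revenue in USD


def derive_metrics(quarters: List[QuarterValue]) -> Dict[str, Optional[float]]:
    """Compute LQ, annualized (LQ x 4), LTM, and YoY growth from quarterly values."""
    q = sorted(quarters, key=lambda item: item.end)
    if not q:
        return {"lq": None, "annualized": None, "ltm": None, "yoy": None}
    latest = q[-1]
    last_four = q[-4:]
    # LTM only when the last four quarters are consecutive (about 91 days apart).
    contiguous = len(last_four) == 4 and all(
        80 <= (b.end - a.end).days <= 100 for a, b in zip(last_four, last_four[1:]))
    # YoY compares against the quarter ending roughly one year before the latest one.
    prior = next((x for x in q if 350 <= (latest.end - x.end).days <= 380), None)
    return {
        "lq": latest.value,
        "annualized": latest.value * 4,
        "ltm": sum(x.value for x in last_four) if contiguous else None,
        "yoy": latest.value / prior.value - 1 if prior and prior.value else None,
    }
```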

Recent improvements

Smart tag resolver per metric with provenance and caching.
Strict unit hygiene: monetary facts only in USD; shares in shares.
Scale normalization using SEC-provided scale (value × 10^scale).
Robust report stamps derived from the exact revenue row used (with fallbacks).
Margin computation only when periods align exactly; guardrails drop |margin| > 100%.
Preference for quarterly forms (10-Q) for LQ; 10-K allowed for annual context/fallback.
Traceability: fundamentals_audit table logs source tag, unit, and scale per metric.
Polite rate limiting with jitter and retries/backoff for NASDAQ/SEC endpoints.
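Several of these guardrails are simple enough to show directly. The Fact record, its field names, and the unit strings are assumptions for illustration; scale follows the value × 10^scale rule above. In this sketch, a margin is computed only from two USD facts covering the identical period, and it is discarded when its magnitude exceeds 100%.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

EXPECTED_UNITS = {"monetary": "USD", "shares": "shares"}


@dataclass
class Fact:
    tag: str                # e.g. a us-gaap tag such as "Revenues"
    unit: str               # unit label attached to the reported value
    value: float
    start: Optional[date]   # None for instant facts such as share counts
    end: date
    scale: int = 0          # power-of-ten scale (hypothetical field name)


def normalize(fact: Fact, kind: str) -> Optional[float]:
    """Return value x 10**scale when the unit matches the metric kind, else None."""
    if fact.unit != EXPECTED_UNITS[kind]:
        return None
    return fact.value * 10 ** fact.scale


def margin(numerator: Fact, revenue: Fact, limit: float = 1.0) -> Optional[float]:
    """Ratio only for identical periods; drop results with |margin| > limit (100%)."""
    if (numerator.start, numerator.end) != (revenue.start, revenue.end):
        return None
    num = normalize(numerator, "monetary")
    rev = normalize(revenue, "monetary")
    if num is None or rev is None or rev == 0:
        return None
    ratio = num / rev
    return ratio if abs(ratio) <= limit else None
```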

Validate with DuckDB

python duck_analytics.py --db saas_fundamentals.db --summary
python duck_analytics.py --db saas_fundamentals.db --export fundamentals.parquet
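When DuckDB is not at hand, a quick stdlib check that the ETL actually populated the database can look like this. It is a hypothetical helper, not part of duck_analytics.py. EXPECTED_TABLES lists the tables that schema.sql is described as creating.

```python
import sqlite3
from contextlib import closing
from typing import Dict

EXPECTED_TABLES = ("companies", "fundamentals", "filings")


def table_row_counts(con: sqlite3.Connection) -> Dict[str, int]:
    """Row count for every user table in the connected SQLite database."""
    names = [row[0] for row in con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    # Names come from sqlite_master itself, so quoting them as identifiers is safe here.
    return {name: con.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0] for name in names}


def check_db(path: str) -> Dict[str, int]:
    """Open the DB file, fail if an expected table is missing, return the row counts."""
    with closing(sqlite3.connect(path)) as con:
        counts = table_row_counts(con)
    missing = [t for t in EXPECTED_TABLES if t not in counts]
    if missing:
        raise RuntimeError(f"missing tables: {missing}")
    return counts
```

For example, check_db("saas_fundamentals.db") should report non-zero counts after a successful sample rebuild.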

Sample rebuild

python etl.py --tickers AAPL,MSFT,AMZN,ADBE,NVDA --db saas_fundamentals.db --user-agent "[email protected]"

The model script (saas_revenue_multiple_model.py) parses '$' signs and thousands separators (commas) in cash/debt columns and '%' signs in margin columns.
Rows with missing inputs or a missing target are dropped before the train/test split.
To change the model type or its hyperparameters, edit saas_revenue_multiple_model.py.

