ruizmr/info664

SaaS Revenue Multiple Predictor

This module trains a supervised model to predict a company's Revenue Multiple.

By default, the loader parses every numeric column in the CSV (monetary values, percentages, counts) and uses them as features. To prevent leakage, it excludes identifiers, link and free-text columns, the target itself, and any column whose name suggests it directly encodes valuation, multiples, or price (e.g., Stock Price, Market Cap, Price/Sales, EV/Revenue, Shares Outstanding). Values are parsed as dollars, percentages, or plain floats, and sparse missing values are filled by median imputation.
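The parsing and leakage guard described above can be illustrated with a small standard-library sketch. This is not the module's actual code: the function names, the `LEAKY_PATTERNS` list, and the choice to convert percentages to fractions are all assumptions made for illustration.

```python
from statistics import median

# Hypothetical name fragments suggesting a column encodes valuation directly.
LEAKY_PATTERNS = ("price", "market cap", "ev/revenue", "multiple", "shares outstanding")

def is_leaky(column_name: str, target: str) -> bool:
    """True if the column is the target or its name matches a leaky pattern."""
    name = column_name.strip().lower()
    return name == target.strip().lower() or any(p in name for p in LEAKY_PATTERNS)

def parse_number(raw: str):
    """Parse '$1,234.5', '45%', or '3.2' into a float; return None if not numeric."""
    text = raw.strip().replace(",", "").replace("$", "")
    is_percent = text.endswith("%")
    text = text.rstrip("%")
    try:
        value = float(text)
    except ValueError:
        return None
    # Assumption: percentages are stored as fractions (45% -> 0.45).
    return value / 100.0 if is_percent else value

def impute_median(values):
    """Replace None entries with the median of the known values."""
    known = [v for v in values if v is not None]
    fill = median(known)
    return [fill if v is None else v for v in values]
```

For example, `parse_number("$1,234.50")` yields `1234.5`, and `is_leaky("Market Cap", "Revenue Multiple")` is true, so that column would be dropped before training.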

Data source: the included CSV, "– A free database of all SaaS businesses listed on the U.S. stock exchanges NYSE and NASDAQ.csv". The filename begins with an en dash and contains spaces, so quote it on the command line.

Approach

  • Auto-parse numeric features while guarding against leakage (see above).
  • Hold-out split (default 80/20) with fixed seed for reproducibility.
  • Model: GradientBoostingRegressor; optional Optuna K-fold tuning.
  • Optional MC-style ensemble for robustness/uncertainty.
  • Metrics: MAE, RMSE, R², median AE, p90/p95 AE, MAPE; saved to JSON.
  • Visuals: parity (actual vs predicted), residuals scatter, residuals histogram, top feature importances.
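The metrics listed above can be computed along the following lines. This is a minimal sketch using only the standard library; the function names and the dictionary keys are hypothetical, and the script's actual report format in saas_model_report.json may differ. MAPE is returned as a fraction and assumes no actual value is zero.

```python
import math
from statistics import median

def percentile(sorted_vals, q):
    """Linearly interpolated percentile of an ascending list, q in [0, 100]."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def regression_metrics(y_true, y_pred):
    """Compute hold-out error metrics for actual vs. predicted multiples."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    abs_err = sorted(abs(e) for e in errors)
    n = len(errors)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "mae": sum(abs_err) / n,
        "rmse": math.sqrt(ss_res / n),
        "r2": 1.0 - ss_res / ss_tot,
        "median_ae": median(abs_err),
        "p90_ae": percentile(abs_err, 90),
        "p95_ae": percentile(abs_err, 95),
        "mape": sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n,
    }
```

The resulting dictionary contains only floats, so it can be written out with `json.dump` as is.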

Install

pip install -r requirements.txt

Train

# Reproduce latest run
python saas_revenue_multiple_model.py train \
  --tune --trials 10 --cv-folds 5 \
  --mc-ensemble 5 --mc-bootstrap \
  --csv "– A free database of all SaaS businesses listed on the U.S. stock exchanges NYSE and NASDAQ.csv" \
  --model-out saas_rev_multiple_model.pkl \
  --plot saas_rev_multiple_fit.png \
  --plot-style parity --plot-label-tickers --plot-color-by none --plot-size-by none

Outputs:

  • saas_rev_multiple_model.pkl — serialized model
  • Plots:
    • saas_rev_multiple_fit.png — parity (actual vs predicted)
    • saas_rev_multiple_fit_residuals_scatter.png
    • saas_rev_multiple_fit_residuals_hist.png
    • saas_rev_multiple_fit_feature_importance.png
  • JSON trace: saas_model_report.json with metrics/config

Predict

Predict by ticker (features auto-derived from CSV):

python saas_revenue_multiple_model.py predict \
  --model saas_rev_multiple_model.pkl \
  --ticker ADBE \
  --csv "– A free database of all SaaS businesses listed on the U.S. stock exchanges NYSE and NASDAQ.csv"

Note: In auto-feature mode, prediction requires both --ticker and --csv. Passing raw feature values manually is disabled because the features depend on the parsed CSV schema.

Hyperparameter Tuning (Optuna + K-fold CV)

Enable tuning with K-fold CV:

python saas_revenue_multiple_model.py train \
  --tune --trials 50 --cv-folds 5 \
  --csv "– A free database of all SaaS businesses listed on the U.S. stock exchanges NYSE and NASDAQ.csv" \
  --model-out saas_rev_multiple_model.pkl \
  --plot saas_rev_multiple_fit.png

This searches over n_estimators, learning_rate, max_depth, and subsample for GradientBoostingRegressor, scoring each trial by its mean MAE across the K folds.
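To make the scoring rule concrete, here is a sketch of "mean MAE across folds" in plain Python. It is not the script's tuning code: in the real setup each Optuna trial would presumably fit a GradientBoostingRegressor with sampled hyperparameters, whereas here `fit_mean` is a hypothetical stand-in model and all function names are assumptions.

```python
import random

def kfold_indices(n_samples, n_folds, seed=0):
    """Yield (train_idx, valid_idx) pairs; each sample is validated exactly once."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)  # fixed seed keeps the folds reproducible
    folds = [order[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, folds[k]

def mean_cv_mae(fit, X, y, n_folds=5, seed=0):
    """Average validation MAE across folds; fit(X, y) must return a predict function."""
    fold_maes = []
    for train, valid in kfold_indices(len(X), n_folds, seed):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        errs = [abs(predict(X[i]) - y[i]) for i in valid]
        fold_maes.append(sum(errs) / len(errs))
    return sum(fold_maes) / len(fold_maes)

def fit_mean(X_train, y_train):
    """Stand-in model: always predicts the mean of the training targets."""
    mean = sum(y_train) / len(y_train)
    return lambda x: mean
```

A tuner would call `mean_cv_mae` once per candidate configuration and keep the configuration with the lowest score.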

Monte Carlo-style Ensemble for Uncertainty

You can train an ensemble of models via bagging to estimate uncertainty:

python saas_revenue_multiple_model.py train \
  --tune --trials 25 --cv-folds 5 \
  --mc-ensemble 15 --mc-bootstrap \
  --csv "– A free database of all SaaS businesses listed on the U.S. stock exchanges NYSE and NASDAQ.csv" \
  --model-out saas_rev_multiple_model.pkl

At prediction time, request mean +/- std from the ensemble:

python saas_revenue_multiple_model.py predict \
  --model saas_rev_multiple_model.pkl \
  --ticker ADBE \
  --csv "– A free database of all SaaS businesses listed on the U.S. stock exchanges NYSE and NASDAQ.csv" \
  --uncertainty
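The idea behind the bagged ensemble can be sketched as follows: train several models on bootstrap resamples of the training rows, then report the mean and standard deviation of their predictions. This is an illustration only. The one-feature least-squares line stands in for the real regressor, and the function names and the use of the population standard deviation are assumptions.

```python
import random
from statistics import mean, pstdev

def fit_line(xs, ys):
    """Least-squares line y = a + b*x; a stand-in for the real regressor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x

def train_bootstrap_ensemble(xs, ys, n_models=15, seed=0):
    """Fit n_models copies of the model, each on rows sampled with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def predict_with_uncertainty(models, x):
    """Return (mean, std) of the ensemble members' predictions at x."""
    preds = [m(x) for m in models]
    return mean(preds), pstdev(preds)
```

The spread reflects only how much the fitted models disagree across resamples, so it should be read as a rough stability indicator rather than a calibrated interval.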


SEC Fundamentals ETL

Build a local SQLite DB of company fundamentals directly from SEC APIs.

Quick start

  1. Install deps
pip install -r requirements.txt
  2. Run ETL
python etl.py \
  --tickers ADBE,MSFT \
  --db saas_fundamentals.db \
  --user-agent "[email protected]"

Or via a file:

python etl.py \
  --tickers-file tickers.txt \
  --db saas_fundamentals.db \
  --user-agent "[email protected]"

In both cases, the ETL:

  • Creates/updates the SQLite DB per schema.sql with tables: companies, fundamentals, filings.
  • Pulls companyfacts and submissions per CIK; computes LQ, annualized, LTM, and YoY metrics.
  • Respects SEC rate limits and requires a polite User-Agent.
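For readers unfamiliar with the abbreviations, the derived metrics could be computed roughly as below. The definitions are assumptions, not taken from etl.py: LQ is the latest quarter, annualized is LQ times four, LTM is the sum of the last four quarters, and YoY compares LQ with the same quarter one year earlier. The function name is hypothetical.

```python
def revenue_metrics(quarterly):
    """Derive LQ, annualized, LTM, and YoY from consecutive quarterly revenue.

    `quarterly` is ordered oldest to newest and must hold at least 5 quarters.
    """
    if len(quarterly) < 5:
        raise ValueError("need at least 5 consecutive quarters")
    lq = quarterly[-1]
    year_ago = quarterly[-5]  # same fiscal quarter, one year earlier
    return {
        "lq": lq,
        "annualized": lq * 4,
        "ltm": sum(quarterly[-4:]),
        "yoy": (lq / year_ago - 1.0) if year_ago else None,
    }
```

With quarterly revenue of 100, 110, 120, 130, 125, this gives an LTM of 485 and a YoY growth of 25%.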

Recent improvements

  • Smart tag resolver per metric with provenance and caching.
  • Strict unit hygiene: monetary facts only in USD; shares in shares.
  • Scale normalization using SEC-provided scale (value × 10^scale).
  • Robust report stamps derived from the exact revenue row used (with fallbacks).
  • Margin computation only when periods align exactly; guardrails drop |margin| > 100%.
  • Preference for quarterly forms (10-Q) for LQ; 10-K allowed for annual context/fallback.
  • Traceability: fundamentals_audit table logs source tag, unit, and scale per metric.
  • Polite rate limiting with jitter and retries/backoff for NASDAQ/SEC endpoints.
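Two of the guardrails above, scale normalization and the margin check, are simple enough to sketch. The helper names and the representation of a period as a (start, end) tuple are assumptions for illustration; the actual ETL may structure this differently.

```python
def normalize_scale(value, scale):
    """Apply a power-of-ten scale factor: value * 10**scale."""
    return value * 10 ** scale

def safe_margin(numerator, revenue, num_period, rev_period):
    """Return numerator / revenue, or None when the result cannot be trusted.

    None is returned if the two facts cover different periods, if revenue is
    zero or missing, or if the margin's magnitude exceeds 100%.
    """
    if num_period != rev_period or not revenue:
        return None
    margin = numerator / revenue
    return margin if abs(margin) <= 1.0 else None
```

For instance, a fact reported as 12.5 with scale 6 normalizes to 12,500,000, and a gross profit of 30 on revenue of 100 for the same period yields a margin of 0.3.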

Validate with DuckDB

python duck_analytics.py --db saas_fundamentals.db --summary
python duck_analytics.py --db saas_fundamentals.db --export fundamentals.parquet

Sample rebuild

python etl.py --tickers AAPL,MSFT,AMZN,ADBE,NVDA --db saas_fundamentals.db --user-agent "[email protected]"

Notes on saas_revenue_multiple_model.py

  • The script parses '$' and commas in cash/debt columns and '%' in margin columns.
  • Rows with missing inputs or a missing target are dropped before splitting.
  • To change the model or its hyperparameters, edit saas_revenue_multiple_model.py.
