43 changes: 35 additions & 8 deletions docs/index.md
# StochTree

`stochtree` (short for "stochastic trees") unlocks flexible decision tree modeling in R or Python.

## Table of Contents

* [Getting Started](getting-started.md): Details on how to install and use `stochtree`
* [About](about.md): Overview of the models supported by stochtree and pointers to further reading
* [R Package](R_docs/index.md): Complete documentation of the R package
* [Python Package](python_docs/index.md): Complete documentation of the Python package
* [C++ Core API and Architecture](cpp_docs/index.md): Overview and documentation of the C++ codebase that supports stochtree
* [Development](development/index.md): Roadmap and how to contribute

## What does the software do?

Boosted decision tree models (like [xgboost](https://xgboost.readthedocs.io/en/stable/),
[LightGBM](https://lightgbm.readthedocs.io/en/latest/), or
[scikit-learn's HistGradientBoostingRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.HistGradientBoostingRegressor.html))
are great, but often require time-consuming hyperparameter tuning.
The "core" of the software is written in C++, but it provides R and Python APIs.
The R package is [available on CRAN](https://cran.r-project.org/web/packages/stochtree/index.html) and the Python package will soon be on PyPI.

## Why "stochastic" trees?
"Stochastic" loosely means the same thing as "random." This naturally raises the question: how is `stochtree` different from a random forest library?
At a superficial level, both are decision tree ensembles that use randomness in training.

The difference lies in how that "randomness" is deployed.
Random forests inject randomness into the data: each tree is fit to a random bootstrap subset of the training dataset by a deterministic fitting algorithm ([recursive partitioning](https://en.wikipedia.org/wiki/Recursive_partitioning)).
Stochastic tree algorithms invert this: the training dataset stays fixed, and the randomness enters the tree-fitting procedure itself, which samples the ensemble's splits and leaf values.
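This contrast can be illustrated with a toy sketch in plain Python (not `stochtree` itself). Both styles below build an ensemble of one-split "stumps" on a 1-D dataset: the random-forest style randomizes the *data* (bootstrap resampling) and fits each stump deterministically, while the stochastic style keeps the data fixed and samples splits with a Metropolis-style accept/reject rule. The stump model, loss, and sampler are illustrative stand-ins, not the algorithms `stochtree` actually implements.

```python
import math
import random

def sse(ys):
    # Sum of squared errors around the mean of ys.
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def stump_loss(data, split):
    # Squared-error loss of a single split ("stump") at threshold `split`.
    left = [y for x, y in data if x <= split]
    right = [y for x, y in data if x > split]
    return sse(left) + sse(right)

def fit_stump_greedy(data):
    # Deterministic recursive-partitioning step: scan every candidate
    # threshold and keep the single best one.
    xs = sorted(x for x, _ in data)
    return min(xs[:-1], key=lambda s: stump_loss(data, s))

def random_forest_style(data, n_trees, rng):
    # Randomness goes into the data: bootstrap-resample the training set,
    # then fit each stump deterministically.
    ensemble = []
    for _ in range(n_trees):
        boot = [rng.choice(data) for _ in data]
        ensemble.append(fit_stump_greedy(boot))
    return ensemble

def stochastic_tree_style(data, n_steps, rng, temp=1.0):
    # Randomness goes into the fitting itself: the data stay fixed, and
    # splits are sampled with a Metropolis-style accept/reject rule.
    # The visited splits form the ensemble.
    xs = sorted(x for x, _ in data)
    split = rng.choice(xs[:-1])
    ensemble = []
    for _ in range(n_steps):
        proposal = rng.choice(xs[:-1])
        delta = stump_loss(data, proposal) - stump_loss(data, split)
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            split = proposal
        ensemble.append(split)
    return ensemble

# A step function at x = 5: the best single split is unambiguous.
rng = random.Random(0)
data = [(x, 1.0 if x > 5 else 0.0) for x in range(10)]
rf_splits = random_forest_style(data, n_trees=20, rng=rng)
st_splits = stochastic_tree_style(data, n_steps=200, rng=rng)
```

Both ensembles concentrate near the true changepoint, but for different reasons: the bootstrap ensemble varies because each stump sees different data, while the sampled ensemble varies because the sampler itself explores splits in proportion to how well they fit the one fixed dataset.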

The original stochastic tree model, [Bayesian Additive Regression Trees (BART)](https://projecteuclid.org/journals/annals-of-applied-statistics/volume-4/issue-1/BART-Bayesian-additive-regression-trees/10.1214/09-AOAS285.full), used [Markov Chain Monte Carlo (MCMC)](https://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo) to sample forests from their posterior distribution.

So why not call our project `bayesiantree`?

Some algorithms implemented in `stochtree` are "quasi-Bayesian" in that they are inspired by a Bayesian model, but are sampled with fast algorithms that do not provide a valid Bayesian posterior distribution.

Moreover, we think of stochastic forests as general-purpose modeling tools.
What makes them useful is their strong empirical performance -- especially on small or noisy datasets -- not their adherence to any particular statistical framework.

So why not just call our project `decisiontree`?

Put simply, stochastic sampling is part of what makes BART and the other `stochtree` algorithms work so well -- we know because we have tested versions that did not sample the tree fits stochastically.

So we settled on the term "stochastic trees", or "stochtree" for short (pronounced "stoke-tree").