Between raw data and optimal results in a machine learning project lies an exhausting, iterative process. We go back and forth, experimenting with various combinations of feature engineering, processing methods, and models along with their hyperparameters. At the end of the day, we hope our efforts pay off.
🤞 God bless data scientists. 🤞
Manual experimentation is a good practice. However, you may have found that we often produce messy, repetitive code along the way, and it takes a long while to figure out that an attempt doesn't work out. Sometimes we overcomplicate data transformation and processing to chase promising but unnecessary gains in metric scores.
Given these problems, the baseline_optimal package automates the workflow by employing Optuna's Bayesian optimization, significantly reducing the need for manual experimentation. You provide the raw data, and the modules do the heavy lifting.
You can install the baseline_optimal package and its dependencies using pip:
```
pip install baseline_optimal
```

After installation, you can import the package in Python:

```python
import baseline_optimal
```

Access the entire documentation through GitHub Pages.
Check out the available baseline_optimal modules, along with their documentation and examples.
Check out the supported machine learning algorithms and the hyperparameters considered, listed in the table below (a minimal sketch of how such a search can be run follows the table).
| Algorithm | Source | Hyperparameters |
|---|---|---|
| DecisionTreeClassifier | sklearn.tree | max_features, max_depth, min_samples_split |
| RandomForestClassifier | sklearn.ensemble | n_estimators, max_features, max_depth, min_samples_split |
| AdaBoostClassifier | sklearn.ensemble | n_estimators, learning_rate |
| XGBClassifier | xgboost | n_estimators, learning_rate, max_depth |
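Under the hood, the search over algorithms and hyperparameters is driven by an Optuna study. The snippet below is a minimal, hypothetical sketch of that idea written directly against Optuna and scikit-learn; it is not the package's actual code, and the toy dataset, parameter ranges, and trial count are assumptions for illustration only.

```python
import optuna
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Toy data standing in for your raw dataset.
X, y = make_classification(n_samples=500, random_state=42)

def objective(trial):
    # Pick an algorithm, then sample a subset of its hyperparameters from the table above.
    algo = trial.suggest_categorical("algorithm", ["decision_tree", "random_forest"])
    if algo == "decision_tree":
        model = DecisionTreeClassifier(
            max_depth=trial.suggest_int("dt_max_depth", 2, 16),
            min_samples_split=trial.suggest_int("dt_min_samples_split", 2, 20),
        )
    else:
        model = RandomForestClassifier(
            n_estimators=trial.suggest_int("rf_n_estimators", 50, 300),
            max_depth=trial.suggest_int("rf_max_depth", 2, 16),
        )
    # Score each trial with cross-validation.
    return cross_val_score(model, X, y, cv=3, scoring="accuracy").mean()

study = optuna.create_study(direction="maximize")  # Optuna's default TPE sampler (Bayesian-style)
study.optimize(objective, n_trials=20)
print(study.best_params)
```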
The current version supports feature selection, missing value imputation, scaling, and encoding as data transformation and processing steps. Pipeline performance is evaluated across choices of these components along with multiple machine learning algorithms. With the help of Optuna, the package gives you the optimal workflow given the raw data; a sketch of how such preprocessing choices can be tuned follows below.
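The preprocessing components themselves (imputation strategy, scaler, and so on) can be treated as categorical search dimensions. The following is another hedged, illustrative sketch using a scikit-learn Pipeline; the generated dataset, the particular imputers and scalers, and the parameter ranges are assumptions, not the package's internals.

```python
import numpy as np
import optuna
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Toy numeric data with missing values standing in for your raw dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
X[rng.random(X.shape) < 0.1] = np.nan
y = (rng.random(300) > 0.5).astype(int)

def objective(trial):
    # Sample the preprocessing choices along with a model hyperparameter.
    imputer = SimpleImputer(
        strategy=trial.suggest_categorical("impute", ["mean", "median", "most_frequent"])
    )
    scaler_name = trial.suggest_categorical("scaler", ["standard", "minmax"])
    scaler = StandardScaler() if scaler_name == "standard" else MinMaxScaler()

    pipe = Pipeline([
        ("impute", imputer),
        ("scale", scaler),
        ("model", DecisionTreeClassifier(max_depth=trial.suggest_int("max_depth", 2, 12))),
    ])
    # Evaluate the whole pipeline, so preprocessing and model are tuned jointly.
    return cross_val_score(pipe, X, y, cv=3, scoring="accuracy").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```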
The results are "baseline" optimal because the workflow attempts only the most basic methods: no feature engineering, no dimensionality reduction, and so on. It aims to answer the lazy question, "If I do nothing, how far can I get?" If you get satisfying results with this package, congratulations! If not, you at least know where the baseline is, and you can try to do better than that with your domain knowledge.
🤞 Good luck. 🤞