🧠 Brains Build Code - Automated Machine Learning Pipeline

Brains Build Code is an automated pipeline designed to simplify the end-to-end machine learning workflow. It handles:

  • Data preprocessing
  • Feature engineering
  • Model selection
  • Hyperparameter tuning
  • Model evaluation

Built to save you time, reduce boilerplate, and accelerate experimentation.

Installation

From PyPI

pip install brainsbuildcode

Directly from GitHub

pip install git+https://github.com/achelousace/brainsbuildcode.git

πŸ“– Usage

Fast Build Example

from brainsbuildcode import Brain
from sklearn.datasets import load_breast_cancer

# Load dataset
data = load_breast_cancer(as_frame=True)
df = data.frame

# Instantiate and build the model
best_model = Brain(df, target='target', model_name='RFC', grid_search=None)
best_model.build()
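
Hyperparameter tuning is controlled by the `grid_search` argument (see the full parameter list below): `None` disables tuning, `'cv'` runs GridSearchCV, and `'rand'` runs RandomizedSearchCV. A minimal sketch of the randomized variant on the same dataset (the search spaces themselves are assumed to ship with the package):

from brainsbuildcode import Brain
from sklearn.datasets import load_breast_cancer

df = load_breast_cancer(as_frame=True).frame

# grid_search='rand' enables RandomizedSearchCV-based tuning
tuned_model = Brain(df, target='target', model_name='RFC', grid_search='rand')
tuned_model.build()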

Alternative (Chainable Call)

from brainsbuildcode import Brain
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer(as_frame=True)
df = data.frame

# Instantiate and immediately build
best_model = Brain(df, target='target', model_name='RFC', grid_search='cv').build()
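
The same interface covers regression: set `task='regression'` and pick a regressor code from the Model Names table below. A minimal sketch on scikit-learn's diabetes dataset (the dataset choice is illustrative, not taken from the original docs):

from brainsbuildcode import Brain
from sklearn.datasets import load_diabetes

# Load a regression dataset; the frame includes the 'target' column
df = load_diabetes(as_frame=True).frame

# RandomForestRegressor on a regression task, no hyperparameter tuning
reg_model = Brain(df, target='target', model_name='RFR', task='regression', grid_search=None)
reg_model.build()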

Full Pipeline Example

from sklearn.datasets import load_iris
from brainsbuildcode import Brain

# Load dataset (the frame already includes the 'target' column)
data = load_iris(as_frame=True)
df = data.frame


# Full pipeline of the Brain class
brain = Brain(df=df,
              target='target',                        # Target column name
              model_name='RFC',                       # Model name ('RFC', 'XGBC', etc.)
              task='classification',                  # Task type: 'classification' or 'regression'
              
              # Scaling and PCA
              scale=True,                             # Apply scaling to numerical columns (True) or disable scaling (False)
              pca=False,                              # Apply PCA (True) or not (False)
              pca_comp=0.9,                           # PCA components to retain (if pca=True), default 90% variance

              # Data splitting and CV
              test_size=0.2,                          # Test set size (0.2 = 20% test)
              cv=5,                                   # Number of cross-validation folds
              
              # Missing values handling
              miss=True,                              # Show missing values summary (True) or skip (False)
              ynan=False,                             # Drop rows with NaN in target (True) or keep (False)
              
              # Columns to process
              numerical_cols=[],                      # Manually specify numerical columns (empty = auto-detect)
              categorical_cols=[],                    # Manually specify categorical columns (empty = auto-detect)
              ordinal_cols={},                        # Dict of ordinal columns: {column: [order]}
              drop_cols=(),                           # Columns to drop before processing
              
              # Encoding and Imputation
              categorical_encoding='onehot',          # 'onehot' or 'label' encoding for categorical features
              ordinal_encoding=False,                 # Apply ordinal encoding (True) or skip (False)
              numerical_impute_strategy='mean',       # Imputation strategy for numerical columns
              categorical_impute_strategy='most_frequent',  # Imputation strategy for categorical
              ordinal_impute_strategy='most_frequent',      # Imputation strategy for ordinal
              categorical_fill_value=None,            # Fill value for categorical imputation
              ordinal_fill_value=None,                # Fill value for ordinal imputation
              
              # Display options
              showx=True,                            # Display processed X_train/X_test (True) or skip (False)
              summary=True,                          # Display data summary (True) or skip (False)
              objvalue=True,                         # Display value counts of categorical columns (True/False)
              xtype=True,                            # Display column data types (True/False)
              
              # Duplicate handling
              drop_duplicates=False,    # False = keep duplicates, True = drop first occurrence, 'all' = drop all duplicates
              
              # Target encoding
              yencode=None,                           # 'encode' = LabelEncode target, 'bin' = Binarize, None = no encoding
              
              # Grid Search / Hyperparameter tuning
              grid_search=None,                       # None = no tuning, 'cv' = GridSearchCV, 'rand' = RandomizedSearchCV
              voting='soft',                          # Voting method if using ensemble voting ('soft' or 'hard')
              voteclass=[],                           # List of classifiers for voting model
              pa=0,                                   # Grid search plot: index of hyperparameter to visualize (0 = first param)

              # Column value filtering
              dropc=False,                            # Drop rows based on column values (True/False)
              column_value={},                        # Dict of {column_name: values_to_drop} if dropc=True

              # Ordinal detection
              ord_threshold=None,                     # Auto-detect ordinal columns if unique values <= threshold (None disables)
              ordname=(),                             # Tuple of ordinal column names to treat manually
              
              # Conversion thresholds for object columns
              typeval=80,                             # % threshold: convert object column to numeric if >= this %
              convrate=80,                            # % threshold: if numeric, convert to int if >= this %, else float

              preprocessed_out=False                  # Return preprocessed dataset (True) or train model (False)
             )

# Build, preprocess, and train the model
brain.build()
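
If you only need the cleaned data rather than a trained model, note the `preprocessed_out` flag above. A hedged sketch, assuming `build()` returns the preprocessed dataset when `preprocessed_out=True` (the return type is not documented here):

from brainsbuildcode import Brain
from sklearn.datasets import load_iris

df = load_iris(as_frame=True).frame

# Assumption: with preprocessed_out=True, build() returns the
# preprocessed data instead of training a model
preprocessed = Brain(df, target='target', model_name='RFC', preprocessed_out=True).build()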

Convert Function (Manual Data Preprocessing)

from brainsbuildcode import Convert
import seaborn as sns

# Load dataset
df = sns.load_dataset("titanic")

# Define target
target = 'survived'

# Step 1: Apply conversion
converter = Convert(df, target)
X, y, ncol, ocol, ordinal_cols = converter.apply()

# Now you can process `X`, `y`, `ncol`, `ocol`, `ordinal_cols` manually, or pass them back to `Brain`
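
For instance, the outputs drop straight into standard scikit-learn utilities. A minimal sketch of a manual train/test split (`train_test_split` is plain scikit-learn, not part of brainsbuildcode):

from sklearn.model_selection import train_test_split

# X and y come from converter.apply() above
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("Numerical columns:  ", ncol)
print("Categorical columns:", ocol)
print("Ordinal columns:    ", ordinal_cols)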

Pass Convert to Brain Manually

from brainsbuildcode import Convert, Brain
import seaborn as sns

# Load dataset
df = sns.load_dataset("titanic")

# Define target
target = 'survived'

# Step 1: Apply conversion
converter = Convert(df, target)
X, y, ncol, ocol, ordinal_cols = converter.apply()

# Step 2: Instantiate Brain with converted column info
best_model = Brain(
    df=df,
    target=target,
    model_name='RFC',
    grid_search=None,
    drop_duplicates=True,
    numerical_cols=ncol,        # Pass numerical columns from Convert
    categorical_cols=ocol,      # Pass categorical columns from Convert
    ordinal_cols=ordinal_cols   # Pass ordinal columns from Convert
)

# Step 3: Build the model
best_model.build()
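
The `ordinal_cols` dict maps a column to its category order, so orderings can also be declared by hand instead of taking Convert's output. A sketch on the same Titanic data (the ranking of `class` and the need to enable `ordinal_encoding` are assumptions, not documented behavior):

# Continuing from the Titanic example above
manual_model = Brain(
    df=df,
    target=target,
    model_name='RFC',
    ordinal_encoding=True,                                 # assumption: required for ordinal_cols to take effect
    ordinal_cols={'class': ['Third', 'Second', 'First']}   # assumed ranking, lowest to highest
)
manual_model.build()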

Model Names

Model_Name   Model                        Task
LR           LogisticRegression           classification
RFC          RandomForestClassifier       classification
XGBC         XGBClassifier                classification
KNNC         KNeighborsClassifier         classification
DTC          DecisionTreeClassifier       classification
SVC          SVC                          classification
MLPC         MLPClassifier                classification
ADAC         AdaBoostClassifier           classification
GBC          GradientBoostingClassifier   classification
BC           BaggingClassifier            classification
NBC          BernoulliNB                  classification
Linear       LinearRegression             regression
RFR          RandomForestRegressor        regression
XGBR         XGBRegressor                 regression
KNNR         KNeighborsRegressor          regression
DTR          DecisionTreeRegressor        regression
SVR          SVR                          regression
MLPR         MLPRegressor                 regression
ADAR         AdaBoostRegressor            regression
GBR          GradientBoostingRegressor    regression
BR           BaggingRegressor             regression
NBR          BayesianRidge                regression
LRmulti      LogisticRegression           multi-class classification
RFmulti      RandomForestClassifier       multi-class classification
XGBmulti     XGBClassifier                multi-class classification
KNNmulti     KNeighborsClassifier         multi-class classification
DTmulti      DecisionTreeClassifier       multi-class classification
SVCmulti     SVC                          multi-class classification
MLPCmulti    MLPClassifier                multi-class classification
ADAmulti     AdaBoostClassifier           multi-class classification
GBmulti      GradientBoostingClassifier   multi-class classification
BCmulti      BaggingClassifier            multi-class classification
NBmulti      ComplementNB                 multi-class classification
vote         VotingClassifier             ensemble
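
The `vote` entry pairs with the `voteclass` and `voting` arguments from the full parameter list. A hedged sketch, assuming `voteclass` accepts the model-name codes from this table (the expected element type is not documented here):

from brainsbuildcode import Brain
from sklearn.datasets import load_breast_cancer

df = load_breast_cancer(as_frame=True).frame

# Assumption: voteclass takes model-name codes from the table above
ensemble = Brain(df, target='target', model_name='vote',
                 voteclass=['RFC', 'XGBC', 'LR'], voting='soft')
ensemble.build()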

πŸ’‘ Key Features

  • Automatic Detection of numerical, categorical, and ordinal features.

  • Missing Value Handling with customizable strategies.

  • Feature Scaling & PCA Support.

  • Flexible Encoding: One-hot, label, ordinal.

  • Multiple Models Supported: Random Forest, XGBoost, Logistic Regression, SVC, etc.

  • Voting Classifiers & Ensemble Models.

  • Hyperparameter Optimization: Grid Search & Randomized Search.

  • Detailed Evaluation Metrics & Visualizations.

πŸ”— License

This project is licensed under the MIT License.

πŸ› οΈ Contribution

Contributions are welcome! Feel free to suggest features, report bugs via the issues section, or open a pull request.
