

@gbeane (Collaborator) commented Dec 31, 2025

Add CatBoost Classifier Support

Summary

Adds CatBoost as a third classifier option alongside Random Forest and XGBoost, providing users with a high-accuracy gradient boosting alternative.

Changes

Core Implementation:

  • Added ClassifierType.CATBOOST enum value
  • Imported CatBoostClassifier directly (no conditional handling needed - CatBoost doesn't require libomp)
  • Created _make_catboost() factory with appropriate parameters:
    • verbose=False to suppress training output
    • allow_writing_files=False to prevent intermediate files
  • Updated train(), predict(), predict_proba(), and sort_features_to_classify() to use CatBoost's native NaN support and its feature_names_ attribute
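A minimal sketch of a factory in the spirit of `_make_catboost()`, assuming only the two settings listed above are fixed and any other `CatBoostClassifier` options are passed through by the caller. The names `DEFAULT_CATBOOST_PARAMS`, `catboost_params`, and `make_catboost` are hypothetical; `verbose` and `allow_writing_files` are real `CatBoostClassifier` constructor arguments. The import sits inside the function only so the parameter helper can be tried without CatBoost installed, whereas the PR imports `CatBoostClassifier` at module level.

```python
from typing import Any

# Fixed settings described in the PR; the names in this sketch are hypothetical.
DEFAULT_CATBOOST_PARAMS: dict[str, Any] = {
    "verbose": False,              # suppress per-iteration training output
    "allow_writing_files": False,  # don't write training logs / intermediate files (e.g. catboost_info/)
}


def catboost_params(**overrides: Any) -> dict[str, Any]:
    """Merge caller-supplied options over the fixed defaults (overrides win)."""
    params = dict(DEFAULT_CATBOOST_PARAMS)  # copy so the defaults are never mutated
    params.update(overrides)
    return params


def make_catboost(**overrides: Any):
    """Build an unfitted CatBoostClassifier with console output and file writing disabled."""
    # Imported lazily here only so catboost_params() works without CatBoost installed.
    from catboost import CatBoostClassifier

    return CatBoostClassifier(**catboost_params(**overrides))
```

For example, `make_catboost(random_seed=42)` would return an unfitted classifier with logging and file writing disabled. Letting overrides win over the defaults is a choice of this sketch; it lets a caller re-enable `verbose` temporarily while debugging.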

Documentation:

  • Added comprehensive "Choosing a Classifier Type" section to user guide
  • Includes pros/cons, performance characteristics, and usage recommendations for all three classifiers
  • Quick comparison table to help users make informed decisions

Key Features

  • ✅ Native NaN/missing value support (like XGBoost)
  • ✅ No external system library required (unlike XGBoost's libomp requirement)
  • ✅ Higher accuracy potential than Random Forest
  • ⚠️ Slower training time (documented in user guide)

Testing

Unit tests for CatBoost will be added after PR #253 (test reorganization) is merged.

Files Modified

  • src/jabs/types/classifier_types.py - Added CATBOOST enum
  • src/jabs/classifier/classifier.py - Implemented CatBoost support
  • src/jabs/resources/docs/user_guide/gui.md - Added classifier comparison guide
  • pyproject.toml - Added catboost>=1.2.8 dependency

@gbeane self-assigned this Dec 31, 2025

Copilot AI (Contributor) left a comment

Pull request overview

This PR adds CatBoost as a third classifier option to JABS, giving users a high-accuracy gradient boosting alternative that handles missing data natively and, unlike XGBoost, does not require the external libomp library.

Key Changes:

  • Implemented CatBoost classifier support with native NaN handling
  • Added comprehensive classifier comparison documentation to the user guide
  • Updated prediction methods to properly handle CatBoost's feature name attribute
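A fitted `CatBoostClassifier` reports its training column names through `feature_names_`, while scikit-learn estimators fitted on a DataFrame use `feature_names_in_`. The sketch below reorders incoming columns to the training order using whichever attribute is present. The function names, the NumPy-array input, and the separate list of column names are assumptions for illustration, not the actual `sort_features_to_classify()` in `classifier.py`.

```python
from typing import Any, Sequence

import numpy as np


def trained_feature_names(model: Any) -> list[str]:
    """Return the column names a fitted model was trained on.

    scikit-learn estimators fitted on a DataFrame expose ``feature_names_in_``;
    a fitted CatBoost model exposes ``feature_names_``.
    """
    for attr in ("feature_names_in_", "feature_names_"):
        names = getattr(model, attr, None)
        if names is not None:
            return [str(name) for name in names]
    raise AttributeError("model does not expose its training feature names")


def sort_features(features: np.ndarray, column_names: Sequence[str], model: Any) -> np.ndarray:
    """Reorder the columns of ``features`` to match the model's training order."""
    expected = trained_feature_names(model)
    position = {name: i for i, name in enumerate(column_names)}
    missing = [name for name in expected if name not in position]
    if missing:
        # Fail loudly rather than predict on misaligned columns.
        raise KeyError(f"missing columns used in training: {missing}")
    return features[:, [position[name] for name in expected]]
```

Raising on a missing column is deliberate in this sketch: silently reordering around a gap would feed the model misaligned features without any error.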

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 2 comments.

File - Description

  • src/jabs/classifier/classifier.py - Added CatBoost factory function, imported CatBoostClassifier, updated train/predict methods to handle CatBoost's NaN support and feature_names_ attribute
  • src/jabs/types/classifier_types.py - Added CATBOOST enum value (referenced but not shown in diff)
  • src/jabs/resources/docs/user_guide/gui.md - Added detailed "Choosing a Classifier Type" section with pros/cons comparison table and recommendations for all three classifiers
  • pyproject.toml - Added catboost>=1.2.8 dependency


```python
if self._classifier_type in (ClassifierType.XGBOOST, ClassifierType.CATBOOST):
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=FutureWarning)
        # XGBoost and CatBoost can handle NaN, just replace infinities
```

Copilot AI Dec 31, 2025


Comment is misleading - the code replaces infinities with NaN (not 0), and then CatBoost/XGBoost handle both the original NaN values and the newly converted infinity→NaN values. Consider revising to: 'XGBoost and CatBoost can handle NaN, so convert infinities to NaN instead of 0'
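The distinction the comment draws can be sketched as follows: for classifiers with native missing-value support, +/-inf is converted to NaN and existing NaN values are left for the model to handle; for Random Forest, the "(not 0)" wording suggests both are presumably replaced with 0. This is a hypothetical helper (`clean_features`) working on a NumPy array, and the `ClassifierType` enum below is a stand-in whose member values are assumptions, not the project's definitions.

```python
from enum import Enum

import numpy as np


class ClassifierType(Enum):
    # Stand-in for the project's ClassifierType; member values are assumptions.
    RANDOM_FOREST = "random_forest"
    XGBOOST = "xgboost"
    CATBOOST = "catboost"


# Classifiers assumed to accept NaN natively.
NAN_CAPABLE = {ClassifierType.XGBOOST, ClassifierType.CATBOOST}


def clean_features(features: np.ndarray, classifier_type: ClassifierType) -> np.ndarray:
    """Return a float copy of ``features`` prepared for ``classifier_type``."""
    cleaned = np.array(features, dtype=float)  # always a copy; the input is never modified
    # Convert +inf and -inf to NaN for every classifier type.
    cleaned[np.isinf(cleaned)] = np.nan
    if classifier_type not in NAN_CAPABLE:
        # Assumed Random Forest path: replace all missing values (including former infinities) with 0.
        cleaned[np.isnan(cleaned)] = 0.0
    return cleaned
```

Converting infinities to NaN rather than 0 keeps them marked as missing for XGBoost and CatBoost, which is what the revised comment wording is meant to convey.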

```python
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=FutureWarning)
    result = self._classifier.predict_proba(self.sort_features_to_classify(features))
# XGBoost and CatBoost can handle NaN, just replace infinities
```

Copilot AI Dec 31, 2025


Comment is misleading - the code replaces infinities with NaN (not 0), and then CatBoost/XGBoost handle both the original NaN values and the newly converted infinity→NaN values. Consider revising to: 'XGBoost and CatBoost can handle NaN, so convert infinities to NaN instead of 0'

@gbeane merged commit 5510074 into main Dec 31, 2025
2 checks passed
@gbeane deleted the add-catboost-option branch December 31, 2025 21:07

Labels: None yet
Projects: None yet
3 participants