Can classifier update() be faster than training from scratch? #123

@DSA101

I am building a dataset and training a NaiveBayesClassifier as the dataset grows. Instead of retraining the classifier from scratch every time I add a few new entries, I was hoping to use the update() method to add only the new entries and so cut training time. What I discovered is that loading a pickled, trained classifier and updating it with just the new entries is no faster than retraining it from scratch. Re-reading the docs, they say that update() will "Update the classifier with new training data and re-trains the classifier", which implies retraining on the entire dataset...

Question: is there such a thing as incremental retraining, or does update() realistically process the entire dataset from scratch every time I want to add new data to the classifier?
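
To make "incremental retraining" concrete: a Naive Bayes model is, in principle, just a set of counts, so new examples can be folded in without revisiting old ones. The following is a minimal pure-Python sketch of that idea only. It is not how the library's NaiveBayesClassifier or its update() is implemented, and the class name IncrementalNB, its methods, and the whitespace tokenization are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict


class IncrementalNB:
    """Toy multinomial Naive Bayes that stores only running counts.

    Hypothetical illustration; not the library's classifier.
    """

    def __init__(self):
        self.label_counts = Counter()            # examples seen per label
        self.word_counts = defaultdict(Counter)  # word counts per label
        self.vocab = set()                       # all words seen so far

    def update(self, examples):
        """Fold new (text, label) pairs into the counts.

        Cost is proportional to the new examples only; earlier
        training data is never revisited.
        """
        for text, label in examples:
            self.label_counts[label] += 1
            for word in text.lower().split():
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def classify(self, text):
        """Return the most probable label, using add-one smoothing."""
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Initial training, then a later incremental update with one new entry.
clf = IncrementalNB()
clf.update([("great fun movie", "pos"), ("boring dull movie", "neg")])
clf.update([("fun and great", "pos")])
label = clf.classify("great fun")
```

Because the model state is only counts, updating in two steps leaves it in the same state as training once on all three examples. Whether a given library can work this way depends on how it extracts features and stores its model, which this sketch does not address.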
