Decouple tf-idf transform on training set from tf-idf of test set

This is not a bug in the FeatureHashing package per se, but I'm wondering whether it's possible to train a model on features built by `hashed.model.matrix` with the tf-idf option of the split function, without having to compute the tf-idf transform on the combined training + test datasets.
In many realistic scenarios we don't know in advance which words the test set will contain, so the tf-idf weights should be computed from the training set alone and then reused at prediction time.
Normally, at prediction time, one would keep only the words that appeared in the training set, discard the others, and build the tf-idf matrix from those before applying the hashing trick.
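
To make the request concrete, here is a minimal sketch of the workflow I have in mind, written in plain Python rather than with FeatureHashing itself. Everything in it is an assumption for illustration: the helper names `fit_idf` and `hashed_tfidf`, the unsmoothed `log(N / df)` idf formula, the use of CRC32 as the hash, and the bucket count are hypothetical and are not how the package computes things.

```python
import math
import zlib
from collections import Counter


def fit_idf(train_docs):
    """Estimate idf weights from the training documents only (hypothetical helper)."""
    n_docs = len(train_docs)
    doc_freq = Counter()
    for doc in train_docs:
        # Count each word at most once per document.
        doc_freq.update(set(doc.split()))
    # Assumption: unsmoothed idf = log(N / df); other variants would work too.
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def hashed_tfidf(doc, idf, n_buckets=16):
    """Build one hashed tf-idf row, reusing idf weights fitted on training data."""
    row = [0.0] * n_buckets
    tokens = doc.split()
    for word, count in Counter(tokens).items():
        if word not in idf:
            continue  # word never seen in training: discard it
        # Term frequency relative to the full document length (a design choice).
        tf = count / len(tokens)
        # Assumption: CRC32 stands in for whatever hash the real package uses.
        bucket = zlib.crc32(word.encode("utf-8")) % n_buckets
        row[bucket] += tf * idf[word]
    return row


train_docs = ["the cat sat", "the dog barked", "a cat purred"]
idf = fit_idf(train_docs)  # fitted once, on training data only
train_rows = [hashed_tfidf(d, idf) for d in train_docs]
test_row = hashed_tfidf("the cat meowed", idf)  # "meowed" is silently dropped
```

The point is that `idf` is fitted once on the training documents and then passed unchanged to the test-time transform, so the test set never influences the weights.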


Decouple tf-idf transform on training set from tf-idf of test set #121
