This is not an issue or bug per se with the FeatureHashing package. I'm wondering whether it's possible to train a model using the tf-idf option of the split function in hashed.model.matrix, but without computing the tf-idf transform on the training and test datasets combined.
In many realistic scenarios we don't know in advance which words the test set will contain, so the tf-idf weights would have to be computed from the training set alone and then applied to the test set separately.
Normally, at prediction time, one would keep only the words that appeared in the training set, discard the others, and construct the tf-idf matrix from what remains before applying the hashing trick.
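
To make the workflow concrete, here is a minimal sketch in plain Python. It is not the FeatureHashing API: the names `HASH_SIZE`, `bucket`, `fit_idf`, and `transform` are hypothetical, CRC32 is only a stand-in for whatever hash function the package uses, and the IDF formula `log(N / df)` is one common variant assumed for illustration. The document frequencies are learned from the training documents only, and words unseen in training are dropped before hashing at prediction time.

```python
import math
import zlib
from collections import Counter

HASH_SIZE = 2 ** 10  # hypothetical number of hash buckets


def bucket(token, hash_size=HASH_SIZE):
    # CRC32 is deterministic across runs, unlike Python's built-in hash().
    return zlib.crc32(token.encode("utf-8")) % hash_size


def tokenize(doc, delim=" "):
    return [t for t in doc.split(delim) if t]


def fit_idf(train_docs, delim=" "):
    """Learn the vocabulary and per-bucket IDF from the training set only."""
    n_docs = len(train_docs)
    vocab = set()
    doc_freq = Counter()
    for doc in train_docs:
        tokens = tokenize(doc, delim)
        vocab.update(tokens)
        # Count each bucket at most once per document.
        doc_freq.update({bucket(t) for t in tokens})
    idf = {b: math.log(n_docs / df) for b, df in doc_freq.items()}
    return vocab, idf


def transform(docs, vocab, idf, delim=" "):
    """Apply the training-set IDF; words unseen in training are discarded."""
    rows = []
    for doc in docs:
        kept = [t for t in tokenize(doc, delim) if t in vocab]
        counts = Counter(bucket(t) for t in kept)  # raw term counts per bucket
        rows.append({b: c * idf[b] for b, c in counts.items()})
    return rows


train = ["good movie", "good plot"]
test = ["good unseen unseen"]

vocab, idf = fit_idf(train)
train_matrix = transform(train, vocab, idf)
test_matrix = transform(test, vocab, idf)  # test set never influences idf
```

Each row is a sparse mapping from bucket index to weight. Since the hashing trick maps any token to a bucket, the vocabulary filter is a design choice rather than a necessity: without it, unseen test words would land in buckets that may have no training IDF, and a default weight would have to be chosen for them.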