Issues may arise depending on whether the matrix is sparse or dense, so watch for this at each step. Possible preprocessing steps:

Pair-wise methods, efficient if the matrix is sparse:

- [Chi-squared test](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.chi2.html)
- [Mutual information](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.mutual_info_classif.html)

Other options:

- [Feature agglomeration](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.FeatureAgglomeration.html#sklearn.cluster.FeatureAgglomeration)
- [Random projection](https://scikit-learn.org/stable/modules/generated/sklearn.random_projection.SparseRandomProjection.html#sklearn.random_projection.SparseRandomProjection)

If the matrix at the end of preprocessing is dense, it is better to go with [IncrementalPCA](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.IncrementalPCA.html#sklearn.decomposition.IncrementalPCA).
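
As a rough sketch of how some of these options might be tried out, the snippet below runs chi-squared selection, a sparse random projection, and `IncrementalPCA` on a small synthetic sparse matrix. The data, the shapes, and the values of `k`, `n_components`, and `batch_size` are all assumptions chosen for illustration, not recommendations.

```python
import numpy as np
from scipy import sparse
from sklearn.decomposition import IncrementalPCA
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.random_projection import SparseRandomProjection

rng = np.random.default_rng(0)

# Toy data (assumption): 200 samples, 1000 non-negative sparse features,
# and random binary labels.
X = sparse.random(200, 1000, density=0.05, format="csr", random_state=0)
y = rng.integers(0, 2, size=200)

# Pair-wise scoring: chi2 needs non-negative features and accepts sparse
# input; the selected columns come back as a sparse matrix.
X_chi2 = SelectKBest(chi2, k=100).fit_transform(X, y)

# Random projection: reduces the original 1000 features to 50 components.
X_rp = SparseRandomProjection(n_components=50, random_state=0).fit_transform(X)

# Once the matrix is dense, IncrementalPCA fits it in mini-batches,
# so the whole decomposition does not have to be done in one pass.
X_dense = X_rp.toarray() if sparse.issparse(X_rp) else np.asarray(X_rp)
X_pca = IncrementalPCA(n_components=10, batch_size=50).fit_transform(X_dense)

print(X_chi2.shape, X_rp.shape, X_pca.shape)
```

Checking `sparse.issparse(...)` after each step is a cheap way to see where the matrix changes from sparse to dense.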