This project experiments with Spark MLlib 2.4.0 on a Kaggle wine review dataset. It explores Imputer, TF-IDF, StopWordsRemover, Word2Vec, tokenization, StringIndexer, OneHotEncoderEstimator, GBTRegressor, LinearRegression, RandomForestRegressor, GeneralizedLinearRegression, and other concepts.
Simply download the HTML file and view it in any browser.