This project was an assignment for our Text Analytics course at Heriot-Watt University. It uses a scraped dataset of Amazon reviews, together with their ratings, across various item categories.
The aim was to develop an ML model that accurately predicts the rating from the review's text.
The project follows the standard NLP workflow, which consists of the following steps:
- Data Exploration and Visualization.
- Text Processing and Normalization.
- GridSearch over Text Vectorization Techniques and Classifiers.
- Sequence Modelling - Deep Learning.
- Topic Modelling.
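
The normalization and grid search steps can be illustrated with a minimal sketch. It is not the notebook's actual code: the `normalize` and `build_search` helpers, the choice of vectorizers and classifiers, the parameter values, and the accuracy scoring are all assumptions made for illustration, built on scikit-learn's `Pipeline` and `GridSearchCV`.

```python
import re

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


def normalize(text):
    """Hypothetical normalizer: lowercase, keep letters only, collapse spaces."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def build_search(cv=3):
    """Grid search over vectorizers and classifiers (illustrative choices)."""
    pipeline = Pipeline([
        ("vect", TfidfVectorizer(preprocessor=normalize)),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    param_grid = {
        # Swap whole pipeline steps to compare vectorization techniques.
        "vect": [
            CountVectorizer(preprocessor=normalize),
            TfidfVectorizer(preprocessor=normalize),
        ],
        "vect__ngram_range": [(1, 1), (1, 2)],
        # Swap the classifier step as well (assumed candidates).
        "clf": [LogisticRegression(max_iter=1000), MultinomialNB()],
    }
    return GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
```

Calling `build_search().fit(texts, ratings)` on a list of review strings and their ratings would then expose the winning combination through `best_params_`.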
The best-performing model was a bidirectional LSTM with Word2Vec embeddings.
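
For readers unfamiliar with this architecture, the following is a rough sketch of a bidirectional LSTM classifier on top of a frozen, pretrained embedding matrix in Keras. It is not the notebook's model: the `build_bilstm` helper, the layer sizes, and the five output classes (ratings 1 to 5 mapped to labels 0 to 4) are assumptions. The `embedding_matrix` argument stands in for a matrix whose rows would be filled from trained Word2Vec vectors, with row 0 reserved for padding.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def build_bilstm(embedding_matrix, num_classes=5, lstm_units=64):
    """Hypothetical BiLSTM classifier over a fixed embedding matrix."""
    vocab_size, embed_dim = embedding_matrix.shape
    model = keras.Sequential([
        layers.Embedding(
            input_dim=vocab_size,
            output_dim=embed_dim,
            # Initialise the layer from the pretrained vectors and freeze it.
            embeddings_initializer=keras.initializers.Constant(embedding_matrix),
            trainable=False,
            mask_zero=True,  # token id 0 is treated as padding
        ),
        # Read each review in both directions and concatenate the final states.
        layers.Bidirectional(layers.LSTM(lstm_units)),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # expects integer labels 0..num_classes-1
        metrics=["accuracy"],
    )
    return model
```

The model expects integer-encoded, zero-padded token sequences as input and returns one probability per rating class.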
- results folder: Contains the saved models so they can be re-evaluated.
- cw2.ipynb: The Jupyter Notebook containing the entire pipeline and experimentation.
- train.pickle: The pickle file containing the dataset.
Libraries used: pandas, NumPy, Matplotlib, NLTK, scikit-learn, TensorFlow, Keras, Gensim.
Contributors: Bhavika Kaliya, Alora Tabuco and Andrea Nabua.