This project is a web-based application that performs sentiment analysis on clothing reviews. Users can input review text, and the app predicts whether the review is Recommended or Not Recommended using two models:
- TF-IDF + Random Forest
- BERT (Bidirectional Encoder Representations from Transformers)
The app provides side-by-side predictions and confidence scores for comparison.
- Two Models for Prediction:
  - TF-IDF + Random Forest: Lightweight, interpretable traditional machine learning model.
  - BERT: Deep learning-based NLP model aimed at higher accuracy.
- Streamlit Interface:
  - Simple web UI to enter reviews for prediction.
  - Displays results for both models with confidence scores.
- Backend:
  - Python
  - Hugging Face transformers
  - Scikit-learn
- Web Framework:
  - Streamlit
- Machine Learning Models:
  - Pretrained BERT model fine-tuned for sentiment classification.
  - Random Forest classifier trained on TF-IDF features.
Install the required dependencies:
pip install -r requirements.txt
- Place the TF-IDF + Random Forest model and vectorizer in the root directory:
  - tfidf_rf_model.pkl
  - tfidf_vectorizer.pkl
- Save the fine-tuned BERT model into the bert_sentiment_model/ directory:
  - config.json
  - pytorch_model.bin
  - tokenizer_config.json
  - vocab.txt
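Before launching the app, it can help to confirm that every file listed above is where the app expects it. The following is a minimal sketch, not part of the project: the helper name missing_model_files is hypothetical, and the file list simply mirrors the names given in the setup steps.

```python
from pathlib import Path

# File names copied from the setup steps; adjust if your layout differs.
REQUIRED_FILES = [
    "tfidf_rf_model.pkl",
    "tfidf_vectorizer.pkl",
    "bert_sentiment_model/config.json",
    "bert_sentiment_model/pytorch_model.bin",
    "bert_sentiment_model/tokenizer_config.json",
    "bert_sentiment_model/vocab.txt",
]


def missing_model_files(root="."):
    """Return the required files that are absent under `root` (hypothetical helper)."""
    root_path = Path(root)
    return [name for name in REQUIRED_FILES if not (root_path / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing model files:")
        for name in missing:
            print(f"  {name}")
    else:
        print("All model files found.")
```

Running the script from the project root prints any files that still need to be copied into place.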
To start the Streamlit app, use the following command:
streamlit run app.py
Once running, open the link provided in the terminal to access the web app in your browser.
- Enter Review Text: In the text area provided, input a review you want to analyze.
- Get Predictions: Click on the "Predict" button.
- View Results: The app displays:
  - Predicted sentiment (Recommended or Not Recommended).
  - Confidence scores from both the TF-IDF + Random Forest and BERT models.
sentiment-analysis-app/
├── app.py # Main Streamlit app script
├── tfidf_rf_model.pkl # Trained Random Forest model
├── tfidf_vectorizer.pkl # Trained TF-IDF vectorizer
├── bert_sentiment_model/ # Directory containing the saved BERT model
│   ├── config.json
│   ├── pytorch_model.bin
│   ├── tokenizer_config.json
│   └── vocab.txt
├── requirements.txt # Python dependencies
└── README.md # Project documentation
- Traditional machine learning pipeline.
- Uses TF-IDF for feature extraction and a Random Forest classifier for predictions.
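To illustrate what the TF-IDF step produces before the Random Forest sees any features, here is a small standard-library sketch. It is an illustration only, not the project's code: it splits on whitespace (scikit-learn's TfidfVectorizer uses a regex tokenizer), and it assumes the smoothed idf formula ln((1 + n) / (1 + df)) + 1 with L2-normalised rows, which matches scikit-learn's defaults. The README does not state which vectorizer settings the project actually uses.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) using smoothed idf and L2-normalised rows."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    n_docs = len(tokenized)

    # Document frequency: in how many reviews each term appears at least once.
    doc_freq = Counter(token for tokens in tokenized for token in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocabulary}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        row = [counts[t] * idf[t] for t in vocabulary]
        norm = math.sqrt(sum(w * w for w in row)) or 1.0  # guard against empty reviews
        vectors.append([w / norm for w in row])
    return vocabulary, vectors


# Made-up reviews, for illustration only.
reviews = ["love this dress", "this dress runs small", "love the soft fabric"]
vocab, vectors = tfidf_vectors(reviews)
for term, weight in zip(vocab, vectors[0]):
    if weight > 0:
        print(f"{term}: {weight:.3f}")
```

Terms that appear in fewer reviews receive a larger weight, so distinctive words influence the classifier more than common ones. In the real pipeline, vectors like these are the input to the Random Forest classifier.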
- Uses the bert-base-multilingual-uncased-sentiment model from Hugging Face's transformers library.
- The tokenizer processes the review text, and the model predicts the sentiment.
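The README does not say how the app turns the model output into a confidence score. A common approach for classification models is to apply a softmax to the raw output scores (logits) and report the highest probability. The sketch below shows that step with the standard library only; the label order and the logit values are assumptions made for illustration, and the real label mapping would come from the saved model's config.json.

```python
import math

# Assumed label order, for illustration only.
LABELS = ["Not Recommended", "Recommended"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_confidence(logits, labels=LABELS):
    """Return the most likely label and its probability (hypothetical helper)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Made-up logits standing in for a model's output on one review.
label, confidence = label_with_confidence([-1.2, 2.3])
print(f"{label} ({confidence:.1%})")
```

The gap between the two logits determines the confidence: the further apart they are, the closer the top probability gets to 100%.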
- Batch Predictions: Add support for analyzing multiple reviews via file upload.
- Custom Threshold: Allow users to set a confidence threshold for predictions.
- Visualization: Include charts or graphs for a better understanding of model outputs.
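As a rough idea of how the batch and threshold items above might fit together, the sketch below reads reviews from CSV text, runs each through a prediction function, and marks low-confidence results as uncertain. Everything here is hypothetical: the function names, the "review" column name, the 0.7 default threshold, the "Uncertain" label, and the toy predictor that stands in for a real model.

```python
import csv
import io


def apply_threshold(label, confidence, threshold=0.7):
    """Keep the label only when confidence reaches the threshold."""
    return label if confidence >= threshold else "Uncertain"


def predict_batch(csv_text, predict_fn, threshold=0.7, column="review"):
    """Run predict_fn on every row of the CSV and apply the threshold."""
    reader = csv.DictReader(io.StringIO(csv_text))
    results = []
    for row in reader:
        label, confidence = predict_fn(row[column])
        results.append(
            {
                "review": row[column],
                "label": apply_threshold(label, confidence, threshold),
                "confidence": confidence,
            }
        )
    return results


def toy_predict(text):
    """Stand-in predictor for the demo; NOT the project's model."""
    if "love" in text.lower():
        return "Recommended", 0.9
    return "Not Recommended", 0.55


# Made-up CSV content, as it might arrive from a file upload.
uploaded = "review\nI love this dress\nThe fabric felt cheap\n"
for result in predict_batch(uploaded, toy_predict):
    print(result)
```

In the app, predict_fn would be replaced by a call into either model, and the threshold could come from a user-controlled input.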
Run the train.py script to train the TF-IDF + Random Forest model and prepare the BERT pipeline:
python train.py
This will save the trained models (tfidf.pkl, random_forest.pkl) for the TF-IDF + Random Forest method.
- Hugging Face Transformers for pre-trained BERT models.
- Kaggle for the dataset: "Women's Clothing E-Commerce Reviews".
- Ernitia Paramasari, Data Scientist and Machine Learning Engineer
Feel free to contribute to this project by submitting issues or pull requests!
This project is licensed under the MIT License.
Enjoy analyzing clothing reviews with cutting-edge NLP models! 🎉