This project performs Sentiment Analysis on restaurant reviews using Natural Language Processing (NLP) and Machine Learning (Naïve Bayes Classifier).
It predicts whether a review entered by the user is positive or negative.
The project uses a dataset of restaurant reviews (`Restaurant_Reviews.tsv`) to train a text classification model.
Each review is cleaned, tokenized, stemmed, and converted into a numeric vector with a Bag of Words model before the classifier is trained on it.
- Text preprocessing (cleaning, tokenization, stopword removal, stemming)
- Feature extraction using `CountVectorizer` (Bag of Words)
- Trained Naïve Bayes Classifier for sentiment prediction
- Interactive console input to test custom reviews
- Evaluation metrics: Accuracy, Precision, and Recall
- Hyperparameter tuning using different `alpha` values
| Category | Libraries / Tools Used |
|---|---|
| Language | Python |
| Data Handling | NumPy, Pandas |
| NLP | NLTK |
| Machine Learning | scikit-learn |
| Visualization / Output | Console-based metrics and predictions |
The project uses the Restaurant_Reviews.tsv dataset, which contains two columns:
| Column | Description |
|---|---|
| Review | The text review given by a customer |
| Liked | The sentiment label (1 = Positive, 0 = Negative) |
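
As a rough illustration of this two-column layout, the sketch below loads a tab-separated file with pandas. The rows are stand-ins written for this example (not taken from the dataset), and passing `quoting=csv.QUOTE_NONE` is an assumption about how quote characters inside reviews should be handled. With the real file, `io.StringIO(sample_tsv)` would be replaced by the path `"Restaurant_Reviews.tsv"`.

```python
import csv
import io

import pandas as pd

# Stand-in rows in the same Review/Liked layout (illustrative only).
sample_tsv = (
    "Review\tLiked\n"
    "The food was delicious and service was great!\t1\n"
    "The place was very dirty and food was cold.\t0\n"
    'The "special" was nothing special.\t0\n'
)

dataset = pd.read_csv(
    io.StringIO(sample_tsv),
    sep="\t",                # columns are separated by tabs
    quoting=csv.QUOTE_NONE,  # keep double quotes inside reviews as plain text
)

print(dataset.shape)
print(dataset["Liked"].value_counts().to_dict())
```

Checking the shape and the label balance right after loading is a cheap way to catch a wrong separator or a truncated file.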
- Clone the repository

      git clone https://github.com/yourusername/Sentiment_Analysis_of_Restaurant_Reviews.git
      cd Sentiment_Analysis_of_Restaurant_Reviews

- Install required libraries

      pip install numpy pandas nltk scikit-learn

- Download NLTK stopwords

      import nltk
      nltk.download('stopwords')

- Place the dataset

  Ensure the file `Restaurant_Reviews.tsv` is in the same directory as your Python script.

- Run the project

      python Sentiment_Analysis_of_Restaurant_Reviews.py
- Data Preprocessing
  - Remove non-alphabetic characters using regex
  - Convert text to lowercase
  - Remove English stopwords
  - Apply stemming using `PorterStemmer`
  - Create a corpus of cleaned reviews
- Feature Extraction
  - Convert text into numeric vectors using `CountVectorizer(max_features=1500)`
- Model Training
  - Split data into training and testing sets (`train_test_split`)
  - Train using Multinomial Naive Bayes
- Evaluation
  - Calculate Accuracy, Precision, and Recall
  - Tune the `alpha` hyperparameter to find the best score
- Prediction
  - Accept custom user input from console
  - Predict whether the review is Positive or Negative
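
The preprocessing step above could be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the function name `clean_review` and the short `STOP_WORDS` set are assumptions made for this example. The set stands in for `nltk.corpus.stopwords.words('english')` so that the sketch runs without downloading any NLTK data.

```python
import re

from nltk.stem.porter import PorterStemmer

# Stand-in stopword set (an assumption for this sketch); the project itself
# relies on the NLTK English stopword list downloaded during installation.
STOP_WORDS = {"a", "an", "and", "the", "this", "that", "is", "was", "it", "of", "to"}

stemmer = PorterStemmer()


def clean_review(text, stop_words=STOP_WORDS):
    """Clean one review: letters only, lowercase, no stopwords, stemmed."""
    letters_only = re.sub(r"[^a-zA-Z]", " ", text)  # replace non-letters with spaces
    tokens = letters_only.lower().split()            # lowercase, split on whitespace
    stems = [stemmer.stem(tok) for tok in tokens if tok not in stop_words]
    return " ".join(stems)


reviews = ["Wow... Loved this place!!!", "The food was cold."]
corpus = [clean_review(review) for review in reviews]
print(corpus)
```

The resulting `corpus` is a list of space-joined stems, which is the form `CountVectorizer` expects as input.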
-------------------
| SCORES |
-------------------
Accuracy Score : 83.5%
Precision Score : 78.6%
Recall Score : 80.1%
Accuracy Score for alpha=0.1 is : 83.25%
Accuracy Score for alpha=0.2 is : 83.5%
...
The Best Accuracy is 83.5%
----------------------------------------------------------------------------
| This Model Predicts Whether Restaurant Review is Positive or Negative |
----------------------------------------------------------------------------
Enter Your Review About Restaurant : The food was delicious and service was great!
=> This is Positive Review.
# Example prediction
Enter Your Review About Restaurant : The place was very dirty and food was cold.
=> This is Negative Review.

- The model is trained using Multinomial Naive Bayes, which is efficient for text classification tasks.
- Adjusting the `alpha` parameter controls the strength of the model's additive smoothing.
- You can experiment with other classifiers like SVM or Logistic Regression, which may improve accuracy.
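
To make the effect of `alpha` concrete, here is a hedged sketch of a sweep over smoothing values. The toy reviews, the grid of 0.1 to 1.0, the 75/25 split, and `random_state=0` are all assumptions chosen for this example, so the printed accuracies say nothing about results on the real dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Made-up toy reviews (1 = positive, 0 = negative), not rows from the dataset.
reviews = [
    "food was delicious", "service was great", "loved the place",
    "amazing taste and friendly staff", "great food great service",
    "best meal ever", "food was cold", "place was dirty",
    "terrible service", "rude staff and bad food",
    "worst meal ever", "bland taste and slow service",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

train_text, test_text, y_train, y_test = train_test_split(
    reviews, labels, test_size=0.25, random_state=0, stratify=labels
)

# Learn the vocabulary from the training split only, then reuse it for the test split.
vectorizer = CountVectorizer(max_features=1500)
X_train = vectorizer.fit_transform(train_text)
X_test = vectorizer.transform(test_text)

best_alpha, best_accuracy = None, -1.0
for step in range(1, 11):
    alpha = step / 10  # 0.1, 0.2, ..., 1.0
    model = MultinomialNB(alpha=alpha)
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    print(f"Accuracy Score for alpha={alpha} is : {accuracy * 100:.2f}%")
    if accuracy > best_accuracy:
        best_alpha, best_accuracy = alpha, accuracy

print(f"The Best Accuracy is {best_accuracy * 100:.2f}% (alpha={best_alpha})")
```

Swapping `MultinomialNB` for another scikit-learn classifier, such as `LogisticRegression`, only requires changing the line that constructs `model`, which makes this loop a convenient place to compare alternatives.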
Mrunali Badgujar
mrucoder@gmail.com
GitHub Profile
This project is licensed under the MIT License; see the LICENSE file for details.
"Let the code decide if your meal was worth it!"