Polarity classification for product and service reviews written in Darija.
The project aims to develop a sentiment analysis system for product and service reviews written in Darija, the Moroccan Arabic dialect, in both Arabic script and Arabizi (Latin-script) writing styles. The goal is to give businesses operating in Morocco insight into customer sentiment and opinions.
The project involves the following key steps:
- Conducting a survey of existing techniques and datasets for sentiment analysis, with a specific focus on resources available for Darija.
- Data Collection: Gathering a dataset of Darija product and service reviews from relevant YouTube videos and Facebook groups.
- Data Preprocessing: Cleaning and preparing the collected data for analysis, including text normalization and noise removal. This step includes processing specific to each writing style.
- Exploring various deep learning (DL) techniques for polarity classification.
- Fine-tuning DarijaBERT-mix, a pre-trained transformer-based model specifically designed for Darija.
- Application Development: Building a demo Streamlit web application.
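
The data preprocessing step could look roughly like the sketch below. It is only an illustration, not the project's actual pipeline: the function names (`detect_style`, `clean_common`, `normalize_arabic`, `normalize_arabizi`, `preprocess`) are hypothetical, and the specific rules (removing URLs, mentions and hashtags, collapsing characters repeated three or more times down to two, stripping Arabic diacritics and tatweel, unifying alef variants, lowercasing Arabizi) are assumptions about what "noise removal" and "writing style specific processing" might involve.

```python
import re

# All rules below are illustrative assumptions, not the project's real pipeline.
_URL = re.compile(r"https?://\S+|www\.\S+")
_MENTION = re.compile(r"[@#]\w+")              # @mentions and #hashtags
_REPEATS = re.compile(r"(.)\1{2,}")            # same character 3+ times in a row
_WHITESPACE = re.compile(r"\s+")
_ARABIC_CHAR = re.compile(r"[\u0600-\u06FF]")  # Arabic Unicode block
_LATIN_CHAR = re.compile(r"[A-Za-z]")
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")  # tashkeel marks
_TATWEEL = "\u0640"                            # elongation character
_ALEF_VARIANTS = re.compile("[أإآ]")


def detect_style(text: str) -> str:
    """Guess the writing style by counting Arabic vs. Latin letters."""
    arabic = len(_ARABIC_CHAR.findall(text))
    latin = len(_LATIN_CHAR.findall(text))
    return "arabic" if arabic >= latin else "arabizi"


def clean_common(text: str) -> str:
    """Noise removal applied to both writing styles."""
    text = _URL.sub(" ", text)
    text = _MENTION.sub(" ", text)
    text = _REPEATS.sub(r"\1\1", text)         # "zwiiiin" -> "zwiin"
    return _WHITESPACE.sub(" ", text).strip()


def normalize_arabic(text: str) -> str:
    """Arabic-script rules: drop diacritics and tatweel, unify alef forms."""
    text = _DIACRITICS.sub("", text).replace(_TATWEEL, "")
    return _ALEF_VARIANTS.sub("ا", text)


def normalize_arabizi(text: str) -> str:
    """Arabizi rule: lowercase only, so digits used as letters (3, 7, 9) survive."""
    return text.lower()


def preprocess(text: str) -> str:
    """Clean a review, then apply the normalization for its writing style."""
    text = clean_common(text)
    if detect_style(text) == "arabic":
        return normalize_arabic(text)
    return normalize_arabizi(text)
```

Calling `preprocess` on each collected review would yield text ready for tokenization. Style detection here works per review, so a review that mixes scripts is handled according to whichever script dominates; a token-level split might suit heavily mixed text better.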