GitHub - CarloArpini/2025-TextMining-Search-project: University project for the Text Mining & Search course of the MD in Data Science.

2025 project for the Text Mining & Search course of the Master's Degree in Data Science.

The main aim of the project is the multi-class classification of about 50,000 tweets from https://ieeexplore.ieee.org/document/9378065, based on the type of hate speech present (or absent) in each tweet. We used three techniques (Bag of Words, GloVe and sentence-BERT) to vectorize the tweets or create embeddings for them, and then classified the tweets using MLP, Decision Trees and XGBoost.
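As an illustration of the Bag of Words step, here is a minimal sketch in plain Python. It is not the code from BoW.ipynb: the function names, the whitespace tokenisation and the toy tweets are assumptions made for the example, and the notebooks may well rely on a library vectorizer instead.

```python
from collections import Counter

def build_vocabulary(tweets, max_features=None):
    """Map each token to a column index, most frequent tokens first."""
    counts = Counter(token for tweet in tweets for token in tweet.lower().split())
    # Sort by descending frequency, then alphabetically, so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    if max_features is not None:
        ranked = ranked[:max_features]
    return {token: index for index, (token, _) in enumerate(ranked)}

def bag_of_words(tweet, vocabulary):
    """Return a count vector for one tweet; out-of-vocabulary tokens are ignored."""
    vector = [0] * len(vocabulary)
    for token in tweet.lower().split():
        index = vocabulary.get(token)
        if index is not None:
            vector[index] += 1
    return vector

# Toy, made-up examples (not taken from the dataset).
tweets = ["you are so kind", "you are so so rude", "kind words matter"]
vocabulary = build_vocabulary(tweets)
vectors = [bag_of_words(tweet, vocabulary) for tweet in tweets]
```

Each tweet becomes a fixed-length count vector that a classifier such as an MLP, a decision tree or XGBoost can take as input.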

The secondary aim of the project is to investigate whether the classes defined for the classification are actually representative of how hate speech can be characterised. To do this, we applied clustering techniques to each vector-based representation of the tweets and then examined each cluster with a topic modelling approach.
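The cluster-then-inspect idea can be sketched as follows. This README does not say which clustering algorithm or topic model the notebooks use, so k-means and a simple per-cluster term count are assumptions chosen for illustration; the toy tweets and 2-D vectors are made up as well.

```python
from collections import Counter

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=20):
    """Plain k-means; the first k vectors seed the centroids (a simplification)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def top_terms(tweets, labels, cluster, n=2):
    """Most frequent tokens in one cluster, a crude stand-in for topic modelling."""
    counts = Counter(
        token
        for tweet, label in zip(tweets, labels)
        if label == cluster
        for token in tweet.split()
    )
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [token for token, _ in ranked[:n]]

# Toy 2-D "embeddings"; real vectors would come from BoW, GloVe or sentence-BERT.
tweets = ["school bully again", "nice weather today", "bully at school", "weather is nice"]
vectors = [[1.0, 0.1], [0.0, 0.9], [0.9, 0.0], [0.1, 1.0]]

labels = kmeans(vectors, k=2)
terms_cluster_0 = top_terms(tweets, labels, cluster=0)
terms_cluster_1 = top_terms(tweets, labels, cluster=1)
```

Comparing the dominant terms of each cluster with the dataset's class labels gives a rough sense of whether the clusters and the predefined classes line up.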

Note that executing the notebooks requires loading the GloVe embeddings. The file we used can be downloaded from https://huggingface.co/stanfordnlp/glove/resolve/main/glove.twitter.27B.zip, taken from https://github.com/stanfordnlp/GloVe?tab=readme-ov-file
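Once the archive is unzipped, the embeddings are plain text files with one token per line followed by its vector components. The sketch below shows one possible way to load such a file and average the word vectors of a tweet. The function names, the averaging strategy and the file name in the final comment are assumptions, not taken from GloVo.ipynb.

```python
def load_glove(lines):
    """Parse GloVe's text format: a token followed by its space-separated components."""
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip empty or malformed lines
        embeddings[parts[0]] = [float(x) for x in parts[1:]]
    return embeddings

def tweet_embedding(tweet, embeddings, dim):
    """Average the vectors of known tokens; return zeros if no token is known."""
    known = [embeddings[t] for t in tweet.lower().split() if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]

# Tiny in-memory stand-in for the real file, with made-up 3-dimensional vectors.
sample_lines = ["hello 1.0 2.0 3.0", "world 3.0 2.0 1.0"]
embeddings = load_glove(sample_lines)
vector = tweet_embedding("Hello world !", embeddings, dim=3)

# With the real download, something like this (file name assumed):
# with open("glove.twitter.27B.100d.txt", encoding="utf-8") as f:
#     embeddings = load_glove(f)
```

Because `load_glove` accepts any iterable of lines, the same function works on an open file handle and on a small list used for testing.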

Folders and files (14 commits):
images
.DS_Store
BoW.ipynb
BoW_novocab_size_wordfreq.png
BoW_wordfreq_nostopwords.png
GloVo.ipynb
README.md
TM project .pdf
cleantweets.csv
cyberbullying_tweets.csv
preprocessing.ipynb
readme.txt
sBERT.ipynb
textmining_report.pdf
words.txt
