Tweepic is a project for clustering live tweets. Our goal is to collect tweets, determine which ones discuss the same topic, and group them accordingly. The name 'Tweepic' is a portmanteau of 'tweet' and 'topic', reflecting the project's core objective.
Unlike methods that group tweets solely by hashtag, Tweepic also considers the proximity of sentences and the similarity of words through their embeddings. The process begins by defining a proximity measure over sentences, words, and hashtags. Using this measure, we build a graph in which each vertex is a tweet and each edge links a tweet to one of its k nearest tweets, weighted by their distance. A classifier then decides which edges to cut because the tweets they connect differ too much. In the resulting graph, each connected component is a cluster of tweets discussing the same topic.
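The pipeline described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the project's actual code: the toy vectors stand in for real tweet embeddings, cosine distance stands in for the combined proximity measure, and a fixed distance threshold (`keep_edge`) stands in for the trained edge classifier. All function names are hypothetical.

```python
import math

def cosine_distance(u, v):
    """Assumed proximity measure: 1 - cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def knn_edges(vectors, k):
    """Link every tweet to its k nearest tweets; edge weight = distance."""
    edges = {}
    for i, u in enumerate(vectors):
        dists = sorted(
            (cosine_distance(u, v), j) for j, v in enumerate(vectors) if j != i
        )
        for d, j in dists[:k]:
            edges[(min(i, j), max(i, j))] = d  # undirected edge
    return edges

def keep_edge(distance, threshold):
    """Placeholder for the edge classifier: keep only short edges."""
    return distance <= threshold

def connected_components(n, edges):
    """Union-find over the surviving edges; returns sorted lists of tweet indices."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

def cluster(vectors, k=2, threshold=0.3):
    edges = knn_edges(vectors, k)
    kept = [e for e, d in edges.items() if keep_edge(d, threshold)]
    return connected_components(len(vectors), kept)

# Toy embeddings (made up): two tweets per topic plus one outlier.
toy_vectors = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [0.0, 1.0, 0.1],
    [0.1, 0.9, 0.0],
    [0.0, 0.0, 1.0],
]
clusters = cluster(toy_vectors, k=2, threshold=0.3)
print(clusters)  # [[0, 1], [2, 3], [4]]
```

Because every tweet is linked to its k nearest tweets regardless of how far away they are, the k-NN graph alone would merge unrelated tweets into one component. The edge-cutting step is what separates topics; here the outlier tweet ends up in its own cluster once its long edges are removed.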
- Andrea Sanchietti
- Francesco Palandra
- Pipeline
- Models
- Papers
- Finetune