Abstract Social tagging of movies reveals a wide range of heterogeneous information about
movies, like the genre, plot structure, soundtracks, metadata, visual and emotional experiences.
Such information can be valuable in building automatic systems to create tags for movies.
Automatic tagging systems can help recommendation engines to improve the retrieval of similar
movies as well as help viewers to know what to expect from a movie in advance. In this paper,
we set out to the task of collecting a corpus of movie plot synopses and tags. We describe a
methodology that enabled us to build a fine-grained set of around 70 tags exposing heterogeneous
characteristics of movie plots and the multi-label associations of these tags with some 14K movie
plot synopses. We investigate how these tags correlate with movies and the flow of emotions
throughout different types of movies. Finally, we use this corpus to explore the feasibility of
inferring tags from plot synopses. We expect the corpus will be useful in other tasks where
analysis of narratives is relevant.
Please find the paper here: https://www.aclweb.org/anthology/L18-1274
This dataset was published at LREC 2018 in Miyazaki, Japan.
Keywords Tag generation for movies, Movie plot analysis, Multi-label dataset, Narrative texts
More information is available here: http://ritual.uh.edu/mpst-2018/
Please use the following BibTeX entry to cite the work.
@InProceedings{KAR18.332,
  author = {Sudipta Kar and Suraj Maharjan and A. Pastor López-Monroy and Thamar Solorio},
  title = {{MPST}: A Corpus of Movie Plot Synopses with Tags},
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
  year = {2018},
  month = {May},
  date = {7-12},
  location = {Miyazaki, Japan},
  editor = {Nicoletta Calzolari (Conference chair) and Khalid Choukri and Christopher Cieri and Thierry Declerck and Sara Goggi and Koiti Hasida and Hitoshi Isahara and Bente Maegaard and Joseph Mariani and Hélène Mazo and Asuncion Moreno and Jan Odijk and Stelios Piperidis and Takenobu Tokunaga},
  publisher = {European Language Resources Association (ELRA)},
  address = {Paris, France},
  isbn = {979-10-95546-00-9},
  language = {english}
}
We would like to thank the National Science Foundation for partially funding this work under
award 1462141. We are also grateful to Prasha Shrestha, Giovanni Molina, Deepthi Mave, and
Gustavo Aguilar for reviewing and providing valuable feedback during the process of creating
tag clusters.
Predict tags for a movie from its synopsis
https://www.kaggle.com/cryptexcode/mpst-movie-plot-synopses-with-tags/activity
The dataset consists of the movie ID, movie title, synopsis, tags, and source.
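As a minimal sketch of how one such record might be read, the snippet below parses a tiny in-memory CSV with Python's standard library. The column names (movie_id, title, synopsis, tags, source), the sample row, and the assumption that tags are stored as one comma-separated string are all illustrative; they are not taken from the actual dataset files.

```python
import csv
import io

# Hypothetical sample: column names and values are illustrative assumptions,
# not copied from the actual dataset files.
SAMPLE_CSV = '''movie_id,title,synopsis,tags,source
m001,Example Movie,"A detective hunts a killer, then doubts himself.","murder, suspenseful",example_source
'''


def read_records(text):
    """Parse CSV text into dicts, splitting the tags field into a list."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        # Assumes tags are stored as one comma-separated string per movie.
        row["tags"] = [t.strip() for t in row["tags"].split(",") if t.strip()]
        records.append(row)
    return records


records = read_records(SAMPLE_CSV)
print(records[0]["title"], records[0]["tags"])
```

Splitting the tags early keeps every later step working with a list of labels per movie rather than a raw string.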
This is a multi-label classification problem. Multi-label classification assigns to each
sample a set of target labels. This can be thought of as predicting properties of a data point
that are not mutually exclusive, such as the topics that are relevant to a document.
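To make the idea of non-exclusive labels concrete, the sketch below encodes each movie's tag set as a row of 0/1 indicators, one column per tag. This is a common target representation for multi-label classifiers, not necessarily the one used by the dataset authors; the function name binarize and the three example tag sets are hypothetical.

```python
def binarize(tag_sets):
    """Turn a list of tag sets into a sorted label list and a 0/1 indicator matrix."""
    # Collect every distinct tag, sorted so the column order is deterministic.
    labels = sorted(set().union(*tag_sets))
    # One row per movie, one column per label: 1 if the movie carries that tag.
    matrix = [[1 if label in tags else 0 for label in labels] for tags in tag_sets]
    return labels, matrix


# Hypothetical tag sets for three movies (illustrative values only).
movie_tags = [
    {"murder", "suspenseful"},
    {"romantic"},
    {"murder", "romantic", "comedy"},
]

labels, y = binarize(movie_tags)
print(labels)
for row in y:
    print(row)
```

A row may contain any number of ones, which is what distinguishes this setting from ordinary multi-class classification, where each sample receives exactly one label.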