The aim is to use the multinomial Naive Bayes algorithm to build a spam filter for SMS messages.
To train the algorithm, we'll use a dataset of 5,572 SMS messages that have been manually classified.
The dataset was put together by Tiago A. Almeida and José María Gómez Hidalgo, and it can be downloaded from the UCI Machine Learning Repository.
The data collection process is described in more detail on this page, where you can also find some of the papers written by the same authors.
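
To make the goal concrete before working with the real data, here is a minimal sketch of how a multinomial Naive Bayes classifier scores a message. It is an illustration only: the four training messages are made up (they are not taken from the dataset), and the names `training`, `ALPHA`, `tokenize`, `log_score` and `classify` are hypothetical helpers chosen for this example. It assumes whitespace tokenization and Laplace (add-one) smoothing, and it skips words that never appeared in training.

```python
import math
from collections import Counter

# Made-up toy messages for illustration (NOT from the real dataset).
training = [
    ("spam", "win a free prize now"),
    ("spam", "free cash prize claim now"),
    ("ham", "are we still meeting for lunch"),
    ("ham", "call me when you are free"),
]

ALPHA = 1  # Laplace smoothing constant (an assumption for this sketch)


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


# Count messages per label and word occurrences per label.
label_counts = Counter()
word_counts = {"spam": Counter(), "ham": Counter()}
for label, text in training:
    label_counts[label] += 1
    word_counts[label].update(tokenize(text))

vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])


def log_score(label, text):
    """Return log P(label) plus the sum of log P(word | label)."""
    total_words = sum(word_counts[label].values())
    score = math.log(label_counts[label] / sum(label_counts.values()))
    for word in tokenize(text):
        if word not in vocabulary:
            continue  # ignore words never seen during training
        numerator = word_counts[label][word] + ALPHA
        denominator = total_words + ALPHA * len(vocabulary)
        score += math.log(numerator / denominator)
    return score


def classify(text):
    """Pick the label with the higher log score."""
    return max(("spam", "ham"), key=lambda label: log_score(label, text))


print(classify("free prize now"))
print(classify("meeting for lunch"))
```

Working with log probabilities instead of multiplying raw probabilities avoids numerical underflow on longer messages; the ranking of the two labels is unchanged.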