Questions about dataset LastFM #5

Open
knowbyyou opened this issue Jul 12, 2019 · 4 comments

Comments

@knowbyyou

Following the link you provided in the paper, I found the LastFM dataset. However, its statistics differ from those of the dataset you provide here. For example, the dataset here has 23565 users and 48122 items, whereas the original dataset has 1892 users and 17632 artists. Why is that? I would also like to ask whether the item ID here is the same as the artist ID in the original dataset. Am I right in understanding that you simply sorted each user's history into one line? Thank you for your answer.

@xiangwang1223
Owner

Hi,
Thanks for your interest. Sorry for the late reply after a busy week.

  1. Please see LFM-1b for the original version of the Last-FM dataset, which has around 120,000 users. For more details, please refer to the LFM-1b paper.

  2. In the original dataset, each entry in the listening list records the artist, album, and track, as well as a timestamp. As such, we can sort each user's history by time.

Thanks.
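
To illustrate point 2, here is a minimal sketch of how timestamped listening events could be grouped into one time-sorted line per user. It is not the authors' preprocessing script: the `(user, item, timestamp)` tuple layout, the function names `build_user_histories` and `format_line`, and the space-separated output format are all assumptions made for illustration.

```python
from collections import defaultdict


def build_user_histories(events):
    """Group (user, item, timestamp) events into per-user item lists.

    The tuple layout is a hypothetical input format, not the LFM-1b schema.
    Items in each list are ordered from earliest to latest timestamp.
    """
    per_user = defaultdict(list)
    for user, item, timestamp in events:
        per_user[user].append((timestamp, item))

    histories = {}
    for user, pairs in per_user.items():
        # Sort by timestamp only; Python's sort is stable, so events with
        # equal timestamps keep their input order.
        pairs.sort(key=lambda pair: pair[0])
        histories[user] = [item for _, item in pairs]
    return histories


def format_line(user, items):
    """Render one user's history as 'user item1 item2 ...' (assumed format)."""
    return " ".join(str(x) for x in [user, *items])
```

For example, the events `[(0, 10, 300), (1, 7, 50), (0, 11, 100), (0, 12, 200)]` would give user 0 the history `[11, 12, 10]`, since item 11 has the earliest timestamp.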

@649435349

649435349 commented Jul 21, 2019

Hi,
Regarding answer 2, does it mean that each user's item list is sorted by time? In other words, was a later item in the list listened to after an earlier one? And the test set is later than the training set, right? Are the other datasets processed in the same way?
Looking forward to your reply!

@xiangwang1223
Owner

Sorry for the late reply after a busy week. Yes, all datasets are processed chronologically.
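
For readers who want to reproduce this kind of split, a chronological train/test split of one user's time-sorted history might look like the sketch below. The function name `chronological_split` and the 80/20 ratio are assumptions for illustration only; the thread does not state the ratio the authors used.

```python
def chronological_split(items, train_ratio=0.8):
    """Split a time-sorted item list into (train, test).

    `items` must already be ordered from earliest to latest interaction.
    `train_ratio` is a hypothetical default, not a value from this thread.
    Every test item is later in the list than every train item.
    """
    cut = int(len(items) * train_ratio)
    return items[:cut], items[cut:]


# A history of 10 interactions, oldest first.
history = [101, 102, 103, 104, 105, 106, 107, 108, 109, 110]
train, test = chronological_split(history)
```

Because the cut is made by position in the sorted list rather than by random sampling, no test interaction precedes a training interaction for the same user.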

@mzamjadi

Hi Xiang,
Thanks for pointing to the LastFM data and the paper. According to the LastFM paper, the data ranges from January 2013 to August 2014. However, your paper mentions that a subset of the data from Jan 2015 to June 2015 was used. Could you please clarify this discrepancy?
