
Some questions in your evaluation code and data #2

Open
liuhaifeng0212 opened this issue Sep 9, 2019 · 7 comments

Comments

@liuhaifeng0212

In your paper: "To evaluate the results more efficiently, we randomly sample 999 items which have no interaction with the target user and rank the validation and test items with respect to these 999 items."
But I can't find the validation dataset. Also, MovieLens has only 943 users and 1682 items. I counted the interactions in your MovieLens training data and found that user 466 has 684 items and user 405 has 732 items in the training set. So how could you evaluate with 1000 items (999 negatives plus the test item) in the test set?

@ghost

ghost commented Sep 9, 2019

Thank you for your interest. In the paper, I think I wrote that for MovieLens we perform the ranking over all items; sampling 999 negatives is only for the KKBOX dataset. Also, the README file contains the link to the test file for KKBOX.

@ghost

ghost commented Sep 9, 2019

I read the paper again and there is indeed some confusion; I'm sorry about that. The situation is that MovieLens has a smaller number of items, so ranking over all items is affordable. The sampling strategy is only for KKBOX.
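For reference, here is a minimal sketch (not code from this repository) of the two evaluation protocols described above: full ranking over all unobserved items for MovieLens, and ranking the test item against 999 sampled negatives for KKBOX. The `score(user, items)` callable is a hypothetical stand-in for the trained model's predictor.

```python
import numpy as np

def hr_ndcg_at_k(rank, k=10):
    """Hit ratio and NDCG for one test item, given its 0-based rank."""
    if rank < k:
        return 1.0, 1.0 / np.log2(rank + 2)
    return 0.0, 0.0

def evaluate_full(score, user, test_item, n_items, train_items, k=10):
    """MovieLens-style: rank the test item against every item the user has not interacted with."""
    candidates = np.setdiff1d(np.arange(n_items), np.fromiter(train_items, dtype=int))
    scores = score(user, candidates)
    target = scores[np.where(candidates == test_item)[0][0]]
    rank = int(np.sum(scores > target))  # number of candidates scored above the test item
    return hr_ndcg_at_k(rank, k)

def evaluate_sampled(score, user, test_item, n_items, train_items, n_neg=999, k=10, seed=0):
    """KKBOX-style: rank the test item against 999 sampled unobserved negatives."""
    rng = np.random.default_rng(seed)
    negatives = []
    while len(negatives) < n_neg:
        j = int(rng.integers(n_items))
        if j != test_item and j not in train_items:
            negatives.append(j)
    candidates = np.array(negatives + [test_item])
    scores = score(user, candidates)
    rank = int(np.sum(scores > scores[-1]))
    return hr_ndcg_at_k(rank, k)
```

Note that the full-ranking variant also answers the question above: the candidate set is "all items the user has not interacted with", so its size varies per user rather than being fixed at 1000.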

@liuhaifeng0212
Author

When I run your RCF.py code with the default settings, it prints "the total loss in 1 th iteration is: nan, the attentions are nan, nan, nan, nan". No matter how I change the parameter settings, I always get the same error. Do you have any suggestions?

@ghost

ghost commented Sep 27, 2019

Sorry for the late reply. Some friends of mine are also doing work based on this code, and according to them they did not encounter the NaN problem. If you always get NaN, I suspect something is wrong with the activation or the softmax function.
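As a minimal sketch (assumed, not the repository's code) of the two usual fixes for a loss that becomes NaN in the first iteration: subtract the row-wise maximum from the attention logits before the softmax, and clip the probability before taking its log in the pairwise log loss. Function and variable names here are illustrative only.

```python
import tensorflow as tf

def stable_softmax(logits, axis=-1):
    # Subtracting the max keeps exp() from overflowing to inf,
    # which would otherwise propagate NaN through the attention weights.
    logits = logits - tf.reduce_max(logits, axis=axis, keepdims=True)
    return tf.nn.softmax(logits, axis=axis)

def clipped_pairwise_log_loss(pos_scores, neg_scores, eps=1e-8):
    # BPR-style log loss; clipping keeps log() away from log(0) = -inf.
    prob = tf.sigmoid(pos_scores - neg_scores)
    return -tf.reduce_mean(tf.math.log(tf.clip_by_value(prob, eps, 1.0)))
```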

@zxm97

zxm97 commented Oct 18, 2019


I'm sorry, but I couldn't figure out how to run the RCF model (on MovieLens) without masking. I don't understand the meaning of mode == 'add' and mode == 'mul'. Could you please tell me how to get a satisfying performance?

@ghost

ghost commented Oct 19, 2019 via email

@zxm97

zxm97 commented Oct 20, 2019

> Hi, the mask is necessary because of the batch setting, so that we can feed the data in one feed_dict with a fixed length. For example, one I_u^t contains items {1, 2, 3, 4} while another contains {5, 6}; to feed them in one batch we add a mask, and the latter becomes {5, 6, mask, mask}. The mode denotes how we treat the masked positions: "add" means they are treated as -inf (before the softmax), while "mul" means they are treated as zeros.
>
> Best,
> Xin Xin
Thanks for your reply. I downloaded the code again and made a few changes so that it works in Python 3, and I added tf.clip_by_value() to every log-loss computation to prevent inf/nan. But I can't get satisfactory results on MovieLens: the attentions are about 0.31, 0.27, 0.25, 0.17, far from the reported 0.1397, 0.3191, 0.2552, 0.2859, and the performance is even worse than MF. I don't know what the problem is.
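For illustration, a minimal sketch (assumed, not the repository's exact implementation) of the two mask modes described in the quoted reply, for padded batches such as {1, 2, 3, 4} vs. {5, 6, pad, pad}. Here `mask` is 1.0 for real items and 0.0 for padding; the renormalization in the 'mul' branch is one possible interpretation of "treated as zeros".

```python
import tensorflow as tf

def masked_attention(logits, mask, mode='add'):
    if mode == 'add':
        # Padded positions get a large negative logit, so the softmax
        # drives their weight to (near) zero: the "-inf before softmax" treatment.
        logits = logits + (1.0 - mask) * (-1e9)
        return tf.nn.softmax(logits, axis=-1)
    elif mode == 'mul':
        # Softmax first, then zero out padded positions and renormalize.
        weights = tf.nn.softmax(logits, axis=-1) * mask
        return weights / (tf.reduce_sum(weights, axis=-1, keepdims=True) + 1e-8)
    raise ValueError("mode must be 'add' or 'mul'")
```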
