adding again lightfm #671
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #671       +/-   ##
===========================================
- Coverage   91.03%   40.59%   -50.44%
===========================================
  Files         132      133        +1
  Lines        9078     9134       +56
===========================================
- Hits         8264     3708     -4556
- Misses        814     5426     +4612

☔ View full report in Codecov by Sentry.
Great start! Need to get the tests working, and I left some other comments for removing unnecessary bits and improving the logic.
from pytest import approx, mark
import sys
import os
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), "..")))
Please remove — we do not need to do path manipulations for imports. If there is an import problem, we should fix it at the source or in how we set up and run things.
from lenskit.data import Dataset, ItemList, from_interactions_df
from lenskit.metrics import quick_measure_model
from lenskit.testing import BasicComponentTests, ScorerTests
from lenskit_lightfm import LightFMScorer
Changing this to `from lenskit.lightfm` should fix the immediate test failure problem.
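For reference, a one-line sketch of the fixed import, assuming the scorer is exposed from the `lenskit.lightfm` module as the comment above suggests:

```python
from lenskit.lightfm import LightFMScorer
```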
users_: Vocabulary
items_: Vocabulary
user_features_: np.ndarray[tuple[int, int], np.dtype[np.float64]]
item_features_: np.ndarray[tuple[int, int], np.dtype[np.float64]]
model_: LightFM
Let's remove the `_` suffixes on these — with the new design, I have not been using that suffix in new components.
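A minimal sketch of the same declarations with the suffix dropped. The imports are shown only for context, and the assumption that `Vocabulary` comes from `lenskit.data` is mine; the feature arrays are kept for parity with the diff, although a later comment suggests removing them.

```python
import numpy as np
from lightfm import LightFM
from lenskit.data import Vocabulary  # assumed import location

# Same attributes as above, without the trailing-underscore suffix
users: Vocabulary
items: Vocabulary
user_features: np.ndarray[tuple[int, int], np.dtype[np.float64]]
item_features: np.ndarray[tuple[int, int], np.dtype[np.float64]]
model: LightFM
```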
if hasattr(self, "model_") and not options.retrain:
    return

interaction_matrix = data.interaction_matrix(format="csr", field="rating")
This will only work for explicit-feedback data, but the defaults are set up for WARP loss. If LightFM supports an explicit-feedback mode, we should only use the rating column when that mode is in use. For the common implicit-feedback case, it would be better to supply no field, so that we get a binary indicator matrix.
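A sketch of the conditional this suggests; `explicit` stands in for however the component decides it is in an explicit-feedback mode, and is an assumed name:

```python
# Sketch: pick the matrix based on the feedback mode (assumed `explicit` flag).
if explicit:
    # explicit-feedback mode: use the rating values
    matrix = data.interaction_matrix(format="csr", field="rating")
else:
    # common implicit-feedback case: no field, so a binary indicator matrix
    matrix = data.interaction_matrix(format="csr")
```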
user_features_: np.ndarray[tuple[int, int], np.dtype[np.float64]]
item_features_: np.ndarray[tuple[int, int], np.dtype[np.float64]]
Since we are using the model to do predictions, we don't need to save these arrays separately, so let's remove them.
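To illustrate why the arrays are redundant, here is a standalone sketch (with made-up toy data) showing that a trained LightFM model scores user/item pairs directly through `predict`:

```python
import numpy as np
from scipy.sparse import coo_matrix
from lightfm import LightFM

# Toy interaction matrix: 20 users x 10 items, binary implicit feedback.
rng = np.random.default_rng(42)
interactions = coo_matrix(rng.integers(0, 2, size=(20, 10)).astype(np.float32))

model = LightFM(loss="warp")
model.fit(interactions, epochs=5)

# Scoring goes straight through the model: one user index, many item indices.
user_num = 3
item_nums = np.arange(10)
scores = model.predict(user_num, item_nums)
```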
if user_num is None:
    _logger.debug("Unknown user %s", query.user_id)
    return ItemList(items, scores=np.nan)
Does LightFM have the ability to predict for a user based on their history instead of their identifier? If so, we should use that functionality.
def test_lightfm_batch_accuracy(ml_100k: pd.DataFrame):
    data = from_interactions_df(ml_100k)
    lightfm_algo = LightFMScorer()
    results = quick_measure_model(lightfm_algo, data, predicts_ratings=True)
The default setup is WARP loss, which is for implicit feedback. So we should remove `predicts_ratings` and change to top-N metrics (RBP or NDCG).
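A rough sketch of the adjusted test, under the assumption (mine, not from this thread) that the `quick_measure_model` result exposes a `list_summary()` frame indexed by metric name:

```python
def test_lightfm_batch_accuracy(ml_100k: pd.DataFrame):
    data = from_interactions_df(ml_100k)
    lightfm_algo = LightFMScorer()
    # Implicit-feedback evaluation: drop predicts_ratings and judge the
    # model with top-N ranking metrics instead.
    results = quick_measure_model(lightfm_algo, data)
    # Assumed accessor and row names; adjust to the real result API.
    summary = results.list_summary()
    assert summary.loc["NDCG", "mean"] > 0
```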