A highly customisable movie recommender that suggests movies based on your favourite genres and tastes.

This project builds a movie recommendation system using the MovieLens dataset. It uses matrix factorization in PyTorch to learn latent embeddings for users and movies, then applies KMeans clustering to the learned movie embeddings to group similar movies. Finally, a user's preferences are mapped to clusters, and the system suggests movies that are popular within those clusters.
The training script, movie_recommender_training.py, handles the training stage:
- Loads the MovieLens dataset.
- Implements a Matrix Factorization model in PyTorch (user embeddings × movie embeddings).
- Trains the model with Mean Squared Error loss and Adam optimizer.
- Extracts movie embeddings after training.
- Runs KMeans clustering on embeddings to group similar movies.
- Shows top-rated movies in each cluster (helps interpret clusters).
- Saves the cluster labels to movie_clusters.npy for later use.
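
The training steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the class name `MatrixFactorization`, the helper `train_and_cluster`, the hyperparameters, and the random synthetic ratings standing in for MovieLens are all assumptions made for the example.

```python
import numpy as np
import torch
from torch import nn
from sklearn.cluster import KMeans


class MatrixFactorization(nn.Module):
    """Predicts a rating as the dot product of a user and a movie embedding."""

    def __init__(self, n_users, n_movies, dim=8):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.movie_emb = nn.Embedding(n_movies, dim)

    def forward(self, users, movies):
        return (self.user_emb(users) * self.movie_emb(movies)).sum(dim=1)


def train_and_cluster(users, movies, ratings, n_users, n_movies,
                      dim=8, epochs=100, lr=0.05, n_clusters=3, seed=0):
    """Hypothetical helper: train with MSE + Adam, then cluster movie embeddings."""
    torch.manual_seed(seed)
    model = MatrixFactorization(n_users, n_movies, dim)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    losses = []
    for _ in range(epochs):  # full-batch training, fine for a toy dataset
        optimizer.zero_grad()
        loss = loss_fn(model(users, movies), ratings)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    # Extract the learned movie embeddings and group them with KMeans.
    embeddings = model.movie_emb.weight.detach().numpy()
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(embeddings)
    return labels, losses


# Random synthetic ratings standing in for MovieLens (illustration only).
rng = np.random.default_rng(0)
n_users, n_movies, n_ratings = 20, 30, 300
users = torch.tensor(rng.integers(0, n_users, n_ratings))
movies = torch.tensor(rng.integers(0, n_movies, n_ratings))
ratings = torch.tensor(rng.integers(1, 6, n_ratings), dtype=torch.float32)

labels, losses = train_and_cluster(users, movies, ratings, n_users, n_movies)
# The cluster labels could then be persisted for the inference stage:
# np.save("movie_clusters.npy", labels)
```

With real data, the user and movie IDs would first need to be remapped to contiguous indices starting at 0, since `nn.Embedding` indexes rows by position.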
The recommendation script, recommend_movies.py, handles the recommendation (inference) stage:
- Loads the MovieLens dataset.
- Loads pre-trained cluster labels from movie_clusters.npy.
- Maps user-selected movies to their cluster(s).
- Identifies clusters of interest and prints recommendations from them.
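
The mapping from selected movies to clusters and recommendations might look like the sketch below. The function name `recommend`, the toy labels, and the choice to rank by number of ratings as a proxy for popularity are assumptions for illustration; in the real script the labels would come from something like `np.load("movie_clusters.npy")`.

```python
from collections import Counter


def recommend(selected, labels, rating_counts, top_n=3):
    """Suggest popular movies from the clusters of the selected movies.

    selected:      indices of movies the user likes
    labels:        labels[i] is the cluster of movie i
    rating_counts: rating_counts[i] is how many ratings movie i received
                   (used here as an assumed measure of popularity)
    """
    # Clusters of interest, ordered by how many selected movies fall in each.
    cluster_votes = Counter(labels[i] for i in selected)
    chosen = set(selected)
    recommendations = {}
    for cluster, _ in cluster_votes.most_common():
        # Other movies in the same cluster, excluding the user's own picks.
        candidates = [i for i, c in enumerate(labels)
                      if c == cluster and i not in chosen]
        candidates.sort(key=lambda i: rating_counts[i], reverse=True)
        recommendations[cluster] = candidates[:top_n]
    return recommendations


# Toy data: 8 movies in 3 clusters (made up for the example).
labels = [0, 1, 0, 2, 1, 0, 2, 1]
rating_counts = [50, 10, 30, 5, 80, 70, 20, 40]
recs = recommend(selected=[0, 4], labels=labels,
                 rating_counts=rating_counts, top_n=2)
print(recs)  # {0: [5, 2], 1: [7, 1]}
```

The returned indices would still need to be mapped back to movie titles using the MovieLens movies table.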
Possible future improvements:
- Include user-specific ratings instead of just clustering.
- Try more advanced models (Neural Collaborative Filtering, Autoencoders).
- Combine content-based filtering (genres, tags) with collaborative filtering.
- Build a simple Flask/React app to let users input their favorite movies and get recommendations.
- Scale to the MovieLens 1M / 20M datasets for richer embeddings.
- Add standard metrics (RMSE, precision@k, recall@k) to measure recommendation quality.
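
As a starting point for the evaluation metrics mentioned above, here is a small dependency-free sketch. The function names and toy inputs are illustrative assumptions, and precision@k is computed with k as the denominator, which is one common convention.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top-k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


# Toy example: movie IDs are made up.
recommended = [10, 20, 30, 40]   # ranked recommendations
relevant = {20, 40, 50}          # movies the user actually liked

error = rmse([3.5, 4.0, 2.0], [4.0, 4.0, 1.0])
p_at_2 = precision_at_k(recommended, relevant, k=2)  # 1 hit in top 2
r_at_2 = recall_at_k(recommended, relevant, k=2)     # 1 of 3 relevant found
```

RMSE evaluates the rating predictions of the matrix factorization model, while precision@k and recall@k evaluate the ranked recommendation lists, so the two kinds of metric complement each other.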
python movie_recommender_training.py

This will:
- Train the matrix factorization model
- Cluster movies into groups
- Save cluster labels as movie_clusters.npy
python recommend_movies.py

This will:
- Load saved clusters
- Ask for favorite movies (hardcoded or user-provided)
- Suggest other movies from the same clusters