https://github.com/jrzaurin/recotour
A tour through recommendation algorithms in python [IN PROGRESS]
collaborative-filtering deep-learning lightgbm matrix-factorization python3 recommendation-algorithms
- Host: GitHub
- URL: https://github.com/jrzaurin/recotour
- Owner: jrzaurin
- Created: 2018-05-22T19:06:23.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2024-12-26T18:38:27.000Z (over 1 year ago)
- Last Synced: 2025-01-27T15:27:30.169Z (over 1 year ago)
- Topics: collaborative-filtering, deep-learning, lightgbm, matrix-factorization, python3, recommendation-algorithms
- Language: Jupyter Notebook
- Homepage:
- Size: 6.56 MB
- Stars: 176
- Watchers: 21
- Forks: 38
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# RecoTour
This repo intends to be a tour through some recommendation algorithms in
Python using various datasets. Companion posts are:
1. [Recotour: a tour through recommendation algorithms in python](https://medium.com/datadriveninvestor/recotour-a-tour-through-recommendation-algorithms-in-python-52d780628ab9)
2. [RecoTour II: neural recommendation algorithms](https://towardsdatascience.com/recotour-ii-neural-recommendation-algorithms-49733938d56e)
3. [RecoTour III: Variational Autoencoders for Collaborative Filtering with Mxnet and Pytorch](https://jrzaurin.github.io/infinitoml/2020/05/15/mult-vae.html).
The repo is organised as follows:
1. **recotour**: this is the original "tour" through recommendation algorithms
   using the [Ponpare](https://www.kaggle.com/c/coupon-purchase-prediction)
   coupon dataset. In particular, the algorithms included in the `recotour`
   directory are:
   1. Data processing, with a deep dive into feature engineering
   2. Most Popular recommendations (the baseline)
   3. Item-User similarity based recommendations
   4. kNN Collaborative Filtering recommendations
   5. GBM based recommendations using `lightGBM` with a tutorial on how to optimize GBMs
   6. Non-Negative Matrix Factorization recommendations
   7. Factorization Machines (Steffen Rendle, 2010) recommendations using `xlearn`
   8. Field Aware Factorization Machines (Yuchin Juan et al., 2016) recommendations using `xlearn`
   9. Deep Learning based recommendations (Wide and Deep, Heng-Tze Cheng et al., 2016) using `pytorch`
I have included a more modular (nicer looking) version of a possible final
solution (described in `Chapter16_final_solution_Recommendations.ipynb`) in
the directory `final_recommendations`.
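
To make the "Most Popular" baseline from the list above concrete, here is a minimal sketch in plain Python. It is not the code from the notebooks: the `(user, coupon)` purchase log, the ids, and the function names `most_popular` and `recommend` are all hypothetical, and the notebooks work with the much richer Ponpare features.

```python
from collections import Counter

def most_popular(interactions, k=3):
    """Return the k most frequently purchased item ids."""
    counts = Counter(item for _, item in interactions)
    # Sort by descending count, then by item id so that ties are deterministic
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:k]]

def recommend(user, interactions, k=3):
    """Recommend the most popular items the user has not purchased yet."""
    seen = {item for u, item in interactions if u == user}
    ranked = most_popular(interactions, k=len(interactions))
    return [item for item in ranked if item not in seen][:k]

# Toy (user, coupon) purchase log; all ids are made up for illustration
purchases = [
    ("u1", "c1"), ("u2", "c1"), ("u3", "c1"),
    ("u1", "c2"), ("u2", "c2"),
    ("u3", "c3"),
]
top = most_popular(purchases, k=2)
recs_u3 = recommend("u3", purchases, k=2)
print(top, recs_u3)
```

A baseline like this ignores user features entirely, which is what makes it a useful reference point for the more elaborate algorithms.
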
In addition, I have included an illustration of how to use other evaluation
metrics apart from the one shown in the notebooks (the mean average precision
or MAP) such as the Normalized Discounted Cumulative Gain
([NDCG](https://en.wikipedia.org/wiki/Discounted_cumulative_gain)). This can
be found in `using_ncdg.py` in the directory `py_scripts`.
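
As a rough illustration of the two metrics mentioned above, the following sketch computes MAP@k and NDCG@k with binary relevance in plain Python. It is not taken from `using_ncdg.py`. The function names and toy data are hypothetical, recommendation lists are assumed to contain no duplicates, and the average precision is normalised by `min(len(relevant), k)`, which is one common convention among several.

```python
import math

def average_precision_at_k(recommended, relevant, k=10):
    """Average precision at k for one user, with binary relevance."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits, score = 0, 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank  # precision at this rank
    return score / min(len(relevant), k)

def ndcg_at_k(recommended, relevant, k=10):
    """NDCG at k for one user, with binary relevance and a log2 discount."""
    relevant = set(relevant)
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, item in enumerate(recommended[:k], start=1)
              if item in relevant)
    # Ideal DCG: every relevant item ranked at the top of the list
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0

def mean_over_users(metric, recs_by_user, truth_by_user, k=10):
    """Average a per-user metric over all users that have ground truth."""
    users = list(truth_by_user)
    total = sum(metric(recs_by_user.get(u, []), truth_by_user[u], k) for u in users)
    return total / len(users)

# Toy example; ids are made up for illustration
recs = {"u1": ["a", "b", "c"], "u2": ["d", "e", "f"]}
truth = {"u1": ["a", "c"], "u2": ["f"]}
map_at_3 = mean_over_users(average_precision_at_k, recs, truth, k=3)
ndcg_at_3 = mean_over_users(ndcg_at_k, recs, truth, k=3)
print(round(map_at_3, 4), round(ndcg_at_3, 4))
```

Both metrics reward placing relevant items near the top of the list. NDCG discounts each hit logarithmically by rank, while average precision averages the precision measured at each hit.
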
There are also other DL-based recommendation algorithms that mainly use the
[Amazon Reviews](http://jmcauley.ucsd.edu/data/amazon/) dataset,
in particular the 5-core Movies and TV reviews. These are:
2. **neural_cf**: Neural Collaborative Filtering (Xiangnan He et al., 2017)
3. **neural_graph_cf**: Neural Graph Collaborative Filtering (Xiang Wang et al., 2019)
4. **mult-vae**: Variational Autoencoders for Collaborative Filtering (Dawen Liang et al., 2018)
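
For readers unfamiliar with the term "5-core" used above: it denotes a subset in which every remaining user and item has at least five reviews. The Amazon 5-core files are distributed already reduced, so the sketch below only illustrates what the term means. The function name `k_core` and the toy data are hypothetical, and the example uses `k=2` to keep it small.

```python
from collections import Counter

def k_core(interactions, k=5):
    """Iteratively drop users and items with fewer than k interactions."""
    current = set(interactions)  # de-duplicate (user, item) pairs
    while True:
        user_counts = Counter(u for u, _ in current)
        item_counts = Counter(i for _, i in current)
        kept = {(u, i) for u, i in current
                if user_counts[u] >= k and item_counts[i] >= k}
        # Removing a pair can push another user or item below k,
        # so repeat until nothing changes
        if kept == current:
            return kept
        current = kept

# Toy (user, item) review log; ids are made up for illustration
reviews = [
    ("u1", "a"), ("u1", "b"),
    ("u2", "a"), ("u2", "b"),
    ("u3", "a"), ("u3", "c"),
]
core = k_core(reviews, k=2)
print(sorted(core))
```

In the toy data, item `c` has a single review, so `("u3", "c")` is dropped first. That leaves `u3` with one review, so `("u3", "a")` is dropped on the next pass.
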
**The notebooks in each directory are the core of the repo.** They are intended
to be self-contained and, as a consequence, there is some code repetition. The
code is, of course, "notebook-oriented". The notebooks have plenty of
explanations and references to relevant papers or packages. My intention was
to focus on the code, but you will also find some math.
I hope the code here is useful to someone. If you have any ideas on how to
improve the content of the repo, or you want to contribute, let me know.