https://github.com/haekyu/mfrwr
A Comparative Study of Matrix Factorization and Random Walk with Restart
https://github.com/haekyu/mfrwr
bigdata2017 matrix-factorization random-walk-with-restart recommender-system
Last synced: 7 months ago
JSON representation
A Comparative Study of Matrix Factorization and Random Walk with Restart
- Host: GitHub
- URL: https://github.com/haekyu/mfrwr
- Owner: haekyu
- Created: 2018-03-09T04:26:54.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2018-03-19T02:38:52.000Z (almost 8 years ago)
- Last Synced: 2025-06-30T02:02:07.430Z (8 months ago)
- Topics: bigdata2017, matrix-factorization, random-walk-with-restart, recommender-system
- Language: Python
- Homepage: https://datalab.snu.ac.kr/mfrwr/
- Size: 159 KB
- Stars: 8
- Watchers: 1
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Readme of mfrwr codes (v1.0)
## Contents
1. Basic information
2. Overview
3. Requirements
4. How to use
1) Input and output
2) How to run
3) How to give parameters
5. Demo example
## 1. Basic information
- Authors: [Haekyu Park](https://haekyu.github.io), [Jinhong Jung](https://datalab.snu.ac.kr/~jinhong/), and [U Kang](https://datalab.snu.ac.kr/~ukang/)
- Program name: MFRWR
- Version: 1.0
- Last updated: 28 Aug 2017
- Main contact: Haekyu Park (hkpark627@snu.ac.kr)
## 2. Overview
This package is a set of implementaions of recommender systems based on matrix factorization and random walk with restart.
These methods are compared in each recommendation scenarios in the following paper: [**A comparative study of matrix factorization and random walk with restart in recommender system.**](https://datalab.snu.ac.kr/mfrwr/resources/mfrwr.pdf)
We suggest 4 matrix factorization methods and 4 random walk with restart methods for the following cases:
- When explicit feedback ratings are given
- When implicit feedback ratings are given
- When bias terms are introduced
- When side information is used
## 3. Requirements
- python 3.*
- numpy
- pandas
- scipy
- We recommend you to use [Anaconda](https://www.continuum.io/downloads).
## 4. How to use
1) Input and output
- Input
- Ratings and side information are able to be given as input.
- All the input should be given as tab-separated files.
- Rating file should have three columns for user_id, item_id, and rating.
- User_id should not overlapped with item_id.
- Side information file should have two columns for user/item_id and value.
- The name of rating files should be 'rating.tsv'.
- Output
- We print out Spearman's rho, precision@k, and recall@k with stdout.
- You may write log generator for the results for yourself.
- Intermediate outputs
- We generate intermediate outputs such as vectors and biases of users and items.
- All the intermediate outputs have their file names under the rule: 'method name_dataset name_parameters_fold.txt'.
2) How to run
- First go to `./code/`.
- You can run the code by typing `python main.py`.
- You can optionally give parameters with two approaches.
* 1) By appending '--argument_type argument_value'.
For example, if you want to run matrix factorization with explicit ratings, and you want to set learning rate = 0.05, lambda = 0.3, and dimension = 5,
you can run the code as follows:
python main.py --method MF_exp --lr 0.05 --lamb 0.3 --dim 5
* 2) By using config
You can make a configuration file that contains all the parameters you give.
Then you can run the code by giving path of the config file as follows:
python main.py --config True --config_path ./myconfig.conf
Rules to make config files are as follows.
- All parameters should be separated with '\n'.
- Argument type and its values for each parameters should be separated with ';'.
For example, a config file to run matrix factorization is as follows.
```
method;MF_exp
dataset;filmtrust
lr;0.05
lamb;0.3
dim;5
```
The parameters you can give are as follows.
| argument_type | default argument_value | details |
|---| ---| ---|
|--dataset | filmtrust | Name of dataset |
|--method | MF_exp | Name of method (*1) |
|--data_path | '../data/' | Where datasets are |
|--input_path | '../data//input/' | Where inputs are |
|--result_path | '../results' | Where intermediate results are |
|--side_paths | [/link.tsv] | List of paths of side info (*2) |
|--entity_types | [['u', 'u']] | List of types of side info (*3) |
|--config | False | Whether to give config file |
|--config_path | '../config/democonfig.conf' | Path of config file |
|--is_implicit | False | Whether implicit data are given |
|--alpha | 0.001 | Coefficient of confidence level in implicit feedback|
|--is_social | False | Whether social links are used |
|--lr | 0.05 | Learning rate |
|--lamb | 0.3 | Regularization parameters |
|--dim | 5 | Dimension of vectors |
|--is_sample | False | Wheter to sample seed users (*4) |
|--num_seed | 300 | # sampled seed users |
|--c | 0.2 | Probability of restart |
|--beta | 0.4 | Probability of walk in RWR_bias |
|--gamma | 0.3 | Probability of restart in RWR_bias |
|--delta | 1.0 | Weight of additional links in RWR_side|
- (*1)
- Name of methods can be one of the followings.
- split5folds, MF_exp, MF_imp, MF_bias, MF_side, RWR_exp, RWR_imp, RWR_bias, RWR_side.
- When 'split5folds' is given, you can split rating data into 5 folds.
The split should be done before running MF/RWR methods.
- (*2)
- You can give file paths of side information with list.
- This should be given by config file.
- For example, if you want to give '../data/movielens/age.tsv' and '../data/movielens/gender.tsv' for side information, you can give --side_paths argument in the config file as follows.
```
side_paths;['../data/movielens/age.tsv', '../data/movielens/gender.tsv']
```
- (*3)
- You should declare types of entities in side information.
- Each type can be one of the followings: 'u', 'i', and 's'.
- 'u' indicates users, 'i' indicates items, and 's' indicates similarity attributes.
- This should be given by config file.
- For example, if you have '../data/movielens/age.tsv' and '../data/movielens/gender.tsv' which have first column for user id and second column for user attribute, a config file can be written as follows.
```
side_paths;['../data/movielens/age.tsv', '../data/movielens/gender.tsv']
entity_types;[['u', 's'], ['u', 's']].
```
- (*4)
- You can sample seed users for RWR methods.
- Sampling options are given because the methods take too much time if many users are included.
## 5. Demo example
You can run MF_exp with filmtrust dataset.
Please run demo.sh by typing `./demo.sh`.