https://github.com/harttle/opencf
An implementation for collaborative filtering system
https://github.com/harttle/opencf
boost collaborative-filtering
Last synced: 6 months ago
JSON representation
An implementation for collaborative filtering system
- Host: GitHub
- URL: https://github.com/harttle/opencf
- Owner: harttle
- License: gpl-2.0
- Created: 2014-06-12T16:28:32.000Z (about 12 years ago)
- Default Branch: master
- Last Pushed: 2014-12-25T05:45:40.000Z (over 11 years ago)
- Last Synced: 2025-01-29T15:13:17.938Z (over 1 year ago)
- Topics: boost, collaborative-filtering
- Language: TeX
- Homepage:
- Size: 2.17 MB
- Stars: 3
- Watchers: 3
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
OpenCF
======
Here is an implementation of collaborative filtering system, which is the most popular algorithm for recommender systems.
More information can be obtained from the slider: report/OpenCF-report.pdf
OpenCF implemented both user-based CF and item-based CF, and optimizing methods like:
1. Row normalization for Rating matrix.
2. Similarity functions:
* raw cosine
* adjusted cosine
* pearson correlation
3. Similarity Summing:
* Direct summing
* Normalized(1-order) similarity summing
* Probability similarity summing
# Compact ID
`compact` is used to compact discontinuous user-id and item-id.
```bash
# compact data/uir to data/uir.compact, -U and -I specifies mapping file names, which is used to restore ids.
./compact -f data/uir -o data/uir.compact -U data/user.map -I data/item.map
# help
./compact -h
```
# Similarity computing
`similarity` computes similarity matrix, with various methods.
```bash
# help
./similarity -h
# raw cosine similarity
./similarity -f data/uir.compact -o data/ii.cos
# adjusted cosine similarity
./similarity -a 1 data/uir.compact -o data/ii.acos
# pearson correlation similarity
./similarity -a 2 data/uir.compact -o data/ii.corr
```
The data format `similarity` accept is:
```
userid1 itemid1 rating1
userid2 itemid2 rating2
...
```
where `userid` and `itemid` are `int` compatible, `rating`s are `float` compatible.
# Prediction
`prediction` uses similarity-file and rating-file to compute prediction matrix.
```bash
./predict -s data/ii.cos -f data/uir.compact -o data/uip.ii.compact
```
# Restore IDs
Use `compact -r` to restore user/item ids, mapping-files should be specified.
```bash
./compact -r -f data/uip.ii.compact -o data/uip.ii -U data/user.map -I data/item.map
```
# Tools
## Rating
Dataset `data/train` and `data/test` are extracted from `data/t_alibaba_data.csv` manually. The format in `train` is not compatible. `rating` is used to generate compatible rating-file from data-files with this format:
```
userid1 itemid1 operation1 month1 day1
userid2 itemid2 operation2 month2 day2
...
```
```bash
# generate compatible rating-file: data/uir
rating/rating -i data/train -o data/uir -l data/dealdate.last -c data/deal.count
# help
rating/rating
```
## Post processing
`postprocess` is used to de-emphasize items that already purchased by user.
```bash
postprocess/postprocess -p data/uip.ii -l data/dealdate.last -c data/deal.count -o data/uip.ii.mod
```
## Evaluation
```bash
# prepare test-file, generates data/test.ui
evaluate/prepare_test.sh data/test
# sort and filter prediction-file data/uip.ii to data/ii.ui
evaluate/sort_prediction.sh data/uip.ii
# evaluate sorted-prediction-file data/uip.ii.ui
evaluate/evaluate -p data/uip.ii.ui -t data/test.ui -o data/ii.curve
```