https://github.com/idiap/cbrec
Content-based Recommendation Generator
- Host: GitHub
- URL: https://github.com/idiap/cbrec
- Owner: idiap
- License: gpl-3.0
- Created: 2014-12-12T07:31:34.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2015-01-21T10:25:22.000Z (over 11 years ago)
- Last Synced: 2025-03-23T01:02:27.458Z (about 1 year ago)
- Language: Python
- Size: 256 KB
- Stars: 13
- Watchers: 5
- Forks: 8
- Open Issues: 0
Metadata Files:
- Readme: README.txt
- License: COPYING.txt
README
############################################################################
Content-Based Recommendation Generator (CBRec v1.0)
############################################################################
README:
=======
A Python library that generates content-based recommendations for a set of
items described by textual metadata, using one of four vector space methods:
TF-IDF, LSI, RP or LDA. The library can be used from the command line or
directly in a Python program. It takes as input a JSON file containing an
array of hashes that describe the item metadata, and it generates an output
JSON file containing the same item hashes augmented with two attributes:
(i) rec, which holds the top-N recommendations for each item as an array of
item IDs, and (ii) rec_scores, which holds the corresponding top-N similarity
scores as an array of floating-point numbers.
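To make the input and output shapes concrete, the following is a minimal
sketch of the TF-IDF case written with the Python 3 standard library only.
It is not the library's implementation (CBRec builds its vector spaces with
gensim); the function names, the whitespace tokenizer and the three sample
items are assumptions made for illustration. Only the 'id', 'rec' and
'rec_scores' attribute names come from the description above.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split; CBRec may preprocess differently.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def add_recommendations(items, text_keys, top_n=5):
    """Return copies of the items augmented with 'rec' and 'rec_scores'."""
    docs = [" ".join(str(item.get(k, "")) for k in text_keys) for item in items]
    vectors = tfidf_vectors(docs)
    augmented = []
    for i, item in enumerate(items):
        scored = [(cosine(vectors[i], vectors[j]), items[j]["id"])
                  for j in range(len(items)) if j != i]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        top = scored[:top_n]
        new_item = dict(item)
        new_item["rec"] = [item_id for _, item_id in top]
        new_item["rec_scores"] = [score for score, _ in top]
        augmented.append(new_item)
    return augmented


# Hypothetical items; a real input file would hold richer metadata.
items = [
    {"id": 1, "title": "deep ocean exploration"},
    {"id": 2, "title": "ocean life and exploration"},
    {"id": 3, "title": "urban architecture design"},
]
result = add_recommendations(items, text_keys=["title"], top_n=2)
```

Each hash in result keeps its original attributes and gains the two new
ones, so item 1 lists item 2 first because they share the most terms.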
FILES:
======
The library contains the following files:
data.py Data class for items (text extraction, preprocessing)
vector_space.py Vector space class supporting TF-IDF, LSI, RP and LDA
generate.py Main class responsible for generating recommendations
utils.py Unbuffered stdout class
example.json Example JSON file with 1000 TED talks
USAGE:
======
Usage:
generate.py --input= --output= [options]
Options:
-v, --version show program's version number and exit
-h, --help show this help message and exit
-d, --debug print status and debug messages [default: False]
-r, --display display recommendations per item [default: False]
-i, --input= path to JSON file to be used as input
-o, --output= path to JSON file to be used as output
--extract= comma separated JSON attributes to be used [default: All]
--preprocess whether to preprocess text or not [default: False]
--method= vector space method to represent the items [default: LSI]
--k= number of topics for LSI, RP and LDA [default: 100]
--N= number of recommendations [default: 5]
EXAMPLE:
========
$ python generate.py --input=example.json --output=out.json --debug
{'--N': '5',
'--debug': True,
'--display': False,
'--extract': 'All',
'--help': False,
'--input': 'example.json',
'--k': '100',
'--method': 'LSI',
'--output': 'out.json',
'--preprocess': False,
'--version': False}
[+] Loading items:
-> Extracting text................................[OK]
[+] Creating the vector space:
-> Computing the dictionary.......................[OK]
-> Creating the bag-of-words space................[OK]
-> Creating the LSI space.........................[OK]
[+] Generating recommendations........................[OK]
[+] Saving to output file.............................[OK]
[x] Finished.
$ python generate.py --input=example.json --output=out.json --debug --preprocess --N=10 --extract=title,description
{'--N': '10',
'--debug': True,
'--display': False,
'--extract': 'title,description',
'--help': False,
'--input': 'example.json',
'--k': '100',
'--method': 'LSI',
'--output': 'out.json',
'--preprocess': True,
'--version': False}
[+] Loading items:
-> Extracting text................................[OK]
-> Preprocessing text.............................[OK]
[+] Creating the vector space:
-> Computing the dictionary.......................[OK]
-> Creating the bag-of-words space................[OK]
-> Creating the LSI space.........................[OK]
[+] Generating recommendations........................[OK]
[+] Saving to output file.............................[OK]
[x] Finished.
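Once a run has finished, the output file can be read back with the standard
json module. The sketch below assumes Python 3 and uses a small hypothetical
excerpt in place of a real out.json; the 'title' attribute and all values
are invented for illustration, while 'id', 'rec' and 'rec_scores' are the
attributes described in the README section.

```python
import json

# Hypothetical excerpt of an output file; a real file holds one hash per item.
sample_output = """
[
  {"id": "a", "title": "First talk",  "rec": ["b", "c"], "rec_scores": [0.91, 0.40]},
  {"id": "b", "title": "Second talk", "rec": ["a", "c"], "rec_scores": [0.91, 0.35]},
  {"id": "c", "title": "Third talk",  "rec": ["a", "b"], "rec_scores": [0.40, 0.35]}
]
"""


def pair_recommendations(items):
    """Map each item ID to a list of (recommended ID, score) pairs."""
    return {item["id"]: list(zip(item["rec"], item["rec_scores"]))
            for item in items}


# For a real run, load the file instead: json.load(open("out.json"))
items = json.loads(sample_output)
pairs = pair_recommendations(items)
for item_id, recs in pairs.items():
    print(item_id, "->", ", ".join("%s (%.2f)" % (r, s) for r, s in recs))
```

Pairing the two arrays by position relies on rec and rec_scores having the
same length and order, which is how the README section describes them.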
DEPENDENCIES:
=============
1) Install Python: http://www.python.org/getit/
2) Install pip: http://www.pip-installer.org/en/latest/installing.html
3) Then:
$ pip install docopt
(json is part of the Python standard library since 2.6; no pip install needed)
$ pip install pyyaml
$ pip install numpy
$ pip install scipy
$ pip install gensim
$ pip install nltk
$ python
>>> import nltk
>>> nltk.download()
TROUBLESHOOTING:
================
Q: How can I use the library with items stored in formats other than JSON?
A: You have to convert your file to JSON.
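As an illustration of such a conversion, here is a minimal sketch for CSV
input, assuming Python 3 and a header row that names the attributes. The
column names and rows are hypothetical; the only requirement taken from
this README is that the result is a JSON array of hashes.

```python
import csv
import io
import json


def csv_to_items(csv_text):
    """Turn CSV text with a header row into a list of item hashes."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


# Hypothetical CSV content; in practice, read it from your own file.
csv_text = (
    "id,title,description\n"
    "1,Talk one,About oceans\n"
    "2,Talk two,About cities\n"
)
items = csv_to_items(csv_text)
json_text = json.dumps(items, indent=2)  # write this string to the input file
```

Note that csv.DictReader yields every value as a string, so numeric IDs
become "1" and "2" here.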
Q: How can I use the library directly with an item hash?
A: Simply import the library in Python and initialize a generator object with
the item hash of your preference.
Q: Is there any attribute that is required to be present in the item metadata?
A: Yes, the 'id' attribute is mandatory.
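A quick check before running the generator can catch items that lack this
attribute. The helper below is a hypothetical sketch and not part of the
library; it only tests for the presence of the 'id' key.

```python
def find_items_missing_id(items):
    """Return the positions of items that have no 'id' attribute."""
    return [index for index, item in enumerate(items) if "id" not in item]


# Hypothetical items: the second one would need an 'id' before use.
items = [{"id": 1, "title": "ok"}, {"title": "no id"}]
missing = find_items_missing_id(items)
```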
CONTACT:
========
Nikolaos Pappas
Idiap Research Institute
Centre du Parc,
CH 1920 Martigny,
Switzerland
E-mail: nikolaos.pappas@idiap.ch
Website: http://people.idiap.ch/npappas/
---
Last update: 16 Dec, 2013