https://github.com/idiap/cbrec

Content-based Recommendation Generator

Host: GitHub
URL: https://github.com/idiap/cbrec
Owner: idiap
License: gpl-3.0
Created: 2014-12-12T07:31:34.000Z (over 11 years ago)
Default Branch: master
Last Pushed: 2015-01-21T10:25:22.000Z (over 11 years ago)
Last Synced: 2025-03-23T01:02:27.458Z (about 1 year ago)
Language: Python
Size: 256 KB
Stars: 13
Watchers: 5
Forks: 8
Open Issues: 0
Metadata Files:
- Readme: README.txt
- License: COPYING.txt


############################################################################
          Content-Based Recommendation Generator (CBRec v1.0)
############################################################################

README:
=======
A Python library that generates content-based recommendations for a set of
items described by textual metadata, using one of four vector space methods:
TF-IDF, LSI, RP or LDA. The library can be used from the command line or
directly in a Python program. It takes as input a JSON file containing an
array of hashes that describe the item metadata, and it writes an output
JSON file containing the same item hashes augmented with two attributes:
(i) rec, the top-N recommendations for each item, represented as an array
of item IDs, and (ii) rec_scores, the corresponding top-N similarity
scores, represented as an array of floating-point numbers.
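
To make the input and output shapes concrete, the sketch below builds two
items and shows what an augmented output item could look like. Only the
attribute names id, rec and rec_scores come from the description above; the
title and description attributes, the ID values, the score and the helper
names are assumptions made for illustration.

```python
import json

# Hypothetical input items. Only 'id' is named by the README; 'title' and
# 'description' are illustrative metadata attributes.
items = [
    {"id": 1, "title": "A talk about oceans", "description": "Life in the deep sea."},
    {"id": 2, "title": "A talk about space", "description": "Life on other planets."},
]


def to_input_json(items):
    """Serialize the items as a JSON array of hashes (the input format)."""
    return json.dumps(items, indent=2)


def augment(item, rec, rec_scores):
    """Return a copy of an item with the two attributes the output file adds."""
    out = dict(item)
    out["rec"] = list(rec)                # top-N recommended item IDs
    out["rec_scores"] = list(rec_scores)  # matching similarity scores (floats)
    return out


# Illustrative only: the score 0.42 is made up, not produced by the library.
example_output_item = augment(items[0], rec=[2], rec_scores=[0.42])
```

Writing to_input_json(items) to a file gives something that can be passed to
--input; each hash in the output file would then carry the two extra
attributes next to its original metadata.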

FILES:
======
The library contains the following files:

    data.py           Data class for items (text extraction, preprocessing)
    vector_space.py   Vector space class supporting TF-IDF, LSI, RP and LDA
    generate.py       Main class responsible for generating recommendations
    utils.py          Unbuffered stdout class
    example.json      Example JSON file with 1000 TED talks

USAGE:
======
Usage:
    generate.py --input= --output= [options]

Options:
    -v, --version                      show program's version number and exit
    -h, --help                         show this help message and exit
    -d, --debug                        print status and debug messages [default: False]
    -r, --display                      display recommendations per item [default: False]
    -i, --input=                       path to JSON file to be used as input
    -o, --output=                      path to JSON file to be used as output
    --extract=                         comma-separated JSON attributes to be used [default: All]
    --preprocess                       whether to preprocess text or not [default: False]
    --method=                          vector space method to represent the items [default: LSI]
    --k=                               number of topics for LSI, RP and LDA [default: 100]
    --N=                               number of recommendations [default: 5]
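
When generate.py is driven from another script, the options above can be
assembled programmatically. The following is a minimal sketch, assuming
generate.py sits in the working directory and that "python" resolves to an
interpreter with the dependencies installed. The build_command and run
helpers are hypothetical names, not part of the library. Only the value LSI
for --method is confirmed by the example output in this README; the
spellings accepted for the other methods should be checked with --help.

```python
import subprocess


def build_command(input_path, output_path, method="LSI", k=100, n=5,
                  extract=None, preprocess=False, debug=False,
                  display=False, python="python"):
    """Assemble an argv list for generate.py from the documented options."""
    cmd = [
        python, "generate.py",
        "--input=%s" % input_path,
        "--output=%s" % output_path,
        "--method=%s" % method,
        "--k=%d" % k,
        "--N=%d" % n,
    ]
    if extract:
        # --extract takes comma-separated JSON attribute names.
        cmd.append("--extract=" + ",".join(extract))
    if preprocess:
        cmd.append("--preprocess")
    if debug:
        cmd.append("--debug")
    if display:
        cmd.append("--display")
    return cmd


def run(cmd):
    """Run the command; raises CalledProcessError on a non-zero exit status."""
    subprocess.check_call(cmd)
```

For instance, build_command("example.json", "out.json", n=10,
extract=["title", "description"], preprocess=True, debug=True) yields an
argument list with the flags --N=10, --extract=title,description,
--preprocess and --debug.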

EXAMPLE:
========
$ python generate.py --input=example.json --output=out.json --debug
{'--N': '5',
 '--debug': True,
 '--display': False,
 '--extract': 'All',
 '--help': False,
 '--input': 'example.json',
 '--k': '100',
 '--method': 'LSI',
 '--output': 'out.json',
 '--preprocess': False,
 '--version': False}
[+] Loading items:
    -> Extracting text................................[OK]
[+] Creating the vector space:
    -> Computing the dictionary.......................[OK]
    -> Creating the bag-of-words space................[OK]
    -> Creating the LSI space.........................[OK]
[+] Generating recommendations........................[OK]
[+] Saving to output file.............................[OK]
[x] Finished.

$ python generate.py --input=example.json --output=out.json --debug --preprocess --N=10 --extract=title,description
{'--N': '10',
 '--debug': True,
 '--display': False,
 '--extract': 'title,description',
 '--help': False,
 '--input': 'example.json',
 '--k': '100',
 '--method': 'LSI',
 '--output': 'out.json',
 '--preprocess': True,
 '--version': False}
[+] Loading items:
    -> Extracting text................................[OK]
    -> Preprocessing text.............................[OK]
[+] Creating the vector space:
    -> Computing the dictionary.......................[OK]
    -> Creating the bag-of-words space................[OK]
    -> Creating the LSI space.........................[OK]
[+] Generating recommendations........................[OK]
[+] Saving to output file.............................[OK]
[x] Finished.
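
The log above lists the pipeline stages: a dictionary, a bag-of-words space,
a weighted or reduced space, and finally the recommendations. The sketch
below walks through the same stages in plain Python, using TF-IDF weighting
and cosine similarity. It is not the library's code: CBRec depends on gensim
and the runs above use LSI, whereas this stand-in uses only the standard
library. The function names, the whitespace tokenizer and the
"description" attribute are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive tokenizer (assumption): lowercase and split on whitespace."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Turn tokenized documents into unit-length sparse TF-IDF vectors."""
    n = float(len(docs))
    # Dictionary stage: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # bag-of-words stage: raw term counts
        vec = {t: c * math.log(n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors (a dot product)."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def recommend(items, text_key="description", top_n=5):
    """Return copies of the items augmented with 'rec' and 'rec_scores'."""
    vectors = tfidf_vectors([tokenize(item[text_key]) for item in items])
    results = []
    for i, item in enumerate(items):
        scored = [(cosine(vectors[i], vectors[j]), items[j]["id"])
                  for j in range(len(items)) if j != i]
        # Highest similarity first; ties keep their original order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        out = dict(item)
        out["rec"] = [item_id for _, item_id in scored[:top_n]]
        out["rec_scores"] = [score for score, _ in scored[:top_n]]
        results.append(out)
    return results
```

LSI, RP and LDA differ from this sketch in that they project the
bag-of-words vectors into a lower-dimensional space of k topics before
similarities are computed; the ranking step stays the same.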

DEPENDENCIES:
=============
1) Install python: http://www.python.org/getit/
2) Install pip: http://www.pip-installer.org/en/latest/installing.html
3) Then:

$ pip install docopt
$ pip install pyyaml
$ pip install numpy
$ pip install scipy
$ pip install gensim
$ pip install nltk
$ python
>>> import nltk
>>> nltk.download()

Note: the json module is part of the Python standard library and does not
need to be installed with pip.
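
After installation, a quick import check can confirm that the third-party
packages listed above are available. This is a minimal sketch; the
missing_modules helper is a hypothetical name and not part of the library.
Note that the pyyaml package is imported under the name yaml, and that json
needs no check because it ships with Python.

```python
import importlib

# Import names of the third-party dependencies; pyyaml is imported as 'yaml'.
REQUIRED_MODULES = ["docopt", "yaml", "numpy", "scipy", "gensim", "nltk"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing
```

Calling missing_modules() returns an empty list when everything is in place,
and otherwise the names that still need a pip install.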

TROUBLESHOOTING:
================
Q: How can I use the library with items stored in formats other than JSON?
A: You have to convert your file to JSON.

Q: How can I use the library directly with an item hash?
A: Simply import the library in Python and initialize a generator object with
   the item hash of your preference.

Q: Is there any attribute that is required to be present in the item metadata?
A: Yes, the 'id' attribute is mandatory.
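
As an illustration of the first and last answers, the sketch below converts
CSV text with a header row into a list of item hashes and checks that every
item carries an 'id' before saving. The CSV source format and the helper
names csv_to_items, check_ids and save_items are assumptions made for this
example. Whether the library places further constraints on 'id' (type,
uniqueness) is not stated in this README.

```python
import csv
import io
import json


def csv_to_items(csv_text):
    """Parse CSV text with a header row into a list of item hashes."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


def check_ids(items):
    """Raise ValueError if any item lacks a non-empty 'id' attribute."""
    for position, item in enumerate(items):
        if item.get("id") in (None, ""):
            raise ValueError("item at position %d has no 'id'" % position)
    return items


def save_items(items, path):
    """Write the validated items as a JSON array usable with --input."""
    with open(path, "w") as f:
        json.dump(check_ids(items), f, indent=2)
```

All values read from CSV are strings, so an item ID of 1 ends up as "1" in
the resulting JSON.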

CONTACT:
========
Nikolaos Pappas
Idiap Research Institute
Centre du Parc,
CH 1920 Martigny,
Switzerland

E-mail:  nikolaos.pappas@idiap.ch
Website: http://people.idiap.ch/npappas/

---
Last update:
16 Dec, 2013
