{"id":18439503,"url":"https://github.com/idiap/cbrec","last_synced_at":"2025-04-07T21:32:35.832Z","repository":{"id":24504694,"uuid":"27910248","full_name":"idiap/cbrec","owner":"idiap","description":"Content-based Recommendation Generator","archived":false,"fork":false,"pushed_at":"2015-01-21T10:25:22.000Z","size":262,"stargazers_count":13,"open_issues_count":0,"forks_count":8,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-03-23T01:02:27.458Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/idiap.png","metadata":{"files":{"readme":"README.txt","changelog":null,"contributing":null,"funding":null,"license":"COPYING.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-12-12T07:31:34.000Z","updated_at":"2023-10-19T10:33:15.000Z","dependencies_parsed_at":"2022-07-13T23:50:41.928Z","dependency_job_id":null,"html_url":"https://github.com/idiap/cbrec","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/idiap%2Fcbrec","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/idiap%2Fcbrec/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/idiap%2Fcbrec/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/idiap%2Fcbrec/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/idiap","download_url":"https://codeload.github.com/idiap/cbrec/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247732700,"owners_count":20986907,"icon_url":"https://github.com/githu
b.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-06T06:25:11.642Z","updated_at":"2025-04-07T21:32:30.818Z","avatar_url":"https://github.com/idiap.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"############################################################################\n\n              Content-Based Recommendation Generator (CBRec v1.0)      \n\n############################################################################\n\n\nREADME:\n=======\nA Python library which generates content-based recommendations for a set of \nitems described by textual metadata using four possible vector space methods,\nnamely TF-IDF, LSI, RP and LDA. The library can be used in command line or \ndirectly in a Python program. 
It takes as input a JSON file which contains\nan array of hashes that describe the metadata of items and generates an\noutput JSON file which contains the same item hashes augmented with two more\nattributes: (i) the rec attribute, which contains the top-N recommendations\nfor each item as an array of item IDs, and (ii) the rec_scores attribute,\nwhich contains the corresponding top-N similarity scores as an array of\nfloats.\n\nFILES:\n======\nThe library contains the following files:\n\n    data.py           Data class for items (text extraction, preprocessing)\n    vector_space.py   Vector space class supporting TF-IDF, LSI, RP and LDA\n    generate.py       Main class responsible for generating recommendations\n    utils.py          Unbuffered stdout class\n    example.json      Example JSON file with 1000 TED talks\n\nUSAGE:\n======\nUsage:\n    generate.py --input=\u003cpath\u003e --output=\u003cpath\u003e [options]\n\nOptions:\n    -v, --version                      show program's version number and exit\n    -h, --help                         show this help message and exit\n    -d, --debug                        print status and debug messages [default: False]\n    -r, --display                      display recommendations per item [default: False]\n    -i, --input=\u003cpath\u003e                 path to JSON file to be used as input\n    -o, --output=\u003cpath\u003e                path to JSON file to be used as output\n    --extract=\u003cattributes\u003e             comma separated JSON attributes to be used [default: All]\n    --preprocess                       whether to preprocess text or not  [default: False]\n    --method=\u003cTFIDF|LSI|RP|LDA\u003e        vector space method to represent the items [default: LSI]\n    --k=\u003cinteger\u003e                      number of topics for LSI, RP and LDA [default: 100]\n    --N=\u003cinteger\u003e                      number of recommendations [default: 
5]\n\nEXAMPLE:\n========\n$ python generate.py --input=example.json --output=out.json --debug\n{'--N': '5',\n '--debug': True,\n '--display': False,\n '--extract': 'All',\n '--help': False,\n '--input': 'example.json',\n '--k': '100',\n '--method': 'LSI',\n '--output': 'out.json',\n '--preprocess': False,\n '--version': False}\n[+] Loading items:\n    -\u003e Extracting text................................[OK]\n[+] Creating the vector space:\n    -\u003e Computing the dictionary.......................[OK]\n    -\u003e Creating the bag-of-words space................[OK]\n    -\u003e Creating the LSI space.........................[OK]\n[+] Generating recommendations........................[OK]\n[+] Saving to output file.............................[OK]\n[x] Finished.\n\n$ python generate.py --input=example.json --output=out.json --debug --preprocess --N=10 --extract=title,description\n{'--N': '10',\n '--debug': True,\n '--display': False,\n '--extract': 'title,description',\n '--help': False,\n '--input': 'example.json',\n '--k': '100',\n '--method': 'LSI',
\n '--output': 'out.json',\n '--preprocess': True,\n '--version': False}\n[+] Loading items:\n    -\u003e Extracting text................................[OK]\n    -\u003e Preprocessing text.............................[OK]\n[+] Creating the vector space:\n    -\u003e Computing the dictionary.......................[OK]\n    -\u003e Creating the bag-of-words space................[OK]\n    -\u003e Creating the LSI space.........................[OK]\n[+] Generating recommendations........................[OK]\n[+] Saving to output file.............................[OK]\n[x] Finished.\n\n\nDEPENDENCIES:\n=============\n1) Install Python: http://www.python.org/getit/\n2) Install pip: http://www.pip-installer.org/en/latest/installing.html\n3) Then:\n$ pip install docopt\n$ pip install pyyaml\n$ pip install numpy\n$ pip install scipy\n$ pip install gensim\n$ pip install nltk\n$ python\n\u003e\u003e\u003e import nltk\n\u003e\u003e\u003e nltk.download()\n\nTROUBLESHOOTING:\n================\nQ: How can I use the library with items stored in formats other than JSON?\nA: You have to convert your file to JSON.\nQ: How can I use the library directly with an item hash?\nA: Simply import the library in Python and initialize a generator object with \n   the item hash of your preference.\nQ: Is there any attribute that is required to be present in the item metadata?\nA: Yes, the 'id' attribute is mandatory.\n\nCONTACT:\n========\nNikolaos 
Pappas \nIdiap Research Institute\nCentre du Parc, \nCH 1920 Martigny, \nSwitzerland\nE-mail:  nikolaos.pappas@idiap.ch \nWebsite: http://people.idiap.ch/npappas/ \n\n\n---\nLast update:\n16 Dec, 2013","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fidiap%2Fcbrec","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fidiap%2Fcbrec","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fidiap%2Fcbrec/lists"}