{"id":17143786,"url":"https://github.com/harttle/opencf","last_synced_at":"2026-01-04T19:49:16.283Z","repository":{"id":17853485,"uuid":"20774722","full_name":"harttle/OpenCF","owner":"harttle","description":"An implementation for collaborative filtering system","archived":false,"fork":false,"pushed_at":"2014-12-25T05:45:40.000Z","size":2280,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-01-29T15:13:17.938Z","etag":null,"topics":["boost","collaborative-filtering"],"latest_commit_sha":null,"homepage":"","language":"TeX","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/harttle.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-06-12T16:28:32.000Z","updated_at":"2018-06-11T01:20:22.000Z","dependencies_parsed_at":"2022-09-01T00:51:48.283Z","dependency_job_id":null,"html_url":"https://github.com/harttle/OpenCF","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/harttle%2FOpenCF","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/harttle%2FOpenCF/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/harttle%2FOpenCF/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/harttle%2FOpenCF/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/harttle","download_url":"https://codeload.github.com/harttle/OpenCF/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245
248004,"owners_count":20584459,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["boost","collaborative-filtering"],"created_at":"2024-10-14T20:42:18.877Z","updated_at":"2026-01-04T19:49:16.244Z","avatar_url":"https://github.com/harttle.png","language":"TeX","funding_links":[],"categories":[],"sub_categories":[],"readme":"OpenCF\n======\n\nOpenCF is an implementation of a collaborative filtering system, one of the most popular algorithms for recommender systems.\n\nMore information is available in the slides: report/OpenCF-report.pdf\n\nOpenCF implements both user-based CF and item-based CF, along with optimizations such as:\n\n1. Row normalization for the rating matrix.\n2. Similarity functions:\n    * raw cosine\n    * adjusted cosine\n    * Pearson correlation\n3. Similarity summing:\n    * Direct summing\n    * Normalized (1-order) similarity summing\n    * Probability similarity summing\n\n# Compact ID\n\n`compact` is used to compact discontinuous user ids and item ids into contiguous ones.\n\n```bash\n# compact data/uir to data/uir.compact; -U and -I specify the mapping file names, which are used to restore ids later.\n./compact -f data/uir -o data/uir.compact -U data/user.map -I data/item.map\n\n# help\n./compact -h\n```\n\n# Similarity computing\n\n`similarity` computes the similarity matrix using one of several methods. 
\n\n```bash\n# help\n./similarity -h\n\n# raw cosine similarity\n./similarity -f data/uir.compact -o data/ii.cos\n\n# adjusted cosine similarity\n./similarity -a 1 -f data/uir.compact -o data/ii.acos\n\n# pearson correlation similarity\n./similarity -a 2 -f data/uir.compact -o data/ii.corr\n```\n\nThe data format `similarity` accepts is:\n\n```\nuserid1 itemid1 rating1\nuserid2 itemid2 rating2\n...\n```\n\nwhere each `userid` and `itemid` is `int` compatible and each `rating` is `float` compatible.\n\n# Prediction\n\n`predict` uses a similarity file and a rating file to compute the prediction matrix.\n\n```bash\n./predict -s data/ii.cos -f data/uir.compact -o data/uip.ii.compact\n```\n\n# Restore IDs\n\nUse `compact -r` to restore the original user/item ids; the mapping files must be specified.\n\n```bash\n./compact -r -f data/uip.ii.compact -o data/uip.ii -U data/user.map -I data/item.map \n```\n\n# Tools\n\n## Rating \n\nThe datasets `data/train` and `data/test` were extracted manually from `data/t_alibaba_data.csv`. The format of `train` is not compatible with the rating-file format that `similarity` accepts. 
`rating` generates a compatible rating-file from data-files in this format:\n\n```\nuserid1 itemid1 operation1  month1  day1\nuserid2 itemid2 operation2  month2  day2\n...\n```\n\n```bash\n# generate compatible rating-file: data/uir\nrating/rating -i data/train -o data/uir -l data/dealdate.last -c data/deal.count\n\n# help\nrating/rating\n```\n\n## Post processing\n\n`postprocess` de-emphasizes items that the user has already purchased.\n\n```bash\npostprocess/postprocess -p data/uip.ii -l data/dealdate.last -c data/deal.count -o data/uip.ii.mod\n```\n\n## Evaluation\n\n```bash\n# prepare test-file, generates data/test.ui\nevaluate/prepare_test.sh data/test\n\n# sort and filter prediction-file data/uip.ii to data/uip.ii.ui\nevaluate/sort_prediction.sh data/uip.ii\n\n# evaluate sorted-prediction-file data/uip.ii.ui\nevaluate/evaluate -p data/uip.ii.ui -t data/test.ui -o data/ii.curve\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fharttle%2Fopencf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fharttle%2Fopencf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fharttle%2Fopencf/lists"}