{"id":37080126,"url":"https://github.com/david-cortes/ctpfrec","last_synced_at":"2026-01-14T09:43:32.349Z","repository":{"id":62565833,"uuid":"143035924","full_name":"david-cortes/ctpfrec","owner":"david-cortes","description":"Python implementation of \"Content-based recommendations with poisson factorization\", with some extensions","archived":true,"fork":false,"pushed_at":"2023-07-30T19:03:54.000Z","size":150,"stargazers_count":30,"open_issues_count":1,"forks_count":9,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-10-26T22:57:38.568Z","etag":null,"topics":["cold-start","collaborative-topic-factorization","poisson-factorization","topic-modeling"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/david-cortes.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-07-31T15:52:35.000Z","updated_at":"2024-02-26T18:20:21.000Z","dependencies_parsed_at":"2022-11-03T16:15:31.838Z","dependency_job_id":null,"html_url":"https://github.com/david-cortes/ctpfrec","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/david-cortes/ctpfrec","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/david-cortes%2Fctpfrec","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/david-cortes%2Fctpfrec/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/david-cortes%2Fctpfrec/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/david-cortes%2Fctpfrec/manifests","owner_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/owners/david-cortes","download_url":"https://codeload.github.com/david-cortes/ctpfrec/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/david-cortes%2Fctpfrec/sbom","scorecard":{"id":325841,"data":{"date":"2025-08-11","repo":{"name":"github.com/david-cortes/ctpfrec","commit":"adb48148af7ee74b57cb5275b8cea9eb59945c9a"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":2.9,"checks":[{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Maintained","score":0,"reason":"project is archived","details":["Warn: Repository is archived."],"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"SAST","score":0,"reason":"no SAST tool detected","details":["Warn: no pull requests merged into dev branch"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Code-Review","score":0,"reason":"Found 0/30 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the 
project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security 
policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: BSD 2-Clause \"Simplified\" License: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Vulnerabilities","score":9,"reason":"1 existing vulnerabilities detected","details":["Warn: Project is vulnerable to: PYSEC-2020-73"],"documentation":{"short":"Determines if the project has open, known unfixed 
vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}}]},"last_synced_at":"2025-08-18T02:28:09.246Z","repository_id":62565833,"created_at":"2025-08-18T02:28:09.246Z","updated_at":"2025-08-18T02:28:09.246Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28416120,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-14T08:38:59.149Z","status":"ssl_error","status_checked_at":"2026-01-14T08:38:43.588Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cold-start","collaborative-topic-factorization","poisson-factorization","topic-modeling"],"created_at":"2026-01-14T09:43:31.584Z","updated_at":"2026-01-14T09:43:32.344Z","avatar_url":"https://github.com/david-cortes.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Collaborative Topic Poisson Factorization\n\nPython implementation of the algorithm for probabilistic matrix factorization described in _Content-based recommendations with poisson factorization (Gopalan, P.K., Charlin, L. and Blei, D., 2014)_.\n\nThis is a statistical model aimed at recommender systems with implicit data consisting of counts of user-item interactions (e.g. 
clicks by each user on different products) plus bag-of-words representations of the items. The model is fit using mean-field variational inference. Can also fit the model to side information on the users consisting of counts on different attributes (same format as the bag-of-words for items).\n\nAs it takes side information about items, it has the advantage of being able to recommend items without any ratings/clicks/plays/etc. If extending it with user side information, can also make cold-start recommendations, albeit speed is not great for that.\n\nSupports parallelization, different stopping criteria for the optimization procedure, and adding users/items without refitting the model entirely. The bottleneck computations are written in fast Cython code.\n\nFor a similar package for explicit feedback data see also [cmfrec](https://github.com/david-cortes/cmfrec).\n\nFor Poisson factorization without side information see [hpfrec](https://github.com/david-cortes/hpfrec) and [poismf](https://github.com/david-cortes/poismf).\n\n## Model description\n\nThe model consists in producing non-negative low-rank matrix factorizations of counts data (such as number of times each user played each song in some internet service) of user-item interactions and item-word counts, produced by a generative model specified as follows:\n\n```\nItem model:\nB_vk ~ Gamma(a, b)\nT_ik ~ Gamma(c, d)\nW_iv ~ Poisson(T * B')\n\nInteractions model:\nN_uk ~ Gamma(e, f)\nE_ik ~ Gamma(g, h)\nR_ui ~ Poisson(N * (T + E)')\n```\n\n_(Where `W` is the bag-of-words representation of the items, `R` is the user-item interactions matrix, `u` is the number of users, `i` is the number of items, `v` is the number of words, and `k` is the number of latent factors or topics)_\n\nFor more details see the references section at the bottom.\n\nWhen adding user information, the model becomes as follows:\n\n```\nItem model:\nB_vk ~ Gamma(a, b)\nT_ik ~ Gamma(c, d)\nW_iv ~ Poisson(T * B')\n\nUser model:\nK_ak ~ Gamma(e, 
f)\nO_uk ~ Gamma(l, m)\nQ_ua ~ Poisson(O * K')\n\nInteractions model:\nN_uk ~ Gamma(i, j)\nE_ik ~ Gamma(g, h)\nR_ui ~ Poisson((O + N) * (T + E)')\n```\n\nA huge drawback of this model compared to LDA is that, as the matrices are non-negative, items with more words will have larger values in their factors/topics, which will result in them having higher scores regardless of their popularity. This effect can be somewhat decreased by using only a limited number of words to represent each item (scaling upwards the ones that don't have enough words), by standardizing the bag-of-words to have all rows summing up to a certain number (this is hard to do when the counts are supposed to be integers, but the package can still work mostly fine with decimals that are at least \u003e= 0.9, and has the option to standardize the inputs), or to a lesser extent by standardizing the resulting Theta shape matrix to have its rows sum to 1 (also supported in the package options).\n\n## Installation\n\n**Note:** requires a C compiler configured for Python. See [this guide](https://github.com/david-cortes/installing-optimized-libraries) for instructions.\n\nPackage is available on PyPI, can be installed with\n\n```pip install ctpfrec```\n\nOr if that fails:\n```\npip install --no-use-pep517 ctpfrec\n```\n\n** *\n**Note for macOS users:** on macOS, the Python version of this package might compile **without** multi-threading capabilities. In order to enable multi-threading support, first install OpenMP:\n```\nbrew install libomp\n```\nAnd then reinstall this package: `pip install --upgrade --no-deps --force-reinstall ctpfrec`.\n** *\n**IMPORTANT:** the setup script will try to add compilation flag `-march=native`. This instructs the compiler to tune the package for the CPU in which it is being installed (by e.g. using AVX instructions if available), but the result might not be usable in other computers. 
If building a binary wheel of this package or putting it into a docker image which will be used in different machines, this can be overridden either by (a) defining an environment variable `DONT_SET_MARCH=1`, or by (b) manually supplying compilation `CFLAGS` as an environment variable with something related to architecture. For maximum compatibility (but slowest speed), it's possible to do something like this:\n\n```\nexport DONT_SET_MARCH=1\npip install ctpfrec\n```\n\nor, by specifying some compilation flag for architecture:\n```\nexport CFLAGS=\"-march=x86-64\"\npip install ctpfrec\n```\n** *\n\n\n## Sample usage\n\n```python\nimport numpy as np, pandas as pd\nfrom ctpfrec import CTPF\n\n## Generating a fake dataset\nnusers = 10**2\nnitems = 10**2\nnwords = 5 * 10**2\nnobs   = 10**4\nnobs_bag_of_words = 10**4\n\nnp.random.seed(1)\ncounts_df = pd.DataFrame({\n\t'UserId' : np.random.randint(nusers, size=nobs),\n\t'ItemId' : np.random.randint(nitems, size=nobs),\n\t'Count'  : (np.random.gamma(1, 1, size=nobs) + 1).astype('int32')\n\t})\ncounts_df = counts_df.loc[~counts_df[['UserId', 'ItemId']].duplicated()].reset_index(drop=True)\n\nwords_df = pd.DataFrame({\n\t'ItemId' : np.random.randint(nitems, size=nobs_bag_of_words),\n\t'WordId' : np.random.randint(nwords, size=nobs_bag_of_words),\n\t'Count'  : (np.random.gamma(1, 1, size=nobs_bag_of_words) + 1).astype('int32')\n\t})\nwords_df = words_df.loc[~words_df[['ItemId', 'WordId']].duplicated()].reset_index(drop=True)\n\n## Fitting the model\n## (Can also pass the inputs as COO matrices)\nrecommender = CTPF(k = 15, reindex=True)\nrecommender.fit(counts_df=counts_df, words_df=words_df)\n\n## Making predictions\nrecommender.topN(user=10, n=10, exclude_seen=True)\nrecommender.topN(user=10, n=10, exclude_seen=False, items_pool=np.array([1,2,3,4]))\nrecommender.predict(user=10, item=11)\nrecommender.predict(user=[10,10,10], item=[1,2,3])\nrecommender.predict(user=[10,11,12], item=[4,5,6])\n\n## Evaluating Poisson 
log-likelihood\nrecommender.eval_llk(counts_df, full_llk=True)\n\n## Adding new items without refitting\nnitems_new = 10\nnobs_bow_new = 2 * 10**3\nnp.random.seed(5)\nwords_df_new = pd.DataFrame({\n\t'ItemId' : np.random.uniform(low=nitems, high=nitems+nitems_new, size=nobs_bow_new),\n\t'WordId' : np.random.randint(nwords, size=nobs_bow_new),\n\t'Count' : np.random.gamma(1, 1, size=nobs_bow_new).astype('int32')\n\t})\nwords_df_new = words_df_new.loc[words_df_new.Count \u003e 0]\n\nrecommender.add_items(words_df_new)\n```\n\nIf passing `reindex=True`, all user and item IDs that you pass to `.fit` will be reindexed internally (they need to be hashable types like `str`, `int` or `tuple`), and you can use these same IDs to make predictions later. The IDs returned by `topN` are these same IDs passed to `.fit` too.\n\nFor a more detailed example, see the IPython notebook [recommending products with RetailRocket's event logs](http://nbviewer.jupyter.org/github/david-cortes/ctpfrec/blob/master/example/ctpfrec_retailrocket.ipynb) illustrating its usage with the RetailRocket dataset consisting of activity logs (view, add-to-basket, purchase) and item descriptions.\n\n## Documentation\n\nDocumentation is available at readthedocs: [http://ctpfrec.readthedocs.io](http://ctpfrec.readthedocs.io/en/latest/)\n\nIt is also internally documented through docstrings (e.g. you can try `help(ctpfrec.CTPF)`, `help(ctpfrec.CTPF.fit)`, etc.).\n\n## Speeding up optimization procedure\n\nFor faster fitting and predictions, use SciPy and NumPy libraries compiled against MKL or OpenBLAS. These come by default with MKL in Anaconda installations.\n\nThe constructor for CTPF allows some parameters to make it run faster (if you know what you're doing): these are `allow_inconsistent_math=True`, `full_llk=False`, `stop_crit='diff-norm'`, `reindex=False`, `verbose=False`. 
See the documentation for more details.\n\n## Saving model with pickle\n\nDon't use `pickle` to save a `CTPF` object, as it will fail due to problems with lambda functions. Use `dill` instead, which has the same syntax as pickle:\n\n```python\nimport dill\nfrom ctpfrec import CTPF\n\nc = CTPF()\ndill.dump(c, open(\"CTPF_obj.dill\", \"wb\"))\nc = dill.load(open(\"CTPF_obj.dill\", \"rb\"))\n```\n\n## References\n[1] Gopalan, Prem K., Laurent Charlin, and David Blei. \"Content-based recommendations with poisson factorization.\" Advances in Neural Information Processing Systems. 2014.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdavid-cortes%2Fctpfrec","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdavid-cortes%2Fctpfrec","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdavid-cortes%2Fctpfrec/lists"}