{"id":17277824,"url":"https://github.com/florianwilhelm/lda4rec","last_synced_at":"2025-09-15T09:27:04.748Z","repository":{"id":44615466,"uuid":"348248483","full_name":"FlorianWilhelm/lda4rec","owner":"FlorianWilhelm","description":"🧮  Extended Latent Dirichlet Allocation for Collaborative Filtering in Recommender Systems.","archived":false,"fork":false,"pushed_at":"2022-05-16T20:15:06.000Z","size":3274,"stargazers_count":42,"open_issues_count":0,"forks_count":6,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-05-29T15:15:20.486Z","etag":null,"topics":["collaborative-filtering","explainability","interpretability","python","recommender-system"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/FlorianWilhelm.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-03-16T07:12:26.000Z","updated_at":"2025-05-22T08:15:51.000Z","dependencies_parsed_at":"2022-09-07T00:41:32.032Z","dependency_job_id":null,"html_url":"https://github.com/FlorianWilhelm/lda4rec","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/FlorianWilhelm/lda4rec","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FlorianWilhelm%2Flda4rec","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FlorianWilhelm%2Flda4rec/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FlorianWilhelm%2Flda4rec/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FlorianWilhelm%2Flda4rec/manifests","owner_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/FlorianWilhelm","download_url":"https://codeload.github.com/FlorianWilhelm/lda4rec/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FlorianWilhelm%2Flda4rec/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":275234537,"owners_count":25428647,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-15T02:00:09.272Z","response_time":75,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["collaborative-filtering","explainability","interpretability","python","recommender-system"],"created_at":"2024-10-15T09:10:06.332Z","updated_at":"2025-09-15T09:27:04.694Z","avatar_url":"https://github.com/FlorianWilhelm.png","language":"Jupyter Notebook","readme":"# LDA4Rec / LDAext\n\n![LDA4Rec](docs/gfx/lda4rec_601x132.png?raw=true)\n\n[![Project generated with PyScaffold](https://img.shields.io/badge/-PyScaffold-005CA0?logo=pyscaffold)](https://pyscaffold.org/)\n\nAccompanying source code to the paper \"Matrix Factorization for Collaborative Filtering is just Solving an\nAdjoint Latent Dirichlet Allocation Model After All\" by Florian Wilhelm and \"An Interpretable Model for Collaborative Filtering Using\nan Extended Latent Dirichlet Allocation Approach\" by Florian Wilhelm, Marisa Mohr and Lien Michiels. 
Check out\ngit tag v1.0 for the former and v2.0 for the latter.\n\nThe preprint of \"Matrix Factorization for Collaborative Filtering is just Solving an Adjoint Latent Dirichlet Allocation Model After All\"\ncan be found [here](docs/lda4rec_fwilhelm_prepint.pdf) along with the following statement:\n\n\u003e \"© Florian Wilhelm 2021. This is the author's version of the work. It is posted here for\nyour personal use. Not for redistribution. The definitive version was published\nin RecSys '21: Fifteenth ACM Conference on Recommender Systems Proceedings, https://doi.org/10.1145/3460231.3474266.\"\n\nThe preprint of \"An Interpretable Model for Collaborative Filtering Using an Extended Latent Dirichlet Allocation Approach\"\ncan be found [here](docs/ldaext_fwilhelm_preprint.pdf) and the final paper [here](https://journals.flvc.org/FLAIRS/article/view/130567).\n\n## Installation\n\nIn order to set up the necessary environment:\n\n1. review and uncomment what you need in `environment.yml` and create an environment `lda4rec` with the help of [conda]:\n   ```\n   conda env create -f environment.yml\n   ```\n2. activate the new environment with:\n   ```\n   conda activate lda4rec\n   ```\n3. (optionally) get a free [neptune.ai] account for experiment tracking and save the api token\n   under `~/.neptune_api_token` (default).\n\n## Running Experiments\n\nFirst check out and adapt the default experiment config `configs/default.yaml` and run it with:\n```\nlda4rec -c configs/default.yaml run\n```\nA config like `configs/default.yaml` can also be used as a template to create an experiment set with:\n```\nlda4rec -c configs/default.yaml create\n```\nCheck out `cli.py` for more details.\n\n\n## Cloud Setup\n\nCommands for setting up an Ubuntu 20.10 VM with at least 20 GiB of HD on e.g. 
a GCP c2-standard-30 instance:\n```\ntmux\nsudo apt-get install -y build-essential\ncurl https://sh.rustup.rs -sSf | sh\nsource $HOME/.cargo/env\ncargo install pueue\ncurl https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O\nsh Miniconda3-latest-Linux-x86_64.sh\nsource ~/.bashrc\ngit clone https://github.com/FlorianWilhelm/lda4rec.git\ncd lda4rec\nconda env create -f environment.yml\nconda activate lda4rec\nvim ~/.neptune_api_token # and copy it over\n```\nThen create and run all experiments with [pueue] for full control over parallelism:\n```\npueued -d # only once to start the daemon\npueue parallel 10\nexport OMP_NUM_THREADS=4  # to limit the number of threads per model\nlda4rec -c configs/default.yaml create # to create the config files\nfind ./configs -maxdepth 1 -name \"exp_*.yaml\" -exec pueue add \"lda4rec -c {} run\" \\; -exec sleep 30 \\;\n```\nRemark: `-exec sleep 30` avoids a race condition when reading datasets if parallelism is too high.\n\n\n## Dependency Management \u0026 Reproducibility\n\n1. Always keep your abstract (unpinned) dependencies updated in `environment.yml` and eventually\n   in `setup.cfg` if you want to ship and install your package via `pip` later on.\n2. Create concrete dependencies as `environment.lock.yml` for the exact reproduction of your\n   environment with:\n   ```bash\n   conda env export -n lda4rec -f environment.lock.yml\n   ```\n   For multi-OS development, consider using `--no-builds` during the export.\n3. 
Update your current environment with respect to a new `environment.lock.yml` using:\n   ```bash\n   conda env update -f environment.lock.yml --prune\n   ```\n## Project Organization\n\n```\n├── AUTHORS.md              \u003c- List of developers and maintainers.\n├── CHANGELOG.md            \u003c- Changelog to keep track of new features and fixes.\n├── LICENSE.txt             \u003c- License as chosen on the command-line.\n├── README.md               \u003c- The top-level README for developers.\n├── configs                 \u003c- Directory for configurations of model \u0026 application.\n├── data                    \u003c- Downloaded datasets will be stored here.\n├── docs                    \u003c- Directory for Sphinx documentation in rst or md.\n├── environment.yml         \u003c- The conda environment file for reproducibility.\n├── notebooks               \u003c- Jupyter notebooks. Naming convention is a number (for\n│                              ordering), the creator's initials and a description,\n│                              e.g. 
`1.0-fw-initial-data-exploration`.\n├── logs                    \u003c- Generated logs are collected here.\n├── results                 \u003c- Results as exported from neptune.ai.\n├── setup.cfg               \u003c- Declarative configuration of your project.\n├── setup.py                \u003c- Use `python setup.py develop` to install for development\n│                              or create a distribution with `python setup.py bdist_wheel`.\n├── src\n│   └── lda4rec             \u003c- Actual Python package where the main functionality goes.\n├── tests                   \u003c- Unit tests which can be run with `py.test`.\n├── .coveragerc             \u003c- Configuration for coverage reports of unit tests.\n├── .isort.cfg              \u003c- Configuration for git hook that sorts imports.\n└── .pre-commit-config.yaml \u003c- Configuration of pre-commit git hooks.\n```\n\n## How to Cite\n\nPlease cite LDA4Rec/LDAext if it helps your research. You can use the following BibTeX entries:\n\n```\n@inproceedings{wilhelm2021lda4rec,\nauthor = {Wilhelm, Florian},\ntitle = {Matrix Factorization for Collaborative Filtering Is Just Solving an Adjoint Latent Dirichlet Allocation Model After All},\nyear = {2021},\nmonth = sep,\nisbn = {978-1-4503-8458-2/21/09},\npublisher = {Association for Computing Machinery},\naddress = {New York, NY, USA},\nurl = {https://doi.org/10.1145/3460231.3474266},\ndoi = {10.1145/3460231.3474266},\nbooktitle = {Fifteenth ACM Conference on Recommender Systems},\nnumpages = {8},\nlocation = {Amsterdam, Netherlands},\nseries = {RecSys '21}\n}\n@article{Wilhelm_Mohr_Michiels_2022, \ntitle={An Interpretable Model for Collaborative Filtering Using an Extended Latent Dirichlet Allocation Approach}, \nvolume={35}, \nurl={https://journals.flvc.org/FLAIRS/article/view/130567}, \nDOI={10.32473/flairs.v35i.130567}, \nabstractNote={With the increasing use of AI and ML-based systems, interpretability is becoming an increasingly important issue to ensure user 
trust and safety. This also applies to the area of recommender systems, where methods based on matrix factorization (MF) are among the most popular methods for collaborative filtering tasks with implicit feedback. Despite their simplicity, the latent factors of users and items lack interpretability in the case of the effective, unconstrained MF-based methods. In this work, we propose an extended latent Dirichlet Allocation model (LDAext) that has interpretable parameters such as user cohorts of item preferences and the affiliation of a user with different cohorts. We prove a theorem on how to transform the factors of an unconstrained MF model into the parameters of LDAext. Using this theoretical connection, we train an MF model on different real-world data sets, transform the latent factors into the parameters of LDAext and test their interpretation in several experiments for plausibility. Our experiments confirm the interpretability of the transformed parameters and thus demonstrate the usefulness of our proposed approach.}, \njournal={The International FLAIRS Conference Proceedings}, \nauthor={Wilhelm, Florian and Mohr, Marisa and Michiels, Lien}, \nyear={2022}, \nmonth={May} \n}\n```\n\n## License\n\nThis source code is licensed under [AGPL-3-only](LICENSE.txt). If you require a more permissive license, e.g. 
for\ncommercial reasons, contact me to obtain a license for your business.\n\n\u003c!-- pyscaffold-notes --\u003e\n\n## Acknowledgement\n\nSpecial thanks go to [Du Phan](https://github.com/fehiepsi) and [Fritz Obermeyer](https://github.com/fritzo) from the [(Num)Pyro](https://github.com/pyro-ppl) project for their kind help and useful comments on my code.\n\n## Note\n\nThis project has been set up using [PyScaffold] 4.0 and the [dsproject extension] 0.6.\nSome source code was taken from [Spotlight] (MIT-licensed) by Maciej Kula as well as [lrann] (MIT-licensed) by\nFlorian Wilhelm and Marcel Kurovski.\n\n[PyScaffold]: https://pyscaffold.org/\n[conda]: https://docs.conda.io/\n[pre-commit]: https://pre-commit.com/\n[Jupyter]: https://jupyter.org/\n[nbstripout]: https://github.com/kynan/nbstripout\n[Google style]: http://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings\n[dsproject extension]: https://github.com/pyscaffold/pyscaffoldext-dsproject\n[pueue]: https://github.com/Nukesor/pueue\n[neptune.ai]: https://neptune.ai/\n[Spotlight]: https://github.com/maciejkula/spotlight\n[lrann]: https://github.com/FlorianWilhelm/lrann\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fflorianwilhelm%2Flda4rec","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fflorianwilhelm%2Flda4rec","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fflorianwilhelm%2Flda4rec/lists"}