{"id":34024090,"url":"https://github.com/glami/sansa","last_synced_at":"2026-04-02T02:08:47.291Z","repository":{"id":180534407,"uuid":"665111839","full_name":"glami/sansa","owner":"glami","description":"SANSA - sparse EASE for millions of items","archived":false,"fork":false,"pushed_at":"2025-11-21T19:15:11.000Z","size":1799,"stargazers_count":44,"open_issues_count":2,"forks_count":6,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-12-15T05:30:10.341Z","etag":null,"topics":["approximate-inverse","collaborative-filtering","recommender-system","sparse-autoencoder","sparse-matrix"],"latest_commit_sha":null,"homepage":"https://doi.org/10.1145/3604915.3608827","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/glami.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-07-11T13:10:43.000Z","updated_at":"2025-11-21T19:15:30.000Z","dependencies_parsed_at":null,"dependency_job_id":"99b1b892-c3fe-4844-8d91-33756497a658","html_url":"https://github.com/glami/sansa","commit_stats":null,"previous_names":["glami/sansa"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/glami/sansa","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/glami%2Fsansa","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/glami%2Fsansa/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/glami%2Fsansa/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/glami%2Fsansa/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/owners/glami","download_url":"https://codeload.github.com/glami/sansa/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/glami%2Fsansa/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31294400,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-02T01:43:37.129Z","status":"online","status_checked_at":"2026-04-02T02:00:08.535Z","response_time":89,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["approximate-inverse","collaborative-filtering","recommender-system","sparse-autoencoder","sparse-matrix"],"created_at":"2025-12-13T16:04:49.522Z","updated_at":"2026-04-02T02:08:47.276Z","avatar_url":"https://github.com/glami.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# [SANSA: how to compute EASE on million item datasets](https://doi.org/10.1145/3604915.3608827)\n\n[![PyPI - Version](https://img.shields.io/pypi/v/sansa)](https://pypi.org/project/sansa/)\n[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)\n[![DOI](https://img.shields.io/badge/DOI-10.1145%2F3604915.3608827-blue)](https://doi.org/10.1145/3604915.3608827)\n\nOfficial implementation of scalable collaborative filtering model **SANSA**.\n\n![Architecture and training procedure of SANSA](assets/sansa.png)\n\n\u003e **Scalable Approximate NonSymmetric Autoencoder for 
Collaborative Filtering**  \n\u003e Spišák M., Bartyzal R., Hoskovec A., Peška L., Tůma M.  \n\u003e Paper: [10.1145/3604915.3608827](https://doi.org/10.1145/3604915.3608827)\n\u003e \n\u003e *Best Short Paper Runner-Up*, [17th ACM Conference on Recommender Systems (ACM RecSys 2023)](https://recsys.acm.org/recsys23/)\n\n### Reproducibility\nSee branch [reproduce_our_results](https://github.com/glami/sansa/tree/reproduce_our_results) for the code used in our experiments and the complete experimental results. \n\n## About\n\nSANSA is a scalable modification of [EASE](https://arxiv.org/abs/1905.03375), a shallow autoencoder for collaborative filtering, **specifically designed to handle item sets with millions of items**.\n- End-to-end sparse training procedure: instead of strenuously inverting the Gramian $X^TX$ of the user-item interaction matrix $X$, SANSA efficiently finds a *sparse approximate inverse* of $X^TX$. \n- Training memory requirements are proportional to the number of non-zero elements in $X^TX$ (and this can be improved further).  \n- The model's density is prescribed via a hyperparameter. \n- As a sparse neural network, SANSA offers *very fast inference* times.\n\n### Learn more in our RecSys 2023 [short paper](https://dl.acm.org/doi/10.1145/3604915.3608827), or check out the conference [poster](assets/poster.pdf).\n### Watch our Miton AI Times meetup [presentation \u0026 demo (YouTube)](https://www.youtube.com/watch?v=uAoDLXqOY-s).\n\n## Installation\n```bash\npip install sansa\n```\n(Make sure to install the prerequisites first; see the next section.)\n### Prerequisites\nTraining of SANSA uses [scikit-sparse](https://github.com/scikit-sparse/scikit-sparse), which depends on the [SuiteSparse](https://github.com/DrTimothyAldenDavis/SuiteSparse) numerical library. 
To install SuiteSparse on Ubuntu and macOS, run the commands below: \n```bash\n# Ubuntu\nsudo apt-get install libsuitesparse-dev\n\n# macOS\nbrew install suite-sparse\n```\nNote that `brew` (and possibly other package managers) installs SuiteSparse objects to a non-standard location. Before installing the package, you need to set\nthe correct path to SuiteSparse via the following two environment variables:\n```bash\nexport SUITESPARSE_INCLUDE_DIR={PATH TO YOUR SUITESPARSE}/include/suitesparse\nexport SUITESPARSE_LIBRARY_DIR={PATH TO YOUR SUITESPARSE}/lib\n```\nFor `brew`, you can find `{PATH TO YOUR SUITESPARSE}` by running `brew info suite-sparse`. To streamline this process, you can run\n```bash\nSUITESPARSE_DIR=$(brew info suite-sparse | sed -n 4p | awk '{print $1}')  # path to brew-installed package is on the 4th line, 1st column\nexport SUITESPARSE_INCLUDE_DIR=$SUITESPARSE_DIR/include/suitesparse\nexport SUITESPARSE_LIBRARY_DIR=$SUITESPARSE_DIR/lib\n```\nwhich should set the correct environment variables for you.\n\n### Installation from source\nWith the SuiteSparse path correctly specified, simply run\n```bash\npip install .\n```\nin the root directory of this repository.\n\n## Usage\n### Configuration\nThe SANSA model supports two methods of factorization of the Gramian matrix $X^TX$ and one method for inverting the lower triangular factor. \nFactorizers and inverters are configured separately and included in the model configuration:\n```python\nfrom sansa import SANSAConfig\n\nconfig = SANSAConfig(\n    l2 = 20.0,  # regularization strength\n    weight_matrix_density = 5e-5,  # desired density of weights\n    gramian_factorizer_config = factorizer_config,  # factorizer configuration\n    lower_triangle_inverter_config = inverter_config,  # inverter configuration\n)\n```\nTo get the configuration of a model instance, use the `config` property:\n```python\nconfig = model.config\n```\n#### Factorizer configuration\nChoose between two factorization techniques:\n1. 
**CHOLMOD** = exact Cholesky factorization sparsified after factorization. More accurate but memory-hungry; recommended for smaller, denser matrices.\n```python\nfrom sansa import CHOLMODGramianFactorizerConfig\n\nfactorizer_config = CHOLMODGramianFactorizerConfig()  # no hyperparameters\n```\n2. **ICF** = Incomplete Cholesky factorization. Less accurate but much more memory-efficient; recommended for very large, sparse matrices.\n```python\nfrom sansa import ICFGramianFactorizerConfig\n\nfactorizer_config = ICFGramianFactorizerConfig(\n    factorization_shift_step = 1e-3,  # initial diagonal shift if incomplete factorization fails\n    factorization_shift_multiplier = 2.0,  # multiplier for the shift for subsequent attempts\n)\n```\n#### Inverter configuration\nCurrently only one inverter is available: **UMR** -- residual minimization approach\n```python\nfrom sansa import UMRUnitLowerTriangleInverterConfig\n\ninverter_config = UMRUnitLowerTriangleInverterConfig(\n    scans=1,  # number of scans through all columns of the matrix\n    finetune_steps=5,  # number of finetuning steps, targeting worst columns\n)\n```\n### Training\n```python\nfrom sansa import SANSA\n\nX = ...  # training data -- scipy.sparse.csr_matrix (rows=users, columns=items)\nconfig = ...  # specify configuration of SANSA model\n\n# Instantiate model with the config\nmodel = SANSA(config)\n\n# Train model on the user-item matrix\nmodel.fit(X)\n# or on a precomputed symmetric item-item matrix\nmodel.fit(X, compute_gramian=False)\n```\nWeights of a SANSA model can be accessed using the `weights` attribute:\n```python\nw1, w2 = model.weights  # tuple of scipy.sparse.csr_matrix of shape (num_items, num_items)\n```\nWeights can be loaded into a model using the `load_weights` method:\n```python\nw1, w2 = ...  # tuple of scipy.sparse.csr_matrix of shape (num_items, num_items)\n\nmodel.load_weights((w1, w2))\n```\n### Inference\n#### 1. 
High-level inference: recommendation for a batch of users\n```python\nX = ...  # input interactions -- scipy.sparse.csr_matrix (rows=users, columns=items)\n\n# Get indices of top-k items for each user + corresponding scores\n# if mask_input=True, input items get score=0\ntop_k_indices, top_k_scores = model.recommend(X, k=10, mask_input=True)  # np.ndarrays of shape (X.shape[0], k)\n```\n#### 2. Low-level inference: forward pass\n```python\nX = ...  # input interactions -- scipy.sparse.csr_matrix (rows=users, columns=items)\n\n# Forward pass\nscores = model.forward(X)  # scipy.sparse.csr_matrix of shape X.shape\n```\n\n## License\nCopyright 2023 Inspigroup s.r.o.\n\nLicensed under the Apache License, Version 2.0 (the \"License\");\nyou may not use this file except in compliance with the License.\nYou may obtain a copy of the License at\n\n[https://github.com/glami/sansa/blob/main/LICENSE](https://github.com/glami/sansa/blob/main/LICENSE)\n\nUnless required by applicable law or agreed to in writing, software\ndistributed under the License is distributed on an \"AS IS\" BASIS,\nWITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\nSee the License for the specific language governing permissions and\nlimitations under the License.\n\n## Cite us\nPlease consider citing our paper:\n```\n@inproceedings{10.1145/3604915.3608827,\nauthor = {Spi\\v{s}\\'{a}k, Martin and Bartyzal, Radek and Hoskovec, Anton\\'{\\i}n and Peska, Ladislav and T\\r{u}ma, Miroslav},\ntitle = {Scalable Approximate NonSymmetric Autoencoder for Collaborative Filtering},\nyear = {2023},\nisbn = {9798400702419},\npublisher = {Association for Computing Machinery},\naddress = {New York, NY, USA},\nurl = {https://doi.org/10.1145/3604915.3608827},\ndoi = {10.1145/3604915.3608827},\nabstract = {In the field of recommender systems, shallow autoencoders have recently gained significant attention. 
One of the most highly acclaimed shallow autoencoders is easer, favored for its competitive recommendation accuracy and simultaneous simplicity. However, the poor scalability of easer (both in time and especially in memory) severely restricts its use in production environments with vast item sets. In this paper, we propose a hyperefficient factorization technique for sparse approximate inversion of the data-Gram matrix used in easer. The resulting autoencoder, sansa, is an end-to-end sparse solution with prescribable density and almost arbitrarily low memory requirements — even for training. As such, sansa allows us to effortlessly scale the concept of easer to millions of items and beyond.},\nbooktitle = {Proceedings of the 17th ACM Conference on Recommender Systems},\npages = {763–770},\nnumpages = {8},\nkeywords = {Algorithm scalability, Numerical approximation, Sparse approximate inverse, Sparse autoencoders},\nlocation = {Singapore, Singapore},\nseries = {RecSys '23}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fglami%2Fsansa","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fglami%2Fsansa","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fglami%2Fsansa/lists"}