{"id":13936722,"url":"https://github.com/evfro/polara","last_synced_at":"2025-04-04T06:07:02.129Z","repository":{"id":50652556,"uuid":"63395706","full_name":"evfro/polara","owner":"evfro","description":"Recommender system and evaluation framework for top-n recommendations tasks that respects polarity of feedbacks. Fast, flexible and easy to use. Written in python, boosted by scientific python stack.","archived":false,"fork":false,"pushed_at":"2024-12-26T11:06:22.000Z","size":2089,"stargazers_count":251,"open_issues_count":4,"forks_count":22,"subscribers_count":12,"default_branch":"master","last_synced_at":"2025-03-28T05:09:52.342Z","etag":null,"topics":["collaborative-filtering","evaluation","matrix-factorization","recommender-system","tensor-factorization","top-n-recommendations"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/evfro.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-07-15T06:00:04.000Z","updated_at":"2025-02-01T07:57:25.000Z","dependencies_parsed_at":"2025-01-15T15:13:29.090Z","dependency_job_id":"e845a832-b674-4cca-a049-8186735454d7","html_url":"https://github.com/evfro/polara","commit_stats":null,"previous_names":[],"tags_count":16,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/evfro%2Fpolara","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/evfro%2Fpolara/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/evfro%2Fpolara/release
s","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/evfro%2Fpolara/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/evfro","download_url":"https://codeload.github.com/evfro/polara/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247128746,"owners_count":20888235,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["collaborative-filtering","evaluation","matrix-factorization","recommender-system","tensor-factorization","top-n-recommendations"],"created_at":"2024-08-07T23:02:56.420Z","updated_at":"2025-04-04T06:07:02.109Z","avatar_url":"https://github.com/evfro.png","language":"Python","readme":"# POLARA\nPolara is the first recommendation framework that allows a deeper analysis of recommender system performance, based on the idea of feedback polarity (by analogy with sentiment polarity in NLP).\n\nIn addition to the standard question of \"how good a recommender system is at recommending relevant items\", it allows assessing the ability of a recommender system to **avoid irrelevant recommendations** (and thus to be less likely to disappoint a user). You can read more about this idea in the research paper [Fifty Shades of Ratings: How to Benefit from a Negative Feedback in Top-N Recommendations Tasks](http://arxiv.org/abs/1607.04228). 
The research results can be easily reproduced with this framework; a \"fixed state\" version of the code, along with many usage examples, is available at https://github.com/Evfro/fifty-shades.\nThe framework also features an efficient tensor-based implementation of the algorithm proposed in the paper, which takes full advantage of the polarity-based formulation.\n\n\n## Prerequisites\nThe current version of Polara supports both Python 2 and Python 3 environments. Future versions are likely to drop support for Python 2 to make better use of Python 3 features.\n\nThe framework depends heavily on the `Pandas, Numpy, Scipy` and `Numba` packages. Better performance can be achieved with `mkl` (optional). It's also recommended to use `jupyter notebook` for experimentation. Visualization of results can be done with the help of `matplotlib`. The easiest way to get all of these at once is to use the latest [Anaconda distribution](https://www.continuum.io/downloads).\n\nIf you use a separate `conda` environment for testing, the following command can be issued to ensure that all required dependencies are in place (see [this](http://conda.pydata.org/docs/commands/conda-install.html) for more info):\n\n`conda install --file conda_req.txt`\n\nAlternatively, a new conda environment with all required packages can be created with:\n\n`conda create -n \u003cyour_environment_name\u003e python=3.7 --file conda_req.txt`\n\n\n## Installation\nThe easiest way is to install directly from source. Activate your conda environment and run:  \n`pip install --no-cache-dir --upgrade git+https://github.com/evfro/polara.git#egg=polara`  \nThis will install the current release version. For the most recent development version, insert `@develop` between `polara.git` and `#egg=polara` in the line above.\n\nAlternatively, you can manually clone this repository to a local machine (`git clone https://github.com/evfro/polara.git`). 
Once in the root of the newly created local repository, run  \n`python setup.py install`\n\n\n## Usage example\nSpecial effort was made to build a *recsys for humans*, with an emphasis on the framework's ease of use. For example, this is how you build a PureSVD recommender on top of the [Movielens 1M](http://grouplens.org/datasets/movielens/) dataset:\n\n```python\nfrom polara.recommender.data import RecommenderData\nfrom polara.recommender.models import SVDModel\nfrom polara.datasets.movielens import get_movielens_data\n# get data and convert it into appropriate format\nml_data = get_movielens_data(get_genres=False)\ndata_model = RecommenderData(ml_data, 'userid', 'movieid', 'rating')\n# build PureSVD model and evaluate it\nsvd = SVDModel(data_model)\nsvd.build()\nsvd.evaluate()\n```\nSeveral other scenarios and use cases, which cover many practical aspects, can be found in the [examples directory](/examples).\n\n## Creating new recommender models\nBasic models can be extended by subclassing the `RecommenderModel` class and defining two required methods: `self.build()` and `self.get_recommendations()`. 
Here's an example of a simple item-to-item recommender model:\n```python\nfrom polara.recommender.models import RecommenderModel\n\nclass CooccurrenceModel(RecommenderModel):\n    def __init__(self, *args, **kwargs):\n        super(CooccurrenceModel, self).__init__(*args, **kwargs)\n        self.method = 'item-to-item' # pick some meaningful name\n\n    def build(self):\n        # build model - calculate item-to-item matrix\n        user_item_matrix = self.get_training_matrix()\n        # rating matrix product  R^T R  gives cooccurrences count\n        i2i_matrix = user_item_matrix.T.dot(user_item_matrix) # gives CSC format\n        # exclude \"self-links\" and ensure only non-zero elements are stored\n        i2i_matrix.setdiag(0)\n        i2i_matrix.eliminate_zeros()\n        # store matrix for generating recommendations\n        self.i2i_matrix = i2i_matrix\n\n    def get_recommendations(self):\n        # get test users information and generate top-k recommendations\n        test_matrix, test_data = self.get_test_matrix()\n        # calculate predicted scores\n        i2i_scores = test_matrix.dot(self.i2i_matrix)\n        # prevent seen items from appearing in recommendations\n        if self.filter_seen:\n            self.downvote_seen_items(i2i_scores, test_data)\n        # generate top-k recommendations for every test user\n        top_recs = self.get_topk_elements(i2i_scores)\n        return top_recs\n```\nAnd the model is ready for evaluation:\n```python\ni2i = CooccurrenceModel(data_model)\ni2i.build()\ni2i.evaluate()\n```\n\n## Bulk experiments\nHere's an example of how to perform **top-*k* recommendations** experiments with *5-fold cross-validation* for several models at once:\n\n```python\nfrom polara.evaluation import evaluation_engine as ee\nfrom polara.recommender.models import PopularityModel, RandomModel\n\n# define models\ni2i = CooccurrenceModel(data_model)\nsvd = SVDModel(data_model)\npopular = PopularityModel(data_model)\nrandom = 
RandomModel(data_model)\nmodels = [i2i, svd, popular, random]\n\nmetrics = ['ranking', 'relevance'] # metrics for evaluation: NDCG, Precision, Recall, etc.\nfolds = [1, 2, 3, 4, 5] # use all 5 folds for cross-validation (default)\ntopk_values = [1, 5, 10, 20, 50] # values of k to experiment with\n\n# run 5-fold CV experiment\nresult = ee.run_cv_experiment(models, folds, metrics,\n                              fold_experiment=ee.topk_test,\n                              topk_list=topk_values)\n\n# calculate average values across all folds for e.g. relevance metrics\nscores = result.mean(axis=0, level=['top-n', 'model']) # use .std instead of .mean for standard deviation\nscores.xs('recall', level='metric', axis=1).unstack('model')\n```\nwhich results in something like:\n\n| **model** | **MP** | **PureSVD** | **RND** | **item-to-item** |\n| ---: |:---:|:---:|:---:|:---:|\n| **top-n** |\n| **1** |  0.017828 |  0.079428 |  0.000055 |  0.024673 |\n| **5** |  0.086604 |  0.219408 |  0.001104 |  0.126013 |\n| **10** |  0.138546 |  0.300658 |  0.001987 |  0.202134 |\n| ... | ... | ... | ... | ... |\n\n## Custom pipelines\nBy default, Polara takes care of raw data and helps organize a full evaluation pipeline, which includes splitting data into training, test and evaluation datasets, performing cross-validation and gathering results. However, if you need more control over that workflow, you can easily implement a custom usage scenario for your own needs.\n\n### Build models without evaluation\nIf you simply want to build a model on the provided data, then you only need to define a training set. 
This can be easily achieved with the help of the `prepare_training_only` method (assuming you have a pandas dataframe named `train_data` with corresponding \"user\", \"item\" and \"rating\" columns):\n```python\ndata_model = RecommenderData(train_data, 'user', 'item', 'rating')\ndata_model.prepare_training_only()\n```\nNow you are ready to build your models (as in the examples above) and export them to whatever workflow you currently have.\n\n### Warm-start and known-user scenarios\nBy default, Polara makes the test set and the training set disjoint by users, which allows evaluating models in the *user warm-start* scenario.\nHowever, in some situations (for example, when Polara is used within a larger pipeline) you might want to implement a strictly *known user* scenario to assess the quality of your recommender system on the unseen (held-out) items of known users. The switch between these two scenarios is controlled by setting the `data_model.warm_start` attribute to `True` or `False`. See the [Warm-start and standard scenarios](examples/Warm_start_and_standard_scenarios.ipynb) Jupyter notebook for an example.\n\n### Externally provided test data\nIf you don't want Polara to perform data splitting (for example, when your test data is already provided), you can use the `set_test_data` method of a `RecommenderData` instance. It has a number of input arguments that cover all major cases of externally provided data. For example, assuming that you have new users' preferences encoded in the `unseen_data` dataframe and the corresponding held-out preferences in the `holdout` dataframe, the following command includes them in the data model:  \n```python\ndata_model.set_test_data(testset=unseen_data, holdout=holdout, warm_start=True)\n```\nPolara will automatically perform all required transformations to ensure correct functioning of the evaluation pipeline. 
To evaluate models, you simply call the standard methods without any modifications:\n```python\nsvd.build()\nsvd.evaluate()\n```\nIn this case the recommendations are generated based on the testset and evaluated against the holdout.\nSee more usage examples in the [Custom evaluation](examples/Custom_evaluation.ipynb) notebook.\n\n### Reproducing others' work\nPolara offers even more options to customize the experimentation pipeline and tailor it to specific needs. See, for example, the [Reproducing EIGENREC results](examples/Reproducing_EIGENREC_results.ipynb) notebook to learn how Polara can be used to reproduce experiments from the *\"[EIGENREC: generalizing PureSVD for effective and efficient top-N recommendations](https://arxiv.org/abs/1511.06033)\"* paper.\n\n## How to cite\nIf you find this framework useful for your research, please cite [the following paper](https://dl.acm.org/citation.cfm?id=3347055):\n```\n\"HybridSVD: when collaborative information is not enough\"; Evgeny Frolov and Ivan Oseledets, 2019. In Proceedings of the 13th ACM Conference on Recommender Systems (RecSys '19). ACM, New York, NY, USA, 331-339.\n```\n","funding_links":[],"categories":["Python"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fevfro%2Fpolara","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fevfro%2Fpolara","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fevfro%2Fpolara/lists"}