{"id":13710340,"url":"https://github.com/statisticianinstilettos/recmetrics","last_synced_at":"2026-01-14T10:57:00.241Z","repository":{"id":33276526,"uuid":"153137640","full_name":"statisticianinstilettos/recmetrics","owner":"statisticianinstilettos","description":"A library of metrics for evaluating recommender systems","archived":false,"fork":false,"pushed_at":"2024-01-11T20:34:53.000Z","size":5992,"stargazers_count":581,"open_issues_count":20,"forks_count":102,"subscribers_count":16,"default_branch":"master","last_synced_at":"2025-05-06T19:34:39.455Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/statisticianinstilettos.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-10-15T15:29:49.000Z","updated_at":"2025-04-22T12:36:43.000Z","dependencies_parsed_at":"2024-01-29T19:30:17.656Z","dependency_job_id":"07cf2427-b0cc-4c05-b3bb-6b536a659fe8","html_url":"https://github.com/statisticianinstilettos/recmetrics","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/statisticianinstilettos/recmetrics","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/statisticianinstilettos%2Frecmetrics","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/statisticianinstilettos%2Frecmetrics/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/statisticianinstilettos%2Frecmetrics/releases","manifests_url":"https://repos.ecos
yste.ms/api/v1/hosts/GitHub/repositories/statisticianinstilettos%2Frecmetrics/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/statisticianinstilettos","download_url":"https://codeload.github.com/statisticianinstilettos/recmetrics/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/statisticianinstilettos%2Frecmetrics/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28417775,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-14T10:47:48.104Z","status":"ssl_error","status_checked_at":"2026-01-14T10:46:19.031Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T23:00:54.698Z","updated_at":"2026-01-14T10:57:00.226Z","avatar_url":"https://github.com/statisticianinstilettos.png","language":"Jupyter Notebook","funding_links":[],"categories":["Evaluation","Recommendation Systems","推荐系统"],"sub_categories":["Synthetic Data","NLP","Automatic Plotting"],"readme":"# recmetrics\nA Python library of evaluation metrics and diagnostic tools for recommender systems.\n\n_**This library is actively maintained. My goal is to continue to develop this as the main source of recommender metrics in Python. 
Please submit issues, bug reports, or feature requests, or contribute directly through a pull request. If I do not respond, you can ping me directly at longoclaire@gmail.com.**_\n\n|Description|Command|\n|:---:|:---|\n|Installation|`pip install recmetrics`|\n|Notebook Demo|`make run_demo`|\n|Test|`make test`|\n\nFull documentation is coming soon. In the interim, the Python notebook in this repo, `example.ipynb`, contains examples of these plots and metrics in action using the [MovieLens 20M Dataset](https://grouplens.org/datasets/movielens/20m/). You can also view my [Medium Article](https://towardsdatascience.com/evaluation-metrics-for-recommender-systems-df56c6611093).\n\n\u003ci\u003eThis library is an open source project. The goal is to create a go-to source for metrics related to recommender systems. I have begun by adding metrics and plots I found useful during my career as a Data Scientist at a retail company, and encourage the community to contribute. If you would like to see a new metric in this package, or find a bug, or have suggestions for improvement, please contribute!\n\u003c/i\u003e\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"https://media.giphy.com/media/YAnpMSHcurJVS/giphy.gif\" width=200\u003e\n\u003c/p\u003e\n\n## Long Tail Plot\n\n```python\nrecmetrics.long_tail_plot()\n```\n\nThe Long Tail plot is used to explore popularity patterns in user-item interaction data. Typically, a small number of items make up most of the interaction volume; this is referred to as the \"head\". 
The \"long tail\" typically consists of most products, but makes up a small percentage of interaction volume.\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"images/long_tail_plot.png\" alt=\"Long Tail Plot\" width=600\u003e\n\u003c/p\u003e\n\nThe items in the \"long tail\" usually do not have enough interactions to be recommended accurately by user-based recommender systems like collaborative filtering, due to data sparsity and the inherent popularity bias in these models. Many recommender systems require a certain level of sparsity to train. A good recommender must balance sparsity requirements with popularity bias.\n\n## Mar@K and Map@K\n\n```python\nrecmetrics.mark()\n\nrecmetrics.mark_plot()\n\nrecmetrics.mapk_plot()\n```\nMean Average Recall at K (Mar@k) measures the recall at the kth recommendation. Mar@k considers the order of recommendations and penalizes correct recommendations based on their position in the list. Map@k and Mar@k are ideal for evaluating an ordered list of recommendations.\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"images/mark_plot.png\" alt=\"Mar@k\" width=600\u003e\n\u003c/p\u003e\n\nMap@k and Mar@k metrics suffer from popularity bias. If a model works well on popular items, the majority of recommendations will be correct, and Mar@k and Map@k can appear to be high while the model may not be making useful or personalized recommendations.\n\n## Coverage\n\n```python\nrecmetrics.prediction_coverage()\n\nrecmetrics.catalog_coverage()\n\nrecmetrics.coverage_plot()\n```\n\nCoverage is the percentage of items that the recommender is able to recommend. 
It is given by this formula:\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"images/coverage_equation.gif\" alt=\"Coverage Equation\" width=200\u003e\n\u003c/p\u003e\n\nWhere 'I' is the number of unique items the model recommends in the test data, and 'N' is the total number of unique items in the training data.\nThe catalog coverage is the rate of distinct items recommended over a period of time\nto the user. For this purpose, the catalog coverage function also takes a parameter 'k', the number of observed recommendation lists. In essence, both metrics quantify the proportion of items that the system is able to work with.\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"images/coverage_plot.png\" alt=\"Coverage Plot\" width=400\u003e\n\u003c/p\u003e\n\n## Novelty\n\n```python\nrecmetrics.novelty()\n```\n\nNovelty measures the capacity of a recommender system to propose novel and unexpected items which a user is unlikely to know about already. It uses the self-information of each recommended item: it calculates the mean self-information per top-N recommended list and averages these values over all users. 
\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"images/novelty.gif\" alt=\"Coverage Equation\" width=200\u003e\n\u003c/p\u003e\n\nWhere |U| is the number of users, count(i) is the number of users who consumed the specific item, and N is the length of the recommended list.\n\n## Personalization\n\n```python\nrecmetrics.personalization()\n```\n\nPersonalization is the dissimilarity between users' lists of recommendations.\nA high score indicates that users' recommendations are different.\nA low personalization score indicates that users' recommendations are very similar.\n\nFor example, if two users have recommendation lists [A,B,C,D] and [A,B,C,Y], the personalization can be calculated as:\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"images/personalization_code.png\" alt=\"Coverage Plot\" width=400\u003e\n\u003c/p\u003e\n\n## Intra-list Similarity\n\n```python\nrecmetrics.intra_list_similarity()\n```\n\nIntra-list similarity uses a feature matrix to calculate the cosine similarity between the items in a list of recommendations.\nThe feature matrix is indexed by the item id and includes one-hot-encoded features.\nIf a recommender system is recommending lists of very similar items, the intra-list similarity will be high.\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"images/ils_matrix.png\" alt=\"Coverage Plot\" width=400\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"images/ils_code.png\" alt=\"Coverage Plot\" width=400\u003e\n\u003c/p\u003e\n\n## MSE and RMSE\n\n```python\nrecmetrics.mse()\nrecmetrics.rmse()\n```\n\nMean Squared Error (MSE) and Root Mean Squared Error (RMSE) are used to evaluate the accuracy of predicted values, such as ratings, compared to the true value, y.\nThese can also be used to evaluate the reconstruction of a ratings matrix.\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"images/mse.gif\" alt=\"MSE Equation\" width=200\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n\u003cimg 
src=\"images/rmse.gif\" alt=\"RMSE Equation\" width=200\u003e\n\u003c/p\u003e\n\n## Predicted Class Probability Distribution Plots\n\n```python\nrecmetrics.class_separation_plot()\n```\n\n\nThis is a plot of the distribution of the predicted class probabilities from a classification model. The plot is typically used to visualize how well a model is able to distinguish between two classes, and can assist a Data Scientist in picking the optimal decision threshold to classify observations to class 1 (0.5 is usually the default threshold for this method). The colors of the distribution plots represent true class 0 and 1, and everything to the right of the decision threshold is classified as class 1.\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"images/class_probs.png\" alt=\"binary class probs\" width=400\u003e\n\u003c/p\u003e\n\nThis plot can also be used to visualize the recommendation scores in two ways. \n\nIn this example, an item is considered class 1 if it is rated more than 3 stars, and class 0 if it is not. This example shows the performance of a model that recommends an item when the predicted 5-star rating is greater than 3 (plotted as a vertical decision threshold line). This plot shows that the recommender model will perform better if items with a predicted rating of 3.5 stars or greater are recommended. \n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"images/rec_scores.png\" alt=\"ratings scores\" width=500\u003e\n\u003c/p\u003e\n\nThe raw predicted 5-star ratings for all recommended movies can also be visualized with this plot to find the optimal predicted rating score at which to threshold a recommendation of that movie. This plot also visualizes how well the model is able to distinguish between each rating value. 
\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"images/ratings_distribution.png\" alt=\"ratings distributions\" width=500\u003e\n\u003c/p\u003e\n\n## ROC and AUC\n\n```python\nrecmetrics.roc_plot()\n```\n\nThe Receiver Operating Characteristic (ROC) plot is used to visualize the trade-off between true positives and false positives for binary classification. The Area Under the Curve (AUC) is sometimes used as an evaluation metric. \n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"images/ROC.png\" alt=\"ROC\" width=600\u003e\n\u003c/p\u003e\n\n## Recommender Precision and Recall\n```python\nrecmetrics.recommender_precision()\nrecmetrics.recommender_recall()\n```\n\nRecommender precision and recall use all recommended items over all users to calculate traditional precision and recall. A recommended item that was actually interacted with in the test data is considered an accurate prediction, and a recommended item that was not interacted with, or received a poor interaction value, can be considered an inaccurate recommendation. The user can assign these values based on their judgment. 
\n\n## Precision and Recall Curve\n\n```python\nrecmetrics.precision_recall_plot()\n```\n\nThe Precision and Recall plot is used to visualize the trade-off between precision and recall for one class in a classification.\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"images/PrecisionRecallCurve.png\" alt=\"PandRcurve\" width=400\u003e\n\u003c/p\u003e\n\n## Confusion Matrix\n\n```python\nrecmetrics.make_confusion_matrix()\n```\n\nA traditional confusion matrix, used to evaluate false positive and false negative trade-offs.\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"images/confusion_matrix.png\" alt=\"PandRcurve\" width=400\u003e\n\u003c/p\u003e\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstatisticianinstilettos%2Frecmetrics","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstatisticianinstilettos%2Frecmetrics","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstatisticianinstilettos%2Frecmetrics/lists"}