{"id":13958374,"url":"https://github.com/amzn/pecos","last_synced_at":"2025-04-07T23:12:33.545Z","repository":{"id":38217541,"uuid":"286876809","full_name":"amzn/pecos","owner":"amzn","description":"PECOS - Prediction for Enormous and Correlated Spaces","archived":false,"fork":false,"pushed_at":"2024-05-28T19:07:17.000Z","size":5979,"stargazers_count":496,"open_issues_count":23,"forks_count":101,"subscribers_count":20,"default_branch":"mainline","last_synced_at":"2024-05-29T10:15:04.950Z","etag":null,"topics":["approximate-nearest-neighbor-search","extreme-multi-label-classification","extreme-multi-label-ranking","machine-learning-algorithms","transformers"],"latest_commit_sha":null,"homepage":"https://libpecos.org/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/amzn.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-08-12T00:25:51.000Z","updated_at":"2024-06-15T01:22:32.197Z","dependencies_parsed_at":"2024-05-28T22:13:39.569Z","dependency_job_id":"63001f9d-b45d-4986-a977-1e18ff5455dd","html_url":"https://github.com/amzn/pecos","commit_stats":{"total_commits":177,"total_committers":29,"mean_commits":6.103448275862069,"dds":0.711864406779661,"last_synced_commit":"951ed33a355d6fb8c259a700375568d7c8f3b65b"},"previous_names":[],"tags_count":14,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amzn%2Fpecos","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amzn%2Fpecos/tags","releases_url
":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amzn%2Fpecos/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amzn%2Fpecos/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/amzn","download_url":"https://codeload.github.com/amzn/pecos/tar.gz/refs/heads/mainline","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247744335,"owners_count":20988783,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["approximate-nearest-neighbor-search","extreme-multi-label-classification","extreme-multi-label-ranking","machine-learning-algorithms","transformers"],"created_at":"2024-08-08T13:01:30.782Z","updated_at":"2025-04-07T23:12:33.523Z","avatar_url":"https://github.com/amzn.png","language":"Python","funding_links":[],"categories":["其他_推荐系统"],"sub_categories":["网络服务_其他"],"readme":"# PECOS - Predictions for Enormous and Correlated Output Spaces\n\n[![PyPi Latest Release](https://img.shields.io/pypi/v/libpecos)](https://img.shields.io/pypi/v/libpecos)\n[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](./LICENSE)\n\nPECOS is a versatile and modular machine learning (ML) framework for fast learning and inference on problems with large output spaces, such as extreme multi-label ranking (XMR) and large-scale retrieval.\nPECOS' design is intentionally agnostic to the specific nature of the inputs and outputs as it is envisioned to be a general-purpose framework for multiple distinct applications.\n\nGiven an input, PECOS identifies a small set (10-100) of relevant 
outputs from amongst an extremely large (~100MM) candidate set and ranks these outputs in terms of relevance. \n\n\n### Features\n\n#### Extreme Multi-label Ranking and Classification\n* X-Linear ([`pecos.xmc.xlinear`](pecos/xmc/xlinear/README.md)): recursive linear models that learn to traverse an input from the root of a hierarchical label tree to a few leaf node clusters, and return the top-k relevant labels within those clusters as predictions. See more details in the [PECOS paper (Yu et al., 2020)](https://arxiv.org/pdf/2010.05878.pdf).\n  + fast real-time inference in C++\n  + can handle a 100MM output space\n\n* XR-Transformer ([`pecos.xmc.xtransformer`](pecos/xmc/xtransformer/README.md)): Transformer-based XMC framework that fine-tunes pre-trained transformers recursively on multi-resolution objectives. It can be used to generate top-k relevant labels for a given instance or simply as a fine-tuning engine for task-aware embeddings. See technical details in the [XR-Transformer paper (Zhang et al., 2021)](https://arxiv.org/pdf/2110.00685.pdf).\n  + easy to extend with many pre-trained Transformer models from [huggingface transformers](https://github.com/huggingface/transformers).\n  + establishes the state of the art on public XMC benchmarks.\n\n* ANN Search with HNSW ([`pecos.ann.hnsw`](pecos/ann/hnsw/README.md)): a PECOS Approximate Nearest Neighbor (ANN) search module that implements the Hierarchical Navigable Small World Graphs (HNSW) algorithm ([`Malkov et al., TPAMI 2018`](https://arxiv.org/ftp/arxiv/papers/1603/1603.09320.pdf)).\n  + Supports both sparse and dense input features\n  + SIMD optimization for both dense/sparse distance computation\n  + Supports thread-safe graph construction in parallel on multi-core shared memory machines\n  + Supports thread-safe Searchers to do inference in parallel, which reduces inference overhead\n\n\n## Requirements and Installation\n\n* Python (3.9, 3.10, 3.11, 3.12)\n* Pip (\u003e=19.3)\n\nSee other dependencies in 
[`setup.py`](https://github.com/amzn/pecos/blob/mainline/setup.py#L135)\nYou should install PECOS in a [virtual environment](https://docs.python.org/3/library/venv.html).\nIf you're unfamiliar with Python virtual environments, check out the [user guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/).\n\n### Supported Platforms\n* Ubuntu 20.04 and 22.04\n* Amazon Linux 2\n\n### Installation from Wheel\n\nPECOS can be installed using pip as follows:\n```bash\npython3 -m pip install libpecos\n```\n\n### Installation from Source\n\n#### Prerequisite builder tools\n* For Ubuntu (20.04, 22.04):\n``` bash\nsudo apt-get update \u0026\u0026 sudo apt-get install -y build-essential git python3 python3-distutils python3-venv\n```\n* For Amazon Linux 2:\n``` bash\nsudo yum -y install python3 python3-devel python3-distutils python3-venv \u0026\u0026 sudo yum -y groupinstall 'Development Tools'\n```\n\n#### Install and develop locally\n```bash\ngit clone https://github.com/amzn/pecos\ncd pecos\npython3 -m pip install --editable ./\n```\n\n\n## Quick Tour\nTo get a glimpse of how PECOS works, here is a quick tour of using the PECOS API for the XMR problem.\n\n### Toy Example\nThe eXtreme Multi-label Ranking (XMR) problem is defined by two matrices:\n* instance-to-feature matrix `X`, of shape `N by D` in [`SciPy CSR format`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html)\n* instance-to-label matrix `Y`, of shape `N by L` in [`SciPy CSR format`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html)\n\nSome toy data matrices are available in the [`tst-data`](https://github.com/amzn/pecos/tree/mainline/test/tst-data/xmc/xlinear) folder. 
\n\nPECOS constructs a hierarchical label tree and learns linear models recursively (e.g., XR-Linear):\n```python\n\u003e\u003e\u003e from pecos.xmc.xlinear.model import XLinearModel\n\u003e\u003e\u003e from pecos.xmc import Indexer, LabelEmbeddingFactory\n\n# Build hierarchical label tree and train an XR-Linear model\n\u003e\u003e\u003e label_feat = LabelEmbeddingFactory.create(Y, X)\n\u003e\u003e\u003e cluster_chain = Indexer.gen(label_feat)\n\u003e\u003e\u003e model = XLinearModel.train(X, Y, C=cluster_chain)\n\u003e\u003e\u003e model.save(\"./save-models\")\n```\n\nAfter training the model, we predict on the test matrices `Xt` and `Yt` and evaluate:\n```python\n\u003e\u003e\u003e from pecos.utils import smat_util\n\u003e\u003e\u003e Yt_pred = model.predict(Xt)\n# print precision and recall at k=10\n\u003e\u003e\u003e print(smat_util.Metrics.generate(Yt, Yt_pred))\n```\n\nPECOS also offers an optimized C++ implementation for fast real-time inference:\n```python\n\u003e\u003e\u003e model = XLinearModel.load(\"./save-models\", is_predict_only=True)\n\u003e\u003e\u003e for i in range(Xt.shape[0]):\n...   y_tst_pred = model.predict(Xt[i], threads=1)\n```\n\n\n## Citation\n\nIf you find PECOS useful, please consider citing the following paper:\n\n* [PECOS: Prediction for Enormous and Correlated Output Spaces (Yu et al., JMLR 2022)](https://arxiv.org/pdf/2010.05878.pdf) [[bib]](./bibtex/yu2020pecos.bib)\n\nSome papers from the PECOS team:\n\n* [Representer Points for Explaining Recommender Systems (Tsai et al., ICML 2023)](./) [[bib]](./bibtex/)\n\n* [PINA: Leveraging Side Information in eXtreme Multilabel Classification via Predicted Instance Neighborhood Aggregation (Chien et al., ICML 2023)](./) [[bib]](./bibtex/)\n\n* [Uncertainty Quantification in Extreme Classification (Jiang et al., SIGIR 2023)](./) [[bib]](./bibtex/)\n\n* [FINGER: Fast Inference for Graph-based Approximate Nearest Neighbor Search (Chen et al., WWW 2023)](https://dl.acm.org/doi/abs/10.1145/3543507.3583318) 
[[bib]](./bibtex/)\n\n* [End-to-End Learning to Index and Search in Large Output Space (Gupta et al., NeurIPS 2022)](https://papers.nips.cc/paper_files/paper/2022/hash/7d4f98f916494121aca3da02e36a4d18-Abstract-Conference.html) [[bib]](./bibtex/)\n\n* [Relevance under the Iceberg: Reasonable Prediction for Extreme Multi-label Classification (Jiang et al., SIGIR 2022)](https://dl.acm.org/doi/abs/10.1145/3477495.3531767) [[bib]](./bibtex/)\n\n* [Extreme Zero-Shot Learning for Extreme Text Classification (Xiong et al., NAACL 2022)](https://aclanthology.org/2022.naacl-main.399.pdf) [[bib]](./bibtex/)\n\n* [Node Feature Extraction by Self-Supervised Multi-scale Neighborhood Prediction (Chien et al., ICLR 2022)](https://openreview.net/pdf?id=KJggliHbs8) [[bib]](./bibtex/chien2021node.bib)\n\n* [Accelerating Inference for Sparse Extreme Multi-Label Ranking Trees (Etter et al., WWW 2022)](https://dl.acm.org/doi/10.1145/3485447.3511973) [[bib]](./bibtex/etter2021accelerating.bib)\n\n* [Fast Multi-Resolution Transformer Fine-tuning for Extreme Multi-label Text Classification (Zhang et al., NeurIPS 2021)](https://arxiv.org/pdf/2110.00685.pdf) [[bib]](./bibtex/zhang2021fast.bib)\n\n* [Label Disentanglement in Partition-based Extreme Multilabel Classification (Liu et al., NeurIPS 2021)](https://arxiv.org/pdf/2106.12751.pdf) [[bib]](./bibtex/liu2021label.bib)\n\n* [Enabling Efficiency-Precision Trade-offs for Label Trees in Extreme Classification (Baharav et al., CIKM 2021)](https://arxiv.org/pdf/2106.00730.pdf) [[bib]](./bibtex/baharav2021enabling.bib)\n\n* [Extreme Multi-label Learning for Semantic Matching in Product Search (Chang et al., KDD 2021)](https://arxiv.org/pdf/2106.12657.pdf) [[bib]](./bibtex/chang2021extreme.bib)\n\n* [Session-Aware Query Auto-completion using Extreme Multi-label Ranking (Yadav et al., KDD 2021)](https://arxiv.org/pdf/2012.07654.pdf)  [[bib]](./bibtex/yadav2021session.bib)\n\n* [Top-k eXtreme Contextual Bandits with Arm Hierarchy (Sen et al., ICML 
2021)](https://arxiv.org/pdf/2102.07800.pdf) [[bib]](./bibtex/sen2021top.bib)\n\n* [Taming pretrained transformers for extreme multi-label text classification (Chang et al., KDD 2020)](https://arxiv.org/pdf/1905.02331.pdf) [[bib]](./bibtex/chang2020taming.bib)\n\n\n## License\n\nCopyright (2021) Amazon.com, Inc.\n \nLicensed under the Apache License, Version 2.0 (the \"License\");\nyou may not use this file except in compliance with the License.\nYou may obtain a copy of the License at\n \n    http://www.apache.org/licenses/LICENSE-2.0\n \nUnless required by applicable law or agreed to in writing, software\ndistributed under the License is distributed on an \"AS IS\" BASIS,\nWITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\nSee the License for the specific language governing permissions and\nlimitations under the License.\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famzn%2Fpecos","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Famzn%2Fpecos","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famzn%2Fpecos/lists"}