{"id":23420784,"url":"https://github.com/hand10ryo/pytorchcml","last_synced_at":"2025-04-12T14:05:14.894Z","repository":{"id":37961175,"uuid":"354592756","full_name":"hand10ryo/PyTorchCML","owner":"hand10ryo","description":"PyTorchCML is a library of PyTorch implementations of matrix factorization (MF) and collaborative metric learning (CML), algorithms used in recommendation systems and data mining.","archived":false,"fork":false,"pushed_at":"2022-07-16T13:29:09.000Z","size":428,"stargazers_count":20,"open_issues_count":6,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-12-12T04:36:29.790Z","etag":null,"topics":["collaborative-filtering","implicit-feedback","machine-learning","metric-learning","python","pytorch","recommender-system"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hand10ryo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-04-04T16:30:53.000Z","updated_at":"2024-03-11T11:44:01.000Z","dependencies_parsed_at":"2022-09-06T14:41:19.445Z","dependency_job_id":null,"html_url":"https://github.com/hand10ryo/PyTorchCML","commit_stats":null,"previous_names":[],"tags_count":10,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hand10ryo%2FPyTorchCML","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hand10ryo%2FPyTorchCML/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hand10ryo%2FPyTorchCML/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hand10ryo%2FPyTorchCML/m
anifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hand10ryo","download_url":"https://codeload.github.com/hand10ryo/PyTorchCML/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":230903909,"owners_count":18297817,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["collaborative-filtering","implicit-feedback","machine-learning","metric-learning","python","pytorch","recommender-system"],"created_at":"2024-12-23T02:11:54.245Z","updated_at":"2024-12-23T02:11:55.162Z","avatar_url":"https://github.com/hand10ryo.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PyTorchCML\n\n![https://github.com/hand10ryo/PyTorchCML/blob/image/images/icon.png](https://github.com/hand10ryo/PyTorchCML/blob/image/images/icon.png)\n\nPyTorchCML is a library of PyTorch implementations of matrix factorization (MF) and collaborative metric learning (CML), algorithms used in recommendation systems and data mining.\n\nA Japanese version of this README is available [here](https://github.com/hand10ryo/PyTorchCML/blob/main/README_ja.md).\n\n# What is CML?\n\nCML is an algorithm that combines metric learning and MF. It embeds the elements of two sets, such as users and items or documents and words, into a joint metric space using their relational data.\n\nIn particular, CML is known to capture user-user and item-item relationships more precisely than MF and can achieve higher accuracy and interpretability than MF for recommendation systems [1]. 
In addition, the embeddings can be used for secondary purposes such as friend recommendations on SNS and similar item recommendations on e-commerce sites.\n\nFor more details, please refer to [1].\n\n# Installation\n\nYou can install PyTorchCML using Python's package manager pip.\n\n```bash\npip install PyTorchCML\n```\n\nYou can also download the source code directly and build your environment with poetry.\n\n```bash\ngit clone https://github.com/hand10ryo/PyTorchCML\ncd PyTorchCML\npoetry install\n```\n\n## dependencies\n\nThe dependencies are as follows:\n\n- python = \"\u003e=3.7.10,\u003c3.9\"\n- torch = \"^1.8.1\"\n- scikit-learn = \"^0.22.2\"\n- scipy = \"^1.4.1\"\n- numpy = \"^1.19.5\"\n- pandas = \"^1.1.5\"\n- tqdm = \"^4.41.1\"\n\n# Usage\n\n## Example\n\n[This](https://github.com/hand10ryo/PytorchCML/tree/main/examples/notebooks) is a Jupyter notebook example using the MovieLens 100k dataset.\n\n## Overview\n\nThis library consists of the following six modules.\n\n- trainers\n- models\n- samplers\n- losses\n- regularizers\n- evaluators\n\nBy combining these modules, you can implement a variety of algorithms.\n\nThe following figure shows the relationship between these modules.\n\n![https://github.com/hand10ryo/PyTorchCML/blob/image/images/diagram.png](https://github.com/hand10ryo/PyTorchCML/blob/image/images/diagram.png)\n\nThe most straightforward implementation is as follows.\n\n```python\nimport torch\nfrom torch import optim\nimport numpy as np\nfrom PyTorchCML import losses, models, samplers, trainers\ndevice = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n\n# train dataset (whose columns are [user_id, item_id].)\ntrain_set = np.array([[0, 0], [0, 1], [1, 2], [1, 3]])\ntrain_set_torch = torch.LongTensor(train_set).to(device)\nn_user = train_set[:,0].max() + 1\nn_item = train_set[:,1].max() + 1\n\n# model settings\nmodel = models.CollaborativeMetricLearning(n_user, n_item, n_dim=10).to(device)\noptimizer = 
optim.Adam(model.parameters(), lr=1e-3)\ncriterion = losses.MinTripletLoss(margin=1).to(device)\nsampler = samplers.BaseSampler(train_set_torch, n_user, n_item, device=device)\ntrainer = trainers.BaseTrainer(model, optimizer, criterion, sampler)\n\n# run\ntrainer.fit(n_batch=256, n_epoch=3)\n```\n\nThe input `train_set` is a two-column NumPy array in which each row holds the user ID and item ID of one positive-feedback interaction.\n\n`n_user` and `n_item` are the numbers of users and items. Here, we assume that user IDs and item IDs start from 0 and that all users and items are included in `train_set`.\n\nThen, define the model, optimizer, criterion, and sampler, pass them to a trainer, and run the trainer's `fit` method to start learning CML.\n\n## models\n\nThe models module handles the embeddings.\n\nThere are currently two models to choose from:\n\n- models.CollaborativeMetricLearning\n- models.LogitMatrixFactorization\n\nYou can predict the relationship between a target user and an item with the `predict` method.\n\nCML uses the distance between vectors, while MF uses the inner product to represent the relationship.\n\nYou can also set the maximum norm and initial values of the embeddings.\n\nFor example, with `LogitMatrixFactorization`, this looks as follows.\n\n```python\nmodel = models.LogitMatrixFactorization(\n    n_user, n_item, n_dim, max_norm=5,\n    user_embedding_init = torch.Tensor(U),   # shape = (n_user, n_dim)\n    item_embedding_init = torch.Tensor(V.T), # shape = (n_dim, n_item)\n).to(device)\n```\n\n## losses\n\nThe losses module handles the loss functions for learning embeddings.\nThe loss functions fall into two main groups: PairwiseLoss and TripletLoss.\n\nPairwiseLoss is the loss for each user-item pair $(u,i)$.\n\nTripletLoss is the loss for each triplet $(u,i_+,i_-)$.\nHere, $(u,i_+)$ is a positive pair, and $(u,i_-)$ is a negative pair.\n\nIn general, CML uses triplet loss, and MF uses pairwise loss.\n\n## samplers\n\nThe samplers is a 
module that handles the sampling of mini-batches during training.\n\nThe sampler performs two types of sampling.\n\n- Sampling of positive user-item pairs $(u,i_+)$,\n- Sampling of negative items $i_-$.\n\nThe default setting is to sample both uniformly at random.\n\nIt is also possible to weight both the positive and the negative sampling.\n\nFor example, if you want to weight the negative items by their popularity, you can do the following.\n\n```python\n# count how often each item appears in the training set\nitem_ids, item_popularity = np.unique(train_set[:,1], return_counts=True)\n# keyword arguments must come after the positional ones\nsampler = samplers.BaseSampler(\n    train_set_torch, n_user, n_item,\n    neg_weight=item_popularity, device=device\n)\n```\n\n## trainers\n\nThe trainers module handles training.\n\nYou can train by setting up a model, optimizer, loss function, and sampler.\n\n## evaluators\n\nThe evaluators module evaluates performance after training.\n\nYou can evaluate your model as follows.\n\n```python\nfrom PyTorchCML import evaluators\n\n# test set (whose columns are [user_id, item_id, rating].)\ntest_set = np.array([[0, 2, 3], [0, 3, 4], [1, 0, 2], [1, 1, 5]])\ntest_set_torch = torch.LongTensor(test_set).to(device)\n\n# define metrics and evaluator\nscore_function_dict = {\n    \"nDCG\" : evaluators.ndcg,\n    \"MAP\" : evaluators.average_precision,\n    \"Recall\": evaluators.recall\n}\nevaluator = evaluators.UserwiseEvaluator(\n    test_set_torch, \n    score_function_dict, \n    ks=[3,5]\n)\n\n# calc scores\nscores = evaluator.score(model)\n```\n\nThe `test_set` is a three-column NumPy array whose rows hold a user ID, an item ID, and a rating.\n\nThe `score_function_dict` is a dictionary of evaluation metrics. Each key is a metric name, and each value is a function that computes that metric. The evaluators module implements nDCG@k, MAP@k, and Recall@k as its functions. 
In this example, those three are set, but you can set any number of evaluation metrics.\n\nThe `evaluator` takes the test data, the evaluation metrics, and a list of cutoff values k as input.\n\nYou can calculate the scores by running the method `.score()` with the model as input. Its output `scores` will be a single-row pandas.DataFrame with each score. In this example, its columns are `[\"nDCG@3\", \"MAP@3\", \"Recall@3\", \"nDCG@5\", \"MAP@5\", \"Recall@5\"]`.\n\nAlso, passing the evaluator to the `valid_evaluator` argument of the trainer's `fit` method lets you monitor the scores as training progresses.\nThis is helpful for hyperparameter tuning.\n\n```python\nvalid_evaluator = evaluators.UserwiseEvaluator(\n    test_set_torch, # eval set\n    score_function_dict, \n    ks=[3,5]\n)\ntrainer.fit(n_batch=50, n_epoch=15, valid_evaluator = valid_evaluator)\n```\n\n## regularizers\n\nThe regularizers module handles the regularization terms of the embedding vectors.\n\nYou can apply L2-norm regularization, etc., by passing a list of regularizer instances as an argument of the loss function, as shown below.\n\n```python\nfrom PyTorchCML import regularizers\nregs = [regularizers.L2Regularizer(weight=1e-2)]\ncriterion = losses.MinTripletLoss(margin=1, regularizers=regs).to(device)\n```\n\nIt is also possible to apply multiple regularizers by adding more entries to the list.\n\n## adaptors\n\nThe adaptors module implements domain adaptation.\n\nDomain adaptation in CML is achieved by adding $L(v_i, \\theta) = \\|f(x_i;\\theta)-v_i\\|^2$ to the loss for feature $x_i$ of item $i$. The same applies to users. 
This allows us to reflect attribute information in the embedding vector.\n\nMLPAdaptor is an adaptor class that uses a multilayer perceptron for the function $f(x_i;\\theta)$.\n\nYou can set up the adaptor as shown in the code below.\n\n```python\nfrom PyTorchCML import adaptors\n\n# item_feature.shape = (n_item, n_feature)\nitem_feature_torch = torch.Tensor(item_feature)\nadaptor = adaptors.MLPAdaptor(\n    item_feature_torch, \n    n_dim=10, \n    n_hidden=[20], \n    weight=1e-4\n)\n\n# n_dim must match the adaptor's n_dim\nmodel = models.CollaborativeMetricLearning(\n    n_user, n_item, n_dim=10, \n    item_adaptor=adaptor\n).to(device)\n```\n\n# Development\n\nBuild the development environment as follows.\n\n```bash\npip install poetry\npip install poetry-dynamic-versioning\n\npoetry install\npoetry build\n```\n\nFollow the gitflow procedure for development.\n\nDevelop individual features on feature/xxx branches derived from the develop branch.\n\nEach time you push, the GitHub workflow runs the unit tests.\n\nSend a pull request to the develop branch when the development of a feature is finished.\n\n\n# Citation\n\nYou may use PyTorchCML under the MIT License. If you use this program in your research, please cite:\n\n```bibtex\n@misc{matsui2021pytorchcml,\n  author = {Ryo, Matsui},\n  title = {PyTorchCML},\n  year = {2021},\n  publisher = {GitHub},\n  journal = {GitHub repository},\n  howpublished = {https://github.com/hand10ryo/PyTorchCML}\n}\n```\n\n# References\n\n[1] Cheng-Kang Hsieh, Longqi Yang, Yin Cui, Tsung-Yi Lin, Serge Belongie, and Deborah Estrin. Collaborative metric learning. In Proceedings of the 26th International Conference on World Wide Web, pp. 193–201, 2017.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhand10ryo%2Fpytorchcml","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhand10ryo%2Fpytorchcml","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhand10ryo%2Fpytorchcml/lists"}