{"id":29248703,"url":"https://github.com/hkuds/llmrec","last_synced_at":"2025-07-04T00:08:34.576Z","repository":{"id":203052309,"uuid":"708601315","full_name":"HKUDS/LLMRec","owner":"HKUDS","description":"[WSDM'2024 Oral] \"LLMRec: Large Language Models with Graph Augmentation for Recommendation\"","archived":false,"fork":false,"pushed_at":"2024-06-10T06:39:28.000Z","size":8745,"stargazers_count":454,"open_issues_count":16,"forks_count":56,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-05-08T08:40:54.025Z","etag":null,"topics":["colloborative-filtering","content-based-recommendation","data-augmentation-strategies","graph-augmentation","graph-learning","multi-modal-recommendation","recommendation-system","recommendation-with-side-information"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2311.00423","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/HKUDS.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-10-23T01:56:37.000Z","updated_at":"2025-05-01T13:31:25.000Z","dependencies_parsed_at":null,"dependency_job_id":"293c2247-49de-4f1c-bf1b-9594beb4a1b4","html_url":"https://github.com/HKUDS/LLMRec","commit_stats":null,"previous_names":["hkuds/llmrec"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/HKUDS/LLMRec","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HKUDS%2FLLMRec","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HKUDS%2FLLMRec/tags","releases_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/repositories/HKUDS%2FLLMRec/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HKUDS%2FLLMRec/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/HKUDS","download_url":"https://codeload.github.com/HKUDS/LLMRec/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HKUDS%2FLLMRec/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263421922,"owners_count":23464051,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["colloborative-filtering","content-based-recommendation","data-augmentation-strategies","graph-augmentation","graph-learning","multi-modal-recommendation","recommendation-system","recommendation-with-side-information"],"created_at":"2025-07-04T00:08:33.618Z","updated_at":"2025-07-04T00:08:34.447Z","avatar_url":"https://github.com/HKUDS.png","language":"Python","readme":"# LLMRec: Large Language Models with Graph Augmentation for Recommendation\n\n\u003cimg src='image/LLMRec.png' /\u003e\n\nPyTorch implementation for WSDM 2024 paper [LLMRec: Large Language Models with Graph Augmentation for Recommendation](https://arxiv.org/pdf/2311.00423.pdf).\n\n\n\n[Wei Wei](#), [Xubin Ren](https://rxubin.com/), [Jiabin Tang](https://tjb-tech.github.io/), [Qingyong Wang](#), [Lixin Su](#), [Suqi Cheng](#), [Junfeng Wang](#), [Dawei Yin](https://www.yindawei.com/) and [Chao Huang](https://sites.google.com/view/chaoh/home)*.\n(*Correspondence)\n\n**[Data Intelligence Lab](https://sites.google.com/view/chaoh/home)@[University of 
Hong Kong](https://www.hku.hk/)**, Baidu Inc.\n\n\u003ca href='https://llmrec.github.io/'\u003e\u003cimg src='https://img.shields.io/badge/Project-Page-Green'\u003e\u003c/a\u003e\n\u003ca href='https://llmrec.github.io/'\u003e\u003cimg src='https://img.shields.io/badge/Demo-Page-purple'\u003e\u003c/a\u003e\n\u003ca href='https://arxiv.org/pdf/2311.00423.pdf'\u003e\u003cimg src='https://img.shields.io/badge/Paper-PDF-orange'\u003e\u003c/a\u003e \n[![YouTube](https://badges.aleen42.com/src/youtube.svg)](https://www.youtube.com/channel/UC1wKlPPlP9zKGYk62yR0K_g)\n\n\nThis repository hosts the code, original data, and augmented data of **LLMRec**.\n\n-----------\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"./image/llmrec_framework.png\" alt=\"LLMRec\" /\u003e\n\u003c/p\u003e\n\nLLMRec is a novel framework that enhances recommenders by applying three simple yet effective LLM-based graph augmentation strategies to recommendation systems. LLMRec makes the most of the content on online platforms (e.g., Netflix, MovieLens) to augment the interaction graph by i) reinforcing u-i interaction edges, ii) enhancing item node attributes, and iii) conducting user node profiling, all from a natural language perspective.\n\n-----------\n\n## 🎉 News 📢📢  \n\n- [x] [2024.3.20] 🚀🚀 📢📢📢📢🌹🔥🔥🚀🚀 Because the baselines `LATTICE` and `MMSSL` require some minor modifications, we provide versions of their code that can be run simply by changing the dataset path.\n\n- [x] [2023.11.3] 🚀🚀 Release the script for constructing the prompt.\n\n- [x] [2023.11.1] 🔥🔥 Release the multi-modal datasets (Netflix, MovieLens), including textual data and visual data.\n\n- [x] [2023.11.1] 🚀🚀 Release LLM-augmented textual data (by gpt-3.5-turbo-0613) and LLM-augmented embeddings (by text-embedding-ada-002).\n\n- [x] [2023.10.28] 🔥🔥 The full paper of our LLMRec is available at [LLMRec: Large Language Models with Graph Augmentation for Recommendation](https://arxiv.org/pdf/2311.00423.pdf).\n\n- [x] [2023.10.28] 🚀🚀 
Release the code of LLMRec.\n\n\n## 👉 TODO \n\n- [ ] Provide larger versions of the datasets.\n- [ ] ...\n\n\n-----------\n\n\u003ch2\u003e Dependencies \u003c/h2\u003e\n\n```\npip install -r requirements.txt\n```\n\n\n\u003ch2\u003eUsage \u003c/h2\u003e\n\n\u003ch4\u003eStage 1: LLM-based Data Augmentation\u003c/h4\u003e\n\n```\ncd LLMRec/LLM_augmentation/\npython ./gpt_ui_aug.py\npython ./gpt_user_profiling.py\npython ./gpt_i_attribute_generate_aug.py\n```\n\n\n\n\n\u003ch4\u003eStage 2: Recommender Training with LLM-augmented Data\u003c/h4\u003e\n\n```\ncd LLMRec/\npython ./main.py --dataset {DATASET}\n```\nSupported datasets: `netflix`, `movielens`\n\nSpecific execution examples on `netflix`:\n```\n# LLMRec\npython ./main.py\n\n# w/o-u-i\npython ./main.py --aug_sample_rate=0.0\n\n# w/o-u\npython ./main.py --user_cat_rate=0\n\n# w/o-u\u0026i\npython ./main.py --user_cat_rate=0 --item_cat_rate=0\n\n# w/o-prune\npython ./main.py --prune_loss_drop_rate=0\n```\n\n\n\n\n\n-----------\n\n\n\u003ch2\u003e Datasets \u003c/h2\u003e\n\n  ```\n  ├─ LLMRec/ \n      ├── data/\n        ├── netflix/\n        ...\n  ```\n\n\u003ch3\u003e Multi-modal Datasets \u003c/h3\u003e\n🌹🌹 Please cite our paper if you use the 'netflix' dataset~ ❤️  \n\nWe collected a multi-modal dataset using the original [Netflix Prize Data](https://www.kaggle.com/datasets/netflix-inc/netflix-prize-data) released on the [Kaggle](https://www.kaggle.com/) website. The data format is directly compatible with state-of-the-art multi-modal recommendation models such as [LLMRec](https://github.com/HKUDS/LLMRec), [MMSSL](https://github.com/HKUDS/MMSSL), [LATTICE](https://github.com/CRIPAC-DIG/LATTICE), [MICRO](https://github.com/CRIPAC-DIG/MICRO), and others, without requiring any additional data preprocessing.\n\n `Textual Modality:` We have released the item information curated from the original dataset in the \"item_attribute.csv\" file. 
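As an illustration, the file can be read with the standard library before feeding it to a recommender. This is a minimal sketch only: the column names and rows below are toy stand-ins (borrowed from the first entries of the public Netflix Prize movie list), and the actual layout of "item_attribute.csv" may differ.\n\n```python\n# Minimal sketch of reading an item-attribute CSV and sanity-checking it.\n# NOTE: the column names and rows are illustrative stand-ins; the real\n# "item_attribute.csv" in the released dataset may use a different layout.\nimport csv\nimport io\n\n# Toy stand-in for data/netflix/item_attribute.csv\nraw = io.StringIO(\n    "id,year,title\\n"\n    "0,2003,Dinosaur Planet\\n"\n    "1,2004,Isle of Man TT 2004 Review\\n"\n)\nitems = {row["id"]: row for row in csv.DictReader(raw)}\n\n# Sanity checks: item ids should be unique and every row fully populated.\nassert len(items) == 2\nassert all(row["title"] for row in items.values())\nprint(sorted(items))  # → ['0', '1']\n```\n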
Additionally, we have incorporated textual information enhanced by LLM into the \"augmented_item_attribute_agg.csv\" file. (The following three images represent (1) information about Netflix as described on the Kaggle website, (2) textual information from the original Netflix Prize Data, and (3) textual information augmented by LLMs.)\n\u003cdiv style=\"display: flex; justify-content: center; align-items: flex-start;\"\u003e\n  \u003cfigure style=\"text-align: center; margin: 10px;\"\u003e\n   \u003cimg src=\"./image/textual_data1.png\" alt=\"Image 1\" style=\"width:270px;height:180px;\"\u003e\n\u003c!--     \u003cfigcaption\u003eTextual data in original 'Netflix Prize Data' on Kaggle.\u003c/figcaption\u003e --\u003e\n  \u003c/figure\u003e\n\n  \u003cfigure style=\"text-align: center; margin: 10px;\"\u003e\n    \u003cimg src=\"./image/textual_data2.png\" alt=\"Image 2\" style=\"width:270px;height:180px;\"\u003e\n\u003c!--     \u003cfigcaption\u003eTextual data in original 'Netflix Prize Data'.\u003c/figcaption\u003e --\u003e\n  \u003c/figure\u003e\n\n  \u003cfigure style=\"text-align: center; margin: 10px;\"\u003e\n    \u003cimg src=\"./image/textual_data3.png\" alt=\"Image 2\" style=\"width:270px;height:180px;\"\u003e\n\u003c!--     \u003cfigcaption\u003eLLM-augmented textual data.\u003c/figcaption\u003e --\u003e\n  \u003c/figure\u003e  \n\u003c/div\u003e\n \n `Visual Modality:` We have released the visual information obtained from web crawling in the \"Netflix_Posters\" folder. 
(The following image displays the poster acquired by web crawling using item information from the Netflix Prize Data.)\n \u003cdiv style=\"display: flex; justify-content: center; align-items: flex-start;\"\u003e\n  \u003cfigure style=\"text-align: center; margin: 10px;\"\u003e\n   \u003cimg src=\"./image/visiual_data1.png\" alt=\"Image 1\" style=\"width:690px;height:590px;\"\u003e\n\u003c!--     \u003cfigcaption\u003eTextual data in original 'Netflix Prize Data' on Kaggle.\u003c/figcaption\u003e --\u003e\n  \u003c/figure\u003e\n\u003c/div\u003e\n \n\n\u003ch3\u003e Original Multi-modal Datasets \u0026 Augmented Datasets \u003c/h3\u003e\n \u003cdiv style=\"display: flex; justify-content: center; align-items: flex-start;\"\u003e\n  \u003cfigure style=\"text-align: center; margin: 10px;\"\u003e\n   \u003cimg src=\"./image/datasets.png\" alt=\"Image 1\" style=\"width:480px;height:270px;\"\u003e\n\u003c!--     \u003cfigcaption\u003eTextual data in original 'Netflix Prize Data' on Kaggle.\u003c/figcaption\u003e --\u003e\n  \u003c/figure\u003e\n\u003c/div\u003e\n\n\n\u003cbr\u003e\n\u003cp\u003e\n\n\u003ch3\u003e Download the Netflix dataset. \u003c/h3\u003e\n🚀🚀\nWe provide the processed data (i.e., CF training data \u0026 basic user-item interactions, original multi-modal data including images and text of items, encoded visual/textual features and LLM-augmented text/embeddings).  🌹 We hope to contribute to our community and facilitate your research 🚀🚀 ~\n\n- `netflix`: [Google Drive Netflix](https://drive.google.com/drive/folders/1BGKm3nO4xzhyi_mpKJWcfxgi3sQ2j_Ec?usp=drive_link).  [🌟(Image\u0026Text)](https://drive.google.com/file/d/1euAnMYD1JBPflx0M86O2M9OsbBSfrzPK/view?usp=drive_link)\n\n\n\n\u003ch3\u003e Encoding the Multi-modal Content. 
\u003c/h3\u003e\n\nWe use [CLIP-ViT](https://huggingface.co/openai/clip-vit-base-patch32) and [Sentence-BERT](https://www.sbert.net/) separately as encoders for visual side information and textual side information.\n\n\n\n\n-----------\n\n\n\n\u003ch2\u003e Prompt \u0026 Completion Example \u003c/h2\u003e\n\u003ch4\u003e LLM-based Implicit Feedback Augmentation \u003c/h4\u003e\n\n\u003e Prompt \n\u003e\u003e Recommend user with movies based on user history  that each movie with title, year, genre. History: [332] Heart and Souls (1993), Comedy|Fantasy [364] Men with Brooms(2002), Comedy|Drama|Romance Candidate: [121]The Vampire Lovers (1970), Horror [155] Billabong Odyssey (2003),Documentary [248]The Invisible Guest 2016, Crime, Drama, Mystery   Output index of user's favorite and dislike movie from candidate.Please just give the index in [].\n\n\u003e Completion\n\u003e\u003e 248   121\n\n\u003ch4\u003e LLM-based User Profile Augmentation \u003c/h4\u003e\n\n\u003e Prompt \n\u003e\u003e Generate user profile based on the history of user, that each movie with title, year, genre. History: [332] Heart and Souls (1993), Comedy|Fantasy [364] Men with Brooms (2002), Comedy|Drama|Romance  Please output the following infomation of user, output format: {age: , gender: , liked genre: , disliked genre: , liked directors: , country: , language: }\n\n\u003e Completion\n\u003e\u003e {age: 50, gender: female, liked genre: Comedy|Fantasy, Comedy|Drama|Romance, disliked genre: Thriller, Horror, liked directors: Ron Underwood, country: Canada, United States, language: English}\n\n\n\u003ch4\u003e LLM-based Item Attributes Augmentation \u003c/h4\u003e\n\n\u003e Prompt \n\u003e\u003e Provide the inquired information of the given movie. [332] Heart and Souls (1993), Comedy|Fantasy The inquired information is: director, country, language. 
And please output them in form of: director, country, language \n\n\u003e Completion\n\u003e\u003e Ron Underwood, USA, English\n\n\n\n\u003ch2\u003e Augmented Data \u003c/h2\u003e\n\n\u003ch4\u003e Augmented Implicit Feedback (Edge) \u003c/h4\u003e\nFor each user, 0 represents a positive sample, and 1 represents a negative sample.\n  \u003cfigure style=\"text-align: center; margin: 10px;\"\u003e\n    \u003cimg src=\"./image/u_i.png\" alt=\"Image 2\" style=\"width:150px;height:310px;\"\u003e\n\u003c!--     \u003cfigcaption\u003eTextual data in original 'Netflix Prize Data'.\u003c/figcaption\u003e --\u003e\n  \u003c/figure\u003e\n\n\n\u003ch4\u003e Augmented User Profile (User Node) \u003c/h4\u003e\nFor each user, the dictionary stores augmented information such as 'age,' 'gender,' 'liked genre,' 'disliked genre,' 'liked directors,' 'country,' and 'language.'\n  \u003cfigure style=\"text-align: center; margin: 10px;\"\u003e\n    \u003cimg src=\"./image/u.png\" alt=\"Image 2\" style=\"width:900px;height:700px;\"\u003e\n\u003c!--     \u003cfigcaption\u003eTextual data in original 'Netflix Prize Data'.\u003c/figcaption\u003e --\u003e\n  \u003c/figure\u003e\n\n\n\u003ch4\u003e Augmented Item Attribute (Item Node) \u003c/h4\u003e\nFor each item, the dictionary stores augmented information such as 'director,' 'country,' and 'language.'\n  \u003cfigure style=\"text-align: center; margin: 10px;\"\u003e\n    \u003cimg src=\"./image/i.png\" alt=\"Image 2\" style=\"width:500px;height:660px;\"\u003e\n\u003c!--     \u003cfigcaption\u003eTextual data in original 'Netflix Prize Data'.\u003c/figcaption\u003e --\u003e\n  \u003c/figure\u003e\n\n\n\n\n\u003ch2\u003e Candidate Preparation for LLM-based Implicit Feedback Augmentation\u003c/h2\u003e\n\n step 1: select a base model such as MMSSL or LATTICE\n \n step 2: obtain user embeddings and item embeddings\n \n step 3: generate candidates\n```\n_, candidate_indices = torch.topk(torch.mm(G_ua_embeddings, G_ia_embeddings.T), k=10)\n
pickle.dump(candidate_indices.cpu(), open('./data/' + args.datasets + '/candidate_indices', 'wb'))\n```\nExample of the generated candidate data:\n```\nIn [3]: candidate_indices\nOut[3]: \ntensor([[ 9765,  2930,  6646,  ..., 11513, 12747, 13503],\n        [ 3665,  8999,  2587,  ...,  1559,  2975,  3759],\n        [ 2266,  8999,  1559,  ...,  8639,   465,  8287],\n        ...,\n        [11905, 10195,  8063,  ..., 12945, 12568, 10428],\n        [ 9063,  6736,  6938,  ...,  5526, 12747, 11110],\n        [ 9584,  4163,  4154,  ...,  2266,   543,  7610]])\n\nIn [4]: candidate_indices.shape\nOut[4]: torch.Size([13187, 10])\n```\n\n\n\n\n\n-----------\n\n\u003ch1\u003e Citing \u003c/h1\u003e\n\nIf you find this work helpful to your research, please consider citing our paper.\n\n\n```\n@article{wei2023llmrec,\n  title={LLMRec: Large Language Models with Graph Augmentation for Recommendation},\n  author={Wei, Wei and Ren, Xubin and Tang, Jiabin and Wang, Qinyong and Su, Lixin and Cheng, Suqi and Wang, Junfeng and Yin, Dawei and Huang, Chao},\n  journal={arXiv preprint arXiv:2311.00423},\n  year={2023}\n}\n```\n\n\n## Acknowledgement\n\nThe structure of this code is largely based on [MMSSL](https://github.com/HKUDS/MMSSL), [LATTICE](https://github.com/CRIPAC-DIG/LATTICE), and [MICRO](https://github.com/CRIPAC-DIG/MICRO). We thank the authors for their work.\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhkuds%2Fllmrec","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhkuds%2Fllmrec","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhkuds%2Fllmrec/lists"}