{"id":29248731,"url":"https://github.com/hkuds/easyrec","last_synced_at":"2025-07-04T00:08:40.378Z","repository":{"id":253645484,"uuid":"844114653","full_name":"HKUDS/EasyRec","owner":"HKUDS","description":"\"EasyRec: Simple yet Effective Language Model for Recommendation\"","archived":false,"fork":false,"pushed_at":"2025-02-21T16:44:48.000Z","size":513,"stargazers_count":107,"open_issues_count":1,"forks_count":13,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-02-21T17:42:45.267Z","etag":null,"topics":["collaborative-filtering","language-model","large-language-models","recommendation","recommender-systems"],"latest_commit_sha":null,"homepage":"http://arxiv.org/abs/2408.08821","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/HKUDS.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-18T12:35:01.000Z","updated_at":"2025-02-21T16:44:52.000Z","dependencies_parsed_at":"2024-12-25T17:41:00.451Z","dependency_job_id":null,"html_url":"https://github.com/HKUDS/EasyRec","commit_stats":null,"previous_names":["hkuds/easyrec"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/HKUDS/EasyRec","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HKUDS%2FEasyRec","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HKUDS%2FEasyRec/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HKUDS%2FEasyRec/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HKUDS%2FEasyRec/manifests","owner_
url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/HKUDS","download_url":"https://codeload.github.com/HKUDS/EasyRec/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HKUDS%2FEasyRec/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263421916,"owners_count":23464051,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["collaborative-filtering","language-model","large-language-models","recommendation","recommender-systems"],"created_at":"2025-07-04T00:08:38.941Z","updated_at":"2025-07-04T00:08:40.339Z","avatar_url":"https://github.com/HKUDS.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# EasyRec: Simple yet Effective Language Models for Recommendation\n\nThis is the PyTorch implementation by \u003ca href='https://github.com/Re-bin'\u003e@Re-bin\u003c/a\u003e for the EasyRec model proposed in this [paper](https://arxiv.org/abs/2408.08821):\n\n\u003e**EasyRec: Simple yet Effective Language Models for Recommendation**\\\n\u003eXubin Ren, Chao Huang*\n\n\\* denotes corresponding author\n\nIn this paper, we propose an effective language model, *EasyRec*, for recommendation. EasyRec is trained on collaborative information from multiple recommendation datasets, taking collaborative user/item profiles as input and employing novel contrastive learning objectives. 
By encoding user/item profiles into high-quality semantic embeddings suitable for recommendation, EasyRec demonstrates strong performance in both text-based zero-shot recommendation and text-enhanced collaborative filtering.\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"EasyRec.png\" alt=\"EasyRec\"/\u003e\n\u003c/p\u003e\n\n## 📝 Environment\nRun the following commands to create a conda environment and install the dependencies:\n\n```bash\nconda create -y -n easyrec python=3.11\nconda activate easyrec\npip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1\npip install \"transformers==4.40.0\"\npip install accelerate==0.28.0\npip install tqdm\npip install sentencepiece==0.2.0\npip install scipy==1.9.3\npip install setproctitle\npip install sentence_transformers\n```\n\n## 🚀 Use EasyRec\n\n### Example Code\nFirst, clone the repository:\n\n```bash\ngit clone https://github.com/HKUDS/EasyRec.git\ncd EasyRec\n```\n\nThe following snippet uses EasyRec to encode user and item profiles into **text embeddings** for recommendation.\n\n```python\nimport torch\nimport torch.nn.functional as F\nfrom transformers import AutoConfig, AutoTokenizer\n\nfrom model import Easyrec  # Easyrec is provided by the model module in this repository\n\n# Load the configuration, model weights, and tokenizer from Hugging Face\nconfig = AutoConfig.from_pretrained(\"hkuds/easyrec-roberta-large\")\nmodel = Easyrec.from_pretrained(\"hkuds/easyrec-roberta-large\", config=config)\ntokenizer = AutoTokenizer.from_pretrained(\"hkuds/easyrec-roberta-large\", use_fast=False)\n\nprofiles = [\n    'This user is a basketball fan and likes to play basketball and watch NBA games.', # user\n    'This basketball draws in NBA enthusiasts.', # item 1\n    'This item is nice for swimming lovers.'     
# item 2\n]\n\n# Tokenize the profiles and encode them without tracking gradients\ninputs = tokenizer(profiles, padding=True, truncation=True, max_length=512, return_tensors=\"pt\")\nwith torch.inference_mode():\n    embeddings = model.encode(input_ids=inputs.input_ids, attention_mask=inputs.attention_mask)\n\n# L2-normalize the pooled embeddings so that dot products equal cosine similarities\nembeddings = F.normalize(embeddings.pooler_output.detach().float(), dim=-1)\n\n# Similarity between the user and each item\nprint(embeddings[0] @ embeddings[1])    # 0.8576\nprint(embeddings[0] @ embeddings[2])    # 0.2171\n```\n\n### Model List\nWe release EasyRec checkpoints in three sizes. To load a different checkpoint from Hugging Face, replace the model name in the code above.\n\n|              Model              | Model Size | Recall@20 on Amazon-Sports |\n|:-------------------------------|:--------:| :--------:|\n| [hkuds/easyrec-roberta-small](https://huggingface.co/hkuds/easyrec-roberta-small) |  82M  | 0.0286 |\n| [hkuds/easyrec-roberta-base](https://huggingface.co/hkuds/easyrec-roberta-base)   |  125M  | 0.0518  |\n| [hkuds/easyrec-roberta-large](https://huggingface.co/hkuds/easyrec-roberta-large) |  355M  | 0.0557  |\n\n\n## 📚 Datasets with User/Item Profiles (for training and evaluation)\n\nYou can download the data with the following commands:\n\n```bash\nwget https://archive.org/download/easyrec_data/data.zip\nunzip data.zip\n```\n\nAlternatively, you can download the data from [Google Drive](https://drive.google.com/file/d/1fcAb9UwWHXVTLyK3a_MBOGTBqDp64k0P/view?usp=drive_link).\n\nWe use six datasets for training (`arts`, `movies`, `games`, `home`, `electronics`, `tools`) and three for testing (`sports`, `steam`, `yelp`). 
The `steam` and `yelp` datasets are processed following [previous work (RLMRec)](https://github.com/HKUDS/RLMRec), while the others are derived from the [Amazon Review Data v2](https://cseweb.ucsd.edu/~jmcauley/datasets/amazon_v2/).\n\nFor the training datasets, each folder has the following structure:\n```\n- arts/electronics/games/home/movies/tools:\n|--- diverse_profile    # Diversified user/item profiles\n|--- prompts            # Prompts used to generate user_profile.json and item_profile.json\n|--- user_profile.json  # User profiles\n|--- item_profile.json  # Item profiles\n|--- trn_mat.pkl        # Training interactions\n|--- val_mat.pkl        # Validation interactions\n|--- tst_mat.pkl        # Test interactions (NOT USED)\n```\n\nFor the testing datasets, each folder has the following structure:\n```\n- sports/steam/yelp:\n|--- diverse_profile    # Diversified user/item profiles\n|--- prompts            # Prompts used to generate user_profile.json and item_profile.json (sports only)\n|--- user_profile.json  # User profiles\n|--- item_profile.json  # Item profiles\n|--- trn_mat.pkl        # Training interactions\n|--- val_mat.pkl        # Validation interactions (NOT USED in text-based recommendation)\n|--- tst_mat.pkl        # Test interactions\n```\n\n🤗 The `prompts` folder (available only for the datasets we processed ourselves) contains the input prompts we gave to large language models to obtain the collaborative profiles of users and items. The JSON files also contain the original ID (e.g., ASIN) of each user/item in the source dataset. 
For details of the `steam` and `yelp` datasets, please refer to [RLMRec](https://github.com/HKUDS/RLMRec).\n\n## 🚀 Training\nEasyRec uses the same model architecture as [RoBERTa](https://arxiv.org/abs/1907.11692).\n\nFirst, download the RoBERTa checkpoints:\n\n```bash\nmkdir baseline_embedders\ncd baseline_embedders\ngit lfs install  # Git LFS is needed to fetch the model weight files\ngit clone https://huggingface.co/FacebookAI/roberta-base\ngit clone https://huggingface.co/FacebookAI/roberta-large\ncd ../\n```\n\nThen, train each version of EasyRec with the corresponding commands:\n\n- EasyRec-Small\n\n    ```bash\n    python create_roberta_small.py # create a small version of RoBERTa\n    sh train_small.sh\n    ```\n\n- EasyRec-Base\n\n    ```bash\n    sh train_base.sh\n    ```\n\n- EasyRec-Large\n\n    ```bash\n    sh train_large.sh\n    ```\n\nWe train EasyRec on multiple GPUs; adjust the number of GPUs in the shell scripts to match your server configuration. After training, the model parameters are saved in the `checkpoints` folder.\n\n## 📈 Evaluation\n\nYou can evaluate either the checkpoints released on Hugging Face or the checkpoints from your own training. The examples below use the Hugging Face checkpoints.\n\n### Text-based Zero-shot Recommendation\n\nWe evaluate text-based zero-shot recommendation on the `sports`, `steam`, and `yelp` datasets. 
First, encode the user/item profiles of these three datasets into text embeddings (to use a checkpoint from your own training, change the argument to `--model ./checkpoints/easyrec-roberta-large`):\n\n```bash\npython encode_easyrec.py --model hkuds/easyrec-roberta-large --cuda 0\n```\n\nThen, run the evaluation. Here, `--model` takes only the model name, without the `hkuds/` prefix:\n\n```bash\npython eval_text_emb.py --model easyrec-roberta-large --cuda 0\n```\n\nEach dataset provides three diversified profiles for every user and item in addition to the original one, so we run the evaluation 1+3 times with the corresponding text embeddings and report the mean as the final result.\n\n### Text-enhanced Collaborative Filtering\n\nWe evaluate text-enhanced collaborative filtering on the `steam` dataset. Before evaluating EasyRec, make sure you have encoded the text embeddings for `steam`:\n\n```bash\npython encode_easyrec.py --model hkuds/easyrec-roberta-large --cuda 0\n```\n\nThen, switch to the `cf_rec` folder with `cd cf_rec`. 
After that, run the following commands, replacing `{model_name}` with one of the supported models:\n\n- Base model\n\n    ```bash\n    python run.py --model {model_name} --dataset steam --cuda 0\n    ```\n\n- Text-enhanced model\n\n    ```bash\n    python run.py --model {model_name}_plus --semantic easyrec-roberta-large --dataset steam --cuda 0\n    ```\n\n**Supported Models:** `gccf` and `lightgcn`.\n\n## 🔮 Profile Generation and Diversification\nWe provide examples of user/item profile generation and diversification on the *Amazon-Arts* data.\n\nFirst, complete the following three steps:\n\n- Install the OpenAI library: `pip install openai`\n- Prepare your **OpenAI API key**\n- Enter your key on line 5 of `generation/generate_profile.py` and `generation/diverse_profile.py`\n\nThen, run the following commands:\n\n- **Profile Generation**\n\n    ```bash\n    python generation/generate_profile.py\n    ```\n\n- **Profile Diversification**\n\n    ```bash\n    python generation/diverse_profile.py\n    ```\n\n😀 The **instructions** we designed are saved in the `generation/instruction` folder. You can modify them to suit your needs and generate the output you want.\n\n## 🌟 Citation\nIf you find this work helpful for your research, please consider citing our paper:\n```bibtex\n@article{ren2024easyrec,\n  title={EasyRec: Simple yet Effective Language Models for Recommendation},\n  author={Ren, Xubin and Huang, Chao},\n  journal={arXiv preprint arXiv:2408.08821},\n  year={2024}\n}\n```\n\n**Thanks for your interest in our work!**\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhkuds%2Feasyrec","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhkuds%2Feasyrec","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhkuds%2Feasyrec/lists"}