{"id":30669569,"url":"https://github.com/snap-research/grid","last_synced_at":"2025-09-01T01:39:29.219Z","repository":{"id":307714558,"uuid":"1003226836","full_name":"snap-research/GRID","owner":"snap-research","description":"GRID: Generative Recommendation with Semantic IDs","archived":false,"fork":false,"pushed_at":"2025-08-27T21:52:31.000Z","size":936,"stargazers_count":240,"open_issues_count":2,"forks_count":35,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-08-28T05:59:14.521Z","etag":null,"topics":["generative-recommenders","large-language-models","recommender-systems","recsys","semantic-id","sequential-recommendation"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2507.22224","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/snap-research.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-16T20:26:51.000Z","updated_at":"2025-08-28T05:22:39.000Z","dependencies_parsed_at":"2025-08-01T19:51:47.562Z","dependency_job_id":null,"html_url":"https://github.com/snap-research/GRID","commit_stats":null,"previous_names":["snap-research/grid"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/snap-research/GRID","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/snap-research%2FGRID","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/snap-research%2FGRID/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/snap-research%2FGRID/releases","manifests_url":"https://repos.
ecosyste.ms/api/v1/hosts/GitHub/repositories/snap-research%2FGRID/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/snap-research","download_url":"https://codeload.github.com/snap-research/GRID/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/snap-research%2FGRID/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273064820,"owners_count":25039264,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-31T02:00:09.071Z","response_time":79,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["generative-recommenders","large-language-models","recommender-systems","recsys","semantic-id","sequential-recommendation"],"created_at":"2025-09-01T01:39:28.118Z","updated_at":"2025-09-01T01:39:29.213Z","avatar_url":"https://github.com/snap-research.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Generative Recommendation with Semantic IDs (GRID)\n[![PyTorch](https://img.shields.io/badge/pytorch-2.0%2B-red)](https://pytorch.org/)\n[![Hydra](https://img.shields.io/badge/config-hydra-89b8cd)](https://hydra.cc/)\n[![Lightning](https://img.shields.io/badge/pytorch-lightning-792ee5)](https://lightning.ai/)\n[![arXiv](https://img.shields.io/badge/arXiv-2507.22224-b31b1b.svg)](https://arxiv.org/abs/2507.22224)\n\n\n**GRID** (Generative Recommendation with 
Semantic IDs) is a state-of-the-art framework for generative recommendation systems using semantic IDs, developed by a group of scientists and engineers from [Snap Research](https://research.snap.com/team/user-modeling-and-personalization.html). This project implements novel approaches for learning semantic IDs from text embeddings and generating recommendations through transformer-based generative models.\n\n## 🚀 Overview\n\nGRID facilitates generative recommendation in three overarching steps:\n\n- **Embedding Generation with LLMs**: Converting item text into embeddings using any LLM available on Hugging Face.\n- **Semantic ID Learning**: Converting item embeddings into hierarchical semantic IDs using Residual Quantization techniques such as RQ-KMeans, RQ-VAE, and RVQ.\n- **Generative Recommendations**: Using transformer architectures to generate recommendation sequences as semantic ID tokens.\n\n\n## 📦 Installation\n\n### Prerequisites\n- Python 3.10+\n- CUDA-compatible GPU (recommended)\n\n### Setup Environment\n\n```bash\n# Clone the repository\ngit clone https://github.com/snap-research/GRID.git\ncd GRID\n\n# Install dependencies\npip install -r requirements.txt\n```\n\n## 🎯 Quick Start\n\n### 1. Data Preparation\n\nPrepare your dataset in the expected format:\n```\ndata/\n├── train/       # training sequences of user history\n├── validation/  # validation sequences of user history\n├── test/        # testing sequences of user history\n└── items/       # text of all items in the dataset\n```\n\nWe provide pre-processed Amazon data explored in the [P5 paper](https://arxiv.org/abs/2203.13366) [4]. The data can be downloaded from this [Google Drive link](https://drive.google.com/file/d/1B5_q_MT3GYxmHLrMK0-lAqgpbAuikKEz/view?usp=sharing).\n\n### 2. Embedding Generation with LLMs\n\nGenerate item embeddings with an LLM; these are transformed into semantic IDs in the next step.
\n\n```bash\n# Available datasets: 'beauty', 'sports', and 'toys'\npython -m src.inference experiment=sem_embeds_inference_flat data_dir=data/amazon_data/beauty\n```\n\n### 3. Train and Generate Semantic IDs\n\nLearn semantic ID centroids for the embeddings generated in step 2:\n\n```bash\n# embedding_path: found in the log directory from step 2\n# embedding_dim: the model dimension of the LLM used in step 2 (2048 for flan-t5-xl, as in this example)\n# num_hierarchies: number of codebooks to train (3 here)\n# codebook_width: number of centroid rows per codebook (256 here)\npython -m src.train experiment=rkmeans_train_flat \\\n    data_dir=data/amazon_data/beauty \\\n    embedding_path=\u003coutput_path_from_step_2\u003e/merged_predictions_tensor.pt \\\n    embedding_dim=2048 \\\n    num_hierarchies=3 \\\n    codebook_width=256\n```\n\nGenerate semantic IDs (SIDs):\n\n```bash\n# ckpt_path: found in the log directory from the semantic ID training run above\npython -m src.inference experiment=rkmeans_inference_flat \\\n    data_dir=data/amazon_data/beauty \\\n    embedding_path=\u003coutput_path_from_step_2\u003e/merged_predictions_tensor.pt \\\n    embedding_dim=2048 \\\n    num_hierarchies=3 \\\n    codebook_width=256 \\\n    ckpt_path=\u003cthe_checkpoint_you_just_get_above\u003e\n```\n\n\n### 4. Train Generative Recommendation Model with Semantic IDs\n\nTrain the recommendation model using the learned semantic IDs:\n\n```bash\n# num_hierarchies is 4 rather than 3 because step 3 appends one extra digit to de-duplicate the generated semantic IDs.\npython -m src.train experiment=tiger_train_flat \\\n    data_dir=data/amazon_data/beauty \\\n    semantic_id_path=\u003coutput_path_from_step_3\u003e/pickle/merged_predictions_tensor.pt \\\n    num_hierarchies=4\n```\n\n### 5. 
Generate Recommendations\n\nRun inference to generate recommendations:\n\n```bash\n# ckpt_path: found in the log directory from training the generative recommendation model\n# num_hierarchies is 4 rather than 3 because step 3 appends one extra digit to de-duplicate the generated semantic IDs.\npython -m src.inference experiment=tiger_inference_flat \\\n    data_dir=data/amazon_data/beauty \\\n    semantic_id_path=\u003coutput_path_from_step_3\u003e/pickle/merged_predictions_tensor.pt \\\n    ckpt_path=\u003cthe_checkpoint_you_just_get_above\u003e \\\n    num_hierarchies=4\n```\n\n## Supported Models:\n\n### Semantic ID:\n\n1. Residual K-means proposed in OneRec [2]\n2. Residual Vector Quantization\n3. Residual Quantization with Variational Autoencoder [3]\n\n### Generative Recommendation:\n\n1. TIGER [1]\n\n## 📚 Citation\n\nIf you use GRID in your research, please cite:\n\n```bibtex\n@inproceedings{grid,\n  title     = {Generative Recommendation with Semantic IDs: A Practitioner's Handbook},\n  author    = {Ju, Clark Mingxuan and Collins, Liam and Neves, Leonardo and Kumar, Bhuvesh and Wang, Louis Yufeng and Zhao, Tong and Shah, Neil},\n  booktitle = {Proceedings of the 34th ACM International Conference on Information and Knowledge Management (CIKM)},\n  year      = {2025}\n}\n```\n\n## 🤝 Acknowledgments\n\n- Built with [PyTorch](https://pytorch.org/) and [PyTorch Lightning](https://lightning.ai/)\n- Configuration management by [Hydra](https://hydra.cc/)\n- Inspired by recent advances in generative AI and recommendation systems\n- Part of this repo is built on top of https://github.com/ashleve/lightning-hydra-template\n\n## 📞 Contact\n\nFor questions and support:\n- Create an issue on GitHub\n- Contact the development team: Clark Mingxuan Ju (mju@snap.com), Liam Collins (lcollins2@snap.com), and Leonardo Neves (lneves@snap.com).\n\n## Bibliography \n\n[1] Rajput, Shashank, et al. 
\"Recommender systems with generative retrieval.\" Advances in Neural Information Processing Systems 36 (2023): 10299-10315.\n\n[2] Deng, Jiaxin, et al. \"Onerec: Unifying retrieve and rank with generative recommender and iterative preference alignment.\" arXiv preprint arXiv:2502.18965 (2025).\n\n[3] Lee, Doyup, et al. \"Autoregressive image generation using residual quantization.\" Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022.\n\n[4] Geng, Shijie, et al. \"Recommendation as language processing (rlp): A unified pretrain, personalized prompt \u0026 predict paradigm (p5).\" Proceedings of the 16th ACM conference on recommender systems. 2022.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsnap-research%2Fgrid","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsnap-research%2Fgrid","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsnap-research%2Fgrid/lists"}