https://github.com/snap-research/grid
GRID: Generative Recommendation with Semantic IDs
- Host: GitHub
- URL: https://github.com/snap-research/grid
- Owner: snap-research
- License: other
- Created: 2025-06-16T20:26:51.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-08-27T21:52:31.000Z (about 1 month ago)
- Last Synced: 2025-08-28T05:59:14.521Z (about 1 month ago)
- Topics: generative-recommenders, large-language-models, recommender-systems, recsys, semantic-id, sequential-recommendation
- Language: Python
- Homepage: https://arxiv.org/abs/2507.22224
- Size: 914 KB
- Stars: 240
- Watchers: 3
- Forks: 35
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
# Generative Recommendation with Semantic IDs (GRID)
[arXiv:2507.22224](https://arxiv.org/abs/2507.22224)

**GRID** (Generative Recommendation with Semantic IDs) is a state-of-the-art framework for generative recommendation systems using semantic IDs, developed by a group of scientists and engineers from [Snap Research](https://research.snap.com/team/user-modeling-and-personalization.html). This project implements novel approaches for learning semantic IDs from text embeddings and generating recommendations through transformer-based generative models.
## 🚀 Overview
GRID facilitates generative recommendation in three overarching steps:
- **Embedding Generation with LLMs**: Converting item text into embeddings using any LLM available on Hugging Face.
- **Semantic ID Learning**: Converting item embeddings into hierarchical semantic IDs using residual quantization techniques such as RQ-KMeans, RQ-VAE, and RVQ.
- **Generative Recommendations**: Using transformer architectures to generate recommendation sequences as semantic ID tokens.

## 📦 Installation
### Prerequisites
- Python 3.10+
- CUDA-compatible GPU (recommended)

### Setup Environment
```bash
# Clone the repository
git clone https://github.com/snap-research/GRID.git
cd GRID

# Install dependencies
pip install -r requirements.txt
```

## 🎯 Quick Start
### 1. Data Preparation
Prepare your dataset in the expected format:
```
data/
├── train/ # training sequence of user history
├── validation/ # validation sequence of user history
├── test/ # testing sequence of user history
└── items/ # text of all items in the dataset
```

We provide pre-processed Amazon data explored in the [P5 paper](https://arxiv.org/abs/2203.13366) [4]. The data can be downloaded from this [Google Drive link](https://drive.google.com/file/d/1B5_q_MT3GYxmHLrMK0-lAqgpbAuikKEz/view?usp=sharing).
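
Before launching the later steps, it can help to confirm that a dataset directory matches the layout above. The following is a minimal sketch, not part of GRID: the helper name `missing_subdirs` is hypothetical, and it checks only that the four sub-directories exist, not the format of the files inside them.

```python
import tempfile
from pathlib import Path

# Sub-directory names are taken from the layout above.
EXPECTED_SUBDIRS = ("train", "validation", "test", "items")


def missing_subdirs(data_dir):
    """Return the expected sub-directories that are absent under data_dir.

    Hypothetical helper for illustration; GRID does not ship this function.
    """
    root = Path(data_dir)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


# Demonstrate on a throwaway directory that lacks items/.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("train", "validation", "test"):
        (Path(tmp) / name).mkdir()
    print(missing_subdirs(tmp))  # ['items']
```

An empty list means all four sub-directories are present, for example when called on `data/amazon_data/beauty` after unpacking the download.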
### 2. Embedding Generation with LLMs
Generate embeddings from LLMs, which later will be transformed into semantic IDs.
```bash
python -m src.inference experiment=sem_embeds_inference_flat data_dir=data/amazon_data/beauty # available datasets: 'beauty', 'sports', and 'toys'
```

### 3. Train and Generate Semantic IDs
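
As background for this step, residual K-means builds one codebook per hierarchy level: each level clusters what the previous level failed to explain (the residual), and an item's semantic ID is the sequence of nearest-centroid indices. The sketch below illustrates that idea in pure Python on random vectors. It is an assumption-laden toy, not GRID's implementation: all function names are hypothetical, and a real run would use tensor operations on the LLM embeddings.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)),
               key=lambda k: squared_distance(point, centroids[k]))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a list of k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def fit_residual_kmeans(embeddings, num_hierarchies, codebook_width, seed=0):
    """Fit one codebook per level on the residuals left by the previous level."""
    residuals = [list(e) for e in embeddings]
    codebooks = []
    for level in range(num_hierarchies):
        codebook = kmeans(residuals, codebook_width, seed=seed + level)
        codebooks.append(codebook)
        # Subtract each point's nearest centroid to get the next level's input.
        residuals = [
            [x - c for x, c in zip(r, codebook[nearest(r, codebook)])]
            for r in residuals
        ]
    return codebooks


def encode(embedding, codebooks):
    """Map one embedding to a semantic ID: one centroid index per level."""
    residual = list(embedding)
    code = []
    for codebook in codebooks:
        idx = nearest(residual, codebook)
        code.append(idx)
        residual = [x - c for x, c in zip(residual, codebook[idx])]
    return tuple(code)


# Toy data: 200 random 8-dimensional "embeddings", 3 levels of 4 centroids.
rng = random.Random(0)
embeddings = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(200)]
codebooks = fit_residual_kmeans(embeddings, num_hierarchies=3, codebook_width=4)
print(encode(embeddings[0], codebooks))  # a tuple of 3 indices, each in range(4)
```

In the sketch, `num_hierarchies` and `codebook_width` play the same roles as the identically named overrides in this step: the number of codebooks and the number of centroids in each.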
Learn semantic ID centroids for embeddings generated in step 2:
```bash
# embedding_path: can be found in the log dirs from step 2.
# embedding_dim: model dimension of the LLM used in step 2 (2048 for flan-t5-xl, as in this example).
# num_hierarchies: number of codebooks to train (3 here).
# codebook_width: number of centroid rows in each codebook (256 here).
python -m src.train experiment=rkmeans_train_flat \
data_dir=data/amazon_data/beauty \
embedding_path=/merged_predictions_tensor.pt \
embedding_dim=2048 \
num_hierarchies=3 \
codebook_width=256
```

Generate SIDs:
```bash
python -m src.inference experiment=rkmeans_inference_flat \
data_dir=data/amazon_data/beauty \
embedding_path=/merged_predictions_tensor.pt \
embedding_dim=2048 \
num_hierarchies=3 \
codebook_width=256 \
ckpt_path= # this can be found in the log dir for training SIDs
```

### 4. Train Generative Recommendation Model with Semantic IDs
Train the recommendation model using the learned semantic IDs:
```bash
# num_hierarchies is 4 (3 codebooks + 1) because the previous step appended
# one additional digit to de-duplicate the generated semantic IDs.
python -m src.train experiment=tiger_train_flat \
data_dir=data/amazon_data/beauty \
semantic_id_path=/pickle/merged_predictions_tensor.pt \
num_hierarchies=4
```

### 5. Generate Recommendations
Run inference to generate recommendations:
```bash
# ckpt_path: can be found in the log dir for training GR models.
# num_hierarchies is 4 (3 codebooks + 1) because an earlier step appended
# one additional digit to de-duplicate the generated semantic IDs.
python -m src.inference experiment=tiger_inference_flat \
data_dir=data/amazon_data/beauty \
semantic_id_path=/pickle/merged_predictions_tensor.pt \
num_hierarchies=4 \
ckpt_path=
```

## Supported Models:
### Semantic ID:
1. Residual K-means, proposed in OneRec [2]
2. Residual Vector Quantization
3. Residual Quantization with Variational Autoencoder [3]

### Generative Recommendation:
1. TIGER [1]
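
The quick start passes `num_hierarchies=4` to the recommendation model because one extra digit is appended to each 3-level semantic ID to de-duplicate items that would otherwise share an ID. The sketch below shows one common way to do this, numbering colliding items 0, 1, 2, and so on. This scheme is an assumption for illustration: the README does not confirm that GRID uses exactly this rule, and `append_dedup_digit` is a hypothetical name.

```python
from collections import defaultdict


def append_dedup_digit(semantic_ids):
    """Append a counter digit so items with identical codes become unique.

    semantic_ids: list of tuples, one per item, in item order.
    Returns a list of tuples that are each one element longer.
    """
    seen = defaultdict(int)  # code -> number of earlier items with that code
    unique_ids = []
    for code in semantic_ids:
        code = tuple(code)
        unique_ids.append(code + (seen[code],))
        seen[code] += 1
    return unique_ids


ids = [(12, 7, 3), (12, 7, 3), (5, 0, 9)]
print(append_dedup_digit(ids))
# [(12, 7, 3, 0), (12, 7, 3, 1), (5, 0, 9, 0)]
```

With 3 codebooks this yields 4-element IDs, which matches the `num_hierarchies=4` override used when training and running the recommendation model.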
## 📚 Citation
If you use GRID in your research, please cite:
```bibtex
@inproceedings{grid,
title = {Generative Recommendation with Semantic IDs: A Practitioner's Handbook},
author = {Ju, Clark Mingxuan and Collins, Liam and Neves, Leonardo and Kumar, Bhuvesh and Wang, Louis Yufeng and Zhao, Tong and Shah, Neil},
booktitle = {Proceedings of the 34th ACM International Conference on Information and Knowledge Management (CIKM)},
year = {2025}
}
```

## 🤝 Acknowledgments
- Built with [PyTorch](https://pytorch.org/) and [PyTorch Lightning](https://lightning.ai/)
- Configuration management by [Hydra](https://hydra.cc/)
- Inspired by recent advances in generative AI and recommendation systems
- Part of this repo is built on top of https://github.com/ashleve/lightning-hydra-template

## 📞 Contact
For questions and support:
- Create an issue on GitHub
- Contact the development team: Clark Mingxuan Ju (mju@snap.com), Liam Collins (lcollins2@snap.com), and Leonardo Neves (lneves@snap.com).

## Bibliography
[1] Rajput, Shashank, et al. "Recommender systems with generative retrieval." Advances in Neural Information Processing Systems 36 (2023): 10299-10315.
[2] Deng, Jiaxin, et al. "OneRec: Unifying retrieve and rank with generative recommender and iterative preference alignment." arXiv preprint arXiv:2502.18965 (2025).
[3] Lee, Doyup, et al. "Autoregressive image generation using residual quantization." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
[4] Geng, Shijie, et al. "Recommendation as language processing (RLP): A unified pretrain, personalized prompt & predict paradigm (P5)." Proceedings of the 16th ACM Conference on Recommender Systems. 2022.