{"id":20298102,"url":"https://github.com/zjunlp/OneGen","last_synced_at":"2025-05-07T20:34:18.683Z","repository":{"id":256376200,"uuid":"835195869","full_name":"zjunlp/OneGen","owner":"zjunlp","description":"[EMNLP 2024 Findings] OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs.","archived":false,"fork":false,"pushed_at":"2024-11-13T08:14:59.000Z","size":862,"stargazers_count":137,"open_issues_count":0,"forks_count":15,"subscribers_count":5,"default_branch":"main","last_synced_at":"2024-11-13T09:21:26.913Z","etag":null,"topics":["artificial-intelligence","efficient","generation","large-language-models","llm","natural-language-processing","onegen","rag","retrieval","retrieval-augmented-generation","transformer"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zjunlp.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-29T10:57:56.000Z","updated_at":"2024-11-13T08:15:02.000Z","dependencies_parsed_at":"2024-10-17T21:14:04.605Z","dependency_job_id":"d7a8925a-0558-4b17-92bd-579f91e4e96f","html_url":"https://github.com/zjunlp/OneGen","commit_stats":null,"previous_names":["zjunlp/onegen"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FOneGen","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FOneGen/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FOneGen/releases","manifests_url":"https://repos
.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FOneGen/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zjunlp","download_url":"https://codeload.github.com/zjunlp/OneGen/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252953717,"owners_count":21830890,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","efficient","generation","large-language-models","llm","natural-language-processing","onegen","rag","retrieval","retrieval-augmented-generation","transformer"],"created_at":"2024-11-14T16:02:09.559Z","updated_at":"2025-05-07T20:34:18.663Z","avatar_url":"https://github.com/zjunlp.png","language":"Python","readme":"\u003cdiv align=\"center\"\u003e\n\u003ch1 align=\"center\"\u003e 👉 OneGen 👈 \u003c/h1\u003e\n\u003cb\u003eOneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs\u003c/b\u003e\n\n[![Awesome](https://awesome.re/badge.svg)](https://github.com/zjunlp/OneGen) \n[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)\n![](https://img.shields.io/github/last-commit/zjunlp/OneGen?color=green) \n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://drive.google.com/drive/folders/1ByufnAyvsfnrIVJzMwOHql3lYFVy6IJx?usp=drive_link\"\u003e☁️ Google Drive (Data)\u003c/a\u003e\n  \u003cbr\u003e\n  \u003ca href=\"https://arxiv.org/abs/2409.05152\"\u003e📄arXiv\u003c/a\u003e •\n  \u003ca href=\"https://x.com/zxlzr/status/1833433788036354523\"\u003e𝕏 Blog\u003c/a\u003e •\n  \u003ca\u003e🌐 
Web\u003c/a\u003e\n  \u003cbr\u003e\n  \u003cbr\u003e\n  \u003ca\u003e🤗 HF (Model)👇\u003c/a\u003e •\n  \u003ca\u003e🔭 Model Scope (Model)👇\u003c/a\u003e •\n  \u003ca\u003e🧊 Wise Model (Model)👇\u003c/a\u003e \n\u003c/p\u003e\n\n| 🎯 Task Name      | 🤗 HuggingFace                              | 🔭 ModelScope                               | 🧊 WiseModel                                |\n| -------------- | ---------------------------------------- | ---------------------------------------- | ---------------------------------------- |\n| Entity Linking | [Llama2-7B](https://huggingface.co/zjunlp/OneGen-EntityLinking-Llama2-7B) | [Llama2-7B](https://www.modelscope.cn/models/ZJUNLP/OneGen-EntityLinking-Llama2-7B) | [Llama2-7B](https://www.wisemodel.cn/models/zjunlp/OneGen-EntityLinking-Llama2-7B) |\n| Single-hop QA  | [Llama2-7B](https://huggingface.co/zjunlp/OneGen-SelfRAG-Llama2-7B) | [Llama2-7B](https://www.modelscope.cn/models/ZJUNLP/OneGen-SelfRAG-Llama2-7B) | [Llama2-7B](https://www.wisemodel.cn/models/zjunlp/OneGen-SelfRAG-Llama2-7B) |\n| Multi-hop QA   | [Llama2-7B](https://huggingface.co/zjunlp/OneGen-MultiHop-Llama2-7B) | [Llama2-7B](https://www.modelscope.cn/models/ZJUNLP/OneGen-MultiHop-Llama2-7B) | [Llama2-7B](https://www.wisemodel.cn/models/zjunlp/OneGen-MultiHop-Llama2-7B) |\n\u003c/div\u003e\n\n\n\n\n## Table of Contents\n\n- 📋[TODO](#todo)\n- 👀[Overview](#overview)\n- 🔧[Installation](#installation)\n- 🏃[Quick Start](#quick-start)\n- 🚩[Citation](#citation)\n\n\n## 📋TODO\n\n- [ ] Support LoRA training\n- [ ] Code documentation\n- [ ] Support vLLM inference\n- [ ] Support distributed embedding\n- [ ] Gradio demo\n\n## 👀Overview\n\nWe introduce a **One**-pass **Gen**eration and retrieval framework (**OneGen**) for fine-tuning LLMs on generation, retrieval, or hybrid tasks. 
Our core idea is to integrate generation and retrieval into the same context by allocating the retrieval task to *retrieval tokens* generated in an autoregressive manner, thus enabling the LLM to perform both tasks in a single forward pass.\n\nThe following figure illustrates the training process. We first introduce the concept of `roles of tokens in LLMs`. A token $x_i$ is the basic unit processed by an LLM, and a token in the input of an LLM serves one of three roles:\n- *Generating the next token*, denoted as $role(x_i)=\\texttt{GEN}$.\n- *Providing context information*, denoted as $role(x_i)=\\texttt{CTX}$.\n- *Representing a sentence*, denoted as $role(x_i)=\\texttt{RET}$.\n\nHence, we apply the *cross-entropy loss* to each token $x_i$ where $role(x_i)=\\texttt{GEN}$ and the *contrastive loss* to each token $x_i$ where $role(x_i)=\\texttt{RET}$. The training overview is shown below.\n\n![](./assets/train.jpg)\n\nThe following figure illustrates the inference process of different methods for the RAG task. First, both GritLM and OneGen need to deploy only a single model, which lowers the deployment cost. However, GritLM achieves generation and retrieval within a single model by switching back and forth between causal attention and bidirectional attention. Additionally, both GritLM and the Pipeline method require explicit queries, which leads to two additional forward passes for the queries. 
In contrast, OneGen can perform retrieval during the generation process, thus **avoiding the two forward-pass computations** for the queries and **allowing direct use of the KV cache**, significantly reducing inference costs.\n\n![](./assets/comparison.jpg)\n\n## 🔧Installation\n\n```bash\ngit clone https://github.com/zjunlp/OneGen\ncd OneGen\nconda create -n onegen python=3.9 -y\nconda activate onegen\npip install -r requirements.txt\n```\n\n## 🏃Quick Start\n\n\u003e The inference section focuses on running model predictions to get output results (Single-hop QA is an exception). The evaluation of these results is discussed in the Evaluation section.\n\n### Download the data\n\nDownload `train_data.tar.gz` and `eval_data.tar.gz` from [Google Drive](https://drive.google.com/drive/folders/1ByufnAyvsfnrIVJzMwOHql3lYFVy6IJx?usp=drive_link), then extract them with the following commands:\n```bash\ntar -xzvf train_data.tar.gz\ntar -xzvf eval_data.tar.gz\n```\nAfter extracting, you will get two folders, `train_data` and `eval_data`; move both into the `data` directory.\n\nPlease note that the training data we use is also available on Hugging Face, so you do not need to download `train_data.tar.gz`. Just run the training scripts!\n\n### Download the trained model (Optional)\n\n\u003cdetails\u003e \n\u003csummary\u003e\u003cb\u003eDownload the trained model (Optional)\u003c/b\u003e\u003c/summary\u003e \n  \nThe model weights trained on the three tasks have been made public and are available for download on three platforms: `🤗Huggingface`, `🔭ModelScope`, and `🧊WiseModel`. 
For detailed information, please refer to the table below:\n| 🎯 Task Name      | 🤗 HuggingFace                              | 🔭 ModelScope                               | 🧊 WiseModel                                |\n| -------------- | ---------------------------------------- | ---------------------------------------- | ---------------------------------------- |\n| Entity Linking | [Llama2-7B](https://huggingface.co/zjunlp/OneGen-EntityLinking-Llama2-7B) | [Llama2-7B](https://www.modelscope.cn/models/ZJUNLP/OneGen-EntityLinking-Llama2-7B) | [Llama2-7B](https://www.wisemodel.cn/models/zjunlp/OneGen-EntityLinking-Llama2-7B) |\n| Single-hop QA  | [Llama2-7B](https://huggingface.co/zjunlp/OneGen-SelfRAG-Llama2-7B) | [Llama2-7B](https://www.modelscope.cn/models/ZJUNLP/OneGen-SelfRAG-Llama2-7B) | [Llama2-7B](https://www.wisemodel.cn/models/zjunlp/OneGen-SelfRAG-Llama2-7B) |\n| Multi-hop QA   | [Llama2-7B](https://huggingface.co/zjunlp/OneGen-MultiHop-Llama2-7B) | [Llama2-7B](https://www.modelscope.cn/models/ZJUNLP/OneGen-MultiHop-Llama2-7B) | [Llama2-7B](https://www.wisemodel.cn/models/zjunlp/OneGen-MultiHop-Llama2-7B) |\n\n\u003c/details\u003e\n\n\n\u003e [!NOTE]\n\u003e It is worth noting that for the Entity Linking task, we have pre-stored the entity embeddings. Click [here](https://huggingface.co/zjunlp/OneGenEmbedding/blob/main/OneGen-EntityLinking-Llama2-7B-Embedding.pkl) to download them.\n\n### Training model from scratch (Optional)\n\n\u003cdetails\u003e \n\u003csummary\u003e\u003cb\u003eTraining model from scratch (Optional)\u003c/b\u003e\u003c/summary\u003e\n\nWe provide the training scripts for three tasks. If you are using a locally downloaded model, you can modify the `info-model` field in the `workflow/{task}/{model}.json` file. Update the `model_path` and `tokenizer_path` with the local paths. Note that the hyperparameters in the configuration files are set for 8xA800 GPUs. 
If you encounter OOM (Out of Memory) issues, please reduce `per_device_train_batch_size`, `n_pos_per_sent`, `n_neg_per_pos`, and `max_length`.\n\n```bash\n# Entity Linking\ndeepspeed train.py --workflow workflow/entity_linking/llama2.json\n# Single-Hop QA\ndeepspeed train.py --workflow workflow/self_rag/llama2.json\n# Multi-hop QA\ndeepspeed train.py --workflow workflow/multi_hop_qa/llama2.json\n```\n\u003c/details\u003e\n\n### Inference\n\nHere are the inference scripts for the Entity Linking and Multi-hop QA tasks; the inference script for Single-Hop QA is introduced in the next section. You can modify fields such as `model_path`, `tokenizer_path`, `file`, and `output_file_path` in `config/eval_config/{task}/{config}.json` as needed.\n\nDuring inference, we now support using [Faiss](https://github.com/facebookresearch/faiss) as the vector retrieval engine. You just need to set the `use_faiss` field in the `inference` section of the config file to `true`.\n\n```bash\n# Entity Linking (Need GPU)\npython eval.py --config config/eval_config/entity_linking/llama2_wo_pkl.json\n# Multi-hop QA (Need GPU)\npython eval.py --config config/eval_config/multi_hop_qa/llama2.json\n```\n\n\n### Evaluation\n\nBelow are the evaluation scripts for the Entity Linking and Multi-hop QA tasks. 
`/your/path/to/result.jsonl` is the file saved during the inference stage.\n\n```bash\n# Entity Linking (CPU)\nbash scripts/eval_el.sh el /your/path/to/result.jsonl\n\n# Multi-hop QA for HotpotQA dataset (CPU)\nbash scripts/eval_multi_hop_qa.sh /your/path/to/result.jsonl hotpotqa\n\n# Multi-hop QA for 2WIKI dataset (CPU)\nbash scripts/eval_multi_hop_qa.sh /your/path/to/result.jsonl 2wiki\n```\n\nHere is the evaluation for the Single-Hop QA task, mainly based on [Self-RAG](https://github.com/AkariAsai/self-rag):\n```bash\n# Single-hop QA using Self-RAG (Need GPU)\n# [CUDA_VISIBLE_DEVICES] [MODE] [MODEL_PATH] [SAVE_TAG] [SAVED_DATASET_PATH] [N_DOC] [ENV] [SCORE]\nbash scripts/eval_self_rag.sh 0 always_retrieve /your/path/to/model model_tag saved_rank_path 5 true true\n```\n\n## 🚩Citation\n\nIf this work is helpful, please kindly cite as:\n\n```bibtex\n@inproceedings{EMNLP24_OneGen,\n    title = \"{O}ne{G}en: Efficient One-Pass Unified Generation and Retrieval for {LLM}s\",\n    author = \"Zhang, Jintian  and\n      Peng, Cheng  and\n      Sun, Mengshu  and\n      Chen, Xiang  and\n      Liang, Lei  and\n      Zhang, Zhiqiang  and\n      Zhou, Jun  and\n      Chen, Huajun  and\n      Zhang, Ningyu\",\n    editor = \"Al-Onaizan, Yaser  and\n      Bansal, Mohit  and\n      Chen, Yun-Nung\",\n    booktitle = \"Findings of the Association for Computational Linguistics: EMNLP 2024\",\n    month = nov,\n    year = \"2024\",\n    address = \"Miami, Florida, USA\",\n    publisher = \"Association for Computational Linguistics\",\n    url = \"https://aclanthology.org/2024.findings-emnlp.237\",\n    pages = \"4088--4119\",\n}\n\n```\n","funding_links":[],"categories":["A01_文本生成_文本对话"],"sub_categories":["大语言对话模型及数据"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzjunlp%2FOneGen","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzjunlp%2FOneGen","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzjunlp%2FOneGen/lists"}