<div align="center">
<h1>LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs</h1>
Xiaoran Liu<sup>1,2</sup>, Yuerong Song<sup>1,2</sup>, Zhigeng Liu<sup>1,2</sup>, Zengfeng Huang<sup>1,2</sup>, Qipeng Guo<sup>2,3</sup>, Ziwei He<sup>2,†</sup>, Xipeng Qiu<sup>1,2,†</sup>

<sup>1</sup>Fudan University, <sup>2</sup>Shanghai Innovation Institute,
<sup>3</sup>Shanghai AI Laboratory

[<a href="https://arxiv.org/abs/2506.14429">📝 Paper</a>] | [<a href="https://huggingface.co/papers/2506.14429">🤗 HF</a>] | [<a href="https://github.com/OpenMOSS/LongLLaDA">🚀 Code</a>]
</div>

## Introduction

In this work, we present the first systematic investigation comparing the long-context performance of diffusion LLMs and traditional auto-regressive LLMs. We first identify a unique characteristic of diffusion LLMs: unlike auto-regressive LLMs, they maintain remarkably ***stable perplexity*** during direct context extrapolation.

Moreover, where auto-regressive models fail outright on the Needle-In-A-Haystack task once the context exceeds their pretrained length, diffusion LLMs exhibit a distinct ***local perception*** phenomenon that enables successful retrieval from recent context segments. We explain both phenomena through the lens of Rotary Position Embedding (RoPE) scaling theory.

Building on these observations, we propose ***LongLLaDA***, a training-free method that integrates LLaDA with NTK-based RoPE extrapolation. Our results validate that established extrapolation scaling laws remain effective for extending the context windows of diffusion LLMs.

Furthermore, we identify long-context tasks where diffusion LLMs outperform auto-regressive LLMs and others where they fall short. This study thus establishes ***the first length extrapolation method for diffusion LLMs*** while providing essential theoretical insights and empirical benchmarks for advancing future research on long-context diffusion LLMs. This is the official implementation of LongLLaDA.
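The core idea behind NTK-based RoPE extrapolation can be illustrated in a few lines. The sketch below is ours, not the repository's code: NTK-aware scaling enlarges the RoPE base so that low-frequency dimensions stretch with the extended context while high-frequency dimensions stay nearly intact. The function name and the 4x example are illustrative assumptions.

```python
import numpy as np

def rope_inv_freq(head_dim: int, base: float = 10000.0, scale: float = 1.0) -> np.ndarray:
    """Inverse RoPE frequencies, optionally with NTK-aware base scaling.

    To extend the context window by `scale`x, the NTK approach replaces the
    base theta with theta * scale**(d / (d - 2)), which stretches the
    low-frequency dimensions while leaving high frequencies almost unchanged.
    """
    if scale > 1.0:
        base = base * scale ** (head_dim / (head_dim - 2))
    return 1.0 / base ** (np.arange(0, head_dim, 2) / head_dim)

# Example: 4x context extension for a 128-dimensional attention head.
orig = rope_inv_freq(128)
ntk = rope_inv_freq(128, scale=4.0)
# The highest frequency (index 0) is unchanged, while the lowest
# frequency is stretched by roughly the scale factor.
print(ntk[0] / orig[0], ntk[-1] / orig[-1])
```

The per-dimension exponent `d/(d-2)` is chosen so that the very last (lowest-frequency) dimension is slowed down by almost exactly the scale factor, which is why such scaling extends usable context without retraining.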
<p align="center">
<img src="./img/intro.png" width="750"/>
</p>

## Installation

### Prepare Your OpenCompass

We run our downstream evaluation with [OpenCompass](https://github.com/open-compass/opencompass).

```bash
git clone https://github.com/open-compass/opencompass
cd opencompass
pip install -e .
```

We use the following Python packages and versions:

```
flash-attn==2.7.4.post1
torch==2.6.0
transformers==4.46.3
opencompass==0.4.2
```

### Prepare Your Model

Copy the folder `LongLLaDA/llada/` to `opencompass/models/` and add the following line to the end of `opencompass/models/__init__.py`.

```python
from .llada.llada_wrapper import LLaDACausalLM
```

## Evaluation

Copy the folder `LongLLaDA/eval/` to your OpenCompass directory; you can then run the following evaluations.

### Needle-In-A-Haystack (NIAH) evaluation

1. Add a NIAH evaluation script with customizable context length and depth: copy `LongLLaDA/needlebench/needlebench` to `opencompass/configs/datasets/needlebench` and replace `opencompass/configs/summarizers/needlebench.py` with `LongLLaDA/needlebench/needlebench.py`.

2. Edit the prompt format of the NIAH benchmark to enable the base model to respond more effectively by replacing `opencompass/datasets/needlebench/origin.py` with `LongLLaDA/needlebench/origin.py`.

3. Optionally, modify the plotting code in `opencompass/summarizers/needlebench.py` as shown in `LongLLaDA/needlebench/needlebench_summarizer.py`.

4. Execute the following command.

```bash
python run.py eval/eval_llada_niah.py --dump-eval-details -r
```

### LongBench evaluation

1. Execute the following command.

```bash
python run.py eval/eval_llada_long.py --dump-eval-details -r
```

### RULER evaluation

1. Edit the prompt format of the RULER benchmark to enable the base model to respond more effectively.
In `ruler_cwe_gen.py`, `ruler_fwe_gen.py`, `ruler_niah_gen.py`, `ruler_qa_gen.py`, and `ruler_vt_gen.py` under `opencompass/configs/datasets/ruler/`, comment out the `'\n'` at the end of the prompt. The following is an example from `opencompass/configs/datasets/ruler/ruler_vt_gen.py`.

```python
vt_datasets = [
    {
        'abbr': 'ruler_vt',
        'type': RulerVtDataset,
        'num_chains': 1,
        'num_hops': 4,
        'reader_cfg': dict(input_columns=['prompt'], output_column='answer'),
        'infer_cfg': dict(
            prompt_template=dict(
                type=PromptTemplate,
                template=dict(
                    round=[
                        dict(role='HUMAN', prompt='{prompt}'),
                        # dict(role='BOT', prompt='{answer}\n'),    # comment out this line
                    ]
                ),
            ),
            retriever=dict(type=ZeroRetriever),
            inferencer=dict(type=GenInferencer),
        ),
        'eval_cfg': dict(
            evaluator=dict(type=RulerVtEvaluator),
        ),
    }
]
```

2. Execute the following command.

```bash
python run.py eval/eval_llada_ruler.py --dump-eval-details -r
```

### Perplexity (PPL) Evaluation

> We calculate perplexity in the LongLLaDA directory rather than in OpenCompass, as follows.

1. Execute the following command to get the perplexity curve of LLaMA3.

```bash
python ppl/get_ppl_llama.py
```

2. Execute the following command to get the perplexity curve of LLaDA, with `block_size=64` for efficiency.

```bash
python ppl/get_ppl_llada.py
```

3.
Organize the related results and execute the following command to reproduce Figure 1 of our paper.

```bash
python ppl/get_ppl_plot.py
```

## Results

<p align="center">
<img src="./img/direct_extra.png" width="750"/>
</p>

<p align="center">
<img src="./img/ntk_extra.png" width="750"/>
</p>

<p align="center">
<img src="./img/ruler.png" width="750"/>
</p>

## Citation

```
@article{liu2025longllada,
  title={LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs},
  author={Liu, Xiaoran and Liu, Zhigeng and Huang, Zengfeng and Guo, Qipeng and He, Ziwei and Qiu, Xipeng},
  journal={arXiv preprint arXiv:2506.14429},
  year={2025}
}
```
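As a postscript to the PPL evaluation above: the bookkeeping behind a perplexity-versus-context-length curve can be sketched model-agnostically. This is a minimal illustration assuming pre-computed next-token logits, not the repository's `ppl/` scripts; the function name and toy data are ours.

```python
import numpy as np

def perplexity_at_lengths(logits, labels, lengths):
    """Perplexity of a causal LM over increasing prefix lengths.

    logits: (seq_len, vocab) array where logits[t] scores the token at
    position t (i.e. already shifted); labels: (seq_len,) token ids.
    Returns one perplexity value per requested prefix length.
    """
    z = logits - logits.max(axis=1, keepdims=True)          # numerically stable log-softmax
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    nll = -logp[np.arange(len(labels)), labels]             # per-token negative log-likelihood
    return [float(np.exp(nll[:n].mean())) for n in lengths]

# Toy example with random logits over a 16-token vocabulary.
rng = np.random.default_rng(0)
logits = rng.normal(size=(1024, 16))
labels = rng.integers(0, 16, size=1024)
ppls = perplexity_at_lengths(logits, labels, [256, 512, 1024])
print(ppls)
```

Plotting such per-length perplexities for an auto-regressive model and a diffusion model side by side is what produces the extrapolation curves shown in the Results section.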