{"id":28702270,"url":"https://github.com/modelscope/nexus-gen","last_synced_at":"2025-06-14T12:32:17.432Z","repository":{"id":290485292,"uuid":"973457298","full_name":"modelscope/Nexus-Gen","owner":"modelscope","description":null,"archived":false,"fork":false,"pushed_at":"2025-05-27T05:43:48.000Z","size":5015,"stargazers_count":199,"open_issues_count":8,"forks_count":11,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-05-27T06:34:21.591Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/modelscope.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-04-27T03:06:36.000Z","updated_at":"2025-05-27T05:43:51.000Z","dependencies_parsed_at":null,"dependency_job_id":"d1286a47-6698-4ec6-8967-4623b6dab2cd","html_url":"https://github.com/modelscope/Nexus-Gen","commit_stats":null,"previous_names":["modelscope/nexus-gen"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/modelscope/Nexus-Gen","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/modelscope%2FNexus-Gen","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/modelscope%2FNexus-Gen/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/modelscope%2FNexus-Gen/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/modelscope%2FNexus-Gen/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/modelscope","download_url":
"https://codeload.github.com/modelscope/Nexus-Gen/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/modelscope%2FNexus-Gen/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259816186,"owners_count":22915828,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-14T12:30:57.205Z","updated_at":"2025-06-14T12:32:17.427Z","avatar_url":"https://github.com/modelscope.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n    \u003cbr\u003e\n    \u003cimg src=\"assets/logo.jpg\"/\u003e\n    \u003cbr\u003e\n\u003cp\u003e\n\u003ch1 align=\"center\"\u003eNexus-Gen: A Unified Model for Image Understanding, Generation, and Editing\u003c/h1\u003e\n \n\u003cdiv align=\"center\"\u003e\n\n  \u003ca href=\"http://arxiv.org/abs/2504.21356\"\u003e\u003cimg src=\"https://img.shields.io/static/v1?label=Tech%20Report\u0026message=Arxiv\u0026color=red\"\u003e\u003c/a\u003e \u0026ensp;\n  \u003ca href=\"https://www.modelscope.cn/models/DiffSynth-Studio/Nexus-Gen\"\u003e\u003cimg src=\"https://img.shields.io/static/v1?label=Model\u0026message=ModelScope\u0026color=blue\"\u003e\u003c/a\u003e \u0026ensp;\n  \u003ca href=\"https://huggingface.co/modelscope/Nexus-Gen\"\u003e\u003cimg src=\"https://img.shields.io/static/v1?label=Model\u0026message=HuggingFace\u0026color=yellow\"\u003e\u003c/a\u003e \u0026ensp;\n  \u003ca href=\"https://www.modelscope.cn/studios/DiffSynth-Studio/Nexus-Gen\"\u003e\u003cimg 
src=\"https://img.shields.io/static/v1?label=Online%20Demo\u0026message=ModeScope\u0026color=green\"\u003e\u003c/a\u003e \u0026ensp;\n\n\u003c/div\u003e\n\n## News\n- **May 27, 2025**: We fine-tuned Nexus-Gen using the [BLIP-3o-60k](https://huggingface.co/datasets/BLIP3o/BLIP3o-60k) dataset, significantly improving the model's robustness to text prompts in image generation, **achieving a GenEval score of 0.79**. The [model checkpoints](https://www.modelscope.cn/models/DiffSynth-Studio/Nexus-Gen) have been updated.\n\n## What is Nexus-Gen\nNexus-Gen is a unified model that synergizes the language reasoning capabilities of LLMs with the image synthesis power of diffusion models. To align the embedding spaces of the LLM and the diffusion model, we conduct a dual-phase alignment training process: (1) the autoregressive LLM learns to predict image embeddings conditioned on multimodal inputs, while (2) the vision decoder is trained to reconstruct high-fidelity images from these embeddings. While training the LLM, we identified a critical discrepancy between the autoregressive paradigm's training and inference phases, where error accumulation in the continuous embedding space severely degrades generation quality. To avoid this issue, we introduce a prefilled autoregression strategy that prefills the input sequence with position-embedded special tokens instead of continuous embeddings. Through dual-phase training, Nexus-Gen handles image understanding, generation, and editing tasks within a single model, as illustrated below.\n![cover](assets/illustrations/gen_edit.jpg)\n![architecture](assets/illustrations/architecture.png)\n\n## Getting Started\n### Installation\n1. Install [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio.git) from source:\n```shell\ngit clone https://github.com/modelscope/DiffSynth-Studio.git\ncd DiffSynth-Studio\npip install -e .\n```\n2. Install requirements:\n```shell\npip install -r requirements.txt\n```\n3. 
Install [ms-swift](https://github.com/modelscope/ms-swift.git) if you want to fine-tune Nexus-Gen:\n```shell\npip install ms-swift -U\n```\n### Prepare models\n```shell\npython download_models.py\n```\n### Image Understanding\n```shell\npython image_understanding.py\n```\n\n### Image Generation\nImage generation with a detailed prompt (needs at least 37 GB of VRAM):\n```shell\npython image_generation.py\n```\nPolish the prompt and generate images with Nexus-Gen:\n```shell\npython image_generation_with_selfpolish.py\n```\nImage generation with less VRAM via CPU offload (needs at least 24 GB of VRAM):\n```shell\npython image_generation_offload.py\n```\n### Image Editing\n```shell\npython image_editing.py\n```\n\n### Gradio demo\n```shell\npython app.py\n```\n\n### Training Codes\nNexus-Gen is trained based on [ms-swift](https://github.com/modelscope/ms-swift.git) and [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio.git). You can find the training scripts in `train/scripts/train_decoder.sh` and `train_llm.sh`.\n\n### Citation\n```\n@article{zhang2025nexus-gen,\n      title={Nexus-Gen: A Unified Model for Image Understanding, Generation, and Editing}, \n      author={Hong Zhang and Zhongjie Duan and Xingjun Wang and Yingda Chen and Yuze Zhao and Yu Zhang},\n      journal={arXiv preprint arXiv:2504.21356},\n      year={2025}\n}\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmodelscope%2Fnexus-gen","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmodelscope%2Fnexus-gen","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmodelscope%2Fnexus-gen/lists"}