{"id":14964558,"url":"https://github.com/foundationvision/llamagen","last_synced_at":"2025-05-15T16:07:15.864Z","repository":{"id":243776150,"uuid":"809671338","full_name":"FoundationVision/LlamaGen","owner":"FoundationVision","description":"Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation","archived":false,"fork":false,"pushed_at":"2024-08-15T20:10:35.000Z","size":5612,"stargazers_count":1741,"open_issues_count":58,"forks_count":77,"subscribers_count":23,"default_branch":"main","last_synced_at":"2025-05-07T17:37:13.723Z","etag":null,"topics":["auto-regressive-model","diffusion","diffusion-models","image-generation","llama","llm","text2image"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2406.06525","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/FoundationVision.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-03T08:18:41.000Z","updated_at":"2025-05-07T13:57:02.000Z","dependencies_parsed_at":"2024-06-28T05:34:13.129Z","dependency_job_id":"793f1d9f-0850-48b0-936c-6d4f6e2a52de","html_url":"https://github.com/FoundationVision/LlamaGen","commit_stats":{"total_commits":14,"total_committers":7,"mean_commits":2.0,"dds":0.5714285714285714,"last_synced_commit":"ce98ec41803a74a90ce68c40ababa9eaeffeb4ec"},"previous_names":["foundationvision/llamagen"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FoundationVision%2FLlamaGen","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repos
itories/FoundationVision%2FLlamaGen/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FoundationVision%2FLlamaGen/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FoundationVision%2FLlamaGen/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/FoundationVision","download_url":"https://codeload.github.com/FoundationVision/LlamaGen/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254374475,"owners_count":22060611,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["auto-regressive-model","diffusion","diffusion-models","image-generation","llama","llm","text2image"],"created_at":"2024-09-24T13:33:23.503Z","updated_at":"2025-05-15T16:07:15.840Z","avatar_url":"https://github.com/FoundationVision.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation\n\n\n\u003cdiv align=\"center\"\u003e\n\n[![demo](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Online_Demo-blue)](https://huggingface.co/spaces/FoundationVision/LlamaGen)\u0026nbsp;\n[![arXiv](https://img.shields.io/badge/arXiv%20paper-2406.06525-b31b1b.svg)](https://arxiv.org/abs/2406.06525)\u0026nbsp;\n[![project page](https://img.shields.io/badge/Project_page-More_visualizations-green)](https://peizesun.github.io/llamagen/)\u0026nbsp;\n\n\u003c/div\u003e\n\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"assets/teaser.jpg\" 
width=95%\u003e\n\u003c/p\u003e\n\n\n\nThis repo contains pre-trained model weights and the PyTorch (torch\u003e=2.1.0) training/sampling code used in\n\n\u003e [**Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation**](https://arxiv.org/abs/2406.06525)\u003cbr\u003e\n\u003e [Peize Sun](https://peizesun.github.io/), [Yi Jiang](https://enjoyyi.github.io/), [Shoufa Chen](https://www.shoufachen.com/), [Shilong Zhang](https://jshilong.github.io/), [Bingyue Peng](), [Ping Luo](http://luoping.me/), [Zehuan Yuan](https://shallowyuan.github.io/)\n\u003e \u003cbr\u003eHKU, ByteDance\u003cbr\u003e\n\nYou can find more visualizations on the [![project page](https://img.shields.io/badge/Project_page-More_visualizations-green)](https://peizesun.github.io/llamagen/)\n\n## 🔥 Update\n- [2024.06.28] Image tokenizers and AR models for text-conditional image generation are released! Try it!\n- [2024.06.15] All models ranging from 100M to 3B parameters are supported by vLLM!\n- [2024.06.11] Image tokenizers and AR models for class-conditional image generation are released!\n- [2024.06.11] Code and Demo are released!\n\n## 🌿 Introduction\nWe introduce LlamaGen, a new family of image generation models that apply the original ``next-token prediction`` paradigm of large language models to the visual generation domain. It is an affirmative answer to whether vanilla autoregressive models, e.g., Llama, ``without inductive biases`` on visual signals can achieve state-of-the-art image generation performance if scaled properly. 
We reexamine the design spaces of image tokenizers, the scalability properties of image generation models, and the quality of their training data.\n\nIn this repo, we release:\n* Two image tokenizers with downsample ratios of 16 and 8.\n* Seven class-conditional generation models ranging from 100M to 3B parameters.\n* Two text-conditional generation models of 700M parameters.\n* Online demos on [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/FoundationVision/LlamaGen) for running pre-trained models.\n* Support for the vLLM serving framework, enabling a 300% - 400% speedup.\n\n## 🦄 Class-conditional image generation on ImageNet\n### VQ-VAE models\nMethod | params | tokens | rFID (256x256) | weight\n--- |:---:|:---:|:---:|:---:\nvq_ds16_c2i | 72M | 16x16 | 2.19 | [vq_ds16_c2i.pt](https://huggingface.co/FoundationVision/LlamaGen/resolve/main/vq_ds16_c2i.pt) \nvq_ds16_c2i | 72M | 24x24 | 0.94 | above\nvq_ds16_c2i | 72M | 32x32 | 0.70 | above\nvq_ds8_c2i  | 70M | 32x32 | 0.59 | [vq_ds8_c2i.pt](https://huggingface.co/FoundationVision/LlamaGen/resolve/main/vq_ds8_c2i.pt)\n\n### AR models\nMethod | params | training | tokens | FID (256x256) | weight \n--- |:---:|:---:|:---:|:---:|:---:|\nLlamaGen-B   | 111M | DDP | 16x16 | 5.46 | [c2i_B_256.pt](https://huggingface.co/FoundationVision/LlamaGen/resolve/main/c2i_B_256.pt)\nLlamaGen-B   | 111M | DDP | 24x24 | 6.09 | [c2i_B_384.pt](https://huggingface.co/FoundationVision/LlamaGen/resolve/main/c2i_B_384.pt)\nLlamaGen-L   | 343M | DDP | 16x16 | 3.80 | [c2i_L_256.pt](https://huggingface.co/FoundationVision/LlamaGen/resolve/main/c2i_L_256.pt)\nLlamaGen-L   | 343M | DDP | 24x24 | 3.07 | [c2i_L_384.pt](https://huggingface.co/FoundationVision/LlamaGen/resolve/main/c2i_L_384.pt)\nLlamaGen-XL  | 775M | DDP | 24x24 | 2.62 | [c2i_XL_384.pt](https://huggingface.co/FoundationVision/LlamaGen/resolve/main/c2i_XL_384.pt)\nLlamaGen-XXL | 1.4B | FSDP | 24x24 | 2.34 | 
[c2i_XXL_384.pt](https://huggingface.co/FoundationVision/LlamaGen/resolve/main/c2i_XXL_384.pt)\nLlamaGen-3B  | 3.1B | FSDP | 24x24 | 2.18 | [c2i_3B_384.pt](https://huggingface.co/FoundationVision/LlamaGen/resolve/main/c2i_3B_384.pt)\n\n\n### Demo\nPlease download the models, put them in the folder `./pretrained_models`, and run\n```\npython3 autoregressive/sample/sample_c2i.py --vq-ckpt ./pretrained_models/vq_ds16_c2i.pt --gpt-ckpt ./pretrained_models/c2i_L_384.pt --gpt-model GPT-L --image-size 384\n# or\npython3 autoregressive/sample/sample_c2i.py --vq-ckpt ./pretrained_models/vq_ds16_c2i.pt --gpt-ckpt ./pretrained_models/c2i_XXL_384.pt --gpt-model GPT-XXL --from-fsdp --image-size 384\n```\nThe generated images will be saved to `sample_c2i.png`.\n\n### Gradio Demo \u003ca href='https://github.com/gradio-app/gradio'\u003e\u003cimg src='https://img.shields.io/github/stars/gradio-app/gradio'\u003e\u003c/a\u003e\n\nYou can use our online Gradio demo [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/FoundationVision/LlamaGen) or run Gradio locally:\n```bash\npython app.py\n```\n\n\n## 🚀 Text-conditional image generation\n### VQ-VAE models\nMethod | params | tokens | data | weight\n--- |:---:|:---:|:---:|:---:\nvq_ds16_t2i | 72M | 16x16 | LAION COCO (50M) + internal data (10M) | [vq_ds16_t2i.pt](https://huggingface.co/peizesun/llamagen_t2i/resolve/main/vq_ds16_t2i.pt)\n\n### AR models\nMethod | params | tokens | data | weight \n--- |:---:|:---:|:---:|:---:\nLlamaGen-XL  | 775M | 16x16 | LAION COCO (50M) | [t2i_XL_stage1_256.pt](https://huggingface.co/peizesun/llamagen_t2i/resolve/main/t2i_XL_stage1_256.pt)\nLlamaGen-XL  | 775M | 32x32 | internal data (10M) | [t2i_XL_stage2_512.pt](https://huggingface.co/peizesun/llamagen_t2i/resolve/main/t2i_XL_stage2_512.pt)\n\n### Demo\nBefore running the demo, please refer to the [language readme](language/README.md) to install the required packages and language models. 
 \n\nPlease download the models, put them in the folder `./pretrained_models`, and run\n```\npython3 autoregressive/sample/sample_t2i.py --vq-ckpt ./pretrained_models/vq_ds16_t2i.pt --gpt-ckpt ./pretrained_models/t2i_XL_stage1_256.pt --gpt-model GPT-XL --image-size 256\n# or\npython3 autoregressive/sample/sample_t2i.py --vq-ckpt ./pretrained_models/vq_ds16_t2i.pt --gpt-ckpt ./pretrained_models/t2i_XL_stage2_512.pt --gpt-model GPT-XL --image-size 512\n```\nThe generated images will be saved to `sample_t2i.png`.\n\n### Local Gradio Demo\n\n\n\n## ⚡ Serving\nWe use the [vLLM](https://github.com/vllm-project/vllm) serving framework to enable higher throughput. Please refer to the [serving readme](autoregressive/serve/README.md) to install the required packages.  \n```\npython3 autoregressive/serve/sample_c2i.py --vq-ckpt ./pretrained_models/vq_ds16_c2i.pt --gpt-ckpt ./pretrained_models/c2i_XXL_384.pt --gpt-model GPT-XXL --from-fsdp --image-size 384\n```\nThe generated images will be saved to `sample_c2i_vllm.png`.\n\n\n## Getting Started\nSee [Getting Started](GETTING_STARTED.md) for installation, training and evaluation.\n\n\n## License\nThe majority of this project is licensed under the MIT License. Portions of the project are available under the separate licenses of the referenced projects, as detailed in the corresponding files.\n\n\n## BibTeX\n```bibtex\n@article{sun2024autoregressive,\n  title={Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation},\n  author={Sun, Peize and Jiang, Yi and Chen, Shoufa and Zhang, Shilong and Peng, Bingyue and Luo, Ping and Yuan, Zehuan},\n  journal={arXiv preprint arXiv:2406.06525},\n  year={2024}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffoundationvision%2Fllamagen","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffoundationvision%2Fllamagen","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffoundationvision%2Fllamagen/lists"}