Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/YangLing0818/RPG-DiffusionMaster
[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)
image-editting large-language-models multimodal-large-language-models text-to-image
Last synced: 5 days ago
- Host: GitHub
- URL: https://github.com/YangLing0818/RPG-DiffusionMaster
- Owner: YangLing0818
- License: mit
- Created: 2024-01-22T01:07:23.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-10-10T02:44:57.000Z (26 days ago)
- Last Synced: 2024-10-29T15:34:38.092Z (6 days ago)
- Topics: image-editting, large-language-models, multimodal-large-language-models, text-to-image
- Language: Jupyter Notebook
- Homepage: https://proceedings.mlr.press/v235/yang24ai.html
- Size: 46 MB
- Stars: 1,684
- Watchers: 25
- Forks: 97
- Open Issues: 42
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-diffusion-categorized - [Code
- ai-game-devtools - RPG-DiffusionMaster - Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG). (Image / Tool (AI LLM))
- AiTreasureBox - YangLing0818/RPG-DiffusionMaster - Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG) (Repos)
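- StarryDivineSky - YangLing0818/RPG-DiffusionMaster - 4, Gemini-Pro) or open-source local MLLMs (such as miniGPT-4) as the prompt recaptioner and region planner, combined with our complementary regional diffusion, to achieve SOTA text-to-image generation and editing. Our framework is very flexible and generalizes to arbitrary MLLM architectures and diffusion backbones. RPG can also generate ultra-high-resolution images. Highly accurate image generation: the RPG framework generates highly accurate and detailed images from complex descriptions, performing especially well on scenes with multiple objects, attributes, and relationships, and the generated images closely match the text descriptions. Surpassing existing techniques: compared with existing text-to-image models, the RPG framework shows better performance, especially in multi-element composition and text-image semantic alignment. Flexibility and broad applicability: experiments show that the RPG framework is compatible with different multimodal large language models and diffusion models and suits a variety of image generation scenarios. Improved quality and detail: the generated images are not only visually appealing but also rich in detail, which is essential for fields such as art creation, design, and entertainment. The RPG framework can also handle complex interactions and environments, producing images that excel in composition and detail. (Multimodal large models / Web services_Other)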