https://github.com/Zeqiang-Lai/Mini-DALLE3

Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models
https://github.com/Zeqiang-Lai/Mini-DALLE3

dall-e-3 dalle dalle-3 dalle3 interactive-text-to-image mini-dalle3

Last synced: 3 months ago
JSON representation

Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models

Host: GitHub
URL: https://github.com/Zeqiang-Lai/Mini-DALLE3
Owner: Zeqiang-Lai
Created: 2023-09-21T14:41:39.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2023-12-28T13:53:32.000Z (over 1 year ago)
Last Synced: 2025-03-24T09:05:41.285Z (3 months ago)
Topics: dall-e-3, dalle, dalle-3, dalle3, interactive-text-to-image, mini-dalle3
Language: Python
Homepage: https://minidalle3.github.io/
Size: 168 KB
Stars: 307
Watchers: 4
Forks: 29
Open Issues: 7
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

awesome-diffusion-categorized - [Code

README

        




    

       





Technical Report •

Project page •

Demo (Temporarily Unavailable)



https://github.com/Zeqiang-Lai/Mini-DALLE3/assets/26198430/5b6c0a0c-ebbf-48db-981e-f97d542a38b4

![teaser4](https://github.com/Zeqiang-Lai/Mini-DALLE3/assets/26198430/1f17e3c3-6804-4c4e-9266-e902ecedeae8)

> An experimental attempt to obtain the interactive and interleave text-to-image and text-to-text experience of [DALL•E 3](https://openai.com/dall-e-3) and [ChatGPT](https://openai.com/chatgpt).

## Try Yourself 🤗 

- Download the [checkpoint](https://huggingface.co/h94/IP-Adapter) and save it as following 

```bash

checkpoints

   - models

   - sdxl_models

```

- run the following commands, and you will get a gradio-based web demo.

```bash

export OPENAI_API_KEY="your key"

python -m minidalle3.web 

```

- To use other LLM rather than ChatGPT, such as `baichuan`.

```bash

python -m minidalle3.llm.baichuan

export OPENAI_API_BASE="http://0.0.0.0:10039/v1"

python -m minidalle3.web

```

>  `chatglm`, `baichuan`, `internlm` are tested.

> llama have not supported yet. qwen is not tested.

## TODO

- [x] Support generating image interleaved in the conversations.

- [ ] Support generating multiple images at once.

- [ ] Support selecting image.

- [ ] Support refinement.

- [ ] Support prompt refinement/variation.

- [ ] Instruct tuned LLM/SD.

## Citation

If you find this repo helpful, please consider citing us.

```bibtex

@misc{minidalle3,

    author={Lai, Zeqiang and Zhu, Xizhou and Dai, Jifeng and Qiao, Yu and Wang, Wenhai},

    title={Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models},

    year={2023},

    url={https://github.com/Zeqiang-Lai/Mini-DALLE3},

}

```

## Acknowledgement

[IP-Adapter](https://github.com/tencent-ailab/IP-Adapter) • [Stable Diffusion XL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)

![Visitors](https://api.visitorbadge.io/api/visitors?path=https%3A%2F%2Fgithub.com%2FZeqiang-Lai%2FMini-DALLE3&countColor=%23263759&style=flat)

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/Zeqiang-Lai/Mini-DALLE3

Awesome Lists containing this project

README