https://github.com/kyegomez/multimodal-tot
Multi-Modal Tree of thoughts for DALLE-3 like auto self improvement
artificial-intelligence gpt4 multi-modal multi-modality multi-modality-data
Last synced: about 1 year ago
- Host: GitHub
- URL: https://github.com/kyegomez/multimodal-tot
- Owner: kyegomez
- License: mit
- Created: 2023-09-21T03:15:08.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-11-11T21:03:23.000Z (over 1 year ago)
- Last Synced: 2025-04-19T20:16:58.679Z (about 1 year ago)
- Topics: artificial-intelligence, gpt4, multi-modal, multi-modality, multi-modality-data
- Language: Python
- Homepage: https://discord.gg/GYbXvDGevY
- Size: 81.2 MB
- Stars: 16
- Watchers: 4
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
README
# MultiModal Tree of Thoughts
A multimodal Tree of Thoughts that uses the GPT-4 language model and the
Stable Diffusion model to generate multimodal output, scores each
output on a scale from 0.0 to 1.0, and then runs a DFS or BFS search over the candidates to return the best output.
Example task: "Generate an image of a swarm of bees" -> image generator -> GPT-4V scores the image from 0.0 to 1.0 -> DFS/BFS -> return the best output
- GPT-4V scores each image from 0.0 to 1.0 according to how well it accomplishes the task
- DFS/BFS searches for the best output using the scores from GPT-4V
- The output is multimodal: a combination of the image and the text
- The output is evaluated by GPT-4V
- The prompt sent to the image generator is optimized using the GPT-4V output and the search
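The loop described above can be sketched roughly as follows. This is a minimal illustration and not the code in this repository: `search_best_prompt`, `fake_generate`, `fake_evaluate`, `fake_refine`, and the `DETAILS` list are all hypothetical names, and the three `fake_*` functions are deterministic stand-ins for Stable Diffusion, GPT-4V scoring, and prompt refinement so the example runs without any API keys.

```python
from collections import deque
from typing import Callable, List, Optional, Tuple

# Hypothetical details a "good" image should contain (stand-in for real judgment).
DETAILS = ["macro", "golden light", "dense swarm"]


def fake_generate(prompt: str) -> str:
    """Stand-in for an image generator: returns a string instead of an image."""
    return f"image({prompt})"


def fake_evaluate(task: str, image: str) -> float:
    """Stand-in for a vision-model score in [0.0, 1.0]."""
    hits = sum(detail in image for detail in DETAILS)
    return hits / len(DETAILS)


def fake_refine(prompt: str, score: float, breadth: int) -> List[str]:
    """Stand-in for prompt optimization: propose up to `breadth` refined prompts."""
    missing = [d for d in DETAILS if d not in prompt]
    return [f"{prompt}, {d}" for d in missing[:breadth]]


def search_best_prompt(
    task: str,
    generate: Callable[[str], str],
    evaluate: Callable[[str, str], float],
    refine: Callable[[str, float, int], List[str]],
    breadth: int = 2,
    max_depth: int = 3,
    strategy: str = "bfs",
) -> Tuple[float, str, str]:
    """Explore refined prompts and return (score, prompt, image) with the top score."""
    frontier = deque([(task, 0)])
    best: Optional[Tuple[float, str, str]] = None
    while frontier:
        # BFS takes the oldest node (queue); DFS takes the newest node (stack).
        prompt, depth = frontier.popleft() if strategy == "bfs" else frontier.pop()
        image = generate(prompt)
        score = evaluate(task, image)
        if best is None or score > best[0]:
            best = (score, prompt, image)
        if depth < max_depth:
            for child in refine(prompt, score, breadth):
                frontier.append((child, depth + 1))
    return best


score, prompt, image = search_best_prompt(
    "a swarm of bees", fake_generate, fake_evaluate, fake_refine
)
print(score, prompt)
```

In a real setup, the three callables would wrap the image generator, the vision evaluator, and whatever prompt-rewriting step is used; the only difference between the two search strategies here is whether the frontier is consumed as a queue or as a stack.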
# Usage
`streamlit run app.py`
# License
MIT