{"id":19156258,"url":"https://github.com/kyegomez/multimodal-tot","last_synced_at":"2025-05-07T07:43:14.687Z","repository":{"id":196029844,"uuid":"694456245","full_name":"kyegomez/MultiModal-ToT","owner":"kyegomez","description":"Multi-Modal Tree of thoughts for DALLE-3 like auto self improvement","archived":false,"fork":false,"pushed_at":"2024-11-11T21:03:23.000Z","size":85172,"stargazers_count":16,"open_issues_count":1,"forks_count":2,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-04-19T20:16:58.679Z","etag":null,"topics":["artificial-intelligence","gpt4","multi-modal","multi-modality","multi-modality-data"],"latest_commit_sha":null,"homepage":"https://discord.gg/GYbXvDGevY","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kyegomez.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null},"funding":{"github":["kyegomez"],"patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"lfx_crowdfunding":null,"custom":null}},"created_at":"2023-09-21T03:15:08.000Z","updated_at":"2025-04-18T02:51:27.000Z","dependencies_parsed_at":"2023-09-21T03:45:47.423Z","dependency_job_id":"eab7f3a3-1924-48d8-8cc2-d3aaf017cbee","html_url":"https://github.com/kyegomez/MultiModal-ToT","commit_stats":null,"previous_names":["kyegomez/multimodal-tot"],"tags_count":0,"template":false,"template_full_name":"kyegomez/Paper-Implementation-Template","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kyegomez%2FM
ultiModal-ToT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kyegomez%2FMultiModal-ToT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kyegomez%2FMultiModal-ToT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kyegomez%2FMultiModal-ToT/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kyegomez","download_url":"https://codeload.github.com/kyegomez/MultiModal-ToT/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252837148,"owners_count":21811846,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","gpt4","multi-modal","multi-modality","multi-modality-data"],"created_at":"2024-11-09T08:33:52.235Z","updated_at":"2025-05-07T07:43:14.661Z","avatar_url":"https://github.com/kyegomez.png","language":"Python","funding_links":["https://github.com/sponsors/kyegomez"],"categories":[],"sub_categories":[],"readme":"[![Multi-Modality](agorabanner.png)](https://discord.gg/qUtxnK2NMf)\n\n# MultiModal Tree of Thoughts\nA multimodal Tree of Thoughts that uses the GPT-4 language model and the\nStable Diffusion model to generate a multimodal output, scores the\noutput on a scale from 0.0 to 1.0, and then runs a DFS/BFS search to return the best output.\n    \n    \ntask: Generate an image of a swarm of bees -\u003e Image generator -\u003e GPT4V evaluates the image from 0.0 to 1.0 -\u003e DFS/BFS -\u003e return the best output\n\n\n- GPT4Vision will evaluate the image from 0.0 to 1.0 based on how well it accomplishes the task\n- DFS/BFS will search for the best output based on the evaluation from GPT4Vision\n- The output will be multimodal: a combination of the image and the text\n- The output will be evaluated by GPT4Vision\n- The prompt to the image generator will be optimized using the GPT4Vision evaluation and the search results\n\n# Usage\n`streamlit run app.py`\n\n# License\nMIT\n\n",
"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkyegomez%2Fmultimodal-tot","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkyegomez%2Fmultimodal-tot","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkyegomez%2Fmultimodal-tot/lists"}