https://github.com/Yuan-ManX/ai-game-devtools

Here we will keep track of the latest AI Game Development Tools, including LLM, Agent, Code, Writer, Image, Texture, Shader, 3D Model, Animation, Video, Audio, Music, Singing Voice and Analytics. 🔥
https://github.com/Yuan-ManX/ai-game-devtools
List: ai-game-devtools
ai-platform ai-toolkit aigc artificial-intelligence awesome-list deep-learning game-ai game-development game-engine mechine-learing unity
Last synced: about 1 month ago
JSON representation
Host: GitHub
URL: https://github.com/Yuan-ManX/ai-game-devtools
Owner: Yuan-ManX
License: mit
Created: 2023-03-21T03:01:17.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2025-04-07T06:29:37.000Z (2 months ago)
Last Synced: 2025-05-07T02:02:01.747Z (about 2 months ago)
Topics: ai-platform, ai-toolkit, aigc, artificial-intelligence, awesome-list, deep-learning, game-ai, game-development, game-engine, mechine-learing, unity
Homepage: https://yuan-manx.github.io/ai-game-devtools/
Size: 1.78 MB
Stars: 825
Watchers: 29
Forks: 80
Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project

awesome-opensource-unity - AI Game DevTools (AI-GDT)
awesome_ai_agents - Ai-Game-Devtools - Here we will keep track of the latest AI Game Development Tools, including LLM, Agent, Code, Writer, Image, Texture, Shader, 3D Model, An… (Building / Tools)
awesome_ai_agents - Ai-Game-Devtools - Here we will keep track of the latest AI Game Development Tools, including LLM, Agent, Code, Writer, Image, Texture, Shader, 3D Model, An… (Building / Tools)
ultimate-awesome - ai-game-devtools - Here we will keep track of the latest AI Game Development Tools, including LLM, Agent, Code, Writer, Image, Texture, Shader, 3D Model, Animation, Video, Audio, Music, Singing Voice and Analytics. 🔥. (Programming Language Lists / Python Lists)
README

        # AI Game DevTools (AI-GDT) 🎮



  



Here we will keep track of the latest AI Game Development Tools, including LLM, Agent, Code, Writer, Image, Texture, Shader, 3D Model, Animation, Video, Audio, Music, Singing Voice and Analytics. 🔥

## Table of Contents

* [Tool (AI LLM)](#tool)

* [Game (Agent)](#game)

* [Code](#code)

* [Writer](#writer)

* [Image](#image)

* [Texture](#texture)

* [Shader](#shader)

* [3D Model](#model)

* [Avatar](#avatar)

* [Animation](#animation)

* [Visual](#visual)

* [Video](#video)

* [Audio](#audio)

* [Music](#music)

* [Singing Voice](#voice)

* [Speech](#speech)

* [Analytics](#analytics)

## Project List

###  Tool (AI LLM)

| Source 
| :-------------------------------------------------- 
| [AgentGPT](https://github.com/reworkd/AgentGPT) 
| [AICommand](https://github.com/keijiro/AICommand) 
| [AIOS](https://github.com/agiresearch/AIOS) 
| [AI Scientist](https://github.com/SakanaAI/AI-Scientist) 
| [Assistant CLI](https://github.com/diciaup/assistant-cli) 
| [Auto-GPT](https://github.com/Significant-Gravitas/Auto-GPT) 
| [BabyAGI](https://github.com/yoheinakajima/babyagi) 
| [👶🤖🖥️ BabyAGI UI](https://github.com/miu 
| [baichuan-7B](https://github.com/baichuan-inc/baichuan-7B) 
| [Baichuan-13B](https://github.com/baichuan-inc/Baichuan-13B) 
| [Baichuan 2](https://github.com/baichuan-inc/Baichuan2) 
| [Bisheng](https://github.com/dataelement/bisheng) 
| [Character-LLM](https://github.com/choosewhatulike/ 
| [ChatDev](https://github.com/OpenBMB/ChatDev) 
| [ChatGPT-API-unity](https://github.com/mochi-neko/C 
| [ChatGPTForUnity](https://github.com/sunsvip/ChatGPTForUnity) 
| [ChatRWKV](https://github.com/BlinkDL/ChatRWKV) 
| [ChatYuan](https://github.com/clue-ai/ChatYuan) 
| [Chinese-LLaMA-Alpaca-3](https://github.com/ymcui/C 
| [Chrome-GPT](https://github.com/richardyc/Chrome-GPT) 
| [CogVLM](https://www.modelscope.cn/models/ZhipuAI/C 
| [CoreNet](https://github.com/apple/corenet) 
| [Cosmos](https://github.com/NVIDIA/Cosmos) 
| [DBRX](https://github.com/databricks/dbrx) 
| [DCLM](https://github.com/mlfoundations/dclm) 
| [DeepSeek-R1](https://github.com/deepseek-ai/DeepSeek-R1) 
| [DeepSeek-V3](https://github.com/deepseek-ai/DeepSeek-V3) 
| [DemoGPT](https://github.com/melih-unsal/DemoGPT) 
| [Design2Code](https://github.com/NoviScl/Design2Code) 
| [Devika](https://github.com/stitionai/devika) 
| [Devon](https://github.com/entropy-research/Devon) 
| [Dora](https://www.dora.run/ai) 
| [Flowise](https://github.com/FlowiseAI/Flowise) 
| [Gemini](https://deepmind.google/technologies/gemini) 
| [Gemma](https://github.com/google/gemma_pytorch) 
| [gemma.cpp](https://github.com/google/gemma.cpp) 
| [GLM-4](https://github.com/THUDM/GLM-4) 
| [GPT4All](https://github.com/nomic-ai/gpt4all) 
| [GPT-4o](https://openai.com/index/hello-gpt-4o/) 
| [GPTScript](https://github.com/gptscript-ai/gptscript) 
| [Grok-1](https://x.ai/blog/grok-os) 
| [HuggingChat](https://huggingface.co/chat/) 
| [Hugging Face API Unity Integration](https://github 
| [ImageBind](https://github.com/facebookresearch/ImageBind) 
| [Index-1.9B](https://github.com/bilibili/Index-1.9B) 
| [InteractML-Unity](https://github.com/Interactml/iml-unity) 
| [InteractML-Unreal Engine](https://github.com/Inter 
| [InternLM](https://github.com/InternLM/InternLM) 
| [InternLM-XComposer](https://github.com/InternLM/In 
| [Jan](https://github.com/janhq/jan) 
| [Janus](https://github.com/deepseek-ai/Janus) 
| [Lamini](https://github.com/lamini-ai/lamini) 
| [LaMini-LM](https://github.com/mbzuai-nlp/LaMini-LM) 
| [LangChain](https://github.com/hwchase17/langchain) 
| [LangFlow](https://github.com/logspace-ai/langflow) 
| [LaVague](https://github.com/lavague-ai/LaVague) 
| [Lemur](https://github.com/OpenLemur/Lemur) 
| [Lepton AI](https://github.com/leptonai/leptonai) 
| [Lit-LLaMA](https://github.com/Lightning-AI/lit-llama) 
| [llama2-webui](https://github.com/liltom-eth/llama2-webui) 
| [Llama 3](https://github.com/meta-llama/llama3) 
| [Llama 3.1](https://github.com/meta-llama/llama-models) 
| [LLaSM](https://github.com/LinkSoul-AI/LLaSM) 
| [LLM Answer Engine](https://github.com/developersdi 
| [llm.c](https://github.com/karpathy/llm.c) 
| [LLMUnity](https://github.com/undreamai/LLMUnity) 
| [LLocalSearch](https://github.com/nilsherzig/LLocalSearch) 
| [LogicGamesSolver](https://github.com/fabridigua/Lo 
| [LongWriter](https://github.com/THUDM/LongWriter) 
| [Large World Model (LWM)](https://github.com/LargeW 
| [Lumina-T2X](https://github.com/Alpha-VLLM/Lumina-T2X) 
| [MetaGPT](https://github.com/geekan/MetaGPT) 
| [MiniCPM-2B](https://github.com/OpenBMB/MiniCPM) 
| [MiniGPT-4](https://github.com/Vision-CAIR/MiniGPT-4) 
| [MiniGPT-5](https://github.com/eric-ai-lab/MiniGPT-5) 
| [MiniMax-01](https://github.com/MiniMax-AI/MiniMax-01) 
| [Mixtral 8x7B](https://mistral.ai/news/mixtral-of-experts/) 
| [Mistral 7B](https://mistral.ai/news/announcing-mistral-7b/) 
| [Mistral Large](https://mistral.ai/news/mistral-large/) 
| [MLC LLM](https://github.com/mlc-ai/mlc-llm) 
| [MobiLlama](https://github.com/mbzuai-oryx/MobiLlama) 
| [MoE-LLaVA](https://github.com/PKU-YuanGroup/MoE-LLaVA) 
| [Moshi](https://www.moshi.chat/?queue_id=talktomoshi) 
| [Moshi](https://github.com/kyutai-labs/moshi) 
| [MOSS](https://github.com/OpenLMLab/MOSS) 
| [mPLUG-Owl🦉](https://github.com/X-PLUG/mPLUG-Owl) 
| [Nemotron-4](https://arxiv.org/abs/2402.16819) 
| [NExT-GPT](https://github.com/NExT-GPT/NExT-GPT) 
| [OLMo](https://github.com/allenai/OLMo) 
| [OmniLMM](https://github.com/OpenBMB/OmniLMM) 
| [OneLLM](https://github.com/csuhan/OneLLM) 
| [Open-Assistant](https://github.com/LAION-AI/Open-Assistant) 
| [Open Deep Research](https://github.com/dzhng/deep-research) 
| [OpenDevin](https://github.com/OpenDevin/OpenDevin) 
| [Orion-14B](https://github.com/OrionStarAI/Orion) 
| [Panda](https://github.com/dandelionsllm/pandallm) 
| [Perplexica](https://github.com/ItzCrazyKns/Perplexica) 
| [Pi](https://heypi.com/talk) 
| [Qwen1.5](https://github.com/QwenLM/Qwen1.5) 
| [Qwen2](https://github.com/QwenLM/Qwen2) 
| [Qwen2.5-Coder](https://github.com/QwenLM/Qwen2.5-Coder) 
| [Qwen-7B](https://github.com/QwenLM/Qwen-7B) 
| [RepoAgent](https://github.com/OpenBMB/RepoAgent) 
| [s1](https://github.com/simplescaling/s1) 
| [Sanity AI Engine](https://github.com/tosos/SanityEngine) 
| [SearchGPT](https://github.com/tobiasbueschel/search-gpt) 
| [ShareGPT4V](https://sharegpt4v.github.io/) 
| [SkyThought](https://github.com/NovaSky-AI/SkyThought) 
| [Skywork](https://github.com/SkyworkAI/Skywork) 
| [StableLM](https://github.com/Stability-AI/StableLM) 
| [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca) 
| [Text generation web UI](https://github.com/oobaboo 
| [TinyChatEngine](https://github.com/mit-han-lab/TinyChatEngine) 
| [ToolBench](https://github.com/openbmb/toolbench) 
| [Unity ChatGPT](https://github.com/dilmerv/UnityChatGPT) 
| [Unity OpenAI-API Integration](https://github.com/h 
| [Unreal Engine 5 Llama LoRA](https://github.com/bub 
| [UnrealGPT](https://github.com/TREE-Ind/UnrealGPT) 
| [Video-LLaVA](https://github.com/PKU-YuanGroup/Video-LLaVA) 
| [WebGPT](https://github.com/0hq/WebGPT) 
| [Web3-GPT](https://github.com/Markeljan/Web3GPT) 
| [WordGPT](https://github.com/filippofinke/WordGPT) 
| [XAgent](https://github.com/OpenBMB/XAgent) 
| [Yi](https://github.com/01-ai/Yi) 
| [01 Project](https://github.com/OpenInterpreter/01)

| Description                                                                                                                                                                                    |   Paper   |  Game Engine  |   Type   | ---------------------------------------- | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-----------: | :-----------: | :-------: | | 🤖 Assemble, configure, and deploy autonomous AI Agents in your browser.                                                                                                                      |          |              |   Tool   | | ChatGPT integration with Unity Editor.                                                                                                                                                         |          |     Unity    |   Tool   | | LLM Agent Operating System.                                                                                                                                                                    |          |              |   Tool   | | The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery.                                                       |[arXiv](https://arxiv.org/abs/2408.06292)  |             |   Tool   | | A comfortable CLI tool to use ChatGPT service🔥                                                                                                                                               |          |              |   Tool   | | An experimental open-source attempt to make GPT-4 fully autonomous.                                                                                                                            |           |             |   Tool   | | This Python script is an example of an AI-powered task management system.                                                                                                                      |          |              |   Tool   | rla/babyagi-ui)                                    | BabyAGI UI is designed to make it easier to run and develop with babyagi in a web app, like a ChatGPT.                                                                                      |           |             |   Tool   | | A large-scale 7B pretraining language model developed by Baichuan.                                                                                                                             |           |             |   Tool   | | A 13B large language model developed by Baichuan Intelligent Technology.                                                                                                                       |          |              |   Tool   | | A series of large language models developed by Baichuan Intelligent Technology.                                                                                                                |           |             |   Tool   | | Bisheng is an open LLM devops platform for next generation AI applications.                                                                                                                    |           |             |   Tool   | trainable-agents)                           | A Trainable Agent for Role-Playing.                                                                                              |[arXiv](https://arxiv.org/abs/2310.10158)  |             |   Tool   | | Communicative Agents for Software Development.                                                                                   |[arXiv](https://arxiv.org/abs/2307.07924)  |             |   Tool   | hatGPT-API-unity)                           | Binds ChatGPT chat completion API to pure C# on Unity.                                                                                                                                         |          |     Unity    |   Tool   | | ChatGPT for unity.                                                                                                                                                                             |           |    Unity    |   Tool   | | ChatRWKV is like ChatGPT but powered by RWKV (100% RNN) language model, and open source.                                                                                                       |           |             |   Tool   | | Large Language Model for Dialogue in Chinese and English.                                                                                                                                      |           |             |   Tool   | hinese-LLaMA-Alpaca-3)                      | (Chinese Llama-3 LLMs) developed from Meta Llama 3.                                                                                                                                            |            |            |   Tool   | | An AutoGPT agent that controls Chrome on your desktop.                                                                                                                                         |           |             |   Tool   | ogVLM/summary)                              | CogVLM, a powerful open-source visual language foundation model.                                                                 |[arXiv](https://arxiv.org/abs/2311.03079)  |             |   Tool   | | A library for training deep neural networks.                                                                                                                                                   |            |            |   Tool   | | Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs.      |            |             |   LLM   | | DBRX is a large language model trained by Databricks.                                                                                                                                          |          |              |   Tool   | | DataComp for Language Models.                                                                                                    |[arXiv](https://arxiv.org/abs/2406.11794)  |             |   Tool   | | DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning.       |             |             |   LLM   | | DeepSeek-V3 is a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token.             |[arXiv](https://arxiv.org/abs/2412.19437)  |             |   LLM   | | Auto Gen-AI App Generator with the Power of Llama 2                                                                                                                                            |          |              |   Tool   | | Automating Front-End Engineering                                                                                                                                                               |          |              |   Tool   | | Devika is an Agentic AI Software Engineer.                                                                                                                                                     |          |              |   Tool   | | An open-source pair programmer.                                                                                                                                                                |          |              |   Tool   | | Generating powerful websites, one prompt at a time.                                                                                                                                            |           |             |   Tool   | | Drag & drop UI to build your customized LLM flow using LangchainJS.                                                                                                                            |           |             |   Tool   | | Gemini is built from the ground up for multimodality — reasoning seamlessly across text, images, video, audio, and code.                                                                      |          |              |   Tool   | | Gemma is a family of lightweight, state-of-the art open models built from research and technology used to create Google Gemini models.                                                      |          |              |   Tool   | | lightweight, standalone C++ inference engine for Google's Gemma models.                                                                                                                        |          |              |   Tool   | | GLM-4-9B is the open-source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI.                                                                   |          |              |   Tool   | | A chatbot trained on a massive collection of clean assistant data including code, stories and dialogue.                                                                                        |           |             |   Tool   | | GPT-4o (“o” for “omni”) is a step towards much more natural human-computer interaction—it accepts as input any combination of text, audio, image, and video and generates any combination of text, audio, and image outputs.                                                                                                                                                          |          |              |   Tool   | | Develop LLM Apps in Natural Language.                                                                                                                                                          |          |              |   Tool   | | The weights and architecture of our 314 billion parameter Mixture-of-Experts model, Grok-1.                                                                                                   |          |              |   Tool   | | Making the community's best AI chat models available to everyone.                                                                                                                              |          |              |   Tool   | .com/huggingface/unity-api)                 | This Unity package provides an easy-to-use integration for the Hugging Face Inference API, allowing developers to access and use Hugging Face AI models within their Unity projects.       |          |     Unity     |   Tool   | | ImageBind One Embedding Space to Bind Them All.                                                                                       |[arXiv](https://arxiv.org/abs/2305.05665)  |        |   Tool   | | A SOTA lightweight multilingual LLM.                                                                                                                                                            |          |              |   Tool   | | InteractML, an Interactive Machine Learning Visual Scripting framework for Unity3D.                                                                                                            |          |     Unity     |   Tool   | actml/iml-ue4)                              | Bringing Machine Learning to Unreal Engine.                                                                                                                                                    |          | Unreal Engine |   Tool   | | InternLM has open-sourced a 7 billion parameter base model, a chat model tailored for practical scenarios and the training system.   |[arXiv](https://arxiv.org/abs/2403.17297)  |     |   Tool   | ternLM-XComposer)                           | InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.  |[arXiv](https://arxiv.org/abs/2404.06512)  |     |   Tool   | | Bring AI to your Desktop.                                                                                                                                                                      |          |              |   Tool   | | Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation.                                                   |[arXiv](https://arxiv.org/abs/2410.13848)  |     |   LLM   | | Lamini allows any engineering team to outperform general purpose LLMs through RLHF and fine- tuning on their own data.                                                                      |          |              |   Tool   | | LaMini-LM is a collection of small-sized, efficient language models distilled from ChatGPT and trained on a large-scale dataset of 2.58M instructions.                                  |          |              |   Tool   | | LangChain is a framework for developing applications powered by language models.                                                                                                               |          |              |   Tool   | | ⛓️ LangFlow is a UI for LangChain, designed with react-flow to provide an effortless way to experiment and prototype flows.                                                                   |          |              |   Tool   | | Automate automation with Large Action Model framework.                                                                                                                                         |          |              |   Tool   | | Open Foundation Models for Language Agents.                                                                                                                                                    |          |              |   Tool   | | A Pythonic framework to simplify AI service building.                                                                                                                                          |          |              |   Tool   | | Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training.                   |          |              |   Tool   | | Run Llama 2 locally with gradio UI on GPU or CPU from anywhere (Linux/Windows/Mac).                                                                                                            |          |              |   Tool   | | The official Meta Llama 3 GitHub site.                                                                                                                                                         |          |              |   Tool   | | Llama is an accessible, open large language model (LLM) designed for developers, researchers, and businesses to build, experiment, and responsibly scale their generative AI ideas.                                                                                                                                                         |          |              |   Tool   | | Large Language and Speech Model.                                                                                                                                                               |          |              |   Tool   | gest/llm-answer-engine)                     | Build a Perplexity-Inspired Answer Engine Using Next.js, Groq, Mixtral, Langchain, OpenAI, Brave & Serper.                                                                              |           |             |   Tool   | | LLM training in simple, raw C/CUDA.                                                                                                                                                            |          |              |   Tool   | | Create characters in Unity with LLMs!                                                                                                                                                          |          |     Unity    |   Tool   | | LLocalSearch is a completely locally running search engine using LLM Agents.                                                                                                                   |          |              |   Tool   | gicGamesSolver)                             | A Python tool to solve logic games with AI, Deep Learning and Computer Vision.                                                                                                                 |          |              |   Tool   | | LongWriter: Unleashing 10,000+ Word Generation From Long Context LLMs.                                                          |[arXiv](https://arxiv.org/abs/2408.07055)  |              |   Tool   | orldModel/LWM)                              | Large World Model (LWM) is a general-purpose large-context multimodal autoregressive model.                                |[arXiv](https://arxiv.org/abs/2402.08268)  |              |   Tool   | | Lumina-T2X is a unified framework for Text to Any Modality Generation.                                                          |[arXiv](https://arxiv.org/abs/2405.05945)  |              |   Tool   | | The Multi-Agent Framework                                                                                                                                                                      |          |              |   Tool   | | An end-side LLM outperforms Llama2-13B.                                                                                                                                                        |          |              |   Tool   | | Enhancing Vision-language Understanding with Advanced Large Language Models.                                                    |[arXiv](https://arxiv.org/abs/2304.10592)  |              |   Tool   | | Interleaved Vision-and-Language Generation via Generative Vokens.                                                               |[arXiv](https://arxiv.org/abs/2310.02239)  |              |   Tool   | | MiniMax-01: Scaling Foundation Models with Lightning Attention.                                                                 |[arXiv](https://arxiv.org/abs/2501.08313)  |              |   LLM   | | A high quality Sparse Mixture-of-Experts.                                                                                       |[arXiv](https://arxiv.org/abs/2401.04088)  |              |   Tool   | | The best 7B model to date, Apache 2.0.                                                                                                                                                         |          |              |   Tool   | | Mistral Large is a new cutting-edge text generation model. It reaches top-tier reasoning capabilities.                                                                                         |          |              |   Tool   | | Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.                                                                                                      |          |              |   Tool   | | Towards Accurate and Lightweight Fully Transparent GPT.                                                                         |[arXiv](https://arxiv.org/abs/2402.16840)  |              |   Tool   | | Mixture of Experts for Large Vision-Language Models.                                                                                |[arXiv](https://arxiv.org/abs/2401.15947)  |              |   Tool   | | Moshi is an experimental conversational AI.                                                                                                                                                    |          |              |   Tool   | | Moshi: a speech-text foundation model for real time dialogue.                                                                                                                                                    |          |              |   Tool   | | An open-source tool-augmented conversational language model from Fudan University.                                                                                                             |          |              |   Tool   | | Modularization Empowers Large Language Models with Multimodality.                                                               |[arXiv](https://arxiv.org/abs/2304.14178)  |              |   Tool   | | A 15-billion-parameter large multilingual language model trained on 8 trillion text tokens.                               |[arXiv](https://arxiv.org/abs/2402.16819)  |              |   Tool   | | Any-to-Any Multimodal Large Language Model.                                                                                                                                                    |          |              |   Tool   | | Open Language Model                                                                                                              |[arXiv](https://arxiv.org/abs/2402.00838)  |             |   Tool   | | Large multi-modal models for strong performance and efficient deployment.                                                                                                                      |          |              |   Tool   | | One Framework to Align All Modalities with Language.                                                                            |[arXiv](https://arxiv.org/abs/2312.03700)  |              |   Tool   | | OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.                                        |          |              |   Tool   | | An AI-powered research assistant that performs iterative, deep research on any topic by combining search engines, web scraping, and large language models.                                   |          |              |   LLM   | | An autonomous AI software engineer.                                                                                                                                                            |          |              |   Tool   | | Orion-14B is a family of models includes a 14B foundation LLM, and a series of models.                                          |[arXiv](https://arxiv.org/abs/2401.12246)  |              |   Tool   | | Overseas Chinese open source large language model, based on Llama-7B, -13B, -33B, -65B for continuous pre-training in the Chinese field.                                                    |          |              |   Tool   | | An AI-powered search engine.                                                                                                                                                                   |           |             |   Tool   | | AI chatbot designed for personal assistance and emotional support.                                                                                                                             |          |              |   Tool   | | Qwen1.5 is the improved version of Qwen.                                                                                                                                                       |           |             |   Tool   | | Qwen2 is the large language model series developed by Qwen team, Alibaba Cloud.                                                                                                                |           |             |   LLM   | | Qwen2.5-Coder is the code version of Qwen2.5, the large language model series developed by Qwen team, Alibaba Cloud.                        |[arXiv](https://arxiv.org/abs/2409.12186)  |             |   LLM   | | The official repo of Qwen-7B (通义千问-7B) chat & pretrained large language model proposed by Alibaba Cloud.                                                                                    |          |              |   LLM   | | RepoAgent is an Open-Source project driven by Large Language Models(LLMs) that aims to provide an intelligent way to document projects.    |[arXiv](https://arxiv.org/abs/2402.16667)  |              |   Tool   | | s1: Simple test-time scaling.                                                                                                                 |[arXiv](https://arxiv.org/abs/2501.19393)  |              |   LLM   | | Sanity AI Engine for the Unity Game Development Tool.                                                                                                                                          |          |     Unity     |   Tool   | | 🌳 Connecting ChatGPT with the Internet                                                                                                                                                       |          |              |   Tool   | | Improving Large Multi-Modal Models with Better Captions.                                                                                                                                       |          |              |   Tool   | | Sky-T1: Train your own O1 preview model within $450.                                                                                                                                           |          |              |   LLM   | | Skywork series models are pre-trained on 3.2TB of high-quality multilingual (mainly Chinese and English) and code data.                                                                  |          |              |   Tool   | | Stability AI Language Models.                                                                                                   |[arXiv](https://arxiv.org/abs/2402.17834)  |              |   Tool   | | An Instruction-following LLaMA Model.                                                                                                                                                          |          |              |   LLM   | ga/text-generation-webui)                   | A gradio web UI for running Large Language Models like LLaMA, llama.cpp, GPT-J, OPT, and GALACTICA.                                                                                           |          |              |   Tool   | | On-Device LLM Inference Library.                                                                                                                                                               |          |              |   Tool   | | An open platform for training, serving, and evaluating large language model for tool learning.                                                                                            |           |             |   Tool   | | Unity ChatGPT Experiments.                                                                                                                                                                     |          |     Unity     |   Tool   | imanshuskyrockets/Unity_OpenAI)             | Integrate openai GPT-3 language model and ChatGPT API into a Unity project.                                                                                                                    |          |     Unity     |   Tool   | lint/ue5-llama-lora)                        | A proof-of-concept project that showcases the potential for using small, locally trainable LLMs to create next-generation documentation tools.                                        |          | Unreal Engine |   Tool   | | A collection of Unreal Engine 5 Editor Utility widgets powered by GPT3/4.                                                                                                                      |          | Unreal Engine |   Tool   | | Learning United Visual Representation by Alignment Before Projection.                                                           |[arXiv](https://arxiv.org/abs/2311.10122)  |              |   Tool   | | Run GPT model on the browser with WebGPU.                                                                                                                                                      |          |              |   Tool   | | Deploy smart contracts with AI                                                                                                                                                                 |          |              |   Tool   | | 🤖 Bring the power of ChatGPT to Microsoft Word                                                                                                                                               |          |              |   Tool   | | An Autonomous LLM Agent for Complex Task Solving.                                                                                                                                              |          |              |   Tool   | | A series of large language models trained from scratch by developers.                                                                                                                          |          |              |   Tool   | | The open-source language model computer.                                                                                                                                                       |          |              |   Tool   |

^ Back to Contents ^


## Game (Agent)

| Source                                                                                      | Description                                                                                                                                                                                    |   Paper   |  Game Engine  |   Type   |

| :------------------------------------------------------------------------------------------ | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-----------: | :-----------: | :-------: |

| [AgentBench](https://github.com/thudm/agentbench)                                              | A Comprehensive Benchmark to Evaluate LLMs as Agents.                                                                                 |[arXiv](https://arxiv.org/abs/2308.03688)  |        |   Agent  |

| [Agent Group Chat](https://github.com/MikeGu721/AgentGroup)                                    | An Interactive Group Chat Simulacra For Better Eliciting Collective Emergent Behavior.                                                |[arXiv](https://arxiv.org/abs/2403.13433)  |        |   Agent  |

| [Agent K](https://github.com/mikekelly/AgentK)                                                 | An autoagentic AGI that is self-evolving and modular.                                                                                                                            |         |              |   Agent  |

| [Agent Laboratory](https://github.com/SamuelSchmidgall/AgentLaboratory)                        | Agent Laboratory: Using LLM Agents as Research Assistants.                                                                            |[arXiv](https://arxiv.org/abs/2501.04227)  |        |   Agent  |

| [AgentScope](https://github.com/modelscope/agentscope)                                         | Start building LLM-empowered multi-agent applications in an easier way.                                                               |[arXiv](https://arxiv.org/abs/2402.14034)  |              |   Agent  |

| [AgentSims](https://github.com/py499372727/AgentSims/)                                         | An Open-Source Sandbox for Large Language Model Evaluation.                                                                                                                            |         |              |   Agent  |

| [AI Town](https://github.com/a16z-infra/ai-town)                                               | AI Town is a virtual town where AI characters live, chat and socialize.                                                                                                                |         |              |   Agent  |

| [anime.gf](https://github.com/cyanff/anime.gf)                                                 | Local & Open Source Alternative to CharacterAI.                                                                                                                                         |        |              |   Game   |

| [Astrocade](https://www.astrocade.com/)                                                        | Create games with AI                                                                                                                                                                    |        |              |   Game   |

| [Atomic Agents](https://github.com/KennyVaneetvelde/atomic_agents)                             | The Atomic Agents framework is designed to be modular, extensible, and easy to use.                                                                                                     |        |              |   Agent  |

| [AutoAgents](https://github.com/Link-AGI/AutoAgents)                                           | A Framework for Automatic Agent Generation.                                                                                                                                             |        |              |   Agent  |

| [AutoGen](https://github.com/microsoft/autogen)                                                | Enable Next-Gen Large Language Model Applications.                                                                              |[arXiv](https://arxiv.org/abs/2308.08155)  |              |   Agent  |

| [behaviac](https://github.com/Tencent/behaviac)                                                | Behaviac is a framework of the game AI development.                                                                                                                               |              |              | Framework |

| [Biomes](https://github.com/ill-inc/biomes-game)                                               | Biomes is an open source sandbox MMORPG built for the web using web technologies such as Next.js, Typescript, React and WebAssembly.                                                    |       |              |   Game   |

| [Buffer of Thoughts](https://github.com/YangLing0818/buffer-of-thought-llm)                    | Thought-Augmented Reasoning with Large Language Models.                                                                         |[arXiv](https://arxiv.org/abs/2406.04271)  |              |   Agent  |

| [Byzer-Agent](https://github.com/allwefantasy/byzer-agent)                                     | Easy, fast, and distributed agent framework for everyone.                                                                                                                               |        |              |   Agent  |

| [Cat Town](https://github.com/ykhli/cat-town)                                                  | A C(h)atGPT-powered simulation with cats.                                                                                                                                               |        |              |   Agent  |

| [Cat Town](https://github.com/ykhli/cat-town)                                                  | A C(h)atGPT-powered simulation with cats.                                                                                                                                               |        |              |   Agent  |

| [CharacterGLM](https://github.com/thu-coai/CharacterGLM-6B)                                    | Customizing Chinese Conversational AI Characters with Large Language Models.                                                          |[arXiv](https://arxiv.org/abs/2311.16832)  |              |   Agent  |

| [ChatDev](https://github.com/OpenBMB/ChatDev)                                                  | Communicative Agents for Software Development.                                                                                        |[arXiv](https://arxiv.org/abs/2405.04219)  |              |   Agent  |

| [CogAgent](https://modelscope.cn/models/ZhipuAI/cogagent-chat/summary)                         | CogAgent is an open-source visual language model improved based on CogVLM.                                                            |[arXiv](https://arxiv.org/abs/2312.08914)  |              |   Agent  |

| [Cradle](https://github.com/BAAI-Agents/Cradle)                                                | Towards General Computer Control.                                                                                                                                                         |      |              |   Agent  |

| [crewAI](https://github.com/joaomdmoura/crewAI)                                                | Framework for orchestrating role-playing, autonomous AI agents.                                                                                                                          |       |              |   Agent  |

| [Dify](https://github.com/langgenius/dify)                                                     | Dify is an open-source LLM app building platform.                                                                                                                                        |       |              |   Agent  |

| [Digital Life Project](https://digital-life-project.com/)                                      | Autonomous 3D Characters with Social Intelligence.                                                                                    |[arXiv](https://arxiv.org/abs/2312.04547)  |              |   Agent  |

| [everything-ai](https://github.com/AstraBert/everything-ai)                                    | Your fully proficient, AI-powered and local chatbot assistant🤖.                                                                                                                        |       |              |   Agent  |

| [fabric](https://github.com/danielmiessler/fabric)                                             | fabric is an open-source framework for augmenting humans using AI.                                                                                                                       |       |              |   Agent  |

| [FastGPT](https://github.com/labring/FastGPT)                                                  | FastGPT is a knowledge-based platform built on the LLM.                                                                                                                                  |       |              |   Agent  |

| [fastRAG](https://github.com/IntelLabs/fastRAG)                                                | Efficient Retrieval Augmentation and Generation Framework.                                                                                                                               |       |              |   Agent  |

| [GameAISDK](https://github.com/Tencent/GameAISDK)                                              | Image-based game AI automation framework.                                                                                                                                         |              |              | Framework |

| [GameNGen](https://gamengen.github.io/)                                                        | Diffusion Models Are Real-Time Game Engines.                                                                                          |[arXiv](https://arxiv.org/abs/2408.14837)  |              |   Game  |

| [GameGen-O](https://github.com/GameGen-O/GameGen-O)                                            | GameGen-O: Open-world Video Game Generation.                                                                                                                                           |         |              |   Game   |

| [GenAgent](https://github.com/xxyQwQ/GenAgent)                     | GenAgent: Build Collaborative AI Systems with Automated Workflow Generation - Case Studies on ComfyUI.                                                                                              |[arXiv](https://arxiv.org/abs/2409.01392)  |              |   Agent  |

| [Generative Agents](https://github.com/joonspk-research/generative_agents)                     | Interactive Simulacra of Human Behavior.                                                                                              |[arXiv](https://arxiv.org/abs/2304.03442)  |              |   Agent  |

| [Genesis](https://github.com/Genesis-Embodied-AI/Genesis)                                      | Genesis: A Generative and Universal Physics Engine for Robotics and Beyond.                                                                                                            |         |              |   Game   |

| [Genie](https://sites.google.com/view/genie-2024/home)                                         | Generative Interactive Environments.                                                                                                                                                   |         |              |   Game   |

| [gigax](https://github.com/GigaxGames/gigax)                                                   | Runtime, LLM-powered NPCs.                                                                                                                                                               |       |              |   Game   |

| [HippoRAG](https://github.com/OSU-NLP-Group/HippoRAG)                                       | Neurobiologically Inspired Long-Term Memory for Large Language Models.                                                                   |[arXiv](https://arxiv.org/abs/2405.14831)  |              |   Agent   |

| [Interactive LLM Powered NPCs](https://github.com/AkshitIreddy/Interactive-LLM-Powered-NPCs)   | Interactive LLM Powered NPCs, is an open-source project that completely transforms your interaction with non-player characters (NPCs) in any game!                                    |        |              |   Game   |

| [IoA](https://github.com/OpenBMB/IoA)                                                          | An open-source framework for collaborative AI agents, enabling diverse, distributed agents to team up and tackle complex tasks through internet-like connectivity.                      |  |              |   Agent   |

| [KwaiAgents](https://github.com/KwaiKEG/KwaiAgents)                                            | A generalized information-seeking agent system with Large Language Models (LLMs).                                                     |[arXiv](https://arxiv.org/abs/2312.04889)  |              |   Agent  |

| [LangChain](https://github.com/langchain-ai/langchain)                                         | Get your LLM application from prototype to production.                                                                                                                                  |        |              |   Agent  |

| [Langflow](https://github.com/logspace-ai/langflow)                                            | Langflow is a UI for LangChain, designed with react-flow to provide an effortless way to experiment and prototype flows.                                                               |        |              |   Agent  |

| [LangGraph Studio](https://github.com/langchain-ai/langgraph-studio)                           | LangGraph Studio offers a new way to develop LLM applications by providing a specialized agent IDE that enables visualization, interaction, and debugging of complex agentic applications.        |        |              |   Agent  |

| [LARP](https://github.com/MiAO-AI-Lab/LARP)                                                    | Language-Agent Role Play for open-world games.                                                                                  |[arXiv](https://arxiv.org/abs/2312.17653)  |              |   Agent  |

| [LLama Agentic System](https://github.com/meta-llama/llama-agentic-system)                     | Agentic components of the Llama Stack APIs.                                                                                                                                               |      |              |   Agent  |

| [LlamaIndex](https://github.com/run-llama/llama_index)                                         | LlamaIndex is a data framework for your LLM application.                                                                                                                                  |      |              |   Agent  |

| [MindSearch](https://github.com/InternLM/MindSearch)                                           | 🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT).                                                                                     |      |              |   Agent  |

| [Mixture of Agents (MoA)](https://github.com/togethercomputer/MoA)                             | Mixture-of-Agents Enhances Large Language Model Capabilities.                                                                   |[arXiv](https://arxiv.org/abs/2406.04692)  |              |   Agent  |

| [MMRole](https://github.com/YanqiDai/MMRole)                                                   | MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents.                               |[arXiv](https://arxiv.org/abs/2408.04203v1)  |              |   Agent  |

| [Moonlander.ai](https://www.moonlander.ai/)                                                    | Start building 3D games without any coding using generative AI.                                                                                                                          |       |              | Framework |

| [MuG Diffusion](https://github.com/Keytoyze/Mug-Diffusion)                                     | MuG Diffusion is a charting AI for rhythm games based on Stable Diffusion (one of the most powerful AIGC models) with a large modification to incorporate audio waves.               |       |              |   Game   |

| [Oasis](https://github.com/etched-ai/open-oasis)                                               | Oasis is an interactive world model developed by Decart and Etched. Based on diffusion transformers, Oasis takes in user keyboard input and generates gameplay in an autoregressive manner.                |       |              |   Game   |

| [OmAgent](https://github.com/om-ai-lab/OmAgent)                                                | A multimodal agent framework for solving complex tasks.                                                                                                                                 |        |              |   Agent  |

| [OpenAgents](https://github.com/xlang-ai/OpenAgents)                                           | An Open Platform for Language Agents in the Wild.                                                                                                                                       |        |              |   Agent  |

| [Opus](https://opus.ai/)                                                                       | An AI app that turns text into a video game.                                                                                                                                             |       |              |   Game   |

| [Pipecat](https://github.com/pipecat-ai/pipecat)                                            | Open Source framework for voice and multimodal conversational AI.                                                                                                                           |       |              |   Agent   |

| [Qwen-Agent](https://github.com/QwenLM/Qwen-Agent)                                             | Qwen-Agent is a framework for developing LLM applications based on the instruction following, tool usage, planning, and memory capabilities of Qwen.                             |        |              |   Agent  |

| [Ragas](https://github.com/explodinggradients/ragas)                                           | Ragas is a framework that helps you evaluate your Retrieval Augmented Generation (RAG) pipelines.                                                                                     |       |              |   Agent  |

| [RPBench-Auto](https://github.com/boson-ai/RPBench-Auto)                                       | An automated pipeline for evaluating LLMs for role-playing.                                                                                                                              |       |              |   Game   |

| [SIMA](https://deepmind.google/discover/blog/sima-generalist-ai-agent-for-3d-virtual-environments/)          | A generalist AI agent for 3D virtual environments.                                                                                                                         |       |              |   Agent  |

| [StoryGames.ai](https://storygames.buildbox.com/)                                              | AI for Dreamers Make Games.                                                                                                                                                              |       |              |   Game   |

| [SWE-agent](https://github.com/princeton-nlp/SWE-agent)                                        | Agent Computer Interfaces Enable Software Engineering Language Models.                                                                |[arXiv](https://arxiv.org/abs/2405.15793)  |              |   Agent  |

| [TaskGen](https://github.com/simbianai/taskgen)                                                | A Task-based agentic framework building on StrictJSON outputs by LLM agents.                                                                                              |       |              |   Agent  |

| [TEN Agent](https://github.com/TEN-framework/TEN-Agent)                                        | TEN Agent is the world’s first real-time multimodal agent integrated with the OpenAI Realtime API, RTC, and features weather checks, web search, vision, and RAG capabilities.              |       |              |   Agent  |

| [Translation Agent](https://github.com/andrewyng/translation-agent)                            | Agentic translation using reflection workflow.                                                                                                                            |       |              |   Agent  |

| [Twitter](https://github.com/wordware-ai/twitter)                                              | Twitter Personality is a web application that analyzes your Twitter handle to create a personalized personality profile using Wordware AI Agent.                                      |       |              |   Agent  |

| [Unbounded](https://generative-infinite-game.github.io/)                                         | Unbounded: A Generative Infinite Game of Character Life Simulation.                                                                 |[arXiv](https://arxiv.org/abs/2410.18975)  |              |   Game   |

| [Video2Game](https://github.com/video2game/video2game)                                         | Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video.                                             |[arXiv](https://arxiv.org/abs/2404.09833)  |              |   Game   |

| [V-IRL](https://virl-platform.github.io/)                                                      | Grounding Virtual Intelligence in Real Life.                                                                                          |[arXiv](https://arxiv.org/abs/2402.03310)  |              |   Agent  |

| [WebDesignAgent](https://github.com/DAMO-NLP-SG/WebDesignAgent)                                | An agent used for webdesign.                                                                                                                                             |        |              |   Agent  |

| [XAgent](https://github.com/OpenBMB/XAgent)                                                    | An Autonomous LLM Agent for Complex Task Solving.                                                                                                                                        |       |              |   Agent  |

^ Back to Contents ^


## Code

| Source                                                                                      | Description                                                                                                                                                                                    |   Paper   |  Game Engine  |   Type   |

| :------------------------------------------------------------------------------------------ | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-----------: | :-----------: | :-------: |

| [AI Code Translator](https://github.com/mckaywrigley/ai-code-translator)                       | Use AI to translate code from one language to another.                                                                                                                         |  |        |   Code   |

| [aiXcoder-7B](https://github.com/aixcoder-plugin/aiXcoder-7B)                                  | aiXcoder-7B Code Large Language Model.                                                                                                           |                                                 |              |   Code   |

| [bloop](https://github.com/BloopAI/bloop)                                                      | bloop is a fast code search engine written in Rust.                                                                                              |                                                 |              |   Code   |

| [Chapyter](https://github.com/chapyter/chapyter)                                               | ChatGPT Code Interpreter in Jupyter Notebooks.                                                                                                     |                                               |              |   Code   |

| [CodeGeeX](https://github.com/THUDM/CodeGeeX)                                                  | An Open Multilingual Code Generation Model.                                                                                   |[arXiv](https://arxiv.org/abs/2303.17568)    |              |   Code   |

| [CodeGeeX2](https://github.com/THUDM/CodeGeeX2)                                                | A More Powerful Multilingual Code Generation Model.                                                                                               |                                                |              |   Code   |

| [CodeGeeX4](https://github.com/THUDM/CodeGeeX4)                                                | CodeGeeX4: Open Multilingual Code Generation Model.                                                                                               |                                                |              |   Code   |

| [CodeGen](https://github.com/salesforce/CodeGen)                                               | CodeGen is an open-source model for program synthesis. Trained on TPU-v4. Competitive with OpenAI Codex.                  |[arXiv](https://arxiv.org/abs/2203.13474)    |              |   Code   |

| [CodeGen2](https://github.com/salesforce/CodeGen2)                                             | CodeGen2 models for program synthesis.                                                                                        |[arXiv](https://arxiv.org/abs/2305.02309)    |              |   Code   |

| [Code Llama](https://github.com/facebookresearch/codellama)                                    | Code Llama is a large language models for code based on Llama 2.                                                                                    |                                              |              |   Code   |

| [CodeTF](https://github.com/salesforce/codetf)                                                 | One-stop Transformer Library for State-of-the-art Code LLM.                                                                                        |                                               |              |   Code   |

| [CodeT5](https://github.com/salesforce/codet5)                                                 | Open Code LLMs for Code Understanding and Generation.                                                                                              |                                               |              |   Code   |

| [Cursor](https://www.cursor.so/)                                                               | Write, edit, and chat about your code with GPT-4 in a new type of editor.                                                                          |                                               |              |   Code   |

| [DeepSeek Coder](https://github.com/deepseek-ai/DeepSeek-Coder)                                | DeepSeek Coder: Let the Code Write Itself.                                                                                    |[arXiv](https://arxiv.org/abs/2401.14196)    |              |   Code   |

| [OpenAI Codex](https://openai.com/blog/openai-codex)                                           | OpenAI Codex is a descendant of GPT-3.                                                                                                            |                                                |              |   Code   |

| [PandasAI](https://github.com/gventuri/pandas-ai)                                              | Pandas AI is a Python library that integrates generative artificial intelligence capabilities into Pandas, making dataframes conversational.        |                                     |              |   Code   |

| [RobloxScripterAI](https://www.haddock.ai/search?platform=Roblox)                              | RobloxScripterAI is an AI-powered code generation tool for Roblox.                                                                                        |                                        |     Roblox    |   Code   |

| [Scikit-LLM](https://github.com/iryna-kondr/scikit-llm)                                        | Seamlessly integrate powerful language models like ChatGPT into scikit-learn for enhanced text analysis tasks.                                         |                                           |              |   Code   |

| [SoTaNa](https://github.com/DeepSoftwareAnalytics/SoTaNa)                                      | The Open-Source Software Development Assistant.                                                                               |[arXiv](https://arxiv.org/abs/2308.13416)    |              |   Code   |

| [Stable Code 3B](https://bit.ly/3O4oGWW)                                                       | Coding on the Edge.                                                                                                                                |                                               |              |   Code   |

| [StarCoder](https://github.com/bigcode-project/starcoder)                                      | 💫 StarCoder is a language model (LM) trained on source code and natural language text.                                      |[arXiv](https://arxiv.org/abs/2305.06161)    |              |   Code   |

| [StarCoder 2](https://github.com/bigcode-project/starcoder2)                                   | StarCoder2 is a family of code generation models (3B, 7B, and 15B), trained on 600+ programming languages from The Stack v2 and some natural language text such as Wikipedia, Arxiv, and GitHub issues.   |[arXiv](https://arxiv.org/abs/2402.19173)    |              |   Code   |

| [UnityGen AI](https://github.com/himanshuskyrockets/UnityGen-AI)                               | UnityGen AI is an AI-powered code generation plugin for Unity.                                                                                                 |                                   |     Unity     |   Code   |

| [Void](https://github.com/voideditor/void)                                                     | Void is an open source Cursor alternative. Write code with the best AI tools, retain full control over your data, and access powerful AI features.             |                                               |              |   Code   |

^ Back to Contents ^


## Writer

| Source                                                                                      | Description                                                                                                                                                                                    |   Paper   |  Game Engine  |   Type   |

| :------------------------------------------------------------------------------------------ | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-----------: | :-----------: | :-------: |

| [AI-Writer](https://github.com/BlinkDL/AI-Writer)                                              | AI writes novels, generates fantasy and romance web articles, etc. Chinese pre-trained generative model.                                                                      |               |              |  Writer  |

| [Notebook.ai](https://github.com/indentlabs/notebook)                                          | Notebook.ai is a set of tools for writers, game designers, and roleplayers to create magnificent universes – and everything within them.                                    |               |              |  Writer  |

| [Novel](https://github.com/steven-tey/novel)                                                   | Notion-style WYSIWYG editor with AI-powered autocompletions.                                                                                                                     |               |              |  Writer  |

| [NovelAI](https://novelai.net/)                                                                | Driven by AI, painlessly construct unique stories, thrilling tales, seductive romances, or just fool around.                                                                   |               |              |  Writer  |

^ Back to Contents ^


## Image

| Source                                                                                      | Description                                                                                                                                                                                    |   Paper   |  Game Engine  |   Type   |

| :------------------------------------------------------------------------------------------ | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-----------: | :-----------: | :-------: |

| [AnyDoor](https://ali-vilab.github.io/AnyDoor-Page/)                                           | Zero-shot Object-level Image Customization.                                                                                     |[arXiv](https://arxiv.org/abs/2307.09481)  |              |   Image   |

| [AnyText](https://github.com/tyxsspa/AnyText)                                                  | Multilingual Visual Text Generation And Editing.                                                                                |[arXiv](https://arxiv.org/abs/2311.03054)  |              |   Image   |

| [AutoStudio](https://github.com/donahowe/AutoStudio)                                           | Crafting Consistent Subjects in Multi-turn Interactive Image Generation.                                                        |[arXiv](https://arxiv.org/abs/2406.01388)  |              |   Image   |

| [Blender-ControlNet](https://github.com/coolzilj/Blender-ControlNet)                           | Using ControlNet right in Blender.                                                                                              |                                          |    Blender    |   Image   |

| [BriVL](https://github.com/BAAI-WuDao/BriVL)                                                   | Bridging Vision and Language Model.                                                                                             |[arXiv](https://arxiv.org/abs/2103.06561)  |              |   Image   |

| [CatVTON](https://github.com/Zheng-Chong/CatVTON)                                              | CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models.                                                |[arXiv](https://arxiv.org/abs/2407.15886)  |              |   Image   |

| [CLIPasso](https://github.com/yael-vinker/CLIPasso)                                            | A method for converting an image of an object to a sketch, allowing for varying levels of abstraction.                          |[arXiv](https://arxiv.org/abs/2202.05822)  |              |   Image   |

| [ClipDrop](https://clipdrop.co/)                                                               | Create stunning visuals in seconds.                                                                                                                                                      |        |              |   Image   |

| [ComfyUI](https://github.com/comfyanonymous/ComfyUI)                                           | A powerful and modular stable diffusion GUI with a graph/nodes interface.                                                                                                               |         |              |   Image   |

| [ConceptLab](https://github.com/kfirgoldberg/ConceptLab)                                       | Creative Generation using Diffusion Prior Constraints.                                                                          |[arXiv](https://arxiv.org/abs/2308.02669)  |              |   Image   |

| [ControlNet](https://github.com/lllyasviel/ControlNet)                                         | ControlNet is a neural network structure to control diffusion models by adding extra conditions.                           |[arXiv](https://arxiv.org/abs/2302.05543)  |              |   Image   |

| [CSGO](https://github.com/instantX-research/CSGO)                                              | CSGO: Content-Style Composition in Text-to-Image Generation.                                                                    |[arXiv](https://arxiv.org/abs/2408.16766)  |              |   Image   |

| [DALL·E 2](https://openai.com/product/dall-e-2)                                                | DALL·E 2 is an AI system that can create realistic images and art from a description in natural language.                                                                             |         |              |   Image   |

| [Dashtoon Studio](https://www.dashtoon.ai/)                                                    | Dashtoon Studio is an AI powered comic creation platform.                                                                                                                               |         |              |   Comic   |

| [DeepAI](https://deepai.org/)                                                                  | DeepAI offers a suite of tools that use AI to enhance your creativity.                                                                                                                   |        |              |   Image   |

| [DeepFloyd IF](https://github.com/deep-floyd/IF)                                               | IF by DeepFloyd Lab at StabilityAI.                                                                                                                                                    |          |              |   Image   |

| [Depth Anything V2](https://github.com/DepthAnything/Depth-Anything-V2)                        | Depth Anything V2                                                                                                               |[arXiv](https://arxiv.org/abs/2406.09414)  |              |   Image   |

| [Depth map library and poser](https://github.com/jexom/sd-webui-depth-lib)                     | Depth map library for use with the Control Net extension for Automatic1111/stable-diffusion-webui.                                                                             |          |              |   Image   |

| [Diffuse to Choose](https://diffuse2choose.github.io/)                                         | Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All.                                          |[arXiv](https://arxiv.org/abs/2401.13795)  |              |   Image   |

| [Disco Diffusion](https://github.com/alembics/disco-diffusion)                                 | A frankensteinian amalgamation of notebooks, models and techniques for the generation of AI Art and Animations.                                                                       |          |              |   Image   |

| [DragGAN](https://github.com/XingangPan/DragGAN)                                               | Interactive Point-based Manipulation on the Generative Image Manifold.                                                          |[arXiv](https://arxiv.org/abs/2305.10973)  |              |   Image   |

| [Draw Things](https://drawthings.ai/)                                                          | AI- assisted image generation in Your Pocket.                                                                                                                                          |          |              |   Image   |

| [DWPose](https://github.com/idea-research/dwpose)                                              | Effective Whole-body Pose Estimation with Two-stages Distillation.                                                              |[arXiv](https://arxiv.org/abs/2307.15880)  |              |   Image   |

| [EasyPhoto](https://github.com/aigc-apps/sd-webui-EasyPhoto)                                   | Your Smart AI Photo Generator.                                                                                                                                                          |         |              |   Image   |

| [Flux](https://github.com/black-forest-labs/flux)                                              | This repo contains minimal inference code to run text-to-image and image-to-image with our Flux latent rectified flow transformers.                                                  |         |              |   Image   |

| [Follow-Your-Click](https://github.com/mayuelala/FollowYourClick)                              | Open-domain Regional Image Animation via Short Prompts.                                                                         |[arXiv](https://arxiv.org/abs/2403.08268)  |              |   Image   |

| [Fooocus](https://github.com/lllyasviel/Fooocus)                                               | Focus on prompting and generating.                                                                                                                                                   |            |              |   Image   |

| [GIFfusion](https://github.com/DN6/giffusion)                                                  | Create GIFs and Videos using Stable Diffusion.                                                                                                                                         |          |              |   Image   |

| [Grounded-Segment-Anything](https://github.com/IDEA-Research/Grounded-Segment-Anything)        | Automatically Detect , Segment and Generate Anything with Image, Text, and Audio Inputs.                                        |[arXiv](https://arxiv.org/abs/2401.14159)  |              |   Image   |

| [HivisionIDPhotos](https://github.com/Zeyi-Lin/HivisionIDPhotos)                               | HivisionIDPhotos: a lightweight and efficient AI ID photos tools.                                                                                                                    |            |              |   Image   |

| [Hua](https://github.com/BlinkDL/Hua)                                                          | Hua is an AI image editor with Stable Diffusion (and more).                                                                                                                          |            |              |   Image   |

| [Hunyuan-DiT](https://github.com/Tencent/HunyuanDiT)                                           | A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding.                        |[arXiv](https://arxiv.org/abs/2405.08748)  |              |   Image   |

| [IC-Light](https://github.com/lllyasviel/IC-Light)                                             | IC-Light is a project to manipulate the illumination of images.                                                                                                                      |            |              |   Image   |

| [Ideogram](https://ideogram.ai/login)                                                          | Helping people become more creative.                                                                                                                                                  |           |              |   Image   |

| [Imagen](https://imagen.research.google/)                                                      | Imagen is an AI system that creates photorealistic images from input text.                                                                                                           |            |              |   Image   |

| [img2img-turbo](https://github.com/GaParmar/img2img-turbo)                                     | One-Step Image-to-Image with SD-Turbo.                                                                                                                                               |            |              |   Image   |

| [Img2Prompt](https://www.img2prompt.io/)                                                       | Get prompts from stable diffusion generated images.                                                                                                                                  |            |              |   Image   |

| [Infinity](https://github.com/FoundationVision/Infinity)                                       | Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis.                                          |[arXiv](https://arxiv.org/abs/2412.04431)  |              |   Image   |

| [InstantID](https://github.com/InstantID/InstantID)                                            | Zero-shot Identity-Preserving Generation in Seconds.                                                                            |[arXiv](https://arxiv.org/abs/2401.07519)  |              |   Image   |

| [InternLM-XComposer2](https://github.com/InternLM/InternLM-XComposer)                          | InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.    |[arXiv](https://arxiv.org/abs/2401.16420)  |              |   Image   |

| [KOALA](https://youngwanlee.github.io/KOALA/)                                                  | Self-Attention Matters in Knowledge Distillation of Latent Diffusion Models for Memory-Efficient and Fast Image Synthesis.                                                       |                |              |   Image   |

| [Kolors](https://github.com/Kwai-Kolors/Kolors)                                                | Kolors: Effective Training of Diffusion Model for Photorealistic Text-to-Image Synthesis.                                                                                              |             |              |   Image   |

| [Komiko](https://komiko.app/)                                                    | Komiko is an AI-powered storytelling platform that lets you create original characters, comics, and animations with ease.                                                                                                              |         |              |   Comic   |

| [KREA](https://www.krea.ai/)                                                                   | Generate images and videos with a delightful AI-powered design tool.                                                                                                                |             |              |   Image   |

| [LaVi-Bridge](https://github.com/ShihaoZhaoZSH/LaVi-Bridge)                                    | Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation.                           |[arXiv](https://arxiv.org/abs/2403.07860)  |              |   Image   |

| [LayerDiffusion](https://github.com/layerdiffusion/LayerDiffusion)                             | Transparent Image Layer Diffusion using Latent Transparency.                                                                    |[arXiv](https://arxiv.org/abs/2305.18676)  |              |   Image   |

| [Lexica](https://lexica.art/)                                                                  | A Stable Diffusion prompts search engine.                                                                                                                                           |             |              |   Image   |

| [LlamaGen](https://github.com/FoundationVision/LlamaGen)                                       | Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation.                                                      |[arXiv](https://arxiv.org/abs/2406.06525)  |              |   Image   |

| [Lumina-Image 2.0](https://github.com/Alpha-VLLM/Lumina-Image-2.0)                             | Lumina-Image 2.0 : A Unified and Efficient Image Generative Model.                                                                                                                  |             |              |   Image   |

| [Lumina-mGPT](https://github.com/Alpha-VLLM/Lumina-mGPT)                                       | Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining.               |[arXiv](https://arxiv.org/abs/2408.02657)  |              |   Image   |

| [MakeAnything](https://github.com/showlab/MakeAnything)                                        | MakeAnything: Harnessing Diffusion Transformers for Multi-Domain Procedural Sequence Generation.                           |[arXiv](https://arxiv.org/abs/2502.01572)  |              |   Image   |

| [MetaShoot](https://metashoot.vinzi.xyz/)                                                      | MetaShoot is a digital twin of a photo studio, developed as a plugin for Unreal Engine that gives any creator the ability to produce highly realistic renders in the easiest and quickest way. |  | Unreal Engine |   Image   |

| [Midjourney](https://www.midjourney.com/)                                                      | Midjourney is an independent research lab exploring new mediums of thought and expanding the imaginative powers of the human species.                                               |            |              |   Image   |

| [MIGC](https://github.com/limuloo/MIGC)                                                        | MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis.                                                         |[arXiv](https://arxiv.org/abs/2402.05408)  |              |   Image   |

| [MimicBrush](https://github.com/ali-vilab/MimicBrush)                                          | Zero-shot Image Editing with Reference Imitation.                                                                               |[arXiv](https://arxiv.org/abs/2406.07547)  |              |   Image   |

| [OmniGen](https://github.com/VectorSpaceLab/OmniGen)                                           | OmniGen: Unified Image Generation.                                                                                              |[arXiv](https://arxiv.org/abs/2409.11340)  |              |   Image   |

| [Omost](https://github.com/lllyasviel/Omost)                                | Omost is a project to convert LLM's coding capability to image generation (or more accurately, image composing) capability.                                                                     |            |              |   Image   |

| [Openpose Editor](https://github.com/fkunn1326/openpose-editor)                                | Openpose Editor for AUTOMATIC1111's stable-diffusion-webui.                                                                                                                          |            |              |   Image   |

| [Outfit Anyone](https://humanaigc.github.io/outfit-anyone/)                                    | Ultra-high quality virtual try-on for Any Clothing and Any Person.                                                                                                                     |          |              |   Image   |

| [PaintsUndo](https://github.com/lllyasviel/Paints-UNDO)                                        | PaintsUndo: A Base Model of Drawing Behaviors in Digital Paintings.                                                                                                                    |          |              |   Image   |

| [PhotoMaker](https://photo-maker.github.io/)                                                   | Customizing Realistic Human Photos via Stacked ID Embedding.                                                                    |[arXiv](https://arxiv.org/abs/2312.04461)  |              |   Image   |

| [Photoroom](https://www.photoroom.com/backgrounds)                                             | AI Background Generator.                                                                                                                                                              |           |              |   Image   |

| [Plask](https://plask.ai/)                                                                     | AI image generation in the cloud.                                                                                                                                                      |          |              |   Image   |

| [Prompt.Art](https://prompt.art/)                                                              | The Generators Hub.                                                                                                                                                                     |         |              |   Image   |

| [PuLID](https://github.com/ToTheBeginning/PuLID)                                               | Pure and Lightning ID Customization via Contrastive Alignment.                                                                  |[arXiv](https://arxiv.org/abs/2404.16022)  |              |   Image   |

| [Rich-Text-to-Image](https://github.com/SongweiGe/rich-text-to-image)                          | Expressive Text-to-Image Generation with Rich Text.                                                                             |[arXiv](https://arxiv.org/abs/2304.06720)  |              |   Image   |

| [RPG-DiffusionMaster](https://github.com/YangLing0818/RPG-DiffusionMaster)                     | Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (PRG).                                                                                  |          |              |   Image   |

| [SEED-Story](https://github.com/TencentARC/SEED-Story)                                         | SEED-Story: Multimodal Long Story Generation with Large Language Model.                                                         |[arXiv](https://arxiv.org/abs/2407.08683)  |              |   Image   |

| [Segment Anything](https://segment-anything.com/)                                              | Segment Anything Model (SAM): a new AI model from Meta AI that can "cut out" any object , in any image , with a single click.   |[arXiv](https://arxiv.org/abs/2304.02643)  |              |   Image   |

| [Segment Anything Model 2 (SAM 2)](https://github.com/facebookresearch/segment-anything-2)     | SAM 2: Segment Anything in Images and Videos.                                                                                   |[arXiv](https://arxiv.org/abs/2408.00714)  |              |   Image   |

| [sd-webui-controlnet](https://github.com/Mikubill/sd-webui-controlnet)                         | WebUI extension for ControlNet.                                                                                                                                                        |          |              |   Image   |

| [SDXL-Lightning](https://huggingface.co/ByteDance/SDXL-Lightning)                              | Progressive Adversarial Diffusion Distillation.                                                                                 |[arXiv](https://arxiv.org/abs/2402.13929)  |              |   Image   |

| [SDXS](https://github.com/IDKiro/sdxs)                                                         | Real-Time One-Step Latent Diffusion Models with Image Conditions.                                                                                                                     |           |              |   Image   |

| [Stable.art](https://github.com/isekaidev/stable.art)                                          | Photoshop plugin for Stable Diffusion with Automatic1111 as backend (locally or with Google Colab).                                                                               |          |              |   Image   |

| [Stable Cascade](https://github.com/Stability-AI/StableCascade)                                | Stable Cascade consists of three models: Stage A, Stage B and Stage C, representing a cascade for generating images, hence the name "Stable Cascade".                              |          |              |   Image   |

| [Stable Diffusion](https://github.com/CompVis/stable-diffusion)                                | A latent text-to-image diffusion model.                                                                                                                                                |          |              |   Image   |

| [stable-diffusion.cpp](https://github.com/leejet/stable-diffusion.cpp)                         | Stable Diffusion in pure C/C++.                                                                                                                                                         |         |              |   Image   |

| [Stable Diffusion web UI](https://github.com/AUTOMATIC1111/stable-diffusion-webui)             | A browser interface based on Gradio library for Stable Diffusion.                                                                                                                       |         |              |   Image   |

| [Stable Diffusion web UI](https://github.com/Sygil-Dev/sygil-webui)                            | Web-based UI for Stable Diffusion.                                                                                                                                                      |         |              |   Image   |

| [Stable Diffusion WebUI Chinese](https://github.com/VinsonLaro/stable-diffusion-webui-chinese) | Chinese version of stable-diffusion-webui.                                                                                                                                             |          |              |   Image   |

| [Stable Diffusion XL](https://clipdrop.co/stable-diffusion)                                    | Generate images from text.                                                                                                      |[arXiv](https://arxiv.org/abs/2307.01952)  |              |   Image   |

| [Stable Diffusion XL Turbo](https://clipdrop.co/stable-diffusion-turbo)                        | Real-Time Text-to-Image Generation.                                                                                                                                                     |         |              |   Image   |

| [Stable Diffusion 3.5](https://github.com/Stability-AI/sd3.5)                                  | Stable Diffusion 3.5 open release includes multiple model variants, including Stable Diffusion 3.5 Large and Stable Diffusion 3.5 Large Turbo.                                     |         |              |   Image   |

| [Stable Doodle](https://clipdrop.co/stable-doodle)                                             | Stable Doodle is a sketch-to-image tool that converts a simple drawing into a dynamic image.                                                                                        |         |              |   Image   |

| [StableStudio](https://github.com/Stability-AI/StableStudio)                                   | StableStudio by Stability AI                                                                                                                                                         |            |              |   Image   |

| [StoryMaker](https://github.com/RedAIGC/StoryMaker)                                            | StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation.                                                 |[arXiv](https://arxiv.org/abs/2409.12576)  |              |   Image   |

| [StreamDiffusion](https://github.com/cumulo-autumn/StreamDiffusion)                            | A Pipeline-Level Solution for Real-Time Interactive Generation.                                                                                                                        |          |              |   Image   |

| [StyleDrop](https://styledrop.github.io/)                                                      | Text-To-Image Generation in Any Style.                                                                                          |[arXiv](https://arxiv.org/abs/2306.00983)  |              |   Image   |

| [SyncDreamer](https://github.com/liuyuan-pal/SyncDreamer)                                      | Generating Multiview-consistent Images from a Single-view Image.                                                                |[arXiv](https://arxiv.org/abs/2309.03453)  |              |   Image   |

| [UltraEdit](https://github.com/HaozheZhao/UltraEdit)                                           | UltraEdit: Instruction-based Fine-Grained Image Editing at Scale.                                                               |[arXiv](https://arxiv.org/abs/2407.05282)  |              |   Image   |

| [UltraPixel](https://github.com/catcathh/UltraPixel)                                           | UltraPixel: Advancing Ultra-High-Resolution Image Synthesis to New Peaks.                                                       |[arXiv](https://arxiv.org/abs/2407.02158)  |              |   Image   |

| [Unity ML Stable Diffusion](https://github.com/keijiro/UnityMLStableDiffusion)                 | Core ML Stable Diffusion on Unity.                                                                                                                                                    |           |     Unity     |   Image   |

| [Vispunk Visions](https://vispunk.com/image)                                                   | Text-to-Image generation platform.                                                                                                                                                    |           |              |   Image   |

^ Back to Contents ^


## Texture

| Source                                                                                      | Description                                                                                                                                                                                    |   Paper   |  Game Engine  |   Type   |

| :------------------------------------------------------------------------------------------ | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-----------: | :-----------: | :-------: |

| [CRM](https://github.com/thu-ml/CRM)                                                           | Single Image to 3D Textured Mesh with Convolutional Reconstruction Model.                                                       |[arXiv](https://arxiv.org/abs/2403.05034)  |              |  Texture  |

| [DreamMat](https://github.com/zzzyuqing/DreamMat)                                              | High-quality PBR Material Generation with Geometry- and Light-aware Diffusion Models.                                           |[arXiv](https://arxiv.org/abs/2405.17176)  |              |  Texture  |

| [DreamSpace](https://github.com/ybbbbt/dreamspace)                                             | Dreaming Your Room Space with Text-Driven Panoramic Texture Propagation.                                                                                    |                                         |              |  Texture  |

| [Dream Textures](https://github.com/carson-katri/dream-textures)                               | Stable Diffusion built-in to Blender. Create textures, concept art, background assets, and more with a simple text prompt.                             |                                           |    Blender    |  Texture  |

| [InstructHumans](https://github.com/viridityzhu/InstructHumans)                                | Editing Animated 3D Human Textures with Instructions.                                                                           |[arXiv](https://arxiv.org/abs/2404.04037)  |              |  Texture  |

| [InteX](https://github.com/ashawkey/InTeX)                                                     | Interactive Text-to-Texture Synthesis via Unified Depth-aware Inpainting.                                                       |[arXiv](https://arxiv.org/abs/2403.11878)  |              |  Texture  |

| [LLaMA-Mesh](https://github.com/nv-tlabs/LLaMA-Mesh)                                           | LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models.                                                                   |[arXiv](https://arxiv.org/abs/2411.09595)  |              |  Mesh  |

| [MaterialSeg3D](https://github.com/PROPHETE-pro/MaterialSeg3D_)                                | MaterialSeg3D: Segmenting Dense Materials from 2D Priors for 3D Assets.                                                         |[arXiv](https://arxiv.org/abs/2404.13923)  |              |  Texture  |

| [MeshAnything](https://github.com/buaacyw/MeshAnything)                                        | MaterialSeg3D: Segmenting Dense Materials from 2D Priors for 3D Assets.                                                         |[arXiv](https://arxiv.org/abs/2406.10163)  |              |  Mesh  |

| [Neuralangelo](https://github.com/NVlabs/neuralangelo)                                         | High-Fidelity Neural Surface Reconstruction.                                                                                    |[arXiv](https://arxiv.org/abs/2306.03092)  |              |  Texture  |

| [Paint-it](https://github.com/postech-ami/paint-it)                                            | Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering.                                             |                                              |              |  Texture  |

| [Polycam](https://poly.cam/material-generator)                                                 | Create your own 3D textures just by typing.                                                                                                             |                                             |              |  Texture  |

| [TexFusion](https://research.nvidia.com/labs/toronto-ai/texfusion/)                            | Synthesizing 3D Textures with Text-Guided Image Diffusion Models.                                                               |[arXiv](https://arxiv.org/abs/2310.13772)  |              |  Texture  |

| [Text2Tex](https://daveredrum.github.io/Text2Tex/)                                             | Text-driven texture Synthesis via Diffusion Models.                                                                             |[arXiv](https://arxiv.org/abs/2303.11396)  |              |  Texture  |

| [Texture Lab](https://www.texturelab.xyz/)                                                     | AI-generated texures. You can generate your own with a text prompt.                                                                                     |                                             |              |  Texture  |

| [With Poly](https://withpoly.com/browse/textures)                                              | Create Textures With Poly. Generate 3D materials with AI in a free online editor, or search our growing community library.                          |                                            |              |  Texture  |

| [X-Mesh](https://github.com/xmu-xiaoma666/X-Mesh)                                              | X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance.                             |[arXiv](https://arxiv.org/abs/2303.15764)  |              |  Texture  |

^ Back to Contents ^


## Shader

| Source                                                                                      | Description                                                                                                                                                                                    |   Paper   |  Game Engine  |   Type   |

| :------------------------------------------------------------------------------------------ | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-----------: | :-----------: | :-------: |

| [AI Shader](https://github.com/keijiro/AIShader)                                               | ChatGPT-powered shader generator for Unity.                                                                                        |                                       |     Unity     |  Shader  |

^ Back to Contents ^


## 3D Model

| Source                                                                                      | Description                                                                                                                                                                                    |   Paper   |  Game Engine  |   Type   |

| :------------------------------------------------------------------------------------------ | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-----------: | :-----------: | :-------: |

| [Animate3D](https://github.com/yanqinJiang/Animate3D)                                          | Animate3D: Animating Any 3D Model with Multi-view Video Diffusion.                                                              |[arXiv](https://arxiv.org/abs/2407.11398)  |              |   3D   |

| [Anything-3D](https://github.com/Anything-of-anything/Anything-3D)                             | Segment-Anything + 3D. Let's lift the anything to 3D.                                                                           |[arXiv](https://arxiv.org/abs/2304.10261)  |              |   Model   |

| [Any2Point](https://github.com/Ivan-Tang-3D/Any2Point)                                         | Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding.                                                 |[arXiv](https://arxiv.org/abs/2404.07989)  |              |   3D   |

| [BlenderGPT](https://github.com/gd3kr/BlenderGPT)                                              | Use commands in English to control Blender with OpenAI's GPT-4.                                                                                         |                                          |    Blender    |   Model   |

| [Blender-GPT](https://github.com/TREE-Ind/Blender-GPT)                                         | An all-in-one Blender assistant powered by GPT3/4 + Whisper integration.                                                                              |                                            |    Blender    |   Model   |

| [BlenderMCP](https://github.com/ahujasid/blender-mcp)                                          | BlenderMCP connects Blender to Claude AI through the Model Context Protocol (MCP), allowing Claude to directly interact with and control Blender. This integration enables prompt assisted 3D modeling, scene creation, and manipulation.                                                                                                   |                                            |    Blender    |   Model   |

| [Blockade Labs](https://www.blockadelabs.com/)                                                 | Digital alchemy is real with Skybox Lab - the ultimate AI-powered solution for generating incredible 360° skybox experiences from text prompts.         |                                          |              |   Model   |

| [CF-3DGS](https://github.com/NVlabs/CF-3DGS)                                                   | COLMAP-Free 3D Gaussian Splatting.                                                                                              |[arXiv](https://arxiv.org/abs/2312.07504)  |              |   3D   |

| [CharacterGen](https://github.com/zjp-shadow/CharacterGen)                                     | CharacterGen: Efficient 3D Character Generation from Single Images with Multi-View Pose Canonicalization.                     |[arXiv](https://arxiv.org/abs/2402.17214)  |              |   3D   |

| [chatGPT-maya](https://github.com/LouisRossouw/chatGPT-maya)                                   | Simple Maya tool that utilizes open AI to perform basic tasks based on descriptive instructions.                                                |                                           |     Maya     |   Model   |

| [CityDreamer](https://github.com/hzxie/city-dreamer)                                           | Compositional Generative Model of Unbounded 3D Cities.                                                                          |[arXiv](https://arxiv.org/abs/2309.00610)  |              |   3D   |

| [CSM](https://www.csm.ai/)                                                                     | Generate 3D worlds from images and videos.                                                                                                             |                                           |              |   3D   |

| [Dash](https://www.polygonflow.io/)                                                            | Your Copilot for World Building in Unreal Engine.                                                                                                     |                                            | Unreal Engine |   3D   |

| [DreamCatalyst](https://github.com/kaist-cvml-lab/DreamCatalyst)                               | DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation.                         |[arXiv](https://arxiv.org/abs/2407.11394)  |              |   3D   |

| [DreamGaussian4D](https://github.com/jiawei-ren/dreamgaussian4d)                               | Generative 4D Gaussian Splatting.                                                                                               |[arXiv](https://arxiv.org/abs/2312.17142)  |              |   4D   |

| [DUSt3R](https://github.com/naver/dust3r)                                                      | Geometric 3D Vision Made Easy.                                                                                                  |[arXiv](https://arxiv.org/abs/2312.14132)  |              |   3D   |

| [Edify 3D](https://research.nvidia.com/labs/dir/edify-3d/)                                     | Edify 3D: Scalable High-Quality 3D Asset Generation.                                                                            |[arXiv](https://arxiv.org/abs/2411.07135)  |              |   3D   |

| [GALA3D](https://github.com/VDIGPKU/GALA3D)                                                    | GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting.                            |[arXiv](https://arxiv.org/abs/2402.07207)  |              |   3D   |

| [GaussCtrl](https://github.com/ActiveVisionLab/gaussctrl)                                      | GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing.                                                     |[arXiv](https://arxiv.org/abs/2403.08733)  |              |   3D   |

| [GaussianCube](https://github.com/GaussianCube/GaussianCube)                                   | A Structured and Explicit Radiance Representation for 3D Generative Modeling.                                                   |[arXiv](https://arxiv.org/abs/2403.19655)  |              |   3D   |

| [GaussianDreamer](https://github.com/hustvl/GaussianDreamer)                                   | Fast Generation from Text to 3D Gaussian Splatting with Point Cloud Priors.                                                     |[arXiv](https://arxiv.org/abs/2310.08529)  |              |   3D   |

| [GenieLabs](https://www.genielabs.tech/)                                                       | Empower your game with AI-UGC.                                                                                                                         |                                           |              |   3D   |

| [HiFA](https://hifa-team.github.io/HiFA-site/)                                                 | High-fidelity Text-to-3D with advance Diffusion guidance.                                                                                              |                                           |              |   Model   |

| [HoloDreamer](https://github.com/zhouhyOcean/HoloDreamer)                                      | HoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions.                                                     |[arXiv](https://arxiv.org/abs/2407.15187)  |              |   3D   |

| [Hunyuan3D-1.0](https://github.com/Tencent/Hunyuan3D-1)                                        | Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation.                                                   |[arXiv](https://arxiv.org/abs/2411.02293)  |              |   3D   |

| [Hunyuan3D 2.0](https://github.com/Tencent/Hunyuan3D-2)                                        | Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation.                           |[arXiv](https://arxiv.org/abs/2501.12202)  |              |   3D   |

| [Infinigen](https://github.com/princeton-vl/infinigen)                                         | Infinite Photorealistic Worlds using Procedural Generation.                                                                     |[arXiv](https://arxiv.org/abs/2306.09310)  |              |   3D   |

| [Instruct-NeRF2NeRF](https://instruct-nerf2nerf.github.io/)                                    | Editing 3D Scenes with Instructions.                                                                                            |[arXiv](https://arxiv.org/abs/2303.12789)  |              |   Model   |

| [Interactive3D](https://github.com/interactive-3d/interactive3d)                               | Create What You Want by Interactive 3D Generation.                                                                              |[arXiv](https://arxiv.org/abs/2404.16510)  |              |   3D   |

| [Isotropic3D](https://github.com/pkunliu/Isotropic3D)                                          | Image-to-3D Generation Based on a Single CLIP Embedding.                                                                                              |                                            |              |   3D   |

| [LATTE3D](https://research.nvidia.com/labs/toronto-ai/LATTE3D/)                                | Large-scale Amortized Text-To-Enhanced3D Synthesis.                                                                             |[arXiv](https://arxiv.org/abs/2403.15385)  |              |   3D   |

| [LION](https://github.com/nv-tlabs/LION)                                                       | Latent Point Diffusion Models for 3D Shape Generation.                                                                          |[arXiv](https://arxiv.org/abs/2210.06978)  |              |   Model   |

| [Luma AI](https://lumalabs.ai/)                                                                | Capture in lifelike 3D. Unmatched photorealism, reflections, and details. The future of VFX is now, for everyone!                                    |                                           |              |   Model   |

| [lumine AI](https://ilumine.ai/)                                                               | AI-Powered Creativity.                                                                                                                                |                                            |              |   3D   |

| [Make-It-3D](https://github.com/junshutang/Make-It-3D)                                         | High-Fidelity 3D Creation from A Single Image with Diffusion Prior.                                                             |[arXiv](https://arxiv.org/abs/2303.14184)  |              |   Model   |

| [Meshy](https://www.meshy.ai/)                                                                 | Create Stunning 3D Game Assets with AI.                                                                                                                |                                           |              |   3D   |

| [Mootion](https://mootion.com/landing)                                                         | Magical 3D AI Animation Maker.                                                                                                                         |                                           |              |   3D   |

| [MVDream](https://github.com/MV-Dream/MVDream)                                                 | Multi-view Diffusion for 3D Generation.                                                                                         |[arXiv](https://arxiv.org/abs/2308.16512)  |              |   3D   |

| [NVIDIA Instant NeRF](https://github.com/NVlabs/instant-ngp)                                   | Instant neural graphics primitives: lightning fast NeRF and more.                                                                                      |                                           |              |   Model   |

| [One-2-3-45](https://one-2-3-45.github.io/)                                                    | Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization.                                                       |[arXiv](https://arxiv.org/abs/2306.16928)  |              |   Model   |

| [Paint3D](https://github.com/OpenTexture/Paint3D)                                              | Paint Anything 3D with Lighting-Less Texture Diffusion Models.                                                                  |[arXiv](https://arxiv.org/abs/2312.13913)  |              |   3D   |

| [PAniC-3D](https://github.com/shuhongchen/panic3d-anime-reconstruction)                        | Stylized Single-view 3D Reconstruction from Portraits of Anime Characters.                                                      |[arXiv](https://arxiv.org/abs/2303.14587)  |              |   Model   |

| [Point·E](https://github.com/openai/point-e)                                                  | Point cloud diffusion for 3D model synthesis.                                                                                                           |                                           |              |   Model   |

| [ProlificDreamer](https://ml.cs.tsinghua.edu.cn/prolificdreamer/)                              | High-Fidelity and diverse Text-to-3D generation with Variational score Distillation.                                            |[arXiv](https://arxiv.org/abs/2305.16213)  |              |   Model   |

| [SF3D](https://github.com/Stability-AI/stable-fast-3d)                                         | SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement.                      |[arXiv](https://arxiv.org/abs/2408.00653)  |              |   3D   |

| [Shap-E](https://github.com/openai/shap-e)                                                    | Generate 3D objects conditioned on text or images.                                                                               |[arXiv](https://arxiv.org/abs/2305.02463)  |              |   Model   |

| [Sloyd](https://www.sloyd.ai/)                                                                 | 3D modelling has never been easier.                                                                                                                   |                                            |              |   Model   |

| [Spline AI](https://spline.design/ai)                                                          | The power of AI is coming to the 3rd dimension. Generate objects, animations, and textures using prompts.                                      |                                           |              |   Model   |

| [Stable Dreamfusion](https://github.com/ashawkey/stable-dreamfusion)                           | A pytorch implementation of the text-to-3D model Dreamfusion, powered by the Stable Diffusion text-to-2D model.                                   |                                          |              |   Model   |

| [SV3D](https://sv3d.github.io/)                                                                | Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion.                            |[arXiv](https://arxiv.org/abs/2403.12008)  |              |   3D   |

| [Tafi](https://maketafi.com/ai)                                                                | AI text to 3D character engine.                                                                                                                         |                                          |              |   Model   |

| [3D-GPT](https://chuny1.github.io/3DGPT/3dgpt.html)                                            | Procedural 3D Modeling with Large Language Models.                                                                              |[arXiv](https://arxiv.org/abs/2310.12945)  |              |   3D   |

| [3D-LLM](https://github.com/UMass-Foundation-Model/3D-LLM)                                     | Injecting the 3D World into Large Language Models.                                                                              |[arXiv](https://arxiv.org/abs/2307.12981)  |              |   3D   |

| [3Dpresso](https://3dpresso.ai/)                                                               | Extract a 3D model of an object, captured on a video.                                                                                                   |                                          |              |   Model   |

| [3DTopia](https://github.com/3DTopia/3DTopia)                                                  | Text-to-3D Generation within 5 Minutes.                                                                                         |[arXiv](https://arxiv.org/abs/2403.02234)  |              |   3D   |

| [3DTopia-XL](https://github.com/3DTopia/3DTopia-XL)                                                  | 3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion.                                                                                         |[arXiv](https://arxiv.org/abs/2409.12957)  |              |   3D   |

| [threestudio](https://github.com/threestudio-project/threestudio)                              | A unified framework for 3D content generation.                                                                                                           |                                         |              |   Model   |

| [TripoSR](https://github.com/VAST-AI-Research/TripoSR)                                         | A state-of-the-art open-source model for fast feedforward 3D reconstruction from a single image.                                |[arXiv](https://arxiv.org/abs/2403.02151)  |              |   Model   |

| [Unique3D](https://github.com/AiuniAI/Unique3D)                                                | High-Quality and Efficient 3D Mesh Generation from a Single Image.                                                              |[arXiv](https://arxiv.org/abs/2405.20343)  |              |   3D   |

| [UnityGaussianSplatting](https://github.com/aras-p/UnityGaussianSplatting)                     | Toy Gaussian Splatting visualization in Unity.                                                                                                          |                                          |     Unity     |   3D   |

| [ViVid-1-to-3](https://github.com/ubc-vision/vivid123)                                         | Novel View Synthesis with Video Diffusion Models.                                                                               |[arXiv](https://arxiv.org/abs/2312.01305)  |              |   3D   |

| [Voxcraft](https://voxcraft.ai/)                                                               | Crafting Ready-to-Use 3D Models with AI.                                                                                                               |                                           |              |   3D   |

| [Wonder3D](https://github.com/xxlong0/Wonder3D)                                                | Single Image to 3D using Cross-Domain Diffusion.                                                                                |[arXiv](https://arxiv.org/abs/2310.15008)  |              |   3D   |

| [Zero-1-to-3](https://github.com/cvlab-columbia/zero123)                                       | Zero-shot One Image to 3D Object.                                                                                               |[arXiv](https://arxiv.org/abs/2303.11328)  |              |   Model   |

^ Back to Contents ^


## Avatar

| Source                                                                                      | Description                                                                                                                                                                                    |   Paper   |  Game Engine  |   Type   |

| :------------------------------------------------------------------------------------------ | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-----------: | :-----------: | :-------: |

| [AniPortrait](https://github.com/Zejun-Yang/AniPortrait)                                       | Audio-Driven Synthesis of Photorealistic Portrait Animations.                                                                   |[arXiv](https://arxiv.org/abs/2403.17694)  |              |  Avatar  |

| [CALM](https://github.com/NVlabs/CALM)                                                         | Conditional Adversarial Latent Models for Directable Virtual Characters.                                                        |[arXiv](https://arxiv.org/abs/2305.02195)  |              |  Avatar  |

| [ChatAvatar](https://hyperhuman.deemos.com/chatavatar)                                         | Progressive generation Of Animatable 3D Faces Under Text guidance.                                                              |     |              |  Avatar  |

| [ChatdollKit](https://github.com/uezo/ChatdollKit)                                             | ChatdollKit enables you to make your 3D model into a chatbot.                                                                                                |                                      |     Unity     |  Avatar  |

| [Ditto](https://github.com/antgroup/ditto-talkinghead)                                         | Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis.                                                 |[arXiv](https://arxiv.org/abs/2411.19509)  |              |  Avatar  |

| [DreamTalk](https://github.com/ali-vilab/dreamtalk)                                            | When Expressive Talking Head Generation Meets Diffusion Probabilistic Models.                                                   |[arXiv](https://arxiv.org/abs/2312.09767)  |              |  Avatar  |

| [Duix](https://github.com/GuijiAI/duix.ai)                                                     | Duix - Silicon-Based Digital Human SDK 🌐🤖                                                                                                                     |                                  |              |  Avatar  |

| [EchoMimic](https://github.com/BadToBest/EchoMimic)                                            | EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions.                           |[arXiv](https://arxiv.org/abs/2407.08136)  |              |  Avatar  |

| [EMOPortraits](https://github.com/neeek2303/EMOPortraits)                                      | Emotion-enhanced Multimodal One-shot Head Avatars.                                                                                                            |                                     |              |  Avatar  |

| [EmoVOCA](https://github.com/miccunifi/EmoVOCA)                                                | EmoVOCA: Speech-Driven Emotional 3D Talking Heads.                                                                              |[arXiv](https://arxiv.org/abs/2403.12886)  |              |  Avatar  |

| [E3 Gen](https://github.com/olivia23333/E3Gen)                                                 | Efficient, Expressive and Editable Avatars Generation.                                                                          |[arXiv](https://arxiv.org/abs/2405.19203)  |              |  Avatar  |

| [ExAvatar](https://github.com/mks0601/ExAvatar_RELEASE)                                        | ExAvatar - Expressive Whole-Body 3D Gaussian Avatar.                                                                            |[arXiv](https://arxiv.org/abs/2407.21686)  |              |  Avatar  |

| [GeneAvatar](https://github.com/zju3dv/GeneAvatar)                                             | Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image.                                                    |[arXiv](https://arxiv.org/abs/2404.02152)  |              |  Avatar  |

| [GeneFace++](https://github.com/yerfor/GeneFacePlusPlus)                                       | Generalized and Stable Real-Time 3D Talking Face Generation.                                                                                                     |                                  |              |  Avatar  |

| [Hallo](https://github.com/fudan-generative-vision/hallo)                                      | Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation.                                                        |[arXiv](https://arxiv.org/abs/2406.08801)  |              |  Avatar  |

| [Hallo2](https://github.com/fudan-generative-vision/hallo2)                                    | Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation.                                                |[arXiv](https://arxiv.org/abs/2410.07718)  |              |  Avatar  |

| [HeadSculpt](https://brandonhan.uk/HeadSculpt/)                                                | Crafting 3D Head Avatars with Text.                                                                                             |[arXiv](https://arxiv.org/abs/2306.03038)  |              |  Avatar  |

| [IntrinsicAvatar](https://github.com/taconite/IntrinsicAvatar)                                 | IntrinsicAvatar: Physically Based Inverse Rendering of Dynamic Humans from Monocular Videos via Explicit Ray Tracing.      |[arXiv](https://arxiv.org/abs/2312.05210)  |              |  Avatar  |

| [Linly-Talker](https://github.com/Kedreamix/Linly-Talker)                                      | Digital Avatar Conversational System.                                                                                                                          |                                    |              |  Avatar  |

| [LivePortrait](https://github.com/KwaiVGI/LivePortrait)                                        | LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control.                                              |[arXiv](https://arxiv.org/abs/2407.03168)  |              |  Avatar  |

| [MotionGPT](https://github.com/OpenMotionLab/MotionGPT)                                        | Human Motion as a Foreign Language, a unified motion-language generation model using LLMs.                                 |[arXiv](https://arxiv.org/abs/2306.14795)  |              |  Avatar  |

| [MusePose](https://github.com/TMElyralab/MusePose)                                             | MusePose: a Pose-Driven Image-to-Video Framework for Virtual Human Generation.                                                                                        |                                    |              |  Avatar  |

| [MuseTalk](https://github.com/TMElyralab/MuseTalk)                                             | Real-Time High Quality Lip Synchorization with Latent Space Inpainting.                                                                                        |                                    |              |  Avatar  |

| [MuseV](https://github.com/TMElyralab/MuseV)                                           | Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising.                                                        |                                    |              |  Avatar  |

| [Portrait4D](https://github.com/YuDeng/Portrait-4D)                                            | Learning One-Shot 4D Head Avatar Synthesis using Synthetic Data.                                                                |[arXiv](https://arxiv.org/abs/2311.18729)  |              |  Avatar  |

| [Ready Player Me](https://readyplayer.me/)                                                     | Integrate customizable avatars into your game or app in days.                                                                                                  |                                    |              |  Avatar  |

| [RodinHD](https://rodinhd.github.io/)                                                          | RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models.                                                              |[arXiv](https://arxiv.org/abs/2407.06938)  |              |  Avatar  |

| [StyleAvatar3D](https://github.com/icoz69/StyleAvatar3D)                                       | Leveraging Image-Text Diffusion Models for High-Fidelity 3D Avatar Generation.                                                  |[arXiv](https://arxiv.org/abs/2305.19012)  |              |  Avatar  |

| [Text2Control3D](https://text2control3d.github.io/)                                            | Controllable 3D Avatar Generation in Neural Radiance Fields using Geometry-Guided Text-to-Image Diffusion Model.                |[arXiv](https://arxiv.org/abs/2309.03550)  |              |  Avatar  |

| [Topo4D](https://github.com/XuanchenLi/Topo4D)                                                 | Topology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture.                                                       |[arXiv](https://arxiv.org/abs/2406.00440)  |              |  Avatar  |

| [UnityAIWithChatGPT](https://github.com/haili1234/UnityAIWithChatGPT)                          | Based on Unity, ChatGPT+UnityChan voice interactive display is realized.                                                                                      |                                     |     Unity     |  Avatar  |

| [Vid2Avatar](https://moygcc.github.io/vid2avatar/)                                             | 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition.                                       |[arXiv](https://arxiv.org/abs/2302.11566)  |              |  Avatar  |

| [VLOGGER](https://enriccorona.github.io/vlogger/)                                              | Multimodal Diffusion for Embodied Avatar Synthesis.                                                                                                           |                                     |              |  Avatar  |

| [Wild2Avatar](https://cs.stanford.edu/~xtiange/projects/wild2avatar/)                          | Rendering Humans Behind Occlusions.                                                                                             |[arXiv](https://arxiv.org/abs/2401.00431)  |              |  Avatar  |

^ Back to Contents ^


## Animation

| Source                                                                                      | Description                                                                                                                                                                                    |   Paper   |  Game Engine  |   Type   |

| :------------------------------------------------------------------------------------------ | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-----------: | :-----------: | :-------: |

| [Animate Anyone](https://github.com/HumanAIGC/AnimateAnyone)                                   | Consistent and Controllable Image-to-Video Synthesis for Character Animation.                                                   |[arXiv](https://arxiv.org/abs/2311.17117)  |              | Animation |

| [AnimateAnything](https://animationai.github.io/AnimateAnything/)                              | Fine-Grained Open Domain Image Animation with Motion Guidance.                                                                  |[arXiv](https://arxiv.org/abs/2311.12886)  |              | Animation |

| [AnimateDiff](https://github.com/guoyww/animatediff/)                                          | Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning.                                               |[arXiv](https://arxiv.org/abs/2307.04725)  |              | Animation |

| [AnimateLCM](https://github.com/G-U-N/AnimateLCM)                                              | Let's Accelerate the Video Generation within 4 Steps!                                                                           |[arXiv](https://arxiv.org/abs/2402.00769)  |              | Animation |

| [Animate-X](https://github.com/Lucaria-Academy/Animate-X)                                      | Animate-X: Universal Character Image Animation with Enhanced Motion Representation.                                             |[arXiv](https://arxiv.org/abs/2410.10306)  |              | Animation |

| [AnimateZero](https://vvictoryuki.github.io/animatezero.github.io/)                            | Video Diffusion Models are Zero-Shot Image Animators.                                                                           |[arXiv](https://arxiv.org/abs/2312.03793)  |              | Animation |

| [AnimationGPT](https://github.com/fyyakaxyy/AnimationGPT)                                      | An AIGC tool for generating game combat motion assets.                                                                          |                                                                            |              | Animation |

| [Deforum](https://deforum.art/)                                                                | Deforum leverages Stable Diffusion to generate evolving AI visuals.                                                             |                                                                            |              | Animation |

| [DrawingSpinUp](https://github.com/LordLiang/DrawingSpinUp)                                    | DrawingSpinUp: 3D Animation from Single Character Drawings.                                                                     |[arXiv](https://arxiv.org/abs/2409.08615)  |              | Animation |

| [DreaMoving](https://dreamoving.github.io/dreamoving/)                                         | A Human Video Generation Framework based on Diffusion Models.                                                                   |[arXiv](https://arxiv.org/abs/2312.05107)  |              | Animation |

| [FaceFusion](https://github.com/facefusion/facefusion)                                         | Next generation face swapper and enhancer.                                                                                       |                                                                           |              | Animation |

| [FreeInit](https://tianxingwu.github.io/pages/FreeInit/)                                       | Bridging Initialization Gap in Video Diffusion Models.                                                                          |[arXiv](https://arxiv.org/abs/2312.07537)  |              | Animation |

| [GeneFace](https://github.com/yerfor/GeneFace)                                                 | Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis.                                                           |[arXiv](https://arxiv.org/abs/2301.13430)  |              | Animation |

| [ID-Animator](https://github.com/ID-Animator/ID-Animator)                                      | Zero-Shot Identity-Preserving Human Video Generation.                                                                           |[arXiv](https://arxiv.org/abs/2404.15275)  |              | Animation |

| [MagicAnimate](https://showlab.github.io/magicanimate/)                                        | Temporally Consistent Human Image Animation using Diffusion Model.                                                              |[arXiv](https://arxiv.org/abs/2311.16498)  |              | Animation |

| [NUWA](https://msra-nuwa.azurewebsites.net/#/)                                                 | DragNUWA is an open-domain diffusion-based video generation model takes text, image, and trajectory controls as inputs to achieve controllable video generation.   |[arXiv](https://arxiv.org/abs/2308.08089)  |              | Animation |

| [NUWA-Infinity](https://nuwa-infinity.microsoft.com/#/NUWAInfinity)                            | NUWA-Infinity is a multimodal generative model that is designed to generate high-quality images and videos from given text, image or video input.       |                                                   |              | Animation |

| [NUWA-XL](https://msra-nuwa.azurewebsites.net/#/NUWAXL)                                        | A novel Diffusion over Diffusion architecture for eXtremely Long video generation.                                              |                                                                            |              | Animation |

| [Omni Animation](https://omnianimation.ai/)                                                    | AI Generated High Fidelity Animations.                                                                                          |                                                                            |              | Animation |

| [PIA](https://pi-animator.github.io/)                                                          | Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models.                                             |[arXiv](https://arxiv.org/abs/2312.13964)  |              | Animation |

| [SadTalker](https://github.com/Winfredy/SadTalker)                                             | Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation.                       |[arXiv](https://arxiv.org/abs/2211.12194)  |              | Animation |

| [SadTalker-Video-Lip-Sync](https://github.com/Zz-ww/SadTalker-Video-Lip-Sync)                  | This project is based on SadTalkers Wav2lip for video lip synthesis.                                                           |                                                                             |              | Animation |

| [Stable Animation](https://stability.ai/news/stable-animation-sdk)                             | A powerful text-to-animation tool for developers.                                                                              |                                                                             |              | Animation |

| [TaleCrafter](https://github.com/VideoCrafter/TaleCrafter)                                     | An interactive story visualization tool that support multiple characters.                                                       |[arXiv](https://arxiv.org/abs/2305.18247)  |              | Animation |

| [ToonCrafter](https://github.com/ToonCrafter/ToonCrafter)                                      | ToonCrafter: Generative Cartoon Interpolation.                                                                                  |[arXiv](https://arxiv.org/abs/2405.17933v1)  |              | Animation |

| [Wav2Lip](https://github.com/Rudrabha/Wav2Lip)                                                 | Accurately Lip-syncing Videos In The Wild.                                                                                      |[arXiv](https://arxiv.org/abs/2008.10010)  |              | Animation |

| [Wonder Studio](https://wonderdynamics.com/)                                                   | An AI tool that automatically animates, lights and composes CG characters into a live-action scene.                         |                                                                            |              | Animation |

^ Back to Contents ^


## Visual

| Source                                                                                      | Description                                                                                                                                                                                    |   Paper   |  Game Engine  |   Type   |

| :------------------------------------------------------------------------------------------ | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-----------: | :-----------: | :-------: |

| [Cambrian-1](https://github.com/cambrian-mllm/cambrian)                                     | Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs.                                                            |[arXiv](https://arxiv.org/abs/2406.16860)  |              |   Multimodal LLMs  |

| [CogVLM2](https://github.com/THUDM/CogVLM2)                                                 | GPT4V-level open-source multi-modal model based on Llama3-8B.                                                                                       |                           |              |   Visual  |

| [CoTracker](https://co-tracker.github.io/)                                                  | It is Better to Track Together.                                                                                                      |[arXiv](https://arxiv.org/abs/2307.07635)  |               | Visual |

| [EVF-SAM](https://github.com/hustvl/EVF-SAM)                                                | EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model.                                                      |[arXiv](https://arxiv.org/abs/2406.20076)  |               | Visual |

| [FaceHi](https://m.facehi.ai/)                                                              | It is Better to Track Together.                                                                                                                       |                                           |               | Visual |

| [InternLM-XComposer2](https://github.com/InternLM/InternLM-XComposer)                       | InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.           |[arXiv](https://arxiv.org/abs/2404.06512)  |               | Visual |

| [Kangaroo](https://github.com/KangarooGroup/Kangaroo)                                       | Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input.                                                                        |                                           |               | Visual |

| [LGVI](https://jianzongwu.github.io/projects/rovi/)                                         | Towards Language-Driven Video Inpainting via Multimodal Large Language Models.                                                                         |                                           |               | Visual |

| [LLaVA++](https://github.com/mbzuai-oryx/LLaVA-pp)                                          | Extending Visual Capabilities with LLaMA-3 and Phi-3.                                                                                                     |                                     |              |   Visual  |

| [LLaVA-OneVision](https://github.com/LLaVA-VL/LLaVA-NeXT)                                   | LLaVA-OneVision: Easy Visual Task Transfer.                                                                                           |[arXiv](https://arxiv.org/abs/2408.03326)  |              |   Visual  |

| [LongVA](https://github.com/EvolvingLMMs-Lab/LongVA)                                        | Long Context Transfer from Language to Vision.                                                                                        |[arXiv](https://arxiv.org/abs/2406.16852)  |              |   Visual  |

| [MaskViT](https://maskedvit.github.io/)                                                     | Masked Visual Pre-Training for Video Prediction.                                                                                      |[arXiv](https://arxiv.org/abs/2206.11894)  |              | Visual |

| [MiniCPM-Llama3-V 2.5](https://github.com/OpenBMB/MiniCPM-V)                                | A GPT-4V Level MLLM on Your Phone.                                                                                                                        |                                      |              |   Visual  |

| [MoE-LLaVA](https://github.com/PKU-YuanGroup/MoE-LLaVA)                                     | Mixture of Experts for Large Vision-Language Models.                                                                                  |[arXiv](https://arxiv.org/abs/2401.15947)  |              |   Visual  |

| [MotionLLM](https://github.com/IDEA-Research/MotionLLM)                                     | Understanding Human Behaviors from Human Motions and Videos.                                                                          |[arXiv](https://arxiv.org/abs/2405.20340)  |              |   Visual  |

| [PLLaVA](https://github.com/magic-research/PLLaVA)                                          | Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning.                                                      |[arXiv](https://arxiv.org/abs/2404.16994)  |              |   Visual  |

| [Qwen-VL](https://github.com/QwenLM/Qwen-VL)                                                | A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond.                                          |[arXiv](https://arxiv.org/abs/2308.12966)  |              |   Visual  |

| [Sapiens](https://github.com/facebookresearch/sapiens)                                      | Sapiens: Foundation for Human Vision Models.                                                                                          |[arXiv](https://arxiv.org/abs/2408.12569)  |              |   Visual  |

| [ShareGPT4V](https://github.com/ShareGPT4Omni/ShareGPT4V)                                   | Improving Large Multi-modal Models with Better Captions.                                                                              |[arXiv](https://arxiv.org/abs/2311.12793)  |              |   Visual  |

| [SOLO](https://github.com/Yangyi-Chen/SOLO)                                                 | SOLO: A Single Transformer for Scalable Vision-Language Modeling.                                                                     |[arXiv](https://arxiv.org/abs/2407.06438)  |              |   Visual  |

| [Video-CCAM](https://github.com/QQ-MM/Video-CCAM)                                           | Video-CCAM: Advancing Video-Language Understanding with Causal Cross-Attention Masks.                                                                                          |  |              |   Visual  |

| [Video-LLaVA](https://github.com/PKU-YuanGroup/Video-LLaVA)                                 | Learning United Visual Representation by Alignment Before Projection.                                                                 |[arXiv](https://arxiv.org/abs/2311.10122)  |              |   Visual  |

| [VideoLLaMA 2](https://github.com/DAMO-NLP-SG/VideoLLaMA2)                                  | Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs.                                                            |[arXiv](https://arxiv.org/abs/2406.07476)  |              |   Visual  |

| [VideoLLaMA 3](https://github.com/DAMO-NLP-SG/VideoLLaMA3)                                  | VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding.                                                |[arXiv](https://arxiv.org/abs/2501.13106)  |              |   Visual  |

| [Video-MME](https://github.com/BradyFU/Video-MME)                                           | The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis.                                              |[arXiv](https://arxiv.org/abs/2405.21075)  |              |   Visual  |

| [Vitron](https://github.com/SkyworkAI/Vitron)                                               | A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing.                                                                      |                                      |              |   Visual  |

| [VILA](https://github.com/NVlabs/VILA)                                                      | VILA: On Pre-training for Visual Language Models.                                                                                     |[arXiv](https://arxiv.org/abs/2312.07533)  |              |   Visual  |

^ Back to Contents ^


## Video

| Source                                                                                      | Description                                                                                                                                                                                    |   Paper   |  Game Engine  |   Type   |

| :------------------------------------------------------------------------------------------ | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-----------: | :-----------: | :-------: |

| [360DVD](https://akaneqwq.github.io/360DVD/)                                                   | Controllable Panorama Video Generation with 360-Degree Video Diffusion Model.                                                |[arXiv](https://arxiv.org/abs/2401.06578)     |              |   Video   |

| [Animate-A-Story](https://github.com/VideoCrafter/Animate-A-Story)                             | Retrieval-Augmented Video Generation for Telling a Story.                                                                    |[arXiv](https://arxiv.org/abs/2307.06940)     |              |   Video   |

| [Anything in Any Scene](https://anythinginanyscene.github.io/)                                 | Photorealistic Video Object Insertion.                                                                                                                                           |               |              |   Video   |

| [ART•V](https://warranweng.github.io/art.v/)                                                   | Auto-Regressive Text-to-Video Generation with Diffusion Models.                                                              |[arXiv](https://arxiv.org/abs/2311.18834)     |              |   Video   |

| [Assistive](https://assistive.chat/product/video)                                              | Meet the generative video platform that brings your ideas to life.                                                                                                              |                |              |   Video   |

| [AtomoVideo](https://atomo-video.github.io/)                                                   | High Fidelity Image-to-Video Generation.                                                                                     |[arXiv](https://arxiv.org/abs/2403.01800)     |              |   Video   |

| [BackgroundRemover](https://github.com/nadermx/backgroundremover)                              | Background Remover lets you Remove Background from images and video using AI with a simple command line interface that is free and open source.                           |                |              |   Video   |

| [Boximator](https://boximator.github.io/)                                                      | Generating Rich and Controllable Motions for Video Synthesis.                                                                |[arXiv](https://arxiv.org/abs/2402.01566)     |              |   Video   |

| [CoDeF](https://github.com/qiuyu96/codef)                                                      | Content Deformation Fields for Temporally Consistent Video Processing.                                                       |[arXiv](https://arxiv.org/abs/2308.07926)     |              |   Video   |

| [CogVideo](https://models.aminer.cn/cogvideo/)                                                 | Generate Videos from Text Descriptions.                                                                                                                                           |              |              |   Video   |

| [CogVideoX](https://github.com/THUDM/CogVideo)                                                 | CogVideoX is an open-source version of the video generation model, which is homologous to 清影.                                                                                   |              |              |   Video   |

| [CogVLM](https://github.com/THUDM/CogVLM)                                                      | CogVLM is a powerful open-source visual language model (VLM).                                                                                                                    |               |              |   Visual   |

| [CoNR](https://github.com/megvii-research/CoNR)                                                | Genarate vivid dancing videos from hand-drawn anime character sheets(ACS).                                                   |[arXiv](https://arxiv.org/abs/2207.05378)     |              |   Video   |

| [Decohere](https://www.decohere.ai/)                                                           | Create what can't be filmed.                                                                                                                                                      |              |              |   Video   |

| [Descript](https://www.descript.com/)                                                          | Descript is the simple, powerful , and fun way to edit.                                                                                                                           |              |              |   Video   |

| [Diffutoon](https://github.com/modelscope/DiffSynth-Studio)                                    | High-Resolution Editable Toon Shading via Diffusion Models.                                                                  |[arXiv](https://arxiv.org/abs/2401.16224)     |              |   Video   |

| [dolphin](https://github.com/kaleido-lab/dolphin)                                              | General video interaction platform based on LLMs.                                                                                                                                 |              |              |   Video   |

| [DomoAI](https://domoai.app/)                                                                  | Amplify Your Creativity with DomoAI.                                                                                                                                             |               |              |   Video   |

| [DreamCinema](https://github.com/chen-wl20/DreamCinema)                                        | DreamCinema: Cinematic Transfer with Free Camera and 3D Character.                                                           |[arXiv](https://www.arxiv.org/abs/2408.12601)     |              |   Video   |

| [DynamiCrafter](https://doubiiu.github.io/projects/DynamiCrafter/)                             | Animating Open-domain Images with Video Diffusion Priors.                                                                    |[arXiv](https://arxiv.org/abs/2310.12190)     |              |   Video   |

| [EDGE](https://github.com/Stanford-TML/EDGE)                                                   | We introduce EDGE, a powerful method for editable dance generation that is capable of creating realistic, physically-plausible dances while remaining faithful to arbitrary input music.  |[arXiv](https://arxiv.org/abs/2211.10658)     |              |   Video   |

| [EMO](https://humanaigc.github.io/emote-portrait-alive/)                                       | Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions.       |[arXiv](https://arxiv.org/abs/2402.17485)     |              |   Video   |

| [Emu Video](https://emu-video.metademolab.com/)                                                | Factorizing Text-to-Video Generation by Explicit Image Conditioning.                                                                                                             |               |              |   Video   |

| [Etna](https://etna.7volcanoes.com/)                                                           | Etna can generate corresponding video content based on short text descriptions.                                                                                                   |              |              |   Video   |

| [Fairy](https://fairy-video2video.github.io/)                                                  | Fast Parallelized Instruction-Guided Video-to-Video Synthesis.                                                                                                                    |              |              |   Video   |

| [Follow-Your-Canvas](https://github.com/mayuelala/FollowYourCanvas)                            | Follow-Your-Canvas: Higher-Resolution Video Outpainting with Extensive Content Generation.                        |[arXiv](https://arxiv.org/abs/2409.01055)     |              |   Video   |

| [Follow Your Pose](https://follow-your-pose.github.io/)                                        | Pose-Guided Text-to-Video Generation using Pose-Free Videos.                                                                 |[arXiv](https://arxiv.org/abs/2304.01186)     |              |   Video   |

| [FullJourney](https://www.fulljourney.ai/)                                                     | Your complete suite of AI Creation tools at your fingertips.                                                                                                                       |             |              |   Video   |

| [Gen-2](https://research.runwayml.com/gen2)                                                    | A multi-modal AI system that can generate novel videos with text, images, or video clips.                                                                                          |             |              |   Video   |

| [Generative Dynamics](https://generative-dynamics.github.io/)                                  | Generative Image Dynamics.                                                                                                                                                         |             |              |   Video   |

| [Genie](https://sites.google.com/view/genie-2024/home)                                         | Generative Interactive Environments.                                                                                         |[arXiv](https://arxiv.org/abs/2402.15391)     |              |   Video   |

| [Genmo](https://www.genmo.ai/create/video)                                                     | Magically make videos with AI.                                                                                                                                                      |            |              |   Video   |

| [GenTron](https://www.shoufachen.com/gentron_website/)                                         | Diffusion Transformers for Image and Video Generation.                                                                                                                              |            |              |   Video   |

| [HiGen](https://higen-t2v.github.io/)                                                          | Hierarchical Spatio-temporal Decoupling for Text-to-Video generation.                                                                                                               |            |              |   Video   |

| [Hotshot-XL](https://github.com/hotshotco/Hotshot-XL)                                          | Hotshot-XL is an AI text-to-GIF model trained to work alongside Stable Diffusion XL.                                                                                              |              |              |   Video   |

| [HunyuanVideo](https://github.com/Tencent/HunyuanVideo)                                        | HunyuanVideo: A Systematic Framework For Large Video Generation Model.                                                                                         |[arXiv](https://arxiv.org/abs/2412.03603)     |              |   Video   |

| [Imagen Video](https://imagen.research.google/video/)                                          | Given a text prompt, Imagen Video generates high definition videos using a base video generation model and a sequence of interleaved spatial and temporal video super-resolution models. |   |              |   Video   |

| [InstructVideo](https://instructvideo.github.io/)                                              | Instructing Video Diffusion Models with Human Feedback.                                                                      |[arXiv](https://arxiv.org/abs/2312.12490)     |              |   Video   |

| [I2VGen-XL](https://i2vgen-xl.github.io/)                                                      | High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models.                                                         |[arXiv](https://arxiv.org/abs/2311.04145)     |              |   Video   |

| [LaVie](https://vchitect.github.io/LaVie-project/)                                             | High-Quality Video Generation with Cascaded Latent Diffusion Models.                                                         |[arXiv](https://arxiv.org/abs/2309.15103)     |              |   Video   |

| [LTX Studio](https://ltx.studio/)                                                              | LTX Studio is a holistic, AI-driven filmmaking platform for creators, marketers, filmmakers and studios.                                                                 |              |              |   Video   |

| [LTX-Video](https://github.com/Lightricks/LTX-Video)                                           | LTX-Video is the first DiT-based video generation model that can generate high-quality videos in real-time. It can generate 24 FPS videos at 768x512 resolution, faster than it takes to watch them.                      |              |              |   Video   |

| [Lumiere](https://lumiere-video.github.io/)                                                    | A Space-Time Diffusion Model for Video Generation.                                                                           |[arXiv](https://arxiv.org/abs/2401.12945)     |              |   Video   |

| [LVDM](https://yingqinghe.github.io/LVDM/)                                                     | Latent Video Diffusion Models for High-Fidelity Long Video Generation.                                                       |[arXiv](https://arxiv.org/abs/2211.13221)     |              |   Video   |

| [MagicVideo](https://magicvideo.github.io/)                                                    | Efficient Video Generation With Latent Diffusion Models.                                                                     |[arXiv](https://arxiv.org/abs/2211.11018)     |              |   Video   |

| [MagicVideo-V2](https://magicvideov2.github.io/)                                               | Multi-Stage High-Aesthetic Video Generation.                                                                                 |[arXiv](https://arxiv.org/abs/2401.04468)     |              |   Video   |

| [Magic Hour](https://magichour.ai/)                                                            | AI Video for Creators made simple.                                                                                                                                                  |            |              |   Video   |

| [MAGVIT-v2](https://magvit.cs.cmu.edu/v2/)                                                     | Tokenizer is key to visual generation.                                                                                                                                              |            |              |   Video   |

| [MAGVIT](https://magvit.cs.cmu.edu/)                                                           | Masked Generative Video Transformer.                                                                                                                                                |            |              |   Video   |

| [Make-A-Video](https://makeavideo.studio/)                                                     | Make-A-Video is a state-of-the-art AI system that generates videos from text.                                                |[arXiv](https://arxiv.org/abs/2209.14792)     |              |   Video   |

| [Make Pixels Dance](https://makepixelsdance.github.io/)                                        | High-Dynamic Video Generation.                                                                                               |[arXiv](https://arxiv.org/abs/2311.10982)     |              |   Video   |

| [Make-Your-Video](https://doubiiu.github.io/projects/Make-Your-Video/)                         | Customized Video Generation Using Textual and Structural Guidance.                                                           |[arXiv](https://arxiv.org/abs/2306.00943)     |              |   Video   |

| [MicroCinema](https://wangyanhui666.github.io/MicroCinema.github.io/)                          | A Divide-and-Conquer Approach for Text-to-Video Generation.                                                                  |[arXiv](https://arxiv.org/abs/2311.18829)     |              |   Video   |

| [MIMO](https://github.com/menyifang/MIMO)                                                      | MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling.                                               |[arXiv](https://arxiv.org/abs/2409.16160)     |              |   Video   |

| [Mini-Gemini](https://github.com/dvlab-research/MiniGemini)                                    | Mining the Potential of Multi-modality Vision Language Models.                                                                                                                     |             |              |   Vision   |

| [MobileVidFactory](https://arxiv.org/abs/2307.16371)                                           | Automatic Diffusion-Based Social Media Video Generation for Mobile Devices from Text.                                                                                              |             |              |   Video   |

| [Mochi 1](https://github.com/genmoai/models)                                                   | Mochi 1 is an open state-of-the-art video generation model with high-fidelity motion and strong prompt adherence in preliminary evaluation.                                       |             |              |   Video   |

| [MOFA-Video](https://github.com/MyNiuuu/MOFA-Video)                                            | Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model.             |[arXiv](https://arxiv.org/abs/2405.20222)     |              |   Video   |

| [MoneyPrinterTurbo](https://github.com/harry0703/MoneyPrinterTurbo)                            | Use large models to generate short videos with one click.                                                                                                                           |            |              |   Video   |

| [Moonvalley](https://moonvalley.ai/)                                                           | Moonvalley is a groundbreaking new text-to-video generative AI model.                                                                                                               |            |              |   Video   |

| [Mora](https://github.com/lichao-sun/Mora)                                                     | More like Sora for Generalist Video Generation.                                                                              |[arXiv](https://arxiv.org/abs/2403.13248)     |              |   Video   |

| [Morph Studio](https://www.morphstudio.com/)                                                   | With our Text-to-Video AI Magic, manifest your creativity through your prompt.                                                                                                     |             |              |   Video   |

| [MotionClone](https://github.com/Bujiazi/MotionClone)                                          | MotionClone: Training-Free Motion Cloning for Controllable Video Generation.                                                 |[arXiv](https://arxiv.org/abs/2406.05338)     |              |   Video   |

| [MotionCtrl](https://wzhouxiff.github.io/projects/MotionCtrl/)                                 | A Unified and Flexible Motion Controller for Video Generation.                                                               |[arXiv](https://arxiv.org/abs/2312.03641)     |              |   Video   |

| [MotionDirector](https://github.com/showlab/MotionDirector)                                    | Motion Customization of Text-to-Video Diffusion Models.                                                                      |[arXiv](https://arxiv.org/abs/2310.08465)     |              |   Video   |

| [Motionshop](https://aigc3d.github.io/motionshop/)                                             | An application of replacing the characters in video with 3D avatars.                                                                                                               |             |              |   Video   |

| [Mov2mov](https://github.com/Scholar01/sd-webui-mov2mov)                                       | Mov2mov plugin for Automatic1111/stable-diffusion-webui.                                                                                                                            |            |              |   Video   |

| [MovieFactory](https://arxiv.org/abs/2306.07257)                                               | Automatic Movie Creation from Text using Large Generative Models for Language and Images.                                    |[arXiv](https://arxiv.org/abs/2306.07257)     |              |   Video   |

| [Neural Frames](https://www.neuralframes.com/)                                                 | Discover the synthesizer for the visual world.                                                                                                                                      |            |              |   Video   |

| [NeverEnds](https://neverends.life/)                                                           | Create your world.                                                                                                                                                                   |           |              |   Video   |

| [Open-Sora](https://github.com/hpcaitech/Open-Sora)                                            | Democratizing Efficient Video Production for All.                                                                                                                                    |           |              |   Video   |

| [Open-Sora](https://github.com/PKU-YuanGroup/Open-Sora-Plan)                                   | Open-Sora Plan.                                                                                                                                                                    |            |              |   Video   |

| [Phenaki](https://phenaki.video/)                                                              | A model for generating videos from text, with prompts that can change over time, and videos that can be as long as multiple minutes.     |[arXiv](https://arxiv.org/abs/2210.02399)     |              |   Video   |

| [Pika Labs](https://www.pika.art/)                                                             | Pika Labs is revolutionizing video-making experience with AI.                                                                                                                      |             |              |   Video   |

| [Pixeling](https://hidream.ai/#/Pixeling)                                                      | Pixeling empowers our customers to create highly precise, ultra-realistic, and extremely controllable visual content including images, videos and 3D models.                       |             |              |   Video   |

| [PixVerse](https://app.pixverse.ai)                                                            | Create breath-taking videos with AI.                                                                                                                                                |            |              |   Video   |

| [Pollinations](https://pollinations.ai/c/Video)                                                | Creating gets easy, fast, and fun.                                                                                                                                                  |            |              |   Video   |

| [Reuse and Diffuse](https://anonymous0x233.github.io/ReuseAndDiffuse/)                         | Iterative Denoising for Text-to-Video Generation.                                                                            |[arXiv](https://arxiv.org/abs/2309.03549)     |              |   Video   |

| [Ruyi](https://github.com/IamCreateAI/Ruyi-Models)                                             | Ruyi is an image-to-video model capable of generating cinematic-quality videos at a resolution of 768, with a frame rate of 24 frames per second, totaling 5 seconds and 120 frames.             |              |              |   Video   |

| [ShortGPT](https://github.com/RayVentura/ShortGPT)                                             | An experimental AI framework for automated short/video content creation.                                                                                                            |            |              |   Video   |

| [Show-1](https://showlab.github.io/Show-1/)                                                    | Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation.                                                     |[arXiv](https://arxiv.org/abs/2309.15818)     |              |   Video   |

| [Step-Video-T2V](https://github.com/stepfun-ai/Step-Video-T2V)                                 | Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model.                             |[arXiv](https://arxiv.org/abs/2502.10248)     |              |   Video   |

| [SkyReels-A1](https://github.com/SkyworkAI/SkyReels-A1)                                        | SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers.                                                  |[arXiv](https://arxiv.org/abs/2502.10841)     |              |   Video   |

| [SkyReels-V1](https://github.com/SkyworkAI/SkyReels-V1)                                        | SkyReels V1: Human-Centric Video Foundation Model.                                                                                                                          |            |              |   Video   |

| [Snap Video](https://snap-research.github.io/snapvideo/)                                       | Scaled Spatiotemporal Transformers for Text-to-Video Synthesis.                                                              |[arXiv](https://arxiv.org/abs/2402.14797)     |              |   Video   |

| [Sora](https://openai.com/sora)                                                                | Creating video from text.                                                                                                                                                           |            |              |   Video   |

| [SoraWebui](https://github.com/SoraWebui/SoraWebui)                                            | SoraWebui is an open-source Sora web client, enabling users to easily create videos from text with OpenAI's Sora model.                                                            |            |              |   Video   |

| [StableVideo](https://github.com/rese1f/stablevideo)                                           | Text-driven Consistency-aware Diffusion Video Editing.                                                                                                                              |            |              |   Video   |

| [Stable Video Diffusion](https://github.com/Stability-AI/generative-models)                    | Stable Video Diffusion (SVD) Image-to-Video.                                                                                                                                         |           |              |   Video   |

| [StoryDiffusion](https://github.com/HVision-NKU/StoryDiffusion)                                | Consistent Self-Attention for Long-Range Image and Video Generation.                                                         |[arXiv](https://arxiv.org/abs/2405.01434)     |              |   Video   |

| [StreamingT2V](https://github.com/Picsart-AI-Research/StreamingT2V)                            | Consistent, Dynamic, and Extendable Long Video Generation from Text.                                                         |[arXiv](https://arxiv.org/abs/2403.14773)     |              |   Video   |

| [StyleCrafter](https://gongyeliu.github.io/StyleCrafter.github.io/)                            | nhancing Stylized Text-to-Video Generation with Style Adapter.                                                               |[arXiv](https://arxiv.org/abs/2312.00330)     |              |   Video   |

| [TATS](https://songweige.github.io/projects/tats/index.html)                                   | Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer.                                                                                                       |           |              |   Video   |

| [Text2Video-Zero](https://github.com/Picsart-AI-Research/Text2Video-Zero)                      | Text-to-Image Diffusion Models are Zero-Shot Video Generators.                                                               |[arXiv](https://arxiv.org/abs/2303.13439)     |              |   Video   |

| [TF-T2V](https://tf-t2v.github.io/)                                                            | A Recipe for Scaling up Text-to-Video Generation with Text-free Videos.                                                      |[arXiv](https://arxiv.org/abs/2312.15770)     |              |   Video   |

| [Tora](https://github.com/ali-videoai/Tora)                                                    | Tora: Trajectory-oriented Diffusion Transformer for Video Generation.                                                        |[arXiv](https://arxiv.org/abs/2407.21705)     |              |   Video   |

| [Track-Anything](https://github.com/gaomingqi/Track-Anything)                                  | Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything and XMem.           |[arXiv](https://arxiv.org/abs/2304.11968)     |              |   Video   |

| [Tune-A-Video](https://github.com/showlab/Tune-A-Video)                                        | One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation.                                                      |[arXiv](https://arxiv.org/abs/2212.11565)     |              |   Video   |

| [TwelveLabs](https://www.twelvelabs.io/)                                                       | Multimodal AI that understands videos like humans.                                                                                                                                  |            |              |   Video   |

| [UniVG](https://univg-baidu.github.io/)                                                        | Towards UNIfied-modal Video Generation.                                                                                                                                             |            |              |   Video   |

| [Vchitect-2.0](https://github.com/Vchitect/Vchitect-2.0)                                       | Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models.                                                                                                           |            |              |   Video   |

| [VGen](https://github.com/ali-vilab/i2vgen-xl)                                                 | A holistic video generation ecosystem for video generation building on diffusion models.                                     |[arXiv](https://arxiv.org/abs/2311.04145)     |              |   Video   |

| [ViewCrafter](https://github.com/Drexubery/ViewCrafter)                                        | ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis.                                           |[arXiv](https://arxiv.org/abs/2409.02048)     |              |   Video   |

| [Video-ChatGPT](https://github.com/mbzuai-oryx/Video-ChatGPT)                                  | Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos.                      |[arXiv](https://arxiv.org/abs/2306.05424)     |              |   Video   |

| [VideoComposer](https://videocomposer.github.io/)                                              | Compositional Video Synthesis with Motion Controllability.                                                                   |[arXiv](https://arxiv.org/abs/2306.02018)     |              |   Video   |

| [VideoCrafter1](https://arxiv.org/abs/2310.19512)                                              | Open Diffusion Models for High-Quality Video Generation.                                                                     |[arXiv](https://arxiv.org/abs/2310.19512)     |              |   Video   |

| [VideoCrafter2](https://ailab-cvc.github.io/videocrafter2/)                                    | Overcoming Data Limitations for High-Quality Video Diffusion Models.                                                         |[arXiv](https://arxiv.org/abs/2401.09047)     |              |   Video   |

| [VideoDrafter](https://videodrafter.github.io/)                                                | Content-Consistent Multi-Scene Video Generation with LLM.                                                                    |[arXiv](https://arxiv.org/abs/2401.01256)     |              |   Video   |

| [VideoElevator](https://github.com/YBYBZhang/VideoElevator)                                    | Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models.                                            |[arXiv](https://arxiv.org/abs/2403.05438)     |              |   Video   |

| [VideoFactory](https://arxiv.org/abs/2305.10874)                                               | Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation.                                                                                                         |              |              |   Video   |

| [VideoGen](https://videogen.github.io/VideoGen/)                                               | A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation.                        |[arXiv](https://arxiv.org/abs/2309.00398)     |              |   Video   |

| [VideoLCM](https://arxiv.org/abs/2312.09109)                                                   | Video Latent Consistency Model.                                                                                              |[arXiv](https://arxiv.org/abs/2312.09109)     |              |   Video   |

| [Video LDMs](https://research.nvidia.com/labs/toronto-ai/VideoLDM/)                            | Align your Latents: High- resolution Video Synthesis with Latent Diffusion Models.                                           |[arXiv](https://arxiv.org/abs/2304.08818)     |              |   Video   |

| [Video-LLaVA](https://github.com/PKU-YuanGroup/Video-LLaVA)                                    | Learning United Visual Representation by Alignment Before Projection.                                                        |[arXiv](https://arxiv.org/abs/2311.10122)     |              |   Video   |

| [VideoMamba](https://github.com/OpenGVLab/VideoMamba)                                          | State Space Model for Efficient Video Understanding.                                                                         |[arXiv](https://arxiv.org/abs/2403.06977)     |              |   Video   |

| [Video-of-Thought](https://github.com/scofield7419/Video-of-Thought)                           | Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition.                                                                                                       |             |              |   Video   |

| [VideoPoet](https://sites.research.google/videopoet/)                                          | A large language model for zero-shot video generation.                                                                       |[arXiv](https://arxiv.org/abs/2312.14125)     |              |   Video   |

| [Vispunk Motion](https://vispunk.com/video)                                                    | Create realistic videos using just text.                                                                                                                                           |             |              |   Video   |

| [VisualRWKV](https://github.com/howard-hou/VisualRWKV)                                         | VisualRWKV is the visual-enhanced version of the RWKV language model, enabling RWKV to handle various visual tasks.                                                            |              |              |   Visual   |

| [V-JEPA](https://github.com/facebookresearch/jepa)                                             | Video Joint Embedding Predictive Architecture.                                                                               |[arXiv](https://arxiv.org/abs/2404.08471)     |              |   Video   |

| [W.A.L.T](https://walt-video-diffusion.github.io/)                                             | Photorealistic Video Generation with Diffusion Models.                                                                       |[arXiv](https://arxiv.org/abs/2312.06662)     |              |   Video   |

| [Wan2.1](https://github.com/Wan-Video/Wan2.1)                                                  | Wan: Open and Advanced Large-Scale Video Generative Models.                                                                                                                      |               |              |   Video   |

| [Zeroscope](https://huggingface.co/spaces/fffiloni/zeroscope)                                  | Zeroscope Text-to-Video.                                                                                                                                                         |               |              |   Video   |

^ Back to Contents ^


## Audio

| Source                                                                                      | Description                                                                                                                                                                                    |   Paper   |  Game Engine  |   Type   |

| :------------------------------------------------------------------------------------------ | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-----------: | :-----------: | :-------: |

| [AcademiCodec](https://github.com/yangdongchao/AcademiCodec)                                   | An Open Source Audio Codec Model for Academic Research.                                                                                                                  |  |              |   Audio   |

| [Amphion](https://github.com/open-mmlab/Amphion)                                               | An Open-Source Audio, Music, and Speech Generation Toolkit.                                                                 |[arXiv](https://arxiv.org/abs/2312.09911)      |              |   Audio   |

| [ArchiSound](https://github.com/archinetai/audio-diffusion-pytorch)                            | Audio generation using diffusion models, in PyTorch.                                                                                                                           |                  |              |   Audio   |

| [Audiobox](https://audiobox.metademolab.com/)                                                  | Unified Audio Generation with Natural Language Prompts.                                                                                                                        |                  |              |   Audio   |

| [AudioEditing](https://github.com/HilaManor/AudioEditingCode)                                  | Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion.                                                   |[arXiv](https://arxiv.org/abs/2402.10009)      |              |   Audio   |

| [Audiogen Codec](https://github.com/AudiogenAI/agc)                                            | A low compression 48khz stereo neural audio codec for general audio, optimizing for audio fidelity 🎵.                                                                         |                  |              |   Audio   |

| [AudioGPT](https://github.com/AIGC-Audio/AudioGPT)                                             | Understanding and Generating Speech, Music, Sound, and Talking Head.                                                        |[arXiv](https://arxiv.org/abs/2304.12995)      |              |   Audio   |

| [AudioLCM](https://github.com/liuhuadai/AudioLCM)                                              | Text-to-Audio Generation with Latent Consistency Models.                                                                    |[arXiv](https://arxiv.org/abs/2406.00356v1)      |              |   Audio   |

| [AudioLDM](https://audioldm.github.io/)                                                        | Text-to-Audio Generation with Latent Diffusion Models.                                                                      |[arXiv](https://arxiv.org/abs/2301.12503)      |              |   Audio   |

| [AudioLDM 2](https://github.com/haoheliu/audioldm2)                                            | Learning Holistic Audio Generation with Self-supervised Pretraining.                                                        |[arXiv](https://arxiv.org/abs/2308.05734)      |              |   Audio   |

| [Auffusion](https://github.com/happylittlecat2333/Auffusion)                                   | Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation.                                   |[arXiv](https://arxiv.org/abs/2401.01044)      |              |   Audio   |

| [CTAG](https://github.com/PapayaResearch/ctag)                                                 | Creative Text-to-Audio Generation via Synthesizer Programming.                                                                                                                     |              |              |   Audio   |

| [FoleyCrafter](https://github.com/open-mmlab/FoleyCrafter)                                     | FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds.                                            |[arXiv](https://arxiv.org/abs/2407.01494)      |              |   Audio   |

| [MAGNeT](https://pages.cs.huji.ac.il/adiyoss-lab/MAGNeT/)                                      | Masked Audio Generation using a Single Non-Autoregressive Transformer.                                                                                                             |              |              |   Audio   |

| [Make-An-Audio](https://text-to-audio.github.io/)                                              | Text-To-Audio Generation with Prompt-Enhanced Diffusion Models.                                                             |[arXiv](https://arxiv.org/abs/2301.12661)      |              |   Audio   |

| [Make-An-Audio 3](https://github.com/Text-to-Audio/Make-An-Audio-3)                            | Transforming Text into Audio via Flow-based Large Diffusion Transformers.                                                   |[arXiv](https://arxiv.org/abs/2305.18474)      |              |   Audio   |

| [NeuralSound](https://github.com/hellojxt/NeuralSound)                                         | Learning-based Modal Sound Synthesis with Acoustic Transfer.                                                                |[arXiv](https://arxiv.org/abs/2108.07425)      |              |   Audio   |

| [OptimizerAI](https://www.optimizerai.xyz/)                                                    | Sounds for Creators, Game makers, Artists, Video makers.                                                                    |            |              |   Audio   |

| [Qwen2-Audio](https://github.com/QwenLM/Qwen2-Audio)                                           | Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.                                         |[arXiv](https://arxiv.org/abs/2407.10759)      |              |   Audio   |

| [SEE-2-SOUND](https://github.com/see2sound/see2sound)                                          | Zero-Shot Spatial Environment-to-Spatial Sound.                                                                             |[arXiv](https://arxiv.org/abs/2406.06612)      |              |   Audio   |

| [SoundStorm](https://google-research.github.io/seanet/soundstorm/examples/)                    | Efficient Parallel Audio Generation.                                                                                        |[arXiv](https://arxiv.org/abs/2305.09636)      |              |   Audio   |

| [Stable Audio](https://www.stableaudio.com/)                                                   | Fast Timing-Conditioned Latent Audio Diffusion.                                                                                                                                      |            |              |   Audio   |

| [Stable Audio Open](https://huggingface.co/stabilityai/stable-audio-open-1.0)                  | Stable Audio Open 1.0 generates variable-length (up to 47s) stereo audio at 44.1kHz from text prompts.                                                                              |            |              |   Audio   |

| [SyncFusion](https://github.com/mcomunita/syncfusion)                                          | SyncFusion: Multimodal Onset-synchronized Video-to-Audio Foley Synthesis.                                                   |[arXiv](https://arxiv.org/abs/2310.15247)      |              |   Audio   |

| [TANGO](https://github.com/declare-lab/tango)                                                  | Text-to-Audio Generation using Instruction Tuned LLM and Latent Diffusion Model.                                                                                                      |           |              |   Audio   |

| [VTA-LDM](https://github.com/ariesssxu/vta-ldm)                                                | Video-to-Audio Generation with Hidden Alignment.                                                                            |[arXiv](https://arxiv.org/abs/2407.07464)      |              |   Audio   |

| [WavJourney](https://github.com/Audio-AGI/WavJourney)                                          | Compositional Audio Creation with Large Language Models.                                                                    |[arXiv](https://arxiv.org/abs/2307.14335)      |              |   Audio   |

^ Back to Contents ^


## Music

| Source                                                                                      | Description                                                                                                                                                                                    |   Paper   |  Game Engine  |   Type   |

| :------------------------------------------------------------------------------------------ | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-----------: | :-----------: | :-------: |

| [AIVA](https://www.aiva.ai/)                                                                   | The Artificial Intelligence composing emotional soundtrack music.                                                                                             |                                    |              |   Music   |

| [Amper Music](https://www.shutterstock.com/discover/ampermusic)                                | Custom music generation technology powered by Amper.                                                                                                          |                                    |              |   Music   |

| [Boomy](https://boomy.com/)                                                                    | Create generative music. Share it with the world.                                                                                                             |                                    |              |   Music   |

| [ChatMusician](https://shanghaicannon.github.io/ChatMusician/)                                 | Fostering Intrinsic Musical Abilities Into LLM.                                                                                                              |                                     |              |   Music   |

| [Chord2Melody](https://github.com/tanreinama/chord2melody)                                     | Automatic Music Generation AI.                                                                                                                               |                                     |              |   Music   |

| [Diff-BGM](https://github.com/sizhelee/Diff-BGM)                                               | A Diffusion Model for Video Background Music Generation.                                                                   | [arXiv](https://arxiv.org/abs/2405.11913)      |              |   Music   |

| [FluxMusic](https://github.com/feizc/FluxMusic)                                                | FluxMusic: Text-to-Music Generation with Rectified Flow Transformer.                                                       | [arXiv](https://arxiv.org/abs/2409.00587)      |              |   Music   |

| [GPTAbleton](https://github.com/BurnedGuitarist/GPTAbleton)                                    | Draft script for processing GPT response and sending the MIDI notes into the Ableton clips with AbletonOSC and python-osc.                                |                                   |              |   Music   |

| [HeyMusic.AI](https://heymusic.ai/zh)                                                          | AI Music Generator                                                                                                                                             |                                   |              |   Music   |

| [Image to Music](https://imagetomusic.top/)                                                    | AI Image to Music Generator is a tool that uses artificial intelligence to convert images into music.                                                          |                                   |              |   Music   |

| [JEN-1](https://www.futureverse.com/research/jen/demos/jen1)                                   | Text-Guided Universal Music Generation with Omnidirectional Diffusion Models.                                                                                  |                                   |              |   Music   |

| [Jukebox](https://github.com/openai/jukebox)                                                   | A Generative Model for Music.                                                                                              | [arXiv](https://arxiv.org/abs/2005.00341)      |              |   Music   |

| [Magenta](https://github.com/magenta/magenta)                                                  | Magenta is a research project exploring the role of machine learning in the process of creating art and music.                                              |                                   |              |   Music   |

| [MeLoDy](https://efficient-melody.github.io/)                                                  | Efficient Neural Music Generation                                                                                                                                |                                 |              |   Music   |

| [Mubert](https://mubert.com/)                                                                  | AI Generative Music.                                                                                                                                             |                                 |              |   Music   |

| [MuseNet](https://openai.com/research/musenet)                                                 | A deep neural network that can generate 4-minute musical compositions with 10 different instruments, and can combine styles from country to Mozart to the Beatles.     |                         |              |   Music   |

| [MusicGen](https://github.com/facebookresearch/audiocraft)                                     | Simple and Controllable Music Generation.                                                                                  | [arXiv](https://arxiv.org/abs/2306.05284)      |              |   Music   |

| [MusicLDM](https://musicldm.github.io/)                                                        | Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies.                                     | [arXiv](https://arxiv.org/abs/2308.01546)      |              |   Music   |

| [MusicLM](https://google-research.github.io/seanet/musiclm/examples/)                          | Generating Music From Text.                                                                                                | [arXiv](https://arxiv.org/abs/2301.11325)      |              |   Music   |

| [Riffusion App](https://github.com/riffusion/riffusion-app)                                    | Riffusion is an app for real-time music generation with stable diffusion.                                                                                           |                              |              |   Music   |

| [Sonauto](https://sonauto.ai/Home)                                                             | Sonauto is an AI music editor that turns prompts, lyrics, or melodies into full songs in any style.                                                                 |                             |              |   Music   |

| [SoundRaw](https://soundraw.io/)                                                               | AI music generator for creators.                                                                                                                                     |                             |              |   Music   |

| [Soundry AI](https://soundry.ai/)                                                              | Generative AI tools including text-to-sound and infinite sample packs.                                                                                               |                             |              |   Music   |

| [YuE](https://github.com/multimodal-art-projection/YuE)                                        | YuE: Open Full-song Generation Foundation Model, something similar to Suno.ai but open.                                                                              |                             |              |   Music   |

^ Back to Contents ^


## Singing Voice

| Source                                                                                      | Description                                                                                                                                                                                    |   Paper   |  Game Engine  |   Type   |

| :------------------------------------------------------------------------------------------ | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-----------: | :-----------: | :-------: |

| [DiffSinger](https://github.com/MoonInTheRiver/DiffSinger)                                     | Singing Voice Synthesis via Shallow Diffusion Mechanism.                                                                   | [arXiv](https://arxiv.org/abs/2105.02446)      |              |   Singing Voice   |

| [Retrieval-based-Voice-Conversion-WebUI](https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI)                                     | An easy-to-use SVC framework based on VITS.                                                                             |                      |              |   Singing Voice   |

| [so-vits-svc](https://github.com/svc-develop-team/so-vits-svc)                                 | SoftVC VITS Singing Voice Conversion.                                                                                                                                      |                       |              |   Singing Voice   |

| [VI-SVS](https://github.com/PlayVoice/VI-SVS)                                                  | Use VITS and Opencpop to develop singing voice synthesis; Different from VISinger.                                                                                         |                       |              |   Singing Voice   |

^ Back to Contents ^


## Speech

| Source                                                                                      | Description                                                                                                                                                                                    |   Paper   |  Game Engine  |   Type   |

| :------------------------------------------------------------------------------------------ | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-----------: | :-----------: | :-------: |

| [Applio](https://github.com/IAHispano/Applio)                                                  | Ultimate voice cloning tool, meticulously optimized for unrivaled power, modularity, and user-friendly experience.                              |                                                |              |  Speech  |

| [Audyo](https://www.audyo.ai/)                                                                 | Text in. Audio out.                                                                                                                              |                                                |              |  Speech  |

| [Bark](https://github.com/suno-ai/bark)                                                        | Text-Prompted Generative Audio Model.                                                                                                           |                                                 |              |  Speech  |

| [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)                                          | VITS2 Backbone with multilingual bert.                                                                                                          |                                                 |              |  Speech  |

| [ChatTTS](https://github.com/2noise/ChatTTS)                                                   | ChatTTS is a generative speech model for daily dialogue.                                                                                        |                                                 |              |  Speech  |

| [CLAPSpeech](https://clapspeech.github.io/)                                                    | Learning Prosody from Text Context with Contrastive Language-Audio Pre-Training.                                           | [arXiv](https://arxiv.org/abs/2305.10763)      |              |  Speech  |

| [CosyVoice](https://github.com/FunAudioLLM/CosyVoice)                                          | Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.                                   |                                                 |              |  Speech  |

| [DEX-TTS](https://github.com/winddori2002/DEX-TTS)                                             | Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability.                                         | [arXiv](https://arxiv.org/abs/2406.19135)      |              |  Speech  |

| [EmotiVoice](https://github.com/netease-youdao/EmotiVoice)                                     | A Multi-Voice and Prompt-Controlled TTS Engine.                                                                                                 |                                                 |              |  Speech  |

| [Fliki](https://fliki.ai/)                                                                     | Turn text into videos with AI voices.                                                                                                           |                                                 |              |  Speech  |

| [GLM-4-Voice](https://github.com/THUDM/GLM-4-Voice)                                            | GLM-4-Voice is an end-to-end voice model launched by Zhipu AI. GLM-4-Voice can directly understand and generate Chinese and English speech, engage in real-time voice conversations, and change attributes such as emotion, intonation, speech rate, and dialect based on user instructions.                                         |                                                 |              |  Speech  |

| [Glow-TTS](https://github.com/jaywalnut310/glow-tts)                                           | A Generative Flow for Text-to-Speech via Monotonic Alignment Search.                                                       | [arXiv](https://arxiv.org/abs/2005.11129)      |              |  Speech  |

| [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)                                           | A Powerful Few-shot Voice Conversion and Text-to-Speech WebUI.                                                                                 |                                                  |              |  Speech  |

| [LOVO](https://lovo.ai/)                                                                       | LOVO is the go-to AI Voice Generator & Text to Speech platform for thousands of creators.                                                      |                                                  |              |  Speech  |

| [MahaTTS](https://github.com/dubverse-ai/MahaTTS)                                              | An Open-Source Large Speech Generation Model.                                                                                                   |                                                 |              |  Speech  |

| [Matcha-TTS](https://github.com/shivammehta25/Matcha-TTS)                                      | A fast TTS architecture with conditional flow matching.                                                                    | [arXiv](https://arxiv.org/abs/2309.03199)      |              |  Speech  |

| [MeloTTS](https://github.com/myshell-ai/MeloTTS)                                               | High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean.       |                                                  |              |  Speech  |

| [MetaVoice-1B](https://github.com/metavoiceio/metavoice-src)                                   | AI for human-level speech intelligence.                                                                                                         |                                                 |              |  Speech  |

| [Narakeet](https://www.narakeet.com/)                                                          | Easily Create Voiceovers Using Realistic Text to Speech.                                                                                        |                                                 |              |  Speech  |

| [Mini-Omni](https://github.com/gpt-omni/mini-omni)                                             | Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming. Mini-Omni is an open-source multimodel large language model that can hear, talk while thinking. Featuring real-time end-to-end speech input and streaming audio output conversational capabilities.                                                                 | [arXiv](https://arxiv.org/abs/2408.16725)      |              |  Speech  |

| [One-Shot-Voice-Cloning](https://github.com/CMsmartvoice/One-Shot-Voice-Cloning)               | One Shot Voice Cloning base on Unet-TTS.                                                                                                       |                                                  |              |  Speech  |

| [OpenVoice](https://github.com/myshell-ai/OpenVoice)                                           | Instant voice cloning by MyShell.                                                                                                              |                                                  |              |  Speech  |

| [OverFlow](https://github.com/shivammehta25/OverFlow)                                          | Putting flows on top of neural transducers for better TTS.                                                                                     |                                                  |              |  Speech  |

| [RealtimeTTS](https://github.com/KoljaB/RealtimeTTS)                                           | RealtimeTTS is a state-of-the-art text-to-speech (TTS) library designed for real-time applications.                                        |                                                  |              |  Speech  |

| [SenseVoice](https://github.com/FunAudioLLM/SenseVoice)                                        | SenseVoice is a speech foundation model with multiple speech understanding capabilities, including automatic speech recognition (ASR), spoken language identification (LID), speech emotion recognition (SER), and audio event detection (AED).                                                                                            |                                                  |              |  Speech  |

| [SpeechGPT](https://github.com/0nutation/SpeechGPT)                                            | Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities.                                      | [arXiv](https://arxiv.org/abs/2305.11000)      |              |  Speech  |

| [speech-to-text-gpt3-unity](https://github.com/dr-iskandar/speech-to-text-gpt3-unity)          | This is the repo I use Whisper and ChatGPT API from OpenAI in Unity.                                                                           |                                                  |     Unity     |  Speech  |

| [Stable Speech](https://github.com/sanchit-gandhi/stable-speech)                               | Stability AI's Text-to-Speech model.                                                                                                          |                                                   |              |  Speech  |

| [StableTTS](https://github.com/KdaiP/StableTTS)                                                | Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3.                                                        |                                                   |              |  Speech  |

| [Step-Audio](https://github.com/stepfun-ai/Step-Audio)                                         | Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction.                                        | [arXiv](https://arxiv.org/abs/2502.11946)      |              |  Speech  |

| [StyleTTS 2](https://github.com/yl4579/StyleTTS2)                                              | Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models.    | [arXiv](https://arxiv.org/abs/2306.07691)      |              |  Speech  |

| [tortoise.cpp](https://github.com/balisujohn/tortoise.cpp)                                     | tortoise.cpp: GGML implementation of tortoise-tts.                                                                                             |                                                  |              |  Speech  |

| [TorToiSe-TTS](https://github.com/neonbjb/tortoise-tts)                                        | A multi-voice TTS system trained with an emphasis on quality.                                                                                  |                                                  |              |  Speech  |

| [TTS Generation WebUI](https://github.com/rsxdalv/tts-generation-webui)                        | TTS Generation WebUI (Bark, MusicGen, Tortoise, RVC, Vocos, Demucs).                                                                           |                                                  |              |  Speech  |

| [VALL-E](https://valle-demo.github.io/)                                                        | Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers.                                                    | [arXiv](https://arxiv.org/abs/2301.02111)      |              |  Speech  |

| [VALL-E X](https://vallex-demo.github.io/)                                                     | Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling                                  | [arXiv](https://arxiv.org/abs/2303.03926)      |              |  Speech  |

| [Vocode](https://docs.vocode.dev/)                                                             | Vocode is an open-source library for building voice-based LLM applications.                                                                   |                                                   |              |  Speech  |

| [Voicebox](https://github.com/SpeechifyInc/Meta-voicebox)                                      | Text-Guided Multilingual Universal Speech Generation at Scale.                                                             | [arXiv](https://arxiv.org/abs/2306.15687)      |              |  Speech  |

| [VoiceCraft](https://github.com/jasonppy/VoiceCraft)                                           | Zero-Shot Speech Editing and Text-to-Speech in the Wild.                                                                                    |                                                     |              |  Speech  |

| [Whisper](https://github.com/openai/whisper)                                                   | Whisper is a general-purpose speech recognition model.                                                                                     |                                                      |              |  Speech  |

| [WhisperSpeech](https://github.com/collabora/WhisperSpeech)                                    | An Open Source text-to-speech system built by inverting Whisper.                                                                           |                                                      |              |  Speech  |

| [X-E-Speech](https://github.com/X-E-Speech/X-E-Speech-code)                                    | Joint Training Framework of Non-Autoregressive Cross-lingual Emotional Text-to-Speech and Voice Conversion.                                |                                                      |              |  Speech  |

| [XTTS](https://github.com/coqui-ai/TTS)                                                        | XTTS is a library for advanced Text-to-Speech generation.                                                                                  |                                                      |              |  Speech  |

| [YourTTS](https://github.com/Edresson/YourTTS)                                                 | Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone.                                           | [arXiv](https://arxiv.org/abs/2112.02418)      |              |  Speech  |

| [ZMM-TTS](https://github.com/nii-yamagishilab/ZMM-TTS)                                         | Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations.  | [arXiv](https://arxiv.org/abs/2312.14398)      |              |  Speech  |

^ Back to Contents ^


## Analytics

| Source                                                                                      | Description                                                                                                                                                                                    |  Game Engine  |   Type   |

| :------------------------------------------------------------------------------------------ | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-----------: | :-------: |

| [Ludo.ai](https://ludo.ai/)                                                        | Assistant for game research and design.                                                                                                                        |              |  Analytics  |

^ Back to Contents ^
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/Yuan-ManX/ai-game-devtools

Awesome Lists containing this project

README