ai-game-devtools

Here we will keep track of the latest AI Game Development Tools, including LLM, World Model, Agent, Code, Image, Texture, Shader, 3D Model, Animation, Video, Audio, Music, Singing Voice and Analytics. 🔥
https://github.com/Yuan-ManX/ai-game-devtools


Project List
- LLM (LLM & Tool)
 - MiniMax-01 - MiniMax-01: Scaling Foundation Models with Lightning Attention. |[arXiv](https://arxiv.org/abs/2501.08313) | | LLM |
 - SkyThought - Sky-T1: Train your own O1 preview model within $450. | | | LLM |
 - Open Deep Research - An AI-powered research assistant that performs iterative, deep research on any topic by combining search engines, web scraping, and large language models. | | | LLM |
 - AI-Writer - trained generative model. | | | Writer |
 - Notebook.ai
 - Novel - Notion-style WYSIWYG editor with AI-powered autocompletions. | | | Writer |
 - AIOS
 - AgentGPT
 - AICommand
 - Bisheng
 - Character-LLM - A Trainable Agent for Role-Playing. |[arXiv](https://arxiv.org/abs/2310.10158) | | Tool |
 - 👶🤖🖥️ BabyAGI UI
 - ChatGPT-API-unity
 - ChatGPTForUnity
 - ChatRWKV
 - ChatYuan
 - Chinese-LLaMA-Alpaca-3 - Chinese Llama-3 LLMs developed from Meta Llama 3. | | | Tool |
 - Chrome-GPT
 - CoreNet
 - DBRX
 - DCLM
 - DemoGPT - AI App Generator with the Power of Llama 2 | | | Tool |
 - Devika
 - Gemma - State-of-the-art open models built from research and technology used to create Google Gemini models. | | | Tool |
 - GPTScript
 - Hugging Face API Unity Integration - An easy-to-use integration for the Hugging Face Inference API, allowing developers to access and use Hugging Face AI models within their Unity projects. | | Unity | Tool |
 - gemma.cpp
 - GLM-4 - GLM-4-9B is the open-source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI. | | | Tool |
 - GPT4All
 - ImageBind
 - InternLM - InternLM has open-sourced a 7 billion parameter base model, a chat model tailored for practical scenarios and the training system. |[arXiv](https://arxiv.org/abs/2403.17297) | | Tool |
 - Jan
 - Index-1.9B
 - InteractML-Unity
 - Lamini - tuning on their own data. | | | Tool |
 - LaMini-LM - LaMini-LM is a collection of small-sized, efficient language models distilled from ChatGPT and trained on a large-scale dataset of 2.58M instructions. | | | Tool |
 - LaVague
 - Lemur
 - Lepton AI
 - Lit-LLaMA - Adapter fine-tuning, pre-training. | | | Tool |
 - llama2-webui
 - Llama 3
 - Llama 3.1
 - MiniGPT-5 - Interleaved Vision-and-Language Generation via Generative Vokens. |[arXiv](https://arxiv.org/abs/2310.02239) | | Tool |
 - MLC LLM
 - MobiLlama
 - llm.c
 - LLMUnity
 - LLocalSearch
 - NExT-GPT - Any-to-Any Multimodal Large Language Model. | | | Tool |
 - LogicGamesSolver
 - Large World Model (LWM) - A general-purpose large-context multimodal autoregressive model. |[arXiv](https://arxiv.org/abs/2402.08268) | | Tool |
 - Lumina-T2X - Lumina-T2X is a unified framework for Text to Any Modality Generation. |[arXiv](https://arxiv.org/abs/2405.05945) | | Tool |
 - MetaGPT - The Multi-Agent Framework. | | | Tool |
 - MiniGPT-4 - Enhancing Vision-language Understanding with Advanced Large Language Models. |[arXiv](https://arxiv.org/abs/2304.10592) | | Tool |
 - mPLUG-Owl🦉
 - OLMo
 - LLaSM
 - OneLLM
 - LLM Answer Engine - A Perplexity-Inspired Answer Engine Using Next.js, Groq, Mixtral, Langchain, OpenAI, Brave & Serper. | | | Tool |
 - Orion-14B - Orion-14B is a family of models that includes a 14B foundation LLM and a series of derived models. |[arXiv](https://arxiv.org/abs/2401.12246) | | Tool |
 - Text generation web UI - J, OPT, and GALACTICA. | | | Tool |
 - WebGPT
 - Web3-GPT
 - Perplexica - An AI-powered search engine. | | | Tool |
 - RepoAgent - An open-source project driven by Large Language Models (LLMs) that aims to provide an intelligent way to document projects. |[arXiv](https://arxiv.org/abs/2402.16667) | | Tool |
 - Sanity AI Engine
 - SearchGPT
 - Skywork - Pre-trained on 3.2TB of high-quality multilingual (mainly Chinese and English) and code data. | | | Tool |
 - StableLM
 - Stanford Alpaca - An Instruction-following LLaMA Model. | | | LLM |
 - TinyChatEngine - On-Device LLM Inference Library. | | | Tool |
 - ToolBench
 - Unity ChatGPT
 - Unreal Engine 5 Llama LoRA - A proof-of-concept project that showcases the potential for using small, locally trainable LLMs to create next-generation documentation tools. | | Unreal Engine | Tool |
 - UnrealGPT
 - WordGPT
 - Yi
 - 01 Project - The open-source language model computer. | | | Tool |
 - Qwen-7B - Qwen-7B (通义千问-7B) chat & pretrained large language model proposed by Alibaba Cloud. | | | LLM |
 - HuggingChat
 - Grok-1 - Mixture-of-Experts model, Grok-1. | | | Tool |
 - ShareGPT4V - Improving Large Multi-Modal Models with Better Captions. | | | Tool |
 - Pi
 - NovelAI
 - AI Scientist - Towards Fully Automated Open-Ended Scientific Discovery. |[arXiv](https://arxiv.org/abs/2408.06292) | | Tool |
 - LongWriter
 - LongCat-Flash - LongCat-Flash is a powerful and efficient language model with 560 billion total parameters, featuring an innovative Mixture-of-Experts (MoE) architecture. The model incorporates a dynamic computation mechanism that activates 18.6B∼31.3B parameters (averaging ∼27B) based on contextual demands, optimizing both computational efficiency and performance. | | | LLM |
 - MOSS - An open-source tool-augmented conversational language model from Fudan University. | | | Tool |
 - DeepSeek-V3 - DeepSeek-V3 is a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. |[arXiv](https://arxiv.org/abs/2412.19437) | | LLM |
 - InteractML-Unreal Engine
 - Moshi - A speech-text foundation model for real-time dialogue. | | | Tool |
 - DeepSeek-R1 - DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. | | | LLM |
 - Gemini
 - GLM-4.5 - GLM-4.5: An open-source large language model designed for intelligent agents by Z.ai. | | | LLM |
 - gpt-oss - gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI. | | | LLM |
 - Kimi K2 - A state-of-the-art mixture-of-experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters. | | | LLM |
 - Cosmos
 - Janus
 - s1 - Simple test-time scaling. |[arXiv](https://arxiv.org/abs/2501.19393) | | LLM |
 - Assistant CLI
 - BabyAGI - An AI-powered task management system. | | | Tool |
 - Hunyuan-MT - Hunyuan-MT comprises a translation model, Hunyuan-MT-7B, and an ensemble model, Hunyuan-MT-Chimera. The translation model is used to translate source text into the target language, while the ensemble model integrates multiple translation outputs to produce a higher-quality result. | | | LLM |
 - LangChain
 - OpenDevin
 - GPT-4o - GPT-4o (“o” for “omni”) is a step towards much more natural human-computer interaction; it accepts as input any combination of text, audio, image, and video and generates any combination of text, audio, and image outputs. | | | Tool |
 - SimpleOllamaUnity
 - Nemotron-4 - A 15-billion-parameter large multilingual language model trained on 8 trillion text tokens. |[arXiv](https://arxiv.org/abs/2402.16819) | | Tool |
 - Open-Assistant - A chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so. | | | Tool |
 - Panda - Based on LLaMA-7B, -13B, -33B, -65B for continuous pre-training in the Chinese field. | | | Tool |
 - Qwen3
 - Seed-OSS - Seed-OSS is a series of open-source large language models developed by ByteDance's Seed Team, designed for powerful long-context, reasoning, agent and general capabilities, and versatile developer-friendly features. | | | LLM |
 - baichuan-7B - A large-scale 7B pretraining language model developed by Baichuan. | | | Tool |
 - Baichuan-13B
 - Baichuan 2
 - CogVLM - An open-source visual language foundation model. |[arXiv](https://arxiv.org/abs/2311.03079) | | Tool |
 - Design2Code - How Far Are We From Automating Front-End Engineering? | | | Tool |
 - Devon - An open-source pair programmer. | | | Tool |
 - Dora
 - Flowise
 - MiniCPM-2B - An end-side LLM that outperforms Llama2-13B. | | | Tool |
 - Mixtral 8x7B - A high quality Sparse Mixture-of-Experts. |[arXiv](https://arxiv.org/abs/2401.04088) | | Tool |
 - Qwen1.5
 - Auto-GPT - An experimental open-source attempt to make GPT-4 fully autonomous. | | | Tool |
- Tool (AI LLM)
 - Unity OpenAI-API Integration - Integrate the OpenAI GPT-3 language model and ChatGPT API into a Unity project. | | Unity | Tool |
 - Mistral 7B
 - Mistral Large - A cutting-edge text generation model. It reaches top-tier reasoning capabilities. | | | Tool |
Game (World Model & Agent)
 - XAgent
 - AutoGen - Enabling Next-Gen Large Language Model Applications. |[arXiv](https://arxiv.org/abs/2308.08155) | | Agent |
 - behaviac
 - Biomes
 - Buffer of Thoughts - Thought-Augmented Reasoning with Large Language Models. |[arXiv](https://arxiv.org/abs/2406.04271) | | Agent |
 - Byzer-Agent
 - Dify - An open-source LLM app building platform. | | | Agent |
 - fabric - An open-source framework for augmenting humans using AI. | | | Agent |
 - FastGPT - A knowledge-based platform built on the LLM. | | | Agent |
 - Cat Town - powered simulation with cats. | | | Agent |
 - fastRAG
 - GameAISDK - Image-based game AI automation framework. | | | Framework |
 - Generative Agents
 - CharacterGLM
 - Cradle
 - everything-ai - An AI-powered and local chatbot assistant🤖. | | | Agent |
 - KwaiAgents - A generalized information-seeking agent system with Large Language Models (LLMs). |[arXiv](https://arxiv.org/abs/2312.04889) | | Agent |
 - LangChain
 - HippoRAG - Neurobiologically Inspired Long-Term Memory for Large Language Models. |[arXiv](https://arxiv.org/abs/2405.14831) | | Agent |
 - LangGraph Studio
 - Interactive LLM Powered NPCs - An open-source project that completely transforms your interaction with non-player characters (NPCs) in any game! | | | Game |
 - LARP - Language-Agent Role Play for open-world games. |[arXiv](https://arxiv.org/abs/2312.17653) | | Agent |
 - LlamaIndex
 - OpenAgents
 - Pipecat
 - Qwen-Agent - Agent is a framework for developing LLM applications based on the instruction following, tool usage, planning, and memory capabilities of Qwen. | | | Agent |
 - Ragas
 - Translation Agent
 - Video2Game - Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video. |[arXiv](https://arxiv.org/abs/2404.09833) | | Game |
 - WebDesignAgent
 - Mixture of Agents (MoA) - Mixture-of-Agents Enhances Large Language Model Capabilities. |[arXiv](https://arxiv.org/abs/2406.04692) | | Agent |
 - MuG Diffusion
 - AgentBench
 - Agent Group Chat
 - AgentScope - Start building LLM-empowered multi-agent applications in an easier way. |[arXiv](https://arxiv.org/abs/2402.14034) | | Agent |
 - AI Town
 - anime.gf
 - AutoAgents
 - Astrocade
 - CogAgent - An open-source visual language model improved based on CogVLM. |[arXiv](https://arxiv.org/abs/2312.08914) | | Agent |
 - Digital Life Project
 - Moonlander.ai
 - Opus
 - V-IRL
 - SIMA
 - Agent K - Self-evolving and modular. | | | Agent |
 - RPBench-Auto - playing. | | | Game |
 - MMRole - A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents. |[arXiv](https://arxiv.org/abs/2408.04203v1) | | Agent |
 - TaskGen - A task-based agentic framework building on StrictJSON outputs by LLM agents. | | | Agent |
 - Twitter
 - TEN Agent - A real-time multimodal agent integrated with the OpenAI Realtime API and RTC, featuring weather checks, web search, vision, and RAG capabilities. | | | Agent |
 - HunyuanWorld-Voyager - Voyager is a novel video diffusion framework that generates world-consistent 3D point-cloud sequences from a single image with user-defined camera path. Voyager can generate 3D-consistent scene videos for world exploration following custom camera trajectories. | | | Game |
 - NVIDIA NeMo Agent Toolkit
 - Unbounded
 - ChatDev
 - Oasis
 - HunyuanWorld 1.0
 - Genesis
 - Agent Laboratory
 - SWE-agent
 - Datarus Jupyter Agent - A multi-step reasoning system that executes complex analytical workflows with step-by-step reasoning, automatic error recovery, and comprehensive result synthesis. | | | Agent |
 - GameNGen - Diffusion Models Are Real-Time Game Engines. |[arXiv](https://arxiv.org/abs/2408.14837) | | Game |
 - AWorld - Improvement. | | | Agent |
 - ComoRAG - A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning. |[arXiv](https://arxiv.org/abs/2508.10419) | | Agent |
 - GameGen-O - GameGen-O: Open-world Video Game Generation. | | | Game |
 - Genie 3
 - gigax - LLM-powered NPCs. | | | Game |
 - Hunyuan-GameCraft - Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition. |[arXiv](https://arxiv.org/abs/2506.17201) | | Game |
 - IoA - An open-source framework for collaborative AI agents, enabling diverse, distributed agents to team up and tackle complex tasks through internet-like connectivity. | | | Agent |
 - Jaaz - The world's first open-source multimodal creative assistant. AI design agent, local alternative for Lovart. Canva + Cursor. AI agent with ability to design, edit and generate images, posters, storyboards, etc. | | | Agent |
 - MindSearch - An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT). | | | Agent |
 - OmAgent
 - StoryGames.ai
 - GenAgent - Build Collaborative AI Systems with Automated Workflow Generation: Case Studies on ComfyUI. |[arXiv](https://arxiv.org/abs/2409.01392) | | Agent |
 - Langflow - A UI for LangChain, designed with react-flow to provide an effortless way to experiment and prototype flows. | | | Agent |
 - Matrix-Game 2.0 - Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model. | | | Game |
Avatar
 - Ditto - Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis. |[arXiv](https://arxiv.org/abs/2411.19509) | | Avatar |
 - AniPortrait - Audio-Driven Synthesis of Photorealistic Portrait Animations. |[arXiv](https://arxiv.org/abs/2403.17694) | | Avatar |
 - CALM
 - ChatdollKit
 - DreamTalk
 - EMOPortraits - Emotion-enhanced Multimodal One-shot Head Avatars. | | | Avatar |
 - E3 Gen
 - GeneAvatar - Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image. |[arXiv](https://arxiv.org/abs/2404.02152) | | Avatar |
 - GeneFace++ - Generalized and Stable Real-Time 3D Talking Face Generation. | | | Avatar |
 - Hallo - Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation. |[arXiv](https://arxiv.org/abs/2406.08801) | | Avatar |
 - IntrinsicAvatar
 - Linly-Talker
 - LivePortrait
 - MotionGPT - A motion-language generation model using LLMs. |[arXiv](https://arxiv.org/abs/2306.14795) | | Avatar |
 - MuseTalk - Real-Time High Quality Lip Synchronization with Latent Space Inpainting. | | | Avatar |
 - MuseV - Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising. | | | Avatar |
 - Portrait4D - Learning One-Shot 4D Head Avatar Synthesis using Synthetic Data. |[arXiv](https://arxiv.org/abs/2311.18729) | | Avatar |
 - StyleAvatar3D - Leveraging Image-Text Diffusion Models for High-Fidelity 3D Avatar Generation. |[arXiv](https://arxiv.org/abs/2305.19012) | | Avatar |
 - Topo4D - Topology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture. |[arXiv](https://arxiv.org/abs/2406.00440) | | Avatar |
 - UnityAIWithChatGPT
 - Vid2Avatar - 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition. |[arXiv](https://arxiv.org/abs/2302.11566) | | Avatar |
 - HeadSculpt
 - Ready Player Me
 - Text2Control3D - Controllable 3D Avatar Generation in Neural Radiance Fields using Geometry-Guided Text-to-Image Diffusion Model. |[arXiv](https://arxiv.org/abs/2309.03550) | | Avatar |
 - VLOGGER
 - Wild2Avatar
 - ExAvatar - Expressive Whole-Body 3D Gaussian Avatar. |[arXiv](https://arxiv.org/abs/2407.21686) | | Avatar |
 - Hallo2 - Long-Duration and High-Resolution Audio-Driven Portrait Image Animation. |[arXiv](https://arxiv.org/abs/2410.07718) | | Avatar |
 - EmoVOCA - Speech-Driven Emotional 3D Talking Heads. |[arXiv](https://arxiv.org/abs/2403.12886) | | Avatar |
 - Duix - Silicon-Based Digital Human SDK 🌐🤖 | | | Avatar |
 - HunyuanPortrait
 - HunyuanVideo-Avatar - HunyuanVideo-Avatar: High-Fidelity Audio-Driven Human Animation for Multiple Characters. |[arXiv](https://arxiv.org/abs/2505.20156) | | Avatar |
 - StableAvatar - Infinite-Length Audio-Driven Avatar Video Generation. |[arXiv](https://arxiv.org/abs/2508.08248) | | Avatar |
 - MusePose - A Pose-Driven Image-to-Video Framework for Virtual Human Generation. | | | Avatar |
 - RodinHD - High-Fidelity 3D Avatar Generation with Diffusion Models. |[arXiv](https://arxiv.org/abs/2407.06938) | | Avatar |
 - EchoMimic - Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions. |[arXiv](https://arxiv.org/abs/2407.08136) | | Avatar |
- Tool (AI LLM)
 - ChatAvatar
Image
 - Lumina-Image 2.0 - Lumina-Image 2.0: A Unified and Efficient Image Generative Model. | | | Image |
 - StableStudio
 - SyncDreamer - Generating Multiview-consistent Images from a Single-view Image. |[arXiv](https://arxiv.org/abs/2309.03453) | | Image |
 - UltraEdit - Instruction-based Fine-Grained Image Editing at Scale. |[arXiv](https://arxiv.org/abs/2407.05282) | | Image |
 - UltraPixel - Advancing Ultra-High-Resolution Image Synthesis to New Peaks. |[arXiv](https://arxiv.org/abs/2407.02158) | | Image |
 - Unity ML Stable Diffusion
 - Depth map library and poser - Depth map library for use with the ControlNet extension for Automatic1111/stable-diffusion-webui. | | | Image |
 - AnyDoor - Zero-shot Object-level Image Customization. |[arXiv](https://arxiv.org/abs/2307.09481) | | Image |
 - Disco Diffusion
 - AnyText
 - Blender-ControlNet
 - BriVL
 - CLIPasso
 - ComfyUI
 - ConceptLab
 - ControlNet
 - DeepFloyd IF
 - Depth Anything V2
 - DragGAN - Interactive Point-based Manipulation on the Generative Image Manifold. |[arXiv](https://arxiv.org/abs/2305.10973) | | Image |
 - DWPose - Effective Whole-body Pose Estimation with Two-stages Distillation. |[arXiv](https://arxiv.org/abs/2307.15880) | | Image |
 - EasyPhoto
 - Follow-Your-Click - Open-domain Regional Image Animation via Short Prompts. |[arXiv](https://arxiv.org/abs/2403.08268) | | Image |
 - Fooocus
 - GIFfusion
 - Grounded-Segment-Anything
 - Hua
 - Hunyuan-DiT - A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding. |[arXiv](https://arxiv.org/abs/2405.08748) | | Image |
 - IC-Light - IC-Light is a project to manipulate the illumination of images. | | | Image |
 - img2img-turbo - One-Step Image-to-Image with SD-Turbo. | | | Image |
 - KOALA - Self-Attention Matters in Knowledge Distillation of Latent Diffusion Models for Memory-Efficient and Fast Image Synthesis. | | | Image |
 - Kolors - Effective Training of Diffusion Model for Photorealistic Text-to-Image Synthesis. | | | Image |
 - LaVi-Bridge - Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation. |[arXiv](https://arxiv.org/abs/2403.07860) | | Image |
 - LlamaGen
 - MimicBrush - Zero-shot Image Editing with Reference Imitation. |[arXiv](https://arxiv.org/abs/2406.07547) | | Image |
 - Omost
 - Outfit Anyone - Ultra-high quality virtual try-on for Any Clothing and Any Person. | | | Image |
 - PaintsUndo
 - PuLID
 - Stable Diffusion - A latent text-to-image diffusion model. | | | Image |
 - Rich-Text-to-Image - Expressive Text-to-Image Generation with Rich Text. |[arXiv](https://arxiv.org/abs/2304.06720) | | Image |
 - SEED-Story - SEED-Story: Multimodal Long Story Generation with Large Language Model. |[arXiv](https://arxiv.org/abs/2407.08683) | | Image |
 - sd-webui-controlnet
 - SDXS - Real-Time One-Step Latent Diffusion Models with Image Conditions. | | | Image |
 - Stable.art
 - Stable Cascade
 - stable-diffusion.cpp
 - Stable Diffusion web UI
 - Stable Diffusion web UI - Web-based UI for Stable Diffusion. | | | Image |
 - Stable Diffusion WebUI Chinese - Chinese version of stable-diffusion-webui. | | | Image |
 - RPG-DiffusionMaster - Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG). | | | Image |
 - ClipDrop
 - DALL·E 2
 - DeepAI
 - Diffuse to Choose - Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All. |[arXiv](https://arxiv.org/abs/2401.13795) | | Image |
 - Ideogram
 - Imagen
 - Lexica
 - Midjourney
 - PhotoMaker
 - Photoroom
 - Prompt.Art
 - Segment Anything
 - SDXL-Lightning
 - Vispunk Visions - Text-to-Image generation platform. | | | Image |
 - StyleDrop - Text-To-Image Generation in Any Style. |[arXiv](https://arxiv.org/abs/2306.00983) | | Image |
 - Komiko - An AI-powered storytelling platform that lets you create original characters, comics, and animations with ease. | | | Comic |
 - CatVTON - Concatenation Is All You Need for Virtual Try-On with Diffusion Models. |[arXiv](https://arxiv.org/abs/2407.15886) | | Image |
 - Flux - Minimal inference code to run text-to-image and image-to-image with the Flux latent rectified flow transformers. | | | Image |
 - Lumina-mGPT - Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining. |[arXiv](https://arxiv.org/abs/2408.02657) | | Image |
 - OmniGen
 - StoryMaker - Towards Holistic Consistent Characters in Text-to-image Generation. |[arXiv](https://arxiv.org/abs/2409.12576) | | Image |
 - CSGO - Content-Style Composition in Text-to-Image Generation. |[arXiv](https://arxiv.org/abs/2408.16766) | | Image |
 - MetaShoot
 - Stable Diffusion 3.5
 - MakeAnything - Harnessing Diffusion Transformers for Multi-Domain Procedural Sequence Generation. |[arXiv](https://arxiv.org/abs/2502.01572) | | Image |
 - Stable Diffusion XL Turbo - Real-Time Text-to-Image Generation. | | | Image |
 - Stable Doodle - A sketch-to-image tool that converts a simple drawing into a dynamic image. | | | Image |
 - Img2Prompt
 - Infinity - Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis. |[arXiv](https://arxiv.org/abs/2412.04431) | | Image |
 - AutoStudio - Crafting Consistent Subjects in Multi-turn Interactive Image Generation. |[arXiv](https://arxiv.org/abs/2406.01388) | | Image |
 - KREA - An AI-powered design tool. | | | Image |
 - InternLM-XComposer2 - InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension. |[arXiv](https://arxiv.org/abs/2401.16420) | | Image |
 - MIGC - Multi-Instance Generation Controller for Text-to-Image Synthesis. |[arXiv](https://arxiv.org/abs/2402.05408) | | Image |
 - Qwen-Image-Edit - Built upon the Qwen-Image model, Qwen-Image-Edit successfully extends Qwen-Image’s unique text rendering capabilities to image editing tasks, enabling precise text editing. |[arXiv](https://arxiv.org/abs/2508.02324) | | Image |
 - BAGEL - Unified Model for Multimodal Understanding and Generation. BAGEL is an open‑source multimodal foundation model with 7B active parameters (14B total) trained on large‑scale interleaved multimodal data. |[arXiv](https://arxiv.org/abs/2505.14683) | | Image |
 - OmniGen2
 - PosterCraft - Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework. |[arXiv](https://arxiv.org/abs/2506.10741) | | Image |
 - Draw Things - AI-assisted image generation in Your Pocket. | | | Image |
 - HivisionIDPhotos
 - NextStep-1 - NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale. |[arXiv](https://arxiv.org/abs/2508.10711) | | Image |
 - Openpose Editor - Openpose Editor for AUTOMATIC1111's stable-diffusion-webui. | | | Image |
 - SkyworkUniPic - Unified Autoregressive Modeling for Visual Understanding and Generation. | | | Image |
 - StreamDiffusion - A Pipeline-Level Solution for Real-Time Interactive Generation. | | | Image |
 - USO - Unified Style and Subject-Driven Generation via Disentangled and Reward Learning. |[arXiv](https://arxiv.org/abs/2508.18966) | | Image |
 - IRG - Interleaving Reasoning for Better Text-to-Image Generation. |[arXiv](https://arxiv.org/abs/2509.06945) | | Image |
 - PromptEnhancer - A Simple Approach to Enhance Text-to-Image Models via Chain-of-Thought Prompt Rewriting. |[arXiv](https://www.arxiv.org/abs/2509.04545) | | Image |
 - HunyuanImage-2.1 - HunyuanImage-2.1: An Efficient Diffusion Model for High-Resolution (2K) Text-to-Image Generation. | | | Image |
 - Segment Anything Model 2 (SAM 2)
 - HunyuanImage-3.0 - HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation. | | | Image |
 - InstantID - Zero-shot Identity-Preserving Generation in Seconds. |[arXiv](https://arxiv.org/abs/2401.07519) | | Image |
Texture
 - With Poly
 - CRM
 - Text2Tex - Text-driven texture Synthesis via Diffusion Models. |[arXiv](https://arxiv.org/abs/2303.11396) | | Texture |
 - X-Mesh - X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance. |[arXiv](https://arxiv.org/abs/2303.15764) | | Texture |
 - DreamSpace - Dreaming Your Room Space with Text-Driven Panoramic Texture Propagation. | | | Texture |
 - Dream Textures - Stable Diffusion built-in to Blender. Create textures, concept art, background assets, and more with a simple text prompt. | | Blender | Texture |
 - InstructHumans
 - InteX - Interactive Text-to-Texture Synthesis via Unified Depth-aware Inpainting. |[arXiv](https://arxiv.org/abs/2403.11878) | | Texture |
 - MaterialSeg3D
 - MeshAnything
 - Neuralangelo - High-Fidelity Neural Surface Reconstruction. |[arXiv](https://arxiv.org/abs/2306.03092) | | Texture |
 - DreamMat - High-quality PBR Material Generation with Geometry- and Light-aware Diffusion Models. |[arXiv](https://arxiv.org/abs/2405.17176) | | Texture |
 - LLaMA-Mesh - LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models. |[arXiv](https://arxiv.org/abs/2411.09595) | | Mesh |
 - TexFusion - Synthesizing 3D Textures with Text-Guided Image Diffusion Models. |[arXiv](https://arxiv.org/abs/2310.13772) | | Texture |
 - Paint-it - Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering. | | | Texture |
- Tool (AI LLM)
 - Polycam
Code
 - CodeGeeX
 - CodeGeeX4
 - CodeGen - An open-source model for program synthesis. Trained on TPU-v4. Competitive with OpenAI Codex. |[arXiv](https://arxiv.org/abs/2203.13474) | | Code |
 - AI Code Translator
 - aiXcoder-7B - aiXcoder-7B Code Large Language Model. | | | Code |
 - bloop
 - Chapyter
 - CodeGen2
 - CodeTF - One-stop Transformer Library for State-of-the-art Code LLM. | | | Code |
 - CodeT5
 - SoTaNa - The Open-Source Software Development Assistant. |[arXiv](https://arxiv.org/abs/2308.13416) | | Code |
 - StarCoder
 - StarCoder 2
 - CodeGeeX2
 - OpenAI Codex - OpenAI Codex is a descendant of GPT-3. | | | Code |
 - DeepSeek Coder
 - Void
 - UnityGen AI - An AI-powered code generation plugin for Unity. | | Unity | Code |
 - Cursor - Write, edit, and chat about your code with GPT-4 in a new type of editor. | | | Code |
 - RobloxScripterAI - An AI-powered code generation tool for Roblox. | | Roblox | Code |
 - Code Llama
 - Scikit-LLM - Seamlessly integrate powerful language models like ChatGPT into scikit-learn for enhanced text analysis tasks. | | | Code |
 - Stable Code 3B
 - Code World Model (CWM) - A 32-billion-parameter open-weights LLM, to advance research on code generation with world models. | | | Code |
 - PandasAI
Video
 - DreamCinema
 - ViewCrafter - Taming Video Diffusion Models for High-fidelity Novel View Synthesis. |[arXiv](https://arxiv.org/abs/2409.02048) | | Video |
 - 360DVD - Controllable Panorama Video Generation with 360-Degree Video Diffusion Model. |[arXiv](https://arxiv.org/abs/2401.06578) | | Video |
 - BackgroundRemover
 - CoDeF
 - CogVLM - An open-source visual language model (VLM). | | | Visual |
 - Diffutoon - High-Resolution Editable Toon Shading via Diffusion Models. |[arXiv](https://arxiv.org/abs/2401.16224) | | Video |
 - dolphin
 - EDGE - Editable dance generation that creates realistic, physically-plausible dances while remaining faithful to arbitrary input music. |[arXiv](https://arxiv.org/abs/2211.10658) | | Video |
 - EMO - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions. |[arXiv](https://arxiv.org/abs/2402.17485) | | Video |
 - Hotshot-XL - Hotshot-XL is an AI text-to-GIF model trained to work alongside Stable Diffusion XL. | | | Video |
 - MicroCinema - A Divide-and-Conquer Approach for Text-to-Video Generation. |[arXiv](https://arxiv.org/abs/2311.18829) | | Video |
 - MOFA-Video - Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model. |[arXiv](https://arxiv.org/abs/2405.20222) | | Video |
 - MoneyPrinterTurbo
 - Mora
 - LaVie - High-Quality Video Generation with Cascaded Latent Diffusion Models. |[arXiv](https://arxiv.org/abs/2309.15103) | | Video |
 - MotionDirector - Motion Customization of Text-to-Video Diffusion Models. |[arXiv](https://arxiv.org/abs/2310.08465) | | Video |
 - Motionshop
 - Mov2mov - The Mov2mov plugin for Automatic1111/stable-diffusion-webui. | | | Video |
 - Open-Sora
 - Open-Sora - Open-Sora Plan. | | | Video |
 - Reuse and Diffuse - Iterative Denoising for Text-to-Video Generation. |[arXiv](https://arxiv.org/abs/2309.03549) | | Video |
 - LVDM - Latent Video Diffusion Models for High-Fidelity Long Video Generation. |[arXiv](https://arxiv.org/abs/2211.13221) | | Video |
 - ShortGPT
 - Show-1 - Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation. |[arXiv](https://arxiv.org/abs/2309.15818) | | Video |
 - Snap Video - Scaled Spatiotemporal Transformers for Text-to-Video Synthesis. |[arXiv](https://arxiv.org/abs/2402.14797) | | Video |
 - SoraWebui - An open-source Sora web client, enabling users to easily create videos from text with OpenAI's Sora model. | | | Video |
 - StoryDiffusion - Consistent Self-Attention for Long-Range Image and Video Generation. |[arXiv](https://arxiv.org/abs/2405.01434) | | Video |
 - StreamingT2V
 - StyleCrafter - Enhancing Stylized Text-to-Video Generation with Style Adapter. |[arXiv](https://arxiv.org/abs/2312.00330) | | Video |
 - Text2Video-Zero - Text-to-Image Diffusion Models are Zero-Shot Video Generators. |[arXiv](https://arxiv.org/abs/2303.13439) | | Video |
 - Track-Anything - Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything and XMem. |[arXiv](https://arxiv.org/abs/2304.11968) | | Video |
 - Tune-A-Video - One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation. |[arXiv](https://arxiv.org/abs/2212.11565) | | Video |
 - VGen
 - Video-ChatGPT - Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. |[arXiv](https://arxiv.org/abs/2306.05424) | | Video |
 - VideoElevator - Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models. |[arXiv](https://arxiv.org/abs/2403.05438) | | Video |
 - Stable Video Diffusion - Image-to-Video. | | | Video |
 - Video-of-Thought - Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition. | | | Video |
 - VisualRWKV - The visual-enhanced version of the RWKV language model, enabling RWKV to handle various visual tasks. | | | Visual |
 - V-JEPA
 - VideoMamba
 - AtomoVideo - High Fidelity Image-to-Video Generation. |[arXiv](https://arxiv.org/abs/2403.01800) | | Video |
 - Assistive
 - Anything in Any Scene
 - Boximator
 - CogVideo
 - Descript
 - Decohere
 - Etna
 - Generative Dynamics
 - Follow Your Pose - Pose-Guided Text-to-Video Generation using Pose-Free Videos. |[arXiv](https://arxiv.org/abs/2304.01186) | | Video |
 - Genmo
 - GenTron
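 - HiGen - Hierarchical Spatio-temporal Decoupling for Text-to-Video generation. | | | Video |
 - Imagen Video - Given a text prompt, Imagen Video generates high definition videos using a base video generation model and a sequence of interleaved spatial and temporal video super-resolution models. | | | Video |
 - I2VGen-XL - High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models. |[arXiv](https://arxiv.org/abs/2311.04145) | | Video |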
 - InstructVideo
 - LTX Studio - An AI-driven filmmaking platform for creators, marketers, filmmakers and studios. | | | Video |
 - MagicVideo
 - Magic Hour
 - MAGVIT-v2
 - MAGVIT
 - Make-A-Video - Make-A-Video is a state-of-the-art AI system that generates videos from text. |[arXiv](https://arxiv.org/abs/2209.14792) | | Video |
 - Make Pixels Dance - High-Dynamic Video Generation. |[arXiv](https://arxiv.org/abs/2311.10982) | | Video |
 - Make-Your-Video
 - MobileVidFactory - Automatic Diffusion-Based Social Media Video Generation for Mobile Devices from Text. | | | Video |
 - Morph Studio - Text-to-Video AI Magic, manifest your creativity through your prompt. | | | Video |
 - MovieFactory
 - Neural Frames
 - NeverEnds
 - Phenaki
 - TATS - Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer. | | | Video |
 - TwelveLabs
 - UniVG - Towards UNIfied-modal Video Generation. | | | Video |
 - VideoCrafter1 - Open Diffusion Models for High-Quality Video Generation. |[arXiv](https://arxiv.org/abs/2310.19512) | | Video |
 - VideoCrafter2 - Overcoming Data Limitations for High-Quality Video Diffusion Models. |[arXiv](https://arxiv.org/abs/2401.09047) | | Video |
 - VideoDrafter - Content-Consistent Multi-Scene Video Generation with LLM. |[arXiv](https://arxiv.org/abs/2401.01256) | | Video |
 - VideoFactory - Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation. | | | Video |
 - VideoLCM
 - Video LDMs - Align your Latents: High-resolution Video Synthesis with Latent Diffusion Models. |[arXiv](https://arxiv.org/abs/2304.08818) | | Video |
 - Vispunk Motion
 - W.A.L.T
 - CogVideoX - An open-source version of the video generation model originating from 清影 (Qingying). | | | Video |
 - Tora - Trajectory-oriented Diffusion Transformer for Video Generation. |[arXiv](https://arxiv.org/abs/2407.21705) | | Video |
 - MIMO
 - FullJourney
 - Ruyi - An image-to-video model capable of generating cinematic-quality videos at a resolution of 768, with a frame rate of 24 frames per second, totaling 5 seconds and 120 frames. | | | Video |
 - Vchitect-2.0 - Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models. | | | Video |
 - SkyReels-A1 - SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers. |[arXiv](https://arxiv.org/abs/2502.10841) | | Video |
 - SkyReels-V1 - Human-Centric Video Foundation Model. | | | Video |
 - LTX-Video - LTX-Video is the first DiT-based video generation model that can generate high-quality videos in real-time. It can generate 24 FPS videos at 768x512 resolution, faster than it takes to watch them. | | | Video |
 - Wan2.1 - Open and Advanced Large-Scale Video Generative Models. | | | Video |
 - Step-Video-T2V - Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model. |[arXiv](https://arxiv.org/abs/2502.10248) | | Video |
 - Video-LLaVA
 - Wan2.2 - Open and Advanced Large-Scale Video Generative Models. |[arXiv](https://arxiv.org/abs/2503.20314) | | Video |
 - Pika Labs - Revolutionizing the video-making experience with AI. | | | Video |
 - ART•V - Auto-Regressive Text-to-Video Generation with Diffusion Models. |[arXiv](https://arxiv.org/abs/2311.18834) | | Video |
 - MoviiGen 1.1 - Towards Cinematic-Quality Video Generative Models. MoviiGen 1.1 is a cutting-edge video generation model that excels in cinematic aesthetics and visual quality. It is fine-tuned from Wan2.1. In comprehensive evaluations by 11 professional filmmakers and AIGC creators, including industry experts, across 60 aesthetic dimensions, MoviiGen 1.1 demonstrates superior performance in key cinematic aspects. | | | Video |
 - DomoAI
 - DynamiCrafter - Animating Open-domain Images with Video Diffusion Priors. |[arXiv](https://arxiv.org/abs/2310.12190) | | Video |
 - Emu Video - Factorizing Text-to-Video Generation by Explicit Image Conditioning. | | | Video |
 - Fairy - Fast Parallelized Instruction-Guided Video-to-Video Synthesis. | | | Video |
 - Follow-Your-Canvas - Follow-Your-Canvas: Higher-Resolution Video Outpainting with Extensive Content Generation. |[arXiv](https://arxiv.org/abs/2409.01055) | | Video |
 - InfiniteTalk - Audio-driven Video Generation for Sparse-Frame Video Dubbing. |[arXiv](https://arxiv.org/abs/2508.14033) | | Video |
 - Lumiere - A Space-Time Diffusion Model for Video Generation. |[arXiv](https://arxiv.org/abs/2401.12945) | | Video |
 - MagicVideo-V2 - Multi-Stage High-Aesthetic Video Generation. |[arXiv](https://arxiv.org/abs/2401.04468) | | Video |
 - MotionCtrl
 - Pollinations
 - Sora
 - StableVideo - Text-driven Consistency-aware Diffusion Video Editing. | | | Video |
 - TF-T2V - A Recipe for Scaling up Text-to-Video Generation with Text-free Videos. |[arXiv](https://arxiv.org/abs/2312.15770) | | Video |
 - VideoComposer
 - VideoGen - A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation. |[arXiv](https://arxiv.org/abs/2309.00398) | | Video |
 - VideoPoet - A large language model for zero-shot video generation. |[arXiv](https://arxiv.org/abs/2312.14125) | | Video |
 - Waver - A next-generation, universal foundation model family for unified image and video generation, built on rectified flow Transformers and engineered for industry-grade performance. |[arXiv](https://arxiv.org/abs/2508.15761) | | Video |
 - Zeroscope - Text-to-Video. | | | Video |
 - HuMo - Human-Centric Video Generation via Collaborative Multi-Modal Conditioning. |[arXiv](https://arxiv.org/abs/2509.08519) | | Video |
 - CoNR - Generate vivid dancing videos from hand-drawn anime character sheets (ACS). |[arXiv](https://arxiv.org/abs/2207.05378) | | Video |
 - HunyuanVideo
 - Mini-Gemini - Mining the Potential of Multi-modality Vision Language Models. | | | Vision |
 - Mochi 1 - A state-of-the-art video generation model with high-fidelity motion and strong prompt adherence in preliminary evaluation. | | | Video |
 - MotionClone - Training-Free Motion Cloning for Controllable Video Generation. |[arXiv](https://arxiv.org/abs/2406.05338) | | Video |
 - Genie
 - LongLive - Real-time Interactive Long Video Generation. |[arXiv](https://arxiv.org/abs/2509.22622) | | Video |
 - Lynx - Towards High-Fidelity Personalized Video Generation. |[arXiv](https://arxiv.org/abs/2509.15496) | | Video |
 - Ovi - Twin Backbone Cross-Modal Fusion for Audio-Video Generation. |[arXiv](https://arxiv.org/abs/2510.01284) | | Video |
- Tool (AI LLM)
 - FullJourney
 - Gen-2 - A multi-modal AI system that can generate novel videos with text, images, or video clips. | | | Video |
 - Moonvalley - A text-to-video generative AI model. | | | Video |
 - Pixeling - realistic, and extremely controllable visual content including images, videos and 3D models. | | | Video |
 - PixVerse - Create breath-taking videos with AI. | | | Video |
Music
- LLM (LLM & Tool)
 - FluxMusic - Text-to-Music Generation with Rectified Flow Transformer. | [arXiv](https://arxiv.org/abs/2409.00587) | | Music |
 - ChatMusician
 - Chord2Melody
 - Diff-BGM
 - Jukebox
 - Magenta
 - MusicGen
 - AIVA
 - Amper Music
 - Boomy
 - HeyMusic.AI
 - GPTAbleton - osc. | | | Music |
 - MeLoDy
 - Mubert
 - MuseNet - A deep neural network that can generate 4-minute musical compositions with 10 different instruments, and can combine styles from country to Mozart to the Beatles. | | | Music |
 - MusicLDM - Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies. | [arXiv](https://arxiv.org/abs/2308.01546) | | Music |
 - MusicLM
 - JEN-1 - Text-Guided Universal Music Generation with Omnidirectional Diffusion Models. | | | Music |
 - SonicMaster - Towards Controllable All-in-One Music Restoration and Mastering. | [arXiv](https://arxiv.org/abs/2508.03448) | | Music |
 - YuE - Open Full-song Generation Foundation Model, something similar to Suno.ai but open. | | | Music |
 - Image to Music
 - SoundRaw
 - Soundry AI - Generative AI tools including text-to-sound and infinite sample packs. | | | Music |
 - Riffusion App - Real-time music generation with stable diffusion. | | | Music |
 - AnyAccomp
- Tool (AI LLM)
 - JEN-1 - Text-Guided Universal Music Generation with Omnidirectional Diffusion Models. | | | Music |
Game (Agent)
- Tool (AI LLM)
 - AgentSims - An Open-Source Sandbox for Large Language Model Evaluation. | | | Agent |
3D Model
- LLM (LLM & Tool)
 - chatGPT-maya
 - DreamGaussian4D
 - GALA3D - Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting. |[arXiv](https://arxiv.org/abs/2402.07207) | | 3D |
 - HoloDreamer
 - Infinigen
 - Interactive3D
 - Isotropic3D - Image-to-3D Generation Based on a Single CLIP Embedding. | | | 3D |
 - LION
 - Make-It-3D - High-Fidelity 3D Creation from A Single Image with Diffusion Prior. |[arXiv](https://arxiv.org/abs/2303.14184) | | Model |
 - MVDream - Multi-view Diffusion for 3D Generation. |[arXiv](https://arxiv.org/abs/2308.16512) | | 3D |
 - NVIDIA Instant NeRF
 - Paint3D - Paint Anything 3D with Lighting-Less Texture Diffusion Models. |[arXiv](https://arxiv.org/abs/2312.13913) | | 3D |
 - PAniC-3D - Stylized Single-view 3D Reconstruction from Portraits of Anime Characters. |[arXiv](https://arxiv.org/abs/2303.14587) | | Model |
 - Point·E
 - Shap-E
 - 3DTopia - Text-to-3D Generation within 5 Minutes. |[arXiv](https://arxiv.org/abs/2403.02234) | | 3D |
 - Stable Dreamfusion - A PyTorch implementation of the text-to-3D model Dreamfusion, powered by the Stable Diffusion text-to-2D model. | | | Model |
 - threestudio
 - TripoSR - A state-of-the-art open-source model for fast feedforward 3D reconstruction from a single image. |[arXiv](https://arxiv.org/abs/2403.02151) | | Model |
 - CF-3DGS - COLMAP-Free 3D Gaussian Splatting. |[arXiv](https://arxiv.org/abs/2312.07504) | | 3D |
 - CharacterGen - Efficient 3D Character Generation from Single Images with Multi-View Pose Canonicalization. |[arXiv](https://arxiv.org/abs/2402.17214) | | 3D |
 - DUSt3R
 - ViVid-1-to-3
 - Wonder3D - Single Image to 3D using Cross-Domain Diffusion. |[arXiv](https://arxiv.org/abs/2310.15008) | | 3D |
 - Zero-1-to-3 - Zero-shot One Image to 3D Object. |[arXiv](https://arxiv.org/abs/2303.11328) | | Model |
 - Unique3D - High-Quality and Efficient 3D Mesh Generation from a Single Image. |[arXiv](https://arxiv.org/abs/2405.20343) | | 3D |
 - UnityGaussianSplatting
 - Blockade Labs - Skybox Lab, the ultimate AI-powered solution for generating incredible 360° skybox experiences from text prompts. | | | Model |
 - CSM
 - Dash
 - Instruct-NeRF2NeRF
 - Luma AI
 - Meshy
 - ProlificDreamer - High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation. |[arXiv](https://arxiv.org/abs/2305.16213) | | Model |
 - Sloyd
 - 3D-GPT
 - Tafi
 - 3Dpresso
 - SF3D - Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement. |[arXiv](https://arxiv.org/abs/2408.00653) | | 3D |
 - BlenderMCP
 - 3DTopia-XL - 3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion. |[arXiv](https://arxiv.org/abs/2409.12957) | | 3D |
 - Animate3D - Animating Any 3D Model with Multi-view Video Diffusion. |[arXiv](https://arxiv.org/abs/2407.11398) | | 3D |
 - Edify 3D - Scalable High-Quality 3D Asset Generation. |[arXiv](https://arxiv.org/abs/2411.07135) | | 3D |
 - Direct3D-S2 - Direct3D-S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention. |[arXiv](https://arxiv.org/abs/2505.17412) | | 3D |
 - Step1X-3D - Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets. |[arXiv](https://arxiv.org/abs/2505.07747) | | 3D |
 - PhysRig - Based Rigging for Realistic Articulated Object Modeling. |[arXiv](https://arxiv.org/abs/2506.20936) | | Model |
 - Hunyuan3D 2.1 - From Images to High-Fidelity 3D Assets with Production-Ready PBR Material. |[arXiv](https://arxiv.org/abs/2506.15442) | | 3D |
 - LATTE3D - Large-scale Amortized Text-To-Enhanced3D Synthesis. |[arXiv](https://arxiv.org/abs/2403.15385) | | 3D |
 - Anything-3D - Segment-Anything + 3D. Let's lift the anything to 3D. |[arXiv](https://arxiv.org/abs/2304.10261) | | Model |
 - Any2Point - Empowering Any-modality Large Models for Efficient 3D Understanding. |[arXiv](https://arxiv.org/abs/2404.07989) | | 3D |
 - BlenderGPT - Use commands in English to control Blender with OpenAI's GPT-4. | | Blender | Model |
 - Blender-GPT - An all-in-one Blender assistant powered by GPT3/4 + Whisper integration. | | Blender | Model |
 - GaussCtrl - Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing. |[arXiv](https://arxiv.org/abs/2403.08733) | | 3D |
 - GaussianCube
 - GaussianDreamer
 - GenieLabs - Empower your game with AI-UGC. | | | 3D |
 - HiFA - High-fidelity Text-to-3D with Advanced Diffusion Guidance. | | | Model |
 - Hunyuan3D-1.0 - Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation. |[arXiv](https://arxiv.org/abs/2411.02293) | | 3D |
 - lumine AI - AI-Powered Creativity. | | | 3D |
 - One-2-3-45 - Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization. |[arXiv](https://arxiv.org/abs/2306.16928) | | Model |
 - SV3D - Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion. |[arXiv](https://arxiv.org/abs/2403.12008) | | 3D |
 - Voxcraft - Crafting Ready-to-Use 3D Models with AI. | | | 3D |
 - CityDreamer
 - DreamCatalyst - Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation. |[arXiv](https://arxiv.org/abs/2407.11394) | | 3D |
 - Hunyuan3D 2.0
 - 3D-LLM
- Tool (AI LLM)
 - CityDreamer
 - DreamCatalyst - Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation. |[arXiv](https://arxiv.org/abs/2407.11394) | | 3D |
 - Mootion
 - Spline AI
Animation
- LLM (LLM & Tool)
 - AnimateAnything - Fine-Grained Open Domain Image Animation with Motion Guidance. |[arXiv](https://arxiv.org/abs/2311.12886) | | Animation |
 - AnimateLCM
 - AnimationGPT
 - DreaMoving
 - FaceFusion
 - GeneFace - Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis. |[arXiv](https://arxiv.org/abs/2301.13430) | | Animation |
 - MagicAnimate
 - SadTalker-Video-Lip-Sync
 - Wav2Lip - Accurately Lip-syncing Videos In The Wild. |[arXiv](https://arxiv.org/abs/2008.10010) | | Animation |
 - Animate Anyone - Consistent and Controllable Image-to-Video Synthesis for Character Animation. |[arXiv](https://arxiv.org/abs/2311.17117) | | Animation |
 - Deforum
 - FreeInit
 - ID-Animator - Zero-Shot Identity-Preserving Human Video Generation. |[arXiv](https://arxiv.org/abs/2404.15275) | | Animation |
 - NUWA-Infinity - NUWA-Infinity is a multimodal generative model that is designed to generate high-quality images and videos from given text, image or video input. | | | Animation |
 - Stable Animation - A text-to-animation tool for developers. | | | Animation |
 - Wonder Studio - An AI tool that automatically animates, lights and composes CG characters into a live-action scene. | | | Animation |
 - DrawingSpinUp
 - Animate-X - Animate-X: Universal Character Image Animation with Enhanced Motion Representation. |[arXiv](https://arxiv.org/abs/2410.10306) | | Animation |
 - Omni Animation
 - ToonCrafter
 - AnimateZero - Video Diffusion Models are Zero-Shot Image Animators. |[arXiv](https://arxiv.org/abs/2312.03793) | | Animation |
 - Index-AniSora - Index-AniSora is the most powerful open-source animated video generation model. It enables one-click creation of video shots across diverse anime styles including series episodes, Chinese original animations, manga adaptations, VTuber content, anime PVs, mad-style parodies (鬼畜动画), and more! |[arXiv](https://arxiv.org/abs/2412.10255) | | Animation |
 - PIA - Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models. |[arXiv](https://arxiv.org/abs/2312.13964) | | Animation |
 - ToonComposer - Streamlining Cartoon Production with Generative Post-Keyframing. |[arXiv](https://arxiv.org/abs/2508.10881) | | Animation |
 - AnimateDiff - Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning. |[arXiv](https://arxiv.org/abs/2307.04725) | | Animation |
 - SadTalker - Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation. |[arXiv](https://arxiv.org/abs/2211.12194) | | Animation |
 - TaleCrafter
 - NUWA-XL
- Tool (AI LLM)
 - AnimateZero - Video Diffusion Models are Zero-Shot Image Animators. |[arXiv](https://arxiv.org/abs/2312.03793) | | Animation |
 - Omni Animation
VLM (Visual)
- LLM (LLM & Tool)
 - CogVLM2 - GPT4V-level open-source multi-modal model based on Llama3-8B. | | | Visual |
 - EVF-SAM - EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model. |[arXiv](https://arxiv.org/abs/2406.20076) | | Visual |
 - Kangaroo - A Powerful Video-Language Model Supporting Long-context Video Input. | | | Visual |
 - LLaVA++ - Extending Visual Capabilities with LLaMA-3 and Phi-3. | | | Visual |
 - LongVA
 - MotionLLM
 - Cambrian-1 - Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs. |[arXiv](https://arxiv.org/abs/2406.16860) | | Multimodal LLMs |
 - Qwen-VL - A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond. |[arXiv](https://arxiv.org/abs/2308.12966) | | Visual |
 - ShareGPT4V - Improving Large Multi-modal Models with Better Captions. |[arXiv](https://arxiv.org/abs/2311.12793) | | Visual |
 - SOLO - A Single Transformer for Scalable Vision-Language Modeling. |[arXiv](https://arxiv.org/abs/2407.06438) | | Visual |
 - Video-CCAM - Video-CCAM: Advancing Video-Language Understanding with Causal Cross-Attention Masks. | | | Visual |
 - VideoLLaMA 2 - Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs. |[arXiv](https://arxiv.org/abs/2406.07476) | | Visual |
 - VILA - On Pre-training for Visual Language Models. |[arXiv](https://arxiv.org/abs/2312.07533) | | Visual |
 - PLLaVA - Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning. |[arXiv](https://arxiv.org/abs/2404.16994) | | Visual |
 - Video-MME - The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis. |[arXiv](https://arxiv.org/abs/2405.21075) | | Visual |
 - FaceHi
 - CoTracker
 - LLaVA-OneVision - LLaVA-OneVision: Easy Visual Task Transfer. |[arXiv](https://arxiv.org/abs/2408.03326) | | Visual |
 - Sapiens
 - VideoLLaMA 3
 - Kwai Keye-VL - Kwai Keye-VL is a cutting-edge multimodal large language model meticulously crafted by the Kwai Keye Team at Kuaishou. |[arXiv](https://arxiv.org/abs/2509.01563) | | VLM |
 - Lumina-DiMOO - Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding. | | | VLM |
 - dots.vlm1 - The first vision-language model in the dots model family. Built upon a 1.2 billion-parameter vision encoder and the DeepSeek V3 large language model (LLM), dots.vlm1 demonstrates strong multimodal understanding and reasoning capabilities. | | | VLM |
 - GLM-V - GLM-4.1V-Thinking and GLM-4.5V: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning. |[arXiv](https://arxiv.org/abs/2507.01006) | | VLM |
 - LGVI - Towards Language-Driven Video Inpainting via Multimodal Large Language Models. | | | Visual |
 - MaskViT - Masked Visual Pre-Training for Video Prediction. |[arXiv](https://arxiv.org/abs/2206.11894) | | Visual |
 - MiniCPM-Llama3-V 2.5 - A GPT-4V Level MLLM on Your Phone. | | | Visual |
 - VideoAgent - A Memory-augmented Multimodal Agent for Video Understanding. |[arXiv](https://arxiv.org/abs/2403.11481) | | Agent |
 - Vitron - A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing. | | | Visual |
 - POINTS-Reader - POINTS-Reader: Distillation-Free Adaptation of Vision-Language Models for Document Conversion. |[arXiv](https://arxiv.org/abs/2509.01215) | | Visual |
 - MoE-LLaVA - Mixture of Experts for Large Vision-Language Models. |[arXiv](https://arxiv.org/abs/2401.15947) | | Visual |
 - MiniCPM-V 4.0 - MiniCPM-V 4.0: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone. | | | Visual |
Visual
- Tool (AI LLM)
 - Video-MME - The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis. |[arXiv](https://arxiv.org/abs/2405.21075) | | Visual |
Audio
- LLM (LLM & Tool)
 - AcademiCodec
 - Amphion - An Open-Source Audio, Music, and Speech Generation Toolkit. |[arXiv](https://arxiv.org/abs/2312.09911) | | Audio |
 - ArchiSound
 - AudioEditing - Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion. |[arXiv](https://arxiv.org/abs/2402.10009) | | Audio |
 - Audiogen Codec
 - AudioGPT
 - AudioLCM - Text-to-Audio Generation with Latent Consistency Models. |[arXiv](https://arxiv.org/abs/2406.00356v1) | | Audio |
 - AudioLDM 2 - Learning Holistic Audio Generation with Self-supervised Pretraining. |[arXiv](https://arxiv.org/abs/2308.05734) | | Audio |
 - Auffusion - Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation. |[arXiv](https://arxiv.org/abs/2401.01044) | | Audio |
 - CTAG - Creative Text-to-Audio Generation via Synthesizer Programming. | | | Audio |
 - Make-An-Audio 3 - Flow-based Large Diffusion Transformers. |[arXiv](https://arxiv.org/abs/2305.18474) | | Audio |
 - NeuralSound - Learning-based Modal Sound Synthesis with Acoustic Transfer. |[arXiv](https://arxiv.org/abs/2108.07425) | | Audio |
 - Qwen2-Audio - Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud. |[arXiv](https://arxiv.org/abs/2407.10759) | | Audio |
 - SEE-2-SOUND - Zero-Shot Spatial Environment-to-Spatial Sound. |[arXiv](https://arxiv.org/abs/2406.06612) | | Audio |
 - TANGO - Text-to-Audio Generation using Instruction Tuned LLM and Latent Diffusion Model. | | | Audio |
 - VTA-LDM - Video-to-Audio Generation with Hidden Alignment. |[arXiv](https://arxiv.org/abs/2407.07464) | | Audio |
 - WavJourney
 - Audiobox
 - MAGNeT - Masked Audio Generation using a Single Non-Autoregressive Transformer. | | | Audio |
 - AudioLDM - Text-to-Audio Generation with Latent Diffusion Models. |[arXiv](https://arxiv.org/abs/2301.12503) | | Audio |
 - Make-An-Audio - Text-To-Audio Generation with Prompt-Enhanced Diffusion Models. |[arXiv](https://arxiv.org/abs/2301.12661) | | Audio |
 - SoundStorm
 - HunyuanVideo-Foley - HunyuanVideo-Foley: Multimodal Diffusion with Representation Alignment for High-Fidelity Foley Audio Generation. |[arXiv](https://www.arxiv.org/abs/2508.16930) | | Audio |
 - FoleyCrafter
 - SyncFusion - Multimodal Onset-synchronized Video-to-Audio Foley Synthesis. |[arXiv](https://arxiv.org/abs/2310.15247) | | Audio |
 - MiDashengLM
 - AudioX - Diffusion Transformer for Anything-to-Audio Generation. |[arXiv](https://arxiv.org/abs/2503.10522) | | Audio |
 - MeanAudio - Fast and Faithful Text-to-Audio Generation with Mean Flows. | | | Audio |
 - MMAudio - Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis. |[arXiv](https://arxiv.org/abs/2412.15322) | | Audio |
 - OptimizerAI
 - Stable Audio - Fast Timing-Conditioned Latent Audio Diffusion. | | | Audio |
 - Stable Audio Open - Generates variable-length (up to 47s) stereo audio at 44.1kHz from text prompts. | | | Audio |
 - ThinkSound - Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing. |[arXiv](https://arxiv.org/abs/2506.21448) | | Audio |
Singing Voice
- LLM (LLM & Tool)
 - DiffSinger
 - VI-SVS
 - so-vits-svc
 - Retrieval-based-Voice-Conversion-WebUI - An easy-to-use SVC framework based on VITS. | | | Singing Voice |
Speech
- LLM (LLM & Tool)
 - Applio - friendly experience. | | | Speech |
 - Bert-VITS2
 - ChatTTS
 - CosyVoice - Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability. | | | Speech |
 - DEX-TTS - Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability. | [arXiv](https://arxiv.org/abs/2406.19135) | | Speech |
 - Glow-TTS - A Generative Flow for Text-to-Speech via Monotonic Alignment Search. | [arXiv](https://arxiv.org/abs/2005.11129) | | Speech |
 - GPT-SoVITS - A Powerful Few-shot Voice Conversion and Text-to-Speech WebUI. | | | Speech |
 - MahaTTS - An Open-Source Large Speech Generation Model. | | | Speech |
 - Matcha-TTS
 - MeloTTS - High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese and Korean. | | | Speech |
 - MetaVoice-1B - level speech intelligence. | | | Speech |
 - One-Shot-Voice-Cloning - TTS. | | | Speech |
 - OpenVoice
 - OverFlow
 - RealtimeTTS - A state-of-the-art text-to-speech (TTS) library designed for real-time applications. | | | Speech |
 - SenseVoice
 - SpeechGPT - Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities. | [arXiv](https://arxiv.org/abs/2305.11000) | | Speech |
 - speech-to-text-gpt3-unity
 - StyleTTS 2 - Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models. | [arXiv](https://arxiv.org/abs/2306.07691) | | Speech |
 - Voicebox - Text-Guided Multilingual Universal Speech Generation at Scale. | [arXiv](https://arxiv.org/abs/2306.15687) | | Speech |
 - VoiceCraft - Zero-Shot Speech Editing and Text-to-Speech in the Wild. | | | Speech |
 - Whisper - A general-purpose speech recognition model. | | | Speech |
 - X-E-Speech - Joint Training Framework of Non-Autoregressive Cross-lingual Emotional Text-to-Speech and Voice Conversion. | | | Speech |
 - XTTS - Text-to-Speech generation. | | | Speech |
 - YourTTS - Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone. | [arXiv](https://arxiv.org/abs/2112.02418) | | Speech |
 - TorToiSe-TTS - A multi-voice TTS system trained with an emphasis on quality. | | | Speech |
 - ZMM-TTS - Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations. | [arXiv](https://arxiv.org/abs/2312.14398) | | Speech |
 - TTS Generation WebUI
 - StableTTS - Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3. | | | Speech |
 - Higgs Audio
 - Audyo
 - Fliki
 - CLAPSpeech - Learning Prosody from Text Context with Contrastive Language-Audio Pre-Training. | [arXiv](https://arxiv.org/abs/2305.10763) | | Speech |
 - LOVO - The go-to AI Voice Generator & Text to Speech platform for thousands of creators. | | | Speech |
 - VALL-E X - Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling. | [arXiv](https://arxiv.org/abs/2303.03926) | | Speech |
 - tortoise.cpp - tts. | | | Speech |
 - UniAudio 2.0 - task Audio Foundation Model with Reasoning-Augmented Audio Tokenization. | | | Speech |
 - GLM-4-Voice - GLM-4-Voice is an end-to-end voice model launched by Zhipu AI. GLM-4-Voice can directly understand and generate Chinese and English speech, engage in real-time voice conversations, and change attributes such as emotion, intonation, speech rate, and dialect based on user instructions. | | | Speech |
 - Step-Audio - Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction. | [arXiv](https://arxiv.org/abs/2502.11946) | | Speech |
 - IndexTTS2 - A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech. | [arXiv](https://arxiv.org/abs/2506.21619) | | Speech |
 - Chatterbox - A production-grade open-source TTS model. | | | Speech |
 - UnityNeuroSpeech
 - VoxCPM - Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning. | | | Speech |
 - FireRedTTS-2 - FireRedTTS-2: Towards Long Conversational Speech Generation for Podcast and Chatbot. | [arXiv](https://arxiv.org/abs/2509.02020) | | Speech |
 - Bark - Text-Prompted Generative Audio Model. | | | Speech |
 - EmotiVoice - A Multi-Voice and Prompt-Controlled TTS Engine. | | | Speech |
 - Kitten TTS - An open-source realistic text-to-speech model with just 15 million parameters, designed for lightweight deployment and high-quality voice synthesis. | | | Speech |
 - Narakeet
 - Mini-Omni - Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming. Mini-Omni is an open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation. | [arXiv](https://arxiv.org/abs/2408.16725) | | Speech |
 - Step-Audio 2 - Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation. | [arXiv](https://arxiv.org/abs/2507.16632) | | Speech |
 - VALL-E - Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers. | [arXiv](https://arxiv.org/abs/2301.02111) | | Speech |
 - VibeVoice - A framework for generating expressive, long-form, multi-speaker conversational audio, such as podcasts, from text. It addresses significant challenges in traditional Text-to-Speech (TTS) systems, particularly in scalability, speaker consistency, and natural turn-taking. | | | Speech |
 - Stable Speech - Text-to-Speech model. | | | Speech |
 - WhisperSpeech - An open-source text-to-speech system built by inverting Whisper. | | | Speech |
 - Liquid Audio - Speech-to-Speech audio models by Liquid AI. | | | Speech |
- Tool (AI LLM)
 - Vocode - An open-source library for building voice-based LLM applications. | | | Speech |
Analytics
- LLM (LLM & Tool)
 - Ludo.ai
Shader
- LLM (LLM & Tool)
 - AI Shader - ChatGPT-powered shader generator for Unity. | | Unity | Shader |

Programming Languages

Python 362 Jupyter Notebook 35 TypeScript 25 C# 18 JavaScript 12 C++ 9 HTML 8 Go 3 Cuda 2 Shell 1
