
ai-game-devtools

Here we will keep track of the latest AI Game Development Tools, including LLM, World Model, Agent, Code, Image, Texture, Shader, 3D Model, Animation, Video, Audio, Music, Singing Voice and Analytics. 🔥
https://github.com/Yuan-ManX/ai-game-devtools


  • <span id="avatar">Avatar</span>

    • <span id="tool">LLM (LLM & Tool)</span>

      • Ready Player Me
      • HeadSculpt
      • Text2Control3D - Controllable 3D Avatar Generation in Neural Radiance Fields using Geometry-Guided Text-to-Image Diffusion Model. |[arXiv](https://arxiv.org/abs/2309.03550) | | Avatar |
      • VLOGGER
      • Wild2Avatar
      • RodinHD - High-Fidelity 3D Avatar Generation with Diffusion Models. |[arXiv](https://arxiv.org/abs/2407.06938) | | Avatar |
      • MotionGPT - A Unified Motion-Language Generation Model using LLMs. |[arXiv](https://arxiv.org/abs/2306.14795) | | Avatar |
      • Duix - Silicon-Based Digital Human SDK 🌐🤖 | | | Avatar |
      • LivePortrait
      • MusePose - A Pose-Driven Image-to-Video Framework for Virtual Human Generation. | | | Avatar |
      • MuseV - Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising. | | | Avatar |
      • AniPortrait - Audio-Driven Synthesis of Photorealistic Portrait Animations. |[arXiv](https://arxiv.org/abs/2403.17694) | | Avatar |
      • GeneFace++ - Generalized and Stable Real-Time 3D Talking Face Generation. | | | Avatar |
      • MuseTalk - Real-Time High Quality Lip Synchronization with Latent Space Inpainting. | | | Avatar |
      • ChatdollKit
      • Hallo - Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation. |[arXiv](https://arxiv.org/abs/2406.08801) | | Avatar |
      • DreamTalk
      • CALM
      • EMOPortraits - Emotion-enhanced Multimodal One-shot Head Avatars. | | | Avatar |
      • E3 Gen
      • GeneAvatar - Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image. |[arXiv](https://arxiv.org/abs/2404.02152) | | Avatar |
      • IntrinsicAvatar
      • Linly-Talker
      • Portrait4D - Learning One-Shot 4D Head Avatar Synthesis using Synthetic Data. |[arXiv](https://arxiv.org/abs/2311.18729) | | Avatar |
      • StyleAvatar3D - Leveraging Image-Text Diffusion Models for High-Fidelity 3D Avatar Generation. |[arXiv](https://arxiv.org/abs/2305.19012) | | Avatar |
      • Topo4D - Topology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture. |[arXiv](https://arxiv.org/abs/2406.00440) | | Avatar |
      • UnityAIWithChatGPT
      • Vid2Avatar - 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition. |[arXiv](https://arxiv.org/abs/2302.11566) | | Avatar |
      • ExAvatar - Expressive Whole-Body 3D Gaussian Avatar. |[arXiv](https://arxiv.org/abs/2407.21686) | | Avatar |
      • Hallo2 - Long-Duration and High-Resolution Audio-Driven Portrait Image Animation. |[arXiv](https://arxiv.org/abs/2410.07718) | | Avatar |
      • Ditto - Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis. |[arXiv](https://arxiv.org/abs/2411.19509) | | Avatar |
      • EmoVOCA - Speech-Driven Emotional 3D Talking Heads. |[arXiv](https://arxiv.org/abs/2403.12886) | | Avatar |
      • HunyuanVideo-Avatar - High-Fidelity Audio-Driven Human Animation for Multiple Characters. |[arXiv](https://arxiv.org/abs/2505.20156) | | Avatar |
      • HunyuanPortrait
      • StableAvatar - Infinite-Length Audio-Driven Avatar Video Generation. |[arXiv](https://arxiv.org/abs/2508.08248) | | Avatar |
      • EchoMimic - Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions. |[arXiv](https://arxiv.org/abs/2407.08136) | | Avatar |
    • <span id="tool">Tool (AI LLM)</span>

      • ChatAvatar
      • EchoMimic - Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions. |[arXiv](https://arxiv.org/abs/2407.08136) | | Avatar |
  • <span id="model">3D Model</span>

    • <span id="tool">LLM (LLM & Tool)</span>

      • Luma AI
      • CSM
      • Instruct-NeRF2NeRF
      • ProlificDreamer - High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation. |[arXiv](https://arxiv.org/abs/2305.16213) | | Model |
      • One-2-3-45 - Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization. |[arXiv](https://arxiv.org/abs/2306.16928) | | Model |
      • 3Dpresso
      • Meshy
      • LATTE3D - Large-scale Amortized Text-To-Enhanced3D Synthesis. |[arXiv](https://arxiv.org/abs/2403.15385) | | 3D |
      • Blockade Labs - the ultimate AI-powered solution for generating incredible 360° skybox experiences from text prompts. | | | Model |
      • Dash
      • GenieLabs - Empowering games with AI-UGC. | | | 3D |
      • lumine AI - AI-Powered Creativity. | | | 3D |
      • Sloyd
      • SV3D - Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion. |[arXiv](https://arxiv.org/abs/2403.12008) | | 3D |
      • Tafi
      • 3D-GPT
      • Voxcraft - Generate easy-to-use 3D models with AI. | | | 3D |
      • TripoSR - A state-of-the-art open-source model for fast feedforward 3D reconstruction from a single image. |[arXiv](https://arxiv.org/abs/2403.02151) | | Model |
      • Any2Point - Empowering Any-modality Large Models for Efficient 3D Understanding. |[arXiv](https://arxiv.org/abs/2404.07989) | | 3D |
      • Anything-3D - Segment-Anything + 3D. Let's lift anything to 3D. |[arXiv](https://arxiv.org/abs/2304.10261) | | Model |
      • Point·E
      • Shap-E
      • Wonder3D - Single Image to 3D using Cross-Domain Diffusion. |[arXiv](https://arxiv.org/abs/2310.15008) | | 3D |
      • NVIDIA Instant NeRF
      • HiFA - High-fidelity Text-to-3D with Advanced Diffusion Guidance. | | | Model |
      • Infinigen
      • threestudio
      • BlenderGPT - Use commands in English to control Blender with OpenAI's GPT-4. | | Blender | Model |
      • Stable Dreamfusion - A PyTorch implementation of the text-to-3D model Dreamfusion, powered by the Stable Diffusion text-to-2D model. | | | Model |
      • UnityGaussianSplatting
      • GaussianDreamer
      • Unique3D - High-Quality and Efficient 3D Mesh Generation from a Single Image. |[arXiv](https://arxiv.org/abs/2405.20343) | | 3D |
      • Blender-GPT - An all-in-one Blender assistant powered by GPT3/4 + Whisper integration. | | Blender | Model |
      • CF-3DGS - COLMAP-Free 3D Gaussian Splatting. |[arXiv](https://arxiv.org/abs/2312.07504) | | 3D |
      • CharacterGen - Efficient 3D Character Generation from Single Images with Multi-View Pose Canonicalization. |[arXiv](https://arxiv.org/abs/2402.17214) | | 3D |
      • chatGPT-maya
      • DreamGaussian4D
      • DUSt3R
      • GALA3D - Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting. |[arXiv](https://arxiv.org/abs/2402.07207) | | 3D |
      • GaussCtrl - Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing. |[arXiv](https://arxiv.org/abs/2403.08733) | | 3D |
      • GaussianCube
      • HoloDreamer
      • Interactive3D
      • Isotropic3D - Image-to-3D Generation Based on a Single CLIP Embedding. | | | 3D |
      • LION
      • Make-It-3D - High-Fidelity 3D Creation from A Single Image with Diffusion Prior. |[arXiv](https://arxiv.org/abs/2303.14184) | | Model |
      • MVDream - Multi-view Diffusion for 3D Generation. |[arXiv](https://arxiv.org/abs/2308.16512) | | 3D |
      • Paint3D - Paint Anything 3D with Lighting-Less Texture Diffusion Models. |[arXiv](https://arxiv.org/abs/2312.13913) | | 3D |
      • PAniC-3D - Stylized Single-view 3D Reconstruction from Portraits of Anime Characters. |[arXiv](https://arxiv.org/abs/2303.14587) | | Model |
      • 3DTopia - Text-to-3D Generation within 5 Minutes. |[arXiv](https://arxiv.org/abs/2403.02234) | | 3D |
      • ViVid-1-to-3
      • Zero-1-to-3 - Zero-shot One Image to 3D Object. |[arXiv](https://arxiv.org/abs/2303.11328) | | Model |
      • SF3D - Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement. |[arXiv](https://arxiv.org/abs/2408.00653) | | 3D |
      • 3DTopia-XL - Scaling High-quality 3D Asset Generation via Primitive Diffusion. |[arXiv](https://arxiv.org/abs/2409.12957) | | 3D |
      • Animate3D - Animating Any 3D Model with Multi-view Video Diffusion. |[arXiv](https://arxiv.org/abs/2407.11398) | | 3D |
      • Hunyuan3D-1.0 - A Unified Framework for Text-to-3D and Image-to-3D Generation. |[arXiv](https://arxiv.org/abs/2411.02293) | | 3D |
      • Edify 3D - Scalable High-Quality 3D Asset Generation. |[arXiv](https://arxiv.org/abs/2411.07135) | | 3D |
      • BlenderMCP
      • Direct3D-S2 - Gigascale 3D Generation Made Easy with Spatial Sparse Attention. |[arXiv](https://arxiv.org/abs/2505.17412) | | 3D |
      • Step1X-3D - Towards High-Fidelity and Controllable Generation of Textured 3D Assets. |[arXiv](https://arxiv.org/abs/2505.07747) | | 3D |
      • Hunyuan3D 2.1 - From Images to High-Fidelity 3D Assets with Production-Ready PBR Material. |[arXiv](https://arxiv.org/abs/2506.15442) | | 3D |
      • PhysRig - Differentiable Physics-Based Rigging for Realistic Articulated Object Modeling. |[arXiv](https://arxiv.org/abs/2506.20936) | | Model |
      • CityDreamer
      • DreamCatalyst - Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation. |[arXiv](https://arxiv.org/abs/2407.11394) | | 3D |
      • Hunyuan3D 2.0
      • 3D-LLM
    • <span id="tool">Tool (AI LLM)</span>

  • <span id="image">Image</span>

    • <span id="tool">LLM (LLM & Tool)</span>

      • Lexica
      • Photoroom
      • Midjourney
      • Imagen
      • ClipDrop
      • Segment Anything
      • DeepAI
      • Prompt.Art
      • StyleDrop - Text-To-Image Generation in Any Style. |[arXiv](https://arxiv.org/abs/2306.00983) | | Image |
      • PhotoMaker
      • Draw Things - AI-assisted image generation in Your Pocket. | | | Image |
      • Diffuse to Choose - Enriching Image Conditioned Inpainting in Virtual Try-All. |[arXiv](https://arxiv.org/abs/2401.13795) | | Image |
      • DALL·E 2
      • AutoStudio - Crafting Consistent Subjects in Multi-turn Interactive Image Generation. |[arXiv](https://arxiv.org/abs/2406.01388) | | Image |
      • Ideogram
      • SDXL-Lightning
      • Vispunk Visions - Text-to-Image generation platform. | | | Image |
      • Img2Prompt
      • Stable Diffusion - A latent text-to-image diffusion model. | | | Image |
      • sd-webui-controlnet
      • Fooocus
      • Stable Diffusion web UI
      • Stable Diffusion web UI - Web-based UI for Stable Diffusion. | | | Image |
      • StableStudio
      • DragGAN - Interactive Point-based Manipulation on the Generative Image Manifold. |[arXiv](https://arxiv.org/abs/2305.10973) | | Image |
      • Stable Cascade
      • Grounded-Segment-Anything
      • Depth Anything V2
      • StreamDiffusion - A Pipeline-Level Solution for Real-Time Interactive Generation. | | | Image |
      • ComfyUI
      • Kolors - Effective Training of Diffusion Model for Photorealistic Text-to-Image Synthesis. | | | Image |
      • AnyText
      • EasyPhoto
      • Disco Diffusion
      • ControlNet
      • Omost
      • SEED-Story - Multimodal Long Story Generation with Large Language Model. |[arXiv](https://arxiv.org/abs/2407.08683) | | Image |
      • Outfit Anyone - Ultra-high quality virtual try-on for Any Clothing and Any Person. | | | Image |
      • CatVTON - Concatenation Is All You Need for Virtual Try-On with Diffusion Models. |[arXiv](https://arxiv.org/abs/2407.15886) | | Image |
      • MimicBrush - Zero-shot Image Editing with Reference Imitation. |[arXiv](https://arxiv.org/abs/2406.07547) | | Image |
      • PuLID
      • Rich-Text-to-Image - Expressive Text-to-Image Generation with Rich Text. |[arXiv](https://arxiv.org/abs/2304.06720) | | Image |
      • LaVi-Bridge - Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation. |[arXiv](https://arxiv.org/abs/2403.07860) | | Image |
      • ConceptLab
      • RPG-DiffusionMaster - Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG). | | | Image |
      • MIGC - Multi-Instance Generation Controller for Text-to-Image Synthesis. |[arXiv](https://arxiv.org/abs/2402.05408) | | Image |
      • img2img-turbo - One-Step Image-to-Image with SD-Turbo. | | | Image |
      • stable-diffusion.cpp
      • DeepFloyd IF
      • Stable.art
      • Hua
      • CLIPasso
      • Openpose Editor - Openpose editor for stable-diffusion-webui. | | | Image |
      • PaintsUndo
      • Stable Diffusion WebUI Chinese - Chinese localization for stable-diffusion-webui. | | | Image |
      • SyncDreamer - Generating Multiview-consistent Images from a Single-view Image. |[arXiv](https://arxiv.org/abs/2309.03453) | | Image |
      • Hunyuan-DiT - A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding. |[arXiv](https://arxiv.org/abs/2405.08748) | | Image |
      • IC-Light - IC-Light is a project to manipulate the illumination of images. | | | Image |
      • Depth map library and poser - Depth map library and pose editor for stable-diffusion-webui. | | | Image |
      • InternLM-XComposer2 - XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension. |[arXiv](https://arxiv.org/abs/2401.16420) | | Image |
      • AnyDoor - shot Object-level Image Customization. |[arXiv](https://arxiv.org/abs/2307.09481) | | Image |
      • Blender-ControlNet
      • BriVL
      • DWPose - Effective Whole-body Pose Estimation with Two-stages Distillation. |[arXiv](https://arxiv.org/abs/2307.15880) | | Image |
      • Follow-Your-Click - Open-domain Regional Image Animation via Short Prompts. |[arXiv](https://arxiv.org/abs/2403.08268) | | Image |
      • GIFfusion
      • KOALA - Self-Attention Matters in Knowledge Distillation of Latent Diffusion Models for Memory-Efficient and Fast Image Synthesis. | | | Image |
      • LlamaGen
      • SDXS - Real-Time One-Step Latent Diffusion Models with Image Conditions. | | | Image |
      • UltraEdit - Instruction-based Fine-Grained Image Editing at Scale. |[arXiv](https://arxiv.org/abs/2407.05282) | | Image |
      • UltraPixel - Advancing Ultra-High-Resolution Image Synthesis to New Peaks. |[arXiv](https://arxiv.org/abs/2407.02158) | | Image |
      • Unity ML Stable Diffusion
      • Flux - Text-to-image and image-to-image with the Flux latent rectified flow transformers. | | | Image |
      • Lumina-mGPT - Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining. |[arXiv](https://arxiv.org/abs/2408.02657) | | Image |
      • HivisionIDPhotos
      • CSGO - Content-Style Composition in Text-to-Image Generation. |[arXiv](https://arxiv.org/abs/2408.16766) | | Image |
      • OmniGen
      • StoryMaker - Towards Holistic Consistent Characters in Text-to-image Generation. |[arXiv](https://arxiv.org/abs/2409.12576) | | Image |
      • Stable Diffusion 3.5
      • Infinity - Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis. |[arXiv](https://arxiv.org/abs/2412.04431) | | Image |
      • KREA - AI-powered design tool. | | | Image |
      • Lumina-Image 2.0 - A Unified and Efficient Image Generative Model. | | | Image |
      • MakeAnything - Using Diffusion Transformers for Multi-Domain Procedural Sequence Generation. |[arXiv](https://arxiv.org/abs/2502.01572) | | Image |
      • Stable Diffusion XL Turbo - Real-Time Text-to-Image Generation. | | | Image |
      • Stable Doodle - A sketch-to-image tool that converts a simple drawing into a dynamic image. | | | Image |
      • Komiko - An AI-powered storytelling platform that lets you create original characters, comics, and animations with ease. | | | Comic |
      • BAGEL - Unified Model for Multimodal Understanding and Generation. BAGEL is an open‑source multimodal foundation model with 7B active parameters (14B total) trained on large‑scale interleaved multimodal data. |[arXiv](https://arxiv.org/abs/2505.14683) | | Image |
      • OmniGen2
      • PosterCraft - Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework. |[arXiv](https://arxiv.org/abs/2506.10741) | | Image |
      • SkyworkUniPic - Unified Autoregressive Modeling for Visual Understanding and Generation. | | | Image |
      • Qwen-Image-Edit - The image editing version of Qwen-Image; it extends Qwen-Image's unique text rendering capabilities to image editing tasks, enabling precise text editing. |[arXiv](https://arxiv.org/abs/2508.02324) | | Image |
      • NextStep-1 - Toward Autoregressive Image Generation with Continuous Tokens at Scale. |[arXiv](https://arxiv.org/abs/2508.10711) | | Image |
      • USO - Unified Style and Subject-Driven Generation via Disentangled and Reward Learning. |[arXiv](https://arxiv.org/abs/2508.18966) | | Image |
      • HunyuanImage-2.1 - An Efficient Diffusion Model for High-Resolution (2K) Text-to-Image Generation. | | | Image |
      • IRG - Interleaving Reasoning for Better Text-to-Image Generation. |[arXiv](https://arxiv.org/abs/2509.06945) | | Image |
      • PromptEnhancer - A Simple Approach to Enhance Text-to-Image Models via Chain-of-Thought Prompt Rewriting. |[arXiv](https://www.arxiv.org/abs/2509.04545) | | Image |
      • MetaShoot
      • HunyuanImage-3.0 - A Powerful Native Multimodal Model for Image Generation. | | | Image |
      • Segment Anything Model 2 (SAM 2)
    • <span id="tool">Tool (AI LLM)</span>

  • <span id="video">Video</span>

    • <span id="tool">LLM (LLM & Tool)</span>

      • MagicVideo
      • Descript
      • Emu Video - Factorizing Text-to-Video Generation by Explicit Image Conditioning. | | | Video |
      • Video LDMs - Align your Latents: High-resolution Video Synthesis with Latent Diffusion Models. |[arXiv](https://arxiv.org/abs/2304.08818) | | Video |
      • Make-A-Video - Make-A-Video is a state-of-the-art AI system that generates videos from text. |[arXiv](https://arxiv.org/abs/2209.14792) | | Video |
      • Imagen Video - Generates high definition videos from a text prompt using a base video generation model and a sequence of interleaved spatial and temporal video super-resolution models. | | | Video |
      • VideoCrafter1 - Open Diffusion Models for High-Quality Video Generation. |[arXiv](https://arxiv.org/abs/2310.19512) | | Video |
      • MobileVidFactory - Automatic Diffusion-Based Social Media Video Generation for Mobile Devices from Text. | | | Video |
      • MovieFactory
      • VideoFactory - Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation. | | | Video |
      • Pollinations
      • UniVG - Towards UNIfied-modal Video Generation. | | | Video |
      • Follow Your Pose - Pose-Guided Text-to-Video Generation using Pose-Free Videos. |[arXiv](https://arxiv.org/abs/2304.01186) | | Video |
      • Make-Your-Video
      • VideoComposer
      • VideoCrafter2 - Overcoming Data Limitations for High-Quality Video Diffusion Models. |[arXiv](https://arxiv.org/abs/2401.09047) | | Video |
      • I2VGen-XL - High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models. |[arXiv](https://arxiv.org/abs/2311.04145) | | Video |
      • MotionCtrl
      • HiGen - Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation. | | | Video |
      • TF-T2V - A Recipe for Scaling up Text-to-Video Generation with Text-free Videos. |[arXiv](https://arxiv.org/abs/2312.15770) | | Video |
      • VideoPoet - A large language model for zero-shot video generation. |[arXiv](https://arxiv.org/abs/2312.14125) | | Video |
      • GenTron
      • W.A.L.T
      • VideoDrafter - Content-Consistent Multi-Scene Video Generation with LLM. |[arXiv](https://arxiv.org/abs/2401.01256) | | Video |
      • VideoLCM
      • NeverEnds
      • Neural Frames
      • Lumiere - A Space-Time Diffusion Model for Video Generation. |[arXiv](https://arxiv.org/abs/2401.12945) | | Video |
      • InstructVideo
      • Phenaki
      • Boximator
      • Sora
      • Genie
      • AtomoVideo - High Fidelity Image-to-Video Generation. |[arXiv](https://arxiv.org/abs/2403.01800) | | Video |
      • TwelveLabs
      • Anything in Any Scene
      • Assistive
      • CogVideo
      • Decohere
      • DomoAI
      • DynamiCrafter - Animating Open-domain Images with Video Diffusion Priors. |[arXiv](https://arxiv.org/abs/2310.12190) | | Video |
      • Etna
      • Fairy - Fast Parallelized Instruction-Guided Video-to-Video Synthesis. | | | Video |
      • Generative Dynamics
      • Genmo
      • LTX Studio - An AI-driven filmmaking platform for creators, marketers, filmmakers and studios. | | | Video |
      • MagicVideo-V2 - Multi-Stage High-Aesthetic Video Generation. |[arXiv](https://arxiv.org/abs/2401.04468) | | Video |
      • Magic Hour
      • MAGVIT-v2
      • MAGVIT
      • Make Pixels Dance - High-Dynamic Video Generation. |[arXiv](https://arxiv.org/abs/2311.10982) | | Video |
      • Morph Studio - Text-to-Video AI Magic, manifest your creativity through your prompt. | | | Video |
      • TATS - Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer. | | | Video |
      • Vispunk Motion
      • Zeroscope - Text-to-Video. | | | Video |
      • Video-ChatGPT - Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. |[arXiv](https://arxiv.org/abs/2306.05424) | | Video |
      • Stable Video Diffusion - Image-to-Video. | | | Video |
      • Video-LLaVA
      • Track-Anything - Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything and XMem. |[arXiv](https://arxiv.org/abs/2304.11968) | | Video |
      • Open-Sora - Open-Sora Plan, a project aiming to reproduce OpenAI's Sora. | | | Video |
      • MoneyPrinterTurbo
      • StreamingT2V
      • ShortGPT
      • BackgroundRemover
      • Tune-A-Video - One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation. |[arXiv](https://arxiv.org/abs/2212.11565) | | Video |
      • Open-Sora
      • StoryDiffusion - Consistent Self-Attention for Long-Range Image and Video Generation. |[arXiv](https://arxiv.org/abs/2405.01434) | | Video |
      • Text2Video-Zero - Text-to-Image Diffusion Models are Zero-Shot Video Generators. |[arXiv](https://arxiv.org/abs/2303.13439) | | Video |
      • MOFA-Video - Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model. |[arXiv](https://arxiv.org/abs/2405.20222) | | Video |
      • LVDM - Latent Video Diffusion Models for High-Fidelity Long Video Generation. |[arXiv](https://arxiv.org/abs/2211.13221) | | Video |
      • LaVie - High-Quality Video Generation with Cascaded Latent Diffusion Models. |[arXiv](https://arxiv.org/abs/2309.15103) | | Video |
      • Show-1 - Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation. |[arXiv](https://arxiv.org/abs/2309.15818) | | Video |
      • ART•V - Auto-Regressive Text-to-Video Generation with Diffusion Models. |[arXiv](https://arxiv.org/abs/2311.18834) | | Video |
      • StyleCrafter - Enhancing Stylized Text-to-Video Generation with Style Adapter. |[arXiv](https://arxiv.org/abs/2312.00330) | | Video |
      • VGen
      • VideoElevator - Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models. |[arXiv](https://arxiv.org/abs/2403.05438) | | Video |
      • Reuse and Diffuse - Iterative Denoising for Text-to-Video Generation. |[arXiv](https://arxiv.org/abs/2309.03549) | | Video |
      • MicroCinema - A Divide-and-Conquer Approach for Text-to-Video Generation. |[arXiv](https://arxiv.org/abs/2311.18829) | | Video |
      • StableVideo - Text-driven Consistency-aware Diffusion Video Editing. | | | Video |
      • MotionDirector - Motion Customization of Text-to-Video Diffusion Models. |[arXiv](https://arxiv.org/abs/2310.08465) | | Video |
      • CogVideoX - An open-source version of the video generation model homologous to QingYing (清影). | | | Video |
      • CogVLM - An open-source visual language model (VLM). | | | Visual |
      • V-JEPA
      • dolphin
      • Hotshot-XL - XL is an AI text-to-GIF model trained to work alongside Stable Diffusion XL. | | | Video |
      • Mov2mov - Video-to-video extension for stable-diffusion-webui. | | | Video |
      • Diffutoon - High-Resolution Editable Toon Shading via Diffusion Models. |[arXiv](https://arxiv.org/abs/2401.16224) | | Video |
      • 360DVD - Controllable Panorama Video Generation with 360-Degree Video Diffusion Model. |[arXiv](https://arxiv.org/abs/2401.06578) | | Video |
      • CoDeF
      • EDGE - EDGE generates realistic, physically-plausible dances while remaining faithful to arbitrary input music. |[arXiv](https://arxiv.org/abs/2211.10658) | | Video |
      • EMO - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions. |[arXiv](https://arxiv.org/abs/2402.17485) | | Video |
      • Mora
      • Motionshop
      • Snap Video - Scaled Spatiotemporal Transformers for Text-to-Video Synthesis. |[arXiv](https://arxiv.org/abs/2402.14797) | | Video |
      • SoraWebui - An open-source Sora web client, enabling users to easily create videos from text with OpenAI's Sora model. | | | Video |
      • VideoGen - A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation. |[arXiv](https://arxiv.org/abs/2309.00398) | | Video |
      • VideoMamba
      • Video-of-Thought - Step-by-Step Video Reasoning from Perception to Cognition. | | | Video |
      • VisualRWKV - A visual-enhanced version of the RWKV language model, enabling RWKV to handle various visual tasks. | | | Visual |
      • Tora - Trajectory-oriented Diffusion Transformer for Video Generation. |[arXiv](https://arxiv.org/abs/2407.21705) | | Video |
      • FullJourney
      • Follow-Your-Canvas - Higher-Resolution Video Outpainting with Extensive Content Generation. |[arXiv](https://arxiv.org/abs/2409.01055) | | Video |
      • DreamCinema
      • ViewCrafter - Taming Video Diffusion Models for High-fidelity Novel View Synthesis. |[arXiv](https://arxiv.org/abs/2409.02048) | | Video |
      • Vchitect-2.0 - Parallel Transformer for Scaling Up Video Diffusion Models. | | | Video |
      • MIMO
      • LTX-Video - Video is the first DiT-based video generation model that can generate high-quality videos in real-time. It can generate 24 FPS videos at 768x512 resolution, faster than it takes to watch them. | | | Video |
      • Ruyi - An image-to-video model capable of generating cinematic-quality videos at a resolution of 768, with a frame rate of 24 frames per second, totaling 5 seconds and 120 frames. | | | Video |
      • SkyReels-A1 - Expressive Portrait Animation in Video Diffusion Transformers. |[arXiv](https://arxiv.org/abs/2502.10841) | | Video |
      • SkyReels-V1 - Human-Centric Video Foundation Model. | | | Video |
      • Step-Video-T2V - Technical Report: The Practice, Challenges, and Future of Video Foundation Model. |[arXiv](https://arxiv.org/abs/2502.10248) | | Video |
      • Wan2.1 - Open and Advanced Large-Scale Video Generative Models. | | | Video |
      • Pika Labs - Redefining the video-making experience with AI. | | | Video |
      • MoviiGen 1.1 - Towards High-Quality Video Generative Models. A cutting-edge video generation model that excels in cinematic aesthetics and visual quality, fine-tuned from Wan2.1. In evaluations by 11 professional filmmakers and AIGC creators across 60 aesthetic dimensions, it demonstrates superior performance in key cinematic aspects. | | | Video |
      • Wan2.2 - Open and Advanced Large-Scale Video Generative Models. |[arXiv](https://arxiv.org/abs/2503.20314) | | Video |
      • Waver - A next-generation, universal foundation model family for unified image and video generation, built on rectified flow Transformers and engineered for industry-grade performance. |[arXiv](https://arxiv.org/abs/2508.15761) | | Video |
      • InfiniteTalk - Audio-driven Video Generation for Sparse-Frame Video Dubbing. |[arXiv](https://arxiv.org/abs/2508.14033) | | Video |
      • HuMo - Human-Centric Video Generation via Collaborative Multi-Modal Conditioning. |[arXiv](https://arxiv.org/abs/2509.08519) | | Video |
      • LongLive - Real-time Interactive Long Video Generation. |[arXiv](https://arxiv.org/abs/2509.22622) | | Video |
      • Lynx - Towards High-Fidelity Personalized Video Generation. |[arXiv](https://arxiv.org/abs/2509.15496) | | Video |
      • Ovi - Twin Backbone Cross-Modal Fusion for Audio-Video Generation. |[arXiv](https://arxiv.org/abs/2510.01284) | | Video |
      • CoNR - Collaborative Neural Rendering using hand-drawn anime character sheets (ACS). |[arXiv](https://arxiv.org/abs/2207.05378) | | Video |
      • HunyuanVideo
      • Mini-Gemini - Mining the Potential of Multi-modality Vision Language Models. | | | Vision |
      • Mochi 1 - A state-of-the-art video generation model with high-fidelity motion and strong prompt adherence in preliminary evaluation. | | | Video |
      • MotionClone - Training-Free Motion Cloning for Controllable Video Generation. |[arXiv](https://arxiv.org/abs/2406.05338) | | Video |
    • <span id="tool">Tool (AI LLM)</span>

      • Gen-2 - A multi-modal AI system that can generate novel videos with text, images, or video clips. | | | Video |
      • FullJourney
      • Moonvalley - A text-to-video generative AI model. | | | Video |
      • Pixeling - Hyper-realistic and extremely controllable visual content including images, videos and 3D models. | | | Video |
      • PixVerse - Create breath-taking videos with AI. | | | Video |
  • <span id="music">Music</span>

    • <span id="tool">LLM (LLM & Tool)</span>

    • <span id="tool">Tool (AI LLM)</span>

      • JEN-1 - Text-Guided Universal Music Generation with Omnidirectional Diffusion Models. | | | Music |
  • Project List

    • <span id="tool">LLM (LLM & Tool)</span>

      • Mixtral 8x7B - A high-quality Sparse Mixture-of-Experts. |[arXiv](https://arxiv.org/abs/2401.04088) | | Tool |
      • HuggingChat
      • Pi
      • NovelAI
      • Dora
      • Grok-1 - Open release of the Mixture-of-Experts model, Grok-1. | | | Tool |
      • GPT-4o - GPT-4o (“o” for “omni”) is a step towards much more natural human-computer interaction—it accepts as input any combination of text, audio, image, and video and generates any combination of text, audio, and image outputs. | | | Tool |
      • CogVLM - An open-source visual language foundation model. |[arXiv](https://arxiv.org/abs/2311.03079) | | Tool |
      • Moshi
      • Nemotron-4 - Nemotron-4 15B is a 15-billion-parameter large multilingual language model trained on 8 trillion text tokens. |[arXiv](https://arxiv.org/abs/2402.16819) | | Tool |
      • ShareGPT4V - Improving Large Multi-Modal Models with Better Captions. | | | Tool |
      • NExT-GPT - Any-to-Any Multimodal Large Language Model. | | | Tool |
      • Stanford Alpaca - An Instruction-following LLaMA Model. | | | LLM |
      • GPT4All
      • ChatYuan
      • StableLM
      • Novel - Notion-style WYSIWYG editor with AI-powered autocompletions. | | | Writer |
      • MLC LLM
      • LLaSM
      • Text generation web UI - A Gradio web UI for Large Language Models like LLaMA, llama.cpp, GPT-J, OPT, and GALACTICA. | | | Tool |
      • MiniGPT-4 - Enhancing Vision-language Understanding with Advanced Large Language Models. |[arXiv](https://arxiv.org/abs/2304.10592) | | Tool |
      • MetaGPT - The Multi-Agent Framework. | | | Tool |
      • BabyAGI - An AI-powered task management system. | | | Tool |
      • AgentGPT
      • CoreNet
      • Open-Assistant - A chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so. | | | Tool |
      • Panda - Chinese open-source LLMs based on LLaMA-7B, -13B, -33B, and -65B for continuous pre-training in the Chinese field. | | | Tool |
      • ChatRWKV
      • Lit-LLaMA - Implementation of the LLaMA language model based on nanoGPT, supporting LLaMA-Adapter fine-tuning and pre-training. | | | Tool |
      • Assistant CLI
      • SearchGPT
      • InternLM - InternLM has open-sourced a 7 billion parameter base model, a chat model tailored for practical scenarios, and the training system. |[arXiv](https://arxiv.org/abs/2403.17297) | | Tool |
      • gemma.cpp
      • llm.c
      • GPTScript
      • WebGPT
      • 👶🤖🖥️ BabyAGI UI
      • LaMini-LM - LaMini-LM is a collection of small-sized, efficient language models distilled from ChatGPT and trained on a large-scale dataset of 2.58M instructions. | | | Tool |
      • Llama 3
      • OneLLM
      • Jan
      • Perplexica - An AI-powered search engine. | | | Tool |
      • LLocalSearch
      • RepoAgent - An Open-Source project driven by Large Language Models (LLMs) that aims to provide an intelligent way to document projects. |[arXiv](https://arxiv.org/abs/2402.16667) | | Tool |
      • Bisheng
      • GLM-4 - GLM-4-9B is the open-source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI. | | | Tool |
      • Lumina-T2X - A unified framework for Text to Any Modality Generation. |[arXiv](https://arxiv.org/abs/2405.05945) | | Tool |
      • Large World Model (LWM) - A general-purpose large-context multimodal autoregressive model. |[arXiv](https://arxiv.org/abs/2402.08268) | | Tool |
      • MiniCPM-2B - An end-side LLM that outperforms Llama2-13B. | | | Tool |
      • Lepton AI
      • ImageBind
      • LLM Answer Engine - Build a Perplexity-Inspired Answer Engine Using Next.js, Groq, Mixtral, Langchain, OpenAI, Brave & Serper. | | | Tool |
      • Flowise
      • WordGPT
      • Lamini - An LLM platform for developers to run fine-tuning on their own data. | | | Tool |
      • Chrome-GPT
      • 01 Project - The open-source language model computer. | | | Tool |
      • OLMo
      • Llama 3.1
      • Devika
      • Gemma - A family of lightweight, state-of-the-art open models built from research and technology used to create Google Gemini models. | | | Tool |
      • Yi
      • llama2-webui
      • Baichuan 2
      • Notebook.ai
      • LaVague
      • Devon - An open-source pair programmer. | | | Tool |
      • ToolBench
      • baichuan-7B - A large-scale 7B pretraining language model developed by Baichuan. | | | Tool |
      • DemoGPT - Auto Gen-AI App Generator with the Power of Llama 2. | | | Tool |
      • Web3-GPT
      • AICommand
      • Index-1.9B
      • DBRX
      • MobiLlama
      • mPLUG-Owl🦉
      • Baichuan-13B
      • InteractML-Unity
      • AI-Writer - AI novel writing based on a pre-trained generative model. | | | Writer |
      • Hugging Face API Unity Integration - An easy-to-use integration for the Hugging Face Inference API, allowing developers to access and use Hugging Face AI models within their Unity projects. | | Unity | Tool |
      • AIOS
      • Character-LLM - A Trainable Agent for Role-Playing. |[arXiv](https://arxiv.org/abs/2310.10158) | | Tool |
      • ChatGPT-API-unity
      • ChatGPTForUnity
      • Chinese-LLaMA-Alpaca-3 - Chinese Llama-3 LLMs developed from Meta Llama 3. | | | Tool |
      • DCLM
      • Design2Code - How Far Are We From Automating Front-End Engineering? | | | Tool |
      • Lemur
      • LLMUnity
      • LogicGamesSolver
      • MiniGPT-5 - and-Language Generation via Generative Vokens. |[arXiv](https://arxiv.org/abs/2310.02239) | | Tool |
      • Orion-14B - 14B is a family of models includes a 14B foundation LLM, and a series of models. |[arXiv](https://arxiv.org/abs/2401.12246) | | Tool |
      • Sanity AI Engine
      • Skywork - A series of models pre-trained on 3.2TB of high-quality multilingual (mainly Chinese and English) and code data. | | | Tool |
      • TinyChatEngine - On-Device LLM Inference Library. | | | Tool |
      • Unity ChatGPT
      • Unreal Engine 5 Llama LoRA - A proof-of-concept project that showcases the potential for using small, locally trainable LLMs to create next-generation documentation tools. | | Unreal Engine | Tool |
      • UnrealGPT
      • AI Scientist - Towards Fully Automated Open-Ended Scientific Discovery. |[arXiv](https://arxiv.org/abs/2408.06292) | | Tool |
      • LongWriter
      • Moshi - A speech-text foundation model for real-time dialogue. | | | Tool |
      • Janus
      • DeepSeek-V3 - A strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. |[arXiv](https://arxiv.org/abs/2412.19437) | | LLM |
      • Cosmos
      • MiniMax-01 - Scaling Foundation Models with Lightning Attention. |[arXiv](https://arxiv.org/abs/2501.08313) | | LLM |
      • SkyThought - Sky-T1: Train your own O1 preview model within $450. | | | LLM |
      • DeepSeek-R1 - DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning performance. | | | LLM |
      • s1 - Simple test-time scaling. |[arXiv](https://arxiv.org/abs/2501.19393) | | LLM |
      • Open Deep Research - An AI-powered research assistant that performs iterative, deep research on any topic by combining search engines, web scraping, and large language models. | | | LLM |
      • InteractML-Unreal Engine
      • LangChain
      • OpenDevin
      • Qwen3
      • Gemini
      • SimpleOllamaUnity
      • GLM-4.5 - An open-source large language model designed for intelligent agents by Z.ai. | | | LLM |
      • gpt-oss - gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI. | | | LLM |
      • Kimi K2 - A state-of-the-art mixture-of-experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters. | | | LLM |
      • Seed-OSS - A series of open-source large language models developed by ByteDance's Seed Team, designed for powerful long-context, reasoning, agent, and general capabilities, with versatile developer-friendly features. | | | LLM |
      • LongCat-Flash - A powerful and efficient language model with 560 billion total parameters, featuring an innovative Mixture-of-Experts (MoE) architecture. A dynamic computation mechanism activates 18.6B-31.3B parameters (averaging ~27B) based on contextual demands, optimizing both computational efficiency and performance. | | | LLM |
      • Hunyuan-MT - Comprises a translation model, Hunyuan-MT-7B, and an ensemble model, Hunyuan-MT-Chimera. The translation model translates source text into the target language, while the ensemble model integrates multiple translation outputs to produce a higher-quality result. | | | LLM |
      • MOSS - An open-source tool-augmented conversational language model from Fudan University. | | | Tool |
      • Auto-GPT - An open-source attempt to make GPT-4 fully autonomous. | | | Tool |
      • Qwen1.5
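Most of the open-weight models and serving tools above expose the same OpenAI-style chat interface. A minimal sketch of that request shape in plain Python; the `render_prompt` helper and its `role: content` template are illustrative assumptions, not any specific tool's API:

```python
# Sketch of the widely used chat "messages" structure (role/content dicts).
# render_prompt is a hypothetical helper for plain-text backends.

def make_messages(system: str, user: str) -> list[dict]:
    """Build a chat request body in the common role/content format."""
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

def render_prompt(messages: list[dict]) -> str:
    """Flatten messages into a single prompt string for plain-text backends."""
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages)

if __name__ == "__main__":
    msgs = make_messages("You are an NPC dialogue writer.", "Greet the player.")
    print(render_prompt(msgs))
```

Local runtimes differ in how they consume this structure (some take the messages list directly, others a flattened prompt), so check each tool's own docs.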
    • <span id="tool">Tool (AI LLM)</span>

  • <span id="speech">Speech</span>

    • <span id="tool">LLM (LLM & Tool)</span>

      • Fliki
      • VALL-E X - Cross-Lingual Neural Codec Language Modeling. | [arXiv](https://arxiv.org/abs/2303.03926) | | Speech |
      • LOVO - Go-to AI Voice Generator & Text to Speech platform for thousands of creators. | | | Speech |
      • VALL-E - Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers. | [arXiv](https://arxiv.org/abs/2301.02111) | | Speech |
      • Audyo
      • CLAPSpeech - Learning Prosody from Text Context with Contrastive Language-Audio Pre-Training. | [arXiv](https://arxiv.org/abs/2305.10763) | | Speech |
      • Narakeet
      • TorToiSe-TTS - A multi-voice TTS system trained with an emphasis on quality. | | | Speech |
      • Bark - Text-Prompted Generative Audio Model. | | | Speech |
      • Whisper - A general-purpose speech recognition model. | | | Speech |
      • XTTS - A generative model for Text-to-Speech. | | | Speech |
      • Voicebox - Text-Guided Multilingual Universal Speech Generation at Scale. | [arXiv](https://arxiv.org/abs/2306.15687) | | Speech |
      • OpenVoice
      • CosyVoice - A multi-lingual large voice generation model, providing full-stack inference, training and deployment capability. | | | Speech |
      • ChatTTS
      • MeloTTS - A high-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese and Korean. | | | Speech |
      • GPT-SoVITS - Few-shot Voice Conversion and Text-to-Speech WebUI. | | | Speech |
      • EmotiVoice - A Multi-Voice and Prompt-Controlled TTS Engine. | | | Speech |
      • Bert-VITS2
      • VoiceCraft - Zero-Shot Speech Editing and Text-to-Speech in the Wild. | | | Speech |
      • TTS Generation WebUI
      • YourTTS - Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone. | [arXiv](https://arxiv.org/abs/2112.02418) | | Speech |
      • MetaVoice-1B - A foundational model for human-level speech intelligence. | | | Speech |
      • StyleTTS 2 - Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models. | [arXiv](https://arxiv.org/abs/2306.07691) | | Speech |
      • SpeechGPT - Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities. | [arXiv](https://arxiv.org/abs/2305.11000) | | Speech |
      • One-Shot-Voice-Cloning - One-shot voice cloning based on Unet-TTS. | | | Speech |
      • Applio - A voice conversion tool focused on simplicity, quality and a user-friendly experience. | | | Speech |
      • DEX-TTS - Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability. | [arXiv](https://arxiv.org/abs/2406.19135) | | Speech |
      • Glow-TTS - A Generative Flow for Text-to-Speech via Monotonic Alignment Search. | [arXiv](https://arxiv.org/abs/2005.11129) | | Speech |
      • MahaTTS - An Open-Source Large Speech Generation Model. | | | Speech |
      • Matcha-TTS
      • OverFlow
      • RealtimeTTS - A state-of-the-art text-to-speech (TTS) library designed for real-time applications. | | | Speech |
      • SenseVoice
      • speech-to-text-gpt3-unity
      • StableTTS - Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3. | | | Speech |
      • X-E-Speech - Joint Training Framework of Non-Autoregressive Cross-lingual Emotional Text-to-Speech and Voice Conversion. | | | Speech |
      • ZMM-TTS - Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations. | [arXiv](https://arxiv.org/abs/2312.14398) | | Speech |
      • tortoise.cpp - A ggml implementation of tortoise-tts. | | | Speech |
      • Mini-Omni - Language Models Can Hear, Talk While Thinking in Streaming. Mini-Omni is an open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output conversational capabilities. | [arXiv](https://arxiv.org/abs/2408.16725) | | Speech |
      • GLM-4-Voice - An end-to-end voice model launched by Zhipu AI. It can directly understand and generate Chinese and English speech, engage in real-time voice conversations, and change attributes such as emotion, intonation, speech rate, and dialect based on user instructions. | | | Speech |
      • Step-Audio - Unified Understanding and Generation in Intelligent Speech Interaction. | [arXiv](https://arxiv.org/abs/2502.11946) | | Speech |
      • Chatterbox - Production-grade open-source TTS model. | | | Speech |
      • IndexTTS2 - A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech. | [arXiv](https://arxiv.org/abs/2506.21619) | | Speech |
      • UnityNeuroSpeech
      • Higgs Audio
      • Kitten TTS - An open-source realistic text-to-speech model with just 15 million parameters, designed for lightweight deployment and high-quality voice synthesis. | | | Speech |
      • VibeVoice - Generates long-form, multi-speaker conversational audio, such as podcasts, from text. It addresses significant challenges in traditional Text-to-Speech (TTS) systems, particularly in scalability, speaker consistency, and natural turn-taking. | | | Speech |
      • Step-Audio 2 - An end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation. | [arXiv](https://arxiv.org/abs/2507.16632) | | Speech |
      • UniAudio 2.0 - A Multi-task Audio Foundation Model with Reasoning-Augmented Audio Tokenization. | | | Speech |
      • FireRedTTS-2 - Towards Long Conversational Speech Generation for Podcast and Chatbot. | [arXiv](https://arxiv.org/abs/2509.02020) | | Speech |
      • VoxCPM - Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning. | | | Speech |
      • Liquid Audio - Speech-to-Speech audio models by Liquid AI. | | | Speech |
      • Stable Speech - A Text-to-Speech model. | | | Speech |
      • WhisperSpeech - An open-source text-to-speech system built by inverting Whisper. | | | Speech |
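Many of the TTS and speech-recognition tools above exchange plain 16-bit PCM WAV files. A stdlib-only sketch that writes a one-second test tone in that format; the filename and the sample rate, frequency, and amplitude are arbitrary example values:

```python
# Writes a mono, 16-bit, 16 kHz WAV test tone using only the standard library.
import math
import struct
import wave

def write_test_tone(path: str, freq: float = 440.0,
                    rate: int = 16000, seconds: float = 1.0) -> None:
    """Generate a sine tone and save it as 16-bit PCM WAV."""
    n = int(rate * seconds)
    frames = b"".join(
        struct.pack("<h", int(32767 * 0.5 * math.sin(2 * math.pi * freq * i / rate)))
        for i in range(n)
    )
    with wave.open(path, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(rate)
        w.writeframes(frames)

if __name__ == "__main__":
    write_test_tone("tone.wav")
```

A file like this is a convenient smoke-test input for ASR tools, and the same `wave` parameters (channels, sample width, rate) are what you inspect when a TTS tool's output needs resampling.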
    • <span id="tool">Tool (AI LLM)</span>

      • Vocode - An open-source library for building voice-based LLM applications. | | | Speech |
  • <span id="animation">Animation</span>

    • <span id="tool">LLM (LLM & Tool)</span>

      • Stable Animation - A text-to-animation tool for developers. | | | Animation |
      • Wonder Studio - An AI tool that automatically animates, lights and composes CG characters into a live-action scene. | | | Animation |
      • FreeInit
      • PIA - Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models. |[arXiv](https://arxiv.org/abs/2312.13964) | | Animation |
      • ID-Animator - Zero-Shot Identity-Preserving Human Video Generation. |[arXiv](https://arxiv.org/abs/2404.15275) | | Animation |
      • Deforum
      • NUWA-XL
      • NUWA-Infinity - A multimodal generative model designed to generate high-quality images and videos from given text, image or video input. | | | Animation |
      • Animate Anyone - Consistent and Controllable Image-to-Video Synthesis for Character Animation. |[arXiv](https://arxiv.org/abs/2311.17117) | | Animation |
      • MagicAnimate
      • DreaMoving
      • AnimateLCM
      • FaceFusion
      • Wav2Lip - Accurately Lip-syncing Videos In The Wild. |[arXiv](https://arxiv.org/abs/2008.10010) | | Animation |
      • GeneFace - Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis. |[arXiv](https://arxiv.org/abs/2301.13430) | | Animation |
      • SadTalker-Video-Lip-Sync
      • AnimateAnything - Fine-Grained Open Domain Image Animation with Motion Guidance. |[arXiv](https://arxiv.org/abs/2311.12886) | | Animation |
      • AnimationGPT
      • DrawingSpinUp
      • Animate-X - Universal Character Image Animation with Enhanced Motion Representation. |[arXiv](https://arxiv.org/abs/2410.10306) | | Animation |
      • ToonCrafter
      • Omni Animation
      • AnimateZero - Video Diffusion Models are Zero-Shot Image Animators. |[arXiv](https://arxiv.org/abs/2312.03793) | | Animation |
      • Index-AniSora - The most powerful open-source animated video generation model. It enables one-click creation of video shots across diverse anime styles including series episodes, Chinese original animations, manga adaptations, VTuber content, anime PVs, mad-style parodies, and more! |[arXiv](https://arxiv.org/abs/2412.10255) | | Animation |
      • ToonComposer - Streamlining Cartoon Production with Generative Post-Keyframing. |[arXiv](https://arxiv.org/abs/2508.10881) | | Animation |
      • AnimateDiff - Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning. |[arXiv](https://arxiv.org/abs/2307.04725) | | Animation |
      • SadTalker - Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation. |[arXiv](https://arxiv.org/abs/2211.12194) | | Animation |
      • TaleCrafter
    • <span id="tool">Tool (AI LLM)</span>

  • <span id="audio">Audio</span>

    • <span id="tool">LLM (LLM & Tool)</span>

      • Stable Audio - Fast Timing-Conditioned Latent Audio Diffusion. | | | Audio |
      • AudioLDM - Text-to-Audio Generation with Latent Diffusion Models. |[arXiv](https://arxiv.org/abs/2301.12503) | | Audio |
      • OptimizerAI
      • MAGNeT - Masked Audio Generation using a Single Non-Autoregressive Transformer. | | | Audio |
      • Audiobox
      • Make-An-Audio - Text-To-Audio Generation with Prompt-Enhanced Diffusion Models. |[arXiv](https://arxiv.org/abs/2301.12661) | | Audio |
      • SoundStorm
      • Stable Audio Open - Generates variable-length (up to 47s) stereo audio at 44.1kHz from text prompts. | | | Audio |
      • FoleyCrafter
      • SyncFusion - Multimodal Onset-synchronized Video-to-Audio Foley Synthesis. |[arXiv](https://arxiv.org/abs/2310.15247) | | Audio |
      • AudioGPT
      • Amphion - An Open-Source Audio, Music, and Speech Generation Toolkit. |[arXiv](https://arxiv.org/abs/2312.09911) | | Audio |
      • TANGO - Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model. | | | Audio |
      • ArchiSound
      • AcademiCodec
      • AudioEditing - Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion. |[arXiv](https://arxiv.org/abs/2402.10009) | | Audio |
      • Audiogen Codec
      • AudioLCM - Text-to-Audio Generation with Latent Consistency Models. |[arXiv](https://arxiv.org/abs/2406.00356v1) | | Audio |
      • AudioLDM 2 - Learning Holistic Audio Generation with Self-supervised Pretraining. |[arXiv](https://arxiv.org/abs/2308.05734) | | Audio |
      • Auffusion - Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation. |[arXiv](https://arxiv.org/abs/2401.01044) | | Audio |
      • CTAG - Creative Text-to-Audio Generation via Synthesizer Programming. | | | Audio |
      • Make-An-Audio 3 - Transforming Text into Audio via Flow-based Large Diffusion Transformers. |[arXiv](https://arxiv.org/abs/2305.18474) | | Audio |
      • NeuralSound - Learning-based Modal Sound Synthesis with Acoustic Transfer. |[arXiv](https://arxiv.org/abs/2108.07425) | | Audio |
      • Qwen2-Audio - The Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud. |[arXiv](https://arxiv.org/abs/2407.10759) | | Audio |
      • SEE-2-SOUND - Zero-Shot Spatial Environment-to-Spatial Sound. |[arXiv](https://arxiv.org/abs/2406.06612) | | Audio |
      • VTA-LDM - Video-to-Audio Generation with Hidden Alignment. |[arXiv](https://arxiv.org/abs/2407.07464) | | Audio |
      • WavJourney
      • MMAudio - Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis. |[arXiv](https://arxiv.org/abs/2412.15322) | | Audio |
      • AudioX - Diffusion Transformer for Anything-to-Audio Generation. |[arXiv](https://arxiv.org/abs/2503.10522) | | Audio |
      • ThinkSound - Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing. |[arXiv](https://arxiv.org/abs/2506.21448) | | Audio |
      • MiDashengLM
      • MeanAudio - Fast and Faithful Text-to-Audio Generation with Mean Flows. | | | Audio |
      • HunyuanVideo-Foley - Multimodal Diffusion with Representation Alignment for High-Fidelity Foley Audio Generation. |[arXiv](https://www.arxiv.org/abs/2508.16930) | | Audio |
  • <span id="code">Code</span>

  • <span id="game">Game (World Model & Agent)</span>

    • <span id="tool">LLM (LLM & Tool)</span>

      • Opus
      • OmAgent
      • Astrocade
      • CogAgent - An open-source visual language model improved based on CogVLM. |[arXiv](https://arxiv.org/abs/2312.08914) | | Agent |
      • Digital Life Project
      • Moonlander.ai
      • SIMA
      • StoryGames.ai
      • V-IRL
      • LangChain
      • LlamaIndex
      • Dify - An open-source LLM app development platform. | | | Agent |
      • FastGPT - A knowledge-based platform built on LLMs. | | | Agent |
      • AutoGen - Enabling Next-Gen Large Language Model Applications. |[arXiv](https://arxiv.org/abs/2308.08155) | | Agent |
      • Translation Agent
      • XAgent
      • everything-ai - An AI-powered, local chatbot assistant 🤖. | | | Agent |
      • fabric - An open-source framework for augmenting humans using AI. | | | Agent |
      • fastRAG
      • Generative Agents
      • ChatDev
      • OpenAgents
      • AI Town
      • Ragas
      • Qwen-Agent - Agent is a framework for developing LLM applications based on the instruction following, tool usage, planning, and memory capabilities of Qwen. | | | Agent |
      • Pipecat
      • AutoAgents
      • AgentScope - Start building LLM-empowered multi-agent applications in an easier way. |[arXiv](https://arxiv.org/abs/2402.14034) | | Agent |
      • KwaiAgents - A generalized information-seeking agent system with Large Language Models (LLMs). |[arXiv](https://arxiv.org/abs/2312.04889) | | Agent |
      • Genesis
      • AgentBench
      • behaviac
      • MindSearch - An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT). | | | Agent |
      • Mixture of Agents (MoA) - Mixture-of-Agents Enhances Large Language Model Capabilities. |[arXiv](https://arxiv.org/abs/2406.04692) | | Agent |
      • Agent Group Chat
      • anime.gf
      • Biomes
      • Buffer of Thoughts - Thought-Augmented Reasoning with Large Language Models. |[arXiv](https://arxiv.org/abs/2406.04271) | | Agent |
      • Byzer-Agent
      • Cat Town - An AI-powered simulation with cats. | | | Agent |
      • CharacterGLM
      • Cradle
      • GameAISDK - An image-based game AI automation framework. | | | Framework |
      • gigax - LLM-powered NPCs. | | | Game |
      • HippoRAG - Neurobiologically Inspired Long-Term Memory for Large Language Models. |[arXiv](https://arxiv.org/abs/2405.14831) | | Agent |
      • Interactive LLM Powered NPCs - An open-source project that completely transforms your interaction with non-player characters (NPCs) in any game! | | | Game |
      • IoA - An open-source framework for collaborative AI agents, enabling diverse, distributed agents to team up and tackle complex tasks through internet-like connectivity. | | | Agent |
      • LangGraph Studio
      • LARP - Language-Agent Role Play for open-world games. |[arXiv](https://arxiv.org/abs/2312.17653) | | Agent |
      • MuG Diffusion
      • Video2Game - Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video. |[arXiv](https://arxiv.org/abs/2404.09833) | | Game |
      • WebDesignAgent
      • MMRole - A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents. |[arXiv](https://arxiv.org/abs/2408.04203v1) | | Agent |
      • TaskGen - A Task-based agentic framework building on StrictJSON outputs by LLM agents. | | | Agent |
      • Twitter
      • Agent K - A self-evolving and modular agent. | | | Agent |
      • RPBench-Auto - An automated evaluation benchmark for role-playing. | | | Game |
      • GameNGen - Diffusion Models Are Real-Time Game Engines. |[arXiv](https://arxiv.org/abs/2408.14837) | | Game |
      • GameGen-O - Open-world Video Game Generation. | | | Game |
      • TEN Agent - A real-time multimodal agent integrated with the OpenAI Realtime API and RTC, featuring weather checks, web search, vision, and RAG capabilities. | | | Agent |
      • Unbounded
      • Oasis
      • Agent Laboratory
      • SWE-agent
      • AWorld - An agent runtime for self-improvement. | | | Agent |
      • Jaaz - The world's first open-source multimodal creative assistant: an AI design agent and a local alternative to Lovart (think Canva + Cursor), able to design, edit and generate images, posters, storyboards, etc. | | | Agent |
      • Matrix-Game 2.0 - An Open-Source, Real-Time, and Streaming Interactive World Model. | | | Game |
      • NVIDIA NeMo Agent Toolkit
      • Genie 3
      • HunyuanWorld 1.0
      • ComoRAG - A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning. |[arXiv](https://arxiv.org/abs/2508.10419) | | Agent |
      • Datarus Jupyter Agent - A multi-step reasoning system that executes complex analytical workflows with step-by-step reasoning, automatic error recovery, and comprehensive result synthesis. | | | Agent |
      • Hunyuan-GameCraft - High-dynamic Interactive Game Video Generation with Hybrid History Condition. |[arXiv](https://arxiv.org/abs/2506.17201) | | Game |
      • HunyuanWorld-Voyager - A novel video diffusion framework that generates world-consistent 3D point-cloud sequences from a single image along a user-defined camera path. Voyager can generate 3D-consistent scene videos for world exploration following custom camera trajectories. | | | Game |
      • Langflow - A UI for LangChain, designed with react-flow to provide an effortless way to experiment and prototype flows. | | | Agent |
      • GenAgent - Build Collaborative AI Systems with Automated Workflow Generation: Case Studies on ComfyUI. |[arXiv](https://arxiv.org/abs/2409.01392) | | Agent |
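The agent frameworks listed above share one core pattern: a loop that routes the model's tool requests to real functions and feeds the results back until the model finishes. A dependency-free sketch with the model stubbed by a scripted policy; the tool names, action tuples, and finish convention are illustrative assumptions, not any framework's API:

```python
# Minimal tool-calling agent loop. scripted_model stands in for an LLM;
# TOOLS maps tool names to plain Python callables.

def scripted_model(history):
    """Stand-in for an LLM policy: request a tool once, then finish."""
    if not any(step[0] == "tool_result" for step in history):
        return ("call_tool", "roll_dice", {"sides": 6})
    return ("finish", "Rolled a die for the player.")

TOOLS = {"roll_dice": lambda sides: 4}  # deterministic stub "tool"

def run_agent(model, tools, max_steps=5):
    """Alternate model decisions and tool executions until 'finish'."""
    history = []
    for _ in range(max_steps):
        action = model(history)
        if action[0] == "finish":
            return action[1], history
        _, name, kwargs = action
        result = tools[name](**kwargs)              # execute the requested tool
        history.append(("tool_result", name, result))
    raise RuntimeError("agent did not finish within max_steps")

if __name__ == "__main__":
    answer, trace = run_agent(scripted_model, TOOLS)
    print(answer)
```

Real frameworks add structured (usually JSON) tool schemas, retries, and memory on top of this loop, but the control flow is essentially the same.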
  • <span id="texture">Texture</span>

    • <span id="tool">LLM (LLM & Tool)</span>

      • TexFusion - Synthesizing 3D Textures with Text-Guided Image Diffusion Models. |[arXiv](https://arxiv.org/abs/2310.13772) | | Texture |
      • With Poly
      • Neuralangelo - High-Fidelity Neural Surface Reconstruction. |[arXiv](https://arxiv.org/abs/2306.03092) | | Texture |
      • Dream Textures - Stable Diffusion built-in to Blender. Create textures, concept art, background assets, and more with a simple text prompt. | | Blender | Texture |
      • Text2Tex - Text-driven Texture Synthesis via Diffusion Models. |[arXiv](https://arxiv.org/abs/2303.11396) | | Texture |
      • CRM
      • DreamMat - High-quality PBR Material Generation with Geometry- and Light-aware Diffusion Models. |[arXiv](https://arxiv.org/abs/2405.17176) | | Texture |
      • DreamSpace - Dreaming Your Room Space with Text-Driven Panoramic Texture Propagation. | | | Texture |
      • InstructHumans
      • InteX - Interactive Text-to-Texture Synthesis via Unified Depth-aware Inpainting. |[arXiv](https://arxiv.org/abs/2403.11878) | | Texture |
      • MaterialSeg3D
      • MeshAnything
      • X-Mesh - Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance. |[arXiv](https://arxiv.org/abs/2303.15764) | | Texture |
      • LLaMA-Mesh - Unifying 3D Mesh Generation with Language Models. |[arXiv](https://arxiv.org/abs/2411.09595) | | Mesh |
      • Paint-it - Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering. | | | Texture |
    • <span id="tool">Tool (AI LLM)</span>

      • Polycam
  • <span id="speech">Analytics</span>

    • <span id="tool">LLM (LLM & Tool)</span>

  • <span id="visual">VLM (Visual)</span>

    • <span id="tool">LLM (LLM & Tool)</span>

      • LGVI - Towards Language-Driven Video Inpainting via Multimodal Large Language Models. | | | Visual |
      • CoTracker
      • FaceHi
      • MaskViT - Masked Visual Pre-Training for Video Prediction. |[arXiv](https://arxiv.org/abs/2206.11894) | | Visual |
      • Qwen-VL - A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond. |[arXiv](https://arxiv.org/abs/2308.12966) | | Visual |
      • MoE-LLaVA - Mixture of Experts for Large Vision-Language Models. |[arXiv](https://arxiv.org/abs/2401.15947) | | Visual |
      • LLaVA++ - Extending Visual Capabilities with LLaMA-3 and Phi-3. | | | Visual |
      • PLLaVA - Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning. |[arXiv](https://arxiv.org/abs/2404.16994) | | Visual |
      • Cambrian-1 - A Fully Open, Vision-Centric Exploration of Multimodal LLMs. |[arXiv](https://arxiv.org/abs/2406.16860) | | Multimodal LLMs |
      • CogVLM2 - GPT4V-level open-source multi-modal model based on Llama3-8B. | | | Visual |
      • EVF-SAM - Early Vision-Language Fusion for Text-Prompted Segment Anything Model. |[arXiv](https://arxiv.org/abs/2406.20076) | | Visual |
      • Kangaroo - A Powerful Video-Language Model Supporting Long-context Video Input. | | | Visual |
      • LongVA
      • MotionLLM
      • ShareGPT4V - Improving Large Multi-modal Models with Better Captions. |[arXiv](https://arxiv.org/abs/2311.12793) | | Visual |
      • SOLO - A Single Transformer for Scalable Vision-Language Modeling. |[arXiv](https://arxiv.org/abs/2407.06438) | | Visual |
      • Video-CCAM - Advancing Video-Language Understanding with Causal Cross-Attention Masks. | | | Visual |
      • VideoLLaMA 2 - Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs. |[arXiv](https://arxiv.org/abs/2406.07476) | | Visual |
      • Vitron - A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing. | | | Visual |
      • VILA - On Pre-training for Visual Language Models. |[arXiv](https://arxiv.org/abs/2312.07533) | | Visual |
      • LLaVA-OneVision - Easy Visual Task Transfer. |[arXiv](https://arxiv.org/abs/2408.03326) | | Visual |
      • Sapiens
      • VideoLLaMA 3
      • Video-MME - The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis. |[arXiv](https://arxiv.org/abs/2405.21075) | | Visual |
      • MiniCPM-Llama3-V 2.5 - A GPT-4V Level MLLM on Your Phone. | | | Visual |
      • dots.vlm1 - The first vision-language model in the dots model family. Built upon a 1.2 billion-parameter vision encoder and the DeepSeek V3 large language model (LLM), dots.vlm1 demonstrates strong multimodal understanding and reasoning capabilities. | | | VLM |
      • GLM-V - GLM-4.1V-Thinking and GLM-4.5V: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning. |[arXiv](https://arxiv.org/abs/2507.01006) | | VLM |
      • VideoAgent - A Memory-augmented Multimodal Agent for Video Understanding. |[arXiv](https://arxiv.org/abs/2403.11481) | | Agent |
      • Kwai Keye-VL - A cutting-edge multimodal large language model meticulously crafted by the Kwai Keye Team at Kuaishou. |[arXiv](https://arxiv.org/abs/2509.01563) | | VLM |
      • Lumina-DiMOO - An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding. | | | VLM |
      • POINTS-Reader - Distillation-Free Adaptation of Vision-Language Models for Document Conversion. |[arXiv](https://arxiv.org/abs/2509.01215) | | Visual |
  • <span id="voice">Singing Voice</span>

  • <span id="shader">Shader</span>

    • <span id="tool">LLM (LLM & Tool)</span>

      • AI Shader - ChatGPT-powered shader generator for Unity. | | Unity | Shader |
  • <span id="game">Game (Agent)</span>

    • <span id="tool">Tool (AI LLM)</span>

      • AgentSims - An Open-Source Sandbox for Large Language Model Evaluation. | | | Agent |
  • <span id="visual">Visual</span>

    • <span id="tool">Tool (AI LLM)</span>
