Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


ai-game-devtools

Here we will keep track of the latest AI Game Development Tools, including LLM, Agent, Code, Writer, Image, Texture, Shader, 3D Model, Animation, Video, Audio, Music, Singing Voice and Analytics. 🔥
https://github.com/Yuan-ManX/ai-game-devtools


  • <span id="texture">Texture</span>

    • <span id="tool">Tool (AI LLM)</span>

      • LLaMA-Mesh - LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models. |[arXiv](https://arxiv.org/abs/2411.09595) | | Mesh |
      • CRM
      • Paint-it - to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering. | | | Texture |
      • Text2Tex - driven texture Synthesis via Diffusion Models. |[arXiv](https://arxiv.org/abs/2303.11396) | | Texture |
      • X-Mesh - X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance. |[arXiv](https://arxiv.org/abs/2303.15764) | | Texture |
      • DreamMat - quality PBR Material Generation with Geometry- and Light-aware Diffusion Models. |[arXiv](https://arxiv.org/abs/2405.17176) | | Texture |
      • DreamSpace - Driven Panoramic Texture Propagation. | | | Texture |
      • Dream Textures - in to Blender. Create textures, concept art, background assets, and more with a simple text prompt. | | Blender | Texture |
      • InstructHumans
      • InteX - to-Texture Synthesis via Unified Depth-aware Inpainting. |[arXiv](https://arxiv.org/abs/2403.11878) | | Texture |
      • MaterialSeg3D
      • MeshAnything
      • Neuralangelo - High-Fidelity Neural Surface Reconstruction. |[arXiv](https://arxiv.org/abs/2306.03092) | | Texture |
      • Polycam
      • TexFusion - Guided Image Diffusion Models. |[arXiv](https://arxiv.org/abs/2310.13772) | | Texture |
      • Texture Lab - AI-generated textures. You can generate your own with a text prompt. | | | Texture |
      • With Poly
  • <span id="game">Game (Agent)</span>

    • <span id="tool">Tool (AI LLM)</span>

      • AutoGen - Gen Large Language Model Applications. |[arXiv](https://arxiv.org/abs/2308.08155) | | Agent |
      • behaviac
      • Biomes
      • Buffer of Thoughts - Augmented Reasoning with Large Language Models. |[arXiv](https://arxiv.org/abs/2406.04271) | | Agent |
      • Byzer-Agent
      • Dify - Open-source LLM app building platform. | | | Agent |
      • everything-ai - powered and local chatbot assistant🤖. | | | Agent |
      • fabric - Open-source framework for augmenting humans using AI. | | | Agent |
      • FastGPT - based platform built on the LLM. | | | Agent |
      • fastRAG
      • GameAISDK - based game AI automation framework. | | | Framework |
      • Generative Agents
      • Cat Town - powered simulation with cats. | | | Agent |
      • CharacterGLM
      • Cradle
      • KwaiAgents - seeking agent system with Large Language Models (LLMs). |[arXiv](https://arxiv.org/abs/2312.04889) | | Agent |
      • LangChain
      • gigax - powered NPCs. | | | Game |
      • HippoRAG - Term Memory for Large Language Models. |[arXiv](https://arxiv.org/abs/2405.14831) | | Agent |
      • Interactive LLM Powered NPCs - Open-source project that completely transforms your interaction with non-player characters (NPCs) in any game! | | | Game |
      • IoA - Open-source framework for collaborative AI agents, enabling diverse, distributed agents to team up and tackle complex tasks through internet-like connectivity. | | | Agent |
      • LangGraph Studio
      • LARP - Agent Role Play for open-world games. |[arXiv](https://arxiv.org/abs/2312.17653) | | Agent |
      • LlamaIndex
      • MindSearch - based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT). | | | Agent |
      • Mixture of Agents (MoA) - Mixture-of-Agents Enhances Large Language Model Capabilities. |[arXiv](https://arxiv.org/abs/2406.04692) | | Agent |
      • MuG Diffusion
      • OpenAgents
      • Pipecat
      • Qwen-Agent - Qwen-Agent is a framework for developing LLM applications based on the instruction following, tool usage, planning, and memory capabilities of Qwen. | | | Agent |
      • Ragas
      • SWE-agent
      • Translation Agent
      • Video2Game - time, Interactive, Realistic and Browser-Compatible Environment from a Single Video. |[arXiv](https://arxiv.org/abs/2404.09833) | | Game |
      • WebDesignAgent
      • ChatDev
      • XAgent
      • AgentBench
      • Agent Group Chat
      • AgentScope - empowered multi-agent applications in an easier way. |[arXiv](https://arxiv.org/abs/2402.14034) | | Agent |
      • AgentSims - Source Sandbox for Large Language Model Evaluation. | | | Agent |
      • AI Town
      • anime.gf
      • AutoAgents
      • Astrocade
      • CogAgent - Open-source visual language model that improves on CogVLM. |[arXiv](https://arxiv.org/abs/2312.08914) | | Agent |
      • Digital Life Project
      • Moonlander.ai
      • OmAgent
      • Opus
      • SIMA
      • StoryGames.ai
      • V-IRL
      • Agent K - evolving and modular. | | | Agent |
      • RPBench-Auto - playing. | | | Game |
      • MMRole - Playing Agents. |[arXiv](https://arxiv.org/abs/2408.04203v1) | | Agent |
      • TaskGen - based agentic framework building on StrictJSON outputs by LLM agents. | | | Agent |
      • Twitter
      • TEN Agent - time multimodal agent integrated with the OpenAI Realtime API, RTC, and features weather checks, web search, vision, and RAG capabilities. | | | Agent |
      • GameGen-O - GameGen-O: Open-world Video Game Generation. | | | Game |
      • GameNGen - Time Game Engines. |[arXiv](https://arxiv.org/abs/2408.14837) | | Game |
      • crewAI - playing, autonomous AI agents. | | | Agent |
      • Unbounded
      • GenAgent - Case Studies on ComfyUI. |[arXiv](https://arxiv.org/abs/2409.01392) | | Agent |
      • Oasis
      • LLama Agentic System
      • Langflow - flow to provide an effortless way to experiment and prototype flows. | | | Agent |
  • <span id="model">3D Model</span>

    • <span id="tool">Tool (AI LLM)</span>

      • Blender-GPT - in-one Blender assistant powered by GPT3/4 + Whisper integration. | | Blender | Model |
      • Anything-3D - Anything + 3D. Let's lift the anything to 3D. |[arXiv](https://arxiv.org/abs/2304.10261) | | Model |
      • Any2Point - modality Large Models for Efficient 3D Understanding. |[arXiv](https://arxiv.org/abs/2404.07989) | | 3D |
      • BlenderGPT - 4. | | Blender | Model |
      • CF-3DGS - Free 3D Gaussian Splatting. |[arXiv](https://arxiv.org/abs/2312.07504) | | 3D |
      • CharacterGen - View Pose Canonicalization. |[arXiv](https://arxiv.org/abs/2402.17214) | | 3D |
      • chatGPT-maya
      • CityDreamer
      • DreamCatalyst - Quality 3D Editing via Controlling Editability and Identity Preservation. |[arXiv](https://arxiv.org/abs/2407.11394) | | 3D |
      • DreamGaussian4D
      • DUSt3R
      • GALA3D - to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting. |[arXiv](https://arxiv.org/abs/2402.07207) | | 3D |
      • GaussCtrl - View Consistent Text-Driven 3D Gaussian Splatting Editing. |[arXiv](https://arxiv.org/abs/2403.08733) | | 3D |
      • GaussianCube
      • GaussianDreamer
      • HiFA - High-fidelity Text-to-3D with advanced Diffusion guidance. | | | Model |
      • HoloDreamer
      • Infinigen
      • Interactive3D
      • Isotropic3D - to-3D Generation Based on a Single CLIP Embedding. | | | 3D |
      • LION
      • Make-It-3D - High-Fidelity 3D Creation from A Single Image with Diffusion Prior. |[arXiv](https://arxiv.org/abs/2303.14184) | | Model |
      • MVDream - view Diffusion for 3D Generation. |[arXiv](https://arxiv.org/abs/2308.16512) | | 3D |
      • NVIDIA Instant NeRF
      • Paint3D - Less Texture Diffusion Models. |[arXiv](https://arxiv.org/abs/2312.13913) | | 3D |
      • PAniC-3D - view 3D Reconstruction from Portraits of Anime Characters. |[arXiv](https://arxiv.org/abs/2303.14587) | | Model |
      • Point·E
      • Shap-E
      • Stable Dreamfusion - to-3D model Dreamfusion, powered by the Stable Diffusion text-to-2D model. | | | Model |
      • 3D-LLM
      • 3DTopia - to-3D Generation within 5 Minutes. |[arXiv](https://arxiv.org/abs/2403.02234) | | 3D |
      • threestudio
      • TripoSR - A state-of-the-art open-source model for fast feedforward 3D reconstruction from a single image. |[arXiv](https://arxiv.org/abs/2403.02151) | | Model |
      • Unique3D - High-Quality and Efficient 3D Mesh Generation from a Single Image. |[arXiv](https://arxiv.org/abs/2405.20343) | | 3D |
      • UnityGaussianSplatting
      • ViVid-1-to-3
      • Wonder3D - Domain Diffusion. |[arXiv](https://arxiv.org/abs/2310.15008) | | 3D |
      • Zero-1-to-3 - Zero-shot One Image to 3D Object. |[arXiv](https://arxiv.org/abs/2303.11328) | | Model |
      • Blockade Labs - the ultimate AI-powered solution for generating incredible 360° skybox experiences from text prompts. | | | Model |
      • CSM
      • Dash
      • GenieLabs - UGC. | | | 3D |
      • Instruct-NeRF2NeRF
      • Luma AI
      • LATTE3D - scale Amortized Text-To-Enhanced3D Synthesis. |[arXiv](https://arxiv.org/abs/2403.15385) | | 3D |
      • lumine AI - Powered Creativity. | | | 3D |
      • Meshy
      • Mootion
      • One-2-3-45 - Shape Optimization. |[arXiv](https://arxiv.org/abs/2306.16928) | | Model |
      • ProlificDreamer - High-Fidelity and diverse Text-to-3D generation with Variational score Distillation. |[arXiv](https://arxiv.org/abs/2305.16213) | | Model |
      • Sloyd
      • Spline AI
      • SV3D - view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion. |[arXiv](https://arxiv.org/abs/2403.12008) | | 3D |
      • Tafi
      • 3D-GPT
      • 3Dpresso
      • Voxcraft - to-Use 3D Models with AI. | | | 3D |
      • SF3D - unwrapping and Illumination Disentanglement. |[arXiv](https://arxiv.org/abs/2408.00653) | | 3D |
      • Hunyuan3D-1.0 - Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation. |[arXiv](https://arxiv.org/abs/2411.02293) | | 3D |
      • 3DTopia-XL - 3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion. |[arXiv](https://arxiv.org/abs/2409.12957) | | 3D |
      • Animate3D - view Video Diffusion. |[arXiv](https://arxiv.org/abs/2407.11398) | | 3D |
      • Edify 3D - Quality 3D Asset Generation. |[arXiv](https://arxiv.org/abs/2411.07135) | | 3D |
  • <span id="code">Code</span>

  • <span id="avatar">Avatar</span>

    • <span id="tool">Tool (AI LLM)</span>

      • RodinHD - High-Fidelity 3D Avatar Generation with Diffusion Models. |[arXiv](https://arxiv.org/abs/2407.06938) | | Avatar |
      • AniPortrait - Driven Synthesis of Photorealistic Portrait Animations. |[arXiv](https://arxiv.org/abs/2403.17694) | | Avatar |
      • CALM
      • ChatdollKit
      • DreamTalk
      • Duix - Silicon-Based Digital Human SDK 🌐🤖 | | | Avatar |
      • EchoMimic - Driven Portrait Animations through Editable Landmark Conditions. |[arXiv](https://arxiv.org/abs/2407.08136) | | Avatar |
      • EMOPortraits - enhanced Multimodal One-shot Head Avatars. | | | Avatar |
      • E3 Gen
      • GeneAvatar - Aware Volumetric Head Avatar Editing from a Single Image. |[arXiv](https://arxiv.org/abs/2404.02152) | | Avatar |
      • GeneFace++ - Time 3D Talking Face Generation. | | | Avatar |
      • Hallo - Driven Visual Synthesis for Portrait Image Animation. |[arXiv](https://arxiv.org/abs/2406.08801) | | Avatar |
      • IntrinsicAvatar
      • Linly-Talker
      • LivePortrait
      • MotionGPT - language generation model using LLMs. |[arXiv](https://arxiv.org/abs/2306.14795) | | Avatar |
      • MusePose - Driven Image-to-Video Framework for Virtual Human Generation. | | | Avatar |
      • MuseTalk - Real-Time High Quality Lip Synchronization with Latent Space Inpainting. | | | Avatar |
      • MuseV - length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising. | | | Avatar |
      • Portrait4D - Shot 4D Head Avatar Synthesis using Synthetic Data. |[arXiv](https://arxiv.org/abs/2311.18729) | | Avatar |
      • StyleAvatar3D - Text Diffusion Models for High-Fidelity 3D Avatar Generation. |[arXiv](https://arxiv.org/abs/2305.19012) | | Avatar |
      • Topo4D - Preserving Gaussian Splatting for High-Fidelity 4D Head Capture. |[arXiv](https://arxiv.org/abs/2406.00440) | | Avatar |
      • UnityAIWithChatGPT
      • Vid2Avatar - supervised Scene Decomposition. |[arXiv](https://arxiv.org/abs/2302.11566) | | Avatar |
      • ChatAvatar
      • HeadSculpt
      • Ready Player Me
      • Text2Control3D - Guided Text-to-Image Diffusion Model. |[arXiv](https://arxiv.org/abs/2309.03550) | | Avatar |
      • VLOGGER
      • Wild2Avatar
      • ExAvatar - Expressive Whole-Body 3D Gaussian Avatar. |[arXiv](https://arxiv.org/abs/2407.21686) | | Avatar |
      • Hallo2 - Duration and High-Resolution Audio-Driven Portrait Image Animation. |[arXiv](https://arxiv.org/abs/2410.07718) | | Avatar |
  • <span id="image">Image</span>

  • <span id="video">Video</span>

    • <span id="tool">Tool (AI LLM)</span>

      • DreamCinema
      • ViewCrafter - fidelity Novel View Synthesis. |[arXiv](https://arxiv.org/abs/2409.02048) | | Video |
      • 360DVD - Degree Video Diffusion Model. |[arXiv](https://arxiv.org/abs/2401.06578) | | Video |
      • ART•V - Regressive Text-to-Video Generation with Diffusion Models. |[arXiv](https://arxiv.org/abs/2311.18834) | | Video |
      • BackgroundRemover
      • CoDeF
      • CogVLM - Open-source visual language model (VLM). | | | Visual |
      • Diffutoon - Resolution Editable Toon Shading via Diffusion Models. |[arXiv](https://arxiv.org/abs/2401.16224) | | Video |
      • dolphin
      • EDGE - plausible dances while remaining faithful to arbitrary input music. |[arXiv](https://arxiv.org/abs/2211.10658) | | Video |
      • EMO - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions. |[arXiv](https://arxiv.org/abs/2402.17485) | | Video |
      • Hotshot-XL - Hotshot-XL is an AI text-to-GIF model trained to work alongside Stable Diffusion XL. | | | Video |
      • LaVie - High-Quality Video Generation with Cascaded Latent Diffusion Models. |[arXiv](https://arxiv.org/abs/2309.15103) | | Video |
      • LVDM - Fidelity Long Video Generation. |[arXiv](https://arxiv.org/abs/2211.13221) | | Video |
      • MicroCinema - and-Conquer Approach for Text-to-Video Generation. |[arXiv](https://arxiv.org/abs/2311.18829) | | Video |
      • MOFA-Video - to-Video Diffusion Model. |[arXiv](https://arxiv.org/abs/2405.20222) | | Video |
      • MoneyPrinterTurbo
      • Mora
      • MotionDirector - to-Video Diffusion Models. |[arXiv](https://arxiv.org/abs/2310.08465) | | Video |
      • Motionshop
      • Mov2mov - diffusion-webui. | | | Video |
      • Open-Sora
      • Open-Sora - Sora Plan. | | | Video |
      • Reuse and Diffuse - to-Video Generation. |[arXiv](https://arxiv.org/abs/2309.03549) | | Video |
      • ShortGPT
      • Show-1 - to-Video Generation. |[arXiv](https://arxiv.org/abs/2309.15818) | | Video |
      • Snap Video - to-Video Synthesis. |[arXiv](https://arxiv.org/abs/2402.14797) | | Video |
      • SoraWebui - Open-source Sora web client, enabling users to easily create videos from text with OpenAI's Sora model. | | | Video |
      • StableVideo - driven Consistency-aware Diffusion Video Editing. | | | Video |
      • Stable Video Diffusion - to-Video. | | | Video |
      • StoryDiffusion - Attention for Long-Range Image and Video Generation. |[arXiv](https://arxiv.org/abs/2405.01434) | | Video |
      • StreamingT2V
      • StyleCrafter - to-Video Generation with Style Adapter. |[arXiv](https://arxiv.org/abs/2312.00330) | | Video |
      • Text2Video-Zero - to-Image Diffusion Models are Zero-Shot Video Generators. |[arXiv](https://arxiv.org/abs/2303.13439) | | Video |
      • Track-Anything - Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything and XMem. |[arXiv](https://arxiv.org/abs/2304.11968) | | Video |
      • Tune-A-Video - Shot Tuning of Image Diffusion Models for Text-to-Video Generation. |[arXiv](https://arxiv.org/abs/2212.11565) | | Video |
      • VGen
      • Video-ChatGPT - Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. |[arXiv](https://arxiv.org/abs/2306.05424) | | Video |
      • VideoElevator - to-Image Diffusion Models. |[arXiv](https://arxiv.org/abs/2403.05438) | | Video |
      • VideoGen - Guided Latent Diffusion Approach for High Definition Text-to-Video Generation. |[arXiv](https://arxiv.org/abs/2309.00398) | | Video |
      • VideoMamba
      • Video-of-Thought - Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition. | | | Video |
      • VisualRWKV - enhanced version of the RWKV language model, enabling RWKV to handle various visual tasks. | | | Visual |
      • V-JEPA
      • Anything in Any Scene
      • Assistive
      • AtomoVideo - to-Video Generation. |[arXiv](https://arxiv.org/abs/2403.01800) | | Video |
      • Boximator
      • CogVideo
      • Decohere
      • Descript
      • DomoAI
      • DynamiCrafter - domain Images with Video Diffusion Priors. |[arXiv](https://arxiv.org/abs/2310.12190) | | Video |
      • Emu Video - to-Video Generation by Explicit Image Conditioning. | | | Video |
      • Etna
      • Fairy - Guided Video-to-Video Synthesis. | | | Video |
      • Follow Your Pose - Guided Text-to-Video Generation using Pose-Free Videos. |[arXiv](https://arxiv.org/abs/2304.01186) | | Video |
      • FullJourney
      • Gen-2 - modal AI system that can generate novel videos with text, images, or video clips. | | | Video |
      • Generative Dynamics
      • Genmo
      • GenTron
      • HiGen - temporal Decoupling for Text-to-Video generation. | | | Video |
      • Imagen Video - resolution models. | | | Video |
      • InstructVideo
      • I2VGen-XL - High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models. |[arXiv](https://arxiv.org/abs/2311.04145) | | Video |
      • LTX Studio - driven filmmaking platform for creators, marketers, filmmakers and studios. | | | Video |
      • Lumiere - Time Diffusion Model for Video Generation. |[arXiv](https://arxiv.org/abs/2401.12945) | | Video |
      • MagicVideo
      • MagicVideo-V2 - Stage High-Aesthetic Video Generation. |[arXiv](https://arxiv.org/abs/2401.04468) | | Video |
      • Magic Hour
      • MAGVIT-v2
      • MAGVIT
      • Make-A-Video - Make-A-Video is a state-of-the-art AI system that generates videos from text. |[arXiv](https://arxiv.org/abs/2209.14792) | | Video |
      • Make Pixels Dance - Dynamic Video Generation. |[arXiv](https://arxiv.org/abs/2311.10982) | | Video |
      • Make-Your-Video
      • MobileVidFactory - Based Social Media Video Generation for Mobile Devices from Text. | | | Video |
      • Moonvalley - to-video generative AI model. | | | Video |
      • Morph Studio - to-Video AI Magic, manifest your creativity through your prompt. | | | Video |
      • MotionCtrl
      • MovieFactory
      • Neural Frames
      • NeverEnds
      • Phenaki
      • Pixeling - realistic, and extremely controllable visual content including images, videos and 3D models. | | | Video |
      • PixVerse - taking videos with AI. | | | Video |
      • Pollinations
      • Sora
      • TATS - Agnostic VQGAN and Time-Sensitive Transformer. | | | Video |
      • TF-T2V - to-Video Generation with Text-free Videos. |[arXiv](https://arxiv.org/abs/2312.15770) | | Video |
      • TwelveLabs
      • UniVG - modal Video Generation. | | | Video |
      • VideoComposer
      • VideoCrafter1 - Quality Video Generation. |[arXiv](https://arxiv.org/abs/2310.19512) | | Video |
      • VideoCrafter2 - Quality Video Diffusion Models. |[arXiv](https://arxiv.org/abs/2401.09047) | | Video |
      • VideoDrafter - Consistent Multi-Scene Video Generation with LLM. |[arXiv](https://arxiv.org/abs/2401.01256) | | Video |
      • VideoFactory - to-Video Generation. | | | Video |
      • VideoLCM
      • Video LDMs - resolution Video Synthesis with Latent Diffusion Models. |[arXiv](https://arxiv.org/abs/2304.08818) | | Video |
      • VideoPoet - shot video generation. |[arXiv](https://arxiv.org/abs/2312.14125) | | Video |
      • Vispunk Motion
      • W.A.L.T
      • Zeroscope - to-Video. | | | Video |
      • Genie
      • CogVideoX - Open-source version of the video generation model, which is homologous to 清影. | | | Video |
      • Tora - oriented Diffusion Transformer for Video Generation. |[arXiv](https://arxiv.org/abs/2407.21705) | | Video |
      • MIMO
      • Follow-Your-Canvas - Follow-Your-Canvas: Higher-Resolution Video Outpainting with Extensive Content Generation. |[arXiv](https://arxiv.org/abs/2409.01055) | | Video |
      • Video-LLaVA
      • Mini-Gemini - modality Vision Language Models. | | | Vision |
      • Mochi 1 - A state-of-the-art video generation model with high-fidelity motion and strong prompt adherence in preliminary evaluation. | | | Video |
      • Animate-A-Story - Augmented Video Generation for Telling a Story. |[arXiv](https://arxiv.org/abs/2307.06940) | | Video |
      • CoNR - drawn anime character sheets(ACS). |[arXiv](https://arxiv.org/abs/2207.05378) | | Video |
      • Vchitect-2.0 - Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models. | | | Video |
      • LTX-Video - LTX-Video is the first DiT-based video generation model that can generate high-quality videos in real-time. It can generate 24 FPS videos at 768x512 resolution, faster than it takes to watch them. | | | Video |
  • <span id="music">Music</span>

  • <span id="writer">Writer</span>

    • <span id="tool">Tool (AI LLM)</span>

  • Project List

  • <span id="shader">Shader</span>

    • <span id="tool">Tool (AI LLM)</span>

      • AI Shader - powered shader generator for Unity. | | Unity | Shader |
  • <span id="animation">Animation</span>

    • <span id="tool">Tool (AI LLM)</span>

      • Animate Anyone - to-Video Synthesis for Character Animation. |[arXiv](https://arxiv.org/abs/2311.17117) | | Animation |
      • AnimateAnything - Grained Open Domain Image Animation with Motion Guidance. |[arXiv](https://arxiv.org/abs/2311.12886) | | Animation |
      • AnimateLCM
      • AnimateZero - Shot Image Animators. |[arXiv](https://arxiv.org/abs/2312.03793) | | Animation |
      • AnimationGPT
      • DreaMoving
      • FaceFusion
      • GeneFace - Fidelity Audio-Driven 3D Talking Face Synthesis. |[arXiv](https://arxiv.org/abs/2301.13430) | | Animation |
      • MagicAnimate
      • SadTalker-Video-Lip-Sync
      • Wav2Lip - syncing Videos In The Wild. |[arXiv](https://arxiv.org/abs/2008.10010) | | Animation |
      • Deforum
      • FreeInit
      • ID-Animator - Shot Identity-Preserving Human Video Generation. |[arXiv](https://arxiv.org/abs/2404.15275) | | Animation |
      • NUWA-Infinity - NUWA-Infinity is a multimodal generative model that is designed to generate high-quality images and videos from given text, image or video input. | | | Animation |
      • Omni Animation
      • PIA - and-Play Modules in Text-to-Image Models. |[arXiv](https://arxiv.org/abs/2312.13964) | | Animation |
      • Stable Animation - to-animation tool for developers. | | | Animation |
      • Wonder Studio - action scene. | | | Animation |
      • DrawingSpinUp
      • AnimateDiff - to-Image Diffusion Models without Specific Tuning. |[arXiv](https://arxiv.org/abs/2307.04725) | | Animation |
      • SadTalker - Driven Single Image Talking Face Animation. |[arXiv](https://arxiv.org/abs/2211.12194) | | Animation |
      • TaleCrafter
      • Animate-X - Animate-X: Universal Character Image Animation with Enhanced Motion Representation. |[arXiv](https://arxiv.org/abs/2410.10306) | | Animation |
      • NUWA-XL
  • <span id="visual">Visual</span>

    • <span id="tool">Tool (AI LLM)</span>

      • Cambrian-1 - Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs. |[arXiv](https://arxiv.org/abs/2406.16860) | | Multimodal LLMs |
      • CogVLM2 - level open-source multi-modal model based on Llama3-8B. | | | Visual |
      • EVF-SAM - EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model. |[arXiv](https://arxiv.org/abs/2406.20076) | | Visual |
      • Kangaroo - Language Model Supporting Long-context Video Input. | | | Visual |
      • LLaVA++ - 3 and Phi-3. | | | Visual |
      • LongVA
      • MiniCPM-Llama3-V 2.5 - 4V Level MLLM on Your Phone. | | | Visual |
      • MotionLLM
      • PLLaVA - free LLaVA Extension from Images to Videos for Video Dense Captioning. |[arXiv](https://arxiv.org/abs/2404.16994) | | Visual |
      • Qwen-VL - Language Model for Understanding, Localization, Text Reading, and Beyond. |[arXiv](https://arxiv.org/abs/2308.12966) | | Visual |
      • ShareGPT4V - modal Models with Better Captions. |[arXiv](https://arxiv.org/abs/2311.12793) | | Visual |
      • SOLO - Language Modeling. |[arXiv](https://arxiv.org/abs/2407.06438) | | Visual |
      • Video-CCAM - Video-CCAM: Advancing Video-Language Understanding with Causal Cross-Attention Masks. | | | Visual |
      • VideoLLaMA 2 - Temporal Modeling and Audio Understanding in Video-LLMs. |[arXiv](https://arxiv.org/abs/2406.07476) | | Visual |
      • Video-MME - Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis. |[arXiv](https://arxiv.org/abs/2405.21075) | | Visual |
      • Vitron - level Vision LLM for Understanding, Generating, Segmenting, Editing. | | | Visual |
      • VILA - training for Visual Language Models. |[arXiv](https://arxiv.org/abs/2312.07533) | | Visual |
      • CoTracker
      • FaceHi
      • LGVI - Driven Video Inpainting via Multimodal Large Language Models. | | | Visual |
      • MaskViT - Training for Video Prediction. |[arXiv](https://arxiv.org/abs/2206.11894) | | Visual |
      • LLaVA-OneVision - LLaVA-OneVision: Easy Visual Task Transfer. |[arXiv](https://arxiv.org/abs/2408.03326) | | Visual |
      • Sapiens
      • MoE-LLaVA - Language Models. |[arXiv](https://arxiv.org/abs/2401.15947) | | Visual |
      • InternLM-XComposer2 - InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension. |[arXiv](https://arxiv.org/abs/2404.06512) | | Visual |
  • <span id="audio">Audio</span>

    • <span id="tool">Tool (AI LLM)</span>

      • AcademiCodec
      • Amphion - Source Audio, Music, and Speech Generation Toolkit. |[arXiv](https://arxiv.org/abs/2312.09911) | | Audio |
      • ArchiSound
      • AudioEditing - Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion. |[arXiv](https://arxiv.org/abs/2402.10009) | | Audio |
      • Audiogen Codec
      • AudioGPT
      • AudioLCM - to-Audio Generation with Latent Consistency Models. |[arXiv](https://arxiv.org/abs/2406.00356v1) | | Audio |
      • AudioLDM 2 - supervised Pretraining. |[arXiv](https://arxiv.org/abs/2308.05734) | | Audio |
      • Auffusion - to-Audio Generation. |[arXiv](https://arxiv.org/abs/2401.01044) | | Audio |
      • CTAG - to-Audio Generation via Synthesizer Programming. | | | Audio |
      • Make-An-Audio 3 - based Large Diffusion Transformers. |[arXiv](https://arxiv.org/abs/2305.18474) | | Audio |
      • NeuralSound - based Modal Sound Synthesis with Acoustic Transfer. |[arXiv](https://arxiv.org/abs/2108.07425) | | Audio |
      • Qwen2-Audio - Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud. |[arXiv](https://arxiv.org/abs/2407.10759) | | Audio |
      • SEE-2-SOUND - Shot Spatial Environment-to-Spatial Sound. |[arXiv](https://arxiv.org/abs/2406.06612) | | Audio |
      • TANGO - to-Audio Generation using Instruction Tuned LLM and Latent Diffusion Model. | | | Audio |
      • VTA-LDM - to-Audio Generation with Hidden Alignment. |[arXiv](https://arxiv.org/abs/2407.07464) | | Audio |
      • WavJourney
      • Audiobox
      • AudioLDM - to-Audio Generation with Latent Diffusion Models. |[arXiv](https://arxiv.org/abs/2301.12503) | | Audio |
      • MAGNeT - Autoregressive Transformer. | | | Audio |
      • Make-An-Audio - To-Audio Generation with Prompt-Enhanced Diffusion Models. |[arXiv](https://arxiv.org/abs/2301.12661) | | Audio |
      • OptimizerAI
      • SoundStorm
      • Stable Audio - Conditioned Latent Audio Diffusion. | | | Audio |
      • Stable Audio Open - length (up to 47s) stereo audio at 44.1kHz from text prompts. | | | Audio |
      • FoleyCrafter
      • SyncFusion - synchronized Video-to-Audio Foley Synthesis. |[arXiv](https://arxiv.org/abs/2310.15247) | | Audio |
  • <span id="voice">Singing Voice</span>

  • <span id="speech">Speech</span>

    • <span id="tool">Tool (AI LLM)</span>

      • Applio - friendly experience. | | | Speech |
      • Bark - Prompted Generative Audio Model. | | | Speech |
      • Bert-VITS2
      • ChatTTS
      • CosyVoice - lingual large voice generation model, providing inference, training and deployment full-stack ability. | | | Speech |
      • DEX-TTS - based EXpressive Text-to-Speech with Style Modeling on Time Variability. | [arXiv](https://arxiv.org/abs/2406.19135) | | Speech |
      • EmotiVoice - Voice and Prompt-Controlled TTS Engine. | | | Speech |
      • Glow-TTS - to-Speech via Monotonic Alignment Search. | [arXiv](https://arxiv.org/abs/2005.11129) | | Speech |
      • GPT-SoVITS - shot Voice Conversion and Text-to-Speech WebUI. | | | Speech |
      • MahaTTS - Source Large Speech Generation Model. | | | Speech |
      • Matcha-TTS
      • MeloTTS - quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean. | | | Speech |
      • MetaVoice-1B - level speech intelligence. | | | Speech |
      • One-Shot-Voice-Cloning - TTS. | | | Speech |
      • OpenVoice
      • OverFlow
      • RealtimeTTS - A state-of-the-art text-to-speech (TTS) library designed for real-time applications. | | | Speech |
      • SenseVoice
      • SpeechGPT - Modal Conversational Abilities. | [arXiv](https://arxiv.org/abs/2305.11000) | | Speech |
      • speech-to-text-gpt3-unity
      • StableTTS - generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3. | | | Speech |
      • StyleTTS 2 - Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models. | [arXiv](https://arxiv.org/abs/2306.07691) | | Speech |
      • TorToiSe-TTS - voice TTS system trained with an emphasis on quality. | | | Speech |
      • TTS Generation WebUI
      • Voicebox - Guided Multilingual Universal Speech Generation at Scale. | [arXiv](https://arxiv.org/abs/2306.15687) | | Speech |
      • VoiceCraft - Shot Speech Editing and Text-to-Speech in the Wild. | | | Speech |
      • Whisper - A general-purpose speech recognition model. | | | Speech |
      • WhisperSpeech - to-speech system built by inverting Whisper. | | | Speech |
      • X-E-Speech - Autoregressive Cross-lingual Emotional Text-to-Speech and Voice Conversion. | | | Speech |
      • XTTS - to-Speech generation. | | | Speech |
      • YourTTS - Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone. | [arXiv](https://arxiv.org/abs/2112.02418) | | Speech |
      • ZMM-TTS - shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations. | [arXiv](https://arxiv.org/abs/2312.14398) | | Speech |
      • Audyo
      • CLAPSpeech - Audio Pre-Training. | [arXiv](https://arxiv.org/abs/2305.10763) | | Speech |
      • Fliki
      • LOVO - to AI Voice Generator & Text to Speech platform for thousands of creators. | | | Speech |
      • Narakeet
      • VALL-E - Shot Text to Speech Synthesizers. | [arXiv](https://arxiv.org/abs/2301.02111) | | Speech |
      • VALL-E X - Lingual Neural Codec Language Modeling | [arXiv](https://arxiv.org/abs/2303.03926) | | Speech |
      • Vocode - Open-source library for building voice-based LLM applications. | | | Speech |
      • tortoise.cpp - tts. | | | Speech |
      • Mini-Omni - Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming. Mini-Omni is an open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation. | [arXiv](https://arxiv.org/abs/2408.16725) | | Speech |
      • GLM-4-Voice - GLM-4-Voice is an end-to-end voice model launched by Zhipu AI. GLM-4-Voice can directly understand and generate Chinese and English speech, engage in real-time voice conversations, and change attributes such as emotion, intonation, speech rate, and dialect based on user instructions. | | | Speech |
      • Stable Speech - to-Speech model. | | | Speech |
  • <span id="speech">Analytics</span>

    • <span id="tool">Tool (AI LLM)</span>