Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


ai-game-devtools

Here we will keep track of the latest AI Game Development Tools, including LLM, Agent, Code, Writer, Image, Texture, Shader, 3D Model, Animation, Video, Audio, Music, Singing Voice and Analytics. 🔥
https://github.com/Yuan-ManX/ai-game-devtools

Last synced: 5 days ago

  • <span id="game">Game (Agent)</span>

  • <span id="model">3D Model</span>

    • <span id="tool">Tool (AI LLM)</span>

      • Blender-GPT - An all-in-one Blender assistant powered by GPT3/4 + Whisper integration. | | Blender | Model |
      • Anything-3D - Anything + 3D. Let's lift anything to 3D. |[arXiv](https://arxiv.org/abs/2304.10261) | | Model |
      • Any2Point - Empowering Any-modality Large Models for Efficient 3D Understanding. |[arXiv](https://arxiv.org/abs/2404.07989) | | 3D |
      • BlenderGPT - Control Blender with natural-language commands, powered by GPT-4. | | Blender | Model |
      • CF-3DGS - COLMAP-Free 3D Gaussian Splatting. |[arXiv](https://arxiv.org/abs/2312.07504) | | 3D |
      • CharacterGen - Efficient 3D Character Generation from Single Images with Multi-View Pose Canonicalization. |[arXiv](https://arxiv.org/abs/2402.17214) | | 3D |
      • chatGPT-maya
      • CityDreamer
      • DreamCatalyst - Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation. |[arXiv](https://arxiv.org/abs/2407.11394) | | 3D |
      • DreamGaussian4D
      • DUSt3R
      • GALA3D - Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting. |[arXiv](https://arxiv.org/abs/2402.07207) | | 3D |
      • GaussCtrl - Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing. |[arXiv](https://arxiv.org/abs/2403.08733) | | 3D |
      • GaussianCube
      • GaussianDreamer
      • HiFA - High-fidelity Text-to-3D with Advanced Diffusion Guidance. | | | Model |
      • HoloDreamer
      • Infinigen
      • Interactive3D
      • Isotropic3D - Image-to-3D Generation Based on a Single CLIP Embedding. | | | 3D |
      • LION
      • Make-It-3D - High-Fidelity 3D Creation from A Single Image with Diffusion Prior. |[arXiv](https://arxiv.org/abs/2303.14184) | | Model |
      • MVDream - Multi-view Diffusion for 3D Generation. |[arXiv](https://arxiv.org/abs/2308.16512) | | 3D |
      • NVIDIA Instant NeRF
      • Paint3D - Paint Anything 3D with Lighting-Less Texture Diffusion Models. |[arXiv](https://arxiv.org/abs/2312.13913) | | 3D |
      • PAniC-3D - Stylized Single-view 3D Reconstruction from Portraits of Anime Characters. |[arXiv](https://arxiv.org/abs/2303.14587) | | Model |
      • Point·E
      • Shap-E
      • Stable Dreamfusion - A PyTorch implementation of the text-to-3D model Dreamfusion, powered by the Stable Diffusion text-to-2D model. | | | Model |
      • 3D-LLM
      • 3DTopia - Two-stage Text-to-3D Generation within 5 Minutes. |[arXiv](https://arxiv.org/abs/2403.02234) | | 3D |
      • threestudio
      • TripoSR - A state-of-the-art open-source model for fast feedforward 3D reconstruction from a single image. |[arXiv](https://arxiv.org/abs/2403.02151) | | Model |
      • Unique3D - High-Quality and Efficient 3D Mesh Generation from a Single Image. |[arXiv](https://arxiv.org/abs/2405.20343) | | 3D |
      • UnityGaussianSplatting
      • ViVid-1-to-3
      • Wonder3D - Single Image to 3D using Cross-Domain Diffusion. |[arXiv](https://arxiv.org/abs/2310.15008) | | 3D |
      • Zero-1-to-3 - Zero-shot One Image to 3D Object. |[arXiv](https://arxiv.org/abs/2303.11328) | | Model |
      • Blockade Labs - The ultimate AI-powered solution for generating incredible 360° skybox experiences from text prompts. | | | Model |
      • CSM
      • Dash
      • GenieLabs - Empower games with AI-UGC. | | | 3D |
      • Instruct-NeRF2NeRF
      • Luma AI
      • LATTE3D - Large-scale Amortized Text-To-Enhanced3D Synthesis. |[arXiv](https://arxiv.org/abs/2403.15385) | | 3D |
      • lumine AI - AI-Powered Creativity. | | | 3D |
      • Meshy
      • Mootion
      • One-2-3-45 - Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization. |[arXiv](https://arxiv.org/abs/2306.16928) | | Model |
      • ProlificDreamer - High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation. |[arXiv](https://arxiv.org/abs/2305.16213) | | Model |
      • Sloyd
      • Spline AI
      • SV3D - Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion. |[arXiv](https://arxiv.org/abs/2403.12008) | | 3D |
      • Tafi
      • 3D-GPT
      • 3Dpresso
      • Voxcraft - Generate easy-to-use 3D models with AI. | | | 3D |
      • SF3D - Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement. |[arXiv](https://arxiv.org/abs/2408.00653) | | 3D |
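The entries above preserve the columns of the original list's markdown table (name, paper link, engine, category), flattened into `Name - description |[arXiv](url) | Engine | Type |` rows. A minimal stdlib sketch of how such a row could be parsed back into structured fields; the `parse_row` helper and the exact row layout are assumptions based on the entries shown here, not an official schema:

```python
import re

# One "Name - description |[arXiv](url) | Engine | Type |" row, as seen above.
# The arXiv cell is optional; the engine cell (Blender, Unity, ...) is often empty.
ROW_RE = re.compile(
    r"^(?P<name>[^|]+?)\s+-\s+(?P<desc>[^|]*?)\s*\|"   # name and description
    r"(?:\[arXiv\]\((?P<paper>[^)]+)\))?\s*\|"         # optional arXiv link
    r"\s*(?P<engine>[^|]*?)\s*\|"                      # engine column
    r"\s*(?P<type>[^|]*?)\s*\|$"                       # category column
)

def parse_row(row: str) -> dict:
    """Split one flattened table row into its four fields."""
    m = ROW_RE.match(row.strip())
    if not m:
        raise ValueError(f"unrecognized row: {row!r}")
    return {k: (v or "").strip() for k, v in m.groupdict().items()}

example = ("Zero-1-to-3 - Zero-shot One Image to 3D Object. "
           "|[arXiv](https://arxiv.org/abs/2303.11328) | | Model |")
parsed = parse_row(example)
```

Rows without a paper link simply leave `paper` empty, so the same pattern covers both entry shapes used in this list.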
  • <span id="code">Code</span>

  • <span id="avatar">Avatar</span>

    • <span id="tool">Tool (AI LLM)</span>

      • RodinHD - High-Fidelity 3D Avatar Generation with Diffusion Models. |[arXiv](https://arxiv.org/abs/2407.06938) | | Avatar |
      • AniPortrait - Audio-Driven Synthesis of Photorealistic Portrait Animation. |[arXiv](https://arxiv.org/abs/2403.17694) | | Avatar |
      • CALM
      • ChatdollKit
      • DreamTalk
      • Duix - Silicon-Based Digital Human SDK 🌐🤖 | | | Avatar |
      • EchoMimic - Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions. |[arXiv](https://arxiv.org/abs/2407.08136) | | Avatar |
      • EMOPortraits - Emotion-enhanced Multimodal One-shot Head Avatars. | | | Avatar |
      • E3 Gen
      • GeneAvatar - Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image. |[arXiv](https://arxiv.org/abs/2404.02152) | | Avatar |
      • GeneFace++ - Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation. | | | Avatar |
      • Hallo - Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation. |[arXiv](https://arxiv.org/abs/2406.08801) | | Avatar |
      • IntrinsicAvatar
      • Linly-Talker
      • LivePortrait
      • MotionGPT - Human Motion as a Foreign Language, a unified motion-language generation model using LLMs. |[arXiv](https://arxiv.org/abs/2306.14795) | | Avatar |
      • MusePose - A Pose-Driven Image-to-Video Framework for Virtual Human Generation. | | | Avatar |
      • MuseTalk - Real-Time High Quality Lip Synchronization with Latent Space Inpainting. | | | Avatar |
      • MuseV - Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising. | | | Avatar |
      • Portrait4D - Learning One-Shot 4D Head Avatar Synthesis using Synthetic Data. |[arXiv](https://arxiv.org/abs/2311.18729) | | Avatar |
      • StyleAvatar3D - Leveraging Image-Text Diffusion Models for High-Fidelity 3D Avatar Generation. |[arXiv](https://arxiv.org/abs/2305.19012) | | Avatar |
      • Topo4D - Topology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture. |[arXiv](https://arxiv.org/abs/2406.00440) | | Avatar |
      • UnityAIWithChatGPT
      • Vid2Avatar - 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition. |[arXiv](https://arxiv.org/abs/2302.11566) | | Avatar |
      • ChatAvatar
      • HeadSculpt
      • Ready Player Me
      • Text2Control3D - Controllable 3D Avatar Generation in Neural Radiance Fields using Geometry-Guided Text-to-Image Diffusion Model. |[arXiv](https://arxiv.org/abs/2309.03550) | | Avatar |
      • VLOGGER
      • Wild2Avatar
  • <span id="image">Image</span>

  • <span id="texture">Texture</span>

    • <span id="tool">Tool (AI LLM)</span>

      • CRM
      • Paint-it - Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering. | | | Texture |
      • Text2Tex - Text-driven Texture Synthesis via Diffusion Models. |[arXiv](https://arxiv.org/abs/2303.11396) | | Texture |
      • X-Mesh - Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance. |[arXiv](https://arxiv.org/abs/2303.15764) | | Texture |
      • DreamMat - High-quality PBR Material Generation with Geometry- and Light-aware Diffusion Models. |[arXiv](https://arxiv.org/abs/2405.17176) | | Texture |
      • DreamSpace - Dreaming Your Room Space via Text-Driven Panoramic Texture Propagation. | | | Texture |
      • Dream Textures - Stable Diffusion built-in to Blender. Create textures, concept art, background assets, and more with a simple text prompt. | | Blender | Texture |
      • InstructHumans
      • InteX - Interactive Text-to-Texture Synthesis via Unified Depth-aware Inpainting. |[arXiv](https://arxiv.org/abs/2403.11878) | | Texture |
      • MaterialSeg3D
      • MeshAnything
      • Neuralangelo - High-Fidelity Neural Surface Reconstruction. |[arXiv](https://arxiv.org/abs/2306.03092) | | Texture |
      • Polycam
      • TexFusion - Synthesizing 3D Textures with Text-Guided Image Diffusion Models. |[arXiv](https://arxiv.org/abs/2310.13772) | | Texture |
      • Texture Lab - AI-generated textures. You can generate your own with a text prompt. | | | Texture |
      • With Poly
  • <span id="writer">Writer</span>

    • <span id="tool">Tool (AI LLM)</span>

  • Project List

  • <span id="visual">Visual</span>

    • <span id="tool">Tool (AI LLM)</span>

      • InternLM-XComposer2 - A vision-language large model (VLLM) excelling in free-form text-image composition and comprehension. |[arXiv](https://arxiv.org/abs/2404.06512) | | Visual |
      • Cambrian-1 - A Fully Open, Vision-Centric Exploration of Multimodal LLMs. |[arXiv](https://arxiv.org/abs/2406.16860) | | Multimodal LLMs |
      • CogVLM2 - GPT4V-level open-source multi-modal model based on Llama3-8B. | | | Visual |
      • EVF-SAM - Early Vision-Language Fusion for Text-Prompted Segment Anything Model. |[arXiv](https://arxiv.org/abs/2406.20076) | | Visual |
      • Kangaroo - A Powerful Video-Language Model Supporting Long-context Video Input. | | | Visual |
      • LLaVA++ - Extending Visual Capabilities with LLaMA-3 and Phi-3. | | | Visual |
      • LongVA
      • MiniCPM-Llama3-V 2.5 - A GPT-4V Level MLLM on Your Phone. | | | Visual |
      • MotionLLM
      • PLLaVA - Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning. |[arXiv](https://arxiv.org/abs/2404.16994) | | Visual |
      • Qwen-VL - A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond. |[arXiv](https://arxiv.org/abs/2308.12966) | | Visual |
      • ShareGPT4V - Improving Large Multi-modal Models with Better Captions. |[arXiv](https://arxiv.org/abs/2311.12793) | | Visual |
      • SOLO - A Single Transformer for Scalable Vision-Language Modeling. |[arXiv](https://arxiv.org/abs/2407.06438) | | Visual |
      • Video-CCAM - Advancing Video-Language Understanding with Causal Cross-Attention Masks. | | | Visual |
      • VideoLLaMA 2 - Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs. |[arXiv](https://arxiv.org/abs/2406.07476) | | Visual |
      • Video-MME - The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis. |[arXiv](https://arxiv.org/abs/2405.21075) | | Visual |
      • Vitron - A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing. | | | Visual |
      • VILA - On Pre-training for Visual Language Models. |[arXiv](https://arxiv.org/abs/2312.07533) | | Visual |
      • CoTracker
      • FaceHi
      • LGVI - Towards Language-Driven Video Inpainting via Multimodal Large Language Models. | | | Visual |
      • MaskViT - Masked Visual Pre-Training for Video Prediction. |[arXiv](https://arxiv.org/abs/2206.11894) | | Visual |
      • LLaVA-OneVision - Easy Visual Task Transfer. |[arXiv](https://arxiv.org/abs/2408.03326) | | Visual |
      • MoE-LLaVA - Mixture of Experts for Large Vision-Language Models. |[arXiv](https://arxiv.org/abs/2401.15947) | | Visual |
      • Sapiens
  • <span id="shader">Shader</span>

    • <span id="tool">Tool (AI LLM)</span>

      • AI Shader - ChatGPT-powered shader generator for Unity. | | Unity | Shader |
  • <span id="animation">Animation</span>

    • <span id="tool">Tool (AI LLM)</span>

      • Animate Anyone - Consistent and Controllable Image-to-Video Synthesis for Character Animation. |[arXiv](https://arxiv.org/abs/2311.17117) | | Animation |
      • AnimateAnything - Fine-Grained Open Domain Image Animation with Motion Guidance. |[arXiv](https://arxiv.org/abs/2311.12886) | | Animation |
      • AnimateLCM
      • AnimateZero - Video Diffusion Models are Zero-Shot Image Animators. |[arXiv](https://arxiv.org/abs/2312.03793) | | Animation |
      • AnimationGPT
      • DreaMoving
      • FaceFusion
      • GeneFace - Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis. |[arXiv](https://arxiv.org/abs/2301.13430) | | Animation |
      • MagicAnimate
      • SadTalker-Video-Lip-Sync
      • ToonCrafter
      • Wav2Lip - Accurately Lip-syncing Videos In The Wild. |[arXiv](https://arxiv.org/abs/2008.10010) | | Animation |
      • Deforum
      • FreeInit
      • ID-Animator - Zero-Shot Identity-Preserving Human Video Generation. |[arXiv](https://arxiv.org/abs/2404.15275) | | Animation |
      • NUWA-Infinity - A multimodal generative model designed to generate high-quality images and videos from given text, image or video input. | | | Animation |
      • Omni Animation
      • PIA - Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models. |[arXiv](https://arxiv.org/abs/2312.13964) | | Animation |
      • Stable Animation - A powerful text-to-animation tool for developers. | | | Animation |
      • Wonder Studio - An AI tool that automatically animates, lights and composes CG characters into a live-action scene. | | | Animation |
      • NUWA-XL
  • <span id="video">Video</span>

    • <span id="tool">Tool (AI LLM)</span>

      • 360DVD - Controllable Panorama Video Generation with 360-Degree Video Diffusion Model. |[arXiv](https://arxiv.org/abs/2401.06578) | | Video |
      • ART•V - Auto-Regressive Text-to-Video Generation with Diffusion Models. |[arXiv](https://arxiv.org/abs/2311.18834) | | Video |
      • BackgroundRemover
      • CoDeF
      • CogVLM - An open-source visual language model (VLM). | | | Visual |
      • Diffutoon - High-Resolution Editable Toon Shading via Diffusion Models. |[arXiv](https://arxiv.org/abs/2401.16224) | | Video |
      • dolphin
      • EDGE - Editable Dance Generation From Music: generates physically-plausible dances while remaining faithful to arbitrary input music. |[arXiv](https://arxiv.org/abs/2211.10658) | | Video |
      • EMO - Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions. |[arXiv](https://arxiv.org/abs/2402.17485) | | Video |
      • Hotshot-XL - An AI text-to-GIF model trained to work alongside Stable Diffusion XL. | | | Video |
      • LaVie - High-Quality Video Generation with Cascaded Latent Diffusion Models. |[arXiv](https://arxiv.org/abs/2309.15103) | | Video |
      • LVDM - Latent Video Diffusion Models for High-Fidelity Long Video Generation. |[arXiv](https://arxiv.org/abs/2211.13221) | | Video |
      • MicroCinema - A Divide-and-Conquer Approach for Text-to-Video Generation. |[arXiv](https://arxiv.org/abs/2311.18829) | | Video |
      • MOFA-Video - Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model. |[arXiv](https://arxiv.org/abs/2405.20222) | | Video |
      • MoneyPrinterTurbo
      • Mora
      • MotionDirector - Motion Customization of Text-to-Video Diffusion Models. |[arXiv](https://arxiv.org/abs/2310.08465) | | Video |
      • Motionshop
      • Mov2mov - Video-to-video extension for stable-diffusion-webui. | | | Video |
      • Open-Sora
      • Open-Sora-Plan - An open-source project aiming to reproduce Sora. | | | Video |
      • Reuse and Diffuse - Iterative Denoising for Text-to-Video Generation. |[arXiv](https://arxiv.org/abs/2309.03549) | | Video |
      • ShortGPT
      • Show-1 - Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation. |[arXiv](https://arxiv.org/abs/2309.15818) | | Video |
      • Snap Video - Scaled Spatiotemporal Transformers for Text-to-Video Synthesis. |[arXiv](https://arxiv.org/abs/2402.14797) | | Video |
      • SoraWebui - An open-source Sora web client, enabling users to easily create videos from text with OpenAI's Sora model. | | | Video |
      • StableVideo - Text-driven Consistency-aware Diffusion Video Editing. | | | Video |
      • Stable Video Diffusion - Latent video diffusion for Image-to-Video generation. | | | Video |
      • StoryDiffusion - Consistent Self-Attention for Long-Range Image and Video Generation. |[arXiv](https://arxiv.org/abs/2405.01434) | | Video |
      • StreamingT2V
      • StyleCrafter - Enhancing Stylized Text-to-Video Generation with Style Adapter. |[arXiv](https://arxiv.org/abs/2312.00330) | | Video |
      • Text2Video-Zero - Text-to-Image Diffusion Models are Zero-Shot Video Generators. |[arXiv](https://arxiv.org/abs/2303.13439) | | Video |
      • Track-Anything - A flexible and interactive tool for video object tracking and segmentation, based on Segment Anything and XMem. |[arXiv](https://arxiv.org/abs/2304.11968) | | Video |
      • Tune-A-Video - One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation. |[arXiv](https://arxiv.org/abs/2212.11565) | | Video |
      • VGen
      • Video-ChatGPT - A video conversation model capable of generating meaningful conversation about videos. |[arXiv](https://arxiv.org/abs/2306.05424) | | Video |
      • VideoElevator - Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models. |[arXiv](https://arxiv.org/abs/2403.05438) | | Video |
      • VideoGen - A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation. |[arXiv](https://arxiv.org/abs/2309.00398) | | Video |
      • VideoMamba
      • Video-of-Thought - Step-by-Step Video Reasoning from Perception to Cognition. | | | Video |
      • VisualRWKV - A visual-enhanced version of the RWKV language model, enabling RWKV to handle various visual tasks. | | | Visual |
      • V-JEPA
      • Anything in Any Scene
      • Assistive
      • AtomoVideo - High Fidelity Image-to-Video Generation. |[arXiv](https://arxiv.org/abs/2403.01800) | | Video |
      • Boximator
      • CogVideo
      • Decohere
      • Descript
      • DomoAI
      • DynamiCrafter - Animating Open-domain Images with Video Diffusion Priors. |[arXiv](https://arxiv.org/abs/2310.12190) | | Video |
      • Emu Video - Factorizing Text-to-Video Generation by Explicit Image Conditioning. | | | Video |
      • Etna
      • Fairy - Fast Parallelized Instruction-Guided Video-to-Video Synthesis. | | | Video |
      • Follow Your Pose - Pose-Guided Text-to-Video Generation using Pose-Free Videos. |[arXiv](https://arxiv.org/abs/2304.01186) | | Video |
      • FullJourney
      • Gen-2 - A multi-modal AI system that can generate novel videos with text, images, or video clips. | | | Video |
      • Generative Dynamics
      • Genmo
      • GenTron
      • HiGen - Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation. | | | Video |
      • Imagen Video - Generates high definition videos from text using a base video generation model and interleaved spatial and temporal super-resolution models. | | | Video |
      • InstructVideo
      • I2VGen-XL - High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models. |[arXiv](https://arxiv.org/abs/2311.04145) | | Video |
      • LTX Studio - An AI-driven filmmaking platform for creators, marketers, filmmakers and studios. | | | Video |
      • Lumiere - A Space-Time Diffusion Model for Video Generation. |[arXiv](https://arxiv.org/abs/2401.12945) | | Video |
      • MagicVideo
      • MagicVideo-V2 - Multi-Stage High-Aesthetic Video Generation. |[arXiv](https://arxiv.org/abs/2401.04468) | | Video |
      • Magic Hour
      • MAGVIT-v2
      • MAGVIT
      • Make-A-Video - A state-of-the-art AI system that generates videos from text. |[arXiv](https://arxiv.org/abs/2209.14792) | | Video |
      • Make Pixels Dance - High-Dynamic Video Generation. |[arXiv](https://arxiv.org/abs/2311.10982) | | Video |
      • Make-Your-Video
      • MobileVidFactory - Automatic Diffusion-Based Social Media Video Generation for Mobile Devices from Text. | | | Video |
      • Moonvalley - A text-to-video generative AI model. | | | Video |
      • Morph Studio - Text-to-Video AI Magic, manifest your creativity through your prompt. | | | Video |
      • MotionCtrl
      • MovieFactory
      • Neural Frames
      • NeverEnds
      • Phenaki
      • Pixeling - Generates highly realistic and extremely controllable visual content including images, videos and 3D models. | | | Video |
      • PixVerse - Create breath-taking videos with AI. | | | Video |
      • Pollinations
      • Sora
      • TATS - Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer. | | | Video |
      • TF-T2V - A Recipe for Scaling up Text-to-Video Generation with Text-free Videos. |[arXiv](https://arxiv.org/abs/2312.15770) | | Video |
      • TwelveLabs
      • UniVG - Towards Unified-modal Video Generation. | | | Video |
      • VideoComposer
      • VideoCrafter1 - Open Diffusion Models for High-Quality Video Generation. |[arXiv](https://arxiv.org/abs/2310.19512) | | Video |
      • VideoCrafter2 - Overcoming Data Limitations for High-Quality Video Diffusion Models. |[arXiv](https://arxiv.org/abs/2401.09047) | | Video |
      • VideoDrafter - Content-Consistent Multi-Scene Video Generation with LLM. |[arXiv](https://arxiv.org/abs/2401.01256) | | Video |
      • VideoFactory - Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation. | | | Video |
      • VideoLCM
      • Video LDMs - Align your Latents: High-resolution Video Synthesis with Latent Diffusion Models. |[arXiv](https://arxiv.org/abs/2304.08818) | | Video |
      • VideoPoet - A large language model for zero-shot video generation. |[arXiv](https://arxiv.org/abs/2312.14125) | | Video |
      • Vispunk Motion
      • W.A.L.T
      • Zeroscope - Text-to-Video generation. | | | Video |
      • Genie
      • CogVideoX - An open-source version of the video generation model, which is homologous to 清影 (Qingying). | | | Video |
      • Tora - Trajectory-oriented Diffusion Transformer for Video Generation. |[arXiv](https://arxiv.org/abs/2407.21705) | | Video |
      • MotionClone - Training-Free Motion Cloning for Controllable Video Generation. |[arXiv](https://arxiv.org/abs/2406.05338) | | Video |
      • Video-LLaVA
  • <span id="audio">Audio</span>

    • <span id="tool">Tool (AI LLM)</span>

      • AcademiCodec
      • Amphion - An Open-Source Audio, Music, and Speech Generation Toolkit. |[arXiv](https://arxiv.org/abs/2312.09911) | | Audio |
      • ArchiSound
      • AudioEditing - Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion. |[arXiv](https://arxiv.org/abs/2402.10009) | | Audio |
      • Audiogen Codec
      • AudioGPT
      • AudioLCM - Text-to-Audio Generation with Latent Consistency Models. |[arXiv](https://arxiv.org/abs/2406.00356v1) | | Audio |
      • AudioLDM 2 - Learning Holistic Audio Generation with Self-supervised Pretraining. |[arXiv](https://arxiv.org/abs/2308.05734) | | Audio |
      • Auffusion - Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation. |[arXiv](https://arxiv.org/abs/2401.01044) | | Audio |
      • CTAG - Creative Text-to-Audio Generation via Synthesizer Programming. | | | Audio |
      • Make-An-Audio 3 - Transforming Text into Audio via Flow-based Large Diffusion Transformers. |[arXiv](https://arxiv.org/abs/2305.18474) | | Audio |
      • NeuralSound - Learning-based Modal Sound Synthesis with Acoustic Transfer. |[arXiv](https://arxiv.org/abs/2108.07425) | | Audio |
      • Qwen2-Audio - The Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud. |[arXiv](https://arxiv.org/abs/2407.10759) | | Audio |
      • SEE-2-SOUND - Zero-Shot Spatial Environment-to-Spatial Sound. |[arXiv](https://arxiv.org/abs/2406.06612) | | Audio |
      • TANGO - Text-to-Audio Generation using Instruction Tuned LLM and Latent Diffusion Model. | | | Audio |
      • VTA-LDM - Video-to-Audio Generation with Hidden Alignment. |[arXiv](https://arxiv.org/abs/2407.07464) | | Audio |
      • WavJourney
      • Audiobox
      • AudioLDM - Text-to-Audio Generation with Latent Diffusion Models. |[arXiv](https://arxiv.org/abs/2301.12503) | | Audio |
      • MAGNeT - Masked Audio Generation using a Single Non-Autoregressive Transformer. | | | Audio |
      • Make-An-Audio - Text-To-Audio Generation with Prompt-Enhanced Diffusion Models. |[arXiv](https://arxiv.org/abs/2301.12661) | | Audio |
      • OptimizerAI
      • SoundStorm
      • Stable Audio - Fast Timing-Conditioned Latent Audio Diffusion. | | | Audio |
      • Stable Audio Open - Generates variable-length (up to 47s) stereo audio at 44.1kHz from text prompts. | | | Audio |
      • FoleyCrafter
      • SyncFusion - Multimodal Onset-synchronized Video-to-Audio Foley Synthesis. |[arXiv](https://arxiv.org/abs/2310.15247) | | Audio |
  • <span id="music">Music</span>

  • <span id="voice">Singing Voice</span>

  • <span id="speech">Speech</span>

    • <span id="tool">Tool (AI LLM)</span>

      • Applio - A voice conversion tool focused on simplicity, quality, and a user-friendly experience. | | | Speech |
      • Bark - Text-Prompted Generative Audio Model. | | | Speech |
      • Bert-VITS2
      • ChatTTS
      • CosyVoice - Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability. | | | Speech |
      • DEX-TTS - Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability. | [arXiv](https://arxiv.org/abs/2406.19135) | | Speech |
      • EmotiVoice - A Multi-Voice and Prompt-Controlled TTS Engine. | | | Speech |
      • Glow-TTS - A Generative Flow for Text-to-Speech via Monotonic Alignment Search. | [arXiv](https://arxiv.org/abs/2005.11129) | | Speech |
      • GPT-SoVITS - Few-shot Voice Conversion and Text-to-Speech WebUI. | | | Speech |
      • MahaTTS - An Open-Source Large Speech Generation Model. | | | Speech |
      • Matcha-TTS
      • MeloTTS - High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese and Korean. | | | Speech |
      • MetaVoice-1B - Foundational model for human-level speech intelligence. | | | Speech |
      • One-Shot-Voice-Cloning - One-shot voice cloning based on Unet-TTS. | | | Speech |
      • OpenVoice
      • OverFlow
      • RealtimeTTS - State-of-the-art text-to-speech (TTS) library designed for real-time applications. | | | Speech |
      • SenseVoice
      • SpeechGPT - Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities. | [arXiv](https://arxiv.org/abs/2305.11000) | | Speech |
      • speech-to-text-gpt3-unity
      • StableTTS - Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3. | | | Speech |
      • StyleTTS 2 - Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models. | [arXiv](https://arxiv.org/abs/2306.07691) | | Speech |
      • TorToiSe-TTS - A multi-voice TTS system trained with an emphasis on quality. | | | Speech |
      • TTS Generation WebUI
      • Voicebox - Text-Guided Multilingual Universal Speech Generation at Scale. | [arXiv](https://arxiv.org/abs/2306.15687) | | Speech |
      • VoiceCraft - Zero-Shot Speech Editing and Text-to-Speech in the Wild. | | | Speech |
      • Whisper - A general-purpose speech recognition model. | | | Speech |
      • WhisperSpeech - An open-source text-to-speech system built by inverting Whisper. | | | Speech |
      • X-E-Speech - Non-Autoregressive Cross-lingual Emotional Text-to-Speech and Voice Conversion. | | | Speech |
      • XTTS - Multilingual Text-to-Speech generation with voice cloning. | | | Speech |
      • YourTTS - Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone. | [arXiv](https://arxiv.org/abs/2112.02418) | | Speech |
      • ZMM-TTS - Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations. | [arXiv](https://arxiv.org/abs/2312.14398) | | Speech |
      • Audyo
      • CLAPSpeech - Learning Prosody from Text Context with Contrastive Language-Audio Pre-Training. | [arXiv](https://arxiv.org/abs/2305.10763) | | Speech |
      • Fliki
      • LOVO - The go-to AI Voice Generator & Text to Speech platform for thousands of creators. | | | Speech |
      • Narakeet
      • VALL-E - Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers. | [arXiv](https://arxiv.org/abs/2301.02111) | | Speech |
      • VALL-E X - Cross-Lingual Neural Codec Language Modeling. | [arXiv](https://arxiv.org/abs/2303.03926) | | Speech |
      • Vocode - An open-source library for building voice-based LLM applications. | | | Speech |
      • tortoise.cpp - A GGML implementation of tortoise-tts. | | | Speech |
      • Mini-Omni - Language Models Can Hear, Talk While Thinking in Streaming. An open-source multimodal large language model featuring real-time end-to-end speech input and streaming audio output conversational capabilities. | [arXiv](https://arxiv.org/abs/2408.16725) | | Speech |
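Several of the speech tools above (Whisper in particular) return transcriptions as a list of timestamped segments. A small stdlib-only sketch of turning such a result into an SRT subtitle track for game cutscenes; it assumes Whisper-style segment dicts with `start`, `end`, and `text` keys, and the demo values below are made up for illustration:

```python
def to_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Render Whisper-style segments ({'start', 'end', 'text'}) as SRT blocks."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{to_srt_time(seg['start'])} --> {to_srt_time(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)

# Hand-written demo segments (same shape as Whisper's result["segments"]).
demo = [
    {"start": 0.0, "end": 2.5, "text": " Welcome, traveler."},
    {"start": 2.5, "end": 5.0, "text": " The gate opens at dawn."},
]
srt = segments_to_srt(demo)
```

The same structure works for any of the listed speech-recognition backends as long as their output is first normalized into the segment-dict shape assumed here.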
  • <span id="speech">Analytics</span>

    • <span id="tool">Tool (AI LLM)</span>