Projects in Awesome Lists tagged with audio-generation

https://github.com/go-skynet/LocalAI

:robot: The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed inference

ai api audio-generation distributed gemma gpt4all image-generation kubernetes llama llama3 llm mamba mistral musicgen p2p rerank rwkv stable-diffusion text-generation tts

Last synced: 09 Nov 2024

https://github.com/mudler/localai

:robot: The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed inference

ai api audio-generation distributed gemma gpt4all image-generation kubernetes llama llama3 llm mamba mistral musicgen p2p rerank rwkv stable-diffusion text-generation tts

Last synced: 30 Dec 2024

https://github.com/mudler/LocalAI

:robot: The free, Open Source OpenAI alternative. Self-hosted, community-driven and local-first. Drop-in replacement for OpenAI running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. It can generate text, audio, video, and images, and also supports voice cloning.

ai api audio-generation distributed gemma gpt4all image-generation kubernetes llama llama3 llm mamba mistral musicgen p2p rerank rwkv stable-diffusion text-generation tts

Last synced: 25 Oct 2024

https://github.com/open-mmlab/amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

audio-generation audio-synthesis audioldm audit emilia fastspeech2 maskgct music-generation naturalspeech2 singing-voice-conversion speech-synthesis text-to-audio text-to-speech vall-e vits vocoder voice-conversion

Last synced: 24 Dec 2024

https://github.com/funaudiollm/cosyvoice

Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.

audio-generation cantonese chatbot chatgpt chinese cosyvoice cross-lingual english fine-grained fine-tuning gpt-4o japanese korean multi-lingual natural-language-generation python text-to-speech tts voice-cloning

Last synced: 24 Dec 2024

https://github.com/FunAudioLLM/CosyVoice

Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.

audio-generation cantonese chatbot chatgpt chinese cosyvoice cross-lingual english fine-grained fine-tuning gpt-4o japanese korean multi-lingual natural-language-generation python text-to-speech tts voice-cloning

Last synced: 29 Oct 2024

https://github.com/open-mmlab/Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

audio-generation audio-synthesis audioldm audit fastspeech2 hifi-gan music-generation naturalspeech2 singing-voice-conversion speech-synthesis text-to-audio text-to-speech vall-e vits voice-conversion

Last synced: 31 Oct 2024

https://github.com/haoheliu/audioldm

AudioLDM: Generate speech, sound effects, music and beyond, with text.

audio-generation

Last synced: 24 Dec 2024

https://github.com/haoheliu/AudioLDM

AudioLDM: Generate speech, sound effects, music and beyond, with text.

audio-generation

Last synced: 30 Oct 2024

https://github.com/haoheliu/audioldm2

Text-to-Audio/Music Generation

audio-generation

Last synced: 24 Dec 2024

https://github.com/haoheliu/AudioLDM2

Text-to-Audio/Music Generation

audio-generation

Last synced: 29 Oct 2024

https://github.com/archinetai/audio-diffusion-pytorch

Audio generation using diffusion models, in PyTorch.

artificial-intelligence audio-generation deep-learning denoising-diffusion

Last synced: 26 Dec 2024

https://github.com/archinetai/audio-ai-timeline

A timeline of the latest AI models for audio generation, starting in 2023!

artificial-intelligence audio-generation machine-learning

Last synced: 03 Dec 2024

https://github.com/rsxdalv/tts-generation-webui

TTS Generation Web UI (Bark, MusicGen + AudioGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, MAGNet, StyleTTS2, MMS, Stable Audio, Mars5, F5-TTS, ParlerTTS)

ai audio-generation audiogen bark deep-learning generator gradio machine-learning magnet music musicgen rvc seamlessm4t styletts2 text-to-speech torch tortoise-tts tts vocos web

Last synced: 05 Nov 2024

https://github.com/lucidrains/soundstorm-pytorch

Implementation of SoundStorm, Efficient Parallel Audio Generation from Google DeepMind, in PyTorch

artificial-intelligence attention-mechanism audio-generation deep-learning non-autoregressive transformers

Last synced: 25 Dec 2024

https://github.com/declare-lab/tango

A family of diffusion models for text-to-audio generation.

audio-generation diffusion diffusion-models language-models large-language-models text-to-audio

Last synced: 27 Dec 2024

https://github.com/nvidia/bigvgan

Official PyTorch implementation of BigVGAN (ICLR 2023)

audio-generation audio-synthesis music-synthesis neural-vocoder singing-voice-synthesis speech-synthesis

Last synced: 27 Dec 2024

https://github.com/Yuan-ManX/ai-audio-datasets

AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio applications.

aigc artificial-intelligence audio audio-effect audio-generation datasets deep-learning machine-learning music-generation

Last synced: 27 Oct 2024

https://github.com/modelscope/funcodec

FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music generation, etc.

audio-generation audio-quantization codec encodec speech-synthesis speech-to-text tts voicecloning

Last synced: 30 Dec 2024

https://github.com/modelscope/FunCodec

FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music generation, etc.

audio-generation audio-quantization codec encodec speech-synthesis speech-to-text tts voicecloning

Last synced: 11 Oct 2024

https://github.com/v-iashin/SpecVQGAN

Source code for "Taming Visually Guided Sound Generation" (Oral at BMVC 2021)

audio audio-generation bmvc evaluation-metrics gan melgan multi-modal pytorch transformer vas vggsound video video-features video-understanding vqvae

Last synced: 06 Nov 2024

https://github.com/Yuan-ManX/audio-development-tools

This is a list of sound, audio and music development tools covering machine learning, audio generation, audio signal processing, sound synthesis, spatial audio, music information retrieval, music generation, speech recognition, speech synthesis, singing voice synthesis and more.

artificial-intelligence audio audio-generation audio-processing deep-learning dsp machine-learning music music-generation signal-processing speech speech-processing speech-synthesis

Last synced: 27 Oct 2024

https://github.com/sony/bigvsan

PyTorch implementation of BigVSAN

audio-generation audio-synthesis gan neural-vocoder pytorch speech-synthesis

Last synced: 26 Dec 2024

https://github.com/archinetai/audio-data-pytorch

A collection of useful audio datasets and transforms for PyTorch.

artifical-intelligense audio-generation datasets deep-learning pytorch

Last synced: 24 Dec 2024

https://github.com/happylittlecat2333/Auffusion

Official code and models of the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation"

audio-generation diffusion diffusion-models large-language-models text-to-audio

Last synced: 14 Nov 2024

https://github.com/archinetai/audio-diffusion-pytorch-trainer

Trainer for audio-diffusion-pytorch

artificial-intelligence audio-generation deep-learning denoising-diffusion

Last synced: 05 Nov 2024

https://github.com/ilaria-manco/word2wave

Word2Wave: a framework for generating short audio samples from a text prompt using WaveGAN and COALA.

ai-music audio-generation music-generation text-to-audio

Last synced: 22 Nov 2024

https://github.com/sony/soundctm

PyTorch implementation of SoundCTM

audio-generation diffusion-models pytorch text-to-audio

Last synced: 25 Dec 2024

https://github.com/rsxdalv/musicgen-prompts

Site for sharing MusicGen + AudioGen Prompts and Creations

ai audio-generation audiogen generator machine-learning musicgen

Last synced: 30 Nov 2024

https://github.com/Bai-YT/ConsistencyTTA

ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation

audio-generation audio-processing consistency-models diffusion-models ldm

Last synced: 26 Dec 2024

https://github.com/ljungang/awesome-omni-large-models-and-datasets

🔥 Omni large models and datasets for understanding and generating multi-modalities.

ai artificial-intelligence audio-generation audio-understanding awesome large-language-models llm omni paper-list video-generation video-understanding-dataset

Last synced: 22 Oct 2024

https://github.com/merekat/children-stories

OhanashiGPT is an application that generates personalized children's stories based on parameters like age and preferences. It narrates these stories using an AI-generated voice that mimics a parent, trained on their audio samples. The app also creates illustrations to accompany each story, providing a unique and engaging experience for children.

ai audio-generation data-science image-generation large-language-models llama lora lux neural-networks stable-diffusion story text-generation tts xtts

Last synced: 10 Oct 2024

https://github.com/radoslawregula/voxg

Singing voice synthesizer using GANs

audio audio-generation audio-processing deep-learning gan generative-adversarial-network machine-learning music-programming music-technology python singing-voice-synthesis tensorflow voice-synthesis

Last synced: 20 Nov 2024

https://github.com/0x7o/deepmozart

Audio generation using diffusion models

audio-generation diffusion-models neural-network

Last synced: 16 Nov 2024

https://github.com/lucadellalib/bigvgan

A single-file implementation of the BigVGAN generator

audio-generation audio-synthesis bigvgan music-synthesis neural-vocoder pytorch singing-voice-synthesis speech-synthesis

Last synced: 13 Dec 2024

https://github.com/simonbernarding/ohanashigpt-children-story-generation

OhanashiGPT is an application that generates personalized children's stories based on parameters like age and preferences. It narrates these stories using an AI-generated voice that mimics a parent, trained on their audio samples. The app also creates illustrations to accompany each story, providing a unique and engaging experience for children.

ai audio-generation data-science image-generation large-language-model text-generation

Last synced: 22 Dec 2024

https://github.com/gregogiudici/knowledge-distillation_ddsp-decoder

Knowledge Distillation of different DDSP Decoders for audio signal generation

audio audio-generation ddsp gru pytorch pytorch-lightning s4 tcn

Last synced: 16 Nov 2024

https://github.com/ewdlop/ai-tools

Various AI Online Tools. AI is taking over.

ai audio-generation gpt img2img

Last synced: 27 Dec 2024

https://github.com/anas436/image-to-audio-app

Image Captioning and Text-to-Speech

audio-generation image-processing text-to-speech

Last synced: 06 Dec 2024

https://github.com/koppalexander/ohanashi-childgpt

OhanashiGPT is an application that generates personalized children's stories based on parameters like age and preferences. It narrates these stories using an AI-generated voice that mimics a parent, trained on their audio samples. The app also creates illustrations to accompany each story, providing a unique and engaging experience for children.

ai audio-generation data-science flux generative-ai image-generation large-language-models llama lora neural-networks stable-diffusion story text-generation tts xtts

Last synced: 10 Oct 2024

https://github.com/iris2c/inspiremusic

InspireMusic: A Unified Framework for Music, Song and Audio Generation

audio-generation music-generation pytorch text-to-music

Last synced: 03 Dec 2024

https://github.com/work-nobu/ohanashigpt

OhanashiGPT is an application that generates personalized children's stories based on parameters like age and preferences. It narrates these stories using an AI-generated voice that mimics a parent, trained on their audio samples. The app also creates illustrations to accompany each story, providing a unique and engaging experience for children.

ai audio-generation data-science image-generation large-language-models llama3 llamacpp lora low-rank-adaptation stable-diffusion text-generation xtts

Last synced: 22 Dec 2024