https://github.com/olaviinha/neuraltexttoaudio
Text prompt steered synthetic audio generators
audio audio-generation audio-processing audio-synthesis audioldm colab colab-notebook mubert mubertai music-generation text2audio text2music voice-cloning voice-synthesis
- Host: GitHub
- URL: https://github.com/olaviinha/neuraltexttoaudio
- Owner: olaviinha
- Created: 2022-12-21T22:19:49.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-12-02T15:19:34.000Z (over 1 year ago)
- Last Synced: 2023-12-02T16:27:29.435Z (over 1 year ago)
- Topics: audio, audio-generation, audio-processing, audio-synthesis, audioldm, colab, colab-notebook, mubert, mubertai, music-generation, text2audio, text2music, voice-cloning, voice-synthesis
- Language: Jupyter Notebook
- Size: 337 KB
- Stars: 34
- Watchers: 2
- Forks: 4
- Open Issues: 2
Metadata Files:
- Readme: README.md
README
# Colab notebooks for text-to-audio generators
User-friendly Colab notebooks for various text prompt steered synthetic audio generators.
**Available notebooks:**
- [AudioLDM](https://colab.research.google.com/github/olaviinha/NeuralTextToAudio/blob/main/AudioLDM_pub.ipynb) – _text-to-audio_
- [TorToiSe TTS](https://colab.research.google.com/github/olaviinha/NeuralTextToAudio/blob/main/tortoise_tts_pub.ipynb) – _text-to-speech w/ voice-cloning_
- [MubertAI Text-to-Music](https://colab.research.google.com/github/olaviinha/NeuralTextToMusic/blob/main/mubert_txt2music.ipynb) – _text-to-music_
- [TTS Voice Cloning](https://colab.research.google.com/github/olaviinha/NeuralTextToAudio/blob/main/TTS_voice_cloning_pub.ipynb) – _text-to-speech w/ voice-cloning_
---
## AudioLDM: Text-to-Audio Generation with Latent Diffusion Models
[Open in Colab](https://colab.research.google.com/github/olaviinha/NeuralTextToAudio/blob/main/AudioLDM_pub.ipynb)
Paper: [Text-to-Audio Generation with Latent Diffusion Models](https://arxiv.org/abs/2301.12503)
Colab for [AudioLDM](https://github.com/haoheliu/AudioLDM). Generates audio from a text description. This is probably the beginning of a _"Stable Diffusion of audio"_. Currently produces 16 kHz audio only.
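To make the 16 kHz limit concrete: at that sample rate, nothing above 8 kHz (the Nyquist frequency) can be represented, so output will sound duller than 44.1 kHz audio. The sketch below is not part of the notebook; it uses only the Python standard library, and the helper names and the file name `clip_16k.wav` are hypothetical. A generated sine tone stands in for a notebook output file.

```python
import math
import struct
import wave

SAMPLE_RATE_HZ = 16_000  # output rate stated in this README


def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency a signal at this sample rate can represent."""
    return sample_rate_hz / 2


def samples_for(duration_s: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of samples needed for a clip of the given duration."""
    return round(duration_s * sample_rate_hz)


def write_sine_wav(path: str, freq_hz: float, duration_s: float,
                   sample_rate_hz: int = SAMPLE_RATE_HZ) -> None:
    """Write a mono 16-bit sine tone (a stand-in for generated audio)."""
    n = samples_for(duration_s, sample_rate_hz)
    frames = b"".join(
        struct.pack(
            "<h",
            int(0.5 * 32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate_hz)),
        )
        for i in range(n)
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(sample_rate_hz)
        wav.writeframes(frames)


def wav_info(path: str) -> tuple[int, int]:
    """Return (sample rate in Hz, number of frames) of a WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnframes()


if __name__ == "__main__":
    write_sine_wav("clip_16k.wav", freq_hz=440.0, duration_s=5.0)
    rate, frames = wav_info("clip_16k.wav")
    print(rate, frames, nyquist_hz(rate))
```

`wav_info` can be pointed at any WAV file downloaded from the notebook to confirm its sample rate before mixing it with higher-rate material.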
---
## TorToiSe: Text-to-speech
[Open in Colab](https://colab.research.google.com/github/olaviinha/NeuralTextToAudio/blob/main/tortoise_tts_pub.ipynb)
Paper: [TorToiSe - Spending Compute for High Quality TTS](https://docs.google.com/document/d/13O_eyY65i6AkNrN_LdPhpUjGhyTNKYHvDrIvHnHe1GA)
Colab for [TorToiSe](https://github.com/neonbjb/tortoise-tts) text-to-speech with voice cloning. This notebook takes a text string and one or more audio files of a speaker's voice, and attempts to synthesize the text in that voice. Currently works with English text only.
---
## MubertAI Text-to-Music
UPDATE: the Mubert API now appears to require a (paid) API key.
[Open in Colab](https://colab.research.google.com/github/olaviinha/NeuralTextToMusic/blob/main/mubert_txt2music.ipynb)
Colab for [MubertAI Text-to-Music](https://github.com/MubertAI/Mubert-Text-to-Music). Generates music from a text description by combining predefined blocks that are, as far as I know, created by the community. See the [source repository](https://github.com/MubertAI/Mubert-Text-to-Music) for further information, such as licensing.
---
## TTS Voice Cloning
[Open in Colab](https://colab.research.google.com/github/olaviinha/NeuralTextToAudio/blob/main/TTS_voice_cloning_pub.ipynb)
Paper: [Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis](https://arxiv.org/pdf/1806.04558.pdf)
Colab for [Real-Time-Voice-Cloning](https://github.com/CorentinJ/Real-Time-Voice-Cloning) text-to-speech with voice cloning. This notebook takes a text string and an audio file of a speaker's voice, and attempts to synthesize the text in that voice. Fair warning: results are not great.