https://github.com/kennethleungty/Text-to-Audio-with-Bark
Exploring Bark, the Open-Source Text-to-Audio Generative Model
- Host: GitHub
- URL: https://github.com/kennethleungty/Text-to-Audio-with-Bark
- Owner: kennethleungty
- License: MIT
- Created: 2023-09-28T04:57:01.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2023-10-10T16:55:13.000Z (about 1 year ago)
- Last Synced: 2024-08-01T02:35:08.955Z (4 months ago)
- Topics: ai, artificial-intelligence, bark, data-science, deep-learning, gen-ai, generative-ai, machine-learning, prompt-engineering, speech, text-prompt, text-to-audio, text-to-music, text-to-sound, text-to-speech
- Language: Jupyter Notebook
- Homepage: https://betterprogramming.pub/text-to-audio-generation-with-bark-clearly-explained-4ee300a3713a
- Size: 2.67 MB
- Stars: 15
- Watchers: 3
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Exploring Text-to-Audio with Bark
Link to article: https://betterprogramming.pub/text-to-audio-generation-with-bark-clearly-explained-4ee300a3713a
## Context
- Amidst the transformative surge of generative AI, text-to-audio models are emerging as one of the most promising frontiers.
- These advances are not just about converting text to speech, but also about crafting audio experiences that can be hard to distinguish from human-produced content.
- From audiobooks narrated in any voice imaginable to dynamic music compositions prompted by mere sentences, the potential applications are vast and captivating.
- In this article, we delve into the capabilities and technical intricacies of Bark, an open-source text-prompted audio generation model in Python.

___
## Introducing Bark
Bark is a transformer-based text-to-audio model capable of generating realistic multilingual speech, music, and sound effects. It was created by Suno, a research-driven company that develops cutting-edge audio AI.
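To make this concrete, here is a minimal sketch of turning a text prompt into a WAV file. It assumes the `bark` package from the Suno repository is installed and uses its `preload_models`, `generate_audio`, and `SAMPLE_RATE` names. The helper names `save_wav` and `text_to_wav` are hypothetical, and the standard-library `wave` module is swapped in for the SciPy writer shown in Bark's own examples so that the file-writing step has no extra dependencies.

```python
import array
import sys
import wave


def save_wav(samples, sample_rate, path):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    pcm = array.array("h")  # signed 16-bit integers
    for s in samples:
        clipped = max(-1.0, min(1.0, float(s)))  # guard against overshoot
        pcm.append(int(clipped * 32767))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)  # mono
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm.tobytes())


def text_to_wav(text, path, speaker=None):
    """Generate audio for `text` with Bark and save it to `path`.

    `speaker` is an optional Bark preset name (assumed here to be a string
    such as "v2/en_speaker_6"); None lets Bark pick a voice itself.
    """
    # Imported lazily so save_wav() stays usable without Bark installed.
    from bark import SAMPLE_RATE, generate_audio, preload_models

    preload_models()  # downloads and caches the checkpoints on first call
    audio = generate_audio(text, history_prompt=speaker)
    save_wav(audio, SAMPLE_RATE, path)
```

Calling `text_to_wav("Hello, my name is Suno. [laughs]", "hello.wav")` would then produce a short clip. The first call is slow because the model checkpoints have to be downloaded, and the sample-by-sample loop in `save_wav` favors clarity over speed.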
Although Bark was developed for research purposes, its pre-trained model checkpoints are open source and available for commercial use, which is a valuable contribution to the generative AI community.

___
### References
- https://github.com/suno-ai/bark
- https://audiocraft.metademolab.com/encodec.html
- https://www.streamingmedia.com/Articles/ReadArticle.aspx?ArticleID=74487
- https://towardsdatascience.com/optimizing-vector-quantization-methods-by-machine-learning-algorithms-77c436d0749d
- https://www.assemblyai.com/blog/what-is-residual-vector-quantization/
- https://github.com/facebookresearch/encodec
- https://ai.meta.com/blog/ai-powered-audio-compression-technique/
- https://arxiv.org/abs/2210.13438
- https://github.com/facebookresearch/encodec#extracting-discrete-representations
- https://paperswithcode.com/paper/speaker-anonymization-using-neural-audio
- https://huggingface.co/suno/bark/tree/main/speaker_embeddings/v2