https://github.com/kennethleungty/Text-to-Audio-with-Bark

Exploring Bark, the Open-Source Text-to-Audio Generative Model



Host: GitHub
URL: https://github.com/kennethleungty/Text-to-Audio-with-Bark
Owner: kennethleungty
License: mit
Created: 2023-09-28T04:57:01.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2023-10-10T16:55:13.000Z (almost 2 years ago)
Last Synced: 2025-03-04T08:51:18.874Z (7 months ago)
Topics: ai, artificial-intelligence, bark, data-science, deep-learning, gen-ai, generative-ai, machine-learning, prompt-engineering, speech, text-prompt, text-to-audio, text-to-music, text-to-sound, text-to-speech
Language: Jupyter Notebook
Homepage: https://betterprogramming.pub/text-to-audio-generation-with-bark-clearly-explained-4ee300a3713a
Size: 2.67 MB
Stars: 15
Watchers: 3
Forks: 4
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# Exploring Text-to-Audio with Bark

Link to article: https://betterprogramming.pub/text-to-audio-generation-with-bark-clearly-explained-4ee300a3713a

## Context
- Amidst the transformative surge of generative AI, text-to-audio models are emerging as one of the most promising frontiers.
- These advances go beyond converting text to speech: the aim is audio that listeners find hard to tell apart from human-produced content.
- From audiobooks narrated in any voice imaginable to dynamic music compositions prompted by mere sentences, the potential applications are vast and captivating.
- The accompanying article examines the capabilities and technical details of Bark, an open-source, text-prompted audio generation model that is used from Python.

___

## Introducing Bark
Bark is a transformer-based text-to-audio model capable of generating realistic multilingual speech, music, and sound effects. It was created by Suno, a research-driven company that develops audio AI.
Although Bark was developed for research purposes, its pre-trained model checkpoints are open-source and available for commercial use, which is a valuable contribution to the generative AI community.
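
To make this concrete, here is a minimal sketch of generating audio from a text prompt. It assumes the `bark` package from the suno-ai/bark repository and `scipy` are installed, and it uses the `preload_models`, `generate_audio`, and `SAMPLE_RATE` names as shown in that repository's README. The helper names `as_lyrics` and `synthesize`, the example prompt, and the output file name are illustrative choices and are not taken from this repository's notebook.

```python
from typing import Optional

# Example prompt; "[laughs]" is one of the non-speech cues listed in Bark's README.
SPEECH_PROMPT = "Hello, my name is Suno. [laughs] And I like pizza."


def as_lyrics(text: str) -> str:
    """Wrap text in music notes, the cue Bark's README describes for sung output."""
    return f"♪ {text.strip()} ♪"


def synthesize(prompt: str, out_path: str, voice: Optional[str] = None) -> None:
    """Generate audio for `prompt` and save it as a WAV file (hypothetical helper)."""
    # Imported lazily so the prompt helper above works without Bark installed.
    from bark import SAMPLE_RATE, generate_audio, preload_models
    from scipy.io.wavfile import write as write_wav

    preload_models()  # downloads and caches the model checkpoints on first use
    audio = generate_audio(prompt, history_prompt=voice)  # NumPy array of samples
    write_wav(out_path, SAMPLE_RATE, audio)
```

Calling `synthesize(SPEECH_PROMPT, "bark_speech.wav")` would write a speech clip, and `synthesize(as_lyrics("Mary had a little lamb"), "bark_song.wav")` would request sung output. The optional `voice` argument is passed to Bark's `history_prompt` parameter, for example a speaker preset such as `"v2/en_speaker_6"`. The first call downloads the checkpoints, so expect it to take a while.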

___

### References
- https://github.com/suno-ai/bark
- https://audiocraft.metademolab.com/encodec.html
- https://www.streamingmedia.com/Articles/ReadArticle.aspx?ArticleID=74487
- https://towardsdatascience.com/optimizing-vector-quantization-methods-by-machine-learning-algorithms-77c436d0749d
- https://www.assemblyai.com/blog/what-is-residual-vector-quantization/
- https://github.com/facebookresearch/encodec
- https://ai.meta.com/blog/ai-powered-audio-compression-technique/
- https://arxiv.org/abs/2210.13438
- https://github.com/facebookresearch/encodec#extracting-discrete-representations
- https://paperswithcode.com/paper/speaker-anonymization-using-neural-audio
- https://huggingface.co/suno/bark/tree/main/speaker_embeddings/v2
