https://github.com/vtuber-plan/deepaudio

State-of-the-art Audio Machine Learning Models
Last synced: 6 months ago
Host: GitHub
URL: https://github.com/vtuber-plan/deepaudio
Owner: vtuber-plan
License: mit
Created: 2023-04-14T16:54:36.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2025-02-06T04:49:29.000Z (8 months ago)
Last Synced: 2025-03-24T17:11:06.040Z (7 months ago)
Language: Python
Homepage:
Size: 126 KB
Stars: 5
Watchers: 4
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# deepaudio

Deepaudio is a collection of advanced machine learning speech models based on PyTorch.

Transformers has been very successful in NLP, in part because it offers convenient APIs and tools for downloading and training state-of-the-art pretrained models. deepaudio aims to complement it in the speech domain.

It provides a similar interface, and its models can be published on the Hugging Face Hub. The model types implemented in deepaudio include:

* ASR (Automatic Speech Recognition)

* TTS (Text-to-Speech): VITS

* Vocoder: HifiGAN, MelGAN

* F0

* Content Encoder

* Speaker Encoder

## Installation

### Method 1: With pip

```bash
pip install deepaudio
```

or:

```bash
pip install git+https://github.com/vtuber-plan/deepaudio.git
```

### Method 2: From source

1. Clone this repository

```bash
git clone https://github.com/vtuber-plan/deepaudio.git
cd deepaudio
```

2. Install the package

```bash
pip install --upgrade pip
pip install .
```
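After either installation method, a quick check can confirm that the package is visible to the current Python environment. The sketch below uses only the standard library and assumes the installed distribution is named `deepaudio`, matching the `pip install deepaudio` command above; the helper name `installed_version` is hypothetical and not part of deepaudio.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "deepaudio") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed.

    Assumption: the distribution name matches the name used with pip.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("deepaudio is not installed in this environment")
    else:
        print(f"deepaudio {version} is installed")
```

If the check reports that the package is missing, the most likely cause is that `pip` installed into a different environment than the one running the script.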

## Usage

Use HifiGAN to convert a mel spectrogram back into a waveform:

```python
import torchaudio

from deepaudio.pipelines import MelPipeline
from deepaudio.models.vocoders.hifigan.modeling_hifigan import HifiGAN

# Load a 48 kHz recording; the mel settings and the pretrained vocoder below target this sample rate
wav, sr = torchaudio.load("zszy_48k.wav")
assert sr == 48000

# Convert the waveform to a 128-bin mel spectrogram
audio_pipeline = MelPipeline(freq=48000, n_fft=2048, n_mel=128, win_length=2048, hop_length=512)
mel = audio_pipeline(wav)

# Load the pretrained 48 kHz HifiGAN vocoder and synthesize a waveform from the mel spectrogram
hifigan_48k = HifiGAN.from_pretrained("vtb-plan/hifigan-48k")
out = hifigan_48k(mel)
```
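To get a feel for the mel settings used above (`freq=48000`, `n_fft=2048`, `hop_length=512`), the following sketch works out how many spectrogram frames a clip produces. It is plain arithmetic and does not call deepaudio. It assumes the common STFT framing convention used by `torch.stft`, where centered framing yields `1 + num_samples // hop_length` frames; whether `MelPipeline` frames audio exactly this way is an assumption, and the helper names are hypothetical.

```python
SAMPLE_RATE = 48_000  # Hz, matches freq=48000 in the usage example
N_FFT = 2048          # FFT window size in samples
HOP_LENGTH = 512      # samples between consecutive frames


def mel_frame_count(num_samples: int, hop_length: int = HOP_LENGTH,
                    n_fft: int = N_FFT, center: bool = True) -> int:
    """Number of STFT frames for a signal of `num_samples` samples.

    Assumption: framing follows the torch.stft convention.
    """
    if center:
        # The signal is padded by n_fft // 2 on both sides.
        return num_samples // hop_length + 1
    # Without padding, only windows that fit entirely in the signal count.
    return max(0, (num_samples - n_fft) // hop_length + 1)


def frames_per_second(sample_rate: int = SAMPLE_RATE,
                      hop_length: int = HOP_LENGTH) -> float:
    """Mel frame rate implied by the hop length."""
    return sample_rate / hop_length


if __name__ == "__main__":
    print(mel_frame_count(SAMPLE_RATE))                # 94 frames for one second
    print(mel_frame_count(SAMPLE_RATE, center=False))  # 90 frames without padding
    print(frames_per_second())                         # 93.75 frames per second
```

With these settings each mel frame advances by 512 / 48000 s, roughly 10.7 ms of audio, which is the time step the vocoder has to fill in when it reconstructs the waveform.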

## License

deepaudio is released under the MIT License and is free for both research and commercial use.

## Acknowledgement

- [Amphion](https://github.com/open-mmlab/Amphion)

- [HiFi-GAN](https://github.com/jik876/hifi-gan)
