https://github.com/tuanad121/python-world
- Host: GitHub
- URL: https://github.com/tuanad121/python-world
- Owner: tuanad121
- License: other
- Created: 2018-04-05T01:24:18.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2023-12-20T20:23:14.000Z (about 1 year ago)
- Last Synced: 2024-10-09T19:00:43.834Z (2 months ago)
- Topics: manifold-learning, manifold-vocoder, pitch, pycharm, python, vae, variational-autoencoder, world-vocoder
- Language: Python
- Size: 14.9 MB
- Stars: 150
- Watchers: 8
- Forks: 31
- Open Issues: 9
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
# PYTHON WORLD VOCODER:
*************************************
This is a line-by-line implementation of the WORLD vocoder (Matlab, C++) in Python. It supports *Python 3.0* and later.
For technical details, please check the WORLD [website](http://www.kki.yamanashi.ac.jp/~mmorise/world/english/).
# INSTALLATION
*********************
Python WORLD uses the following dependencies:
* numpy, scipy
* matplotlib
* numba
* simpleaudio (just for demonstration)

Install the Python dependencies:
```
pip install -r requirements.txt
```
Or import the project with [PyCharm](https://www.jetbrains.com/pycharm/) and open ```requirements.txt``` in PyCharm.
PyCharm will then offer to install the missing libraries.
# EXAMPLE
**************
The easiest way to run the examples is to import the ```Python-WORLD``` folder into PyCharm.
In ```example/prodosy.py```, there is an example of analysis/modification/synthesis with the WORLD vocoder.
It includes examples of pitch, duration, and spectrum modification.

First, we read an audio file:
```python
from scipy.io.wavfile import read as wavread
fs, x_int16 = wavread(wav_path)
x = x_int16 / (2 ** 15 - 1) # to float
```
Then, we declare a vocoder and encode the audio file:
```python
from world import main
vocoder = main.World()
# analysis
dat = vocoder.encode(fs, x, f0_method='harvest')
```
Here, ```fs``` is the sampling frequency and ```x``` is the speech signal.
```dat``` is a dictionary that contains the pitch, magnitude spectrum, and aperiodicity.
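The `x` passed to `encode` above is a floating-point signal obtained from int16 samples. A minimal sketch of that conversion and its inverse (useful when writing a resynthesized signal back to a WAV file) is given below. The helper names `pcm16_to_float` and `float_to_pcm16` are hypothetical and not part of Python-WORLD, and averaging stereo input to mono is an assumption, since the snippets here only show a single-channel signal.

```python
import numpy as np

INT16_MAX = 2 ** 15 - 1  # same scale factor as in the reading snippet above


def pcm16_to_float(x_int16):
    """Convert int16 PCM samples to float64 in roughly [-1, 1]."""
    x = np.asarray(x_int16, dtype=np.float64)
    if x.ndim == 2:
        # Shape (samples, channels): average the channels to mono (assumption).
        x = x.mean(axis=1)
    return x / INT16_MAX


def float_to_pcm16(x):
    """Convert a float signal back to int16, clipping to the valid range."""
    x = np.clip(np.asarray(x, dtype=np.float64), -1.0, 1.0)
    return np.round(x * INT16_MAX).astype(np.int16)
```

Clipping before the cast avoids integer wrap-around when a modified signal exceeds the original amplitude range.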
We can scale the pitch:
```python
dat = vocoder.scale_pitch(dat, 1.5)
```
Be careful when you scale the pitch, because F0 has an upper limit and a lower limit.
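To illustrate why the limits matter, here is a minimal sketch of scaling an F0 contour while keeping it inside a floor and a ceiling. It is not the library's `scale_pitch` implementation: the function `scale_f0`, the convention that `f0 == 0` marks unvoiced frames, and the limits of 71 Hz and 800 Hz are assumptions made for illustration only.

```python
import numpy as np


def scale_f0(f0, factor, f0_floor=71.0, f0_ceil=800.0):
    """Scale voiced F0 values and clip them to [f0_floor, f0_ceil].

    Frames with f0 == 0 are treated as unvoiced and left untouched.
    The default limits are assumed values, not taken from Python-WORLD.
    """
    f0 = np.asarray(f0, dtype=np.float64)
    voiced = f0 > 0
    scaled = f0.copy()
    scaled[voiced] = np.clip(f0[voiced] * factor, f0_floor, f0_ceil)
    return scaled


f0 = np.array([0.0, 120.0, 600.0, 0.0])
print(scale_f0(f0, 1.5))
```

With a factor of 1.5, the 600 Hz frame would reach 900 Hz and is clipped to the assumed 800 Hz ceiling, so the scaled contour no longer follows the original intonation exactly.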
We can make speech faster or slower:
```python
dat = vocoder.scale_duration(dat, 2)
```
In ```test/speed.py```, we measure the time taken by the analysis.
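A timing measurement of this kind can be sketched with the standard library alone. The helper `real_time_factor` below is hypothetical and is not taken from `test/speed.py`; it runs a function once and reports the elapsed time relative to the duration of the audio.

```python
import time


def real_time_factor(func, duration_sec, *args, **kwargs):
    """Run func once; return (result, elapsed seconds, elapsed / duration)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed / duration_sec


# Possible use with the vocoder from the example above (not executed here):
# dat, elapsed, rtf = real_time_factor(
#     vocoder.encode, len(x) / fs, fs, x, f0_method='harvest')
```

A real-time factor below 1 means the analysis runs faster than the audio plays. The first call may be slower than later ones if `numba` compiles functions on first use, so timing a second run is usually more representative.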
To use d4c_requiem analysis and requiem_synthesis from WORLD version 0.2.2, pass the argument ```is_requiem=True```:
```python
# requiem analysis
dat = vocoder.encode(fs, x, f0_method='harvest', is_requiem=True)
```
To extract log-filterbanks, MCEP-40, and VAE-12 as described in the paper `Using a Manifold Vocoder for Spectral Voice and Style Conversion`, check ```test/spectralFeatures.py```. You need Keras 2.2.4 and TensorFlow 1.14.0 to extract VAE-12.
Check out the [speech samples](https://tuanad121.github.io/samples/2019-09-15-Manifold/).
# NOTE:
**********
* The vocoder uses pitch-synchronous analysis: the size of each window is determined by the fundamental frequency ```F0```. The centers of the windows are equally spaced, ```frame_period``` ms apart.
* The Fourier transform size (```fft_size```) is determined automatically from the sampling frequency and the lowest value of F0, ```f0_floor```.
When you want to specify your own ```fft_size```, you have to use ```f0_floor = 3.0 * fs / fft_size```.
If you decrease ```fft_size```, ```f0_floor``` increases. However, a high ```f0_floor``` might not be good for the analysis of male voices.
* The F0 analysis ```Harvest``` is the slowest one. It is sped up using ```numba``` and Python ```multiprocessing```. The more cores you have, the faster it can become. However, you can use your own F0 analysis. We support 3 F0 analysis methods: ```DIO```, ```HARVEST```, and ```SWIPE'```.
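The relation between `fft_size` and `f0_floor` in the note above can be worked through numerically. The sketch below assumes that this port follows the power-of-two rule used by the reference C++ WORLD (CheapTrick) for the automatic choice, and that the default `f0_floor` is 71 Hz; both are assumptions, and the function names are hypothetical. The second function is just the formula quoted in the note.

```python
import math


def default_fft_size(fs, f0_floor=71.0):
    """Smallest power of two above 3 * fs / f0_floor + 1 (assumed rule)."""
    return 2 ** (1 + int(math.log2(3.0 * fs / f0_floor + 1.0)))


def f0_floor_for_fft_size(fs, fft_size):
    """F0 floor implied by a hand-picked fft_size, per the note above."""
    return 3.0 * fs / fft_size


for fft_size in (1024, 512):
    print(fft_size, f0_floor_for_fft_size(16000, fft_size))
```

At 16 kHz, halving `fft_size` from 1024 to 512 raises the implied floor from 46.875 Hz to 93.75 Hz, which shows how a smaller transform can push `f0_floor` above the pitch of a low voice.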
# CITATION:
If you find the code helpful and want to cite it, please use:
Dinh, T., Kain, A., & Tjaden, K. (2019). Using a manifold vocoder for spectral voice and style conversion. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2019-September, 1388-1392.
# CONTACT US
******************
Post your questions, suggestions, and discussions to GitHub Issues.