https://github.com/albertaparicio/bsc-thesis

BSc thesis: Voice Conversion using Deep Learning

Host: GitHub
URL: https://github.com/albertaparicio/bsc-thesis
Owner: albertaparicio
License: other
Created: 2017-06-21T18:31:39.000Z (over 8 years ago)
Default Branch: master
Last Pushed: 2018-05-13T17:28:18.000Z (over 7 years ago)
Last Synced: 2025-03-24T02:02:58.707Z (7 months ago)
Topics: bachelor-thesis, deep-learning, keras, python3, pytorch, sequence-to-sequence, speech-processing, tensorflow, voice-conversion
Homepage: http://upcommons.upc.edu/handle/2117/105638
Size: 2.96 MB
Stars: 2
Watchers: 1
Forks: 3
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md

[![Download PDF](https://img.shields.io/badge/Download-PDF-brightgreen.svg)](http://upcommons.upc.edu/bitstream/handle/2117/105638/AparicioAlbert_FinalReport.pdf?sequence=1&isAllowed=y)

# BSc Thesis: Voice Conversion using Deep Learning

BSc thesis on applying Deep Learning to convert the speaker features of voice signals, presented at [Telecom BCN](http://etsetb.upc.edu/en), Barcelona, in May 2017.

## Abstract

In this project we present a first attempt at a Voice Conversion system based on Deep Learning in which the alignment between the training data is intrinsic to the model. Our system is structured in three main blocks. The first performs a vocoding of the speech (we have used Ahocoder for this task) and a normalization of the data. The second and main block is a Sequence-to-Sequence model: an RNN-based encoder-decoder structure with an Attention Mechanism. Its main strengths are the ability to process variable-length sequences and to align them internally. The third block of the system performs a denormalization and reconstructs the speech signal. For the development of our system we have used the Voice Conversion Challenge 2016 dataset, as well as a part of the TC-STAR dataset. Unfortunately, we have not obtained the results we expected. At the end of this thesis we present them and discuss some hypotheses that may explain them.
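The three blocks described above can be illustrated with a small, dependency-free Python sketch. It is not the code from this repository: it assumes z-score normalization (the abstract does not say which normalization is used), it assumes dot-product attention (the abstract does not name the attention variant), and it attends directly over normalized feature frames instead of RNN encoder states. All function names (`fit_stats`, `normalize`, `denormalize`, `attend`) and the toy feature values are hypothetical.

```python
import math


def fit_stats(frames):
    """Per-dimension mean and standard deviation over a list of feature frames."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Fall back to 1.0 when a dimension is constant, to avoid dividing by zero.
    std = [
        math.sqrt(sum((f[d] - mean[d]) ** 2 for f in frames) / n) or 1.0
        for d in range(dim)
    ]
    return mean, std


def normalize(frames, mean, std):
    """Block 1 (assumed z-score): map vocoder features to zero mean, unit variance."""
    return [[(x - m) / s for x, m, s in zip(f, mean, std)] for f in frames]


def denormalize(frames, mean, std):
    """Block 3: undo the normalization before the speech is reconstructed."""
    return [[x * s + m for x, m, s in zip(f, mean, std)] for f in frames]


def attend(query, encoder_states):
    """Block 2 (assumed dot-product attention): soft alignment over encoder steps.

    Returns one weight per encoder step and the weighted-average context vector.
    Works for any number of encoder steps, which is what lets the model handle
    variable-length sequences.
    """
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    peak = max(scores)  # subtract the maximum for a numerically stable softmax
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(encoder_states[0])
    context = [
        sum(w * state[d] for w, state in zip(weights, encoder_states))
        for d in range(dim)
    ]
    return weights, context


# Toy "vocoder features": three frames with two parameters each (made-up values).
source = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
mean, std = fit_stats(source)
normed = normalize(source, mean, std)

# Use the last frame as a stand-in for a decoder query.
weights, context = attend(normed[-1], normed)

# Denormalizing recovers the original feature scale.
restored = denormalize(normed, mean, std)
```

In the system described in the abstract, the query and the encoder states would come from the RNN decoder and encoder respectively, and the denormalized output would be passed to the vocoder to reconstruct the waveform.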

## Contents

* Introduction
* State of the art in Voice Conversion
* Methodology
* Datasets
* Preparation of the datasets
* Proposed Models
  * Baseline
  * Sequence-to-Sequence
* Results

### Reference

Please cite this work if it is useful for your research:

```
@mastersthesis{aparicio2017voice,
  author = {Aparicio Isarn, Albert},
  title  = {Voice Conversion using Deep Learning},
  school = {Universitat Polit{\`e}cnica de Catalunya},
  year   = 2017,
  month  = 5
}
```

### Author

Albert Aparicio Isarn ([e-mail](mailto:albert.aparicio.isarn@alu-etsetb.upc.edu))
