Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/twerkmeister/iLID
automatic spoken language identification
https://github.com/twerkmeister/iLID
Last synced: about 2 months ago
JSON representation
automatic spoken language identification
- Host: GitHub
- URL: https://github.com/twerkmeister/iLID
- Owner: twerkmeister
- License: mit
- Created: 2015-10-29T10:02:32.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2018-11-19T17:39:40.000Z (about 6 years ago)
- Last Synced: 2024-08-04T23:10:41.161Z (5 months ago)
- Language: Python
- Size: 3.13 MB
- Stars: 90
- Watchers: 13
- Forks: 24
- Open Issues: 6
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- Awesome-CoreML-Models - iLID
README
# iLID
Automatic spoken language identification (LID) using deep learning.## Motivation
We wanted to classify the spoken language within audio files, a process that usually serves as the first step for NLP or speech transcription.We used two deep learning approaches using the Tensorflow and Caffe frameworks for different model configuration.
## Repo Structure
- **/data**
- Scripts to download training data from Voxforge and Youtube. For usage details see below.
- **/Evaluation**
- Prediction scripts for single audio files or list of files using Caffe
- **/Preprocessing**
- Includes all scripts to convert a WAV audio file into spectrogram and mel-filter spectrogram images using a Spark Pipeline.
- All scripts to create/extract the audio features
- To convert a directory of WAV audio files using the Spark pipeline run: `./run.sh --inputPath {input_path} --outputPath {output_path} | tee sparkline.log -`
- **/models**
- All our Caffe models: Berlin_net, Topcoder, VGG_M
- Berlin_net: 3Conv + Batch Normalisation, 2 FullyConnected Layer (Shallow Architecture)
- Topcoder_net: (Deep Architecture) inspired by [Topcoder's spoken language identification challenge](https://yerevann.github.io/2015/10/11/spoken-language-identification-with-deep-convolutional-networks/)
- Finetuning of [VGG_M](http://www.robots.ox.ac.uk/~vgg/research/deep_eval/)
- **/tensorflow**
- All the code for setting up and training various models with Tensorflow.
- Includes training and prediction script. See `train.py` and `predict.py`.
- Configure your learning parameters in `config.yaml`.
- Add or change network under `/tensorflow/networks/instances/`.
- **/tools**
- Some handy scripts to clean filenames, normalize audio files and other stuff.
- **/webserver**
- A web demo to upload audio files for prediction.
- See the included README
## Requirements
- Caffe
- TensorFlow
- Spark
- Python 2.7
- OpenCV 2.4+
- youtube_dl```
// Install additional Python requirements
pip install -r requirements.txt
pip install youtube_dl
```## Datasets
Downloads training data / audio samples from various sources.#### Voxforge
- Downloads the audio samples from www.voxforge.org for some languages
```bash
/data/voxforge/download-data.sh
/data/voxforge/extract_tgz.sh {path_to_german.tgz} german
```#### Youtube
- Downloads various news channels from Youtube.
- Configure channels/sources in `youtube/sources.yml````python
python /data/youtube/download.py
```## Models
We trained models for 2/4 languages (English, German, French, Spanish).
#### Best Performing Models
The top scoring networks were trained with 15.000 images per languages, a batch size of 64, and a learning rate of 0.001 that was decayed to 0.0001 after 7.000 iterations.[Shallow Network EN/DE](https://github.com/twerkmeister/iLID/blob/master/models/Berlin_net/net_mel_2lang_bn.prototxt)
[Shallow Network EN/DE/FR/ES](https://github.com/twerkmeister/iLID/blob/master/models/Berlin_net/net_mel_4lang_bn.prototxt)
#### Training
```
// Caffe:
/models/{model_name}/training.sh
``````
// Tensorflow:
python /tensorflow/train.py
```#### Labels
```
0 English,
1 German,
2 French,
3 Spanish
```## Training Data
For training we used both the public [Voxforge](http://www.voxforge.org/) dataset and downloaded news reel videos from Youtube. Check out the */data* directory for download scripts.