https://github.com/idiap/asrt

Various scripts that facilitate the preparation of Automatic Speech Recognition related resources

README
======
Authors
-------
Alexandre Nanchen, Christine Marcel

Description
-----------
This is the README for the Automatic Speech Recognition Tools.

This project contains various scripts that facilitate the preparation of
ASR-related tasks.

Current tasks are:

1. Sentence extraction from PDF files
2. Sentence classification by language
3. Sentence filtering and cleaning
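
As a rough illustration of what sentence filtering and cleaning can involve,
here is a minimal sketch in plain Python. It is not the project's
implementation: the function names, the character whitelist, and the
thresholds (`min_words`, the 50% digit limit) are all assumptions chosen for
the example.

```python
import re


def split_sentences(text):
    """Split raw text on '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def clean_sentence(sentence):
    """Replace unexpected symbols with spaces and collapse whitespace."""
    # Keep word characters, whitespace, apostrophes and basic punctuation.
    sentence = re.sub(r"[^\w\s'.,!?-]", " ", sentence)
    return re.sub(r"\s+", " ", sentence).strip()


def keep_sentence(sentence, min_words=3):
    """Reject very short sentences and sentences made mostly of digits."""
    if len(sentence.split()) < min_words:
        return False
    digits = sum(ch.isdigit() for ch in sentence)
    return digits <= len(sentence) / 2


def prepare_sentences(text):
    """Split, clean, then filter (hypothetical pipeline for illustration)."""
    cleaned = (clean_sentence(s) for s in split_sentences(text))
    return [s for s in cleaned if keep_sentence(s)]


if __name__ == "__main__":
    sample = "Hello   world, this is a test.  42 17 99 1234. Ok."
    for sentence in prepare_sentences(sample):
        print(sentence)
```

Real ASR text preparation usually needs language-specific rules (number
verbalization, abbreviation handling, and so on), which this sketch leaves out.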

Sentences can be extracted from documents in single-document mode or in batch mode.

For an example of how to extract sentences in batch mode, see the
`run_data_preparation_task.sh` script located in the
`examples/bash` directory.

For an example of how to extract sentences in single-document mode,
see the `run_data_preparation.sh` script located in the
`examples/bash` directory.
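
The difference between the two modes can be pictured with a small Python
sketch. This is only an illustration under stated assumptions: it reads plain
text files rather than PDFs, and `extract_sentences` and `extract_batch` are
hypothetical names, not functions provided by this project.

```python
from pathlib import Path


def extract_sentences(path):
    """Single-document mode (hypothetical): one non-empty line = one sentence."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def extract_batch(input_dir, pattern="*.txt"):
    """Batch mode (hypothetical): run the single-document step on every match."""
    files = sorted(Path(input_dir).glob(pattern))
    return {f.name: extract_sentences(f) for f in files}
```

Batch mode here is simply the single-document step applied to each file in a
directory, with results keyed by file name.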

There is also an API for use in Python code. It is located in the
`common` package, in the file `DataPreparationAPI.py`.