https://github.com/idiap/asrt
Various scripts that facilitate the preparation of Automatic Speech Recognition related resources
- Host: GitHub
- URL: https://github.com/idiap/asrt
- Owner: idiap
- License: other
- Created: 2015-05-12T07:32:01.000Z (about 11 years ago)
- Default Branch: master
- Last Pushed: 2020-04-16T10:39:23.000Z (about 6 years ago)
- Last Synced: 2025-04-07T21:41:31.558Z (about 1 year ago)
- Language: Python
- Size: 4.1 MB
- Stars: 17
- Watchers: 10
- Forks: 7
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: COPYING
README
======
Authors
-------
Alexandre Nanchen, Christine Marcel
Description
-----------
This is the README for the Automatic Speech Recognition Tools.
This project contains various scripts that facilitate the preparation of
ASR-related tasks.
Current tasks are:
1. Sentence extraction from PDF files
2. Sentence classification by language
3. Sentence filtering and cleaning
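
To give a feel for what the last two tasks involve, here is a minimal Python 3 sketch. It is not the code used by this project: the function names, the tiny stopword lists and the `min_words` threshold are all hypothetical, and the input is assumed to be plain text that has already been extracted from a PDF.

```python
import re

# Hypothetical stopword lists, far too small for real use.
STOPWORDS = {
    "english": {"the", "and", "is", "of", "to"},
    "french": {"le", "la", "et", "est", "de"},
    "german": {"der", "die", "und", "ist", "das"},
}


def split_sentences(text):
    """Split raw text after '.', '!' or '?' followed by whitespace."""
    text = re.sub(r"\s+", " ", text).strip()
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def clean_sentence(sentence):
    """Lowercase, then replace punctuation, digits and underscores by spaces."""
    kept = re.sub(r"[^\w\s']|\d|_", " ", sentence.lower())
    return re.sub(r"\s+", " ", kept).strip()


def guess_language(sentence):
    """Return the language whose stopwords occur most often, or 'unknown'."""
    words = sentence.split()
    scores = {lang: sum(w in sw for w in words) for lang, sw in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def prepare(text, min_words=3):
    """Clean every sentence, drop the short ones, and tag the rest by language."""
    results = []
    for raw in split_sentences(text):
        cleaned = clean_sentence(raw)
        if len(cleaned.split()) < min_words:
            continue  # filtering step: too short to be useful
        results.append((guess_language(cleaned), cleaned))
    return results


if __name__ == "__main__":
    for language, sentence in prepare("The cat is on the mat. Le chat est sur la table. Ok."):
        print(language, sentence)
```

A real language classifier would rely on a trained model rather than stopword counts; the sketch only shows how the steps chain together.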
Document sentences can be extracted in single document mode or in batch mode.
For an example of how to extract sentences in batch mode, please have a
look at the `run_data_preparation_task.sh` script located in the
`examples/bash` directory.
For an example of how to extract sentences in single document mode,
please have a look at the `run_data_preparation.sh` script located in the
`examples/bash` directory.
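
The difference between the two modes can be pictured with the following Python 3 sketch. It does not reproduce what the scripts above do: `process_document`, `process_batch`, the `*.txt` pattern and the `.sentences.txt` suffix are hypothetical names chosen for illustration, and the input files are assumed to be plain text with one sentence per line.

```python
from pathlib import Path


def process_document(input_path, output_dir):
    """Single document mode: read one text file, write its non-empty lines."""
    input_path = Path(input_path)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    lines = input_path.read_text(encoding="utf-8").splitlines()
    sentences = [line.strip() for line in lines if line.strip()]

    output_path = output_dir / (input_path.stem + ".sentences.txt")
    output_path.write_text("\n".join(sentences) + "\n", encoding="utf-8")
    return output_path


def process_batch(input_dir, output_dir, pattern="*.txt"):
    """Batch mode: apply process_document to every matching file, in name order."""
    inputs = sorted(Path(input_dir).glob(pattern))
    return [process_document(path, output_dir) for path in inputs]
```

Batch mode is then just a loop over single document mode, which keeps the per-document logic in one place.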
There is also an API for use in Python code. It is located in the
common package and is called `DataPreparationAPI.py`.