https://github.com/ysdede/asrtk
An open-source Python toolkit designed to streamline the development and enhancement of ASR systems.
https://github.com/ysdede/asrtk
asr ml tts
Last synced: 4 months ago
JSON representation
An open-source Python toolkit designed to streamline the development and enhancement of ASR systems.
- Host: GitHub
- URL: https://github.com/ysdede/asrtk
- Owner: ysdede
- License: mit
- Created: 2024-05-13T16:37:54.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2025-11-27T21:56:55.000Z (7 months ago)
- Last Synced: 2025-11-30T10:24:25.242Z (7 months ago)
- Topics: asr, ml, tts
- Language: Python
- Homepage:
- Size: 270 KB
- Stars: 6
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# ASRTK: Automatic Speech Recognition Toolkit
ASRTK is a Python toolkit focused on collecting and processing high-quality audio datasets for training Automatic Speech Recognition (ASR) systems. It provides a streamlined workflow and utilities for downloading, cleaning, and preparing audio data with accurate transcriptions.
## Key Features
### Data Collection & Preparation
- **YouTube Integration**: Automated download of videos with subtitles from playlists
- **Audio Processing**: Advanced audio splitting, conversion, and resampling capabilities
- **Subtitle Processing**: Comprehensive VTT file handling with support for cleaning and normalization
### Text Processing & Analysis
- **Text Normalization**: Intelligent handling of abbreviations, numbers, and special characters
- **Pattern Analysis**: Tools for analyzing word frequencies and text patterns
### Audio Processing
- **Forced Alignment**: Precise audio-text alignment using Silero VAD models
- **Format Optimization**: Efficient audio format conversion with multi-threading support
- **Quality Control**: Advanced handling of silence detection and failed alignments
### Workflow Automation
- **Batch Processing**: Efficient handling of large datasets
- **Pipeline Integration**: Seamless workflow from data collection to model training
- **Progress Tracking**: Detailed logging and progress monitoring
## Requirements
- Python 3.9 or higher
- PyTorch 1.7 or higher
- NumPy
- SciPy
- librosa
- pydub
- transformers
- BeautifulSoup4
- Additional dependencies in `requirements.txt`
## Installation
```bash
# Clone the repository
git clone https://github.com/ysdede/asrtk.git
# Navigate to project directory
cd asrtk
# Install dependencies
pip install -r requirements.txt
```
## Quick Start Guide
### 1. Download Training Data
```bash
# Download from a single playlist
asrtk download-playlist ./dataset "https://www.youtube.com/playlist?list=PLAYLIST_ID"
# Or download from multiple playlists
asrtk download-playlist ./dataset --playlist-file playlists.txt
```
### 2. Process Audio Files
```bash
# Split audio into chunks based on subtitles
asrtk split ./dataset ./chunks --tolerance 500 -fm --restore-punctuation
# Convert to optimized format
asrtk convert-opus ./chunks --remove-original --workers 8
```
### 3. Clean and Normalize Text
```bash
# Apply text normalization rules
asrtk apply-patch rules.json ./chunks
# Analyze text patterns
asrtk create-wordset ./chunks --min-frequency 5
```
## Advanced Usage
### Audio Processing Options
```bash
asrtk split INPUT_DIR OUTPUT_DIR [OPTIONS]
--tolerance MILLISECONDS Alignment tolerance
--force-merge, -fm Merge consecutive subtitles
--restore-punctuation Use BERT for punctuation restoration
--keep-effects Preserve effect annotations
```
### Text Processing Features
```bash
asrtk apply-patch RULES INPUT_DIR [OPTIONS]
--dry-run Preview changes
--show-replacements Display detailed replacement plan
--verbose Show per-file changes
```
### Batch Processing
```bash
asrtk convert-opus INPUT_DIR [OPTIONS]
--input-type FORMAT Source audio format
--remove-original Delete source files after conversion
--workers NUMBER Parallel processing threads
```
## Project Structure
```
asrtk/
├── dataset/ # Downloaded content
│ ├── json_info/ # Metadata cache
│ └── content/ # Audio and subtitle files
├── chunks/ # Processed audio segments
└── output/ # Final processed dataset
```
## Contributing
We welcome contributions! Please feel free to submit pull requests or open issues for:
- Bug fixes
- Feature additions
- Documentation improvements
- Performance optimizations
## License
This project is licensed under the MIT License - see the LICENSE file for details.