https://github.com/mbotsu/mlx_speech2text

Audio transcription using mlx whisper and vad silence processing

Host: GitHub
URL: https://github.com/mbotsu/mlx_speech2text
Owner: mbotsu
License: mit
Created: 2024-07-08T12:30:28.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-10-14T15:26:27.000Z (12 months ago)
Last Synced: 2025-01-30T18:05:35.332Z (9 months ago)
Topics: mlx, silero-vad, whisper
Language: Python
Homepage:
Size: 18.6 KB
Stars: 12
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

## Abstract
Transcription for Apple Silicon.

The audio is first segmented into small chunks. For each chunk, the silent parts are removed to produce a shorter clip, and that clip is then transcribed to text.
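The chunk, trim, transcribe flow described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in `speech2text.py`: the function names, the 30-second chunk length, and the frame size are all hypothetical, and a simple energy threshold stands in for Silero VAD so the example runs without any model. The `transcribe` argument is a placeholder for whatever speech-to-text call is used (for instance an MLX Whisper wrapper).

```python
from typing import Callable, List, Sequence

SAMPLE_RATE = 16_000  # matches the 16 kHz WAV produced in the Run section


def split_into_chunks(samples: Sequence[float],
                      chunk_seconds: float = 30.0,
                      sample_rate: int = SAMPLE_RATE) -> List[List[float]]:
    """Divide the audio into consecutive fixed-length chunks (hypothetical helper)."""
    size = int(chunk_seconds * sample_rate)
    return [list(samples[i:i + size]) for i in range(0, len(samples), size)]


def remove_silence(chunk: Sequence[float],
                   frame_size: int = 512,
                   threshold: float = 0.01) -> List[float]:
    """Keep only frames whose mean absolute amplitude exceeds the threshold.

    Assumption: this energy check is a stand-in for a real VAD such as
    Silero VAD, which uses a neural model rather than a fixed threshold.
    """
    voiced: List[float] = []
    for start in range(0, len(chunk), frame_size):
        frame = chunk[start:start + frame_size]
        if frame and sum(abs(s) for s in frame) / len(frame) > threshold:
            voiced.extend(frame)
    return voiced


def transcribe_audio(samples: Sequence[float],
                     transcribe: Callable[[List[float]], str],
                     chunk_seconds: float = 30.0,
                     sample_rate: int = SAMPLE_RATE) -> List[str]:
    """Chunk the audio, drop silence from each chunk, and transcribe what remains."""
    texts: List[str] = []
    for chunk in split_into_chunks(samples, chunk_seconds, sample_rate):
        voiced = remove_silence(chunk)
        if voiced:  # skip chunks that contain no speech at all
            texts.append(transcribe(voiced))
    return texts
```

Passing the transcriber in as a callable keeps the sketch independent of any particular model; a fully silent chunk is skipped instead of being sent to the model.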

## Install
```
$ git clone https://github.com/mbotsu/mlx_speech2text.git
$ cd mlx_speech2text
$ pip install -r requirements.txt
```

## Run
```
# convert the input to a 16 kHz WAV file
$ ffmpeg -i input.mp4 -ar 16000 out.wav

# run the transcription
$ python speech2text.py -i out.wav -o track -v
```
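Before running the transcription, it can help to confirm that the converted file really is sampled at 16 kHz. The sketch below uses only Python's standard `wave` module; the helper names are hypothetical and are not part of this repository.

```python
import wave

EXPECTED_RATE = 16_000  # sample rate requested with `-ar 16000` above


def wav_sample_rate(path: str) -> int:
    """Return the sample rate (Hz) stored in a WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def is_expected_rate(path: str, expected: int = EXPECTED_RATE) -> bool:
    """Check whether the WAV file uses the expected sample rate."""
    return wav_sample_rate(path) == expected
```

For example, `is_expected_rate("out.wav")` should return `True` for a file produced by the `ffmpeg` command above.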

## References
- [ml-explore/mlx-examples/whisper](https://github.com/ml-explore/mlx-examples/tree/main/whisper)
- [snakers4/silero-vad](https://github.com/snakers4/silero-vad)
- [Softcatala/whisper-ctranslate2](https://github.com/Softcatala/whisper-ctranslate2)
- [Segmenting a long audio file #295](https://github.com/snakers4/silero-vad/discussions/295)
