Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mbotsu/mlx_speech2text
Audio transcription using mlx whisper and vad silence processing
- Host: GitHub
- URL: https://github.com/mbotsu/mlx_speech2text
- Owner: mbotsu
- License: mit
- Created: 2024-07-08T12:30:28.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2024-08-14T13:59:19.000Z (3 months ago)
- Last Synced: 2024-09-27T06:22:13.183Z (about 2 months ago)
- Topics: mlx, silero-vad, whisper
- Language: Python
- Homepage:
- Size: 13.7 KB
- Stars: 7
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## Abstract
Transcription for Apple Silicon. The audio source is segmented into small chunks, the silent parts of each chunk are removed to produce a new audio clip, and text is extracted from the result.
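The chunk, trim, transcribe flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in `speech2text.py`: a simple energy threshold stands in for Silero VAD, the `transcribe` callable is a hypothetical placeholder for the MLX Whisper call, and every function name, the frame size, and the threshold are assumptions made for the example.

```python
from typing import Callable, List, Sequence

SAMPLE_RATE = 16_000  # assumed to match the 16 kHz WAV used as input


def split_into_chunks(samples: Sequence[float], chunk_seconds: float,
                      sample_rate: int = SAMPLE_RATE) -> List[Sequence[float]]:
    """Cut the audio into fixed-length chunks (the last one may be shorter)."""
    size = max(1, int(chunk_seconds * sample_rate))
    return [samples[i:i + size] for i in range(0, len(samples), size)]


def drop_silence(chunk: Sequence[float], frame_size: int = 160,
                 threshold: float = 0.01) -> List[float]:
    """Keep only frames whose RMS level reaches `threshold`.

    Assumption: this energy gate is a stand-in for a real VAD model.
    """
    voiced: List[float] = []
    for start in range(0, len(chunk), frame_size):
        frame = chunk[start:start + frame_size]
        mean_square = sum(s * s for s in frame) / len(frame)
        if mean_square >= threshold ** 2:
            voiced.extend(frame)
    return voiced


def transcribe_chunks(samples: Sequence[float],
                      transcribe: Callable[[List[float]], str],
                      chunk_seconds: float = 30.0) -> List[str]:
    """Trim silence from each chunk and transcribe whatever remains."""
    texts: List[str] = []
    for chunk in split_into_chunks(samples, chunk_seconds):
        voiced = drop_silence(chunk)
        if voiced:  # skip chunks that contain no speech at all
            texts.append(transcribe(voiced))
    return texts


# Synthetic input: 1 s of silence, 0.5 s of signal, 0.5 s of silence.
samples = [0.0] * 16_000 + [0.5] * 8_000 + [0.0] * 8_000


def fake_transcribe(voiced: List[float]) -> str:
    # Hypothetical placeholder for the real speech-to-text model.
    return f"{len(voiced)} voiced samples"


print(transcribe_chunks(samples, fake_transcribe, chunk_seconds=1.0))
# → ['8000 voiced samples']
```

Passing the transcriber in as a callable keeps the chunking and silence-removal logic testable without loading a model.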
## Install
```
$ git clone https://github.com/mbotsu/mlx_speech2text.git
$ cd mlx_speech2text
$ pip install -r requirements.txt
```
## Run
```
# convert to a 16 kHz WAV file
$ ffmpeg -i input.mp4 -ar 16000 out.wav

# run
$ python speech2text.py -i out.wav -o track -v
```
## References
- [ml-explore/mlx-examples/whisper](https://github.com/ml-explore/mlx-examples/tree/main/whisper)
- [snakers4/silero-vad](https://github.com/snakers4/silero-vad)
- [Softcatala/whisper-ctranslate2](https://github.com/Softcatala/whisper-ctranslate2)
- [Segmenting a long audio file #295](https://github.com/snakers4/silero-vad/discussions/295)