https://github.com/densinh/whisperx-diarization
Diarization with whisperx with caching and a simple interface
- Host: GitHub
- URL: https://github.com/densinh/whisperx-diarization
- Owner: DenSinH
- Created: 2024-06-21T09:34:20.000Z (7 months ago)
- Default Branch: master
- Last Pushed: 2024-10-24T21:21:36.000Z (3 months ago)
- Last Synced: 2024-10-26T08:57:50.680Z (3 months ago)
- Language: Python
- Size: 23.4 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# WhisperX Diarization
This repository contains a simple program to diarize (long)
Dutch audio files. I wrote this to help my mom, so there
are some extra batch files for easier setup and running.

Basically, `diarize.py` is the main script
that does all of the diarization. It can be called
from the command line as well, just look at `run.bat`.

Then there is `interface.py`, which simply starts
`diarize.py` in a subprocess, with specific parameters.

`interface.bat` runs `interface.py` in a virtual environment
if one is found, or in the global interpreter if no virtual
environment is set up in the current directory.

## Setup
Simply install the requirements from `requirements.txt`.
If you have a CUDA-enabled GPU (with the appropriate drivers),
you can then run `install-torch-cuda.bat` to install a CUDA-compiled
version of `torch`, which significantly
speeds up the transcription process.

You will need to create a Hugging Face Hub account and an access token
with READ permission, and set it as the `HF_TOKEN` environment variable (either
directly, or in a `.env` file). This is to gain access to the
diarization model:
[https://huggingface.co/pyannote/segmentation-3.0](https://huggingface.co/pyannote/segmentation-3.0)
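
As a rough sketch of the lookup described above (environment variable first, then a `.env` file), something like the following would work. The helper name `load_hf_token` and the hand-rolled `.env` parsing are assumptions for illustration, not the code used in `diarize.py`:

```python
import os
from pathlib import Path


def load_hf_token(env_file=".env"):
    """Return HF_TOKEN from the environment, falling back to a .env file.

    Hypothetical helper: the name and the parsing rules are illustrative only.
    """
    token = os.environ.get("HF_TOKEN")
    if token:
        return token

    path = Path(env_file)
    if path.is_file():
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            # Skip blank lines, comments, and anything that is not KEY=VALUE.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            if key.strip() == "HF_TOKEN":
                # Drop surrounding whitespace and optional quotes.
                return value.strip().strip("'\"")

    raise RuntimeError("HF_TOKEN is not set in the environment or in .env")
```

A `.env` file for this sketch would contain a single line such as `HF_TOKEN=<your token>`; keep that file out of version control.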
(Clicking that link will prompt you to sign up. Do this, then go to Profile picture > Settings > Access tokens
and create a new access token with READ access.) You may need to go back to the model link above
and validate some form, but I am not entirely sure.

## Patches
There are some patches applied to `whisperx` in `diarize.py`,
which are used to print the progress of the current step.
Otherwise, it is kind of a black box, and the user has
no idea how far along the script is.

## Output
The script produces two files:
- `.log` containing the log of the diarization process.
This includes a full transcript WITHOUT speaker recognition, regardless
of whether you selected speaker recognition or not.
- `.txt` containing the full transcript, either with or
without speaker recognition, depending on your selection.

## Debugging
Some problems I encountered setting up.
### 'cublas64_12.dll' not found
When this DLL (`cublas64_12.dll`) was not found, I fixed it by adding its folder to the `PATH`.
The folder should be something like
`C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\bin`

### Only !!!!! output
Apparently the `!` token has ID 0, so a transcript consisting only of `!` means something is going wrong in the transcription.
In this case, you should try lowering the `--batch-size` parameter in `interface.py`,
or in your command line call to `diarize.py`.
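
To make the symptom and the workaround concrete, here is a small sketch that detects an all-`!` transcript and picks a smaller batch size for the next attempt. Both function names are hypothetical, and halving is just one reasonable way to "lower" the value. None of this is part of `diarize.py`:

```python
def looks_degenerate(transcript: str) -> bool:
    """True if the transcript is non-empty and holds only '!' and whitespace."""
    stripped = "".join(transcript.split())
    return bool(stripped) and set(stripped) == {"!"}


def smaller_batch_size(batch_size: int) -> int:
    """Halve the batch size, never going below 1 (illustrative policy)."""
    return max(1, batch_size // 2)


batch_size = 16  # example value, not necessarily the script's default
if looks_degenerate("!!!!! !!!!!"):
    batch_size = smaller_batch_size(batch_size)
```

The new value would then be passed as `--batch-size` on the next run.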