https://github.com/ieasybooks/tafrigh
Transcribe content and generate SRT and VTT files using Whisper models and wit.ai technology.
- Host: GitHub
- URL: https://github.com/ieasybooks/tafrigh
- Owner: ieasybooks
- License: MIT
- Created: 2023-03-20T14:38:37.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2025-04-02T14:23:40.000Z (8 months ago)
- Last Synced: 2025-05-16T18:05:52.051Z (7 months ago)
- Topics: asr, automatic-speech-recognition, ctranslate2, facebook, faster-whisper, javascript, python, soundcloud, srt, stable-whisper, subtitles, twitter, vtt, whisper, youtube
- Language: Python
- Homepage: https://tafrigh.ieasybooks.com
- Size: 631 KB
- Stars: 131
- Watchers: 1
- Forks: 18
- Open Issues: 5
Metadata Files:
- Readme: README.en.md
- License: LICENSE
README
Tafrigh
A tool to transcribe visual or audio materials into text. You can view examples transcribed using Tafrigh here, or on the Baheth platform, where all transcribed content is produced using Tafrigh.
Note: If you want to use Tafrigh through JavaScript, take a look at this repository.
Features of Tafrigh
- Transcribing visual and audio materials into text using the latest AI technologies provided by OpenAI
- Ability to transcribe materials using wit.ai technologies provided by Facebook
- Download materials directly from YouTube, Facebook, Twitter, SoundCloud, and other sites
- Download visual content directly from YouTube, whether a single video or a complete playlist
- Provide various output formats such as `txt`, `srt`, `vtt`, `csv`, `tsv`, and `json`
Requirements
- A strong GPU in your computer is recommended if using Whisper models
- Python version 3.10 or higher installed on your computer
- FFmpeg installed on your computer
- yt-dlp installed on your computer
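If you want to check these requirements programmatically, the following sketch uses only the Python standard library; it is a convenience check for this README, not part of Tafrigh itself:
```python
import shutil
import sys

# Convenience prerequisite check; not part of Tafrigh itself.
if sys.version_info < (3, 10):
    print('Tafrigh requires Python 3.10 or higher')

for tool in ('ffmpeg', 'yt-dlp'):
    if shutil.which(tool) is None:
        print(f'{tool} was not found on PATH; please install it first')
```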
Installing Tafrigh
Using pip
You can install Tafrigh using pip with the command: `pip install tafrigh[wit,whisper]`
You can specify the dependencies you want to install based on the technology you want to use by writing `wit` or `whisper` inside the square brackets, as shown in the previous command.
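To confirm the installation from Python, you can query the package metadata; this sketch relies only on the standard library and assumes the package was installed under the name `tafrigh`:
```python
from importlib.metadata import PackageNotFoundError, version

try:
    # Read the version recorded by pip for the installed package.
    print('tafrigh', version('tafrigh'))
except PackageNotFoundError:
    print('tafrigh is not installed in this environment')
```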
From the Source Code
- Download this repository by clicking on Code then Download ZIP, or by executing the following command: `git clone git@github.com:ieasybooks/tafrigh.git`
- Extract the file if downloaded as ZIP and navigate to the project folder
- Execute the following command to install Tafrigh:
poetry install
Add `-E wit` or `-E whisper` to specify the dependencies to install.
Using Tafrigh
Available Options
- Inputs (a Python sketch of these options follows this list)
  - Links or file paths: Pass the links or file paths of the materials to be transcribed directly after the Tafrigh tool name. For example: `tafrigh "https://yout..." "https://yout..." "C:\Users\ieasybooks\lecture.wav"`
  - Skip transcription if output exists: Use the `--skip_if_output_exist` option to skip transcription if the required outputs already exist in the specified output folder
  - Number of download retries: If downloading a full playlist using the `yt-dlp` library, some items may fail to download. The `--download_retries` option can be used to specify the number of retry attempts if a download fails. The default value is `3`
  - Additional options for `yt-dlp`: You can pass additional options to the `yt-dlp` library using the `--yt_dlp_options` option in valid JSON format. For example, to download only the first 10 items from a playlist, pass `--yt_dlp_options '{"playlist_items": "1-10"}'`
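For reference, these input options map onto the `Config.Input` fields used in the full Python example in the "Transcribing using code" section below; a minimal sketch, assuming those field names:
```python
from tafrigh import Config

# Input options mirror the CLI flags above; field names follow the full
# Python usage example later in this README.
input_config = Config.Input(
    urls_or_paths=['https://youtube.com/playlist?list=PLACEHOLDER'],
    skip_if_output_exist=True,                    # --skip_if_output_exist
    download_retries=5,                           # --download_retries 5
    yt_dlp_options='{"playlist_items": "1-10"}',  # --yt_dlp_options
    verbose=False,
)
```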
- Whisper Options (a Python sketch of these options follows this list)
  - Model: You can specify the model using the `--model_name_or_path` option. Available models:
    - `tiny.en` (English only)
    - `tiny` (least accurate)
    - `base.en` (English only)
    - `base`
    - `small.en` (English only)
    - `small` (default)
    - `medium.en` (English only)
    - `medium`
    - `large-v1`
    - `large-v2`
    - `large-v3`
    - `large` (most accurate)
    - Whisper model name on the HuggingFace Hub
    - Path to a pre-downloaded Whisper model
    - Path to a Whisper model converted using the `ct2-transformers-converter` tool for use with the fast `faster-whisper` library
  - Task: You can specify the task using the `--task` option. Available tasks:
    - `transcribe`: Convert speech to text (default)
    - `translate`: Translate speech to text in English
  - Language: You can specify the audio language using the `--language` option. For example, to specify Arabic, pass `ar`. If not specified, the language will be detected automatically
  - Use faster version of Whisper models: By passing the `--use_faster_whisper` option, the faster version of Whisper models will be used
  - Beam size: You can improve results using the `--beam_size` option, which allows the model to search a wider range of words during text generation. The default value is `5`
  - Model compression type: You can specify the compression method used during model conversion with the `ct2-transformers-converter` tool by passing the `--ct2_compute_type` option. Available methods: `default` (default), `int8`, `int8_float16`, `int16`, `float16`
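Similarly, these options correspond to the `Config.Whisper` fields from the same Python example; a minimal sketch, assuming those field names:
```python
from tafrigh import Config

# Whisper engine options; the values come from the lists above.
whisper_config = Config.Whisper(
    model_name_or_path='small',  # the default model
    task='transcribe',           # or 'translate' for X->English
    language='ar',               # Arabic; auto-detected if not specified
    use_faster_whisper=True,     # --use_faster_whisper
    beam_size=5,                 # --beam_size (default 5)
    ct2_compute_type='default',  # --ct2_compute_type
)
```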
- Wit Options (a Python sketch of these options follows this list)
  - Wit.ai keys: You can use wit.ai technologies to transcribe materials into text by passing your wit.ai client access tokens to the `--wit_client_access_tokens` option. If this option is passed, wit.ai will be used for transcription. Otherwise, Whisper models will be used
  - Maximum cutting duration: You can specify the maximum cutting duration, which will affect the length of sentences in SRT and VTT files, by passing the `--max_cutting_duration` option. The default value is `15`
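In the Python API these map to `Config.Wit`; a minimal sketch, assuming the field names from the full example below (the token is a placeholder):
```python
from tafrigh import Config

# Passing a non-empty token list is what selects wit.ai over Whisper.
wit_config = Config.Wit(
    wit_client_access_tokens=['XXXXXXXXXXXXXXXX'],  # placeholder token(s)
    max_cutting_duration=10,  # seconds, between 1 and 17 (default 15)
)
```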
- Outputs (a Python sketch of these options follows this list)
  - Merge segments: You can use the `--min_words_per_segment` option to control the minimum number of words that can appear in a single transcription segment. The default value is `1`. Pass `0` to disable this feature
  - Save original files before merging: Use the `--save_files_before_compact` option to save the original files before merging segments based on the `--min_words_per_segment` option
  - Save yt-dlp library responses: You can save the yt-dlp library responses in JSON format by passing the `--save_yt_dlp_responses` option
  - Output sample segments: You can pass a value to the `--output_sample` option to get a random sample of all transcribed segments from each material, after merging based on the `--min_words_per_segment` option. The default value is `0`, meaning no samples will be output
  - Output formats: You can specify the output formats using the `--output_formats` option. Available formats: `txt`, `srt`, `vtt`, `csv`, `tsv`, `json`, `all` (default), and `none` (no file will be created if this format is passed)
  - Output folder: You can specify the output folder using the `--output_dir` option. By default, the current folder will be the output folder if not specified
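Finally, the output options map to `Config.Output`; a minimal sketch, assuming the field names from the full example below:
```python
from tafrigh import Config

# Output options mirror the CLI flags above.
output_config = Config.Output(
    min_words_per_segment=10,         # merge segments shorter than 10 words
    save_files_before_compact=False,  # keep only the merged outputs
    save_yt_dlp_responses=False,      # do not store yt-dlp JSON responses
    output_sample=0,                  # 0 disables the random-sample CSV
    output_formats=['txt', 'srt'],    # subset of the formats listed above
    output_dir='.',                   # current folder (the default)
)
```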
```
➜ tafrigh --help
usage: tafrigh [-h] [--version] [--skip_if_output_exist | --no-skip_if_output_exist] [--download_retries DOWNLOAD_RETRIES] [--yt_dlp_options YT_DLP_OPTIONS] [--verbose | --no-verbose] [-m MODEL_NAME_OR_PATH] [-t {transcribe,translate}]
[-l {af,am,ar,as,az,ba,be,bg,bn,bo,br,bs,ca,cs,cy,da,de,el,en,es,et,eu,fa,fi,fo,fr,gl,gu,ha,haw,he,hi,hr,ht,hu,hy,id,is,it,ja,jw,ka,kk,km,kn,ko,la,lb,ln,lo,lt,lv,mg,mi,mk,ml,mn,mr,ms,mt,my,ne,nl,nn,no,oc,pa,pl,ps,pt,ro,ru,sa,sd,si,sk,sl,sn,so,sq,sr,su,sv,sw,ta,te,tg,th,tk,tl,tr,tt,uk,ur,uz,vi,yi,yo,zh}]
[--use_faster_whisper | --no-use_faster_whisper] [--beam_size BEAM_SIZE] [--ct2_compute_type {default,int8,int8_float16,int16,float16}] [-w WIT_CLIENT_ACCESS_TOKENS [WIT_CLIENT_ACCESS_TOKENS ...]] [--max_cutting_duration [1-17]]
[--min_words_per_segment MIN_WORDS_PER_SEGMENT] [--save_files_before_compact | --no-save_files_before_compact] [--save_yt_dlp_responses | --no-save_yt_dlp_responses] [--output_sample OUTPUT_SAMPLE]
[-f {all,txt,srt,vtt,csv,tsv,json,none} [{all,txt,srt,vtt,csv,tsv,json,none} ...]] [-o OUTPUT_DIR]
urls_or_paths [urls_or_paths ...]
options:
-h, --help show this help message and exit
--version show program's version number and exit
Input:
urls_or_paths Video/Playlist URLs or local folder/file(s) to transcribe.
--skip_if_output_exist, --no-skip_if_output_exist
Whether to skip generating the output if the output file already exists.
--download_retries DOWNLOAD_RETRIES
Number of retries for yt-dlp downloads that fail.
--yt_dlp_options YT_DLP_OPTIONS
Additional options to pass to yt-dlp in valid JSON format (e.g. `'{"playlist_items": "1-10"}'`).
--verbose, --no-verbose
Whether to print out the progress and debug messages.
Whisper:
-m MODEL_NAME_OR_PATH, --model_name_or_path MODEL_NAME_OR_PATH
Name or path of the Whisper model to use.
-t {transcribe,translate}, --task {transcribe,translate}
Whether to perform X->X speech recognition ('transcribe') or X->English translation ('translate').
-l {af,am,ar,as,az,ba,be,bg,bn,bo,br,bs,ca,cs,cy,da,de,el,en,es,et,eu,fa,fi,fo,fr,gl,gu,ha,haw,he,hi,hr,ht,hu,hy,id,is,it,ja,jw,ka,kk,km,kn,ko,la,lb,ln,lo,lt,lv,mg,mi,mk,ml,mn,mr,ms,mt,my,ne,nl,nn,no,oc,pa,pl,ps,pt,ro,ru,sa,sd,si,sk,sl,sn,so,sq,sr,su,sv,sw,ta,te,tg,th,tk,tl,tr,tt,uk,ur,uz,vi,yi,yo,zh}, --language {af,am,ar,as,az,ba,be,bg,bn,bo,br,bs,ca,cs,cy,da,de,el,en,es,et,eu,fa,fi,fo,fr,gl,gu,ha,haw,he,hi,hr,ht,hu,hy,id,is,it,ja,jw,ka,kk,km,kn,ko,la,lb,ln,lo,lt,lv,mg,mi,mk,ml,mn,mr,ms,mt,my,ne,nl,nn,no,oc,pa,pl,ps,pt,ro,ru,sa,sd,si,sk,sl,sn,so,sq,sr,su,sv,sw,ta,te,tg,th,tk,tl,tr,tt,uk,ur,uz,vi,yi,yo,zh}
Language spoken in the audio, skip to perform language detection.
--use_faster_whisper, --no-use_faster_whisper
Whether to use Faster Whisper implementation.
--beam_size BEAM_SIZE
Number of beams in beam search, only applicable when temperature is zero.
--ct2_compute_type {default,int8,int8_float16,int16,float16}
Quantization type applied while converting the model to CTranslate2 format.
Wit:
-w WIT_CLIENT_ACCESS_TOKENS [WIT_CLIENT_ACCESS_TOKENS ...], --wit_client_access_tokens WIT_CLIENT_ACCESS_TOKENS [WIT_CLIENT_ACCESS_TOKENS ...]
List of wit.ai client access tokens. If provided, wit.ai APIs will be used to do the transcription, otherwise whisper will be used.
--max_cutting_duration [1-17]
The maximum allowed cutting duration. It should be between 1 and 17.
Output:
--min_words_per_segment MIN_WORDS_PER_SEGMENT
The minimum number of words should appear in each transcript segment. Any segment have words count less than this threshold will be merged with the next one. Pass 0 to disable this behavior.
--save_files_before_compact, --no-save_files_before_compact
Saves the output files before applying the compact logic that is based on --min_words_per_segment.
--save_yt_dlp_responses, --no-save_yt_dlp_responses
Whether to save the yt-dlp library JSON responses or not.
--output_sample OUTPUT_SAMPLE
Samples random compacted segments from the output and generates a CSV file contains the sampled data. Pass 0 to disable this behavior.
-f {all,txt,srt,vtt,csv,tsv,json,none} [{all,txt,srt,vtt,csv,tsv,json,none} ...], --output_formats {all,txt,srt,vtt,csv,tsv,json,none} [{all,txt,srt,vtt,csv,tsv,json,none} ...]
Format of the output file; if not specified, all available formats will be produced.
-o OUTPUT_DIR, --output_dir OUTPUT_DIR
Directory to save the outputs.
```
Transcription from command line
Transcribing using Whisper models
Transcribing a single material
```bash
tafrigh "https://youtu.be/dDzxYcEJbgo" \
--model_name_or_path small \
--task transcribe \
--language ar \
--output_dir . \
--output_formats txt srt
```
Transcribing a full playlist
```bash
tafrigh "https://youtube.com/playlist?list=PLyS-PHSxRDxsLnVsPrIwnsHMO5KgLz7T5" \
--model_name_or_path small \
--task transcribe \
--language ar \
--output_dir . \
--output_formats txt srt
```
Transcribing multiple materials
```bash
tafrigh "https://youtu.be/4h5P7jXvW98" "https://youtu.be/jpfndVSROpw" \
--model_name_or_path small \
--task transcribe \
--language ar \
--output_dir . \
--output_formats txt srt
```
Speeding up the transcription process
You can use the `faster-whisper` library, which provides faster transcription, by passing the `--use_faster_whisper` option as follows:
```bash
tafrigh "https://youtu.be/3K5Jh_-UYeA" \
--model_name_or_path large \
--task transcribe \
--language ar \
--use_faster_whisper \
--output_dir . \
--output_formats txt srt
```
Transcribing using wit.ai technology
Transcribing a single material
```bash
tafrigh "https://youtu.be/dDzxYcEJbgo" \
--wit_client_access_tokens XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX \
--output_dir . \
--output_formats txt srt \
--min_words_per_segment 10 \
--max_cutting_duration 10
```
Transcribing a full playlist
```bash
tafrigh "https://youtube.com/playlist?list=PLyS-PHSxRDxsLnVsPrIwnsHMO5KgLz7T5" \
--wit_client_access_tokens XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX \
--output_dir . \
--output_formats txt srt \
--min_words_per_segment 10 \
--max_cutting_duration 10
```
Transcribing multiple materials
```bash
tafrigh "https://youtu.be/4h5P7jXvW98" "https://youtu.be/jpfndVSROpw" \
--wit_client_access_tokens XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX \
--output_dir . \
--output_formats txt srt \
--min_words_per_segment 10 \
--max_cutting_duration 10
```
Transcribing using code
You can use Tafrigh through code as follows:
```python
from tafrigh import farrigh, Config


if __name__ == '__main__':
    config = Config(
        input=Config.Input(
            urls_or_paths=['https://youtu.be/qFsUwp5iomU'],
            skip_if_output_exist=False,
            download_retries=3,
            yt_dlp_options='{}',
            verbose=False,
        ),
        whisper=Config.Whisper(
            model_name_or_path='tiny',
            task='transcribe',
            language='ar',
            use_faster_whisper=True,
            beam_size=5,
            ct2_compute_type='default',
        ),
        wit=Config.Wit(
            wit_client_access_tokens=[],
            max_cutting_duration=10,
        ),
        output=Config.Output(
            min_words_per_segment=10,
            save_files_before_compact=False,
            save_yt_dlp_responses=False,
            output_sample=0,
            output_formats=['txt', 'srt'],
            output_dir='.',
        ),
    )

    for progress in farrigh(config):
        print(progress)
```
The `farrigh` function is a generator that yields the current transcription state and the progress of the process. If you do not need to track this, you can skip the loop by draining the generator with `deque` as follows:
```python
from collections import deque

from tafrigh import farrigh, Config


if __name__ == '__main__':
    config = Config(...)

    deque(farrigh(config), maxlen=0)
```
Transcribing using Docker
If you have Docker on your computer, the easiest way to use Tafrigh is through Docker. The following command downloads the Tafrigh Docker image and transcribes a YouTube material using wit.ai technologies, outputting the results in the current folder:
```bash
docker run -it --rm -v "$PWD:/tafrigh" ghcr.io/ieasybooks/tafrigh \
"https://www.youtube.com/watch?v=qFsUwp5iomU" \
--wit_client_access_tokens XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX \
-f txt srt
```
You can pass any option from the Tafrigh library options mentioned above.
There are multiple Docker images you can use for Tafrigh based on the dependencies you want to use:
- `ghcr.io/ieasybooks/tafrigh`: Contains dependencies for both wit.ai technologies and Whisper models
- `ghcr.io/ieasybooks/tafrigh-whisper`: Contains dependencies for Whisper models only
- `ghcr.io/ieasybooks/tafrigh-wit`: Contains dependencies for wit.ai technologies only
One drawback is that Whisper models cannot use your computer's GPU when run through Docker, which is something we are working on resolving.
A significant part of this project is based on the yt-whisper repository, which helped us bring Tafrigh to life faster.