https://github.com/ieasybooks/tafrigh
Transcribe text and generate SRT and VTT files using Whisper models and wit.ai technology.
- Host: GitHub
- URL: https://github.com/ieasybooks/tafrigh
- Owner: ieasybooks
- License: mit
- Created: 2023-03-20T14:38:37.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2025-01-06T19:37:47.000Z (15 days ago)
- Last Synced: 2025-01-08T02:06:16.422Z (14 days ago)
- Topics: asr, automatic-speech-recognition, ctranslate2, facebook, faster-whisper, javascript, python, soundcloud, srt, stable-whisper, subtitles, twitter, vtt, whisper, youtube
- Language: Python
- Homepage: https://tafrigh.ieasybooks.com
- Size: 458 KB
- Stars: 116
- Watchers: 1
- Forks: 17
- Open Issues: 4
Metadata Files:
- Readme: README.en.md
- License: LICENSE
README
[![ar](https://img.shields.io/badge/lang-ar-brightgreen.svg)](README.md)
[![en](https://img.shields.io/badge/lang-en-red.svg)](README.en.md)

Tafrigh
A tool to transcribe visual or audio materials into text. You can view examples transcribed using Tafrigh here or on the Baheth platform, where all of the transcribed content was produced with Tafrigh.
Note: If you want to use Tafrigh through JavaScript, take a look at this repository.
Features of Tafrigh
- Transcribing visual and audio materials into text using the latest AI technologies provided by OpenAI
- Ability to transcribe materials using wit.ai technologies provided by Facebook
- Download materials directly from YouTube, Facebook, Twitter, SoundCloud, and other sites
- Download visual content directly from YouTube, whether a single video or a complete playlist
- Provide various output formats like `txt`, `srt`, `vtt`, `csv`, `tsv`, and `json`
Requirements
- A strong GPU in your computer is recommended if using Whisper models
- Python version 3.10 or higher installed on your computer
- FFmpeg installed on your computer
- yt-dlp installed on your computer
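The requirements above can be checked before installing. The following is a minimal sketch, not part of Tafrigh: the `check_requirements` helper is hypothetical, and it assumes FFmpeg and yt-dlp are exposed as `ffmpeg` and `yt-dlp` executables on your `PATH`.

```python
import shutil
import sys


def check_requirements(min_python=(3, 10), tools=("ffmpeg", "yt-dlp")):
    """Return a mapping of requirement name -> True if it looks satisfied."""
    status = {"python": sys.version_info[:2] >= min_python}
    for tool in tools:
        # shutil.which returns None when the executable is not found on PATH.
        status[tool] = shutil.which(tool) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

A `missing` result for `yt-dlp` only means no executable with that name was found on `PATH`; it may still be installed as a Python package in another environment.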
Installing Tafrigh
Using pip
You can install Tafrigh using `pip` with the command: `pip install tafrigh[wit,whisper]`

You can specify the dependencies you want to install based on the technology you want to use by writing `wit` or `whisper` in square brackets as shown in the previous command.
From the Source Code
- Download this repository by clicking on Code then Download ZIP or by executing the following command: `git clone git@github.com:ieasybooks/tafrigh.git`
- Extract the file if downloaded as ZIP and navigate to the project folder
- Execute the following command to install Tafrigh: `poetry install`

Add `-E wit` or `-E whisper` to specify the dependencies to install.
Using Tafrigh
Available Options
- Inputs
  - Links or file paths: Pass the links or file paths of the materials to be transcribed directly after the Tafrigh tool name. For example: `tafrigh "https://yout..." "https://yout..." "C:\Users\ieasybooks\lecture.wav"`
  - Skip transcription if output exists: Use the `--skip_if_output_exist` option to skip transcription if the required outputs already exist in the specified output folder
  - Number of download retries: When downloading a full playlist using the `yt-dlp` library, some items may fail to download. Use the `--download_retries` option to specify the number of retry attempts if a download fails. The default value is `3`
  - Additional options for `yt-dlp`: You can pass additional options to the `yt-dlp` library using the `--yt_dlp_options` option in valid JSON format. For example, to download only the first 10 items from a playlist, pass `--yt_dlp_options '{"playlist_items": "1-10"}'`
- Whisper Options
  - Model: You can specify the model using the `--model_name_or_path` option. Available models:
    - `tiny.en` (English only)
    - `tiny` (least accurate)
    - `base.en` (English only)
    - `base`
    - `small.en` (English only)
    - `small` (default)
    - `medium.en` (English only)
    - `medium`
    - `large-v1`
    - `large-v2`
    - `large-v3`
    - `large` (most accurate)
    - Whisper model name on HuggingFace Hub
    - Path to a pre-downloaded Whisper model
    - Path to a Whisper model converted using the `ct2-transformers-converter` tool for use with the fast library `faster-whisper`
  - Task: You can specify the task using the `--task` option. Available tasks:
    - `transcribe`: Convert speech to text (default)
    - `translate`: Translate speech to text in English
  - Language: You can specify the audio language using the `--language` option. For example, to specify Arabic, pass `ar`. If not specified, the language will be detected automatically
  - Use faster version of Whisper models: By passing the `--use_faster_whisper` option, the faster version of Whisper models will be used
  - Beam size: You can improve results using the `--beam_size` option, which allows the model to search a wider range of words during text generation. The default value is `5`
  - Model compression type: You can specify the compression method used during the model conversion using the `ct2-transformers-converter` tool by passing the `--ct2_compute_type` option. Available methods:
    - `default` (default)
    - `int8`
    - `int8_float16`
    - `int16`
    - `float16`
- Wit Options
  - Wit.ai keys: You can use wit.ai technologies to transcribe materials into text by passing your wit.ai client access tokens to the `--wit_client_access_tokens` option. If this option is passed, wit.ai will be used for transcription. Otherwise, Whisper models will be used
  - Maximum cutting duration: You can specify the maximum cutting duration, which will affect the length of sentences in SRT and VTT files, by passing the `--max_cutting_duration` option. The default value is `15`
- Outputs
  - Merge segments: You can use the `--min_words_per_segment` option to control the minimum number of words that can be in a single transcription segment. The default value is `1`. Pass `0` to disable this feature
  - Save original files before merging: Use the `--save_files_before_compact` option to save the original files before merging segments based on the `--min_words_per_segment` option
  - Save yt-dlp library responses: You can save the yt-dlp library responses in JSON format by passing the `--save_yt_dlp_responses` option
  - Output sample segments: You can pass a value to the `--output_sample` option to get a random sample of all transcribed segments from each material after merging based on the `--min_words_per_segment` option. The default value is `0`, meaning no samples will be output
  - Output formats: You can specify the output formats using the `--output_formats` option. Available formats:
    - `txt`
    - `srt`
    - `vtt`
    - `csv`
    - `tsv`
    - `json`
    - `all` (default)
    - `none` (No file will be created if this format is passed)
  - Output folder: You can specify the output folder using the `--output_dir` option. By default, the current folder will be the output folder if not specified
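To illustrate how the `srt` and `vtt` output formats differ, here is a minimal sketch that renders timed segments in both layouts. It is not Tafrigh's own writer: the helper names and the segment shape (dictionaries with `start` and `end` in seconds plus `text`) are assumptions made for illustration.

```python
def format_timestamp(seconds, separator):
    """Render seconds as HH:MM:SS<separator>mmm."""
    milliseconds = round(seconds * 1000)
    hours, milliseconds = divmod(milliseconds, 3_600_000)
    minutes, milliseconds = divmod(milliseconds, 60_000)
    secs, milliseconds = divmod(milliseconds, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{milliseconds:03d}"


def to_srt(segments):
    """SRT: numbered cues, comma before the milliseconds."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        start = format_timestamp(segment["start"], ",")
        end = format_timestamp(segment["end"], ",")
        blocks.append(f"{index}\n{start} --> {end}\n{segment['text']}\n")
    return "\n".join(blocks)


def to_vtt(segments):
    """WebVTT: a WEBVTT header, dot before the milliseconds."""
    blocks = ["WEBVTT\n"]
    for segment in segments:
        start = format_timestamp(segment["start"], ".")
        end = format_timestamp(segment["end"], ".")
        blocks.append(f"{start} --> {end}\n{segment['text']}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    sample = [{"start": 0.0, "end": 1.25, "text": "hello"}]
    print(to_srt(sample))
    print(to_vtt(sample))
```

The two formats carry the same cues; the visible differences in this sketch are the `WEBVTT` header, the cue numbering, and the millisecond separator.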
```
➜ tafrigh --help
usage: tafrigh [-h] [--version] [--skip_if_output_exist | --no-skip_if_output_exist] [--download_retries DOWNLOAD_RETRIES] [--yt_dlp_options YT_DLP_OPTIONS] [--verbose | --no-verbose] [-m MODEL_NAME_OR_PATH] [-t {transcribe,translate}]
[-l {af,am,ar,as,az,ba,be,bg,bn,bo,br,bs,ca,cs,cy,da,de,el,en,es,et,eu,fa,fi,fo,fr,gl,gu,ha,haw,he,hi,hr,ht,hu,hy,id,is,it,ja,jw,ka,kk,km,kn,ko,la,lb,ln,lo,lt,lv,mg,mi,mk,ml,mn,mr,ms,mt,my,ne,nl,nn,no,oc,pa,pl,ps,pt,ro,ru,sa,sd,si,sk,sl,sn,so,sq,sr,su,sv,sw,ta,te,tg,th,tk,tl,tr,tt,uk,ur,uz,vi,yi,yo,zh}]
[--use_faster_whisper | --no-use_faster_whisper] [--beam_size BEAM_SIZE] [--ct2_compute_type {default,int8,int8_float16,int16,float16}] [-w WIT_CLIENT_ACCESS_TOKENS [WIT_CLIENT_ACCESS_TOKENS ...]] [--max_cutting_duration [1-17]]
[--min_words_per_segment MIN_WORDS_PER_SEGMENT] [--save_files_before_compact | --no-save_files_before_compact] [--save_yt_dlp_responses | --no-save_yt_dlp_responses] [--output_sample OUTPUT_SAMPLE]
[-f {all,txt,srt,vtt,csv,tsv,json,none} [{all,txt,srt,vtt,csv,tsv,json,none} ...]] [-o OUTPUT_DIR]
urls_or_paths [urls_or_paths ...]
options:
-h, --help show this help message and exit
--version show program's version number and exit
Input:
urls_or_paths Video/Playlist URLs or local folder/file(s) to transcribe.
--skip_if_output_exist, --no-skip_if_output_exist
Whether to skip generating the output if the output file already exists.
--download_retries DOWNLOAD_RETRIES
Number of retries for yt-dlp downloads that fail.
--yt_dlp_options YT_DLP_OPTIONS
Additional options to pass to yt-dlp in valid JSON format (e.g. `'{"playlist_items": "1-10"}'`).
--verbose, --no-verbose
Whether to print out the progress and debug messages.
Whisper:
-m MODEL_NAME_OR_PATH, --model_name_or_path MODEL_NAME_OR_PATH
Name or path of the Whisper model to use.
-t {transcribe,translate}, --task {transcribe,translate}
Whether to perform X->X speech recognition ('transcribe') or X->English translation ('translate').
-l {af,am,ar,as,az,ba,be,bg,bn,bo,br,bs,ca,cs,cy,da,de,el,en,es,et,eu,fa,fi,fo,fr,gl,gu,ha,haw,he,hi,hr,ht,hu,hy,id,is,it,ja,jw,ka,kk,km,kn,ko,la,lb,ln,lo,lt,lv,mg,mi,mk,ml,mn,mr,ms,mt,my,ne,nl,nn,no,oc,pa,pl,ps,pt,ro,ru,sa,sd,si,sk,sl,sn,so,sq,sr,su,sv,sw,ta,te,tg,th,tk,tl,tr,tt,uk,ur,uz,vi,yi,yo,zh}, --language {af,am,ar,as,az,ba,be,bg,bn,bo,br,bs,ca,cs,cy,da,de,el,en,es,et,eu,fa,fi,fo,fr,gl,gu,ha,haw,he,hi,hr,ht,hu,hy,id,is,it,ja,jw,ka,kk,km,kn,ko,la,lb,ln,lo,lt,lv,mg,mi,mk,ml,mn,mr,ms,mt,my,ne,nl,nn,no,oc,pa,pl,ps,pt,ro,ru,sa,sd,si,sk,sl,sn,so,sq,sr,su,sv,sw,ta,te,tg,th,tk,tl,tr,tt,uk,ur,uz,vi,yi,yo,zh}
Language spoken in the audio, skip to perform language detection.
--use_faster_whisper, --no-use_faster_whisper
Whether to use Faster Whisper implementation.
--beam_size BEAM_SIZE
Number of beams in beam search, only applicable when temperature is zero.
--ct2_compute_type {default,int8,int8_float16,int16,float16}
Quantization type applied while converting the model to CTranslate2 format.
Wit:
-w WIT_CLIENT_ACCESS_TOKENS [WIT_CLIENT_ACCESS_TOKENS ...], --wit_client_access_tokens WIT_CLIENT_ACCESS_TOKENS [WIT_CLIENT_ACCESS_TOKENS ...]
List of wit.ai client access tokens. If provided, wit.ai APIs will be used to do the transcription, otherwise whisper will be used.
--max_cutting_duration [1-17]
The maximum allowed cutting duration. It should be between 1 and 17.
Output:
--min_words_per_segment MIN_WORDS_PER_SEGMENT
The minimum number of words should appear in each transcript segment. Any segment have words count less than this threshold will be merged with the next one. Pass 0 to disable this behavior.
--save_files_before_compact, --no-save_files_before_compact
Saves the output files before applying the compact logic that is based on --min_words_per_segment.
--save_yt_dlp_responses, --no-save_yt_dlp_responses
Whether to save the yt-dlp library JSON responses or not.
--output_sample OUTPUT_SAMPLE
Samples random compacted segments from the output and generates a CSV file contains the sampled data. Pass 0 to disable this behavior.
-f {all,txt,srt,vtt,csv,tsv,json,none} [{all,txt,srt,vtt,csv,tsv,json,none} ...], --output_formats {all,txt,srt,vtt,csv,tsv,json,none} [{all,txt,srt,vtt,csv,tsv,json,none} ...]
Format of the output file; if not specified, all available formats will be produced.
-o OUTPUT_DIR, --output_dir OUTPUT_DIR
Directory to save the outputs.
```
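The help text for `--min_words_per_segment` says that any segment with fewer words than the threshold is merged with the next one. The sketch below shows one way such merging could work. It is an illustration under assumptions, not Tafrigh's actual implementation: the function name and the segment shape (`start`, `end`, `text`) are hypothetical.

```python
def merge_short_segments(segments, min_words_per_segment):
    """Merge each too-short segment into the segment that follows it."""
    if min_words_per_segment <= 0:
        # 0 disables merging, mirroring the documented behavior.
        return [dict(segment) for segment in segments]

    merged = []
    pending = None  # short text waiting to be joined with the next segment
    for segment in segments:
        current = dict(segment)
        if pending is not None:
            current = {
                "start": pending["start"],
                "end": current["end"],
                "text": f"{pending['text']} {current['text']}".strip(),
            }
            pending = None
        if len(current["text"].split()) < min_words_per_segment:
            pending = current
        else:
            merged.append(current)
    if pending is not None:
        # Nothing follows the final short segment, so keep it unchanged.
        merged.append(pending)
    return merged
```

With a threshold of 3, a one-word segment followed by a two-word segment becomes a single three-word segment that spans both time ranges.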
Transcription from command line
Transcribing using Whisper models
Transcribing a single material
```bash
tafrigh "https://youtu.be/dDzxYcEJbgo" \
--model_name_or_path small \
--task transcribe \
--language ar \
--output_dir . \
--output_formats txt srt
```
Transcribing a full playlist
```bash
tafrigh "https://youtube.com/playlist?list=PLyS-PHSxRDxsLnVsPrIwnsHMO5KgLz7T5" \
--model_name_or_path small \
--task transcribe \
--language ar \
--output_dir . \
--output_formats txt srt
```
Transcribing multiple materials
```bash
tafrigh "https://youtu.be/4h5P7jXvW98" "https://youtu.be/jpfndVSROpw" \
--model_name_or_path small \
--task transcribe \
--language ar \
--output_dir . \
--output_formats txt srt
```
Speeding up the transcription process
You can use the `faster-whisper` library, which provides faster transcription, by passing the `--use_faster_whisper` option as follows:
```bash
tafrigh "https://youtu.be/3K5Jh_-UYeA" \
--model_name_or_path large \
--task transcribe \
--language ar \
--use_faster_whisper \
--output_dir . \
--output_formats txt srt
```
Transcribing using wit.ai technology
Transcribing a single material
```bash
tafrigh "https://youtu.be/dDzxYcEJbgo" \
--wit_client_access_tokens XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX \
--output_dir . \
--output_formats txt srt \
--min_words_per_segment 10 \
--max_cutting_duration 10
```
Transcribing a full playlist
```bash
tafrigh "https://youtube.com/playlist?list=PLyS-PHSxRDxsLnVsPrIwnsHMO5KgLz7T5" \
--wit_client_access_tokens XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX \
--output_dir . \
--output_formats txt srt \
--min_words_per_segment 10 \
--max_cutting_duration 10
```
Transcribing multiple materials
```bash
tafrigh "https://youtu.be/4h5P7jXvW98" "https://youtu.be/jpfndVSROpw" \
--wit_client_access_tokens XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX \
--output_dir . \
--output_formats txt srt \
--min_words_per_segment 10 \
--max_cutting_duration 10
```
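The shell commands above can also be assembled from a script, which avoids quoting mistakes when passing JSON to `--yt_dlp_options`. This is a minimal sketch: `build_tafrigh_command` is a hypothetical helper, and it only uses flags that appear in the `tafrigh --help` output.

```python
import json
import shlex


def build_tafrigh_command(urls_or_paths, yt_dlp_options=None,
                          output_formats=("txt", "srt"), output_dir="."):
    """Build an argument list for the tafrigh CLI (hypothetical helper)."""
    command = ["tafrigh", *urls_or_paths]
    if yt_dlp_options:
        # json.dumps guarantees the value is valid JSON, as the option requires.
        command += ["--yt_dlp_options", json.dumps(yt_dlp_options)]
    command += ["--output_formats", *output_formats]
    command += ["--output_dir", output_dir]
    return command


command = build_tafrigh_command(
    ["https://youtube.com/playlist?list=PLyS-PHSxRDxsLnVsPrIwnsHMO5KgLz7T5"],
    yt_dlp_options={"playlist_items": "1-10"},
)
# shlex.join produces a shell-safe string for display or copy-pasting.
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` to start the transcription without going through a shell.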
Transcribing using code
You can use Tafrigh through code as follows:
```python
from tafrigh import farrigh, Config

if __name__ == '__main__':
    # Group the settings by section: input, Whisper, wit.ai, and output.
    config = Config(
        input=Config.Input(
            urls_or_paths=['https://youtu.be/qFsUwp5iomU'],
            skip_if_output_exist=False,
            download_retries=3,
            yt_dlp_options='{}',
            verbose=False,
        ),
        whisper=Config.Whisper(
            model_name_or_path='tiny',
            task='transcribe',
            language='ar',
            use_faster_whisper=True,
            beam_size=5,
            ct2_compute_type='default',
        ),
        wit=Config.Wit(
            wit_client_access_tokens=[],
            max_cutting_duration=10,
        ),
        output=Config.Output(
            min_words_per_segment=10,
            save_files_before_compact=False,
            save_yt_dlp_responses=False,
            output_sample=0,
            output_formats=['txt', 'srt'],
            output_dir='.',
        ),
    )

    # farrigh is a generator: iterating it runs the transcription and yields progress.
    for progress in farrigh(config):
        print(progress)
```
The `farrigh` function is a generator that produces the current transcription state and the progress of the process. If you do not need to track this, you can skip the loop by using `deque` as follows:
```python
from collections import deque

from tafrigh import farrigh, Config

if __name__ == '__main__':
    config = Config(...)

    # maxlen=0 consumes the generator without storing any of its items.
    deque(farrigh(config), maxlen=0)
```
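To see why the `deque` idiom works, here is a small self-contained sketch that uses a stand-in generator instead of `farrigh`. The `fake_progress` function is hypothetical and exists only to show that a generator does nothing until it is consumed.

```python
from collections import deque

completed = []  # records which steps actually ran


def fake_progress(steps):
    """Hypothetical stand-in for a progress-reporting generator."""
    for step in range(1, steps + 1):
        completed.append(step)  # side effect that only happens when iterated
        yield {"step": step, "total": steps}


generator = fake_progress(3)
assert completed == []  # nothing has run yet: generators are lazy

# maxlen=0 makes deque pull every item and discard it immediately.
deque(generator, maxlen=0)
assert completed == [1, 2, 3]
```

Calling `list(...)` on the generator would also run it to completion, but it would keep every yielded item in memory, which `deque(..., maxlen=0)` avoids.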
Transcribing using Docker
If you have Docker on your computer, the easiest way to use Tafrigh is through Docker. The following command downloads the Tafrigh Docker image and transcribes a YouTube material using wit.ai technologies, outputting the results in the current folder:
```bash
docker run -it --rm -v "$PWD:/tafrigh" ghcr.io/ieasybooks/tafrigh \
"https://www.youtube.com/watch?v=qFsUwp5iomU" \
--wit_client_access_tokens XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX \
-f txt srt
```
You can pass any option from the Tafrigh library options mentioned above.
There are multiple Docker images you can use for Tafrigh based on the dependencies you want to use:
- `ghcr.io/ieasybooks/tafrigh`: Contains dependencies for both wit.ai technologies and Whisper models
- `ghcr.io/ieasybooks/tafrigh-whisper`: Contains dependencies for Whisper models only
- `ghcr.io/ieasybooks/tafrigh-wit`: Contains dependencies for wit.ai technologies only
One drawback is that Whisper models cannot use your computer's GPU when used through Docker, which is something we are working on resolving in the future.
A significant part of this project is based on the yt-whisper repository, which made it possible to build Tafrigh faster.