https://github.com/ieasybooks/tafrigh

Transcribe speech to text and create SRT and VTT files using Whisper models and wit.ai technology.

asr automatic-speech-recognition ctranslate2 facebook faster-whisper javascript python soundcloud srt stable-whisper subtitles twitter vtt whisper youtube

[![ar](https://img.shields.io/badge/lang-ar-brightgreen.svg)](README.md)
[![en](https://img.shields.io/badge/lang-en-red.svg)](README.en.md)

Tafrigh

Tafrigh is a tool for transcribing video and audio material into text. You can view examples transcribed using Tafrigh from here or on the Baheth platform, where all of the transcribed content was produced with Tafrigh.

Note: If you want to use Tafrigh through JavaScript, take a look at this repository.

Features of Tafrigh


  • Transcribe video and audio material into text using the Whisper models provided by OpenAI

  • Transcribe material using the wit.ai technologies provided by Facebook

  • Download material directly from YouTube, Facebook, Twitter, SoundCloud, and other sites

  • Download videos directly from YouTube, whether a single video or a complete playlist

  • Provide various output formats like txt, srt, vtt, csv, tsv, and json
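To illustrate how the srt and vtt formats listed above differ, here is a minimal sketch that renders the same segments both ways. It is not Tafrigh's own writer: the segment dictionaries with `start`, `end`, and `text` keys and the helper names are assumptions made for this example. The format details themselves are standard: SRT numbers each cue and uses a comma before the milliseconds, while WebVTT starts with a `WEBVTT` header and uses a dot.

```python
def format_timestamp(seconds: float, decimal_marker: str) -> str:
    """Render a time in seconds as HH:MM:SS<marker>mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{ms:03d}"


def to_srt(segments: list) -> str:
    """SRT: numbered cues, comma as the millisecond separator."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"], ",")
        end = format_timestamp(seg["end"], ",")
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text']}\n")
    return "\n".join(blocks)


def to_vtt(segments: list) -> str:
    """WebVTT: 'WEBVTT' header, dot as the millisecond separator."""
    blocks = ["WEBVTT\n"]
    for seg in segments:
        start = format_timestamp(seg["start"], ".")
        end = format_timestamp(seg["end"], ".")
        blocks.append(f"{start} --> {end}\n{seg['text']}\n")
    return "\n".join(blocks)


# Hypothetical segments, only for demonstration.
segments = [{"start": 0.0, "end": 2.5, "text": "hello"}]
print(to_srt(segments))
print(to_vtt(segments))
```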

Requirements


  • A strong GPU in your computer is recommended if using Whisper models

  • Python version 3.10 or higher installed on your computer


  • FFmpeg installed on your computer


  • yt-dlp installed on your computer

Installing Tafrigh

Using pip

You can install Tafrigh using pip with the command: pip install "tafrigh[wit,whisper]"

You can choose which dependencies to install, based on the technology you want to use, by writing wit, whisper, or both inside the square brackets as shown in the previous command. The quotes keep shells such as zsh from interpreting the square brackets as a glob pattern.

From the Source Code


  • Download this repository by clicking on Code then Download ZIP or by executing the following command: git clone git@github.com:ieasybooks/tafrigh.git

  • Extract the file if downloaded as ZIP and navigate to the project folder

  • Execute the following command to install Tafrigh: poetry install

Add -E wit or -E whisper to specify the dependencies to install.

Using Tafrigh

Available Options



  • Inputs

    • Links or file paths: Pass the links or file paths of the materials to be transcribed directly after the tafrigh command name. For example: tafrigh "https://yout..." "https://yout..." "C:\Users\ieasybooks\lecture.wav"

    • Skip transcription if output exists: Use the --skip_if_output_exist option to skip transcription if the required outputs already exist in the specified output folder

    • Number of download retries: If downloading a full playlist using the yt-dlp library, some items may fail to download. The --download_retries option can be used to specify the number of retry attempts if a download fails. The default value is 3

    • Additional options for yt-dlp: You can pass additional options to the yt-dlp library using the --yt_dlp_options option in valid JSON format. For example, to download only the first 10 items from a playlist, pass --yt_dlp_options '{"playlist_items": "1-10"}'



  • Whisper Options


    • Model: You can specify the model using the --model_name_or_path option. Available models:


      • tiny.en (English only)


      • tiny (least accurate)


      • base.en (English only)

      • base


      • small.en (English only)


      • small (default)


      • medium.en (English only)

      • medium

      • large-v1

      • large-v2

      • large-v3


      • large (most accurate)

      • Whisper model name on HuggingFace Hub

      • Path to a pre-downloaded Whisper model

      • Path to a Whisper model converted using the ct2-transformers-converter tool for use with the faster-whisper library




    • Task: You can specify the task using the --task option. Available tasks:


      • transcribe: Convert speech to text (default)


      • translate: Translate speech into English text



    • Language: You can specify the audio language using the --language option. For example, to specify Arabic, pass ar. If not specified, the language will be detected automatically

    • Use faster version of Whisper models: By passing the --use_faster_whisper option, the faster version of Whisper models will be used

    • Beam size: You can improve results using the --beam_size option, which allows the model to search a wider range of words during text generation. The default value is 5


    • Model quantization type: You can specify the quantization type applied while converting the model to CTranslate2 format with the ct2-transformers-converter tool by passing the --ct2_compute_type option. Available types:


      • default (default)

      • int8

      • int8_float16

      • int16

      • float16





  • Wit Options

    • Wit.ai keys: You can use wit.ai technologies to transcribe materials into text by passing your wit.ai client access tokens to the --wit_client_access_tokens option. If this option is passed, wit.ai will be used for transcription. Otherwise, Whisper models will be used

    • Maximum cutting duration: You can specify the maximum cutting duration, which affects the length of sentences in SRT and VTT files, by passing the --max_cutting_duration option. The value must be between 1 and 17. The default value is 15



  • Outputs

    • Merge segments: You can use the --min_words_per_segment option to control the minimum number of words that can be in a single transcription segment. The default value is 1. Pass 0 to disable this feature

    • Save original files before merging: Use the --save_files_before_compact option to save the original files before merging segments based on the --min_words_per_segment option

    • Save yt-dlp library responses: You can save the yt-dlp library responses in JSON format by passing the --save_yt_dlp_responses option

    • Output sample segments: You can pass a value to the --output_sample option to get a random sample of all transcribed segments from each material after merging based on the --min_words_per_segment option. The default value is 0, meaning no samples will be output


    • Output formats: You can specify the output formats using the --output_formats option. Available formats:

      • txt

      • srt

      • vtt

      • csv

      • tsv

      • json


      • all (default)


      • none (No file will be created if this format is passed)



    • Output folder: You can specify the output folder using the --output_dir option. By default, the current folder will be the output folder if not specified
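The options above can also be assembled programmatically before handing them to the command line. The sketch below is one way to do that, assuming you want to launch the tafrigh CLI from a Python script. `build_tafrigh_command` is a hypothetical helper written for this example, not part of Tafrigh, and it uses only flags documented above. Serializing the yt-dlp options with `json.dumps` guarantees that the value passed to --yt_dlp_options is valid JSON.

```python
import json
import shlex
from typing import Optional


def build_tafrigh_command(
    urls_or_paths: list,
    output_formats: list,
    output_dir: str = ".",
    yt_dlp_options: Optional[dict] = None,
) -> list:
    """Assemble an argument list for the tafrigh CLI (hypothetical helper)."""
    command = ["tafrigh", *urls_or_paths]
    if yt_dlp_options:
        # The option expects valid JSON, so serialize the dict instead of hand-writing it.
        command += ["--yt_dlp_options", json.dumps(yt_dlp_options)]
    command += ["--output_dir", output_dir]
    command += ["--output_formats", *output_formats]
    return command


command = build_tafrigh_command(
    ["https://youtube.com/playlist?list=PLyS-PHSxRDxsLnVsPrIwnsHMO5KgLz7T5"],
    output_formats=["txt", "srt"],
    yt_dlp_options={"playlist_items": "1-10"},
)

# shlex.join quotes each argument so the line can be pasted into a POSIX shell.
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where Tafrigh is installed.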



```
➜ tafrigh --help
usage: tafrigh [-h] [--version] [--skip_if_output_exist | --no-skip_if_output_exist] [--download_retries DOWNLOAD_RETRIES] [--yt_dlp_options YT_DLP_OPTIONS] [--verbose | --no-verbose] [-m MODEL_NAME_OR_PATH] [-t {transcribe,translate}]
[-l {af,am,ar,as,az,ba,be,bg,bn,bo,br,bs,ca,cs,cy,da,de,el,en,es,et,eu,fa,fi,fo,fr,gl,gu,ha,haw,he,hi,hr,ht,hu,hy,id,is,it,ja,jw,ka,kk,km,kn,ko,la,lb,ln,lo,lt,lv,mg,mi,mk,ml,mn,mr,ms,mt,my,ne,nl,nn,no,oc,pa,pl,ps,pt,ro,ru,sa,sd,si,sk,sl,sn,so,sq,sr,su,sv,sw,ta,te,tg,th,tk,tl,tr,tt,uk,ur,uz,vi,yi,yo,zh}]
[--use_faster_whisper | --no-use_faster_whisper] [--beam_size BEAM_SIZE] [--ct2_compute_type {default,int8,int8_float16,int16,float16}] [-w WIT_CLIENT_ACCESS_TOKENS [WIT_CLIENT_ACCESS_TOKENS ...]] [--max_cutting_duration [1-17]]
[--min_words_per_segment MIN_WORDS_PER_SEGMENT] [--save_files_before_compact | --no-save_files_before_compact] [--save_yt_dlp_responses | --no-save_yt_dlp_responses] [--output_sample OUTPUT_SAMPLE]
[-f {all,txt,srt,vtt,csv,tsv,json,none} [{all,txt,srt,vtt,csv,tsv,json,none} ...]] [-o OUTPUT_DIR]
urls_or_paths [urls_or_paths ...]

options:
-h, --help show this help message and exit
--version show program's version number and exit

Input:
urls_or_paths Video/Playlist URLs or local folder/file(s) to transcribe.
--skip_if_output_exist, --no-skip_if_output_exist
Whether to skip generating the output if the output file already exists.
--download_retries DOWNLOAD_RETRIES
Number of retries for yt-dlp downloads that fail.
--yt_dlp_options YT_DLP_OPTIONS
Additional options to pass to yt-dlp in valid JSON format (e.g. `'{"playlist_items": "1-10"}'`).
--verbose, --no-verbose
Whether to print out the progress and debug messages.

Whisper:
-m MODEL_NAME_OR_PATH, --model_name_or_path MODEL_NAME_OR_PATH
Name or path of the Whisper model to use.
-t {transcribe,translate}, --task {transcribe,translate}
Whether to perform X->X speech recognition ('transcribe') or X->English translation ('translate').
-l {af,am,ar,as,az,ba,be,bg,bn,bo,br,bs,ca,cs,cy,da,de,el,en,es,et,eu,fa,fi,fo,fr,gl,gu,ha,haw,he,hi,hr,ht,hu,hy,id,is,it,ja,jw,ka,kk,km,kn,ko,la,lb,ln,lo,lt,lv,mg,mi,mk,ml,mn,mr,ms,mt,my,ne,nl,nn,no,oc,pa,pl,ps,pt,ro,ru,sa,sd,si,sk,sl,sn,so,sq,sr,su,sv,sw,ta,te,tg,th,tk,tl,tr,tt,uk,ur,uz,vi,yi,yo,zh}, --language {af,am,ar,as,az,ba,be,bg,bn,bo,br,bs,ca,cs,cy,da,de,el,en,es,et,eu,fa,fi,fo,fr,gl,gu,ha,haw,he,hi,hr,ht,hu,hy,id,is,it,ja,jw,ka,kk,km,kn,ko,la,lb,ln,lo,lt,lv,mg,mi,mk,ml,mn,mr,ms,mt,my,ne,nl,nn,no,oc,pa,pl,ps,pt,ro,ru,sa,sd,si,sk,sl,sn,so,sq,sr,su,sv,sw,ta,te,tg,th,tk,tl,tr,tt,uk,ur,uz,vi,yi,yo,zh}
Language spoken in the audio, skip to perform language detection.
--use_faster_whisper, --no-use_faster_whisper
Whether to use Faster Whisper implementation.
--beam_size BEAM_SIZE
Number of beams in beam search, only applicable when temperature is zero.
--ct2_compute_type {default,int8,int8_float16,int16,float16}
Quantization type applied while converting the model to CTranslate2 format.

Wit:
-w WIT_CLIENT_ACCESS_TOKENS [WIT_CLIENT_ACCESS_TOKENS ...], --wit_client_access_tokens WIT_CLIENT_ACCESS_TOKENS [WIT_CLIENT_ACCESS_TOKENS ...]
List of wit.ai client access tokens. If provided, wit.ai APIs will be used to do the transcription, otherwise whisper will be used.
--max_cutting_duration [1-17]
The maximum allowed cutting duration. It should be between 1 and 17.

Output:
--min_words_per_segment MIN_WORDS_PER_SEGMENT
The minimum number of words should appear in each transcript segment. Any segment have words count less than this threshold will be merged with the next one. Pass 0 to disable this behavior.
--save_files_before_compact, --no-save_files_before_compact
Saves the output files before applying the compact logic that is based on --min_words_per_segment.
--save_yt_dlp_responses, --no-save_yt_dlp_responses
Whether to save the yt-dlp library JSON responses or not.
--output_sample OUTPUT_SAMPLE
Samples random compacted segments from the output and generates a CSV file contains the sampled data. Pass 0 to disable this behavior.
-f {all,txt,srt,vtt,csv,tsv,json,none} [{all,txt,srt,vtt,csv,tsv,json,none} ...], --output_formats {all,txt,srt,vtt,csv,tsv,json,none} [{all,txt,srt,vtt,csv,tsv,json,none} ...]
Format of the output file; if not specified, all available formats will be produced.
-o OUTPUT_DIR, --output_dir OUTPUT_DIR
Directory to save the outputs.
```
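The help text for --min_words_per_segment says that any segment with fewer words than the threshold is merged with the next one. The following is a minimal sketch of that idea, not Tafrigh's actual implementation. The segment dictionaries with `start`, `end`, and `text` keys are an assumption, as is the choice to keep a trailing short segment as is when no later segment remains to merge with.

```python
def merge_short_segments(segments: list, min_words: int) -> list:
    """Merge each too-short segment into the segment that follows it."""
    if min_words <= 0:
        # 0 disables merging, mirroring the documented behavior.
        return [dict(seg) for seg in segments]

    merged = []
    pending = None  # short segment still waiting for the next one
    for seg in segments:
        if pending is None:
            current = dict(seg)
        else:
            current = {
                "start": pending["start"],
                "end": seg["end"],
                "text": f"{pending['text']} {seg['text']}",
            }
        if len(current["text"].split()) < min_words:
            pending = current
        else:
            merged.append(current)
            pending = None

    if pending is not None:
        # Assumption: nothing is left to merge with, so keep the remainder.
        merged.append(pending)
    return merged


# Hypothetical segments, only for demonstration.
example = [
    {"start": 0, "end": 1, "text": "a b"},
    {"start": 1, "end": 2, "text": "c"},
    {"start": 2, "end": 3, "text": "d e f"},
    {"start": 3, "end": 4, "text": "g"},
]
for segment in merge_short_segments(example, min_words=3):
    print(segment)
```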

Transcription from command line

Transcribing using Whisper models

Transcribing a single material

```bash
tafrigh "https://youtu.be/dDzxYcEJbgo" \
--model_name_or_path small \
--task transcribe \
--language ar \
--output_dir . \
--output_formats txt srt
```

Transcribing a full playlist

```bash
tafrigh "https://youtube.com/playlist?list=PLyS-PHSxRDxsLnVsPrIwnsHMO5KgLz7T5" \
--model_name_or_path small \
--task transcribe \
--language ar \
--output_dir . \
--output_formats txt srt
```

Transcribing multiple materials

```bash
tafrigh "https://youtu.be/4h5P7jXvW98" "https://youtu.be/jpfndVSROpw" \
--model_name_or_path small \
--task transcribe \
--language ar \
--output_dir . \
--output_formats txt srt
```

Speeding up the transcription process

You can use the faster-whisper library, which provides faster transcription, by passing the --use_faster_whisper option as follows:

```bash
tafrigh "https://youtu.be/3K5Jh_-UYeA" \
--model_name_or_path large \
--task transcribe \
--language ar \
--use_faster_whisper \
--output_dir . \
--output_formats txt srt
```

Transcribing using wit.ai technology

Transcribing a single material

```bash
tafrigh "https://youtu.be/dDzxYcEJbgo" \
--wit_client_access_tokens XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX \
--output_dir . \
--output_formats txt srt \
--min_words_per_segment 10 \
--max_cutting_duration 10
```

Transcribing a full playlist

```bash
tafrigh "https://youtube.com/playlist?list=PLyS-PHSxRDxsLnVsPrIwnsHMO5KgLz7T5" \
--wit_client_access_tokens XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX \
--output_dir . \
--output_formats txt srt \
--min_words_per_segment 10 \
--max_cutting_duration 10
```

Transcribing multiple materials

```bash
tafrigh "https://youtu.be/4h5P7jXvW98" "https://youtu.be/jpfndVSROpw" \
--wit_client_access_tokens XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX \
--output_dir . \
--output_formats txt srt \
--min_words_per_segment 10 \
--max_cutting_duration 10
```

Transcribing using code

You can use Tafrigh through code as follows:

```python
from tafrigh import farrigh, Config

if __name__ == '__main__':
    config = Config(
        input=Config.Input(
            urls_or_paths=['https://youtu.be/qFsUwp5iomU'],
            skip_if_output_exist=False,
            download_retries=3,
            yt_dlp_options='{}',
            verbose=False,
        ),
        whisper=Config.Whisper(
            model_name_or_path='tiny',
            task='transcribe',
            language='ar',
            use_faster_whisper=True,
            beam_size=5,
            ct2_compute_type='default',
        ),
        wit=Config.Wit(
            wit_client_access_tokens=[],
            max_cutting_duration=10,
        ),
        output=Config.Output(
            min_words_per_segment=10,
            save_files_before_compact=False,
            save_yt_dlp_responses=False,
            output_sample=0,
            output_formats=['txt', 'srt'],
            output_dir='.',
        ),
    )

    for progress in farrigh(config):
        print(progress)
```

The farrigh function is a generator that produces the current transcription state and the progress of the process. If you do not need to track this, you can skip the loop by using deque as follows:

```python
from collections import deque

from tafrigh import farrigh, Config

if __name__ == '__main__':
    config = Config(...)

    deque(farrigh(config), maxlen=0)
```
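The deque idiom works because a generator does nothing until something iterates over it, and a deque with maxlen=0 iterates over its input while discarding every item. The sketch below shows this with a stand-in generator, so it runs without Tafrigh installed. `fake_progress` and the shape of the dictionaries it yields are assumptions for this example and do not reflect what farrigh actually yields.

```python
from collections import deque


def fake_progress(steps: int, log: list):
    """Stand-in for a progress generator; work happens only as items are pulled."""
    for step in range(1, steps + 1):
        log.append(step)  # simulated side effect of doing the work
        yield {"step": step, "total": steps}


log = []
generator = fake_progress(3, log)
print(log)  # still empty here: creating a generator runs none of its body

# Pull every item through the generator and store none of them.
deque(generator, maxlen=0)
print(log)
```

This is why the deque call is needed at all: calling the generator function without iterating over the result would start no work.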

Transcribing using Docker

If you have Docker on your computer, the easiest way to use Tafrigh is through Docker. The following command downloads the Tafrigh Docker image and transcribes a YouTube video using wit.ai technologies, writing the results to the current folder:

```bash
docker run -it --rm -v "$PWD:/tafrigh" ghcr.io/ieasybooks/tafrigh \
"https://www.youtube.com/watch?v=qFsUwp5iomU" \
--wit_client_access_tokens XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX \
-f txt srt
```

You can pass any option from the Tafrigh library options mentioned above.

There are multiple Docker images you can use for Tafrigh based on the dependencies you want to use:




  • ghcr.io/ieasybooks/tafrigh: Contains dependencies for both wit.ai technologies and Whisper models


  • ghcr.io/ieasybooks/tafrigh-whisper: Contains dependencies for Whisper models only


  • ghcr.io/ieasybooks/tafrigh-wit: Contains dependencies for wit.ai technologies only

One drawback is that Whisper models cannot use your computer's GPU when run through Docker. We are working on resolving this.


A significant part of this project is based on the yt-whisper repository, which helped us build Tafrigh faster.