https://github.com/akhkim/babel

Real-time internal audio translator and transcriber that uses the Whisper model

Host: GitHub
URL: https://github.com/akhkim/babel
Owner: akhkim
Created: 2024-07-27T06:05:27.000Z (6 months ago)
Default Branch: main
Last Pushed: 2024-09-21T02:32:50.000Z (4 months ago)
Last Synced: 2024-10-31T17:44:56.502Z (3 months ago)
Topics: ai, internal-audio, real-time, transcription, translation, whisper
Language: Python
Homepage:
Size: 25.4 KB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Real-time Internal Audio Translate & Transcriber

**Babel** is a real-time internal audio translator and transcriber built on [SYSTRAN/faster-whisper](https://github.com/SYSTRAN/faster-whisper), a reimplementation of OpenAI's Whisper model.

The script measures the volume of the captured audio, so the user can set a threshold that filters out quiet background noise and keeps only louder sound.
It can also automatically detect the **57** languages that the Whisper model supports and translate them into any of the **134** languages that Google Translate supports.
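
The volume filter described above can be pictured with a minimal sketch. It is not taken from the Babel source: it assumes audio arrives as chunks of float samples in the range -1.0 to 1.0 and that "volume" means the mean squared amplitude of a chunk. The helper names `mean_energy` and `is_loud_enough` are hypothetical, and the default of `0.0005` simply mirrors the value shown in the usage example.

```python
import math
from typing import Sequence


def mean_energy(samples: Sequence[float]) -> float:
    """Return the mean squared amplitude of a chunk (0.0 for an empty chunk)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def is_loud_enough(samples: Sequence[float], threshold: float = 0.0005) -> bool:
    """Assumed gating rule: keep a chunk only if its energy reaches the threshold."""
    return mean_energy(samples) >= threshold


# Two synthetic chunks: faint background hum and a much louder tone.
quiet = [0.001 * math.sin(i / 10) for i in range(1600)]
loud = [0.2 * math.sin(i / 10) for i in range(1600)]

for name, chunk in (("quiet", quiet), ("loud", loud)):
    action = "transcribe" if is_loud_enough(chunk) else "skip"
    print(f"{name}: energy={mean_energy(chunk):.7f} -> {action}")
```

Under this assumption, raising the threshold discards more of the quiet input, and lowering it lets more background sound through to the transcriber.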

## Requirements
- Python 3.8 or greater

### GPU
GPU execution requires the following NVIDIA libraries to be installed:

- [cuBLAS for CUDA 12](https://developer.nvidia.com/cublas)
- [cuDNN 8 for CUDA 12](https://developer.nvidia.com/cudnn)

## Installation
```
git clone https://github.com/akhkim/babel.git
cd babel
pip install -r requirements.txt
```

## Command-line Usage
```
python3 main.py --model large --translation-lang English --threshold 0.0005
```
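
For readers curious how such flags are typically wired up, the following is a rough sketch using `argparse` from the standard library. Only the three flag names come from the command above. The defaults, the help text, and the `build_parser` function are assumptions made for illustration and may not match what `main.py` actually does.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the three flags shown in the usage example."""
    parser = argparse.ArgumentParser(
        description="Real-time internal audio translator and transcriber (sketch)"
    )
    # Whisper model size; the default is an assumption.
    parser.add_argument("--model", default="large", help="Whisper model size")
    # Target language for translation; the default is an assumption.
    parser.add_argument(
        "--translation-lang", default="English", help="language to translate into"
    )
    # Volume threshold below which audio is ignored; the default is an assumption.
    parser.add_argument(
        "--threshold", type=float, default=0.0005, help="minimum volume to process"
    )
    return parser


# Parse the same arguments as the usage example instead of reading sys.argv.
args = build_parser().parse_args(
    ["--model", "large", "--translation-lang", "English", "--threshold", "0.0005"]
)
print(args.model, args.translation_lang, args.threshold)
```

Note that `argparse` exposes `--translation-lang` as the attribute `translation_lang`, replacing the hyphen with an underscore.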

## To be Implemented
- Overlay of Transcribed Text
- Speech to Translated Speech Function
- Simple GUI
- Optimizing Memory Usage
- Increased Translation Accuracy
- Faster Translation / Transcription