{"id":26603978,"url":"https://github.com/abdullahhendy/live-translation","last_synced_at":"2025-04-09T17:14:15.075Z","repository":{"id":283764723,"uuid":"936922708","full_name":"AbdullahHendy/live-translation","owner":"AbdullahHendy","description":"This project provides a real-time speech-to-text translation solution. It captures audio from the microphone, processes it, transcribes it into text, and translates it to a target language. Multiple output formats are supported.","archived":false,"fork":false,"pushed_at":"2025-04-06T11:11:42.000Z","size":1654,"stargazers_count":3,"open_issues_count":1,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-09T17:14:00.615Z","etag":null,"topics":["ai","nlp","opus-mt","silero-vad","transcription","translation","whisper-ai"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AbdullahHendy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-21T23:46:44.000Z","updated_at":"2025-03-28T06:45:20.000Z","dependencies_parsed_at":"2025-03-22T03:31:53.232Z","dependency_job_id":null,"html_url":"https://github.com/AbdullahHendy/live-translation","commit_stats":null,"previous_names":["abdullahhendy/live-translation"],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AbdullahHendy%2Flive-translation","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AbdullahHendy%2Flive-translation/tags","releases_url":"https:
//repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AbdullahHendy%2Flive-translation/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AbdullahHendy%2Flive-translation/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AbdullahHendy","download_url":"https://codeload.github.com/AbdullahHendy/live-translation/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248074931,"owners_count":21043490,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","nlp","opus-mt","silero-vad","transcription","translation","whisper-ai"],"created_at":"2025-03-23T19:19:04.606Z","updated_at":"2025-04-09T17:14:15.067Z","avatar_url":"https://github.com/AbdullahHendy.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Real-time Speech-to-Text Translation\n\nThis project provides a real-time speech-to-text translation solution. It captures audio from the microphone, processes it, transcribes it into text, and translates it to a target language. It uses the **Silero** model for processing (Voice Activity Detection), **Whisper** model for transcription and **Opus-MT** for translation. The output can be through ***stdout***, a ***JSON file***, or ***websockets***. 
\n\n#### 🖥️ Print Output Demo\n\n\u003ca href=\"https://github.com/AbdullahHendy/live-translation/blob/main/doc/print.gif?raw=true\" target=\"_blank\"\u003e\n  \u003cimg src=\"https://github.com/AbdullahHendy/live-translation/blob/main/doc/print.gif?raw=true\" alt=\"Print Demo\" /\u003e\n\u003c/a\u003e\n\n#### 🌍 WebSocket Output Demo\n\n\u003ca href=\"https://github.com/AbdullahHendy/live-translation/blob/main/doc/websocket.gif?raw=true\" target=\"_blank\"\u003e\n  \u003cimg src=\"https://github.com/AbdullahHendy/live-translation/blob/main/doc/websocket.gif?raw=true\" alt=\"WebSocket Demo\" /\u003e\n\u003c/a\u003e\n\n## Architecture Overview\n\u003cimg src=\"https://github.com/AbdullahHendy/live-translation/blob/main/doc/live-translation-piepline.png?raw=true\" alt=\"Architecture Diagram\" /\u003e\n\n\n## Features\n\n- Real-time speech capture and processing using **Silero** VAD (Voice Activity Detection)\n- Speech-to-text transcription using the Whisper model\n- Translation of transcriptions from a source language to a target language\n- Multithreaded design for efficient processing\n- Different output modes: stdout, **JSON** file, websocket server\n\n## Prerequisites\n\nBefore running the project, you need to install the following system dependencies:\n\n- **PortAudio** (for audio input handling)\n- **FFmpeg** (for audio and video processing)\n    - On Ubuntu/Debian-based systems, you can install it with:\n      ```bash\n      sudo apt-get install portaudio19-dev ffmpeg\n      ```\n\n## Installation\n\n**(RECOMMENDED)**: install this package inside a virtual environment to avoid dependency conflicts.\n```bash\npython -m venv .venv\nsource .venv/bin/activate\n```\n\n**Install** the [PyPI package](https://pypi.org/project/live-translation/0.3.2/):\n```bash\npip install live-translation\n```\n\n**Verify** the installation:\n```bash\npython -c \"import live_translation; print('live-translation installed successfully!')\"\n```\n\n## Usage\n\n\u003e **NOTE**: One can 
safely ignore the following warning that might appear on **Linux** systems:\n\u003e\n\u003e ALSA lib pcm_dsnoop.c:567:(snd_pcm_dsnoop_open) unable to open slave\n\u003e ALSA lib pcm_dmix.c:1000:(snd_pcm_dmix_open) unable to open slave\n\u003e ALSA lib pcm.c:2722:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear\n\u003e ALSA lib pcm.c:2722:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe\n\u003e ALSA lib pcm.c:2722:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side\n\u003e ALSA lib pcm_dmix.c:1000:(snd_pcm_dmix_open) unable to open slave\n\u003e Cannot connect to server socket err = No such file or directory\n\u003e Cannot connect to server request channel\n\u003e jack server is not running or cannot be started\n\u003e JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock\n\u003e JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock\n\u003e\n\n### CLI \nlive-translation can be run directly from the command line:\n```bash\nlive-translate [OPTIONS]\n```\n\n**[OPTIONS]**\n```bash\nusage: live-translate [-h] [--silence_threshold SILENCE_THRESHOLD] [--vad_aggressiveness {0,1,2,3,4,5,6,7,8,9}] [--max_buffer_duration {5,6,7,8,9,10}] [--device {cpu,cuda}] [--whisper_model {tiny,base,small,medium,large,large-v2}]\n                      [--trans_model {Helsinki-NLP/opus-mt,Helsinki-NLP/opus-mt-tc-big}] [--src_lang SRC_LANG] [--tgt_lang TGT_LANG] [--output {print,file,websocket}] [--ws_port WS_PORT] [--transcribe_only]\n\nLive Translation Pipeline - Configure runtime settings.\n\noptions:\n  -h, --help            show this help message and exit\n  --silence_threshold SILENCE_THRESHOLD\n                        Number of consecutive 32ms silent chunks to detect SILENCE.\n                        SILENCE clears the audio buffer for transcription/translation.\n                        NOTE: Minimum value is 16.\n                        Default is 65 (~ 2s).\n  --vad_aggressiveness {0,1,2,3,4,5,6,7,8,9}\n                
        Voice Activity Detection (VAD) aggressiveness level (0-9).\n                        Higher values mean VAD has to be more confident to detect speech vs silence.\n                        Default is 8.\n  --max_buffer_duration {5,6,7,8,9,10}\n                        Max audio buffer duration in seconds before trimming it.\n                        Default is 7 seconds.\n  --device {cpu,cuda}   Device for processing ('cpu', 'cuda').\n                        Default is 'cpu'.\n  --whisper_model {tiny,base,small,medium,large,large-v2}\n                        Whisper model size ('tiny', 'base', 'small', 'medium', 'large', 'large-v2').\n                        Default is 'base'.\n  --trans_model {Helsinki-NLP/opus-mt,Helsinki-NLP/opus-mt-tc-big}\n                        Translation model ('Helsinki-NLP/opus-mt', 'Helsinki-NLP/opus-mt-tc-big'). \n                        NOTE: Don't include source and target languages here.\n                        Default is 'Helsinki-NLP/opus-mt'.\n  --src_lang SRC_LANG   Source/Input language for transcription (e.g., 'en', 'fr').\n                        Default is 'en'.\n  --tgt_lang TGT_LANG   Target language for translation (e.g., 'es', 'de').\n                        Default is 'es'.\n  --output {print,file,websocket}\n                        Output method ('print', 'file', 'websocket').\n                          - 'print': Prints transcriptions and translations to stdout.\n                          - 'file': Saves structured JSON data (see below) in ./transcripts/transcriptions.json.\n                          - 'websocket': Sends structured JSON data (see below) over WebSocket.\n                        JSON format for 'file' and 'websocket':\n                        {\n                            \"timestamp\": \"2025-03-06T12:34:56.789Z\",\n                            \"transcription\": \"Hello world\",\n                            \"translation\": \"Hola mundo\"\n                        }.\n                        
Default is 'print'.\n  --ws_port WS_PORT     WebSocket port for sending transcriptions.\n                        Required if --output is 'websocket'.\n  --transcribe_only     Transcribe only mode. No translations are performed.\n```\n\n- In case of **websockets**, one can connect to the server using **curl**, **wscat**, etc. \n  ```bash\n  curl --include --no-buffer ws://localhost:\u003cPORT_NUM\u003e\n  ```\n  ```bash\n  wscat -c ws://localhost:\u003cPORT_NUM\u003e\n  ```\n\n### API\nYou can also import and use live_translation directly in your Python code.\nThe following is a ***simple*** example of running *live_translation* in a server/client fashion.\nFor more detailed examples see [examples/](/examples/).\n\n- **Server**\n  ```python\n  from live_translation.config import Config\n  from live_translation.app import LiveTranslationApp\n\n  def main():\n      config = Config(\n          device=\"cpu\",\n          output=\"websocket\",\n          ws_port=8765\n      )\n\n      # Create and start the Live Translation App\n      app = LiveTranslationApp(config)\n      app.run()\n\n  # Main guard is CRITICAL for systems that use the spawn method to create new processes\n  # This is the case for Windows and macOS\n  if __name__ == \"__main__\":\n      main()\n  ```\n\n- **Client**\n  ```python\n  import asyncio\n  import websockets\n  import json\n\n  async def listen():\n      uri = \"ws://localhost:8765\"\n      async with websockets.connect(uri) as websocket:\n          print(\"🔌 Connected to Live Translation WebSocket server.\")\n\n          try:\n              while True:\n                  message = await websocket.recv()\n                  data = json.loads(message)\n\n                  print(f\"⏳ Timestamp: {data['timestamp']}\")\n                  print(f\"📝 Transcription: {data['transcription']}\")\n                  print(f\"🌍 Translation: {data['translation']}\\n\")\n\n          except websockets.exceptions.ConnectionClosed:\n              print(\"WebSocket 
connection closed.\")\n\n  asyncio.run(listen())\n  ```\n\n## Development\n\nTo contribute to or modify this project, these steps might be helpful:\n\u003e **NOTE**: The workflow below is made for Linux-based systems. One might need to do some steps manually on other systems. For example, run tests manually using `python -m pytest -s tests/` instead of `make test`. \n\u003e See **Makefile** for more details.\n\n\n**Clone** the repository:\n```bash\ngit clone git@github.com:AbdullahHendy/live-translation.git\ncd live-translation\n```\n\n**Create** a virtual environment:\n```bash\npython -m venv .venv\nsource .venv/bin/activate \n```\n\n**Install** Dependencies:\n```bash\npip install --upgrade pip\npip install -r requirements.txt\n```\n\n**Test** the package:\n```bash\nmake test\n```\n\n**Build** the package:\n```bash\nmake build\n```\n\u003e **NOTE**: Building does ***lint*** and checks for ***formatting*** using [ruff](https://docs.astral.sh/ruff/). One can do that separately using `make format` and `make lint`. For linting and formatting rules, see the [ruff config](/ruff.toml).\n\n\u003e **NOTE**: Building generates a ***.whl*** file that can be ***pip*** installed in a new environment for testing\n\n**If needed**, run the program within the virtual environment:\n```bash\npython -m live_translation.cli [OPTIONS]\n```\n\n## Tested Environment\n\nThis project was tested and developed on the following system configuration:\n\n- **Architecture**: x86_64 (64-bit)\n- **Operating System**: Ubuntu 24.10 (Oracular Oriole)\n- **Kernel Version**: 6.11.0-18-generic\n- **Python Version**: 3.12.7\n- **Processor**: 13th Gen Intel(R) Core(TM) i9-13900HX\n- **GPU**: GeForce RTX 4070 Max-Q / Mobile [^1]\n- **RAM**: 16GB DDR5\n- **Dependencies**: All required dependencies are listed in `requirements.txt` and [Prerequisites](#prerequisites)\n\n[^1]: CUDA not utilized, as the `DEVICE` configuration is set to `\"cpu\"`. 
Additional Nvidia drivers, CUDA, cuDNN installation needed if option `\"cuda\"` were to be used.\n\n## Improvements\n\n- **Better Error Handling**: Improve error handling across various components (audio, transcription, translation) to ensure the system is robust and can handle unexpected scenarios gracefully.\n- **Performance Optimization**: Investigate performance bottlenecks including checking sleep durations and optimizing concurrency management to minimize lag.\n- **Concurrency Design Check**: Review and optimize the threading design to ensure thread safety and prevent issues like race conditions or deadlocks, etc., revisit the current design of ***AudioRecorder*** being a thread while ***AudioProcessor***, ***Transcriber***, and ***Translator*** being processes.\n- **Logging**: Integrate detailed logging to track system activity, errors, and performance metrics using a more formal logging framework.\n\n## Citations\n ```bibtex\n  @article{Whisper,\n    title = {Robust Speech Recognition via Large-Scale Weak Supervision},\n    url = {https://arxiv.org/abs/2212.04356},\n    author = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},\n    publisher = {arXiv},\n    year = {2022}\n  }\n\n  @misc{Silero VAD,\n    author = {Silero Team},\n    title = {Silero VAD: pre-trained enterprise-grade Voice Activity Detector (VAD), Number Detector and Language Classifier},\n    year = {2021},\n    publisher = {GitHub},\n    journal = {GitHub repository},\n    howpublished = {\\url{https://github.com/snakers4/silero-vad}},\n    email = {hello@silero.ai}\n  }\n\n  @article{tiedemann2023democratizing,\n    title={Democratizing neural machine translation with {OPUS-MT}},\n    author={Tiedemann, J{\\\"o}rg and Aulamo, Mikko and Bakshandaeva, Daria and Boggia, Michele and Gr{\\\"o}nroos, Stig-Arne and Nieminen, Tommi and Raganato, Alessandro and Scherrer, Yves and Vazquez, Raul and Virpioja, Sami},\n    journal={Language 
Resources and Evaluation},\n    number={58},\n    pages={713--755},\n    year={2023},\n    publisher={Springer Nature},\n    issn={1574-0218},\n    doi={10.1007/s10579-023-09704-w}\n  }\n\n  @InProceedings{TiedemannThottingal:EAMT2020,\n    author = {J{\\\"o}rg Tiedemann and Santhosh Thottingal},\n    title = {{OPUS-MT} — {B}uilding open translation services for the {W}orld},\n    booktitle = {Proceedings of the 22nd Annual Conference of the European Association for Machine Translation (EAMT)},\n    year = {2020},\n    address = {Lisbon, Portugal}\n  }\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabdullahhendy%2Flive-translation","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fabdullahhendy%2Flive-translation","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabdullahhendy%2Flive-translation/lists"}