{"id":13486910,"url":"https://github.com/zh-plus/openlrc","last_synced_at":"2025-10-06T00:23:49.798Z","repository":{"id":173902768,"uuid":"651267183","full_name":"zh-plus/openlrc","owner":"zh-plus","description":"Transcribe and translate voice into LRC file using Whisper and LLMs (GPT, Claude, et,al). 使用whisper和LLM(GPT，Claude等)来转录、翻译你的音频为字幕文件。","archived":false,"fork":false,"pushed_at":"2025-04-23T10:00:38.000Z","size":8631,"stargazers_count":535,"open_issues_count":8,"forks_count":34,"subscribers_count":10,"default_branch":"master","last_synced_at":"2025-04-23T10:35:02.718Z","etag":null,"topics":["auto-subtitle","faster-whisper","lyrics","lyrics-generator","openai-api","openlrc","python","speech-to-text","subtitle-translation","transcribe","voice-to-text","whisper"],"latest_commit_sha":null,"homepage":"https://zh-plus.github.io/openlrc/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zh-plus.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-06-08T21:58:15.000Z","updated_at":"2025-04-22T04:53:13.000Z","dependencies_parsed_at":"2023-09-04T15:19:25.017Z","dependency_job_id":"0eb79504-4b39-4986-b185-c965bfe05b3b","html_url":"https://github.com/zh-plus/openlrc","commit_stats":{"total_commits":319,"total_committers":4,"mean_commits":79.75,"dds":"0.025078369905956133","last_synced_commit":"ba39bc6bed1d90550c98d7e12986042eeb41418c"},"previous_names":["zh-plus/open-lyrics"],"tags_count":28,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositorie
s/zh-plus%2Fopenlrc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zh-plus%2Fopenlrc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zh-plus%2Fopenlrc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zh-plus%2Fopenlrc/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zh-plus","download_url":"https://codeload.github.com/zh-plus/openlrc/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254404356,"owners_count":22065641,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["auto-subtitle","faster-whisper","lyrics","lyrics-generator","openai-api","openlrc","python","speech-to-text","subtitle-translation","transcribe","voice-to-text","whisper"],"created_at":"2024-07-31T18:00:52.833Z","updated_at":"2025-10-06T00:23:44.773Z","avatar_url":"https://github.com/zh-plus.png","language":"Python","readme":"# Open-Lyrics\n\n[![PyPI](https://img.shields.io/pypi/v/openlrc)](https://pypi.org/project/openlrc/)\n[![PyPI - License](https://img.shields.io/pypi/l/openlrc)](https://pypi.org/project/openlrc/)\n[![Downloads](https://static.pepy.tech/badge/openlrc)](https://pepy.tech/project/openlrc)\n![GitHub Workflow Status (with event)](https://img.shields.io/github/actions/workflow/status/zh-plus/Open-Lyrics/ci.yml)\n\nOpen-Lyrics is a Python library that transcribes voice files using\n[faster-whisper](https://github.com/guillaumekln/faster-whisper), and translates/polishes the resulting text\ninto `.lrc` files in the 
desired language using LLMs,\ne.g. [OpenAI-GPT](https://github.com/openai/openai-python), [Anthropic-Claude](https://github.com/anthropics/anthropic-sdk-python).\n\n#### Key Features:\n\n- Well-preprocessed audio to reduce hallucination (Loudness Norm \u0026 optional Noise Suppression).\n- Context-aware translation to improve translation quality.\n  Check [prompt](https://github.com/zh-plus/openlrc/blob/master/openlrc/prompter.py) for details.\n- Check [here](#how-it-works) for an overview of the architecture.\n\n## New 🚨\n\n- 2024.5.7:\n    - Add custom endpoint (base_url) support for OpenAI \u0026 Anthropic:\n        ```python\n        lrcer = LRCer(base_url_config={'openai': 'https://api.chatanywhere.tech',\n                                       'anthropic': 'https://example/api'})\n        ```\n    - Generate bilingual subtitles:\n        ```python\n        lrcer.run('./data/test.mp3', target_lang='zh-cn', bilingual_sub=True)\n        ```\n- 2024.5.11: Add glossary support to the prompt, which is confirmed to improve domain-specific translation.\n  Check [here](#glossary) for details.\n- 2024.5.17: You can route a model to an arbitrary chatbot SDK (either OpenAI or Anthropic) by setting `chatbot_model` to\n  `provider: model_name` together with `base_url_config`:\n    ```python\n    lrcer = LRCer(chatbot_model='openai: claude-3-haiku-20240307',\n                  base_url_config={'openai': 'https://api.g4f.icu/v1/'})\n    ```\n- 2024.6.25: Support Gemini as the translation LLM; try `gemini-1.5-flash`:\n    ```python\n    lrcer = LRCer(chatbot_model='gemini-1.5-flash')\n    ```\n- 2024.9.10: Now openlrc depends on\n  a [specific commit](https://github.com/SYSTRAN/faster-whisper/commit/d57c5b40b06e59ec44240d93485a95799548af50) of\n  faster-whisper, which is not published on PyPI. 
Install it from source:\n    ```shell\n    pip install \"faster-whisper @ https://github.com/SYSTRAN/faster-whisper/archive/8327d8cc647266ed66f6cd878cf97eccface7351.tar.gz\"\n    ```\n- 2024.12.19: Add `ModelConfig` for chat model routing, which is more flexible than a model name string. A ModelConfig\n  can be ModelConfig(provider='\u003cprovider\u003e', name='\u003cmodel-name\u003e', base_url='\u003curl\u003e', proxy='\u003cproxy\u003e'), e.g.:\n    ```python\n    from openlrc import LRCer, ModelConfig, ModelProvider\n\n    chatbot_model1 = ModelConfig(\n        provider=ModelProvider.OPENAI,\n        name='deepseek-chat',\n        base_url='https://api.deepseek.com/beta',\n        api_key='sk-APIKEY'\n    )\n    chatbot_model2 = ModelConfig(\n        provider=ModelProvider.OPENAI,\n        name='gpt-4o-mini',\n        api_key='sk-APIKEY'\n    )\n    lrcer = LRCer(chatbot_model=chatbot_model1, retry_model=chatbot_model2)\n    ```\n\n## Installation ⚙️\n\n1. Please install CUDA 11.x and [cuDNN 8 for CUDA 11](https://developer.nvidia.com/cudnn) first according\n   to https://opennmt.net/CTranslate2/installation.html to enable `faster-whisper`.\n\n   `faster-whisper` also needs [cuBLAS for CUDA 11](https://developer.nvidia.com/cublas) installed.\n   \u003cdetails\u003e\n   \u003csummary\u003eFor Windows Users (click to expand)\u003c/summary\u003e\n\n   Windows users can download the required libraries from Purfview's repository:\n\n   Purfview's [whisper-standalone-win](https://github.com/Purfview/whisper-standalone-win) provides the required NVIDIA\n   libraries for Windows in a [single archive](https://github.com/Purfview/whisper-standalone-win/releases/tag/libs).\n   Decompress the archive and place the libraries in a directory included in the `PATH`.\n\n   \u003c/details\u003e\n\n\n2. 
Add LLM API keys. You can:\n    - Add your [OpenAI API key](https://platform.openai.com/account/api-keys) to environment variable `OPENAI_API_KEY`.\n    - Add your [Anthropic API key](https://console.anthropic.com/settings/keys) to environment variable\n      `ANTHROPIC_API_KEY`.\n    - Add your [Google API Key](https://aistudio.google.com/app/apikey) to environment variable `GOOGLE_API_KEY`.\n\n3. Install [ffmpeg](https://ffmpeg.org/download.html) and add its `bin` directory\n   to your `PATH`.\n\n4. This project can be installed from PyPI:\n\n    ```shell\n    pip install openlrc\n    ```\n\n   or install directly from GitHub:\n\n    ```shell\n    pip install git+https://github.com/zh-plus/openlrc\n    ```\n\n5. Install the latest [faster-whisper](https://github.com/guillaumekln/faster-whisper) from source:\n   ```shell\n   pip install \"faster-whisper @ https://github.com/SYSTRAN/faster-whisper/archive/8327d8cc647266ed66f6cd878cf97eccface7351.tar.gz\"\n   ```\n\n6. Install [PyTorch](https://pytorch.org/get-started/locally/):\n   ```shell\n   pip install --force-reinstall torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124\n   ```\n\n7. Fix the `typing-extensions` issue:\n   ```shell\n   pip install typing-extensions -U\n   ```\n\n## Usage 🐍\n\n[//]: # (### GUI)\n\n[//]: # ()\n\n[//]: # (\u003e [!NOTE])\n\n[//]: # (\u003e We are migrating the GUI from streamlit to Gradio. 
The GUI is still under development.)\n\n[//]: # ()\n\n[//]: # (```shell)\n\n[//]: # (openlrc gui)\n\n[//]: # (```)\n\n[//]: # ()\n\n[//]: # (![]\u0026#40;https://github.com/zh-plus/openlrc/blob/master/resources/streamlit_app.jpg?raw=true\u0026#41;)\n\n### Python code\n\n```python\nfrom openlrc import LRCer\n\nif __name__ == '__main__':\n    lrcer = LRCer()\n\n    # Single file\n    lrcer.run('./data/test.mp3',\n              target_lang='zh-cn')  # Generates translated ./data/test.lrc with the default translation prompt.\n\n    # Multiple files\n    lrcer.run(['./data/test1.mp3', './data/test2.mp3'], target_lang='zh-cn')\n    # Note: transcription runs sequentially, but translation runs concurrently for each file.\n\n    # Paths can include video files\n    lrcer.run(['./data/test_audio.mp3', './data/test_video.mp4'], target_lang='zh-cn')\n    # Generates translated ./data/test_audio.lrc and ./data/test_video.srt\n\n    # Use a glossary to improve translation\n    lrcer = LRCer(glossary='./data/aoe4-glossary.yaml')\n\n    # To skip the translation process\n    lrcer.run('./data/test.mp3', target_lang='en', skip_trans=True)\n\n    # Change asr_options or vad_options; check openlrc.defaults for details\n    vad_options = {\"threshold\": 0.1}\n    lrcer = LRCer(vad_options=vad_options)\n    lrcer.run('./data/test.mp3', target_lang='zh-cn')\n\n    # Enhance the audio using noise suppression (consumes more time).\n    lrcer.run('./data/test.mp3', target_lang='zh-cn', noise_suppress=True)\n\n    # Change the LLM model for translation\n    lrcer = LRCer(chatbot_model='claude-3-sonnet-20240229')\n    lrcer.run('./data/test.mp3', target_lang='zh-cn')\n\n    # Clear the temp folder after processing is done\n    lrcer.run('./data/test.mp3', target_lang='zh-cn', clear_temp=True)\n\n    # Change base_url\n    lrcer = LRCer(base_url_config={'openai': 'https://api.g4f.icu/v1',\n                                   'anthropic': 'https://example/api'})\n\n    # Route a model to an arbitrary chatbot SDK\n    
lrcer = LRCer(chatbot_model='openai: claude-3-sonnet-20240229',\n                  base_url_config={'openai': 'https://api.g4f.icu/v1/'})\n\n    # Bilingual subtitle\n    lrcer.run('./data/test.mp3', target_lang='zh-cn', bilingual_sub=True)\n```\n\nCheck more details in the [Documentation](https://zh-plus.github.io/openlrc/#/).\n\n### Glossary\n\nAdd a glossary to improve domain-specific translation. For example, `aoe4-glossary.yaml`:\n\n```json\n{\n  \"aoe4\": \"帝国时代4\",\n  \"feudal\": \"封建时代\",\n  \"2TC\": \"双TC\",\n  \"English\": \"英格兰文明\",\n  \"scout\": \"侦察兵\"\n}\n```\n\n```python\nlrcer = LRCer(glossary='./data/aoe4-glossary.yaml')\nlrcer.run('./data/test.mp3', target_lang='zh-cn')\n```\n\nor pass a dictionary directly:\n\n```python\nlrcer = LRCer(glossary={\"aoe4\": \"帝国时代4\", \"feudal\": \"封建时代\"})\nlrcer.run('./data/test.mp3', target_lang='zh-cn')\n```\n\n## Pricing 💰\n\n*Pricing data from [OpenAI](https://openai.com/pricing)\nand [Anthropic](https://docs.anthropic.com/claude/docs/models-overview#model-comparison).*\n\n| Model Name                   | Pricing for 1M Tokens \u003cbr/\u003e(Input/Output) (USD) | Cost for 1 Hour Audio \u003cbr/\u003e(USD) |\n|------------------------------|-------------------------------------------------|----------------------------------|\n| `gpt-3.5-turbo`              | 0.5, 1.5                                        | 0.01                             |\n| `gpt-4o-mini`                | 0.5, 1.5                                        | 0.01                             |\n| `gpt-4-0125-preview`         | 10, 30                                          | 0.5                              |\n| `gpt-4-turbo-preview`        | 10, 30                                          | 0.5                              |\n| `gpt-4o`                     | 5, 15                                           | 0.25                             |\n| `claude-3-haiku-20240307`    | 0.25, 1.25                                      | 0.015          
                  |\n| `claude-3-sonnet-20240229`   | 3, 15                                           | 0.2                              |\n| `claude-3-opus-20240229`     | 15, 75                                          | 1                                |\n| `claude-3-5-sonnet-20240620` | 3, 15                                           | 0.2                              |\n| `gemini-1.5-flash`           | 0.175, 2.1                                      | 0.01                             |\n| `gemini-1.0-pro`             | 0.5, 1.5                                        | 0.01                             |\n| `gemini-1.5-pro`             | 1.75, 21                                        | 0.1                              |\n| `deepseek-chat`              | 0.18, 2.2                                       | 0.01                             |\n\n**Note: the cost is estimated from the token counts of the input and output text.\nThe actual cost may vary with the language and speech rate.**\n\n### Recommended translation model\n\nFor English audio, we recommend `deepseek-chat`, `gpt-4o-mini`, or `gemini-1.5-flash`.\n\nFor non-English audio, we recommend `claude-3-5-sonnet-20240620`.\n\n## How it works\n\n![](https://github.com/zh-plus/openlrc/blob/master/resources/how-it-works.png?raw=true)\n\nTo maintain context between translation segments, the process is sequential for each audio file.\n\n\n[//]: # (## Comparison to https://microsoft.github.io/autogen/docs/notebooks/agentchat_video_transcript_translate_with_whisper/)\n\n## Development Guide\n\nI'm using [uv](https://github.com/astral-sh/uv) for package management.\nInstall uv with its standalone installers:\n\n#### On macOS and Linux\n\n```shell\ncurl -LsSf https://astral.sh/uv/install.sh | sh\n```\n\n#### On Windows\n\n```shell\npowershell -ExecutionPolicy ByPass -c \"irm https://astral.sh/uv/install.ps1 | iex\"\n```\n\n### Install deps\n\n```shell\nuv venv\nuv sync\n```\n\n## Todo\n\n- [x] 
[Efficiency] Batched translation/polishing for GPT requests (enables contextual ability).\n- [x] [Efficiency] Concurrent support for GPT requests.\n- [x] [Translation Quality] Make the translation prompt more robust according to https://github.com/openai/openai-cookbook.\n- [x] [Feature] Automatically fix JSON encoding errors using GPT.\n- [x] [Efficiency] Asynchronously perform transcription and translation for multiple audio inputs.\n- [x] [Quality] Improve the batched translation/polish prompt according\n  to [gpt-subtrans](https://github.com/machinewrapped/gpt-subtrans).\n- [x] [Feature] Input video support.\n- [x] [Feature] Multiple output format support.\n- [x] [Quality] Speech enhancement for input audio.\n- [ ] [Feature] Preprocessor: Voice-music separation.\n- [ ] [Feature] Align ground-truth transcription with audio.\n- [ ] [Quality]\n  Use a [multilingual language model](https://www.sbert.net/docs/pretrained_models.html#multi-lingual-models) to assess\n  translation quality.\n- [ ] [Efficiency] Add Azure OpenAI Service support.\n- [ ] [Quality] Use [claude](https://www.anthropic.com/index/introducing-claude) for translation.\n- [ ] [Feature] Add local LLM support.\n- [x] [Feature] Multiple translation engine (Anthropic, Microsoft, DeepL, Google, etc.) support.\n- [ ] [**Feature**] Build\n  an [electron + fastapi](https://ivanyu2021.hashnode.dev/electron-django-desktop-app-integrate-javascript-and-python)\n  GUI for a cross-platform application.\n- [x] [Feature] Web-based [streamlit](https://streamlit.io/) GUI.\n- [ ] Add [fine-tuned whisper-large-v2](https://huggingface.co/models?search=whisper-large-v2) models for common\n  languages.\n- [x] [Feature] Add custom OpenAI \u0026 Anthropic endpoint support.\n- [ ] [Feature] Add local translation model support (e.g. 
[SakuraLLM](https://github.com/SakuraLLM/Sakura-13B-Galgame)).\n- [ ] [Quality] Construct translation quality benchmark test for each patch.\n- [ ] [Quality] Split subtitles using\n  LLM ([ref](https://github.com/Huanshere/VideoLingo/blob/ff520309e958dd3048586837d09ce37d3e9ebabd/core/prompts_storage.py#L6)).\n- [ ] [Quality] Trim extra long subtitle using\n  LLM ([ref](https://github.com/Huanshere/VideoLingo/blob/ff520309e958dd3048586837d09ce37d3e9ebabd/core/prompts_storage.py#L311)).\n- [ ] [Others] Add transcribed examples.\n    - [ ] Song\n    - [ ] Podcast\n    - [ ] Audiobook\n\n## Credits\n\n- https://github.com/guillaumekln/faster-whisper\n- https://github.com/m-bain/whisperX\n- https://github.com/openai/openai-python\n- https://github.com/openai/whisper\n- https://github.com/machinewrapped/gpt-subtrans\n- https://github.com/MicrosoftTranslator/Text-Translation-API-V3-Python\n- https://github.com/streamlit/streamlit\n\n## Star History\n\n[![Star History Chart](https://api.star-history.com/svg?repos=zh-plus/Open-Lyrics\u0026type=Date)](https://star-history.com/#zh-plus/Open-Lyrics\u0026Date)\n\n## Citation\n\n```\n@book{openlrc2024zh,\n\ttitle = {zh-plus/openlrc},\n\turl = {https://github.com/zh-plus/openlrc},\n\tauthor = {Hao, Zheng},\n\tdate = {2024-09-10},\n\tyear = {2024},\n\tmonth = {9},\n\tday = {10},\n}\n```\n","funding_links":[],"categories":["Python","Chatbots"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzh-plus%2Fopenlrc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzh-plus%2Fopenlrc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzh-plus%2Fopenlrc/lists"}