{"id":13631833,"url":"https://github.com/kadirnar/whisper-plus","last_synced_at":"2025-05-15T14:05:54.987Z","repository":{"id":208427559,"uuid":"721555647","full_name":"kadirnar/whisper-plus","owner":"kadirnar","description":"WhisperPlus: Faster, Smarter, and More Capable 🚀","archived":false,"fork":false,"pushed_at":"2024-08-12T23:45:05.000Z","size":1012,"stargazers_count":1710,"open_issues_count":11,"forks_count":137,"subscribers_count":19,"default_branch":"main","last_synced_at":"2024-10-29T15:27:10.790Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kadirnar.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":"kadirnar"}},"created_at":"2023-11-21T09:59:50.000Z","updated_at":"2024-10-27T04:43:02.000Z","dependencies_parsed_at":"2024-05-02T22:35:55.957Z","dependency_job_id":"bba24338-a37f-4cc8-bbb5-df338495d52e","html_url":"https://github.com/kadirnar/whisper-plus","commit_stats":null,"previous_names":["kadirnar/whisper-plus"],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kadirnar%2Fwhisper-plus","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kadirnar%2Fwhisper-plus/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kadirnar%2Fwhisper-plus/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kadirnar%2Fwhisper-plus/manifests","owner_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/owners/kadirnar","download_url":"https://codeload.github.com/kadirnar/whisper-plus/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254355334,"owners_count":22057354,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T22:02:40.135Z","updated_at":"2025-05-15T14:05:49.978Z","avatar_url":"https://github.com/kadirnar.png","language":"Python","funding_links":["https://github.com/sponsors/kadirnar"],"categories":["Python","语音识别与合成_其他"],"sub_categories":["网络服务_其他"],"readme":"\u003cdiv align=\"center\"\u003e\n\u003ch2\u003e\n    WhisperPlus: Faster, Smarter, and More Capable 🚀\n\u003c/h2\u003e\n\u003cdiv\u003e\n    \u003cimg width=\"500\" alt=\"teaser\" src=\"doc\\openai-whisper.jpg\"\u003e\n\u003c/div\u003e\n\u003cdiv\u003e\n    \u003ca href=\"https://pypi.org/project/whisperplus\" target=\"_blank\"\u003e\n        \u003cimg src=\"https://img.shields.io/pypi/pyversions/whisperplus.svg?color=%2334D058\" alt=\"Supported Python versions\"\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://badge.fury.io/py/whisperplus\"\u003e\u003cimg src=\"https://badge.fury.io/py/whisperplus.svg\" alt=\"pypi version\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://huggingface.co/spaces/ArtGAN/Audio-WebUI\"\u003e\u003cimg src=\"https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm.svg\" alt=\"HuggingFace Spaces\"\u003e\u003c/a\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n\n## 🛠️ Installation\n\n```bash\npip install whisperplus 
git+https://github.com/huggingface/transformers\npip install flash-attn --no-build-isolation\n```\n\n## 🤗 Model Hub\n\nYou can find the models on the [Hugging Face Model Hub](https://huggingface.co/models?search=whisper).\n\n## 🎙️ Usage\n\nTo use the whisperplus library, follow the steps below for different tasks:\n\n### 🎵 YouTube URL to Audio\n\n```python\nfrom whisperplus import SpeechToTextPipeline, download_youtube_to_mp3\nfrom transformers import BitsAndBytesConfig, HqqConfig\nimport torch\n\n# Download the audio track of the video as an MP3 file\nurl = \"https://www.youtube.com/watch?v=di3rHkEZuUw\"\naudio_path = download_youtube_to_mp3(url, output_dir=\"downloads\", filename=\"test\")\n\n# HQQ 4-bit quantization config (passed to the pipeline below)\nhqq_config = HqqConfig(\n    nbits=4,\n    group_size=64,\n    quant_zero=False,\n    quant_scale=False,\n    axis=0,\n    offload_meta=False,\n)  # axis=0 is used by default\n\n# Alternative bitsandbytes 4-bit config; defined for reference, not used below\nbnb_config = BitsAndBytesConfig(\n    load_in_4bit=True,\n    bnb_4bit_quant_type=\"nf4\",\n    bnb_4bit_compute_dtype=torch.bfloat16,\n    bnb_4bit_use_double_quant=True,\n)\n\npipeline = SpeechToTextPipeline(\n    model_id=\"distil-whisper/distil-large-v3\",\n    quant_config=hqq_config,\n    flash_attention_2=True,\n)\n\n# Transcribe the downloaded audio in 30-second chunks\ntranscript = pipeline(\n    audio_path=audio_path,\n    chunk_length_s=30,\n    stride_length_s=5,\n    max_new_tokens=128,\n    batch_size=100,\n    language=\"english\",\n    return_timestamps=False,\n)\n\nprint(transcript)\n```\n\n### 🍎 Apple MLX\n\n```python\nfrom whisperplus.pipelines import mlx_whisper\nfrom whisperplus import download_youtube_to_mp3\n\nurl = \"https://www.youtube.com/watch?v=1__CAdTJ5JU\"\naudio_path = download_youtube_to_mp3(url)\n\ntext = mlx_whisper.transcribe(\n    audio_path, path_or_hf_repo=\"mlx-community/whisper-large-v3-mlx\"\n)[\"text\"]\nprint(text)\n```\n\n### 🍏 Lightning MLX Whisper\n\n```python\nfrom whisperplus.pipelines.lightning_whisper_mlx import LightningWhisperMLX\nfrom whisperplus import download_youtube_to_mp3\n\nurl = \"https://www.youtube.com/watch?v=1__CAdTJ5JU\"\naudio_path = 
download_youtube_to_mp3(url)\n\nwhisper = LightningWhisperMLX(model=\"distil-large-v3\", batch_size=12, quant=None)\noutput = whisper.transcribe(audio_path=audio_path)[\"text\"]\nprint(output)\n```\n\n### 📰 Summarization\n\n```python\nfrom whisperplus.pipelines.summarization import TextSummarizationPipeline\n\n# `transcript` is the text produced by SpeechToTextPipeline in the first example\nsummarizer = TextSummarizationPipeline(model_id=\"facebook/bart-large-cnn\")\nsummary = summarizer.summarize(transcript)\nprint(summary[0][\"summary_text\"])\n```\n\n### 📰 Long Text Support Summarization\n\n```python\nfrom whisperplus.pipelines.long_text_summarization import LongTextSummarizationPipeline\n\nsummarizer = LongTextSummarizationPipeline(model_id=\"facebook/bart-large-cnn\")\nsummary_text = summarizer.summarize(transcript)\nprint(summary_text)\n```\n\n### 💬 Speaker Diarization\n\nBefore running this example, accept the license terms of these two models on Hugging Face:\n\n- https://huggingface.co/pyannote/speaker-diarization-3.1\n- https://huggingface.co/pyannote/segmentation-3.0\n\n```bash\npip install -r requirements/speaker_diarization.txt\npip install -U \"huggingface_hub[cli]\"\nhuggingface-cli login\n```\n\n```python\nfrom whisperplus.pipelines.whisper_diarize import ASRDiarizationPipeline\nfrom whisperplus import download_youtube_to_mp3, format_speech_to_dialogue\n\naudio_path = download_youtube_to_mp3(\"https://www.youtube.com/watch?v=mRB14sFHw2E\")\n\ndevice = \"cuda\"  # or \"cpu\" / \"mps\"\npipeline = ASRDiarizationPipeline.from_pretrained(\n    asr_model=\"openai/whisper-large-v3\",\n    diarizer_model=\"pyannote/speaker-diarization-3.1\",\n    use_auth_token=False,\n    chunk_length_s=30,\n    device=device,\n)\n\noutput_text = pipeline(audio_path, num_speakers=2, min_speaker=1, max_speaker=2)\ndialogue = format_speech_to_dialogue(output_text)\nprint(dialogue)\n```\n\n### ⭐ RAG - Chat with Video (LanceDB)\n\n```bash\npip install sentence-transformers ctransformers langchain\n```\n\n```python\nfrom whisperplus.pipelines.chatbot import ChatWithVideo\n\nchat = ChatWithVideo(\n    
input_file=\"transcript.txt\",\n    llm_model_name=\"TheBloke/Mistral-7B-v0.1-GGUF\",\n    llm_model_file=\"mistral-7b-v0.1.Q4_K_M.gguf\",\n    llm_model_type=\"mistral\",\n    embedding_model_name=\"sentence-transformers/all-MiniLM-L6-v2\",\n)\n\nquery = \"What is this video about?\"\nresponse = chat.run_query(query)\nprint(response)\n```\n\n### 🌠 RAG - Chat with Video (AutoLLM)\n\n```bash\npip install \"autollm\u003e=0.1.9\"\n```\n\n```python\nfrom whisperplus.pipelines.autollm_chatbot import AutoLLMChatWithVideo\n\n# service_context_params\nsystem_prompt = \"\"\"\nYou are a friendly AI assistant that helps users find the most relevant and accurate answers\nto their questions based on the documents you have access to.\nWhen answering the questions, mostly rely on the info in documents.\n\"\"\"\nquery_wrapper_prompt = \"\"\"\nThe document information is below.\n---------------------\n{context_str}\n---------------------\nUsing the document information and mostly relying on it,\nanswer the query.\nQuery: {query_str}\nAnswer:\n\"\"\"\n\nchat = AutoLLMChatWithVideo(\n    input_file=\"input_dir\",  # path to the mp3 file\n    openai_key=\"YOUR_OPENAI_KEY\",  # optional\n    huggingface_key=\"YOUR_HUGGINGFACE_KEY\",  # optional\n    llm_model=\"gpt-3.5-turbo\",\n    llm_max_tokens=\"256\",\n    llm_temperature=\"0.1\",\n    system_prompt=system_prompt,\n    query_wrapper_prompt=query_wrapper_prompt,\n    embed_model=\"huggingface/BAAI/bge-large-zh\",  # \"text-embedding-ada-002\"\n)\n\nquery = \"What is this video about?\"\nresponse = chat.run_query(query)\nprint(response)\n```\n\n### 🎙️ Text to Speech\n\n```python\nfrom whisperplus.pipelines.text2speech import TextToSpeechPipeline\n\ntts = TextToSpeechPipeline(model_id=\"suno/bark\")\naudio = tts(text=\"Hello World\", voice_preset=\"v2/en_speaker_6\")\n```\n\n### 🎥 AutoCaption\n\n```bash\npip install moviepy\napt install imagemagick libmagick++-dev\nsed -i 's/none/read,write/g' /etc/ImageMagick-6/policy.xml\n```\n\n```python\nfrom whisperplus.pipelines.whisper_autocaption import WhisperAutoCaptionPipeline\nfrom whisperplus import download_youtube_to_mp4\n\nvideo_path = download_youtube_to_mp4(\n    \"https://www.youtube.com/watch?v=di3rHkEZuUw\",\n    output_dir=\"downloads\",\n    filename=\"test\",\n)  # Optional\n\ncaption = WhisperAutoCaptionPipeline(model_id=\"openai/whisper-large-v3\")\ncaption(video_path=video_path, output_path=\"output.mp4\", language=\"english\")\n```\n\n## 😍 Contributing\n\n```bash\npip install pre-commit\npre-commit install\npre-commit run --all-files\n```\n\n## 📜 License\n\nThis project is licensed under the terms of the Apache License 2.0.\n\n## 🤗 Citation\n\n```bibtex\n@misc{radford2022whisper,\n  doi = {10.48550/ARXIV.2212.04356},\n  url = {https://arxiv.org/abs/2212.04356},\n  author = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},\n  title = {Robust Speech Recognition via Large-Scale Weak Supervision},\n  publisher = {arXiv},\n  year = {2022},\n  copyright = {arXiv.org perpetual, non-exclusive license}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkadirnar%2Fwhisper-plus","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkadirnar%2Fwhisper-plus","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkadirnar%2Fwhisper-plus/lists"}