{"id":13494384,"url":"https://github.com/argmaxinc/WhisperKit","last_synced_at":"2025-03-28T14:31:05.540Z","repository":{"id":220017562,"uuid":"748528018","full_name":"argmaxinc/WhisperKit","owner":"argmaxinc","description":"On-device Speech Recognition for Apple Silicon","archived":false,"fork":false,"pushed_at":"2025-02-22T02:45:54.000Z","size":2407,"stargazers_count":4421,"open_issues_count":70,"forks_count":376,"subscribers_count":43,"default_branch":"main","last_synced_at":"2025-03-24T12:53:44.028Z","etag":null,"topics":["inference","ios","macos","speech-recognition","swift","transformers","visionos","watchos","whisper"],"latest_commit_sha":null,"homepage":"http://argmaxinc.com/blog/whisperkit","language":"Swift","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/argmaxinc.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-01-26T07:11:52.000Z","updated_at":"2025-03-24T11:49:30.000Z","dependencies_parsed_at":"2024-02-14T06:32:13.378Z","dependency_job_id":"638595df-7dd1-4461-b1e0-9ea355af1d59","html_url":"https://github.com/argmaxinc/WhisperKit","commit_stats":{"total_commits":128,"total_committers":26,"mean_commits":4.923076923076923,"dds":0.640625,"last_synced_commit":"c03017fd592ab3865ae008f59bac0442f19c5ca5"},"previous_names":["argmaxinc/whisperkit"],"tags_count":28,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/argmaxinc%2FWhisperKit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/argmaxinc%2FWhisperKit/tags","releas
es_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/argmaxinc%2FWhisperKit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/argmaxinc%2FWhisperKit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/argmaxinc","download_url":"https://codeload.github.com/argmaxinc/WhisperKit/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245276154,"owners_count":20588893,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["inference","ios","macos","speech-recognition","swift","transformers","visionos","watchos","whisper"],"created_at":"2024-07-31T19:01:24.502Z","updated_at":"2025-03-28T14:31:05.534Z","avatar_url":"https://github.com/argmaxinc.png","language":"Swift","readme":"\n\u003cdiv align=\"center\"\u003e\n  \n\u003ca href=\"https://github.com/argmaxinc/WhisperKit#gh-light-mode-only\"\u003e\n  \u003cimg src=\"https://github.com/user-attachments/assets/f0699c07-c29f-45b6-a9c6-f6d491b8f791\" alt=\"WhisperKit\" width=\"20%\" /\u003e\n\u003c/a\u003e\n\n\u003ca href=\"https://github.com/argmaxinc/WhisperKit#gh-dark-mode-only\"\u003e\n  \u003cimg src=\"https://github.com/user-attachments/assets/1be5e31c-de42-40ab-9b85-790cb911ed47\" alt=\"WhisperKit\" width=\"20%\" /\u003e\n\u003c/a\u003e\n\n# 
WhisperKit\n\n[![Tests](https://github.com/argmaxinc/whisperkit/actions/workflows/unit-tests.yml/badge.svg)](https://github.com/argmaxinc/whisperkit/actions/workflows/unit-tests.yml)\n[![License](https://img.shields.io/github/license/argmaxinc/whisperkit?logo=github\u0026logoColor=969da4\u0026label=License\u0026labelColor=353a41\u0026color=32d058)](LICENSE)\n[![Supported Swift Version](https://img.shields.io/endpoint?url=https%3A%2F%2Fswiftpackageindex.com%2Fapi%2Fpackages%2Fargmaxinc%2FWhisperKit%2Fbadge%3Ftype%3Dswift-versions\u0026labelColor=353a41\u0026color=32d058)](https://swiftpackageindex.com/argmaxinc/WhisperKit) [![Supported Platforms](https://img.shields.io/endpoint?url=https%3A%2F%2Fswiftpackageindex.com%2Fapi%2Fpackages%2Fargmaxinc%2FWhisperKit%2Fbadge%3Ftype%3Dplatforms\u0026labelColor=353a41\u0026color=32d058)](https://swiftpackageindex.com/argmaxinc/WhisperKit)\n[![Discord](https://img.shields.io/discord/1171912382512115722?style=flat\u0026logo=discord\u0026logoColor=969da4\u0026label=Discord\u0026labelColor=353a41\u0026color=32d058\u0026link=https%3A%2F%2Fdiscord.gg%2FG5F5GZGecC)](https://discord.gg/G5F5GZGecC)\n\n\u003c/div\u003e\n\nWhisperKit is an [Argmax](https://www.takeargmax.com) framework for deploying state-of-the-art speech-to-text systems (e.g. 
[Whisper](https://github.com/openai/whisper)) on device with advanced features such as real-time streaming, word timestamps, voice activity detection, and more.\n\n[[TestFlight Demo App]](https://testflight.apple.com/join/LPVOyJZW) [[Python Tools]](https://github.com/argmaxinc/whisperkittools) [[Benchmarks \u0026 Device Support]](https://huggingface.co/spaces/argmaxinc/whisperkit-benchmarks) [[WhisperKit Android]](https://github.com/argmaxinc/WhisperKitAndroid)\n\n\u003e [!IMPORTANT]\n\u003e If you are looking for more features such as speaker diarization and upgraded performance, check out [WhisperKit Pro](https://huggingface.co/argmaxinc/whisperkit-pro) and [SpeakerKit Pro](https://huggingface.co/argmaxinc/speakerkit-pro)! For commercial use or evaluation, please reach out to [whisperkitpro@argmaxinc.com](mailto:whisperkitpro@argmaxinc.com).\n\n## Table of Contents\n\n- [Installation](#installation)\n  - [Swift Package Manager](#swift-package-manager)\n  - [Prerequisites](#prerequisites)\n  - [Xcode Steps](#xcode-steps)\n  - [Package.swift](#packageswift)\n  - [Homebrew](#homebrew)\n- [Getting Started](#getting-started)\n  - [Quick Example](#quick-example)\n  - [Model Selection](#model-selection)\n  - [Generating Models](#generating-models)\n  - [Swift CLI](#swift-cli)\n- [Contributing \\\u0026 Roadmap](#contributing--roadmap)\n- [License](#license)\n- [Citation](#citation)\n\n## Installation\n\n### Swift Package Manager\n\nWhisperKit can be integrated into your Swift project using the Swift Package Manager.\n\n### Prerequisites\n\n- macOS 14.0 or later.\n- Xcode 15.0 or later.\n\n### Xcode Steps\n\n1. Open your Swift project in Xcode.\n2. Navigate to `File` \u003e `Add Package Dependencies...`.\n3. Enter the package repository URL: `https://github.com/argmaxinc/whisperkit`.\n4. Choose the version range or specific version.\n5. 
Click `Finish` to add WhisperKit to your project.\n\n### Package.swift\n\nIf you're using WhisperKit as part of a Swift package, you can include it in your Package.swift dependencies as follows:\n\n```swift\ndependencies: [\n    .package(url: \"https://github.com/argmaxinc/WhisperKit.git\", from: \"0.9.0\"),\n],\n```\n\nThen add `WhisperKit` as a dependency for your target:\n\n```swift\n.target(\n    name: \"YourApp\",\n    dependencies: [\"WhisperKit\"]\n),\n```\n\n### Homebrew\n\nYou can install the `WhisperKit` command line app using [Homebrew](https://brew.sh) by running the following command:\n\n```bash\nbrew install whisperkit-cli\n```\n\n## Getting Started\n\nTo get started with WhisperKit, you need to initialize it in your project.\n\n### Quick Example\n\nThis example demonstrates how to transcribe a local audio file:\n\n```swift\nimport WhisperKit\n\n// Initialize WhisperKit with default settings\nTask {\n    let pipe = try? await WhisperKit()\n    let transcription = try? await pipe?.transcribe(audioPath: \"path/to/your/audio.{wav,mp3,m4a,flac}\")?.text\n    print(transcription ?? \"No transcription available\")\n}\n```\n\n### Model Selection\n\nWhisperKit automatically downloads the recommended model for the device if not specified. You can also select a specific model by passing in the model name:\n\n```swift\nlet pipe = try? await WhisperKit(WhisperKitConfig(model: \"large-v3\"))\n```\n\nThis method also supports glob search, so you can use wildcards to select a model:\n\n```swift\nlet pipe = try? 
await WhisperKit(WhisperKitConfig(model: \"distil*large-v3\"))\n```\n\nNote that the model search must return a single model from the source repo; otherwise, an error will be thrown.\n\nFor a list of available models, see our [HuggingFace repo](https://huggingface.co/argmaxinc/whisperkit-coreml).\n\n### Generating Models\n\nWhisperKit also comes with the supporting repo [`whisperkittools`](https://github.com/argmaxinc/whisperkittools), which lets you create and deploy your own fine-tuned versions of Whisper in CoreML format to HuggingFace. Once generated, they can be loaded by simply changing the repo name to the one used to upload the model:\n\n```swift\nlet config = WhisperKitConfig(model: \"large-v3\", modelRepo: \"username/your-model-repo\")\nlet pipe = try? await WhisperKit(config)\n```\n\n### Swift CLI\n\nThe Swift CLI allows for quick testing and debugging outside of an Xcode project. To install it, run the following:\n\n```bash\ngit clone https://github.com/argmaxinc/whisperkit.git\ncd whisperkit\n```\n\nThen, set up the environment and download your desired model.\n\n```bash\nmake setup\nmake download-model MODEL=large-v3\n```\n\n**Note**:\n\n1. This will download only the model specified by `MODEL` (see what's available in our [HuggingFace repo](https://huggingface.co/argmaxinc/whisperkit-coreml), where we use the prefix `openai_whisper-{MODEL}`)\n2. Before running `download-model`, make sure [git-lfs](https://git-lfs.com) is installed\n\nIf you would like to download all available models to your local folder, use this command instead:\n\n```bash\nmake download-models\n```\n\nYou can then run them via the CLI with:\n\n```bash\nswift run whisperkit-cli transcribe --model-path \"Models/whisperkit-coreml/openai_whisper-large-v3\" --audio-path \"path/to/your/audio.{wav,mp3,m4a,flac}\"\n```\n\nThis should print a transcription of the audio file. 
If you would like to stream the audio directly from a microphone, use:\n\n```bash\nswift run whisperkit-cli transcribe --model-path \"Models/whisperkit-coreml/openai_whisper-large-v3\" --stream\n```\n\n## Contributing \u0026 Roadmap\n\nOur goal is to make WhisperKit better over time, and we'd love your help! Just search the code for \"TODO\" to find features that are yet to be built. Please refer to our [contribution guidelines](CONTRIBUTING.md) for submitting issues and pull requests and for our coding standards; it also includes a public roadmap of features we plan to build.\n\n## License\n\nWhisperKit is released under the MIT License. See [LICENSE](LICENSE) for more details.\n\n## Citation\n\nIf you use WhisperKit for something cool or just find it useful, please drop us a note at [info@argmaxinc.com](mailto:info@argmaxinc.com)!\n\nIf you use WhisperKit for academic work, here is the BibTeX:\n\n```bibtex\n@misc{whisperkit-argmax,\n   title = {WhisperKit},\n   author = {Argmax, Inc.},\n   year = {2024},\n   URL = {https://github.com/argmaxinc/WhisperKit}\n}\n```\n","funding_links":[],"categories":["Swift","其他_机器学习与深度学习","Repos","Related Projects","Libraries"],"sub_categories":["AI"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fargmaxinc%2FWhisperKit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fargmaxinc%2FWhisperKit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fargmaxinc%2FWhisperKit/lists"}