{"id":49036928,"url":"https://github.com/argmaxinc/argmax-oss-swift","last_synced_at":"2026-04-19T12:01:19.067Z","repository":{"id":220017562,"uuid":"748528018","full_name":"argmaxinc/argmax-oss-swift","owner":"argmaxinc","description":"On-device Speech AI for Apple Silicon","archived":false,"fork":false,"pushed_at":"2026-04-14T20:39:53.000Z","size":4273,"stargazers_count":6000,"open_issues_count":109,"forks_count":544,"subscribers_count":43,"default_branch":"main","last_synced_at":"2026-04-16T09:41:54.952Z","etag":null,"topics":["diarization","inference","ios","macos","pyannote","qwen3-tts","speech-recognition","speech-to-text","swift","text-to-speech","transformers","visionos","watchos","whisper"],"latest_commit_sha":null,"homepage":"","language":"Swift","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/argmaxinc.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-01-26T07:11:52.000Z","updated_at":"2026-04-16T09:38:51.000Z","dependencies_parsed_at":"2024-02-14T06:32:13.378Z","dependency_job_id":"638595df-7dd1-4461-b1e0-9ea355af1d59","html_url":"https://github.com/argmaxinc/argmax-oss-swift","commit_stats":{"total_commits":128,"total_committers":26,"mean_commits":4.923076923076923,"dds":0.640625,"last_synced_commit":"c03017fd592ab3865ae008f59bac0442f19c5ca5"},"previous_names":["argmaxinc/whisperkit","argmaxinc/argmax-oss-swift"],"tags_count":37,"template":false,"template_full_name":null,"purl":"pkg:github/argmaxinc/argmax-oss-swift","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/argmaxinc%2Fargmax-oss-swift","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/argmaxinc%2Fargmax-oss-swift/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/argmaxinc%2Fargmax-oss-swift/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/argmaxinc%2Fargmax-oss-swift/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/argmaxinc","download_url":"https://codeload.github.com/argmaxinc/argmax-oss-swift/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/argmaxinc%2Fargmax-oss-swift/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31926495,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-17T10:35:34.458Z","status":"ssl_error","status_checked_at":"2026-04-17T10:35:09.472Z","response_time":62,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["diarization","inference","ios","macos","pyannote","qwen3-tts","speech-recognition","speech-to-text","swift","text-to-speech","transformers","visionos","watchos","whisper"],"created_at":"2026-04-19T12:00:45.949Z","updated_at":"2026-04-19T12:01:18.984Z","avatar_url":"https://github.com/argmaxinc.png","language":"Swift","readme":"\n\u003cdiv align=\"center\"\u003e\n\n\u003ca href=\"https://github.com/argmaxinc/argmax-oss-swift#gh-light-mode-only\"\u003e\n  \u003cimg src=\"https://github.com/user-attachments/assets/2ef4d2b4-b4f1-4b9b-9590-4e57432633ed\" alt=\"Argmax Logo\" width=\"20%\" /\u003e\n\u003c/a\u003e\n\n\u003ca href=\"https://github.com/argmaxinc/argmax-oss-swift#gh-dark-mode-only\"\u003e\n  \u003cimg src=\"https://github.com/user-attachments/assets/6f2c77c4-94b5-4ce5-8647-b177641e6f02\" alt=\"Argmax Logo\" width=\"20%\" /\u003e\n\u003c/a\u003e\n\n# Argmax Open-Source SDK\n\n[![Tests](https://github.com/argmaxinc/argmax-oss-swift/actions/workflows/release-tests.yml/badge.svg)](https://github.com/argmaxinc/argmax-oss-swift/actions/workflows/release-tests.yml)\n[![Supported Swift Version](https://img.shields.io/endpoint?url=https%3A%2F%2Fswiftpackageindex.com%2Fapi%2Fpackages%2Fargmaxinc%2Fargmax-oss-swift%2Fbadge%3Ftype%3Dswift-versions\u0026labelColor=353a41\u0026color=32d058)](https://swiftpackageindex.com/argmaxinc/argmax-oss-swift) [![Supported Platforms](https://img.shields.io/endpoint?url=https%3A%2F%2Fswiftpackageindex.com%2Fapi%2Fpackages%2Fargmaxinc%2Fargmax-oss-swift%2Fbadge%3Ftype%3Dplatforms\u0026labelColor=353a41\u0026color=32d058)](https://swiftpackageindex.com/argmaxinc/argmax-oss-swift)\n[![License](https://img.shields.io/github/license/argmaxinc/argmax-oss-swift?logo=github\u0026logoColor=969da4\u0026label=License\u0026labelColor=353a41\u0026color=32d058)](LICENSE.md)\n\u003cbr/\u003e\n[![Discord](https://img.shields.io/discord/1171912382512115722?style=flat\u0026logo=discord\u0026logoColor=969da4\u0026label=Discord\u0026labelColor=353a41\u0026color=32d058\u0026link=https%3A%2F%2Fdiscord.gg%2FG5F5GZGecC)](https://discord.gg/G5F5GZGecC)\n[![Hugging Face](https://img.shields.io/badge/dynamic/json?url=https%3A%2F%2Fhuggingface.co%2Fapi%2Fmodels%2Fargmaxinc%2Fwhisperkit-coreml\u0026query=%24.downloads\u0026suffix=%2Fmonth\u0026logo=huggingface\u0026logoColor=969da4\u0026label=Downloads\u0026labelColor=353a41\u0026color=32d058)](https://huggingface.co/argmaxinc/whisperkit-coreml)\n\n\n\u003c/div\u003e\n\n[Argmax](https://argmaxinc.com/blog) Open-Source SDK Swift is a collection of turn-key on-device inference frameworks:\n- **WhisperKit** for speech-to-text with OpenAI Whisper\n- **SpeakerKit** for speaker diarization Pyannote\n- **TTSKit** for text-to-speech with Qwen-TTS\n\n\u003e [!IMPORTANT]\n\u003e [Argmax Pro SDK](https://www.argmaxinc.com/blog/argmax-sdk-2) supports additional models and advanced features such as:\n\u003e - Real-time transcription with speakers\n\u003e - Frontier accuracy for your use case with custom vocabulary\n\u003e - Argmax Local Server for non-native apps\n\u003e - 
## Table of Contents

- [Installation](#installation)
  - [Swift Package Manager](#swift-package-manager)
  - [Prerequisites](#prerequisites)
  - [Xcode Steps](#xcode-steps)
  - [Package.swift](#packageswift)
  - [Homebrew](#homebrew)
- [Getting Started](#getting-started)
  - [Quick Example](#quick-example)
  - [Model Selection](#model-selection)
  - [Generating Models](#generating-models)
  - [Swift CLI](#swift-cli)
  - [Local Server](#local-server)
    - [Building the Server](#building-the-server)
    - [Starting the Server](#starting-the-server)
    - [API Endpoints](#api-endpoints)
    - [Supported Parameters](#supported-parameters)
    - [Client Examples](#client-examples)
    - [Generating the API Specification](#generating-the-api-specification)
    - [Client Generation](#client-generation)
    - [API Limitations](#api-limitations)
    - [Fully Supported Features](#fully-supported-features)
- [TTSKit](#ttskit)
  - [Quick Example](#quick-example-1)
  - [Model Selection](#model-selection-1)
    - [Custom Voices](#custom-voices)
    - [Real-Time Streaming Playback](#real-time-streaming-playback)
  - [Generation Options](#generation-options)
    - [Style Instructions (1.7B only)](#style-instructions-17b-only)
  - [Saving Audio](#saving-audio)
  - [Progress Callbacks](#progress-callbacks)
  - [Swift CLI](#swift-cli-1)
  - [Demo App](#demo-app)
- [SpeakerKit](#speakerkit)
  - [Quick Example](#quick-example-2)
  - [Diarization Options](#diarization-options)
  - [Combining with Transcription](#combining-with-transcription)
  - [RTTM Output](#rttm-output)
  - [Swift CLI](#swift-cli-2)
- [Contributing \& Roadmap](#contributing--roadmap)
- [License](#license)
- [Citation](#citation)

## Installation

### Swift Package Manager

WhisperKit, TTSKit, and SpeakerKit are separate library products in the same Swift package. Add the package once and pick the products you need. You can also use the `ArgmaxOSS` umbrella product to import everything at once.

### Prerequisites

- macOS 14.0 or later.
- Xcode 16.0 or later.

### Xcode Steps

1. Open your Swift project in Xcode.
2. Navigate to `File` > `Add Package Dependencies...`.
3. Enter the package repository URL: `https://github.com/argmaxinc/argmax-oss-swift`.
4. Choose the version range or specific version.
5. When prompted to choose library products, select **ArgmaxOSS** (all kits), or individual kits: **WhisperKit**, **TTSKit**, **SpeakerKit**.

### Package.swift

Add the package dependency:

```swift
dependencies: [
    .package(url: "https://github.com/argmaxinc/argmax-oss-swift.git", from: "0.9.0"),
],
```

Then add the products you need as target dependencies:

```swift
.target(
    name: "YourApp",
    dependencies: [
        // Import everything at once:
        .product(name: "ArgmaxOSS", package: "argmax-oss-swift"),

        // Or pick individual kits:
        // .product(name: "WhisperKit", package: "argmax-oss-swift"),   // speech-to-text
        // .product(name: "TTSKit", package: "argmax-oss-swift"),       // text-to-speech
        // .product(name: "SpeakerKit", package: "argmax-oss-swift"),   // speaker diarization
    ]
),
```
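For reference, here is how the two fragments fit together in a complete manifest. This is a minimal sketch, assuming an executable target named `YourApp` and Swift tools 5.9 or later; adjust the name and platform minimums to match the kits you use.

```swift
// swift-tools-version: 5.9
// Minimal example manifest; "YourApp" is a placeholder name.
import PackageDescription

let package = Package(
    name: "YourApp",
    platforms: [
        .macOS(.v14), .iOS(.v17)  // assumption: raise these to match the kits you import
    ],
    dependencies: [
        .package(url: "https://github.com/argmaxinc/argmax-oss-swift.git", from: "0.9.0"),
    ],
    targets: [
        .executableTarget(
            name: "YourApp",
            dependencies: [
                .product(name: "ArgmaxOSS", package: "argmax-oss-swift"),
            ]
        ),
    ]
)
```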
### Homebrew

You can install the command line app using [Homebrew](https://brew.sh) by running the following command:

```bash
brew install whisperkit-cli
```

## Getting Started

To get started with WhisperKit, you need to initialize it in your project.

### Quick Example

This example demonstrates how to transcribe a local audio file:

```swift
import WhisperKit

// Initialize WhisperKit with default settings
Task {
    guard let pipe = try? await WhisperKit() else { return }
    let transcription = try? await pipe.transcribe(audioPath: "path/to/your/audio.{wav,mp3,m4a,flac}")?.text
    print(transcription ?? "No transcription produced")
}
```

### Model Selection

> [!NOTE]
> Argmax recommends `large-v3-v20240930_626MB` for maximum multilingual accuracy and `tiny` for the fastest debugging workflow.

WhisperKit automatically downloads the recommended model for the device if not specified. You can also select a specific model by passing in the model name:

```swift
let pipe = try? await WhisperKit(WhisperKitConfig(model: "large-v3-v20240930_626MB"))
```

This method also supports glob search, so you can use wildcards to select a model:

```swift
let pipe = try? await WhisperKit(WhisperKitConfig(model: "large-v3*_626MB"))
```

Note that the model search must return a single model from the source repo, otherwise an error will be thrown.

For a list of available models, see our [HuggingFace repo](https://huggingface.co/argmaxinc/whisperkit-coreml).
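Model choice is fixed at initialization, while per-call behavior is tuned through decoding options. The sketch below is a hedged example: it assumes WhisperKit's `DecodingOptions` type with `task`, `language`, `temperature`, and `wordTimestamps` fields and the `decodeOptions:` parameter of `transcribe`; verify the exact names against the package version you have pinned.

```swift
import WhisperKit

Task {
    guard let pipe = try? await WhisperKit() else { return }

    // Per-call decoding options (field names assumed per WhisperKit's
    // DecodingOptions; check your pinned version).
    let options = DecodingOptions(
        task: .transcribe,    // use .translate for any-to-English
        language: "en",       // omit to auto-detect the language
        temperature: 0.0,     // greedy decoding
        wordTimestamps: true  // include per-word timing in the result
    )

    let text = try? await pipe.transcribe(audioPath: "audio.wav", decodeOptions: options)?.text
    print(text ?? "")
}
```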
### Generating Models

WhisperKit also comes with the supporting repo [`whisperkittools`](https://github.com/argmaxinc/whisperkittools), which lets you create and deploy your own fine-tuned versions of Whisper in CoreML format to HuggingFace. Once generated, they can be loaded by simply changing the repo name to the one used to upload the model:

```swift
let config = WhisperKitConfig(model: "large-v3-v20240930_626MB", modelRepo: "username/your-model-repo")
let pipe = try? await WhisperKit(config)
```

### Swift CLI

The Swift CLI allows for quick testing and debugging outside of an Xcode project. To install it, run the following:

```bash
git clone https://github.com/argmaxinc/argmax-oss-swift.git
cd argmax-oss-swift
```

Then, set up the environment and download your desired model.

```bash
make setup
make download-model MODEL=large-v3-v20240930_626MB
```

**Note**:

1. This will download only the model specified by `MODEL` (see what's available in our [HuggingFace repo](https://huggingface.co/argmaxinc/whisperkit-coreml), where we use the prefix `openai_whisper-{MODEL}`)
2. Before running `download-model`, make sure [git-lfs](https://git-lfs.com) is installed

If you would like to download all available models to your local folder, use this command instead:

```bash
make download-models
```

You can then run them via the CLI with:

```bash
swift run argmax-cli transcribe --model-path "Models/whisperkit-coreml/openai_whisper-large-v3-v20240930_626MB" --audio-path "path/to/your/audio.{wav,mp3,m4a,flac}"
```

This should print a transcription of the audio file. If you would like to stream the audio directly from a microphone, use:

```bash
swift run argmax-cli transcribe --model-path "Models/whisperkit-coreml/openai_whisper-large-v3-v20240930_626MB" --stream
```

### Local Server

The Argmax CLI includes a local server that implements the OpenAI Audio API, allowing you to use existing OpenAI SDK clients or generate new ones. The server supports transcription and translation with **output streaming** capabilities (real-time transcription results as they're generated).

> [!NOTE]
> [Argmax Pro Local Server](https://www.argmaxinc.com/blog/argmax-local-server) provides real-time streaming transcription via a WebSocket local server that is API-compatible with cloud-based providers such as Deepgram.

#### Building the Server

```bash
# Build with server support
make build-local-server

# Or manually with the build flag
BUILD_ALL=1 swift build --product argmax-cli
```

#### Starting the Server

```bash
# Start server with default settings
BUILD_ALL=1 swift run argmax-cli serve

# Custom host and port
BUILD_ALL=1 swift run argmax-cli serve --host 0.0.0.0 --port 8080

# With specific model and verbose logging
BUILD_ALL=1 swift run argmax-cli serve --model tiny --verbose

# See all configurable parameters
BUILD_ALL=1 swift run argmax-cli serve --help
```

#### API Endpoints

- **POST** `/v1/audio/transcriptions` - Transcribe audio to text
- **POST** `/v1/audio/translations` - Translate audio to English

#### Supported Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| `file` | Audio file (wav, mp3, m4a, flac) | Required |
| `model` | Model identifier | Server default |
| `language` | Source language code | Auto-detect |
| `prompt` | Text to guide transcription | None |
| `response_format` | Output format (json, verbose_json) | verbose_json |
| `temperature` | Sampling temperature (0.0-1.0) | 0.0 |
| `timestamp_granularities[]` | Timing detail (word, segment) | segment |
| `stream` | Enable streaming | false |

#### Client Examples

**Python Client (OpenAI SDK)**

```bash
cd Examples/ServeCLIClient/Python
uv sync
python whisperkit_client.py transcribe --file audio.wav --language en
python whisperkit_client.py translate --file audio.wav
```

Quick Python example:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:50060/v1")
result = client.audio.transcriptions.create(
    file=open("audio.wav", "rb"),
    model="tiny"  # Model parameter is required
)
print(result.text)
```

**Swift Client (Generated from OpenAPI Spec, see ServeCLIClient/Swift/updateClient.sh)**

```bash
cd Examples/ServeCLIClient/Swift
swift run whisperkit-client transcribe audio.wav --language en
swift run whisperkit-client translate audio.wav
```

**CurlClient (Shell Scripts)**

```bash
cd Examples/ServeCLIClient/Curl
chmod +x *.sh
./transcribe.sh audio.wav --language en
./translate.sh audio.wav --language es
./test.sh  # Run comprehensive test suite
```
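If you want to call the server from Swift without a generated client, a plain `URLSession` multipart upload is enough. The sketch below is an unofficial example, assuming the default `localhost:50060` address shown above and a server launched with the desired model; it sends only the `file` and `model` fields and reads the top-level `text` field of the JSON response.

```swift
import Foundation

// Minimal multipart POST to the local server's /v1/audio/transcriptions
// endpoint. Unofficial sketch; assumes the default port shown above.
func transcribe(fileURL: URL, model: String) async throws -> String {
    let boundary = "Boundary-\(UUID().uuidString)"
    var request = URLRequest(url: URL(string: "http://localhost:50060/v1/audio/transcriptions")!)
    request.httpMethod = "POST"
    request.setValue("multipart/form-data; boundary=\(boundary)", forHTTPHeaderField: "Content-Type")

    var body = Data()
    // Plain form field (here: the required "model" parameter)
    body.append("--\(boundary)\r\nContent-Disposition: form-data; name=\"model\"\r\n\r\n\(model)\r\n".data(using: .utf8)!)
    // File field with the audio payload
    body.append("--\(boundary)\r\nContent-Disposition: form-data; name=\"file\"; filename=\"\(fileURL.lastPathComponent)\"\r\nContent-Type: application/octet-stream\r\n\r\n".data(using: .utf8)!)
    body.append(try Data(contentsOf: fileURL))
    body.append("\r\n--\(boundary)--\r\n".data(using: .utf8)!)
    request.httpBody = body

    let (data, _) = try await URLSession.shared.data(for: request)
    // json and verbose_json responses both carry a top-level "text" field
    let json = try JSONSerialization.jsonObject(with: data) as? [String: Any]
    return json?["text"] as? String ?? ""
}
```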
#### Generating the API Specification

The server's OpenAPI specification and code are generated from the official OpenAI API:

```bash
# Generate latest spec and server code
make generate-server
```

#### Client Generation

You can generate clients for any language using the OpenAPI specification, for example:

```bash
# Generate Python client
swift run swift-openapi-generator generate scripts/specs/localserver_openapi.yaml \
  --output-directory python-client \
  --mode client \
  --mode types

# Generate TypeScript client
npx @openapitools/openapi-generator-cli generate \
  -i scripts/specs/localserver_openapi.yaml \
  -g typescript-fetch \
  -o typescript-client
```

#### API Limitations

Compared to the official OpenAI API, the local server has these limitations:

- **Response formats**: Only `json` and `verbose_json` supported (no plain text, SRT, VTT formats)
- **Model selection**: Client must launch server with desired model via `--model` flag

#### Fully Supported Features

The local server fully supports these OpenAI API features:

- **Include parameters**: `logprobs` parameter for detailed token-level log probabilities
- **Streaming responses**: Server-Sent Events (SSE) for real-time transcription
- **Timestamp granularities**: Both `word` and `segment` level timing
- **Language detection**: Automatic language detection or manual specification
- **Temperature control**: Sampling temperature for transcription randomness
- **Prompt text**: Text guidance for transcription style and context

## TTSKit

TTSKit is an on-device text-to-speech framework built on Core ML. It runs [Qwen3-TTS](https://github.com/QwenLM/Qwen3-TTS) models entirely on Apple silicon with real-time streaming playback, no server required.

- macOS 15.0 or later.
- iOS 18.0 or later.

### Quick Example

This example demonstrates how to generate speech from text:

```swift
import TTSKit

Task {
    let tts = try await TTSKit()
    let result = try await tts.generate(text: "Hello from TTSKit!")
    print("Generated \(result.audioDuration)s of audio at \(result.sampleRate)Hz")
}
```

`TTSKit()` automatically downloads the default 0.6B model on first run. The tokenizer and CoreML models are loaded lazily on the first `generate()` call.

### Model Selection

TTSKit ships two model sizes. You can select the model by passing a variant to `TTSKitConfig`:

```swift
// Fast, runs on all platforms (~1 GB download)
let tts = try await TTSKit(TTSKitConfig(model: .qwen3TTS_0_6b))

// Higher quality, macOS only (~2.2 GB download, supports style instructions)
let tts = try await TTSKit(TTSKitConfig(model: .qwen3TTS_1_7b))
```

Models are hosted on [HuggingFace](https://huggingface.co/argmaxinc/ttskit-coreml) and cached locally after the first download.

#### Custom Voices

You can choose from 9 built-in voices and 10 languages:

```swift
let result = try await tts.generate(
    text: "こんにちは世界",
    speaker: .onoAnna,
    language: .japanese
)
```

**Voices:** `.ryan`, `.aiden`, `.onoAnna`, `.sohee`, `.eric`, `.dylan`, `.serena`, `.vivian`, `.uncleFu`

**Languages:** `.english`, `.chinese`, `.japanese`, `.korean`, `.german`, `.french`, `.russian`, `.portuguese`, `.spanish`, `.italian`

#### Real-Time Streaming Playback

`play` streams audio to the device speakers frame-by-frame as it is generated:

```swift
try await tts.play(text: "This starts playing before generation finishes.")
```

You can control how much audio is buffered before playback begins. The default `.auto` strategy measures the first generation step and pre-buffers just enough to avoid underruns:

```swift
try await tts.play(
    text: "Long passage...",
    playbackStrategy: .auto
)
```

Other strategies include `.stream` (immediate, no buffer), `.buffered(seconds:)` (fixed pre-buffer), and `.generateFirst` (generate all audio first, then play).
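For instance, a fixed pre-buffer trades a little startup latency for extra protection against underruns on slower devices. A small sketch using the `.buffered(seconds:)` strategy listed above; the two-second value is just an illustration:

```swift
// Pre-buffer two seconds of audio before playback begins,
// then keep streaming the rest as it is generated.
try await tts.play(
    text: "A long passage read on a slower device...",
    playbackStrategy: .buffered(seconds: 2.0)
)
```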
### Generation Options

You can customize sampling, chunking, and concurrency via `GenerationOptions`:

```swift
// Defaults recommended by Qwen
var options = GenerationOptions()
options.temperature = 0.9
options.topK = 50
options.repetitionPenalty = 1.05
options.maxNewTokens = 245

// Long text is automatically split at sentence boundaries
options.chunkingStrategy = .sentence
options.concurrentWorkerCount = nil  // nil = concurrency chosen automatically for the device

let result = try await tts.generate(text: longArticle, options: options)
```

#### Style Instructions (1.7B only)

The 1.7B model accepts a natural-language style instruction that controls prosody:

```swift
var options = GenerationOptions()
options.instruction = "Speak slowly and warmly, like a storyteller."

let result = try await tts.generate(
    text: "Once upon a time...",
    speaker: .ryan,
    options: options
)
```

### Saving Audio

Generated audio can be saved to WAV or M4A:

```swift
let result = try await tts.generate(text: "Save me!")
let outputDir = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask)[0]

// Save as .wav or .m4a (AAC)
try await AudioOutput.saveAudio(result.audio, toFolder: outputDir, filename: "output", format: .m4a)
```

### Progress Callbacks

You can receive per-step audio during generation. Return `false` from the callback to cancel early:

```swift
let result = try await tts.generate(text: "Hello!") { progress in
    print("Audio chunk: \(progress.audio.count) samples")
    if let stepTime = progress.stepTime {
        print("First step took \(stepTime)s")
    }
    return true  // return false to cancel
}
```
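As one use of this hook, you can cap output length by returning `false` once enough audio has accumulated. A small sketch with two stated assumptions: `veryLongText` is a placeholder, and the 24 kHz figure is an assumed output sample rate, so prefer reading the real rate from `result.sampleRate` in production code:

```swift
// Stop generation once roughly five seconds of audio exist.
// Assumes 24 kHz output; check the result metadata for the real rate.
let targetSamples = 5 * 24_000
var generated = 0
let result = try await tts.generate(text: veryLongText) { progress in
    generated += progress.audio.count
    return generated < targetSamples  // returning false cancels generation
}
print("Kept \(result.audioDuration)s of audio")
```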
### Swift CLI

The TTS command is available through the `argmax-cli` tool. You can generate speech and optionally play it back in real time:

```bash
swift run argmax-cli tts --text "Hello from the command line" --play
swift run argmax-cli tts --text "Save to file" --output-path output.wav
swift run argmax-cli tts --text "日本語テスト" --speaker ono-anna --language japanese
swift run argmax-cli tts --text-file article.txt --model 1.7b --instruction "Read cheerfully"
swift run argmax-cli tts --help
```

### Demo App

The [TTSKitExample](Examples/TTS/TTSKitExample/) example app showcases real-time streaming, model management, waveform visualization, and generation history on macOS and iOS. See the [TTSKitExample README](Examples/TTS/TTSKitExample/README.md) for build instructions.

## SpeakerKit

SpeakerKit is an on-device speaker diarization framework built on Core ML. It runs [Pyannote v4 (community-1)](https://huggingface.co/argmaxinc/speakerkit-coreml) on Apple silicon to label speakers in audio. Read the [blog post](https://www.argmaxinc.com/blog/speakerkit) for architecture details and benchmarks.

- macOS 13.0 or later.
- iOS 16.0 or later.

### Quick Example

This example demonstrates how to diarize an audio file:

```swift
import SpeakerKit

Task {
    let speakerKit = try await SpeakerKit()

    let audioArray = try AudioProcessor.loadAudioAsFloatArray(fromPath: "audio.wav")
    let result = try await speakerKit.diarize(audioArray: audioArray)

    print("Detected \(result.speakerCount) speakers")
    for segment in result.segments {
        print(segment)
    }
}
```

`SpeakerKit()` uses `PyannoteConfig()` defaults, automatically downloading models from [HuggingFace](https://huggingface.co/argmaxinc/speakerkit-coreml) on first run. The segmenter and embedder CoreML models are loaded lazily (unless `load` is set on the config) on the first `diarize()` call.

### Diarization Options

You can control speaker detection via `PyannoteDiarizationOptions`:

```swift
let audioArray = try AudioProcessor.loadAudioAsFloatArray(fromPath: "audio.wav")
let options = PyannoteDiarizationOptions(
    numberOfSpeakers: 2,               // nil = automatic detection
    clusterDistanceThreshold: 0.6,     // clustering threshold
    useExclusiveReconciliation: false  // exclusive speaker assignment per frame
)
let result = try await speakerKit.diarize(audioArray: audioArray, options: options)
```

For local models, skip the download step:

```swift
let config = PyannoteConfig(modelFolder: "/path/to/models")
let speakerKit = try await SpeakerKit(config)
```

### Combining with Transcription

SpeakerKit can merge diarization results with WhisperKit transcriptions to produce speaker-attributed segments:

```swift
import WhisperKit
import SpeakerKit

let whisperKit = try await WhisperKit()
let speakerKit = try await SpeakerKit()

let audioArray = try AudioProcessor.loadAudioAsFloatArray(fromPath: "audio.wav")
let transcription = try await whisperKit.transcribe(audioArray: audioArray)
let diarization = try await speakerKit.diarize(audioArray: audioArray)

let speakerSegments = diarization.addSpeakerInfo(to: transcription)

for group in speakerSegments {
    for segment in group {
        print("\(segment.speaker): \(segment.text)")
    }
}
```

Two strategies are available for matching speakers to transcription:
- `.subsegment` (default) -- splits segments at word gaps, then assigns speakers
- `.segment` -- assigns a speaker to each transcription segment as a whole
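The merged segments also make simple analytics easy. The sketch below totals speaking time per speaker, continuing from the loop above; `speaker` and `text` appear there, but the numeric `start` and `end` fields (in seconds) are hypothetical names, so adapt them to the actual segment type:

```swift
// Sum speaking time per speaker from the merged segments.
// "start"/"end" are assumed per-segment times in seconds (illustrative names).
var talkTime: [String: Double] = [:]
for group in speakerSegments {
    for segment in group {
        talkTime["\(segment.speaker)", default: 0] += Double(segment.end - segment.start)
    }
}
for (speaker, seconds) in talkTime.sorted(by: { $0.value > $1.value }) {
    print("\(speaker): \(seconds)s")
}
```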
### RTTM Output

Generate RTTM (Rich Transcription Time Marked) output for use with standard diarization evaluation tools:

```swift
let speakerKit = try await SpeakerKit()

let audioArray = try AudioProcessor.loadAudioAsFloatArray(fromPath: "meeting.wav")
let diarization = try await speakerKit.diarize(audioArray: audioArray)

let rttmLines = SpeakerKit.generateRTTM(from: diarization, fileName: "meeting")
for line in rttmLines {
    print(line)
}
```

### Swift CLI

The diarization commands are available through the `argmax-cli` tool:

```bash
# Standalone diarization
swift run argmax-cli diarize --audio-path audio.wav --verbose

# Save RTTM output
swift run argmax-cli diarize --audio-path audio.wav --rttm-path output.rttm

# Specify number of speakers
swift run argmax-cli diarize --audio-path audio.wav --num-speakers 3

# Transcription with diarization
swift run argmax-cli transcribe --audio-path audio.wav --diarization

# See all options
swift run argmax-cli diarize --help
```

## Contributing \& Roadmap

Our goal is to keep making this SDK better over time, and we'd love your help! Search the code for "TODO" to find features that are yet to be built. Please refer to our [contribution guidelines](CONTRIBUTING.md) for submitting issues and pull requests and for our coding standards; the guidelines also include a public roadmap of features we plan to build.

**External dependencies:** `Sources/ArgmaxCore/External/` contains a copy of [swift-transformers](https://github.com/huggingface/swift-transformers) (Hub and Tokenizers modules, v1.1.6) with Jinja-dependent code removed. When updating to a newer version, copy the fresh sources over that directory and re-apply the patches marked with `// Argmax-modification:` (`grep -r "Argmax-modification:" Sources/ArgmaxCore/External/`).

## License

Argmax OSS is released under the MIT License. See [LICENSE](LICENSE) for more details.

This project incorporates third-party software under their own license terms. See [NOTICES](NOTICES) for attributions.

## Citation

If you use this SDK for something cool or just find it useful, please drop us a note at [info@argmaxinc.com](mailto:info@argmaxinc.com)!

If you use WhisperKit for academic work, here is the BibTeX:

```bibtex
@misc{whisperkit-argmax,
   title = {Argmax OSS: WhisperKit, SpeakerKit and TTSKit},
   author = {Argmax, Inc.},
   year = {2024},
   URL = {https://github.com/argmaxinc/argmax-oss-swift}
}
```