https://github.com/compiler-inc/transcriber
A modern, Swift-native wrapper around Apple's Speech framework and SFSpeechRecognizer that provides an actor-based interface for speech recognition with automatic silence detection and custom language model support.
- Host: GitHub
- URL: https://github.com/compiler-inc/transcriber
- Owner: Compiler-Inc
- Created: 2025-02-19T23:45:49.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-03-30T00:53:48.000Z (about 1 year ago)
- Last Synced: 2025-06-03T09:17:07.296Z (about 1 year ago)
- Topics: actor, asr, ios, macos, sfspeechrecognizer, swift
- Language: Swift
- Homepage:
- Size: 112 KB
- Stars: 14
- Watchers: 4
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.md
README
# Transcriber
A modern, Swift-native wrapper around Apple's `Speech` framework and `SFSpeechRecognizer` that provides an actor-based interface for speech recognition with automatic silence detection and custom language model support.
## Features
- ✨ Modern Swift concurrency with async/await
- 🔒 Thread-safe actor-based design
- 🎯 Automatic silence detection using RMS power analysis
- 🔊 Support for custom language models
- 📱 Works across iOS, macOS, and other Apple platforms
- 💻 SwiftUI-ready with MVVM support
- 🔍 Comprehensive error handling
- 📊 Debug logging support
## Requirements
- iOS 17.0+ / macOS 14.0+
- Swift 5.9+
- Xcode 15.0+
## Installation
### Swift Package Manager
Add the following to your `Package.swift` file:
```swift
dependencies: [
    .package(url: "https://github.com/Compiler-Inc/Transcriber.git", from: "0.1.1")
]
```
Or in Xcode:
1. File > Add Packages...
2. Enter `https://github.com/Compiler-Inc/Transcriber.git`
3. Select "Up to Next Major Version" with "0.1.1"
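When the dependency is consumed from another Swift package, it also has to be listed on the target that imports it. The manifest below is a minimal sketch: the package and target name `MyApp` are placeholders, and the product name `Transcriber` is an assumption based on the repository name, so check the library's own `Package.swift` if resolution fails.

```swift
// swift-tools-version: 5.9
import PackageDescription

let package = Package(
    name: "MyApp", // placeholder package name
    platforms: [.iOS(.v17), .macOS(.v14)], // matches the Requirements section
    dependencies: [
        .package(url: "https://github.com/Compiler-Inc/Transcriber.git", from: "0.1.1")
    ],
    targets: [
        .target(
            name: "MyApp", // placeholder target name
            dependencies: [
                // Assumed product name; verify against the library's manifest.
                .product(name: "Transcriber", package: "Transcriber")
            ]
        )
    ]
)
```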
### Privacy Keys
The service requires microphone and speech recognition access. Add these keys to your `Info.plist`:
```xml
<key>NSMicrophoneUsageDescription</key>
<string>We need microphone access to transcribe your speech.</string>
<key>NSSpeechRecognitionUsageDescription</key>
<string>We need speech recognition to convert your voice to text.</string>
```
Or in Xcode:
1. Select your project in the sidebar
2. Select your target
3. Select the "Info" tab
4. Add `Privacy - Microphone Usage Description` and `Privacy - Speech Recognition Usage Description`
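Apple platforms terminate an app that touches the microphone or speech recognition without the matching usage description, so it can be worth checking for the keys during development. The helper below is a hypothetical sketch, not part of the library: it takes the contents of an `Info.plist` as a dictionary and reports which of the two keys are missing or empty.

```swift
import Foundation

/// Usage-description keys listed in the Privacy Keys section.
let requiredPrivacyKeys = [
    "NSMicrophoneUsageDescription",
    "NSSpeechRecognitionUsageDescription",
]

/// Returns the required keys that are absent or empty (hypothetical helper).
func missingPrivacyKeys(in info: [String: Any]) -> [String] {
    requiredPrivacyKeys.filter { key in
        guard let value = info[key] as? String else { return true }
        return value.isEmpty
    }
}

// In an app, pass the main bundle's Info.plist contents.
let missing = missingPrivacyKeys(in: Bundle.main.infoDictionary ?? [:])
if !missing.isEmpty {
    print("Missing Info.plist keys: \(missing.joined(separator: ", "))")
}
```

Taking the dictionary as a parameter, rather than reading `Bundle.main` inside the function, keeps the check easy to exercise in unit tests.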
## Usage
### Basic Implementation
The simplest way to use the service is with the default configuration:
```swift
func startRecording() async throws {
    // Initialize with default configuration
    let transcriber = Transcriber()

    // Request authorization
    let status = await transcriber.requestAuthorization()
    guard status == .authorized else {
        throw TranscriberError.notAuthorized
    }

    // Start recording and receive transcriptions
    let stream = try await transcriber.startStream()
    for try await transcription in stream {
        print("Transcribed text: \(transcription)")
    }
}
```
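The `for try await` loop above runs until the stream finishes or throws. The sketch below illustrates that consumption pattern with a stand-in stream, so it runs without a microphone. `mockTranscriptionStream` and `collectFinalTranscript` are hypothetical helpers, and the sketch assumes each element carries the full text recognized so far, as the assignment (rather than append) in the SwiftUI example suggests.

```swift
// Stand-in for a transcription stream (hypothetical, not part of the library):
// emits cumulative partial results, then finishes.
func mockTranscriptionStream() -> AsyncThrowingStream<String, Error> {
    AsyncThrowingStream { continuation in
        for partial in ["Hello", "Hello wor", "Hello world"] {
            continuation.yield(partial)
        }
        continuation.finish()
    }
}

// Keeps only the most recent element; the loop exits when the stream finishes.
func collectFinalTranscript(
    from stream: AsyncThrowingStream<String, Error>
) async throws -> String {
    var latest = ""
    for try await partial in stream {
        latest = partial
    }
    return latest
}
```

If partial results are cumulative as assumed, the last element received is the complete transcript, so there is no need to concatenate elements.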
### Configuration Options
Configure the service by defining your own `TranscriberConfiguration`:
```swift
let myConfig = TranscriberConfiguration(
    appIdentifier: "com.myapp.speech",
    locale: .current,                       // Recognition language
    silenceThreshold: 0.01,                 // RMS power threshold (0.0 to 1.0)
    silenceDuration: 2,                     // Seconds of silence before stopping
    languageModelInfo: nil,                 // For domain-specific recognition
    requiresOnDeviceRecognition: false,     // Force on-device processing
    shouldReportPartialResults: true,       // Get results as they're processed
    contextualStrings: ["Custom", "Words"], // Improve recognition of specific terms
    taskHint: .unspecified,                 // Optimize for specific speech types
    addsPunctuation: true                   // Automatic punctuation
)
```
### Using in SwiftUI
For SwiftUI applications, we provide a protocol-based MVVM pattern:
```swift
import Speech
import SwiftUI
import Transcriber

// 1. Create your view model
@Observable
@MainActor
class MyViewModel: Transcribable {
    public var isRecording = false
    public var transcribedText = ""
    public var rmsLevel: Float = 0
    public var authStatus: SFSpeechRecognizerAuthorizationStatus = .notDetermined
    public var error: Error?

    public let transcriber: Transcriber?
    // The task handles its own errors, so its failure type is Never
    private var recordingTask: Task<Void, Never>?

    init() {
        self.transcriber = Transcriber()
    }

    // Required protocol methods
    public func requestAuthorization() async throws {
        guard let transcriber else {
            throw TranscriberError.noRecognizer
        }
        authStatus = await transcriber.requestAuthorization()
        guard authStatus == .authorized else {
            throw TranscriberError.notAuthorized
        }
    }

    public func toggleRecording() {
        guard let transcriber else {
            error = TranscriberError.noRecognizer
            return
        }

        if isRecording {
            // Cancelling the task stops the recording
            recordingTask?.cancel()
            recordingTask = nil
            isRecording = false
        } else {
            recordingTask = Task {
                do {
                    isRecording = true
                    let stream = try await transcriber.startRecordingStream()
                    for try await signal in stream {
                        switch signal {
                        case .rms(let float):
                            rmsLevel = float
                        case .transcription(let string):
                            transcribedText = string
                        }
                    }
                    isRecording = false
                } catch {
                    self.error = error
                    isRecording = false
                }
            }
        }
    }
}

// 2. Use in your SwiftUI view
struct MySpeechView: View {
    @State private var viewModel = MyViewModel()

    var body: some View {
        VStack {
            Text(viewModel.transcribedText)

            Button(viewModel.isRecording ? "Stop" : "Start") {
                viewModel.toggleRecording()
            }
            .disabled(viewModel.authStatus != .authorized)

            if let error = viewModel.error {
                Text(error.localizedDescription)
                    .foregroundColor(.red)
            }
        }
        .task {
            try? await viewModel.requestAuthorization()
        }
    }
}
```
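The view model above publishes `rmsLevel`, but the view does not display it. One way to drive a level meter from it is to map the linear value onto a decibel scale, since raw RMS values for speech cluster near zero. The function below is a sketch under stated assumptions: `meterLevel(forRMS:floorDecibels:)` is a hypothetical helper, the input is assumed to be a linear RMS in 0.0 to 1.0, and the -60 dB floor is an arbitrary choice.

```swift
import Foundation

/// Maps a linear RMS value (assumed 0.0...1.0) to a 0.0...1.0 meter position
/// on a decibel scale. Values at or below the floor map to 0.
func meterLevel(forRMS rms: Float, floorDecibels: Float = -60) -> Float {
    guard rms > 0 else { return 0 }
    let decibels = 20 * log10(rms)                     // 1.0 -> 0 dB, 0.001 -> -60 dB
    let clamped = min(max(decibels, floorDecibels), 0) // keep within floor...0 dB
    return 1 - clamped / floorDecibels
}
```

In the view, the result could feed something like `ProgressView(value: meterLevel(forRMS: viewModel.rmsLevel))`.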
## Advanced Features
### Debug Logging
Enable detailed logging for debugging:
```swift
let transcriber = Transcriber(debugLogging: true)
```
### Custom Language Models
Support for custom language models with version tracking:
```swift
let model = LanguageModelInfo(url: modelURL, version: "2.0-beta")
let config = TranscriberConfiguration(languageModelInfo: model)
```
You can build `SFCustomLanguageModelData` models with our [SpeechModelBuilder CLI Tool](https://github.com/Compiler-Inc/SpeechModelBuilder).
### Silence Detection
Automatic silence detection using RMS power analysis with configurable threshold and duration:
```swift
let sensitiveConfig = TranscriberConfiguration(
    silenceThreshold: 0.001, // Very sensitive
    silenceDuration: 2.0     // Longer confirmation, in seconds
    // ... other parameters
)
```
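To make the threshold and duration settings concrete, here is a minimal sketch of RMS-based silence detection. It is not the library's implementation: `rmsLevel(of:)` and `SilenceDetector` are hypothetical names, and samples are assumed to be floats in the range -1.0 to 1.0. A buffer counts as silent when its RMS falls below the threshold, and silence is confirmed once consecutive silent buffers add up to the required duration.

```swift
import Foundation

/// Root-mean-square level of a buffer of samples (assumed -1.0...1.0).
func rmsLevel(of samples: [Float]) -> Float {
    guard !samples.isEmpty else { return 0 }
    let sumOfSquares = samples.reduce(Float(0)) { $0 + $1 * $1 }
    return (sumOfSquares / Float(samples.count)).squareRoot()
}

/// Tracks how long the signal has stayed below a threshold (hypothetical type).
struct SilenceDetector {
    let threshold: Float
    let requiredDuration: TimeInterval
    private(set) var silentTime: TimeInterval = 0

    init(threshold: Float, requiredDuration: TimeInterval) {
        self.threshold = threshold
        self.requiredDuration = requiredDuration
    }

    /// Feeds one buffer and its length in seconds.
    /// Returns true once silence has lasted at least `requiredDuration`.
    mutating func process(_ samples: [Float], bufferDuration: TimeInterval) -> Bool {
        if rmsLevel(of: samples) < threshold {
            silentTime += bufferDuration
        } else {
            silentTime = 0 // any loud buffer resets the timer
        }
        return silentTime >= requiredDuration
    }
}
```

In this sketch, a lower threshold means quieter input still counts as speech, and a longer duration tolerates longer pauses before recording stops.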
## License
This project is licensed under the MIT License.
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.