https://github.com/compiler-inc/transcriber

A modern, Swift-native wrapper around Apple's Speech framework and SFSpeechRecognizer that provides an actor-based interface for speech recognition with automatic silence detection and custom language model support.
actor asr ios macos sfspeechrecognizer swift

Host: GitHub
URL: https://github.com/compiler-inc/transcriber
Owner: Compiler-Inc
Created: 2025-02-19T23:45:49.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-03-30T00:53:48.000Z (about 1 year ago)
Last Synced: 2025-06-03T09:17:07.296Z (about 1 year ago)
Topics: actor, asr, ios, macos, sfspeechrecognizer, swift
Language: Swift
Homepage:
Size: 112 KB
Stars: 14
Watchers: 4
Forks: 0
Open Issues: 3
Metadata Files:
- Readme: README.md

# Transcriber

A modern, Swift-native wrapper around Apple's `Speech` framework and `SFSpeechRecognizer` that provides an actor-based interface for speech recognition with automatic silence detection and custom language model support.

## Features

- ✨ Modern Swift concurrency with async/await

- 🔒 Thread-safe actor-based design

- 🎯 Automatic silence detection using RMS power analysis

- 🔊 Support for custom language models

- 📱 Works across iOS, macOS, and other Apple platforms

- 💻 SwiftUI-ready with MVVM support

- 🔍 Comprehensive error handling

- 📊 Debug logging support

## Requirements

- iOS 17.0+ / macOS 14.0+

- Swift 5.9+

- Xcode 15.0+

## Installation

### Swift Package Manager

Add the following to your `Package.swift` file:

```swift
dependencies: [
    .package(url: "https://github.com/Compiler-Inc/Transcriber.git", from: "0.1.1")
]
```
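For context, the fragment above might sit inside a complete manifest like the sketch below. The package and target name `MyApp` are placeholders, and the product name `Transcriber` is an assumption based on the repository name; check the package's own `Package.swift` if resolution fails.

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "MyApp", // hypothetical package name
    platforms: [.iOS(.v17), .macOS(.v14)], // matches the requirements listed above
    dependencies: [
        .package(url: "https://github.com/Compiler-Inc/Transcriber.git", from: "0.1.1")
    ],
    targets: [
        .target(
            name: "MyApp", // hypothetical target name
            dependencies: [
                // Assumption: the library product is named "Transcriber"
                .product(name: "Transcriber", package: "Transcriber")
            ]
        )
    ]
)
```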

Or in Xcode:

1. File > Add Package Dependencies...

2. Enter `https://github.com/Compiler-Inc/Transcriber.git`

3. Select "Up to Next Major Version" with "0.1.1"

### Privacy Keys

The service requires microphone and speech recognition access. Add these keys to your `Info.plist`:

```xml
<key>NSMicrophoneUsageDescription</key>
<string>We need microphone access to transcribe your speech.</string>
<key>NSSpeechRecognitionUsageDescription</key>
<string>We need speech recognition to convert your voice to text.</string>
```

Or in Xcode:

1. Select your project in the sidebar

2. Select your target

3. Select the "Info" tab

4. Add `Privacy - Microphone Usage Description` and `Privacy - Speech Recognition Usage Description`

## Usage

### Basic Implementation

The simplest way to use the service is with the default configuration:

```swift
func startRecording() async throws {
    // Initialize with default configuration
    let transcriber = Transcriber()

    // Request authorization
    let status = await transcriber.requestAuthorization()
    guard status == .authorized else {
        throw TranscriberError.notAuthorized
    }

    // Start recording and receive transcriptions
    let stream = try await transcriber.startStream()
    for try await transcription in stream {
        print("Transcribed text: \(transcription)")
    }
}
```

### Configuration Options

To customize the service, define your own `TranscriberConfiguration`:

```swift
let myConfig = TranscriberConfiguration(
    appIdentifier: "com.myapp.speech",
    locale: .current,                       // Recognition language
    silenceThreshold: 0.01,                 // RMS power threshold (0.0 to 1.0)
    silenceDuration: 2,                     // Duration of silence before stopping
    languageModelInfo: nil,                 // For domain-specific recognition
    requiresOnDeviceRecognition: false,     // Force on-device processing
    shouldReportPartialResults: true,       // Get results as they're processed
    contextualStrings: ["Custom", "Words"], // Improve recognition of specific terms
    taskHint: .unspecified,                 // Optimize for specific speech types
    addsPunctuation: true                   // Automatic punctuation
)
```

### Using in SwiftUI

For SwiftUI applications, we provide a protocol-based MVVM pattern:

```swift
import Speech
import SwiftUI
import Transcriber

// 1. Create your view model
@Observable
@MainActor
class MyViewModel: Transcribable {
    public var isRecording = false
    public var transcribedText = ""
    public var rmsLevel: Float = 0
    public var authStatus: SFSpeechRecognizerAuthorizationStatus = .notDetermined
    public var error: Error?

    public let transcriber: Transcriber?
    // The task body handles its own errors, so it never throws
    private var recordingTask: Task<Void, Never>?

    init() {
        self.transcriber = Transcriber()
    }

    // Required protocol methods
    public func requestAuthorization() async throws {
        guard let transcriber else {
            throw TranscriberError.noRecognizer
        }
        authStatus = await transcriber.requestAuthorization()
        guard authStatus == .authorized else {
            throw TranscriberError.notAuthorized
        }
    }

    public func toggleRecording() {
        guard let transcriber else {
            error = TranscriberError.noRecognizer
            return
        }

        if isRecording {
            // Stop: cancel the in-flight recording task
            recordingTask?.cancel()
            recordingTask = nil
            isRecording = false
        } else {
            // Start: consume signals until the stream ends or fails
            recordingTask = Task {
                do {
                    isRecording = true
                    let stream = try await transcriber.startRecordingStream()

                    for try await signal in stream {
                        switch signal {
                        case .rms(let float):
                            rmsLevel = float
                        case .transcription(let string):
                            transcribedText = string
                        }
                    }

                    isRecording = false
                } catch {
                    self.error = error
                    isRecording = false
                }
            }
        }
    }
}

// 2. Use in your SwiftUI view
struct MySpeechView: View {
    @State private var viewModel = MyViewModel()

    var body: some View {
        VStack {
            Text(viewModel.transcribedText)
            Button(viewModel.isRecording ? "Stop" : "Start") {
                viewModel.toggleRecording()
            }
            .disabled(viewModel.authStatus != .authorized)

            if let error = viewModel.error {
                Text(error.localizedDescription)
                    .foregroundColor(.red)
            }
        }
        .task {
            try? await viewModel.requestAuthorization()
        }
    }
}
```
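The `for try await` loop in the view model consumes an asynchronous stream of signals. The sketch below illustrates that general pattern with a mock that emits canned signals instead of using the microphone, which can be handy for previews or unit tests. `Signal`, `MockRecognizer`, and `collect(from:)` are hypothetical names for illustration only; they are not part of Transcriber, and the library's own stream may be built differently.

```swift
import Foundation

/// Hypothetical signal type mirroring the `.rms` / `.transcription` cases used above.
enum Signal: Sendable, Equatable {
    case rms(Float)
    case transcription(String)
}

/// Hypothetical stand-in for a recognizer: replays a fixed script of signals.
actor MockRecognizer {
    private let script: [Signal]

    init(script: [Signal]) {
        self.script = script
    }

    func startRecordingStream() -> AsyncThrowingStream<Signal, Error> {
        let script = self.script
        return AsyncThrowingStream { continuation in
            // Produce signals from a child task so the consumer can cancel it
            let producer = Task {
                for signal in script {
                    if Task.isCancelled { break }
                    continuation.yield(signal)
                }
                continuation.finish()
            }
            // Stop producing when the consumer stops iterating
            continuation.onTermination = { _ in producer.cancel() }
        }
    }
}

/// Drains the stream, keeping the latest transcription and the peak RMS level.
func collect(from recognizer: MockRecognizer) async throws -> (text: String, peak: Float) {
    var text = ""
    var peak: Float = 0
    let stream = await recognizer.startRecordingStream()
    for try await signal in stream {
        switch signal {
        case .rms(let level):
            peak = max(peak, level)
        case .transcription(let string):
            text = string
        }
    }
    return (text, peak)
}
```

Because each partial transcription replaces the previous one, only the most recent string needs to be kept, which is the same choice the view model makes when it assigns `transcribedText`.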

## Advanced Features

### Debug Logging

Enable detailed logging for debugging:

```swift

let transcriber = Transcriber(debugLogging: true)

```

### Custom Language Models

Support for custom language models with version tracking:

```swift
let model = LanguageModelInfo(url: modelURL, version: "2.0-beta")
let config = TranscriberConfiguration(languageModelInfo: model)
```

You can build `SFCustomLanguageModelData` models with our [SpeechModelBuilder CLI Tool](https://github.com/Compiler-Inc/SpeechModelBuilder).

### Silence Detection

Automatic silence detection using RMS power analysis with configurable threshold and duration:

```swift
let sensitiveConfig = TranscriberConfiguration(
    silenceThreshold: 0.001,  // Very sensitive
    silenceDuration: 2.0      // Longer confirmation
    // ... other parameters
)
```
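To make the threshold and duration parameters concrete, here is a minimal sketch of how RMS-based silence detection can work in general. It is not Transcriber's actual implementation: `rmsPower(of:)` and `SilenceDetector` are hypothetical names, and the sketch assumes audio arrives as buffers of `Float` samples in the range -1.0 to 1.0 with a known duration per buffer.

```swift
import Foundation

/// Root-mean-square power of a buffer of samples (0 for an empty buffer).
func rmsPower(of samples: [Float]) -> Float {
    guard !samples.isEmpty else { return 0 }
    let sumOfSquares = samples.reduce(Float(0)) { $0 + $1 * $1 }
    return (sumOfSquares / Float(samples.count)).squareRoot()
}

/// Hypothetical helper: tracks how long the input has stayed below a threshold.
struct SilenceDetector {
    let threshold: Float
    let duration: TimeInterval
    private var silentTime: TimeInterval = 0

    init(threshold: Float, duration: TimeInterval) {
        self.threshold = threshold
        self.duration = duration
    }

    /// Feeds one buffer and returns true once silence has lasted at least `duration`.
    mutating func process(_ samples: [Float], bufferDuration: TimeInterval) -> Bool {
        if rmsPower(of: samples) < threshold {
            silentTime += bufferDuration   // still quiet: accumulate
        } else {
            silentTime = 0                 // any loud buffer resets the timer
        }
        return silentTime >= duration
    }
}
```

In this model, a lower threshold means quieter input still counts as speech, so recording stops less readily, while a longer duration requires a longer pause before stopping.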

## License

This project is licensed under the MIT License.

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
