https://github.com/tornikegomareli/talkify

🦸🏻‍♂️🎺 Talkify is a comprehensive, cross-platform Swift library for adding advanced speech features to your applications. It efficiently manages voice-to-text and text-to-speech capabilities using the power of AVFoundation and Speech frameworks.
Host: GitHub
URL: https://github.com/tornikegomareli/talkify
Owner: tornikegomareli
License: mit
Created: 2023-08-05T08:00:51.000Z (almost 3 years ago)
Default Branch: main
Last Pushed: 2023-08-08T17:21:23.000Z (almost 3 years ago)
Last Synced: 2025-07-08T22:45:03.450Z (12 months ago)
Topics: audiototext, avfoundation, ios, macos, swift, synthesizer, texttospeech
Language: Swift
Homepage:
Size: 68.4 KB
Stars: 19
Watchers: 1
Forks: 2
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

# Talkify: Swift Speech Recognition and Synthesis Library for iOS & macOS

Talkify is a Swift library designed to streamline the process of integrating speech recognition and synthesis capabilities into iOS and macOS applications. The library harnesses the power of native APIs such as SFSpeechRecognizer and AVSpeechSynthesizer, providing a high-level interface that simplifies their usage and handles common tasks, such as managing audio sessions and checking microphone permissions.

The primary component is the Talkify class. This class provides a comprehensive set of methods for managing speech recognition tasks. It establishes and manages an AVAudioEngine instance for audio operations, handles speech recognition requests and tasks, and provides delegate methods to keep your application informed about the status of speech recognition processes. It also integrates with TalkifyRecordingSession to facilitate the audio recording process.

## Requirements

- Swift 5.0 or higher

- iOS 13.0 or higher

- macOS 10.15 or higher

- SPM

## Supported Languages

The following languages are supported for both text-to-speech and speech-to-text:

| Language                         | Flag |
|----------------------------------|------|
| English (Australia)              | 🇦🇺   |
| English (United Kingdom)         | 🇬🇧   |
| English (United States)          | 🇺🇸   |
| English (Ireland)                | 🇮🇪   |
| English (South Africa)           | 🇿🇦   |
| 中文(中国)                        | 🇨🇳   |
| 中文(香港)                        | 🇭🇰   |
| 中文(台灣)                        | 🇹🇼   |
| Nederlands (België)              | 🇧🇪   |
| Nederlands (Nederland)           | 🇳🇱   |
| Français (Canada)                | 🇨🇦   |
| Français (France)                | 🇫🇷   |
| Deutsch (Deutschland)            | 🇩🇪   |
| Deutsch (Österreich)             | 🇦🇹   |
| Deutsch (Schweiz)                | 🇨🇭   |
| Italiano (Italia)                | 🇮🇹   |
| 日本語 (日本)                     | 🇯🇵   |
| 한국어 (대한민국)                  | 🇰🇷   |
| Norsk (Norge)                    | 🇳🇴   |
| Polski (Polska)                  | 🇵🇱   |
| Português (Brasil)               | 🇧🇷   |
| Português (Portugal)             | 🇵🇹   |
| Română (România)                 | 🇷🇴   |
| Русский (Россия)                 | 🇷🇺   |
| Slovenčina (Slovenská republika) | 🇸🇰   |
| Español (Argentina)              | 🇦🇷   |
| Español (México)                 | 🇲🇽   |
| Español (España)                 | 🇪🇸   |
| Español (Estados Unidos)         | 🇺🇸   |
| Svenska (Sverige)                | 🇸🇪   |
| ไทย (ประเทศไทย)                  | 🇹🇭   |
| Türkçe (Türkiye)                 | 🇹🇷   |

## Supported Voices

| Language | Voices |
|----------|--------|
| Arabic | Maged |
| Bulgarian | Daria |
| Catalan | Montserrat |
| Czech | Zuzana |
| Danish | Sara |
| German | Anna |
| Greek | Melina |
| Australian English | Karen |
| British English | Daniel |
| Irish English | Moira |
| Indian English | Rishi |
| US English | Samantha, Whisper, Princess, Bells, Organ, BadNews, Bubbles, Junior, Bahh, Deranged, Boing, GoodNews, Zarvox, Ralph, Cellos, Kathy, Fred |
| South African English | Tessa, Trinoids, Albert, Hysterical |
| Spanish | Monica (Neutral), Paulina (Mexican) |
| Finnish | Satu |
| French | Amelie (Canadian), Thomas |
| Hebrew | Carmit |
| Hindi | Lekha |
| Croatian | Lana |
| Hungarian | Mariska |
| Indonesian | Damayanti |
| Italian | Alice |
| Japanese | Kyoko |
| Korean | Yuna |
| Malay | Amira |
| Norwegian | Nora |
| Dutch | Ellen (Belgium), Xander (Netherlands) |
| Polish | Zosia |
| Portuguese | Luciana (Brazil), Joana (Portugal) |
| Romanian | Ioana |
| Russian | Milena |
| Slovak | Laura |
| Swedish | Alva |
| Thai | Kanya |
| Turkish | Yelda |
| Ukrainian | Lesya |
| Vietnamese | Linh |
| Chinese | Tingting (China), Sinji (Hong Kong), Meijia (Taiwan) |
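
Which of these voices are actually installed varies by device and OS version. As a rough sketch that does not use Talkify's own API, the system list can be inspected directly through `AVSpeechSynthesisVoice.speechVoices()`. The `groupVoices` helper is a hypothetical name introduced here for illustration, and the AVFoundation part is guarded so the snippet still compiles where that framework is unavailable.

```swift
import Foundation
#if canImport(AVFoundation)
import AVFoundation
#endif

/// Groups (language, name) pairs into sorted voice names per language code.
/// Hypothetical helper, not part of Talkify.
func groupVoices(_ voices: [(language: String, name: String)]) -> [String: [String]] {
    var grouped: [String: [String]] = [:]
    for voice in voices {
        grouped[voice.language, default: []].append(voice.name)
    }
    return grouped.mapValues { $0.sorted() }
}

#if canImport(AVFoundation)
// speechVoices() returns the voices currently installed on this device.
let installed = AVSpeechSynthesisVoice.speechVoices()
    .map { (language: $0.language, name: $0.name) }

// Print one line per language code, for example "ja-JP: Kyoko".
for (language, names) in groupVoices(installed).sorted(by: { $0.key < $1.key }) {
    print("\(language): \(names.joined(separator: ", "))")
}
#endif
```

Comparing this output with the table is a quick way to see whether a voice needs to be downloaded in the system settings before it can be used.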

## Features

- [x] Text-to-speech in multiple languages with a choice of voice models.

- [x] Speech-to-text: listens to your voice and returns text, based on your setup.

- [x] Retrieve the list of all available voices programmatically.

- [x] An ergonomic API.

- [x] Dedicated delegates to track recording, speaking, and reading states on your side.

- [ ] RxSwift, Combine, and TCA support

## Installation

Talkify is available through the [Swift Package Manager](https://swift.org/package-manager/). 

### Swift Package Manager

To integrate Talkify into your project using SPM, you can add the package dependency to your `Package.swift` file:

```swift
dependencies: [
    .package(url: "https://github.com/tornikegomareli/Talkify.git", .upToNextMajor(from: "0.1.0"))
]
```
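
For context, here is a minimal sketch of a complete manifest built around that dependency entry. The package and target name `MyApp` is a placeholder, and the product name `Talkify` is an assumption based on the repository name; check the library's own `Package.swift` if resolution fails.

```swift
// swift-tools-version:5.5
import PackageDescription

let package = Package(
    name: "MyApp", // hypothetical package name
    platforms: [
        // Minimum versions taken from the Requirements section
        .iOS(.v13),
        .macOS(.v10_15)
    ],
    dependencies: [
        .package(url: "https://github.com/tornikegomareli/Talkify.git", .upToNextMajor(from: "0.1.0"))
    ],
    targets: [
        .target(
            name: "MyApp", // hypothetical target name
            dependencies: [
                // Assumption: the library product is named "Talkify"
                .product(name: "Talkify", package: "Talkify")
            ]
        )
    ]
)
```

In an Xcode app project, the same URL can be added through File > Add Packages instead of editing a manifest by hand.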

## Prerequisites

Before you start using Talkify, there are a few setup steps you need to ensure:

### 1. Permissions in Info.plist

To use the recording features of Talkify, you need to request microphone access. Additionally, for speech recognition, you must request speech recognition authorization. Add the following keys to your `Info.plist`:

```xml
<key>NSMicrophoneUsageDescription</key>
<string>We need access to the microphone to record your voice.</string>
<key>NSSpeechRecognitionUsageDescription</key>
<string>We need access to speech recognition to convert your voice into text.</string>
```
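
Talkify is described as handling permission checks itself, so the following is only a sketch of the underlying system calls that these two `Info.plist` keys enable. `canStartRecognition` and `requestSpeechPermissions` are hypothetical helper names, not part of Talkify; `SFSpeechRecognizer.requestAuthorization` and `AVCaptureDevice.requestAccess(for:)` are the standard Apple APIs.

```swift
import Foundation
#if canImport(Speech) && canImport(AVFoundation)
import Speech
import AVFoundation
#endif

/// Recording with recognition needs both permissions to be granted.
/// Hypothetical helper, not part of Talkify.
func canStartRecognition(microphoneGranted: Bool, speechAuthorized: Bool) -> Bool {
    return microphoneGranted && speechAuthorized
}

#if canImport(Speech) && canImport(AVFoundation)
/// Asks for speech recognition access, then microphone access, and reports the combined result.
/// The completion handler may run on a background queue; hop to the main queue before touching UI.
func requestSpeechPermissions(completion: @escaping (Bool) -> Void) {
    SFSpeechRecognizer.requestAuthorization { status in
        let speechAuthorized = (status == .authorized)
        AVCaptureDevice.requestAccess(for: .audio) { granted in
            completion(canStartRecognition(microphoneGranted: granted,
                                           speechAuthorized: speechAuthorized))
        }
    }
}
#endif
```

If either key is missing from `Info.plist`, the system terminates the app when the corresponding request is made, so it is worth adding both before the first call.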

### 2. Enabling Audio Input (macOS Only)

For macOS users:

1. Open your Xcode project.

2. Navigate to the "Signing & Capabilities" tab.

3. In the "Resource Access" section, ensure that "Audio Input" is selected. This allows recording of audio using the built-in microphone and grants access to audio inputs using any Core Audio API that supports audio input.

This step is not required for iOS.

## How to Use

The `Talkify` class provides a high-level API for managing speech synthesis, recognition tasks and reading text with different voices. 

Here's a guide on how to use it:

### 1. Setup

To start with, you'll need to initialize a `Talkify` instance:

```swift
let talkify = Talkify()
```

Set up the delegates:

```swift
talkify.recordingDelegate = self
talkify.speakingDelegate = self
```

Your class should then conform to the `TalkifyRecordingDelegate` and `TalkifySpeakingDelegate` protocols and implement their respective methods.

### 2. Recording Voice 

Set up the recorder before you start recording:

```swift
talkify
  .setupRecording()
  .startRecording()
```

You can stop recording programmatically with:

```swift
talkify
  .stopRecording()
```

The recognized text will be available through the `recordingDidFinishWithResults(text:)` delegate method.
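
To illustrate the callback flow, here is a self-contained sketch of a delegate that collects transcripts. `RecordingDelegateSketch` is a hypothetical stand-in declared locally so the snippet compiles on its own; only the `recordingDidFinishWithResults(text:)` method name comes from this README, and the real `TalkifyRecordingDelegate` may require additional methods.

```swift
import Foundation

// Hypothetical stand-in for TalkifyRecordingDelegate, declared here for illustration only.
protocol RecordingDelegateSketch: AnyObject {
    func recordingDidFinishWithResults(text: String)
}

/// Collects every non-empty transcript delivered through the delegate callback.
final class TranscriptStore: RecordingDelegateSketch {
    private(set) var transcripts: [String] = []

    func recordingDidFinishWithResults(text: String) {
        let trimmed = text.trimmingCharacters(in: .whitespacesAndNewlines)
        guard !trimmed.isEmpty else { return } // ignore empty results
        transcripts.append(trimmed)
    }
}

let store = TranscriptStore()
// In an app, the library would invoke this after recording stops; here it is called by hand.
store.recordingDidFinishWithResults(text: "  Hello, this is Talkify!  ")
print(store.transcripts)
```

In a real project, the class would conform to `TalkifyRecordingDelegate` instead and be assigned to `talkify.recordingDelegate`.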

### 3. Speech Synthesis

To speak text, you first need to set up a speaker.

Initialize the `TalkifySpeaker`:

```swift

let speaker = TalkifySpeaker()

```

Customizing Voice:

```swift

speaker.withVoice(customVoice: .kyoko) // Sets the voice to Kyoko (Japanese Female voice)

```

Customizing Voice Rate:

This adjusts the speed at which the text is spoken. The value typically ranges from 0.0 (slowest) to 1.0 (fastest), with 0.5 being the default rate.

```swift

speaker.withVoiceRate(value: 0.7) // Sets a faster speaking rate

```

Customizing Pitch Multiplier:

This adjusts the pitch of the synthesized voice. A value of 1.0 means a regular pitch. Values above or below this can be used to raise or lower the pitch, respectively.

```swift

speaker.withMultiplier(value: 1.2) // Raises the pitch slightly

```

Customizing Volume:

This adjusts the volume of the synthesized voice, with 1.0 being the loudest and 0.0 being muted.

```swift

speaker.withVolume(value: 0.8) // Slightly quieter than the default volume

```
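
The README does not say whether out-of-range values are clamped, so one option is to validate them before they reach the speaker. The sketch below is an assumption-laden example: `SpeechSettings` is a hypothetical type, and the ranges are those documented for `AVSpeechUtterance` (rate 0.0 to 1.0, pitch multiplier 0.5 to 2.0, volume 0.0 to 1.0), which Talkify presumably forwards to.

```swift
import Foundation

/// Hypothetical value type that keeps speech parameters inside the AVSpeechUtterance ranges.
struct SpeechSettings {
    let rate: Float    // 0.0 ... 1.0, default 0.5
    let pitch: Float   // 0.5 ... 2.0, default 1.0
    let volume: Float  // 0.0 ... 1.0, default 1.0

    init(rate: Float = 0.5, pitch: Float = 1.0, volume: Float = 1.0) {
        self.rate = SpeechSettings.clamp(rate, to: 0.0...1.0)
        self.pitch = SpeechSettings.clamp(pitch, to: 0.5...2.0)
        self.volume = SpeechSettings.clamp(volume, to: 0.0...1.0)
    }

    /// Limits a value to the given closed range.
    private static func clamp(_ value: Float, to range: ClosedRange<Float>) -> Float {
        return min(max(value, range.lowerBound), range.upperBound)
    }
}

// Out-of-range rate and pitch are pulled back to the nearest valid bound.
let settings = SpeechSettings(rate: 1.7, pitch: 0.1, volume: 0.8)
print(settings.rate, settings.pitch, settings.volume)
```

The validated values can then be passed to the calls shown above, for example `speaker.withVoiceRate(value: settings.rate)`.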

Attach the speaker to the Talkify instance:

```swift
talkify.setSpaker(wih: speaker) // Pass the speaker instance created above
```

Start Speaking:

```swift

talkify.speak(text: "Hello, this is Talkify!")

```

You can pause or continue the speech synthesis using:

```swift
talkify.pauseSpeaking()
talkify.continueSpeaking()
```

Remember to handle the delegate methods for `TalkifySpeakingDelegate` to get callbacks about the speech synthesis status.

### 4. Setting a Specific Voice for Synthesis

With Talkify, you can choose a particular voice for speech synthesis. Here's how to set a voice:

```swift
let voice = TalkifyVoice(voice: .samantha, quality: .default)
talkify.voice = voice
```

Replace `.samantha` with the desired voice identifier from the `TalkifyVoiceIdentifier` enum. The `quality` parameter lets you set the voice's quality; you can choose between `.default` and other available options.

### 5. Choosing a Language for Recognition and Synthesis

To set a specific language for speech recognition and synthesis, you can leverage the `TalkifyLanguage` enum:

```swift
let language: TalkifyLanguage = .englishUS
talkify.recognitionLanguage = language
talkify.synthesisLanguage = language
```

Replace `.englishUS` with your desired language option from the `TalkifyLanguage` enum.
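
A language is only usable if the system offers both a recognizer and a voice for it. The sketch below checks this with the Apple APIs directly, `SFSpeechRecognizer.supportedLocales()` and `AVSpeechSynthesisVoice.speechVoices()`, rather than through Talkify. `normalizedLanguageCode` and `systemSupports` are hypothetical helpers, and how `TalkifyLanguage` cases map to codes such as `en-US` is not documented here, so the code is passed in as a plain string.

```swift
import Foundation
#if canImport(Speech) && canImport(AVFoundation)
import Speech
import AVFoundation
#endif

/// Normalizes identifiers such as "en_US" to the hyphenated "en-US" form used by voice language codes.
/// Hypothetical helper, not part of Talkify.
func normalizedLanguageCode(_ identifier: String) -> String {
    return identifier.replacingOccurrences(of: "_", with: "-")
}

#if canImport(Speech) && canImport(AVFoundation)
/// Returns true when the system offers both a recognizer locale and at least one voice for the code.
func systemSupports(languageCode: String) -> Bool {
    let code = normalizedLanguageCode(languageCode)
    let recognizable = SFSpeechRecognizer.supportedLocales()
        .contains { normalizedLanguageCode($0.identifier) == code }
    let speakable = AVSpeechSynthesisVoice.speechVoices()
        .contains { $0.language == code }
    return recognizable && speakable
}

print(systemSupports(languageCode: "en-US"))
#endif
```

Running a check like this at startup makes it possible to fall back to another language instead of failing when recording or speaking begins.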

---

For detailed usage and advanced functionalities, refer to the inline documentation provided within the Talkify class and its extensions.

## Contribution 🤝

I appreciate your contributions! Whether you're fixing bugs, improving the documentation, or enhancing the features, I'd love to have your help. Here's how you can contribute:

1. **Fork the repository**: Start by forking the [Talkify repository](https://github.com/tornikegomareli/Talkify/tree/main).

2. **Clone your fork**: `git clone https://github.com/YOUR_USERNAME/Talkify.git`

3. **Create a branch**: `git checkout -b your-branch-name`

4. **Make your changes**: Improve the codebase, add features, fix bugs, or enhance the documentation.

5. **Commit your changes**: `git commit -m "Your descriptive commit message"`

6. **Push to your fork**: `git push origin your-branch-name`

7. **Submit a pull request**: Go to the Talkify repository and create a new pull request. Describe your changes in detail and ensure it's directed from your branch to the main Talkify branch.

## Issues 🐞

Encountered a bug or unexpected behavior? I appreciate your feedback. Open a new issue on the [GitHub repository](https://github.com/tornikegomareli/Talkify/issues) and provide as much detail as you can. This helps me address and fix issues faster.

## Future Plans ⚒️

Because this repository is mainly for educational purposes, I will happily add new functionality step by step:

- **watchOS Support**: Aim to extend Talkify's capabilities to watchOS, allowing for seamless integration with Apple Watch applications.

- **Rx and Combine Listeners**: In addition to the delegate pattern, I'm planning on introducing listeners using popular reactive frameworks like RxSwift and Combine.

- **Unit Tests**: To ensure the robustness and reliability of Talkify, unit tests are on the way. This will boost confidence in the library's functionality and make future changes safer.

- **Third-party integrations**: I have an idea to add some third-party APIs, for example the ChatGPT speech recognition API, behind an ergonomic interface. I still need to think about whether it would be worth it.

## Why?

Just to beat my procrastination 😄

But it really aims to be a comprehensive solution for developers looking to incorporate speech recognition and synthesis into their apps. It abstracts away the complexity of the underlying APIs.

## License

Talkify is licensed under the MIT License. See [LICENSE](https://github.com/tornikegomareli/Talkify/blob/main/LICENSE) for more information.

---

If you've found the README helpful or you like the project idea, please give it a ⭐️ (star) on GitHub.