Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/cmusphinx/TLSphinx
Swift wrapper around Pocketsphinx
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/cmusphinx/TLSphinx
- Owner: cmusphinx
- License: mit
- Fork: true (tryolabs/TLSphinx)
- Created: 2015-06-23T18:13:09.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2019-01-04T23:42:05.000Z (about 6 years ago)
- Last Synced: 2024-08-03T04:08:23.363Z (5 months ago)
- Language: C
- Homepage:
- Size: 27.7 MB
- Stars: 15
- Watchers: 7
- Forks: 11
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- low-resource-languages - TLSphinx - Swift wrapper around Pocketsphinx. (Software / Utilities)
README
# TLSphinx
TLSphinx is a Swift wrapper around [Pocketsphinx], a portable library based on [CMU Sphinx], that allows an application to perform speech recognition **without the audio ever leaving the device**.
This repository has two main parts. The first is a synthesized version of the [pocketsphinx](http://sourceforge.net/projects/cmusphinx/files/pocketsphinx/5prealpha/) and [sphinx base] repositories with a module map to access the library as a [Clang module]. This module is accessed under the name `Sphinx` and has two submodules: `Pocket` and `Base`, in reference to _pocketsphinx_ and _sphinx base_.
The second part is `TLSphinx`, a Swift framework that uses the `Sphinx` Clang module and exposes a Swift-like API that talks to _pocketsphinx_.
_Note: I wrote a blog post about `TLSphinx` [here](http://blog.tryolabs.com/2015/06/15/tlsphinx-automatic-speech-recognition-asr-in-swift/) at the [Tryolabs Blog]. Check it out for a short history of why I wrote this._
## Usage
The framework provides three classes:
- `Config` describes the configuration needed to recognize speech.
- `Decoder` is the main class that provides the API to perform all decoding.
- `Hypotesis` is the result of a decode attempt. It has `text` and `score` properties.
#### Config
Represents the _cmd_ln_t_ opaque structure in `Sphinx`. The default constructor takes an array of tuples of the form `(param name, param value)`, where _"param name"_ is the name of one of the parameters recognized by `Sphinx`. The examples below pass the acoustic model, the language model and the dictionary. For a complete list of recognized parameters check the [Sphinx docs].
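To illustrate the `(param name, param value)` form, here is a minimal sketch of how such pairs could be flattened into the argv-style list that C command-line parsers usually expect. The `flattenArguments` helper and the placeholder paths are hypothetical and are not part of `TLSphinx`. The sketch uses current Swift syntax and makes no claim about how `Config` handles its arguments internally.

```swift
// Hypothetical helper, NOT part of TLSphinx: turns (name, value) pairs
// into a flat list such as ["-hmm", "<path>", "-lm", "<path>", ...].
func flattenArguments(_ args: [(String, String)]) -> [String] {
    return args.flatMap { [$0.0, $0.1] }
}

// Placeholder paths, for illustration only.
let flat = flattenArguments([
    ("-hmm", "/path/to/acoustic-model"),
    ("-lm", "/path/to/language-model"),
    ("-dict", "/path/to/dictionary")
])
print(flat)
```

Keeping each parameter as a pair makes it harder to drop a value or misalign a flag than when a flat list of strings is written by hand.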
The class has a public property to turn on/off the debug info from `Sphinx`:
```swift
public var showDebugInfo: Bool
```
#### Decoder
Represents the _ps_decoder_t_ opaque struct in `Sphinx`. The default constructor takes a `Config` object as its parameter.
This class has the functions to decode speech from a file or from the mic. The result is returned in an optional `Hypotesis` object, following the naming convention of the _Pocketsphinx_ API. The functions are:
To decode speech from a file:
```swift
public func decodeSpeechAtPath (filePath: String, complete: (Hypotesis?) -> ())
```
The audio file at `filePath` must have the following characteristics:
- single-channel (monaural)
- little-endian
- unheadered
- 16-bit signed
- PCM
- sampled at 16000 Hz

To control the size of the buffer used to read the file, the `Decoder` class has a public property:
```swift
public var bufferSize: Int
```
To decode a live audio stream from the mic:
```swift
public func startDecodingSpeech (utteranceComplete: (Hypotesis?) -> ())
public func stopDecodingSpeech ()
```
You can use the same `Decoder` instance many times.
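The raw-audio requirements listed above for `decodeSpeechAtPath` (mono, 16-bit samples, 16000 Hz, no header) imply a fixed data rate of 32000 bytes per second. The following sketch, written in current Swift, uses that to sanity-check a file size before decoding. The `rawPCMDuration` helper is hypothetical and is not part of `TLSphinx`.

```swift
// Format constants taken from the requirements listed above.
let sampleRate = 16_000     // samples per second
let bytesPerSample = 2      // 16-bit signed PCM
let channelCount = 1        // single-channel (mono)

// Hypothetical helper, NOT part of TLSphinx: returns the duration in seconds
// of an unheadered PCM payload, or nil when the byte count cannot be a whole
// number of samples (a hint that the file has a header or another format).
func rawPCMDuration(byteCount: Int) -> Double? {
    let frameSize = bytesPerSample * channelCount
    guard byteCount >= 0, byteCount % frameSize == 0 else { return nil }
    return Double(byteCount / frameSize) / Double(sampleRate)
}

if let seconds = rawPCMDuration(byteCount: 64_000) {
    print("Duration: \(seconds) s")
}
```

A duration that looks implausible for the recording is a cheap signal that the file was saved with a different sample rate or channel count.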
#### Hypotesis
This struct represents the result of a _decode_ attempt. It has a `text` property with the best scored text and a `score` with the score value. This struct implements `Printable` so you can print it with `println(hypotesis_value)`.
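As a rough illustration of a result type with `text` and `score` properties, here is a stand-in written in current Swift, where `Printable` has been renamed `CustomStringConvertible` and `println` has become `print`. `ExampleHypothesis` and `displayText` are hypothetical names, and the integer score type is an assumption. The real `Hypotesis` type is defined by `TLSphinx` and may differ.

```swift
// Stand-in type for illustration only; NOT the definition used by TLSphinx.
struct ExampleHypothesis: CustomStringConvertible {
    let text: String
    let score: Int   // assumption: the score is an integer

    var description: String {
        return "Text: \(text) - Score: \(score)"
    }
}

// Hypothetical helper: the decoded text, or a fallback when the optional
// result is nil (for example, when decoding failed).
func displayText(for hypothesis: ExampleHypothesis?,
                 fallback: String = "<no speech decoded>") -> String {
    return hypothesis?.text ?? fallback
}

let sample = ExampleHypothesis(text: "hello world", score: -1200)
print(sample)
print(displayText(for: nil))
```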
### Examples
#### Processing an Audio File
As an example let's see how to decode the speech in an audio file. To do so you first need to create a `Config` object and pass it to the `Decoder` constructor. With the decoder you can perform automatic speech recognition from an audio file like so:
```swift
import TLSphinx

let hmm = ... // Path to the acoustic model
let lm = ... // Path to the language model
let dict = ... // Path to the language dictionary

if let config = Config(args: ("-hmm", hmm), ("-lm", lm), ("-dict", dict)) {
    if let decoder = Decoder(config:config) {
        let audioFile = ... // Path to an audio file
        decoder.decodeSpeechAtPath(audioFile) {
            if let hyp: Hypotesis = $0 {
                // Print the decoded text and score
                println("Text: \(hyp.text) - Score: \(hyp.score)")
            } else {
                // Can't decode any speech because of an error
            }
        }
    } else {
        // Handle Decoder() fail
    }
} else {
    // Handle Config() fail
}
```
The decoding is performed by the `decodeSpeechAtPath` function in the background. Once the process finishes, the `complete` closure is called on the main thread.
#### Speech from the Mic
```swift
import TLSphinx

let hmm = ... // Path to the acoustic model
let lm = ... // Path to the language model
let dict = ... // Path to the language dictionary

if let config = Config(args: ("-hmm", hmm), ("-lm", lm), ("-dict", dict)) {
    if let decoder = Decoder(config:config) {
        decoder.startDecodingSpeech {
            if let hyp: Hypotesis = $0 {
                println(hyp)
            } else {
                // Can't decode any speech because of an error
            }
        }
    } else {
        // Handle Decoder() fail
    }
} else {
    // Handle Config() fail
}

// At some point in the future, stop listening to the mic.
// `decoder` must still be in scope here, e.g. stored in a property.
decoder.stopDecodingSpeech()
```
## Installation
The easiest way to integrate `TLSphinx` is using [Carthage] or a similar method to get the framework bundle. This lets you integrate the framework and the `Sphinx` module without _magic_.
#### Carthage
In your `Cartfile` add a reference to the latest version of `TLSphinx`:
````
github "Tryolabs/TLSphinx" ~> 0.0.4
````
Then run `carthage update`, which should fetch and build the latest version of `TLSphinx`. Once it's done, drag the _TLSphinx.framework_ bundle to the Xcode _Linked Frameworks and Libraries_ section. You must tell Xcode where to find the `Sphinx` module, which is located in the Carthage checkout. To do so:
- add `$(SRCROOT)/Carthage/Checkouts/TLSphinx/Sphinx/include` to _Header Search Paths_ and mark it as recursive
- add `$(SRCROOT)/Carthage/Checkouts/TLSphinx/Sphinx/lib` to _Library Search Paths_ and mark it as recursive
- in _Swift Compiler - Search Paths_ add `$(SRCROOT)/Carthage/Checkouts/TLSphinx/Sphinx/include` to _Import Paths_
#### Manual
Download the project from this repository and drag the _TLSphinx_ project to your Xcode project. If you encounter any errors about missing headers and/or libraries for _Sphinx_, add the `Sphinx/include` directory to your header search path and `Sphinx/lib` to the library search path, and mark both as `recursive`.
## Author
BrunoBerisso, [email protected]
## License
TLSphinx is available under the MIT license. See the LICENSE file for more info.
[CMU Sphinx]: http://cmusphinx.sourceforge.net/
[Pocketsphinx]: http://cmusphinx.sourceforge.net/wiki/tutorialpocketsphinx
[sphinx base]: http://sourceforge.net/projects/cmusphinx/files/sphinxbase/5prealpha/
[Clang module]: http://clang.llvm.org/docs/Modules.html
[Sphinx docs]: http://cmusphinx.sourceforge.net/wiki/
[Tryolabs Blog]: http://blog.tryolabs.com/
[Carthage]: https://github.com/Carthage/Carthage