https://github.com/cmusphinx/pocketsphinx-python

Python module installed with setup.py

Host: GitHub
URL: https://github.com/cmusphinx/pocketsphinx-python
Owner: cmusphinx
License: other
Archived: true
Fork: true (bambocher/pocketsphinx-python)
Created: 2014-12-16T18:01:33.000Z (over 11 years ago)
Default Branch: master
Last Pushed: 2022-06-29T15:22:48.000Z (about 4 years ago)
Last Synced: 2024-08-03T04:08:11.716Z (almost 2 years ago)
Language: Python
Homepage:
Size: 24.5 MB
Stars: 338
Watchers: 25
Forks: 81
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

low-resource-languages - pocketsphinx-python - Python module installed with setup.py. (Software / Utilities)

README

# Pocketsphinx Python

**This module is no longer relevant, and is being archived. Python bindings are included in the [pocketsphinx](https://github.com/cmusphinx/pocketsphinx) module. Alternatively, you may consider using [bambocher/pocketsphinx-python](https://github.com/bambocher/pocketsphinx-python)**

Pocketsphinx is a part of the [CMU Sphinx](http://cmusphinx.sourceforge.net) Open Source Toolkit For Speech Recognition.

This package provides a Python interface to the CMU [Sphinxbase](https://github.com/cmusphinx/sphinxbase) and [Pocketsphinx](https://github.com/cmusphinx/pocketsphinx) libraries, created with [SWIG](http://www.swig.org) and [Setuptools](https://setuptools.readthedocs.io).

## Supported platforms

* Windows

* Linux

* Mac OS X

## Installation

```
git clone --recursive https://github.com/cmusphinx/pocketsphinx-python/
cd pocketsphinx-python
python setup.py install
```

## Usage

### LiveSpeech

An iterator class for continuous recognition or keyword search from a microphone.

Note that LiveSpeech is not (yet) supported on macOS Big Sur.

```python
from pocketsphinx import LiveSpeech

for phrase in LiveSpeech():
    print(phrase)
```

An example of a keyword search:

```python
from pocketsphinx import LiveSpeech

speech = LiveSpeech(lm=False, keyphrase='forward', kws_threshold=1e-20)
for phrase in speech:
    print(phrase.segments(detailed=True))
```

With your model and dictionary:

```python
import os
from pocketsphinx import LiveSpeech, get_model_path

model_path = get_model_path()

speech = LiveSpeech(
    verbose=False,
    sampling_rate=16000,
    buffer_size=2048,
    no_search=False,
    full_utt=False,
    hmm=os.path.join(model_path, 'en-us'),
    lm=os.path.join(model_path, 'en-us.lm.bin'),
    dic=os.path.join(model_path, 'cmudict-en-us.dict')
)

for phrase in speech:
    print(phrase)
```

### AudioFile

An iterator class for continuous recognition or keyword search from a file.

```python
from pocketsphinx import AudioFile

for phrase in AudioFile():
    print(phrase)  # => "go forward ten meters"
```

An example of a keyword search:

```python
from pocketsphinx import AudioFile

audio = AudioFile(lm=False, keyphrase='forward', kws_threshold=1e-20)
for phrase in audio:
    print(phrase.segments(detailed=True)) # => "[('forward', -617, 63, 121)]"
```

With your model and dictionary:

```python
import os
from pocketsphinx import AudioFile, get_model_path, get_data_path

model_path = get_model_path()
data_path = get_data_path()

config = {
    'verbose': False,
    'audio_file': os.path.join(data_path, 'goforward.raw'),
    'buffer_size': 2048,
    'no_search': False,
    'full_utt': False,
    'hmm': os.path.join(model_path, 'en-us'),
    'lm': os.path.join(model_path, 'en-us.lm.bin'),
    'dict': os.path.join(model_path, 'cmudict-en-us.dict')
}

audio = AudioFile(**config)
for phrase in audio:
    print(phrase)
```

Convert frame into time coordinates:

```python
from pocketsphinx import AudioFile

# Frames per Second
fps = 100

for phrase in AudioFile(frate=fps):  # frate (default=100)
    print('-' * 28)
    print('| %5s |  %3s  |   %4s   |' % ('start', 'end', 'word'))
    print('-' * 28)
    for s in phrase.seg():
        print('| %4ss | %4ss | %8s |' % (s.start_frame / fps, s.end_frame / fps, s.word))
    print('-' * 28)

# ----------------------------
# | start |  end  |   word   |
# ----------------------------
# |  0.0s | 0.24s | <s>      |
# | 0.25s | 0.45s | <sil>    |
# | 0.46s | 0.63s | go       |
# | 0.64s | 1.16s | forward  |
# | 1.17s | 1.52s | ten      |
# | 1.53s | 2.11s | meters   |
# | 2.12s |  2.6s | </s>     |
# ----------------------------
```
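
The conversion above is just a division of the frame index by the frame rate. Below is a minimal sketch of that arithmetic as a standalone helper, assuming Python 3 division. The name `segment_times` and the plain `(word, start_frame, end_frame)` tuples are illustrative only and are not part of the pocketsphinx API; the sample frame indices are copied from the output shown above.

```python
def segment_times(segments, frate=100):
    """Convert (word, start_frame, end_frame) tuples to seconds.

    `frate` is the number of frames per second (100 by default, as above).
    """
    return [(word, start / frate, end / frate) for word, start, end in segments]


# Sample frame indices copied from the example output above.
frames = [('go', 46, 63), ('forward', 64, 116), ('ten', 117, 152)]

for word, start_s, end_s in segment_times(frames):
    print('%-8s %5.2fs -> %5.2fs' % (word, start_s, end_s))
```

If you decode with a different `frate`, pass the same value to the helper so the times stay consistent.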

### Pocketsphinx

It's a simple and flexible proxy class to `pocketsphinx.Decoder`.

```python
from pocketsphinx import Pocketsphinx

print(Pocketsphinx().decode()) # => "go forward ten meters"
```

A more comprehensive example:

```python
from __future__ import print_function
import os
from pocketsphinx import Pocketsphinx, get_model_path, get_data_path

model_path = get_model_path()
data_path = get_data_path()

config = {
    'hmm': os.path.join(model_path, 'en-us'),
    'lm': os.path.join(model_path, 'en-us.lm.bin'),
    'dict': os.path.join(model_path, 'cmudict-en-us.dict')
}

ps = Pocketsphinx(**config)
ps.decode(
    audio_file=os.path.join(data_path, 'goforward.raw'),
    buffer_size=2048,
    no_search=False,
    full_utt=False
)

print(ps.segments()) # => ['<s>', '<sil>', 'go', 'forward', 'ten', 'meters', '</s>']
print('Detailed segments:', *ps.segments(detailed=True), sep='\n') # => [
#     word, prob, start_frame, end_frame
#     ('<s>', 0, 0, 24)
#     ('<sil>', -3778, 25, 45)
#     ('go', -27, 46, 63)
#     ('forward', -38, 64, 116)
#     ('ten', -14105, 117, 152)
#     ('meters', -2152, 153, 211)
#     ('</s>', 0, 212, 260)
# ]

print(ps.hypothesis())  # => go forward ten meters
print(ps.probability()) # => -32079
print(ps.score())       # => -7066
print(ps.confidence())  # => 0.04042641466841839

print(*ps.best(count=10), sep='\n') # => [
#     ('go forward ten meters', -28034)
#     ('go for word ten meters', -28570)
#     ('go forward and majors', -28670)
#     ('go forward and meters', -28681)
#     ('go forward and readers', -28685)
#     ('go forward ten readers', -28688)
#     ('go forward ten leaders', -28695)
#     ('go forward can meters', -28695)
#     ('go forward and leaders', -28706)
#     ('go for work ten meters', -28722)
# ]
```
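
The segment and n-best lists are plain Python data, so they can be post-processed without touching the decoder. The sketch below works on hard-coded tuples shaped like the output above. It assumes that sentence-boundary and silence entries appear as `<s>`, `</s>`, and `<sil>`, and that a score closer to zero is better. `FILLERS`, `spoken_words`, and `best_hypothesis` are hypothetical helper names, not part of the library.

```python
# Assumed set of non-word tokens (sentence boundaries and silence).
FILLERS = {'<s>', '</s>', '<sil>'}


def spoken_words(detailed_segments):
    """Return only real words from (word, prob, start_frame, end_frame) tuples."""
    return [word for word, _prob, _start, _end in detailed_segments
            if word not in FILLERS]


def best_hypothesis(nbest):
    """Pick the (text, score) pair with the highest score (closest to zero)."""
    return max(nbest, key=lambda pair: pair[1])


# Sample data copied from the example output above.
segments = [
    ('<s>', 0, 0, 24),
    ('<sil>', -3778, 25, 45),
    ('go', -27, 46, 63),
    ('forward', -38, 64, 116),
    ('ten', -14105, 117, 152),
    ('meters', -2152, 153, 211),
    ('</s>', 0, 212, 260),
]
nbest = [
    ('go forward ten meters', -28034),
    ('go for word ten meters', -28570),
    ('go forward and majors', -28670),
]

print(' '.join(spoken_words(segments)))
print(best_hypothesis(nbest))
```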

### Default config

If you don't pass any arguments when creating an instance of the Pocketsphinx, AudioFile or LiveSpeech class, the following default values are used:

```python
verbose = False
logfn = /dev/null or nul
audio_file = site-packages/pocketsphinx/data/goforward.raw
audio_device = None
sampling_rate = 16000
buffer_size = 2048
no_search = False
full_utt = False
hmm = site-packages/pocketsphinx/model/en-us
lm = site-packages/pocketsphinx/model/en-us.lm.bin
dict = site-packages/pocketsphinx/model/cmudict-en-us.dict
```

Any other decoder option can be passed in the config under its usual name, without the leading `-` (for example, `hmm` rather than `-hmm`).

If you want to disable the default language model or dictionary, set the corresponding option to `False`:

```python
lm = False
dict = False
```
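
The rules above (defaults, extra options passed without the leading `-`, and `False` to switch off the language model or dictionary) can be pictured as a small merge step. The sketch below is not the library's implementation. `DEFAULTS`, `build_options`, and the shortened paths are hypothetical and only illustrate the described behaviour.

```python
import os

# Hypothetical subset of the defaults listed above; paths are shortened.
DEFAULTS = {
    'verbose': False,
    'logfn': os.devnull,  # '/dev/null' on POSIX, 'nul' on Windows
    'sampling_rate': 16000,
    'buffer_size': 2048,
    'lm': 'model/en-us.lm.bin',
    'dict': 'model/cmudict-en-us.dict',
}


def build_options(**overrides):
    """Merge user options over the defaults and drop a disabled lm/dict."""
    merged = dict(DEFAULTS, **overrides)
    # Only 'lm' and 'dict' are treated as "disabled" when set to False;
    # other False values (such as verbose=False) are ordinary settings.
    disabled = {key for key in ('lm', 'dict') if merged.get(key) is False}
    return {key: value for key, value in merged.items() if key not in disabled}


opts = build_options(lm=False, keyphrase='forward', kws_threshold=1e-20)
print(sorted(opts))
```

Here `keyphrase` and `kws_threshold` pass through unchanged, while `lm=False` removes the default language model from the resulting options.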

### Verbose

Send output to stdout:

```python
from pocketsphinx import Pocketsphinx

ps = Pocketsphinx(verbose=True)
ps.decode()

print(ps.hypothesis())
```

Send output to file:

```python
from pocketsphinx import Pocketsphinx

ps = Pocketsphinx(verbose=True, logfn='pocketsphinx.log')
ps.decode()

print(ps.hypothesis())
```

### Compatibility

Parent classes are still available:

```python
import os
from pocketsphinx import DefaultConfig, Decoder, get_model_path, get_data_path

model_path = get_model_path()
data_path = get_data_path()

# Create a decoder with a certain model
config = DefaultConfig()
config.set_string('-hmm', os.path.join(model_path, 'en-us'))
config.set_string('-lm', os.path.join(model_path, 'en-us.lm.bin'))
config.set_string('-dict', os.path.join(model_path, 'cmudict-en-us.dict'))
decoder = Decoder(config)

# Decode streaming data
buf = bytearray(1024)
with open(os.path.join(data_path, 'goforward.raw'), 'rb') as f:
    decoder.start_utt()
    while f.readinto(buf):
        decoder.process_raw(buf, False, False)
    decoder.end_utt()
print('Best hypothesis segments:', [seg.word for seg in decoder.seg()])
```
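
When streaming with a reusable fixed-size buffer, the final `readinto` call usually fills only part of it. The following is a minimal sketch of a chunk reader that yields just the bytes actually read. It runs against an in-memory stream instead of a real audio file, and `iter_chunks` is a hypothetical helper name, not a pocketsphinx function.

```python
import io


def iter_chunks(stream, size=1024):
    """Yield successive chunks from a binary stream, trimming the last one."""
    buf = bytearray(size)
    view = memoryview(buf)
    while True:
        n = stream.readinto(buf)
        if not n:
            break
        # Copy out only the n bytes filled by this read.
        yield bytes(view[:n])


# Stand-in for an audio file: 2500 zero bytes.
fake_audio = io.BytesIO(bytes(2500))
sizes = [len(chunk) for chunk in iter_chunks(fake_audio)]
print(sizes)  # => [1024, 1024, 452]
```

In a streaming loop like the one above, each yielded chunk could be handed to `decoder.process_raw(chunk, False, False)` between `start_utt()` and `end_utt()`.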

## Install development version

### Install requirements

Windows requirements:

* [Python](https://www.python.org/downloads)

* [Git](http://git-scm.com/downloads)

* [Swig](http://www.swig.org/download.html)

* [Visual Studio Community](https://www.visualstudio.com/ru-ru/downloads/download-visual-studio-vs.aspx)

Ubuntu requirements:

```shell
sudo apt-get install -qq python python-dev python-pip build-essential swig git libpulse-dev libasound2-dev
```

Mac OS X requirements:

```shell
brew reinstall swig python
```
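
Building from source needs the tools listed above to be available on the `PATH`. As a quick pre-flight check, the sketch below looks them up with `shutil.which`. The `missing_tools` helper and the default tool names (`git`, `swig`) are assumptions drawn from the requirement lists, not something the project ships.

```python
import shutil


def missing_tools(tools=('git', 'swig')):
    """Return the names from `tools` that are not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


missing = missing_tools()
if missing:
    print('Missing build tools: ' + ', '.join(missing))
else:
    print('All build tools found.')
```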

### Install UPSTREAM version with pip

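Note that this installs the upstream bambocher/pocketsphinx-python package, which is NOT the same as the version in this repository (github.com/cmusphinx/pocketsphinx-python).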

```shell
pip install https://github.com/bambocher/pocketsphinx-python/archive/master.zip
```

### Install with distutils

```shell
git clone --recursive https://github.com/cmusphinx/pocketsphinx-python
cd pocketsphinx-python
python setup.py install
```

## Projects using pocketsphinx-python

* [SpeechRecognition](https://github.com/Uberi/speech_recognition) - Library for performing speech recognition, with support for several engines and APIs, online and offline.

## License

[The BSD License](https://github.com/bambocher/pocketsphinx-python/blob/master/LICENSE)
