https://github.com/rubyonworld/pocketsphinx-server

Ruby-based web service for speech recognition, using the PocketSphinx gstreamer module.
https://github.com/rubyonworld/pocketsphinx-server

gstreamer module pocket pocketsphinx ruby server sphinx

Last synced: 1 day ago
JSON representation

Ruby-based web service for speech recognition, using the PocketSphinx gstreamer module.

Host: GitHub
URL: https://github.com/rubyonworld/pocketsphinx-server
Owner: RubyOnWorld
License: other
Created: 2022-09-27T15:42:09.000Z (almost 3 years ago)
Default Branch: master
Last Pushed: 2022-09-28T00:51:04.000Z (almost 3 years ago)
Last Synced: 2025-05-19T20:32:47.132Z (about 1 month ago)
Topics: gstreamer, module, pocket, pocketsphinx, ruby, server, sphinx
Language: Ruby
Homepage:
Size: 262 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.rdoc
- License: LICENSE

Awesome Lists containing this project

README

= Introduction

Ruby-based web service for speech recognition, using the PocketSphinx gstreamer module.

= Requirements

* Ruby 1.8
* Sinatra
* Rack
* Unicorn
* PocketSphinx (NOTE: some features of the server require patched PocketSphinx, see below)
* Some acoustic and language models for PocketSphinx

= Installing

== CMU Sphinx

* Install sphinxbase from SVN (make, make install)

=== Apply PocketSphinx patch

In cmusphinx/pocketsphinx directory:

wget http://www.phon.ioc.ee/~tanela/ps_gst.patch
patch -p0 -i ps_gst.patch

Make sure you have GStreamer devevelopment packages installed. In Debian Squeeze:

apt-get install libgstreamer0.10-dev libgstreamer-plugins-base0.10-dev

And configure, make, make install as usual.

== Install Ruby gems: Unicorn and Sinatra, UUID tools, JSON, locale

This assumes you have ruby and rubygems installed.

You might want to do this as root:

gem install unicorn
gem install sinatra
gem install uuidtools
gem install json
gem install locale

Install ruby-gstreamer package (might vary depending on your distribution):

apt-get install libgst-ruby1.8

== Additional tools

English GF-based recognizer also need:

* libtext-unidecode-perl
* Phonetisaurus, Phonetisaurus prebuilt model for English (http://code.google.com/p/phonetisaurus/downloads/detail?name=g014b2b.tgz)
* Python

== Run ruby-pocketsphinx-server

Clone the git repository:

git clone git://github.com/alumae/ruby-pocketsphinx-server.git

Before executing, add `/usr/local/lib` to the path where GStreamer plugins are looked for:

export GST_PLUGIN_PATH=/usr/local/lib

= Running

unicorn -c unicorn.conf.rb config.ru

If you installed Unicorn as a Ruby gem, you might need to execute:

/var/lib/gems/1.8/bin/unicorn -c unicorn.conf.rb config.ru

Test the default configuration (English WSJ language model with HUB4 acostic models), using a raw audio file in the PocketSphinx test directory
(replace `$(POCKETSPHINX_DIR)` with the Pocketsphinx source directory):

curl -T $(POCKETSPHINX_DIR)/test/data/wsj/n800_440c0207.wav -H "Content-Type: audio/x-wav" "http://localhost:8080/recognize"

Response should be:

{
"status": 0,
"hypotheses": [
{
"utterance": "the agency isn't likely to take any action until the union's rank and file votes on the contract into three weeks"
},
{
"utterance": "the agency isn't likely to take any action until the union's rank and file puts on the contract into three weeks"
},
{
"utterance": "the agency isn't likely to take any action until the union's rank and file funds from the contract into three weeks"
},
{
"utterance": "the agency isn't likely to take any action until the union's rank and file for from the contract into three weeks"
},
{
"utterance": "the agency isn't likely to take any action until the union's rank and file parts of the contract into three weeks"
}
],
"id": "8686a37b5674cbdc63deb13f73de81a5"
}

= Configuration

== Web service

Unicorn configuration is in file unicorn.conf.rb. See http://unicorn.bogomips.org/examples/unicorn.conf.rb for
more info.

== Recognizer

See conf.yaml

= Using the web service

Some of the more advanced examples below are specific to the Estonian configuration.

==Example 1

Record a sentence to a wav file, in mono (hit Ctrl-C when done speaking):

rec -c 1 sentence.wav

Send it to the web service:

curl -X POST --data-binary @sentence.wav -H "Content-Type: audio/x-wav" http://localhost:8080/recognize

Output (encoded using json, the example uses Estonian models):

{
"status": 0,
"hypotheses": [
{
"utterance": [
"t\u00e4na on v\u00e4ljas \u00fcsna ilus ilm"
]
}
],
"id": "e30f54561135d681599915562d77d240"
}

== Example 2

Record a raw file using arecord:

arecord --format=S16_LE --file-type raw --channels 1 --rate 16000 > sentence2.raw

Send it to web service:

curl -X POST --data-binary @sentence2.raw -H "Content-Type: audio/x-raw-int; rate=16000" http://localhost:8080/recognize

== Example 3

Record a 5 second audio, pipe it to curl, which streams it directly to web service using PUT (and gets almost instant response):

arecord --format=S16_LE --file-type raw --channels 1 --rate 16000 --duration 5 | curl -vv -T - -H "Content-Type: audio/x-raw-int; rate=16000" http://localhost:8080/recognize

= Support for JSGF grammars

Users can use their own grammars to recognize certain sentences. The grammars should be in JSGF format.

Example JSGF (let's call it robot.jsgf)

#JSGF V1.0;

grammar robot;

public = (liigu | mine ) [ ( üks | kaks | kolm | neli | viis ) meetrit ] (edasi | tagasi);

NB! Grammars should be in the same charset that the server is using for dictionary, which currently is latin-1 (sorry for that).

You need to upload the JSGF file to somewhere where the server can fetch it, let's say http://www.example.com/robot.txt

Now, let the server download and compile it:

curl -vv http://localhost:8080/fetch-lm?url=http://www.example.com/robot.jsgf

This should result in HTTP/1.1 200 OK.

Now you can use the grammar to recognize a sentence that is accepted by the grammar:

arecord --format=S16_LE --file-type raw --channels 1 --rate 16000 --duration 5 | \
curl -vv -T - -H "Content-Type: audio/x-raw-int; rate=16000" http://localhost:8080/recognize?lm=http://www.example.com/robot.jsgf

Result:

{
"status": 0,
"hypotheses": [
{
"utterance": "mine viis meetrit tagasi"
}
],
"id": "9e3895e9ee0b5138e73c6fca30f51a58"
}

If you update the grammar on the server, you need to make the /fetch-jsgf request again, as the server doesn't check for changes every time
a recognition request is done (for efficiency reasons).

= Support for GF grammars

GF (Grammatical Framework) grammars are supported.

A GF grammar must be compiled into a .pgf file. To upload it to the server, use the fetch-pgf API call, e.g.:

curl "http://bark.phon.ioc.ee/speech-api/v1/fetch-lm?url=http://kaljurand.github.com/Grammars/grammars/pgf/Calc.pgf&lang=Est"

The 'lang' attribute (defaults to 'Est') specifies input languages of the grammar. Many comma-separated languages can be specified, e.g lang=Est,Est2

To recognize with a GF, use similar request as with JSGF, e.g.:

arecord --format=S16_LE --file-type raw --channels 1 --rate 16000 --duration 5 | curl -vv -T - -H "Content-Type: audio/x-raw-int; rate=16000" "http://localhost:8080/recognize?lm=http://kaljurand.github.com/Grammars/grammars/pgf/Calc.pgf

You can also specify output language(s) that will be used to linearize the raw recognition result, e.g.:

arecord --format=S16_LE --file-type raw --channels 1 --rate 16000 --duration 5 | curl -vv -T - -H "Content-Type: audio/x-raw-int; rate=16000" "http://localhost:8080/recognize?lm=http://kaljurand.github.com/Grammars/grammars/pgf/Calc.pgf&output-lang=App"

Output:

{
"status": 0,
"hypotheses": [
{
"utterance": "viis minutit sekundites",
"linearizations": [
{
"lang": "App",
"output": "5 ' IN \""
},
{
"lang": "App",
"output": "5 min IN s"
}
]
}
],
"id": "83486feaca30995401ed4a66951a3f23"
}

Multiple output languages can be used, by using comma-separated values: "..&output-lang=App,App2"

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/rubyonworld/pocketsphinx-server

Awesome Lists containing this project

README