https://github.com/rubyonworld/pocketsphinx-server
Ruby-based web service for speech recognition, using the PocketSphinx gstreamer module.
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/rubyonworld/pocketsphinx-server
- Owner: RubyOnWorld
- License: other
- Created: 2022-09-27T15:42:09.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2022-09-28T00:51:04.000Z (over 2 years ago)
- Last Synced: 2024-12-28T14:26:21.366Z (4 months ago)
- Topics: gstreamer, module, pocket, pocketsphinx, ruby, server, sphinx
- Language: Ruby
- Homepage:
- Size: 262 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.rdoc
- License: LICENSE
README
= Introduction
Ruby-based web service for speech recognition, using the PocketSphinx gstreamer module.
= Requirements
* Ruby 1.8
* Sinatra
* Rack
* Unicorn
* PocketSphinx (NOTE: some features of the server require patched PocketSphinx, see below)
* Some acoustic and language models for PocketSphinx
= Installing
== CMU Sphinx
* Install sphinxbase from SVN (make, make install)
=== Apply PocketSphinx patch
In cmusphinx/pocketsphinx directory:
wget http://www.phon.ioc.ee/~tanela/ps_gst.patch
patch -p0 -i ps_gst.patch
Make sure you have the GStreamer development packages installed. In Debian Squeeze:
apt-get install libgstreamer0.10-dev libgstreamer-plugins-base0.10-dev
And configure, make, make install as usual.
== Install Ruby gems: Unicorn and Sinatra, UUID tools, JSON, locale
This assumes you have ruby and rubygems installed.
You might want to do this as root:
gem install unicorn
gem install sinatra
gem install uuidtools
gem install json
gem install locale
Install ruby-gstreamer package (might vary depending on your distribution):
apt-get install libgst-ruby1.8
== Additional tools
The English GF-based recognizer also needs:
* libtext-unidecode-perl
* Phonetisaurus, Phonetisaurus prebuilt model for English (http://code.google.com/p/phonetisaurus/downloads/detail?name=g014b2b.tgz)
* Python
== Run ruby-pocketsphinx-server
Clone the git repository:
git clone git://github.com/alumae/ruby-pocketsphinx-server.git
Before executing, add `/usr/local/lib` to the path where GStreamer plugins are looked for:
export GST_PLUGIN_PATH=/usr/local/lib
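Note that the export above replaces any value GST_PLUGIN_PATH already had. If the server is started from a Ruby wrapper script, the directory can be appended instead. The following is only a sketch: the helper name `plugin_path_with` is hypothetical and not part of this project, and it targets a current Ruby rather than the Ruby 1.8 listed in the requirements.

```ruby
# Hypothetical helper (not part of this project): returns a GST_PLUGIN_PATH
# value that contains +dir+ and keeps any directories already listed.
def plugin_path_with(dir, current = ENV['GST_PLUGIN_PATH'])
  entries = current.to_s.split(File::PATH_SEPARATOR).reject(&:empty?)
  entries << dir unless entries.include?(dir)
  entries.join(File::PATH_SEPARATOR)
end

# Usage sketch: set the variable for this process and the children it spawns.
# ENV['GST_PLUGIN_PATH'] = plugin_path_with('/usr/local/lib')
```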
= Running
unicorn -c unicorn.conf.rb config.ru
If you installed Unicorn as a Ruby gem, you might need to execute:
/var/lib/gems/1.8/bin/unicorn -c unicorn.conf.rb config.ru
Test the default configuration (English WSJ language model with HUB4 acoustic models), using a raw audio file in the PocketSphinx test directory
(replace `$(POCKETSPHINX_DIR)` with the PocketSphinx source directory):
curl -T $(POCKETSPHINX_DIR)/test/data/wsj/n800_440c0207.wav -H "Content-Type: audio/x-wav" "http://localhost:8080/recognize"
Response should be:
{
"status": 0,
"hypotheses": [
{
"utterance": "the agency isn't likely to take any action until the union's rank and file votes on the contract into three weeks"
},
{
"utterance": "the agency isn't likely to take any action until the union's rank and file puts on the contract into three weeks"
},
{
"utterance": "the agency isn't likely to take any action until the union's rank and file funds from the contract into three weeks"
},
{
"utterance": "the agency isn't likely to take any action until the union's rank and file for from the contract into three weeks"
},
{
"utterance": "the agency isn't likely to take any action until the union's rank and file parts of the contract into three weeks"
}
],
"id": "8686a37b5674cbdc63deb13f73de81a5"
}
= Configuration
== Web service
Unicorn configuration is in the file unicorn.conf.rb. See http://unicorn.bogomips.org/examples/unicorn.conf.rb for
more info.
== Recognizer
See conf.yaml
= Using the web service
Some of the more advanced examples below are specific to the Estonian configuration.
== Example 1
Record a sentence to a wav file, in mono (hit Ctrl-C when done speaking):
rec -c 1 sentence.wav
Send it to the web service:
curl -X POST --data-binary @sentence.wav -H "Content-Type: audio/x-wav" http://localhost:8080/recognize
Output (encoded as JSON; the example uses Estonian models):
{
"status": 0,
"hypotheses": [
{
"utterance": [
"t\u00e4na on v\u00e4ljas \u00fcsna ilus ilm"
]
}
],
"id": "e30f54561135d681599915562d77d240"
}
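The two sample responses differ in shape: the English example returns each utterance as a string, while the Estonian example wraps it in an array. A client can accept both. The sketch below uses a hypothetical helper name, `utterances`, and assumes that any non-zero `status` means failure, which this README does not state explicitly.

```ruby
require 'json'

# Returns the utterances of all hypotheses as a flat array of strings.
# Array() turns a plain string into a one-element array and leaves an
# array unchanged, so both response shapes shown in this README work.
def utterances(response_body)
  data = JSON.parse(response_body)
  # Assumption: status 0 means success (as in the samples), anything else is an error.
  raise "recognition failed with status #{data['status']}" unless data['status'] == 0
  data.fetch('hypotheses', []).flat_map { |hyp| Array(hyp['utterance']) }
end
```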
== Example 2
Record a raw file using arecord:
arecord --format=S16_LE --file-type raw --channels 1 --rate 16000 > sentence2.raw
Send it to the web service:
curl -X POST --data-binary @sentence2.raw -H "Content-Type: audio/x-raw-int; rate=16000" http://localhost:8080/recognize
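The same POST requests as in Examples 1 and 2 can be made from Ruby with the standard Net::HTTP library. This is a minimal sketch, not code from this project: the helper names and the extension-to-Content-Type table are assumptions, the two Content-Type values are copied from the curl commands above, and it targets a current Ruby rather than Ruby 1.8.

```ruby
require 'net/http'
require 'uri'

# Content types copied from the curl examples; raw audio is assumed to be 16 kHz.
CONTENT_TYPES = {
  '.wav' => 'audio/x-wav',
  '.raw' => 'audio/x-raw-int; rate=16000'
}.freeze

# Builds (but does not send) a POST request for the recognize endpoint.
def recognize_request(path, audio, endpoint = 'http://localhost:8080/recognize')
  content_type = CONTENT_TYPES.fetch(File.extname(path)) do
    raise ArgumentError, "unsupported audio file: #{path}"
  end
  request = Net::HTTP::Post.new(URI(endpoint))
  request['Content-Type'] = content_type
  request.body = audio
  request
end

# Sends the file and returns the response body (JSON text).
def recognize_file(path, endpoint = 'http://localhost:8080/recognize')
  uri = URI(endpoint)
  request = recognize_request(path, File.binread(path), endpoint)
  Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }.body
end
```

For example, `puts recognize_file('sentence2.raw')` would print the JSON response for the recording made above.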
== Example 3
Record 5 seconds of audio and pipe it to curl, which streams it directly to the web service using PUT (and gets an almost instant response):
arecord --format=S16_LE --file-type raw --channels 1 --rate 16000 --duration 5 | curl -vv -T - -H "Content-Type: audio/x-raw-int; rate=16000" http://localhost:8080/recognize
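A Ruby client can stream audio in a similar way by giving Net::HTTP an IO object as the request body and using chunked transfer encoding, so the total length does not need to be known up front. The helper name `streaming_request` is hypothetical, and whether the server handles a Ruby client exactly like curl is an assumption.

```ruby
require 'net/http'
require 'uri'

# Builds a PUT request whose body is read from +io+ and sent in chunks.
def streaming_request(io, endpoint = 'http://localhost:8080/recognize')
  request = Net::HTTP::Put.new(URI(endpoint))
  request['Content-Type'] = 'audio/x-raw-int; rate=16000'
  # Net::HTTP requires either Content-Length or chunked encoding for a body stream.
  request['Transfer-Encoding'] = 'chunked'
  request.body_stream = io
  request
end

# Usage sketch: pipe the arecord command above into this script.
# uri = URI('http://localhost:8080/recognize')
# response = Net::HTTP.start(uri.host, uri.port) do |http|
#   http.request(streaming_request($stdin))
# end
# puts response.body
```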
= Support for JSGF grammars
Users can use their own grammars to recognize certain sentences. The grammars should be in JSGF format.
Example JSGF (let's call it robot.jsgf):
#JSGF V1.0;
grammar robot;
public = (liigu | mine ) [ ( üks | kaks | kolm | neli | viis ) meetrit ] (edasi | tagasi);
NB! Grammars should be in the same charset that the server is using for dictionary, which currently is latin-1 (sorry for that).
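If the grammar is edited as UTF-8, it can be converted to latin-1 before it is published. A minimal sketch follows; the helper name `to_latin1` and the file name `robot.utf8.jsgf` are hypothetical.

```ruby
# Converts UTF-8 grammar text to ISO-8859-1 (latin-1).
# String#encode raises Encoding::UndefinedConversionError when a character
# has no latin-1 equivalent, instead of silently corrupting the grammar.
def to_latin1(text)
  text.encode(Encoding::ISO_8859_1)
end

# Usage sketch (file names are placeholders):
# utf8 = File.read('robot.utf8.jsgf', encoding: 'UTF-8')
# File.binwrite('robot.jsgf', to_latin1(utf8))
```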
You need to upload the JSGF file somewhere the server can fetch it from, let's say http://www.example.com/robot.jsgf
Now, let the server download and compile it:
curl -vv "http://localhost:8080/fetch-lm?url=http://www.example.com/robot.jsgf"
This should result in HTTP/1.1 200 OK.
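The same request can be issued from Ruby. In this sketch the grammar URL is percent-encoded so that it travels as a single query parameter; this assumes the server decodes query parameters in the usual way, which Sinatra and Rack do. The helper names are hypothetical.

```ruby
require 'net/http'
require 'uri'

# Builds the fetch-lm URL with the grammar location as the "url" parameter.
def fetch_lm_uri(grammar_url, endpoint = 'http://localhost:8080/fetch-lm')
  uri = URI(endpoint)
  uri.query = URI.encode_www_form(url: grammar_url)
  uri
end

# Asks the server to fetch and compile the grammar; true on 200 OK.
def fetch_lm(grammar_url, endpoint = 'http://localhost:8080/fetch-lm')
  Net::HTTP.get_response(fetch_lm_uri(grammar_url, endpoint)).is_a?(Net::HTTPOK)
end
```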
Now you can use the grammar to recognize a sentence that is accepted by the grammar:
arecord --format=S16_LE --file-type raw --channels 1 --rate 16000 --duration 5 | \
curl -vv -T - -H "Content-Type: audio/x-raw-int; rate=16000" "http://localhost:8080/recognize?lm=http://www.example.com/robot.jsgf"
Result:
{
"status": 0,
"hypotheses": [
{
"utterance": "mine viis meetrit tagasi"
}
],
"id": "9e3895e9ee0b5138e73c6fca30f51a58"
}
If you update the grammar on the server, you need to make the fetch-lm request again, as the server doesn't check for changes every time
a recognition request is made (for efficiency reasons).
= Support for GF grammars
GF (Grammatical Framework) grammars are supported.
A GF grammar must be compiled into a .pgf file. To upload it to the server, use the fetch-lm API call, e.g.:
curl "http://bark.phon.ioc.ee/speech-api/v1/fetch-lm?url=http://kaljurand.github.com/Grammars/grammars/pgf/Calc.pgf&lang=Est"
The 'lang' attribute (defaults to 'Est') specifies the input languages of the grammar. Several comma-separated languages can be specified, e.g. lang=Est,Est2
To recognize with a GF grammar, use a request similar to the JSGF one, e.g.:
arecord --format=S16_LE --file-type raw --channels 1 --rate 16000 --duration 5 | curl -vv -T - -H "Content-Type: audio/x-raw-int; rate=16000" "http://localhost:8080/recognize?lm=http://kaljurand.github.com/Grammars/grammars/pgf/Calc.pgf"
You can also specify output language(s) that will be used to linearize the raw recognition result, e.g.:
arecord --format=S16_LE --file-type raw --channels 1 --rate 16000 --duration 5 | curl -vv -T - -H "Content-Type: audio/x-raw-int; rate=16000" "http://localhost:8080/recognize?lm=http://kaljurand.github.com/Grammars/grammars/pgf/Calc.pgf&output-lang=App"
Output:
{
"status": 0,
"hypotheses": [
{
"utterance": "viis minutit sekundites",
"linearizations": [
{
"lang": "App",
"output": "5 ' IN \""
},
{
"lang": "App",
"output": "5 min IN s"
}
]
}
],
"id": "83486feaca30995401ed4a66951a3f23"
}
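A client will usually want the linearized outputs grouped by language. The sketch below reads only the first hypothesis and uses a hypothetical helper name, `linearizations_by_lang`; it assumes the response has the shape shown above.

```ruby
require 'json'

# Groups the linearizations of the first hypothesis by their "lang" value.
def linearizations_by_lang(response_body)
  hypothesis = JSON.parse(response_body).fetch('hypotheses', []).first || {}
  hypothesis.fetch('linearizations', []).each_with_object({}) do |lin, grouped|
    (grouped[lin['lang']] ||= []) << lin['output']
  end
end
```

Applied to the response above, this returns a hash with the single key "App" whose value holds both output strings.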
Multiple output languages can be used, by using comma-separated values: "..&output-lang=App,App2"
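Building the recognize URL in code avoids quoting mistakes when several parameters are combined. A minimal sketch, assuming the server decodes percent-encoded query parameters (standard for Sinatra and Rack); the helper name `recognize_uri` and its keyword arguments are hypothetical.

```ruby
require 'uri'

# Builds a recognize URL with an optional grammar (lm) and output languages.
def recognize_uri(lm: nil, output_langs: [], endpoint: 'http://localhost:8080/recognize')
  params = {}
  params['lm'] = lm if lm
  # Several output languages are sent as one comma-separated value.
  params['output-lang'] = output_langs.join(',') unless output_langs.empty?
  uri = URI(endpoint)
  uri.query = URI.encode_www_form(params) unless params.empty?
  uri
end
```

For example, `recognize_uri(lm: 'http://kaljurand.github.com/Grammars/grammars/pgf/Calc.pgf', output_langs: %w[App App2])` yields a URL equivalent to the one in the curl command above, with App2 added as a second output language.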