asr-server
==========

FastCGI support for Kaldi ASR: https://github.com/dialogflow/asr-server

About
======
FastCGI support for [Kaldi](http://kaldi-asr.org/doc/). It allows Kaldi-based speech recognition to be used through Apache, Nginx, or any other HTTP server that supports FastCGI. It also contains a simple HTML-based client that allows testing Kaldi speech recognition from a web page.

Licence
-------
Apache 2.0

Installation guide
==================

Summary
-------

This guide will help you download and build your own simple ASR
web service based on the Kaldi ASR code.

Preparing prerequisites
-----------------------

### Creating a working dir

Let's create a directory where all data will be downloaded and built.

$ mkdir ~/apiai
$ cd ~/apiai

You are free to choose any other name and path, but keep in mind that
your paths will then differ from those given in this guide.

Because the server code is based on Kaldi, almost all prerequisites match
Kaldi's. In addition, the FastCGI library is required to communicate
with the HTTP server.

### Getting Kaldi

As a first step you have to clone the Kaldi source tree available at
https://github.com/kaldi-asr/kaldi:

git clone https://github.com/kaldi-asr/kaldi

This command will clone the source tree into the `kaldi` directory.
To configure and build Kaldi please refer to the `kaldi/INSTALL` file.
For detailed information please see the official Kaldi documentation
at http://kaldi-asr.org/doc/.

### Installing libraries

There are some extra libraries required. You may install them using
your system package manager.

In openSuSE you may run:

$ sudo zypper install FastCGI-devel

If you have Debian or Ubuntu:

$ sudo apt-get install libfcgi-dev

Getting the code
--------------

Return to the working directory where you put the Kaldi sources

$ cd ~/apiai

and then clone the server source code:

$ git clone https://github.com/api-ai/asr-server asr-server

It is recommended to check out the code into the same directory where
`kaldi` is located, so that the `configure` tool can detect the Kaldi
location automatically.

Building the app
--------------

$ cd asr-server

Before running make, you have to configure the build scripts
by running the configure utility:

$ ./configure

It will check that all required libraries are installed on your system and
will also look for the Kaldi libraries in the `../kaldi` folder. If you
have Kaldi installed somewhere else, you may explicitly pass the
path via the --kaldi-root option:

$ ./configure --kaldi-root=

If the configuration process finishes successfully, you may start
the build by running make:

$ make

Getting a recognition model
------------------------

When the application build is complete, you need to download
language-specific data.

Return to the working directory where you put the Kaldi sources

$ cd ~/apiai

The built ASR application uses Kaldi nnet3 models, which you can get by
training a neural network on your own data set or by using a pretrained
network provided by us. Currently only an English model is available:

$ wget https://github.com/api-ai/api-ai-english-asr-model/releases/download/1.0/api.ai-kaldi-asr-model.zip

Unzip the archive; it unpacks into the `api.ai-kaldi-asr-model` directory.

$ unzip api.ai-kaldi-asr-model.zip

Running the app
--------------

Set the model directory as a working dir:

$ cd api.ai-kaldi-asr-model

There are several ways to run the application. The first is to run it
as a standalone app listening on the socket defined with the
`--fcgi-socket` option:

$ ../asr-server/fcgi-nnet3-decoder --fcgi-socket=:8000

This command runs the application listening on port 8000 on all IP
addresses. You may also specify a path to a Unix socket, or an explicit
IP address (in A.B.C.D:PORT form).

Alternatively, you may use the spawn-fcgi utility:

$ spawn-fcgi -n -p 8000 -- ../asr-server/fcgi-nnet3-decoder
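
To confirm the decoder is up, a minimal Python sketch (standard library only) can check that the port accepts TCP connections; this assumes the decoder was started with `--fcgi-socket=:8000`, and since FastCGI is a binary protocol it only proves something is listening:

import socket

# Assumes fcgi-nnet3-decoder was started with --fcgi-socket=:8000.
# FastCGI is a binary protocol, so we only verify that the port accepts
# a TCP connection; real requests must go through the HTTP server.
with socket.create_connection(("127.0.0.1", 8000), timeout=2):
    print("decoder is listening on port 8000")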

Configuring HTTP service
---------------------

You may use any web server that has FastCGI support: Apache, Nginx, Lighttpd, etc.

### Installing Apache2

openSuSE:

$ sudo zypper in apache2

Debian and Ubuntu:

$ sudo apt-get install apache2

### Configuring Apache2

Enable FastCGI proxy module with `a2enmod`:

$ sudo a2enmod proxy_fcgi

Then add the following line to the Apache2 configuration file:

ProxyPass "/asr" "fcgi://localhost:8000/"

If your Apache is configured to include all .conf files from the
/etc/apache2/conf.d folder, you may create a separate asr_proxy.conf file
with the following content:

ProxyPass "/asr" "fcgi://localhost:8000/"
Alias /asr-html/ "/home/username/apiai/asr-server/asr-html/"

<Directory "/home/username/apiai/asr-server/asr-html/">
    Options Indexes MultiViews
    AllowOverride None
    Require all granted
</Directory>


Now restart Apache:

$ sudo /etc/init.d/apache2 restart

### Installing Nginx

You can download the latest sources from the official website and build
Nginx yourself, or use your system package manager.

openSuSE:

$ sudo zypper install nginx

Debian and Ubuntu:

$ sudo apt-get install nginx

### Configuring Nginx

Open nginx.conf and add the following configuration:

http {
    server {
        location /asr {
            fastcgi_pass 127.0.0.1:8000;
            # Disable buffering so replies are sent to the client immediately
            fastcgi_buffering off;
            # Disable request buffering so incoming audio is decoded immediately
            fastcgi_request_buffering off;
            include fastcgi_params;
        }

        location /asr-html {
            root /home/username/apiai/asr-server/;
            index index.html;
        }
    }
}

This sets up Nginx to pass all requests coming to the URL /asr directly
to the ASR service listening on port 8000 via the FastCGI gateway. For
detailed information please refer to the Nginx documentation.

Speech Recognition
----------------

The server accepts raw mono 16-bit 16 kHz PCM data. You can convert your
audio using any popular encoding utility; for instance, you can use ffmpeg:

$ ffmpeg -i audio.wav -f s16le -ar 16000 -ac 1 audio.raw
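
If ffmpeg is not available, a minimal Python sketch using only the standard library can do the same for a WAV file that is already mono, 16-bit, and 16 kHz (the file names here are just examples):

import wave

# Read a mono 16-bit 16 kHz WAV and write its samples as headerless raw PCM.
with wave.open("audio.wav", "rb") as w:
    assert w.getnchannels() == 1, "expected mono audio"
    assert w.getsampwidth() == 2, "expected 16-bit samples"
    assert w.getframerate() == 16000, "expected 16 kHz sample rate"
    pcm = w.readframes(w.getnframes())

with open("audio.raw", "wb") as f:
    f.write(pcm)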

### Recognition using web browser

There is a simple JS implementation that allows you to recognize speech
using the system microphone. Open the following URL in your browser:

http://localhost/asr-html/

and follow the instructions on the page.

### Recognition from command line using curl

Now, let's recognize `audio.raw` by calling the web service with the
`curl` utility:

$ curl -H "Content-Type: application/octet-stream" --data-binary @audio.raw http://localhost/asr

On successful recognition the command will return something like this:

{
 "status":"ok",
 "data":[{"confidence":0.900359,"text":"HELLO WORLD"}]
}

On error, the return value will look like this:

{"status":"error","data":[{"text":"Failed to decode"}]}

### Recognition request parameters

There are several parameters to tune the recognition process. All parameters are expected to be passed via the query string as web-form fields (e.g. `?name1=value1&name2=value2`).


| Parameter    | Description                                                     | Acceptable values | Default |
|--------------|-----------------------------------------------------------------|-------------------|---------|
| nbest        | Sets the number of recognition alternatives returned            | 1-10              | 1       |
| endofspeech  | Enables or disables end-of-speech detection during recognition  | true or false     | true    |
| intermediate | Sets the interval in milliseconds between intermediate results  | >500              | 0       |
| multipart    | Returns the result as an HTTP multipart response                | true or false     | false   |

With nbest set, up to the requested number of alternatives is returned:

{
 "status":"ok",
 "data":[
  {"confidence":0.900359,"text":"HELLO WORLD"},
  {"confidence":0.89012,"text":"HELLO WORD"}
 ]
}

With endofspeech enabled, if an endpoint is detected the current result is
returned immediately and the rest of the data is skipped. In the case of an
interrupted recognition, two fields are added to the response: "interrupted"
with the value "endofspeech", and "time" with the number of milliseconds
that have been processed:

{
 "status":"ok",
 "data":[{"confidence":0.900359,"text":"HELLO WORLD"}],
 "interrupted":"endofspeech",
 "time":3800
}

With intermediate set, the result is returned as a simple sequence of JSON
documents while recognition is in progress. Each intermediate document has
its "status" field set to "intermediate"; the last one has "status" set to
"ok":

{"status":"intermediate","data":[
 {"confidence":0.908981,"text":"HELLO"}
]}
{"status":"intermediate","data":[
 {"confidence":0.903025,"text":"HELLO WORLD"}
]}
{"status":"ok","data":[
 {"confidence":0.903025,"text":"HELLO WORLD"}
]}

With multipart enabled, the result is returned as an HTTP multipart response
with "Content-Type" set to "multipart/x-mixed-replace", and each response
part has a "Content-Disposition" header equal to "form-data". Intermediate
parts are named "partial" and the final part is named "result":

--ResponseBoundary
Content-Disposition: form-data; name="partial"
Content-type: application/json

{"status":"intermediate","data":[
 {"confidence":0.908981,"text":"HELLO"}
]}

--ResponseBoundary
Content-Disposition: form-data; name="partial"
Content-type: application/json

{"status":"intermediate","data":[
 {"confidence":0.903025,"text":"HELLO WORLD"}
]}

--ResponseBoundary
Content-Disposition: form-data; name="result"
Content-type: application/json

{"status":"ok","data":[
 {"confidence":0.903025,"text":"HELLO WORLD"}
]}

--ResponseBoundary--
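
As a sketch of how these parameters combine, the following Python example (standard library only; the URL and parameter values are illustrative) requests two alternatives and intermediate results every 600 ms, printing the response line by line since intermediate results arrive as a sequence of JSON documents on the same connection:

import urllib.request

with open("audio.raw", "rb") as f:
    audio = f.read()

# nbest=2 asks for two alternatives; intermediate=600 streams partial
# results every 600 ms as separate JSON documents in the response body.
req = urllib.request.Request(
    "http://localhost/asr?nbest=2&intermediate=600",
    data=audio,
    headers={"Content-Type": "application/octet-stream"},
)
with urllib.request.urlopen(req) as resp:
    for line in resp:
        print(line.decode("utf-8").rstrip())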