Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mainro/deepspeech-server
A testing server for a speech to text service based on coqui.ai
- Host: GitHub
- URL: https://github.com/mainro/deepspeech-server
- Owner: MainRo
- License: mpl-2.0
- Created: 2017-11-16T09:53:40.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2022-07-12T21:45:16.000Z (over 2 years ago)
- Last Synced: 2024-09-16T08:40:18.022Z (about 2 months ago)
- Topics: coqui-ai, deepspeech, reactive-extensions, reactivex, rxpy, speech-recognition, speech-to-text
- Language: Python
- Homepage:
- Size: 80.1 KB
- Stars: 213
- Watchers: 16
- Forks: 71
- Open Issues: 0
- Metadata Files:
- Readme: README.rst
- License: LICENSE
Awesome Lists containing this project
README

==================
DeepSpeech Server
==================

.. image:: https://github.com/MainRo/deepspeech-server/actions/workflows/pythonpackage.yml/badge.svg
    :target: https://github.com/MainRo/deepspeech-server/actions/workflows/pythonpackage.yml

.. image:: https://badge.fury.io/py/deepspeech-server.svg
    :target: https://badge.fury.io/py/deepspeech-server

Key Features
============

This is an HTTP server that can be used to test the Coqui STT project (the
successor of the Mozilla DeepSpeech project). You need an environment with
DeepSpeech or Coqui to run this server.

This code uses the Coqui STT 1.0 APIs.
Installation
============

The server is available on PyPI, so you can install it with pip:

.. code-block:: console

    pip3 install deepspeech-server
You can also install deepspeech-server from source:

.. code-block:: console

    python3 setup.py install
Note that Python 3.6 is the minimum version required to run the server.
Starting the server
===================

.. code-block:: console

    deepspeech-server --config config.yaml
What is an STT model?
---------------------

The quality of the speech-to-text engine depends heavily on which models it
loads at runtime. Think of them as a sort of pattern that controls how the
engine works.
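For intuition, here is roughly how a model is loaded and queried with the
Coqui STT 1.0 Python API (a minimal sketch, independent of this server; the
file names are placeholders):

.. code-block:: python

    import wave

    import numpy as np
    from stt import Model

    # Load a trained model (a TensorFlow Lite file).
    model = Model("coqui-1.0.tflite")

    # The engine expects 16-bit mono PCM samples at the model's sample rate.
    with wave.open("testfile.wav", "rb") as wav:
        audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

    print(model.stt(audio))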
How to use a specific STT model
-------------------------------

You can use Coqui without training a model. Pre-trained models are on offer
at the Coqui Model Zoo (make sure the "STT Models" tab is selected):
https://coqui.ai/models
Once you've downloaded a pre-trained model, make a copy of the sample
configuration file. Edit the ``model`` and ``scorer`` fields in your new file
for the engine you want to use so that they match the downloaded files:

.. code-block:: console

    cp config.sample.yaml config.yaml
    $EDITOR config.yaml

Lastly, start the server:
.. code-block:: console

    deepspeech-server --config config.yaml
Server configuration
====================

The configuration is done with a YAML file, provided with the ``--config``
argument. Its structure is the following:

.. code-block:: yaml

    coqui:
      model: coqui-1.0.tflite
      scorer: huge-vocabulary.scorer
      beam_width: 500
    server:
      http:
        host: "0.0.0.0"
        port: 8080
        request_max_size: 1048576
    log:
      level:
        - logger: deepspeech_server
          level: DEBUG

The configuration file contains several sections and sub-sections.
coqui section configuration
---------------------------

Section "coqui" contains the configuration of the coqui-stt engine:

**model**: The model that was trained by Coqui. Must be a tflite (TensorFlow Lite) file.

**scorer**: [Optional] The scorer file. Use this to tune the STT to understand certain phrases better.

**lm_alpha**: [Optional] The alpha hyperparameter of the scorer.

**lm_beta**: [Optional] The beta hyperparameter of the scorer.

**beam_width**: [Optional] The size of the beam search. Larger values make decoding slower but can improve accuracy.
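For reference, these options map onto the Coqui STT Python API roughly as
follows (a sketch; the alpha and beta values below are made-up examples):

.. code-block:: python

    from stt import Model

    model = Model("coqui-1.0.tflite")                     # model
    model.setBeamWidth(500)                               # beam_width
    model.enableExternalScorer("huge-vocabulary.scorer")  # scorer
    model.setScorerAlphaBeta(0.93, 1.18)                  # lm_alpha, lm_beta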
http section configuration
--------------------------

**request_max_size** (default value: 1048576, i.e. 1 MiB): The maximum payload
size allowed by the server. A request with a payload above this threshold
returns a "413: Request Entity Too Large" error.

**host**: The listen address of the HTTP server.

**port**: The listening port of the HTTP server.
log section configuration
-------------------------

The log section can be used to set the log levels of the server. This section
contains a list of log entries. Each log entry contains the name of a **logger**
and its **level**. Both follow the conventions of the Python logging module.
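As a point of comparison, the example entry above is equivalent to this call
on Python's standard ``logging`` module (a sketch):

.. code-block:: python

    import logging

    # logger: deepspeech_server / level: DEBUG
    logging.getLogger("deepspeech_server").setLevel(logging.DEBUG)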
Using the server
================

Inference on the model is done via HTTP POST requests, for example with the
following curl command:

.. code-block:: console

    curl -X POST --data-binary @testfile.wav http://localhost:8080/stt
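The same request can be made from Python, which is convenient for scripting
(a minimal sketch using the third-party ``requests`` package; the file name
is a placeholder):

.. code-block:: python

    import requests

    # POST the raw WAV bytes to the /stt route; the transcript is returned
    # in the response body.
    with open("testfile.wav", "rb") as f:
        response = requests.post("http://localhost:8080/stt", data=f.read())

    response.raise_for_status()  # e.g. 413 if the payload exceeds request_max_size
    print(response.text)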