https://github.com/zsais/selfservicekiosk-audio-streaming
A best practice for streaming audio from a browser microphone to Dialogflow or Google Cloud STT by using websockets.
https://github.com/zsais/selfservicekiosk-audio-streaming
Last synced: 9 months ago
JSON representation
A best practice for streaming audio from a browser microphone to Dialogflow or Google Cloud STT by using websockets.
- Host: GitHub
- URL: https://github.com/zsais/selfservicekiosk-audio-streaming
- Owner: zsais
- License: apache-2.0
- Fork: true (dialogflow/selfservicekiosk-audio-streaming)
- Created: 2022-05-19T18:10:43.000Z (about 4 years ago)
- Default Branch: master
- Last Pushed: 2022-06-03T17:55:13.000Z (about 4 years ago)
- Last Synced: 2025-01-27T05:33:54.907Z (over 1 year ago)
- Language: JavaScript
- Homepage:
- Size: 2.32 MB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
README
[](https://opensource.org/licenses/Apache-2.0)
# Google Cloud / Dialogflow - Self Service Kiosk Demo
[](https://console.cloud.google.com/cloudshell/editor?cloudshell_git_repo=https%3A%2F%2Fgithub.com%2Fdialogflow%2Fselfservicekiosk-audio-streaming&cloudshell_tutorial=TUTORIAL.md)
A best practice for streaming audio from a browser microphone to Dialogflow or Google Cloud STT by using websockets.
Airport SelfService Kiosk demo, to demonstrate how microphone streaming to GCP works, from a web application.
It makes use of the following GCP resources:
* Dialogflow & Knowledge Bases
* Speech to Text
* Text to Speech
* Translate API
* (optionally) App Engine Flex
In this demo, you can start recording your voice, it will display answers on a screen and synthesize the speech.


# Live demo
A working demo can be found here: [http://selfservicedesk.appspot.com/](http://selfservicedesk.appspot.com/)
# Blog posts
I wrote very extensive blog articles on how to setup your streaming project. Want to exactly learn how this code works? Have a start here:
Blog 1: [Introduction to the GCP conversational AI components, and integrating your own voice AI in a web app.](https://medium.com/@ladysign/building-your-own-conversational-voice-ai-with-dialogflow-speech-to-text-in-web-apps-part-i-b92770bd8b47)
Blog 2: [Building a client-side web application which streams audio from a browser microphone to a server.](https://medium.com/@ladysign/building-a-client-side-web-app-which-streams-audio-from-a-browser-microphone-to-a-server-part-ii-df20ddb47d4e)
Blog 3: [Building a web server which receives a browser microphone stream and uses Dialogflow or the Speech to Text API for retrieving text results.](https://medium.com/google-cloud/building-a-web-server-which-receives-a-browser-microphone-stream-and-uses-dialogflow-or-the-speech-62b47499fc71)
Blog 4: [Getting Audio Data from Text (Text to Speech) and play it in your browser.](https://medium.com/@ladysign/getting-audio-data-from-text-text-to-speech-and-play-it-in-your-browser-part-iv-cd2d6ea71c6a)
# Slides & Video
There's a [presentation](https://speakerdeck.com/savelee/implementing-a-custom-ai-voice-assistant-by-streaming-webrtc-to-dialogflow-and-cloud-speech) and a [video](https://youtu.be/6JD8WC1LV7g) that accompanies the tutorial.
[](https://speakerdeck.com/savelee/implementing-a-custom-ai-voice-assistant-by-streaming-webrtc-to-dialogflow-and-cloud-speech)
# Setup Local Environment
## Get a Node.js environment
1. `apt-get install nodejs -y`
1. `apt-get npm`
## Get an Angular environment
1. `sudo npm install -g @angular/cli`
## Clone Repo
1. `git clone https://github.com/dialogflow/selfservicekiosk-audio-streaming.git selfservicekiosk`
2. Set the PROJECT_ID variable: export PROJECT_ID=[gcp-project-id]
3. Set the project: `gcloud config set project $PROJECT_ID`
4. Download the service account key.
5. Assign the key to environment var: **GOOGLE_APPLICATION_CREDENTIALS**
LINUX/MAC
`export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service_account.json`
WIN
`set GOOGLE_APPLICATION_CREDENTIALS=c:\path\to\service_account.json`
6. Login: `gcloud auth login`
7. Open **server/env.txt**, change the environment variables and rename the file to **server/.env**
8. Enable APIs:
```
gcloud services enable \
appengineflex.googleapis.com \
containerregistry.googleapis.com \
cloudbuild.googleapis.com \
cloudtrace.googleapis.com \
dialogflow.googleapis.com \
logging.googleapis.com \
monitoring.googleapis.com \
sourcerepo.googleapis.com \
speech.googleapis.com \
mediatranslation.googleapis.com \
texttospeech.googleapis.com \
translate.googleapis.com
```
9. Build the client-side Angular app:
```
cd client && sudo npm install
npm run-script build
```
10. Start the server Typescript app, which is exposed on port 8080:
```
cd ../server && sudo npm install
npm run-script watch
```
3. Browse to http://localhost:8080
## Setup Dialogflow
1. Create a Dialogflow agent at: http://console.dialogflow.com
1. Zip the contents of the *dialogflow* folder, from this repo.
1. Click **settings** > **Import**, and upload the Dialogflow agent zip, you just created.
1. *Caution: Knowledge connector settings are not currently included when exporting, importing, or restoring agents.*
Make sure you have enabled *Beta* features in settings.
1. Select *Knowledge* from the left menu.
1. Create a Knowledge Base: **Airports**
1. Add the following Knowledge Base **FAQs**, as **text/html** documents:
* https://www.panynj.gov/port-authority/en/help-center/faq/airports-faq-help-center.html
* https://www.schiphol.nl/en/before-you-take-off/
* https://www.flysfo.com/faqs
1. As a response it requires the following custom payload:
```
{
"knowledgebase": true,
"QUESTION": "$Knowledge.Question[1]",
"ANSWER": "$Knowledge.Answer[1]"
}
```
1. And to make the Text to Speech version of the answer working add the following Text SSML response:
```
$Knowledge.Answer[1]
```
# Deploy with App Engine Flex
This demo makes heavy use of websockets and
the microphone `getUserMedia()` HTML5 API requires
to run over HTTPS. Therefore, I deploy this demo
with a custom runtime, so I can include my own **Dockerfile**.
1. Edit the **app.yaml** to tweak the environment variables.
Set the correct Project ID.
1. Deploy with: `gcloud app deploy`
1. Browse: `gcloud app browse`
# Examples
The selfservice kiosk is a full end to end application.
To showcase smaller examples, I've created 6 small demos. Here's how you can get these running:
1. Install the required libraries, run the following command from the *examples* folder:
`npm install`
2. Start the simpleserver node app:
`npm --EXAMPLE=1 --PORT=8080 --PROJECT_ID=[your-gcp-project-id] run start`
To switch to the various examples, edit the EXAMPLE variable to one of these:
* Example **1**: Dialogflow Speech Intent Detection
* Example **2**: Dialogflow Speech Detection through streaming
* Example **3**: Dialogflow Speech Intent Detection with Text to Speech output
* Example **4**: Speech to Text Transcribe Recognize Call
* Example **5**: Speech to Text Transcribe Streaming Recognize
* Example **6**: Text to Speech in a browser
3. Browse to http://localhost:8080. Open the inspector, to preview the
Dialogflow results object.
The code required for these examples can be found in **simpleserver.js** for the different Dialogflow & STT calls. - **example1.html** - **example5.html** will show the client-side implementations.
# License
Apache 2.0
This is not an official Google product.