https://github.com/voice-engine/make-a-smart-speaker
A collection of resources to make a smart speaker
https://github.com/voice-engine/make-a-smart-speaker
aec beamforming kws nlu stt tts voice-assistant
Last synced: about 1 month ago
JSON representation
A collection of resources to make a smart speaker
- Host: GitHub
- URL: https://github.com/voice-engine/make-a-smart-speaker
- Owner: voice-engine
- Created: 2018-01-06T15:37:44.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2019-12-20T11:00:15.000Z (almost 6 years ago)
- Last Synced: 2023-11-07T18:24:53.137Z (almost 2 years ago)
- Topics: aec, beamforming, kws, nlu, stt, tts, voice-assistant
- Size: 176 KB
- Stars: 388
- Watchers: 32
- Forks: 85
- Open Issues: 3
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-rainmana - voice-engine/make-a-smart-speaker - A collection of resources to make a smart speaker (Others)
README
To make a smart speaker
=======================[中文](zh.md)
Here is a collection of resources to make a smart speaker. ~~Hope we can make an open source one for daily use.~~
I believe we have enough resources to make an open source smart speaker. Let's do it. Take a look at [the progress of the project named `smart speaker from scratch` on hackaday](https://hackaday.io/project/164221-smart-speaker-from-scratch). [The first hardware kit is available now.](https://www.makerfabs.com/voicen-linear-4-mic-array-kit.html)
The simplified flowchart of a smart speaker is like:
```
+---+ +----------------+ +---+ +---+ +---+
|Mic|-->|Audio Processing|-->|KWS|-->|STT|-->|NLU|
+---+ +----------------+ +---+ +---+ +-+-+
|
|
+-------+ +---+ +----------------------+ |
|Speaker|<--|TTS|<--|Knowledge/Skill/Action|<--+
+-------+ +---+ +----------------------+
```+ Audio Processing includes Acoustic Echo Cancellation (AEC), Beamforming, Noise Suppression (NS), etc.
+ Keyword Spotting (KWS) detects a keyword (such as OK Google, Hey Siri) to start a conversation.
+ Speech To Text (STT)
+ Natural Language Understanding (NLU) converts raw text into structured data.
+ Knowledge/Skill/Action - Knowledge base and plugins (Alexa Skill, Google Action) to provide an answer.
+ Text To Speech------------------
### KWS + STT + NLU + Skill + TTS
#### Active open source projects
+ [Snips :star:](https://snips.ai) - the first 100% on-device and private-by-design open-source Voice AI platform
+ [Mycroft :star:](https://github.com/MycroftAI/mycroft-core) - a hackable open source voice assistant
+ [SEPIA :robot:](https://sepia-framework.github.io/) - Highly customizable, open-source, cross-platform voice assistant and VUI framework (HTML + Java + x)
+ [Kalliope](https://github.com/kalliope-project/kalliope) - a framework that will help you to create your own personal assistant, kind of similar with Mycroft (Both written by Python)
+ [dingdang robot](https://github.com/dingdang-robot/dingdang-robot) - a :cn: voice interaction robot based on [Jasper](https://github.com/jasperproject/jasper-client) and built with raspberry pi#### SDK
+ Amazon Alexa Voice Service - is the most widely used voice assistant
+ [C++ SDK](https://github.com/alexa/avs-device-sdk)
+ [Java Client](https://github.com/alexa/alexa-avs-sample-app)
+ [Python Client](https://github.com/respeaker/avs)+ [Google Assistant SDK](https://github.com/googlesamples/assistant-sdk-python)
It has the smartest brain, its extension called Google Action can be created on a few steps with digitalflow.ai and its Device Action is very suit for home smart devices.
+ [Baidu DuerOS](https://github.com/dueros)
+ [Snips](https://snips.ai)
+ [Install Snips](https://snips.gitbook.io/documentation/installing-snips) on Raspberry Pi 3, Linux, osX, iOS and Android+ [SEPIA Installation](https://medium.com/sepia-framework/hosting-your-own-private-virtual-assistant-533b86553d63), [SEPIA with Porcupine + ReSpeaker](https://github.com/SEPIA-Framework/sepia-wakeword-tools/tree/master/Porcupine)
### KWS
+ [Mycroft Precise](https://github.com/MycroftAI/mycroft-precise) - A lightweight, simple-to-use, RNN wake word listener
+ [Snowboy](https://github.com/Kitt-AI/snowboy) - DNN based hotword and wake word detection toolkit
+ [Honk](https://github.com/castorini/honk) - PyTorch reimplementation of Google's TensorFlow CNNs for keyword spotting
+ [ML-KWS-For-MCU](https://github.com/ARM-software/ML-KWS-for-MCU) - Maybe the most promise for resource constrained devices such as ARM Cortex M7 microcontroller
+ [Porcupine](https://picovoice.ai/products/#wake-word) - Lightweight, cross-platform engine to build custom wake words in seconds### STT
+ [Mozilla DeepSpeech](https://github.com/mozilla/DeepSpeech) - A TensorFlow implementation of Baidu's DeepSpeech architecture
+ [Kaldi](https://github.com/kaldi-asr/kaldi)
+ [wav2letter++](https://github.com/facebookresearch/wav2letter) - a fast, open source speech processing toolkit from the Speech team at Facebook AI Research built to facilitate research in end-to-end models for speech recognition.
+ [Zamia Speech](https://github.com/gooofy/zamia-speech) - Open tools, data, models (kaldi models and wav2letter++ models) for cloudless automatic speech recognition. It can be run on Raspberry Pi
+ [PocketSphinx](https://github.com/cmusphinx/pocketsphinx) - a lightweight speech recognition engine using HMM + GMM### NLU
+ [Rasa NLU](https://github.com/RasaHQ/rasa_nlu)+ [Rasa NLU for Chinese](https://github.com/crownpku/Rasa_NLU_Chi)
+ [Snips NLU](https://github.com/snipsco/snips-nlu) - a Python library that allows to parse sentences written in natural language and extracts structured information.### TTS
+ [Mozilla TTS](https://github.com/mozilla/TTS) - Deep learning for Text to Speech
+ [Mimic](https://github.com/MycroftAI/mimic) - Mycroft's TTS engine, based on CMU's Flite (Festival Lite)
+ [manytts](https://github.com/marytts/marytts) - an open-source, multilingual text-to-speech synthesis system written in pure java
+ [espeak-ng](https://github.com/espeak-ng/espeak-ng) - an open source speech synthesizer that supports 99 languages and accents.
+ [ekho](https://github.com/hgneng/ekho) - Chinese text-to-speech engine
+ WaveNet, Tacotron 2### Audio Processing
+ Acoustic Echo Cancellation+ [SpeexDSP](https://github.com/xiph/speexdsp), its python binding [speexdsp-python](github.com/xiongyihui/speexdsp-python)
+ [EC](https://github.com/voice-engine/ec) - Echo Cancelation Daemon based on SpeexDSP AEC for Raspberry Pi or other devices running Linux.+ Direction Of Arrival (DOA) - Most used DOA algorithms is GCC-PHAT
+ [tdoa](https://github.com/xiongyihui/tdoa)
+ [odas](https://github.com/introlab/odas) - ODAS stands for Open embeddeD Audition System. This is a library dedicated to perform sound source localization, tracking, separation and post-filtering. ODAS is coded entirely in C, for more portability, and is optimized to run easily on low-cost embedded hardware. ODAS is free and open source.+ [Beamforming](https://github.com/search?utf8=%E2%9C%93&q=beamforming&type=)
+ [BeamformIt](https://github.com/xanguera/BeamformIt) - filter&sum beamforming
+ CGMM Beamforming - [a reference implementation](https://github.com/funcwj/CGMM-MVDR)
+ MVDR Beamforming
+ GSC Beamforming+ Voice Activity Detection
+ WebRTC VAD, [py-webrtcvad](https://github.com/wiseman/py-webrtcvad)
+ DNN VAD+ Noise Suppresion
+ NS of WebRTC audio processing, [python-webrtc-audio-processing](https://github.com/xiongyihui/python-webrtc-audio-processing)
### Audio I/O
+ PortAudio, pyaudio
+ [libsoundio](https://github.com/andrewrk/libsoundio)
+ ALSA
+ PulseAudio
+ [Pipewire](https://github.com/PipeWire/pipewire)