Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/team-re-verb/RE-VERB
speaker diarization system using an LSTM
- Host: GitHub
- URL: https://github.com/team-re-verb/RE-VERB
- Owner: team-re-verb
- License: mit
- Created: 2019-07-16T21:21:44.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2023-01-04T04:38:27.000Z (over 1 year ago)
- Last Synced: 2024-02-21T04:40:40.387Z (5 months ago)
- Topics: ai, docker, express, lstm, machine-learning, ml, neural-network, pytorch, redis, reverb, speaker-diarization, vue
- Language: Python
- Homepage:
- Size: 135 MB
- Stars: 45
- Watchers: 5
- Forks: 8
- Open Issues: 28
Metadata Files:
- Readme: README.md
- License: LICENSE.md
Lists
- awesome-diarization - RE-VERB (Software / Framework) | Python & JavaScript | RE: VERB is a speaker diarization system; it allows the user to send/record audio of a conversation and receive timestamps of who spoke when.
README
# About the project
RE: VERB is a [speaker diarization](https://en.wikipedia.org/wiki/Speaker_diarisation) system: it allows the user to send or record audio of a conversation and receive timestamps of who spoke when.
RE: VERB is our final project in [Magshimim](https://www.magshimim.cyber.org.il/), and consists of a web client and a server.
* The [client](#Client) can record audio and display the timestamp results graphically
* The [server](#Server) can also be used with many other clients through its simple REST API
## Built With
### Client
* [Vue.js](https://vuejs.org/) - The front-end framework used
* [Wavesurfer.js](https://wavesurfer-js.org) - A library for waveform visualization
### Server
* [PyTorch](https://pytorch.org/) - A deep learning library for Python with great GPU support via CUDA
* [Express.js](https://expressjs.com/) - Node.js web server framework
## Getting Started
The project contains the server and the web client (a CLI client also exists for debugging purposes). The server is located at ```./server``` and the web client at ```./client/website```.
***
### **Server**
The model, along with the scripts for downloading and training it and the weights from our training, is located at ```./server/speech_diarization/model```.
We used Docker to create a cross-platform environment to run the server in.
The server is made up of:
* a container for the web server
* a container for the diarization process
* a container for a Redis database that allows the other two to communicate

docker-compose will run and manage all three at once.
**Docker and docker-compose need to be installed** in order to build and run the server; everything else will be taken care of.
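The three-container layout described above could be expressed in a compose file along these lines (a hypothetical sketch only; the service names, build paths, and port mapping here are illustrative assumptions, not the project's actual ```./server/docker-compose.yml```):

```yaml
# Illustrative sketch of the three-container setup described above.
# Service names, build paths, and the port mapping are assumptions.
version: "3"
services:
  web:
    build: ./web              # Express.js web server
    ports:
      - "1337:1337"           # default upload endpoint port per the README
    depends_on:
      - redis
  diarization:
    build: ./speech_diarization   # PyTorch diarization worker
    depends_on:
      - redis
  redis:
    image: "redis:alpine"     # lets the web server and worker communicate
```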
### Installing
```bash
cd server
docker-compose up
```
This will run all three containers and install their dependencies.
If you make a change in the server, use
```bash
docker-compose up --build
```
to rebuild.
>### __Usage:__
>
>Sending an HTTP POST request with an audio file to the server at ```http://localhost:1337/upload``` (the default port and URL) will return a JSON document with the timestamps in milliseconds.
>
>```json
>{"0": [[40, 120], [3060, 3460], [3480, 3560]], "1": [[1260, 1660], [1680, 1960]]}
>```
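As an illustration, the returned mapping of speaker IDs to `[start_ms, end_ms]` segments can be flattened into a chronological timeline (a sketch based only on the example response above; `to_timeline` is a hypothetical helper, not part of the project):

```python
import json

# Example response from the /upload endpoint, as shown above:
# speaker id -> list of [start_ms, end_ms] segments.
response = '{"0": [[40, 120], [3060, 3460], [3480, 3560]], "1": [[1260, 1660], [1680, 1960]]}'

def to_timeline(raw):
    """Flatten the per-speaker segments into one chronological timeline."""
    data = json.loads(raw)
    timeline = []
    for speaker, segments in data.items():
        for start, end in segments:
            timeline.append((start, end, speaker))
    # Sort by start time so segments read in conversation order.
    return sorted(timeline)

for start, end, speaker in to_timeline(response):
    print(f"speaker {speaker}: {start / 1000:.2f}s - {end / 1000:.2f}s")
```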
***
## __Client__
**The client needs npm or yarn to be installed**; more info about the client can be found [here](client/website/README.md).

To install:
```bash
cd client/website
npm install
```
Afterwards you can use
```
npm run serve
```
to run a development server.
***
## Authors
* **Ofir Naccache** - [ofirnaccache](https://github.com/ofirnaccache)
* **Matan Yesharim** - [Tralfazz](https://github.com/Tralfazz)
## License
This project is licensed under the MIT License - see the [LICENSE.md](LICENSE.md) file for details.
## Acknowledgments
* The diarization algorithm is an implementation of [this research](https://google.github.io/speaker-id/publications/LstmDiarization/); we also used their implementation of spectral clustering
* We took inspiration and some code from [Harry Volek's implementation](https://github.com/HarryVolek/PyTorch_Speaker_Verification) of a different but similar problem, speaker verification
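For intuition, clustering per-window speaker embeddings spectrally can be sketched roughly as follows (an illustrative toy sketch with NumPy, not the reference implementation the project uses; the function name and the plain k-means step are our own simplifications):

```python
import numpy as np

def spectral_cluster(embeddings, n_speakers):
    """Toy spectral clustering of speaker embeddings (illustrative only)."""
    # Cosine-similarity affinity matrix between per-window embeddings.
    norm = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    affinity = norm @ norm.T
    # Keep the eigenvectors of the top n_speakers eigenvalues
    # (np.linalg.eigh returns eigenvalues in ascending order).
    _, vecs = np.linalg.eigh(affinity)
    spectral = vecs[:, -n_speakers:]
    # Deterministic farthest-point initialization for k-means.
    centers = [spectral[0]]
    for _ in range(n_speakers - 1):
        dists = np.min([np.linalg.norm(spectral - c, axis=1) for c in centers], axis=0)
        centers.append(spectral[np.argmax(dists)])
    centers = np.array(centers)
    # A few iterations of plain k-means on the spectral embedding rows.
    for _ in range(10):
        labels = np.argmin(
            np.linalg.norm(spectral[:, None] - centers[None], axis=2), axis=1)
        for k in range(n_speakers):
            if np.any(labels == k):
                centers[k] = spectral[labels == k].mean(axis=0)
    return labels
```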
## Future Plans
* We had problems training on the AMI corpus, so the provided model was trained on the TIMIT corpus instead
* We plan to train again on the [VoxCeleb 1 and 2](http://www.robots.ox.ac.uk/~vgg/data/voxceleb/) datasets, which contain much more data, and hopefully improve feature extraction
* We want to add integration with a speech-to-text service to transcribe the created segments