
speaker diarization system using an LSTM

ai docker express lstm machine-learning ml neural-network pytorch redis reverb speaker-diarization vue

# About the project
RE:VERB is a [speaker diarization]( system.
It allows the user to send or record audio of a conversation and receive timestamps of who spoke when.

RE:VERB is our final project in [Magshimim](, and consists of a web client and a server.

* The [client](#Client) can record audio and display the timestamp results graphically.

* The [server](#Server) exposes a simple REST API, so it can also be used with other clients.

## Built With

### Client
* [Vue.js](http:// - The front-end framework
* [Wavesurfer.js]( - A library for waveform visualization
### Server

* [PyTorch]( - A deep learning library for Python with strong GPU support through CUDA

* [Express.js]( - A Node.js web server framework

## Getting Started
The project contains the server and the web client (a CLI client also exists for debugging purposes).

The server is located at ```./server``` and the web client at ```./client/website```.

### **Server**

The model, along with the download and training scripts and the weights from our training, is located at ```./server/speech_diarization/model```

We used Docker to create a cross-platform environment for running the server.

The server is made up of:
* a container for the web server
* a container for the diarization process
* a container for a Redis database through which the other two communicate

Docker Compose runs and manages all three at once.
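
The message format the containers exchange over Redis is not described here. The sketch below shows one common pattern, assuming the web server pushes jobs onto a Redis list and the diarization worker pops them and stores each result under a per-job key. The key names, the JSON payload, and the `diarize` callable are all hypothetical, not the project's actual protocol. The functions accept any client exposing redis-py's `rpush`, `blpop`, `set`, and `get` methods, so they can be tried without a running Redis server.

```python
import json
import uuid

# Hypothetical key names; the real containers may use different ones.
JOB_QUEUE = "diarization:jobs"
RESULT_PREFIX = "diarization:result:"


def submit_job(client, audio_path):
    """Web-server side: enqueue a diarization job and return its id."""
    job_id = str(uuid.uuid4())
    client.rpush(JOB_QUEUE, json.dumps({"id": job_id, "audio_path": audio_path}))
    return job_id


def process_one_job(client, diarize, timeout=0):
    """Worker side: wait for a job, run `diarize` on it, store the result."""
    item = client.blpop(JOB_QUEUE, timeout=timeout)
    if item is None:  # redis-py returns None when the timeout expires
        return None
    _key, payload = item
    job = json.loads(payload)  # json.loads accepts both str and bytes
    segments = diarize(job["audio_path"])  # hypothetical diarization callable
    client.set(RESULT_PREFIX + job["id"], json.dumps(segments))
    return job["id"]


def fetch_result(client, job_id):
    """Web-server side: return the stored result, or None if not ready yet."""
    raw = client.get(RESULT_PREFIX + job_id)
    return None if raw is None else json.loads(raw)
```

With redis-py the client could be created as `redis.Redis(host="redis", port=6379)`, where `redis` is assumed to be the Compose service name. `blpop` blocks until an item arrives (indefinitely when `timeout=0`), so the worker can idle without polling.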

**Docker and docker-compose need to be installed** to build and run the server; everything else is handled automatically.

### Installing

```sh
cd server
docker-compose up
```

This starts all three containers and installs their dependencies.

If you make a change to the server, rebuild with:

```sh
docker-compose up --build
```

>### __Usage:__
>Sending an HTTP POST request with an audio file to the server at ```http://localhost:1337/upload``` (the default port and URL) returns JSON that maps each speaker label to a list of `[start, end]` timestamps in milliseconds, for example:
>```json
>{"0": [[40, 120], [3060, 3460], [3480, 3560]], "1": [[1260, 1660], [1680, 1960]]}
>```
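
A client only needs to parse that JSON. As a minimal sketch (the function names are hypothetical, and it assumes speaker keys are strings and each segment is a `[start, end]` pair in milliseconds, as in the sample response), the result can be flattened into a chronological timeline and summed per speaker:

```python
import json

# Sample response taken from the usage example.
SAMPLE = '{"0": [[40, 120], [3060, 3460], [3480, 3560]], "1": [[1260, 1660], [1680, 1960]]}'


def to_timeline(response_text):
    """Flatten {speaker: [[start_ms, end_ms], ...]} into a list sorted by start time."""
    data = json.loads(response_text)
    timeline = [
        (start, end, speaker)
        for speaker, segments in data.items()
        for start, end in segments
    ]
    return sorted(timeline)


def speaking_time_ms(response_text):
    """Total number of milliseconds attributed to each speaker."""
    data = json.loads(response_text)
    return {spk: sum(end - start for start, end in segs) for spk, segs in data.items()}


if __name__ == "__main__":
    for start, end, speaker in to_timeline(SAMPLE):
        print(f"{start / 1000:6.2f}s - {end / 1000:6.2f}s  speaker {speaker}")
    print(speaking_time_ms(SAMPLE))  # {'0': 560, '1': 680}
```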

## __Client__
**The client requires npm or yarn to be installed.** More information about the client can be found [here](client/website/

To install:

```sh
cd client/website
npm install
```

Afterwards, run

```sh
npm run serve
```

to start a development server.

## Authors

* **Ofir Naccache** - [ofirnaccache](
* **Matan Yesharim** - [Tralfazz](
## License

This project is licensed under the MIT License - see the []( file for details

## Acknowledgments

* The diarization algorithm is an implementation of [this research](; we also used the researchers' implementation of spectral clustering.

* We took inspiration and some code from [Harry Volek's implementation]( of a related problem: speaker verification.
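
The clustering step is not spelled out here. As an illustration only, the sketch below is a plain normalized spectral clustering: cosine affinity with negative values clipped, symmetric degree normalization, the eigengap heuristic to choose the number of speakers, and k-means on the row-normalized top eigenvectors. It is not the refined variant from the referenced research or the researchers' code, and it assumes the input is one embedding vector per audio window (for example, produced by the LSTM). `spectral_cluster`, `kmeans`, and `max_speakers` are hypothetical names.

```python
import numpy as np


def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(1, k):
        dists = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dists)])
    centers = np.array(centers)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_cluster(embeddings, max_speakers=8):
    """Assign a speaker label to each embedding (one row per audio window)."""
    n = len(embeddings)
    if n < 2:
        return np.zeros(n, dtype=int)
    # Cosine similarity, with negative similarities clipped to zero.
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    affinity = np.clip(x @ x.T, 0.0, None)
    # Symmetric normalization D^-1/2 A D^-1/2 (degrees > 0 since the diagonal is 1).
    inv_sqrt = 1.0 / np.sqrt(affinity.sum(axis=1))
    normalized = affinity * inv_sqrt[:, None] * inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(normalized)  # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # make it descending
    # Eigengap heuristic: the largest drop between consecutive eigenvalues.
    max_k = min(max_speakers, n - 1)
    gaps = eigvals[:max_k] - eigvals[1:max_k + 1]
    k = int(np.argmax(gaps)) + 1
    # Cluster the row-normalized top-k eigenvectors.
    spectral = eigvecs[:, :k]
    spectral = spectral / np.linalg.norm(spectral, axis=1, keepdims=True)
    return kmeans(spectral, k)
```

Because the eigengap heuristic estimates the number of speakers, the caller does not need to know it in advance. Each returned label could then be mapped back to its window's time span to build per-speaker timestamps.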

## Future Plans

* We had problems training on the AMI corpus, so the provided model was trained on the TIMIT corpus.

* We plan to train again on the [VoxCeleb 1 and 2]( datasets, which contain much more data, in the hope of improving feature extraction.

* We want to integrate a speech-to-text service to transcribe the resulting segments.