https://github.com/omine-me/laughtersegmentation
Latest laughter detection & segmentation model.
- Host: GitHub
- URL: https://github.com/omine-me/laughtersegmentation
- Owner: omine-me
- License: mit
- Created: 2024-06-05T15:27:48.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-07-20T11:26:39.000Z (over 1 year ago)
- Last Synced: 2024-07-20T12:29:58.577Z (over 1 year ago)
- Topics: laugh-detection, laughter, laughter-detection, laughter-segmentaion, sound-event-detection, sound-synthesis, speech
- Language: Python
- Homepage:
- Size: 1.09 MB
- Stars: 12
- Watchers: 2
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Laughter Segmentation
## Overview
You can extract exact laughter segments from various kinds of conversational audio using the trained model and code. You can also train your own model.
The code, annotations, and model are described in the following paper:
[Taisei Omine, Kenta Akita, and Reiji Tsuruno, "Robust Laughter Segmentation with Automatic Diverse Data Synthesis", Interspeech 2024.](https://doi.org/10.21437/Interspeech.2024-1644)
## Installation
```Batchfile
git clone https://github.com/omine-me/LaughterSegmentation.git
cd LaughterSegmentation
python -m pip install -r requirements.txt
# ↓ Depends on your environment. See https://pytorch.org/get-started/locally/
python -m pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu121
```
Running in a venv environment is recommended. Also, download `model.safetensors` from [Huggingface](https://huggingface.co/omine-me/LaughterSegmentation/tree/main) (1.26 GB), place it in the `models` directory, and make sure it is named `model.safetensors`.
Python <= 3.11 is required ([#2](https://github.com/omine-me/LaughterSegmentation/issues/2)).
Tested on Windows 11 with GeForce RTX 2060 SUPER.
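To sanity-check the setup before running inference, you can use a minimal sketch like the following (not part of this repository), assuming it is run from the repository root and the model was placed as described above:
```Python
from pathlib import Path

import torch

# Check the installed PyTorch build and whether CUDA is usable.
print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())

# Check that the downloaded model sits where the instructions above place it.
model_path = Path("models") / "model.safetensors"
print(model_path, "exists:", model_path.exists())
```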
## Usage
1. Prepare an audio file.
1. Open a terminal and go to the directory where `inference.py` is located.
1. Run `python inference.py --audio_path audio.wav`, replacing *audio.wav* with the path to your own audio file. Common audio formats such as `mp3`, `wav`, and `opus` are supported; 16 kHz WAV audio is faster. If the audio fails to load, run the following commands and also download FFmpeg and add it to the PATH.
```Batchfile
python -m pip uninstall pysoundfile
python -m pip uninstall soundfile
python -m pip install soundfile
```
1. If you want to change the output directory, use the `--output_dir` option. If you want to use your own model, use the `--model_path` option.
1. Results will be saved to the output directory in JSON format. To visualize the results, you can use [this site](https://omine-me.github.io/AudioDatasetChecker/compare.html) (not perfect, because it is meant for debugging).
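
The steps above can also be scripted. Below is a minimal sketch (not part of this repository), assuming `torchaudio` from the installation step is available; the file and directory names (`audio.mp3`, `audio_16k.wav`, `output`) and the helper names are only examples, and the JSON schema itself is whatever `inference.py` writes, so the sketch just pretty-prints it:
```Python
import json
from pathlib import Path

import torchaudio


def to_16k_wav(src: str, dst: str = "audio_16k.wav") -> str:
    """Convert an audio file to 16 kHz WAV, which the steps above note is faster to process."""
    waveform, sample_rate = torchaudio.load(src)
    if sample_rate != 16000:
        waveform = torchaudio.functional.resample(waveform, sample_rate, 16000)
    torchaudio.save(dst, waveform, 16000)
    return dst


def print_results(output_dir: str = "output") -> None:
    """Pretty-print every JSON file found in the output directory.

    The exact keys and units are defined by inference.py, so no particular
    schema is assumed here.
    """
    for json_path in sorted(Path(output_dir).glob("*.json")):
        print(json_path.name)
        print(json.dumps(json.loads(json_path.read_text(encoding="utf-8")),
                         indent=2, ensure_ascii=False))


if __name__ == "__main__":
    wav = to_16k_wav("audio.mp3")  # hypothetical input file
    # Run inference separately, e.g.:
    #   python inference.py --audio_path audio_16k.wav --output_dir output
    print_results("output")
```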
## Training
Read the [README](/train/README.md) in the train directory.
## Evaluation (Includes our evaluation dataset)
Read the [README](/evaluation/README.md) in the evaluation directory.
## License
This repository is MIT-licensed, but [the publicly available trained model](https://huggingface.co/omine-me/LaughterSegmentation/tree/main) is currently available for research use only.
## Citation
Cite as: `Omine, T., Akita, K., Tsuruno, R. (2024) Robust Laughter Segmentation with Automatic Diverse Data Synthesis. Proc. Interspeech 2024, 4748-4752, doi: 10.21437/Interspeech.2024-1644`
or
```
@inproceedings{omine24_interspeech,
title = {Robust Laughter Segmentation with Automatic Diverse Data Synthesis},
author = {Taisei Omine and Kenta Akita and Reiji Tsuruno},
year = {2024},
booktitle = {Interspeech 2024},
pages = {4748--4752},
doi = {10.21437/Interspeech.2024-1644},
}
```
## Contact
Use Issues or reach out to me on [X (Twitter)](https://x.com/mineBeReal).