https://github.com/labbeti/dcase2022task6a
DCASE2022 Challenge Task6a: IRIT-UPS DCASE 2022 TASK6A SYSTEM: STOCHASTIC DECODING METHODS FOR AUDIO CAPTIONING
- Host: GitHub
- URL: https://github.com/labbeti/dcase2022task6a
- Owner: Labbeti
- License: mit
- Created: 2022-06-13T15:29:46.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2022-07-06T11:13:25.000Z (over 2 years ago)
- Last Synced: 2024-10-28T04:48:26.138Z (about 2 months ago)
- Language: Python
- Homepage: https://dcase.community/documents/challenge2022/technical_reports/DCASE2022_Labbe_87_t6a.pdf
- Size: 121 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
# IRIT-UPS DCASE 2022 TASK6A SYSTEM: STOCHASTIC DECODING METHODS FOR AUDIO CAPTIONING
Source code for Automated Audio Captioning experiments on the **Clotho** dataset for the DCASE2022 Task 6a challenge.
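
The stochastic decoding methods named in the title presumably correspond to the `pl.top_k`, `pl.top_p` and `pl.typical_p` options used in the commands of this README. As a rough illustration only (this is not the repository's implementation), the sketch below applies top-k, nucleus (top-p) and typical filtering to a toy probability vector in plain Python. The function names, the toy logits, and the reading of `pl.generator=1234` as a random seed are assumptions made for this example.

```python
import math
import random


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def _renormalize(probs, keep):
    """Zero out every index not in `keep` and rescale the rest to sum to 1."""
    kept = set(keep)
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]


def _take_until_mass(probs, order, p):
    """Walk indices in `order` and keep them until their mass reaches p."""
    keep, mass = [], 0.0
    for i in order:
        keep.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return keep


def top_k_filter(probs, k):
    """Keep the k most probable tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return _renormalize(probs, order[:k])


def top_p_filter(probs, p):
    """Keep the smallest set of most probable tokens whose mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return _renormalize(probs, _take_until_mass(probs, order, p))


def typical_filter(probs, p):
    """Keep tokens whose surprisal is closest to the entropy, up to mass p."""
    entropy = -sum(q * math.log(q) for q in probs if q > 0)
    candidates = [i for i in range(len(probs)) if probs[i] > 0]
    order = sorted(candidates, key=lambda i: abs(-math.log(probs[i]) - entropy))
    return _renormalize(probs, _take_until_mass(probs, order, p))


def sample(probs, rng):
    """Draw one token index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


rng = random.Random(1234)  # assumed analogue of pl.generator=1234
probs = softmax([2.0, 1.5, 0.3, 0.1, -1.0])  # toy 5-token vocabulary
for name, filtered in [
    ("top_k=4", top_k_filter(probs, 4)),
    ("top_p=0.3", top_p_filter(probs, 0.3)),
    ("typical_p=0.8", typical_filter(probs, 0.8)),
]:
    print(name, [round(q, 3) for q in filtered], "->", sample(filtered, rng))
```

In a captioning model this filtering would be applied to the next-word distribution at every decoding step. With `pl.beam_size=1`, as in the test commands, a single sequence is sampled rather than several beams being kept.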
## TLDR
**Installation with conda** :
```bash
git clone https://github.com/Labbeti/dcase2022task6a
cd dcase2022task6a
conda env create -n env_task6a -f environment_full.yaml
conda activate env_task6a
pip install -e aac_datasets --no-dependencies
pip install -e . --no-dependencies
```
**Reproduce results of the submission** :
```bash
# Download & prepare Clotho
python -m dcase2022task6a.prepare
# Train a model on Clotho
python -m dcase2022task6a.train pl.beam_size=9
# Select the path where the training has saved data
logdir="/absolute/path/to/train/logdir"
# Test decoding methods
python -m dcase2022task6a.train trainer=test resume=${logdir} pl.beam_size=1 pl.top_k=4 pl.generator=1234
python -m dcase2022task6a.train trainer=test resume=${logdir} pl.beam_size=1 pl.top_p=0.3 pl.generator=1234
python -m dcase2022task6a.train trainer=test resume=${logdir} pl.beam_size=1 pl.typical_p=0.8 pl.generator=1234
```

## Installation details
This repository contains an `environment_full.yaml` file and a `requirements.txt` file for installing dependencies via conda or pip.
The `environment_full.yaml` file describes the exact environment used for development.

### External Requirements
- **Java >= 1.8.0** to compute the **SPICE** metric (you can specify a path with the `path.java` option).
  - On Ubuntu: `sudo apt install default-jre`
- **unzip** to extract the JAR file from the SPICE zip file.

### Dataset and models preparation
You can install the datasets with `python -m dcase2022task6a.prepare`. The default root path is `./data/`, but you can change it with the option `data.root=/my/root`.
You can choose a dataset with the option `data=DATASET`.
This script also installs language models for NLTK, spaCy and LanguageTool to process captions, and downloads the pre-trained PANN models.

Example (download basic models and Clotho v2.1):
```shell
python -m dcase2022task6a.prepare data=clotho
```

Note: Clotho is fast to install (several minutes), but AudioCaps requires downloading and extracting YouTube audio with ffmpeg, which can take several days.
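
The external tools mentioned so far (Java and unzip for SPICE, ffmpeg for AudioCaps) can be checked from Python before starting a long preparation run. The following is a minimal sketch, not part of the repository: the helper name `find_missing_tools` is hypothetical, and it only looks for executables on `PATH`.

```python
import shutil


def find_missing_tools(tools):
    """Return the names in `tools` that are not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


# java and unzip are used for the SPICE metric; ffmpeg only for AudioCaps.
missing = find_missing_tools(["java", "unzip", "ffmpeg"])
if missing:
    print("Missing external tools:", ", ".join(missing))
else:
    print("All external tools found on PATH.")
```

A Java install that is not on `PATH` will be reported as missing here even though it can still be used through the `path.java` option.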
### Other installation mode with pip
You can also use the `requirements.txt` file to install the dependencies:
```bash
pip install git+https://github.com/Labbeti/dcase2022task6a
```

This installation is simpler and faster than with conda, but packages such as pytorch and torchaudio will be installed from pip instead of conda, so the results can differ.
The program can also be slower, because it does not benefit from conda optimizations such as numpy with MKL.

## Usage
### Example
```shell
python -m dcase2022task6a.train pl=cnn10_transformer data=clotho epochs=50
```
This trains the CNN10Transformer model on the Clotho dataset for 50 epochs.
Testing is run automatically at the end of training.

## External authors
- Qiuqiang Kong, Yin Cao, Turab Iqbal, Yuxuan Wang, Wenwu Wang, Mark D. Plumbley for **PANN models** (CNN10, CNN14...)
  - Qiuqiang Kong, Yin Cao, Turab Iqbal, Yuxuan Wang, Wenwu Wang, Mark D. Plumbley. "PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition." arXiv preprint arXiv:1912.10211 (2019).
  - [source code](https://github.com/qiuqiangkong/audioset_tagging_cnn)