{"id":15906771,"url":"https://github.com/labbeti/dcase2021task6","last_synced_at":"2025-04-02T23:26:20.948Z","repository":{"id":98318725,"uuid":"377414427","full_name":"Labbeti/dcase2021task6","owner":"Labbeti","description":"IRIT-UPS DCASE 2021 AUDIO CAPTIONING SYSTEM","archived":false,"fork":false,"pushed_at":"2021-07-05T10:16:28.000Z","size":81,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-02-08T13:42:56.530Z","etag":null,"topics":["audio-captioning","dcase","dcase2021","dcase2021task6","deep-learning","machine-learning"],"latest_commit_sha":null,"homepage":"http://dcase.community/documents/challenge2021/technical_reports/DCASE2021_Labbe_102_t6.pdf","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Labbeti.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-06-16T07:49:14.000Z","updated_at":"2021-09-16T09:19:52.000Z","dependencies_parsed_at":null,"dependency_job_id":"2cd9737f-bd33-4ac3-8021-27b58fe21fab","html_url":"https://github.com/Labbeti/dcase2021task6","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Labbeti%2Fdcase2021task6","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Labbeti%2Fdcase2021task6/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Labbeti%2Fdcase2021task6/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositori
es/Labbeti%2Fdcase2021task6/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Labbeti","download_url":"https://codeload.github.com/Labbeti/dcase2021task6/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246907935,"owners_count":20853163,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio-captioning","dcase","dcase2021","dcase2021task6","deep-learning","machine-learning"],"created_at":"2024-10-06T13:41:40.568Z","updated_at":"2025-04-02T23:26:20.924Z","avatar_url":"https://github.com/Labbeti.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Automated Audio Captioning (AAC)\n\nAutomated Audio Captioning training code on Clotho datasets with Listen-Attend-Spell and CNN-Tell models.\n\n## TLDR for DCASE 2021 Task 6 challenge\n```shell\ngit clone https://github.com/Labbeti/dcase2021task6\ncd dcase2021task6\nconda create -n env_aac python=3.9 pip\nconda activate env_aac\npip install -e .\ncd standalone\npython download.py\ncd ../slurm\n./dcase.sh\n```\n\n## Installation\n### Requirements\n- Anaconda \u003e= 4.8,\n- Java \u003e= 1.8.0 for the SPICE metric,\n- Python dependencies, which can be installed with setup.py (if you use only requirements.txt, you must run the shell script \"post_setup.sh\" manually).\n\n### Environment\n```shell\ngit clone https://github.com/Labbeti/dcase2021task6\ncd dcase2021task6\nconda create -n env_aac python=3.9 pip\nconda activate env_aac\npip install -e .\n```\n\nThis repository requires Java \u003e= 1.8.0 and Stanford-CoreNLP to compute the 'Cider' and 
'Spider' metrics.\nOn Ubuntu, Java can be installed with the following command:\n```shell\nsudo apt install default-jre\n```\n\n### Dataset and models installation\nYou can install the datasets with the script `standalone/download.py`. The default root path is `data`.\nYou can choose a dataset with the option `data=DATASET`.\nThis script also installs language models for NLTK, spaCy and LanguageTool to process captions, and a pre-trained \"Wavegram\" model from PANN.\n\nExample (download Clotho v2.1):\n```shell\npython download.py data=clotho\n```\n\n## Usage\n\n### DCASE2021 Task 6\nAfter installing the environment and the dataset, just run the script `dcase.sh`:\n```shell\ncd slurm\n./dcase.sh\n```\n\n### Other example\nRun the following in the `standalone` directory:\n```shell\npython train.py expt=lat data=clotho epochs=60\n```\nThis trains the Listen-Attend-Tell model on the Clotho dataset for 60 epochs.\nTesting runs automatically at the end of training, but it can be turned off with `test=false`.\n\n### Main options for `train.py`\nThis project uses Hydra to parse command-line parameters. 
The syntax is `param_name=VALUE` instead of `--param_name VALUE`.\n\n- expt=EXPERIMENT\n\t- lat (ListenAttendTell, a recurrent model based on Listen Attend Spell by Thomas Pellegrini)\n\t- cnnt (CNN-Tell, a convolutional recurrent model with a pre-trained encoder and the same decoder as LAT)\n\n### Result directory\nThe model and result data are saved in the `logs/Clotho/train_ListenAttendTell/{DATETIME}_{TAG}/` directory, where DATETIME is the start date of the process and TAG is the value of the `tag` option.\n\nThe results directory contains:\n- a `hydra` directory which stores the Hydra parameters,\n- a `checkpoint` directory which stores the best and the last models from training,\n- an `events.outs.tfevents.ID` file which contains the TensorBoard logs,\n- a `hparams.yaml` file which stores the experiment and model hyper-parameters,\n- a `metrics.yaml` file which stores the metric results computed by the Evaluator callback,\n- a list of `result_SUBSET.csv` files, one for each test dataset SUBSET, which store the output of the model for each sample,\n- a `vocabulary.json` file containing the ordered list of words used, with the frequency of each word in the training dataset(s).\n\n## External authors\n- Thomas Pellegrini for the Listen-Attend-Spell model\n\t- [source code](https://github.com/topel/listen-attend-tell)\n- Qiuqiang Kong, Yin Cao, Turab Iqbal, Yuxuan Wang, Wenwu Wang, Mark D. Plumbley for the Cnn14_DecisionLevelAtt model from PANN\n\t- [source code](https://github.com/qiuqiangkong/audioset_tagging_cnn)\n\t- Qiuqiang Kong, Yin Cao, Turab Iqbal, Yuxuan Wang, Wenwu Wang, Mark D. Plumbley. 
\"PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition.\" arXiv preprint arXiv:1912.10211 (2019).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flabbeti%2Fdcase2021task6","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flabbeti%2Fdcase2021task6","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flabbeti%2Fdcase2021task6/lists"}