{"id":47888509,"url":"https://github.com/awkrail/dcase2026_task6_baseline","last_synced_at":"2026-04-04T02:26:02.152Z","repository":{"id":348481172,"uuid":"1166571715","full_name":"awkrail/dcase2026_task6_baseline","owner":"awkrail","description":"DETR-based baseline for DCASE 2026 challenge task 6.","archived":false,"fork":false,"pushed_at":"2026-04-01T09:37:58.000Z","size":271815,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-01T11:38:22.559Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/awkrail.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-25T11:15:34.000Z","updated_at":"2026-03-31T21:04:59.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/awkrail/dcase2026_task6_baseline","commit_stats":null,"previous_names":["awkrail/dcase2026_task6_baseline"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/awkrail/dcase2026_task6_baseline","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/awkrail%2Fdcase2026_task6_baseline","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/awkrail%2Fdcase2026_task6_baseline/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/awkrail%2Fdcase2026_task6_baseline/releases","manifests_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/awkrail%2Fdcase2026_task6_baseline/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/awkrail","download_url":"https://codeload.github.com/awkrail/dcase2026_task6_baseline/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/awkrail%2Fdcase2026_task6_baseline/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31385248,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-04T01:22:39.193Z","status":"online","status_checked_at":"2026-04-04T02:00:07.569Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-04-04T02:25:58.682Z","updated_at":"2026-04-04T02:26:02.136Z","avatar_url":"https://github.com/awkrail.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# dcase2026_task6_baseline\n[QD-DETR](https://github.com/wjun0830/QD-DETR)-based baseline for DCASE 2026 challenge task 6.\n\n## Model architecture\nThe model is based on QD-DETR, a Transformer-based encoder-decoder architecture. 
An overview of the architecture is shown in Figure 2 of the [paper](https://arxiv.org/pdf/2303.13874).\nGiven an audio and text pair, [CLAP](https://github.com/microsoft/CLAP) encodes them into audio and text features, respectively.\nThese features are then passed through the cross-attention Transformer layers, followed by the Transformer decoder.\nFinally, the model outputs multiple candidate moments with start/end timestamps and confidence scores.\n\n## Getting started\n0. Clone this repository\n```\ngit clone https://github.com/awkrail/dcase2026_task6_baseline.git\n```\n1. Install PyTorch \u0026 dependency libraries\nInstall PyTorch, torchvision, and torchaudio according to your GPU environment. Note that the inference API is available in CPU environments. We tested the code on Python 3.9 and CUDA 11.8:\n```\npip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118\npip install -r requirements.txt\n```\n2. Prepare feature files\nDownload the [CASTELLA dataset](https://zenodo.org/records/17412176) and the [Clotho-Moment dataset](https://zenodo.org/records/17129257).\n```\nwget https://zenodo.org/records/17412176/files/features.tar.gz\ntar -zxvf features.tar.gz\n```\n\n```\nwget https://zenodo.org/api/records/17129257/files-archive\ncat clotho-moment_features.tar.part-* \u003e clotho-moment_features.tar\ntar -xvf clotho-moment_features.tar\n```\n\nThese feature files are also available on Hugging Face.\n- [CASTELLA dataset](https://huggingface.co/datasets/lighthouse-emnlp2024/CASTELLA_CLAP_features)\n- [Clotho-Moment dataset](https://huggingface.co/datasets/lighthouse-emnlp2024/Clotho-Moment_CLAP_features)\n\n\n## Training and evaluation\n0. Train a model\n```\npython src/train.py --config config.yml  \n```\n- `config.yml` is for CASTELLA. To train a model on Clotho-Moment, use `config-pretraining.yml`\n- To start from pre-trained model weights, pass `--resume ./**/{checkpoint}.pth`\n\n\n1. 
Evaluation\nReproduce the evaluation on the `val` set.\n```\npython src/evaluate.py --config config.yml --model_path results/best_checkpoint.pth\n```\nThe result is:\n```\n2026-03-30 01:14:08.441:INFO:__main__ - Setup config, data and model...\n2026-03-30 01:14:08.442:INFO:__main__ - setup model/optimizer/scheduler\n2026-03-30 01:14:08.885:INFO:__main__ - CUDA enabled.\n2026-03-30 01:14:09.264:INFO:__main__ - Model checkpoint: results/best_checkpoint.pth\n2026-03-30 01:14:09.264:INFO:__main__ - Starting inference...\n2026-03-30 01:14:09.264:INFO:__main__ - Generate submissions\ncompute st ed scores: 100%|███████████████████████████████████████████████████| 4/4 [00:01\u003c00:00,  2.93it/s]\nconvert to multiples of clip_length=1: 100%|███████████████████████████| 352/352 [00:00\u003c00:00, 28908.68it/s]\n2026-03-30 01:14:10.652:INFO:__main__ - Saving/Evaluating before nms results\nfull: [0, 1500], 352/352=100.00 examples.\n[eval_moment_retrieval] [full] 0.12 seconds\n2026-03-30 01:14:10.795:INFO:__main__ - metrics_no_nms OrderedDict([   ('MR-full-R1@0.5', 27.56),\n                ('MR-full-R1@0.7', 16.19),\n                ('MR-full-mAP', 11.44),\n                ('MR-full-mAP@0.5', 24.02),\n                ('MR-full-mAP@0.75', 10.26)])\n```\n\nReproduce the evaluation on the `test` set:\n```\npython src/evaluate.py --config config.yml --split test --model_path results/best_checkpoint.pth\n```\nThe result is:\n```\n2026-03-30 01:14:48.156:INFO:__main__ - Setup config, data and model...\n2026-03-30 01:14:48.160:INFO:__main__ - setup model/optimizer/scheduler\n2026-03-30 01:14:48.599:INFO:__main__ - CUDA enabled.\n2026-03-30 01:14:48.986:INFO:__main__ - Model checkpoint: results/best_checkpoint.pth\n2026-03-30 01:14:48.986:INFO:__main__ - Starting inference...\n2026-03-30 01:14:48.986:INFO:__main__ - Generate submissions\ncompute st ed scores: 100%|█████████████████████████████████████████████████| 14/14 [00:02\u003c00:00,  5.44it/s]\nconvert to multiples of 
clip_length=1: 100%|█████████████████████████| 1347/1347 [00:00\u003c00:00, 28259.52it/s]\n2026-03-30 01:14:51.617:INFO:__main__ - Saving/Evaluating before nms results\nfull: [0, 1500], 1347/1347=100.00 examples.\n[eval_moment_retrieval] [full] 0.24 seconds\n2026-03-30 01:14:51.886:INFO:__main__ - metrics_no_nms OrderedDict([   ('MR-full-R1@0.5', 23.16),\n                ('MR-full-R1@0.7', 10.32),\n                ('MR-full-mAP', 9.11),\n                ('MR-full-mAP@0.5', 20.34),\n                ('MR-full-mAP@0.75', 6.96)])\n```\n\n## Preparation for submission.jsonl\nRun the following command to create the submission file. (Evaluation data for the submission will be publicly available on June 1, and the script will work after that.)\n```\npython src/create_submission.py --config config.yml --model_path results/best_checkpoint.pth\n```\nThe `private_submission.jsonl` file is written to the `results` directory. For details, please read [this README.md](src/standalone_eval/README.md)\n\n## Statistics of scores\nScores may vary slightly due to different random seeds or minor differences in library versions. 
We conducted five training runs, and the resulting scores on the CASTELLA `test` set (mean ± standard deviation) are as follows:\n- Only CASTELLA\n  - R1@0.5    : 22.74±0.77\n  - R1@0.7    : 10.17±0.86\n  - mAP (avg)  : 10.49±0.53\n  - mAP@0.5   : 21.93±0.57\n  - mAP@0.75  : 8.85±0.58\n- Clotho-Moment pre-training \u0026 CASTELLA fine-tuning\n  - R1@0.5    : 25.86±0.74\n  - R1@0.7    : 13.85±1.47\n  - mAP (avg)  : 11.74±0.39\n  - mAP@0.5   : 23.14±0.33\n  - mAP@0.75  : 10.54±0.54\n\n## Note\n- This recipe includes minor changes from the original paper to improve performance:\n  - Training extended from 100 to 200 epochs\n  - Window sampling controlled by `max_windows` to stabilize training\n\n## Citation\nIf you find this code useful for your research, please cite the original paper:\n```\n@inproceedings{munakata2025audiomoment,\n  author = {Munakata, Hokuto and Nishimura, Taichi and Nakada, Shota and Komatsu, Tatsuya},\n  title = {Language-based Audio Moment Retrieval},\n  booktitle = {Proc. ICASSP},\n  year = {2025},\n  pages = {1-5},\n  _pdf = {https://arxiv.org/pdf/2409.15672}\n}\n```\nQD-DETR citation:\n```\n@inproceedings{qddetr,\n    author = {WonJun Moon and Sangeek Hyun and SangUk Park and Dongchan Park and Jae-Pil Heo},\n    title = {Query-Dependent Video Representation for Moment Retrieval and Highlight Detection},\n    booktitle = {Proc. CVPR},\n    year = {2023},\n}\n```\n\n## Others\nThis code is based on [lighthouse](https://github.com/line/lighthouse).\n\n\n## Contact\ntaichitary@gmail.com\n\nhokuto.munakata@lycorp.co.jp\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fawkrail%2Fdcase2026_task6_baseline","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fawkrail%2Fdcase2026_task6_baseline","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fawkrail%2Fdcase2026_task6_baseline/lists"}