{"id":28491445,"url":"https://github.com/line/lighthouse","last_synced_at":"2025-07-04T23:30:52.799Z","repository":{"id":249219662,"uuid":"822877471","full_name":"line/lighthouse","owner":"line","description":"[EMNLP2024 Demo], [ICASSP 2025] A user-friendly library for reproducible video moment retrieval and highlight detection. It also supports audio moment retrieval.","archived":false,"fork":false,"pushed_at":"2025-06-06T04:44:35.000Z","size":34955,"stargazers_count":147,"open_issues_count":4,"forks_count":10,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-06-06T05:28:03.727Z","etag":null,"topics":["audio","audio-moment-retrieval","audio-processing","computer-vision","highlight-detection","moment-retrieval","multimodal","natural-language-processing","video","video-moment-retrieval","video-processing"],"latest_commit_sha":null,"homepage":"https://www.arxiv.org/abs/2408.02901","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/line.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-07-02T02:28:34.000Z","updated_at":"2025-06-06T04:44:37.000Z","dependencies_parsed_at":"2024-08-17T11:26:05.481Z","dependency_job_id":"171fd372-a11f-46da-a71c-9ddb833cbbcf","html_url":"https://github.com/line/lighthouse","commit_stats":null,"previous_names":["line/lighthouse"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/line/lighthouse","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/line%2Flighthouse","tags_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/repositories/line%2Flighthouse/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/line%2Flighthouse/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/line%2Flighthouse/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/line","download_url":"https://codeload.github.com/line/lighthouse/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/line%2Flighthouse/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263635484,"owners_count":23492205,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio","audio-moment-retrieval","audio-processing","computer-vision","highlight-detection","moment-retrieval","multimodal","natural-language-processing","video","video-moment-retrieval","video-processing"],"created_at":"2025-06-08T08:07:19.047Z","updated_at":"2025-07-04T23:30:52.792Z","avatar_url":"https://github.com/line.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Lighthouse\n\n![Contributions welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg)\n[![License](https://img.shields.io/badge/License-Apache%202.0-brightgreen.svg)](https://opensource.org/licenses/Apache-2.0)\n[![Video moment retrieval demo](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/awkrail/lighthouse_demo)\n[![Audio moment retrieval 
demo](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/lighthouse-emnlp2024/AudioMomentRetrieval)\n[![Run pytest](https://github.com/line/lighthouse/actions/workflows/pytest.yml/badge.svg)](https://github.com/line/lighthouse/actions/workflows/pytest.yml)\n[![Run mypy and ruff](https://github.com/line/lighthouse/actions/workflows/mypy_ruff.yml/badge.svg)](https://github.com/line/lighthouse/actions/workflows/mypy_ruff.yml)\n\nLighthouse is a user-friendly library for reproducible video moment retrieval and highlight detection (MR-HD).\nIt supports seven models, four features (video and audio features), and six datasets for reproducible MR-HD, MR, and HD. In addition, we prepare an inference API and Gradio demo for developers to use state-of-the-art MR-HD approaches easily.\nFurthermore, Lighthouse supports [audio moment retrieval](https://h-munakata.github.io/Language-based-Audio-Moment-Retrieval/), a task to identify relevant moments from an audio input based on a given text query.\n\n## News\n- [2025/06/04] [Version 1.1](https://github.com/line/lighthouse/releases/tag/v1.1) has been released. It includes API changes, AMR gradio demo, and huggingface wrappers for the audio moment retrieval and clotho dataset.\n- [2024/12/24] Our work [\"Language-based audio moment retrieval\"](https://arxiv.org/abs/2409.15672) has been accepted at ICASSP 2025.\n- [2024/10/22] [Version 1.0](https://github.com/line/lighthouse/releases/tag/v1.0) has been released.\n- [2024/10/6] Our paper has been accepted at EMNLP2024, system demonstration track.\n- [2024/09/25] Our work [\"Language-based audio moment retrieval\"](https://arxiv.org/abs/2409.15672) has been released. Lighthouse supports AMR.\n- [2024/08/22] Our demo paper is available on arXiv. 
Any comments are welcome: [Lighthouse: A User-Friendly Library for Reproducible Video Moment Retrieval and Highlight Detection](https://www.arxiv.org/abs/2408.02901).\n\n## Installation\nInstall ffmpeg first. If you are an Ubuntu user, run:\n```\napt install ffmpeg\n```\nThen, install the PyTorch, torchvision, torchaudio, and torchtext versions that match your GPU environment.\nNote that the inference API also works in CPU-only environments. We tested the code on Python 3.9 and CUDA 11.8:\n```\npip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 torchtext==0.16.0 --index-url https://download.pytorch.org/whl/cu118\n```\nFinally, install Lighthouse and its dependencies:\n```\npip install 'git+https://github.com/line/lighthouse.git'\n```\n\n## Inference API (Available for both CPU/GPU mode)\nLighthouse supports the following inference API:\n```python\nimport torch\nfrom lighthouse.models import CGDETRPredictor\n\n# use GPU if available\ndevice = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n\n# slowfast_path is necessary if you use clip_slowfast features\nquery = 'A man is speaking in front of the camera'\nmodel = CGDETRPredictor('/path/to/weight.ckpt', device=device,\n                        feature_name='clip_slowfast', slowfast_path='SLOWFAST_8x8_R50.pkl')\n\n# encode video features\nvideo = model.encode_video('api_example/RoripwjYFp8_60.0_210.0.mp4')\n\n# moment retrieval \u0026 highlight detection\nprediction = model.predict(query, video)\nprint(prediction)\n\"\"\"\npred_relevant_windows: [[start, end, score], ...,]\npred_saliency_scores: [score, ...]\n\n{'query': 'A man is speaking in front of the camera',\n 'pred_relevant_windows': [[117.1296, 149.4698, 0.9993],\n                           [-0.1683, 5.4323, 0.9631],\n                           [13.3151, 23.42, 0.8129],\n                           ...],\n 'pred_saliency_scores': [-10.868017196655273,\n                          -12.097496032714844,\n                          -12.483806610107422,\n                    
      ...]}\n\"\"\"\n```\nLighthouse also supports the AMR inference API:\n```python\nimport torch\nfrom lighthouse.models import QDDETRPredictor\n\ndevice = \"cuda\" if torch.cuda.is_available() else \"cpu\"\nmodel = QDDETRPredictor('/path/to/weight.ckpt', device=device, feature_name='clap')\n\naudio = model.encode_audio('api_example/1a-ODBWMUAE.wav')\nquery = 'Water cascades down from a waterfall.'\nprediction = model.predict(query, audio)\nprint(prediction)\n```\nRun `python api_example/demo.py` (MR-HD) or `python api_example/amr_demo.py` (AMR) to reproduce the results. It automatically downloads pre-trained weights.\nIf you want to use other models, download [pre-trained weights](https://drive.google.com/file/d/1jxs_bvwttXTF9Lk3aKLohkqfYOonLyrO/view?usp=sharing). \nWhen using `clip_slowfast` features, it is necessary to download [slowfast pre-trained weights](https://dl.fbaipublicfiles.com/pyslowfast/model_zoo/kinetics400/SLOWFAST_8x8_R50.pkl).\nWhen using `clip_slowfast_pann` features, in addition to the slowfast weight, download [panns weights](https://zenodo.org/record/3987831/files/Cnn14_mAP%3D0.431.pth).\nRun `python api_example/amr_demo.py` to reproduce the AMR results.\n\n**Limitation**: The maximum video duration is **150s** due to the current benchmark datasets.\nFor CPU users, set `feature_name='clip'` because CLIP+Slowfast or CLIP+Slowfast+PANNs features are very slow without GPUs.\n\n## Gradio demo\nRun `python gradio_demo/demo.py`. Upload the video and input text query, and click the blue button. For AMR demo, run `python gradio_demo/amr_demo.py`. \n\nMR-HD demo\n![Gradio demo image](images/vmr_demo.png)\n\nAMR demo\n![Amr demo image](images/amr_demo.png)\n\n## Supported models, datasets, and features\n### Models\nMoment retrieval \u0026 highlight detection\n- [x] : [Moment-DETR (Lei et al. NeurIPS21)](https://arxiv.org/abs/2107.09609)\n- [x] : [QD-DETR (Moon et al. CVPR23)](https://arxiv.org/abs/2303.13874)\n- [x] : [EaTR (Jang et al. 
ICCV23)](https://arxiv.org/abs/2308.06947)\n- [x] : [CG-DETR (Moon et al. arXiv24)](https://arxiv.org/abs/2311.08835)\n- [x] : [UVCOM (Xiao et al. CVPR24)](https://arxiv.org/abs/2311.16464)\n- [x] : [TR-DETR (Sun et al. AAAI24)](https://arxiv.org/abs/2401.02309)\n- [x] : [TaskWeave (Jin et al. CVPR24)](https://arxiv.org/abs/2404.09263)\n- [ ] : [R2-Tuning (Liu et al. ECCV24)](https://arxiv.org/abs/2404.00801)\n\n### Datasets\nMoment retrieval \u0026 highlight detection\n- [x] : [QVHighlights (Lei et al. NeurIPS21)](https://arxiv.org/abs/2107.09609)\n- [x] : [QVHighlights w/ Audio Features (Lei et al. NeurIPS21)](https://arxiv.org/abs/2107.09609)\n- [x] : [QVHighlights ASR Pretraining (Lei et al. NeurIPS21)](https://arxiv.org/abs/2107.09609)\n\nMoment retrieval\n- [x] : [ActivityNet Captions (Krishna et al. ICCV17)](https://arxiv.org/abs/1705.00754)\n- [x] : [Charades-STA (Gao et al. ICCV17)](https://arxiv.org/abs/1705.02101)\n- [x] : [TaCoS (Regneri et al. TACL13)](https://aclanthology.org/Q13-1003/)\n\nHighlight detection\n- [x] : [TVSum (Song et al. CVPR15)](https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Song_TVSum_Summarizing_Web_2015_CVPR_paper.pdf)\n- [x] : [YouTube Highlights (Sun et al. ECCV14)](https://grail.cs.washington.edu/wp-content/uploads/2015/08/sun2014rdh.pdf)\n\nAudio moment retrieval\n- [x] : [Clotho Moment/TUT2017/UnAV100-subset (Munakata et al. 
arXiv24)](https://h-munakata.github.io/Language-based-Audio-Moment-Retrieval/)\n\n### Features\n- [x] : ResNet+GloVe\n- [x] : CLIP\n- [x] : CLIP+Slowfast\n- [x] : CLIP+Slowfast+PANNs (Audio) for QVHighlights\n- [x] : I3D+CLIP (Text) for TVSum\n\n## Reproduce the experiments\n\n### Pre-trained weights\nPre-trained weights can be downloaded from [here](https://drive.google.com/file/d/1jxs_bvwttXTF9Lk3aKLohkqfYOonLyrO/view?usp=sharing).\nDownload the archive and unzip it in the home directory.\n\n### Datasets\nDue to copyright restrictions, we distribute only feature files.\nDownload them and place them under the `./features` directory.\nTo extract features from videos, we use [HERO_Video_Feature_Extractor](https://github.com/linjieli222/HERO_Video_Feature_Extractor).\n\n- [QVHighlights](https://drive.google.com/file/d/1-ALnsXkA4csKh71sRndMwybxEDqa-dM4/view?usp=sharing)\n- [Charades-STA](https://drive.google.com/file/d/1EOeP2A4IMYdotbTlTqDbv5VdvEAgQJl8/view?usp=sharing)\n- [ActivityNet Captions](https://drive.google.com/file/d/1P2xS998XfbN5nSDeJLBF1m9AaVhipBva/view?usp=sharing)\n- [TACoS](https://drive.google.com/file/d/1rYzme9JNAk3niH1K81wgT13pOMn005jb/view?usp=sharing)\n- [TVSum](https://drive.google.com/file/d/1gSex1hpXLxHQu6zHyyQISKZjP7Ndt6U9/view?usp=sharing)\n- [YouTube Highlight](https://drive.google.com/file/d/12swoymGwuN5TlDlWBTo6UUWVm2DqVBpn/view?usp=sharing)\n\nFor [AMR](https://h-munakata.github.io/Language-based-Audio-Moment-Retrieval/), download the features from the following link.\n\n- [Clotho Moment/TUT2017/UnAV100-subset](https://zenodo.org/records/13806234)\n\nThe directory tree should look like this:\n```\nlighthouse/\n├── api_example\n├── configs\n├── data\n├── features # Download the features and place them here\n│   ├── ActivityNet\n│   │   ├── clip\n│   │   ├── clip_text\n│   │   ├── resnet\n│   │   └── slowfast\n│   ├── Charades\n│   │   ├── clip\n│   │   ├── clip_text\n│   │   ├── resnet\n│   │   └── slowfast\n│   ├── QVHighlight\n│   │   ├── clip\n│   │   ├── clip_text\n│   │  
 ├── pann\n│   │   ├── resnet\n│   │   └── slowfast\n│   ├── tacos\n│   │   ├── clip\n│   │   ├── clip_text\n│   │   ├── resnet\n│   │   └── slowfast\n│   ├── tvsum\n│   │   ├── clip\n│   │   ├── clip_text\n│   │   ├── i3d\n│   │   ├── resnet\n│   │   ├── slowfast\n│   ├── youtube_highlight\n│   │   ├── clip\n│   │   ├── clip_text\n│   │   └── slowfast\n│   └── clotho-moments\n│       ├── clap\n│       └── clap_text\n├── gradio_demo\n├── images\n├── lighthouse\n├── results # The pre-trained weights are saved in this directory\n└── training\n```\n\n### Training and evaluation\n\n#### Training\nThe training command is:\n```\npython training/train.py --model MODEL --dataset DATASET --feature FEATURE [--resume RESUME] [--domain DOMAIN]\n```\n|         | Options                                                                                                  |\n|---------|----------------------------------------------------------------------------------------------------------|\n| Model   | moment_detr, qd_detr, eatr, cg_detr, uvcom, tr_detr, taskweave_mr2hd, taskweave_hd2mr                    |\n| Feature | resnet_glove, clip, clip_slowfast, clip_slowfast_pann, i3d_clip, clap                                    |\n| Dataset | qvhighlight, qvhighlight_pretrain, activitynet, charades, tacos, tvsum, youtube_highlight, clotho-moment |\n\n(**Example 1**) Moment DETR w/ CLIP+Slowfast on QVHighlights:\n```\npython training/train.py --model moment_detr --dataset qvhighlight --feature clip_slowfast\n```\n(**Example 2**) Moment DETR w/ CLIP+Slowfast+PANNs (Audio) on QVHighlights:\n```\npython training/train.py --model moment_detr --dataset qvhighlight --feature clip_slowfast_pann\n```\n(**Pre-train \u0026 Fine-tuning, QVHighlights only**) Lighthouse supports pre-training. 
Run:\n```\npython training/train.py --model moment_detr --dataset qvhighlight_pretrain --feature clip_slowfast\n```\nThen fine-tune the model with `--resume` option:\n```\npython training/train.py --model moment_detr --dataset qvhighlight --feature clip_slowfast --resume results/moment_detr/qvhighlight_pretrain/clip_slowfast/best.ckpt\n```\n(**TVSum and YouTube Highlight**) To train models on these two datasets, you need to specify domain:\n```\npython training/train.py --model moment_detr --dataset tvsum --feature clip_slowfast --domain BK\n```\n\n#### Evaluation\nThe evaluation command is:\n```\npython training/evaluate.py --model MODEL --dataset DATASET --feature FEATURE --split {val,test} --model_path MODEL_PATH --eval_path EVAL_PATH [--domain DOMAIN]\n```\n(**Example 1**) Evaluating Moment DETR w/ CLIP+Slowfast on the QVHighlights val set:\n```\npython training/evaluate.py --model moment_detr --dataset qvhighlight --feature clip_slowfast --split val --model_path results/moment_detr/qvhighlight/clip_slowfast/best.ckpt --eval_path data/qvhighlight/highlight_val_release.jsonl\n```\nTo generate submission files for QVHighlight test sets, change split into test (**QVHighlights only**):\n```\npython training/evaluate.py --model moment_detr --dataset qvhighlight --feature clip_slowfast --split test --model_path results/moment_detr/qvhighlight/clip_slowfast/best.ckpt --eval_path data/qvhighlight/highlight_test_release.jsonl\n```\nThen zip `hl_val_submission.jsonl` and `hl_test_submission.jsonl`, and submit it to the [Codalab](https://codalab.lisn.upsaclay.fr/competitions/6937) (**QVHighlights only**):\n```\nzip -r submission.zip val_submission.jsonl test_submission.jsonl\n```\n\n## HuggingFace Wrapper\nWe support [wrappers for HuggingFace](https://huggingface.co/lighthouse-emnlp2024).\nYou can easily use models and dataset via `AutoModel` and `huggingface_hub`.\n\nThe following models and datasets are provided by the wrapper for HuggingFace.\n### Models\n  - [Audio 
Moment DETR (Munakata et al. ICASSP 2025)](https://huggingface.co/lighthouse-emnlp2024/AM-DETR)\n### Datasets\n  - [Clotho Moment (Munakata et al. ICASSP 2025)](https://huggingface.co/datasets/lighthouse-emnlp2024/Clotho-Moment)\n\n## Citation\nLighthouse\n```bibtex\n@InProceedings{taichi2024emnlp,\n  author    = {Taichi Nishimura and Shota Nakada and Hokuto Munakata and Tatsuya Komatsu},\n  title     = {Lighthouse: A User-Friendly Library for Reproducible Video Moment Retrieval and Highlight Detection},\n  booktitle = {Proceedings of The 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},\n  year      = {2024},\n}\n```\nAudio moment retrieval\n```bibtex\n@InProceedings{hokuto2025icassp,\n  author    = {Hokuto Munakata and Taichi Nishimura and Shota Nakada and Tatsuya Komatsu},\n  title     = {Language-based Audio Moment Retrieval},\n  booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing},\n  year      = {2025},\n}\n```\n\n## Contributing\nPull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.\n\n## LICENSE\nApache License 2.0\n\n## Contact\nTaichi Nishimura ([taichitary@gmail.com](mailto:taichitary@gmail.com))\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fline%2Flighthouse","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fline%2Flighthouse","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fline%2Flighthouse/lists"}