# DSNet: A Flexible Detect-to-Summarize Network for Video Summarization [[paper]](https://ieeexplore.ieee.org/document/9275314)

[![UnitTest](https://github.com/li-plus/DSNet/workflows/UnitTest/badge.svg)](https://github.com/li-plus/DSNet/actions)
[![License: MIT](https://img.shields.io/badge/license-MIT-blue)](https://github.com/li-plus/DSNet/blob/main/LICENSE)

![framework](docs/framework.jpg)

A PyTorch implementation of our paper [DSNet: A Flexible Detect-to-Summarize Network for Video Summarization](https://ieeexplore.ieee.org/document/9275314) by [Wencheng Zhu](https://woshiwencheng.github.io/), [Jiwen Lu](http://ivg.au.tsinghua.edu.cn/Jiwen_Lu/), [Jiahao Li](https://github.com/li-plus), and [Jie Zhou](http://www.au.tsinghua.edu.cn/info/1078/1635.htm). Published in [IEEE Transactions on Image Processing](https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=83).

## Getting Started

This project was developed on Ubuntu 16.04 with CUDA 9.0.176.

First, clone this project to your local environment.

```sh
git clone https://github.com/li-plus/DSNet.git
```

Create a virtual environment with Python 3.6, preferably using [Anaconda](https://www.anaconda.com/).

```sh
conda create --name dsnet python=3.6
conda activate dsnet
```

Install the Python dependencies.

```sh
pip install -r requirements.txt
```

## Datasets Preparation

Download the pre-processed [TVSum](https://github.com/yalesong/tvsum), [SumMe](https://gyglim.github.io/me/vsum/index.html), [OVP](https://sites.google.com/site/vsummsite/download), and [YouTube](https://sites.google.com/site/vsummsite/download) datasets into the `datasets/` folder.

```sh
mkdir -p datasets/ && cd datasets/
wget https://www.dropbox.com/s/tdknvkpz1jp6iuz/dsnet_datasets.zip
unzip dsnet_datasets.zip
```

If the Dropbox link is unavailable to you, try one of the links below.

+ (Baidu Cloud) Link: https://pan.baidu.com/s/1LUK2aZzLvgNwbK07BUAQRQ Extraction Code: x09b
+ (Google Drive) https://drive.google.com/file/d/11ulsvk1MZI7iDqymw9cfL7csAYS0cDYH/view?usp=sharing

The dataset directory should now look like this:

```
DSNet
└── datasets/
    ├── eccv16_dataset_ovp_google_pool5.h5
    ├── eccv16_dataset_summe_google_pool5.h5
    ├── eccv16_dataset_tvsum_google_pool5.h5
    ├── eccv16_dataset_youtube_google_pool5.h5
    └── readme.txt
```

## Pre-trained Models

Our pre-trained models are available online. You may download them for evaluation, or skip this section and train new models from scratch.

```sh
mkdir -p models && cd models
# anchor-based model
wget https://www.dropbox.com/s/0jwn4c1ccjjysrz/pretrain_ab_basic.zip
unzip pretrain_ab_basic.zip
# anchor-free model
wget https://www.dropbox.com/s/2hjngmb0f97nxj0/pretrain_af_basic.zip
unzip pretrain_af_basic.zip
```

To evaluate our pre-trained models, run:

```sh
# evaluate anchor-based model
python evaluate.py anchor-based --model-dir ../models/pretrain_ab_basic/ --splits ../splits/tvsum.yml ../splits/summe.yml
# evaluate anchor-free model
python evaluate.py anchor-free --model-dir ../models/pretrain_af_basic/ --splits ../splits/tvsum.yml ../splits/summe.yml --nms-thresh 0.4
```

If everything works, you should get F-scores similar to the following.

|              | TVSum | SumMe |
| ------------ | ----- | ----- |
| Anchor-based | 62.05 | 50.19 |
| Anchor-free  | 61.86 | 51.18 |

## Training

### Anchor-based

To train the anchor-based attention model on the TVSum and SumMe datasets with the canonical settings, run

```sh
python train.py anchor-based --model-dir ../models/ab_basic --splits ../splits/tvsum.yml ../splits/summe.yml
```

To train on the augmented and transfer datasets, run

```sh
python train.py anchor-based --model-dir ../models/ab_tvsum_aug/ --splits ../splits/tvsum_aug.yml
python train.py anchor-based --model-dir ../models/ab_summe_aug/ --splits ../splits/summe_aug.yml
python train.py anchor-based --model-dir ../models/ab_tvsum_trans/ --splits ../splits/tvsum_trans.yml
python train.py anchor-based --model-dir ../models/ab_summe_trans/ --splits ../splits/summe_trans.yml
```

To train with an LSTM, Bi-LSTM, or GCN feature extractor, set the `--base-model` argument to `lstm`, `bilstm`, or `gcn`. For example,

```sh
python train.py anchor-based --model-dir ../models/ab_basic --splits ../splits/tvsum.yml ../splits/summe.yml --base-model lstm
```

### Anchor-free

As with the anchor-based models, to train on the canonical TVSum and SumMe datasets, run

```sh
python train.py anchor-free --model-dir ../models/af_basic --splits ../splits/tvsum.yml ../splits/summe.yml --nms-thresh 0.4
```

Note that the NMS threshold is set to 0.4 for anchor-free models.

## Evaluation

To evaluate your anchor-based models, run

```sh
python evaluate.py anchor-based --model-dir ../models/ab_basic/ --splits ../splits/tvsum.yml ../splits/summe.yml
```

For anchor-free models, remember to set the NMS threshold to 0.4.

```sh
python evaluate.py anchor-free --model-dir ../models/af_basic/ --splits ../splits/tvsum.yml ../splits/summe.yml --nms-thresh 0.4
```

## Generating Shots with KTS

Based on the public datasets provided by [DR-DSN](https://github.com/KaiyangZhou/pytorch-vsumm-reinforce), we apply the [KTS](https://github.com/pathak22/videoseg/tree/master/lib/kts) algorithm to generate video shots for the OVP and YouTube datasets. Note that the pre-processed datasets already contain these video shots. To regenerate them, run

```sh
python make_shots.py --dataset ../datasets/eccv16_dataset_ovp_google_pool5.h5
python make_shots.py --dataset ../datasets/eccv16_dataset_youtube_google_pool5.h5
```

## Using Custom Videos

### Training & Validation

We provide scripts to pre-process custom video data, such as the raw videos in the `custom_data` folder.

First, create an h5 dataset. Here, `--video-dir` is a directory containing several MP4 videos, and `--label-dir` is a directory containing the ground-truth user summaries for each video. The user summary of a video is a U × N binary matrix, where U denotes the number of annotators and N denotes the number of frames in the original video.

```sh
python make_dataset.py --video-dir ../custom_data/videos --label-dir ../custom_data/labels \
  --save-path ../custom_data/custom_dataset.h5 --sample-rate 15
```

Then split the dataset into training and validation sets, and generate a split file that indexes them.

```sh
python make_split.py --dataset ../custom_data/custom_dataset.h5 \
  --train-ratio 0.67 --save-path ../custom_data/custom.yml
```

Now you can train and evaluate on your custom videos using the split file.

```sh
python train.py anchor-based --model-dir ../models/custom --splits ../custom_data/custom.yml
python evaluate.py anchor-based --model-dir ../models/custom --splits ../custom_data/custom.yml
```

### Inference

To predict the summary of a raw video, use `infer.py`. For example, run

```sh
python infer.py anchor-based --ckpt-path ../models/custom/checkpoint/custom.yml.0.pt \
  --source ../custom_data/videos/EE-bNr36nyA.mp4 --save-path ./output.mp4
```

## Acknowledgments

We gratefully thank the following open-source repositories, which greatly helped our research.

+ Thanks to [KTS](https://github.com/pathak22/videoseg/tree/master/lib/kts) for the effective shot generation algorithm.
+ Thanks to [DR-DSN](https://github.com/KaiyangZhou/pytorch-vsumm-reinforce) for the pre-processed public datasets.
+ Thanks to [VASNet](https://github.com/ok1zjf/VASNet) for the training and evaluation pipeline.

## Citation

If you find our code or paper helpful, please consider citing it.

```
@article{zhu2020dsnet,
  title={DSNet: A Flexible Detect-to-Summarize Network for Video Summarization},
  author={Zhu, Wencheng and Lu, Jiwen and Li, Jiahao and Zhou, Jie},
  journal={IEEE Transactions on Image Processing},
  volume={30},
  pages={948--962},
  year={2020}
}
```
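The anchor-free training and evaluation commands pass `--nms-thresh 0.4`. To illustrate what a non-maximum suppression (NMS) threshold does to temporal segment proposals, here is a minimal, generic 1-D NMS sketch in plain Python. It assumes each proposal is a `(start, end)` frame interval with one confidence score, and that overlap is measured by temporal IoU; the function names `temporal_iou` and `temporal_nms` are hypothetical, and this is not the repository's own implementation.

```python
# Generic 1-D (temporal) non-maximum suppression sketch.
# Assumption: proposals are [start, end) frame intervals with one score each.
# Illustration only; not DSNet's own code.
from typing import List, Tuple

Segment = Tuple[int, int]


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection-over-union of two [start, end) intervals."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def temporal_nms(segments: List[Segment], scores: List[float],
                 iou_thresh: float) -> List[int]:
    """Return indices of the kept segments, highest score first."""
    # Visit proposals from the most to the least confident.
    order = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a proposal only if it overlaps no already-kept one too much.
        if all(temporal_iou(segments[i], segments[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    segs = [(0, 10), (2, 12), (20, 30)]
    conf = [0.9, 0.8, 0.7]
    # (0, 10) and (2, 12) have IoU 8/12 > 0.4, so the weaker one is dropped.
    print(temporal_nms(segs, conf, iou_thresh=0.4))  # [0, 2]
```

A lower threshold suppresses more overlapping proposals and yields sparser, less redundant summaries; a higher one keeps more near-duplicate segments.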