{"id":13430485,"url":"https://github.com/med-air/Endo-FM","last_synced_at":"2025-03-16T05:31:22.421Z","repository":{"id":173251051,"uuid":"649606676","full_name":"med-air/Endo-FM","owner":"med-air","description":"[MICCAI'23] Foundation Model for Endoscopy Video Analysis via Large-scale Self-supervised Pre-train","archived":false,"fork":false,"pushed_at":"2024-03-27T16:28:27.000Z","size":40083,"stargazers_count":158,"open_issues_count":8,"forks_count":15,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-10-28T09:58:36.035Z","etag":null,"topics":["endoscopy","foundation-model","large-scale","miccai2023","pre-train","self-supervised","video"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/med-air.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-06-05T08:45:47.000Z","updated_at":"2024-10-26T14:26:21.000Z","dependencies_parsed_at":null,"dependency_job_id":"653417f6-8c5b-4639-ab5a-39ac9d67b49d","html_url":"https://github.com/med-air/Endo-FM","commit_stats":null,"previous_names":["med-air/endo-fm"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/med-air%2FEndo-FM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/med-air%2FEndo-FM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/med-air%2FEndo-FM/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/med-air%2FEndo-FM/manifests","owner_url":"https://repos.ecosyste.m
s/api/v1/hosts/GitHub/owners/med-air","download_url":"https://codeload.github.com/med-air/Endo-FM/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243830912,"owners_count":20354848,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["endoscopy","foundation-model","large-scale","miccai2023","pre-train","self-supervised","video"],"created_at":"2024-07-31T02:00:54.486Z","updated_at":"2025-03-16T05:31:17.399Z","avatar_url":"https://github.com/med-air.png","language":"Python","funding_links":[],"categories":["3 Reasoning Tasks","Large-scale Datasets in Biomedical and Health Informatics"],"sub_categories":["3.9 Other Tasks and Applications","Open Source"],"readme":"# Foundation Model for Endoscopy Video Analysis\n\u003c!-- select Model and/or Data and/or Code as needed\u003e\n### Welcome to OpenMEDLab! 👋\n\n\u003c!--\n**Here are some ideas to get you started:**\n🙋‍♀️ A short introduction - what is your organization all about?\n🌈 Contribution guidelines - how can the community get involved?\n👩‍💻 Useful resources - where can the community find your docs? 
Is there anything else the community should know?\n🍿 Fun facts - what does your team eat for breakfast?\n🧙 Remember, you can do mighty things with the power of [Markdown](https://docs.github.com/github/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax)\n--\u003e\n\n[//]: # 
(Updated on 2023.06.09)\n\n\n\nThis repository provides the official PyTorch implementation of the paper [**Foundation Model for Endoscopy Video Analysis via Large-scale Self-supervised Pre-train**](https://arxiv.org/abs/2306.16741)\nby [Zhao Wang](https://kyfafyd.wang)\\*, [Chang Liu](https://scholar.google.com/citations?user=q2JSP3kAAAAJ)\\*, [Shaoting Zhang](http://www.qingyuan.sjtu.edu.cn/a/Shaoting-Zhang.html)†, and [Qi Dou](http://www.cse.cuhk.edu.hk/~qdou)†.\n\n\u003cdiv align=\"center\"\u003e\n    \u003ca href=\"https://\"\u003e\u003cimg width=\"800px\" height=\"auto\" src=\"assets/framework.png\"\u003e\u003c/a\u003e\n\u003c/div\u003e\n\n## Key Features\n\n- First foundation model for endoscopy video analysis.\n- A large-scale endoscopic video dataset with over 33K video clips.\n- Supports 3 types of downstream tasks: classification, segmentation, and detection.\n\n## Links\n\n- [Paper](https://arxiv.org/abs/2306.16741)\n- [Model](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155167044_link_cuhk_edu_hk/EZh5mWE5CL1BpaJ1bXuokfYBDM2VaMknqG7YpaQBRgAvdQ?e=e2rVYW)\n- [OpenMEDLab Page](https://github.com/openmedlab/Endo-FM)\n\n## Details\n\n\u003e Foundation models have exhibited remarkable success in various applications, such as disease diagnosis and text report generation. To date, a foundation model for endoscopic video analysis is still lacking. 
In this paper, we propose Endo-FM, a foundation model specifically developed using massive endoscopic video data. First, we build a video transformer, which captures both local and global long-range dependencies across spatial and temporal dimensions. Second, we pre-train our transformer model using global and local views via a self-supervised manner, aiming to make it robust to spatial-temporal variations and discriminative across different scenes. To develop the foundation model, we construct a large-scale endoscopy video dataset by combining 9 publicly available datasets and a privately collected dataset from Baoshan Branch of Renji Hospital in Shanghai, China. Our dataset overall consists of over 33K video clips with up to 5 million frames, encompassing various protocols, target organs, and disease types. Our pre-trained Endo-FM can be easily adopted for a given downstream task via fine-tuning by serving as the backbone. With experiments on 3 different types of downstream tasks, including classification, segmentation, and detection, our Endo-FM surpasses the current state-of-the-art self-supervised pre-training and adapter-based transfer learning methods by a significant margin.\n\n\n## Datasets\n\n\u003cdiv align=\"center\"\u003e\n    \u003ca href=\"https://\"\u003e\u003cimg width=\"800px\" height=\"auto\" src=\"assets/dataset_details.png\"\u003e\u003c/a\u003e\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n    \u003ca href=\"https://\"\u003e\u003cimg width=\"800px\" height=\"auto\" src=\"assets/dataset_visualization.png\"\u003e\u003c/a\u003e\n\u003c/div\u003e\n\nWe use 6 public datasets and 1 private dataset for pre-training, and 3 public datasets for the downstream tasks.\nExcept for SUN \u0026 SUN-SEG, we provide our preprocessed data for pre-training and downstream tasks.\n\n#### Pre-training Data (6 public + 1 private)\n- Colonoscopic [[original 
paper]](https://ieeexplore.ieee.org/abstract/document/7442848) [[original dataset]](http://www.depeca.uah.es/colonoscopy_dataset/) [[our preprocessed dataset]](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155167044_link_cuhk_edu_hk/ES_hCHb2XWFJgsK4hrKUnNUBx3fl6QI3yyk9ImP4AkkRVw?e=LC4DU5)\n- SUN \u0026 SUN-SEG [[original paper1]](https://www.sciencedirect.com/science/article/pii/S0016510720346551) [[original paper2]](https://link.springer.com/article/10.1007/s11633-022-1371-y) [[original dataset1]](http://amed8k.sundatabase.org/) [[original dataset2]](https://github.com/GewelsJI/VPS/blob/main/docs/DATA_PREPARATION.md)\n- LDPolypVideo [[original paper]](https://link.springer.com/chapter/10.1007/978-3-030-87240-3_37) [[original dataset]](https://github.com/dashishi/LDPolypVideo-Benchmark) [[our preprocessed dataset]](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155167044_link_cuhk_edu_hk/ERTYntGNWfZKj8FVjzsK0QEB6W6KoiuiP89Y3on1PJBAmg?e=P24jjG)\n- Hyper-Kvasir [[original paper]](https://www.nature.com/articles/s41597-020-00622-y) [[original dataset]](https://datasets.simula.no/hyper-kvasir/) [[our preprocessed dataset]](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155167044_link_cuhk_edu_hk/EeHnnUmGbmBGlw7UlNVvw2wBzBMzKi8Sus5LrdwrQi-XUA?e=gWr5qH)\n- Kvasir-Capsule [[original paper]](https://www.nature.com/articles/s41597-021-00920-z) [[original dataset]](https://datasets.simula.no/kvasir-capsule/) [[our preprocessed dataset]](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155167044_link_cuhk_edu_hk/EQhyk3_yz5pAtdpKVFU93S0BfPfTNpblPFXTHaW-BIjV-Q?e=9duP5z)\n- CholecTriplet [[original paper]](https://www.sciencedirect.com/science/article/pii/S1361841522000846) [[original dataset]](https://cholectriplet2021.grand-challenge.org/) [[our preprocessed dataset]](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155167044_link_cuhk_edu_hk/Ea6g5KpHaJNLvYFqoZpHeroBS801guoB16X18F4GfEG4pw?e=SWHoyQ)\n- Our Private [[our preprocessed 
dataset]](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155167044_link_cuhk_edu_hk/EZ2Vs0zU-L1Go8RITgs42b4BjlWy6UtGXh6AHmBGD_gGFw?e=SRiD7m)\n\n#### Downstream Data (3 public)\n- PolypDiag [[original paper]](https://link.springer.com/chapter/10.1007/978-3-031-16437-8_9) [[original dataset]](https://github.com/tianyu0207/weakly-polyp) [[our preprocessed dataset]](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155167044_link_cuhk_edu_hk/Ed_RCZ86IktKkGNNL5qX9IsBvNa7wcyM8q4yBQBkzaBj8g?e=pvuZVt)\n- CVC-12k [[original paper]](https://www.sciencedirect.com/science/article/pii/S0895611115000567) [[original dataset]](https://polyp.grand-challenge.org/Databases/) [[our preprocessed dataset]](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155167044_link_cuhk_edu_hk/EQzj78YsrVZAtbNVHW7WPEEBX1AeolLI7gmBkg-iEg1lQg?e=0gQPzy)\n- KUMC [[original paper]](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0255809) [[original dataset]](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/FCBUOR) [[our preprocessed dataset]](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155167044_link_cuhk_edu_hk/EQHKl1-MgA5Ams_sQ4_ssg8BFyd66qucAxUTEHz4lHxE7g?e=fFtXzd)\n\n\nFor SUN \u0026 SUN-SEG, you first need to request the original videos by following [these instructions](https://github.com/GewelsJI/VPS/blob/main/docs/DATA_PREPARATION.md).\nThen, convert the data into pre-training videos as follows:\n```bash\ncd Endo-FM/data\npython sun.py\npython sun_seg.py\npython trans_videos_pretrain.py\n```\nFinally, generate the video list `pretrain/train.csv` for pre-training as follows:\n```bash\ncd Endo-FM/data\npython gencsv.py\n```\n\n\n## Get Started\n\n#### Main Requirements\n- torch==1.8.0\n- torchvision==0.9.0\n- pillow==6.2.2\n- timm==0.4.12\n\n#### Installation\nWe suggest using Anaconda to set up the environment on Linux. If you have already installed Anaconda, you can skip this step.\n\n```shell\nwget 
https://repo.anaconda.com/archive/Anaconda3-2020.11-Linux-x86_64.sh \u0026\u0026 zsh Anaconda3-2020.11-Linux-x86_64.sh\n```\n\nThen, install the packages using the provided `environment.yaml`.\n\n```shell\ncd Endo-FM\nconda env create -f environment.yaml\nconda activate endofm\n```\n\n#### Pre-trained Weights\nYou can directly download our pre-trained Endo-FM via this [link](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155167044_link_cuhk_edu_hk/EZh5mWE5CL1BpaJ1bXuokfYBDM2VaMknqG7YpaQBRgAvdQ?e=e2rVYW) and put it under `checkpoints/` for downstream fine-tuning.\n\n#### Downstream Fine-tuned Weights\nWe also provide the fine-tuned weights for the 3 downstream tasks for direct downstream testing.\n\n|    Dataset     | PolypDiag | CVC-12k | KUMC |\n|:--------------:|:----:|:----:|:-----:|\n|   Our Paper    | 90.7 | 73.9 | 84.1 |\n| Released Model | 91.5 | 76.6 | 84.0 |\n|    Weights     | [link](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155167044_link_cuhk_edu_hk/ERSlUP10MGpBuhg1uN5iaHABKqz1SPQSrr03j4sEWey-bw?e=muv8RL) | [link](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155167044_link_cuhk_edu_hk/EePnpTllUCFEqpYp6BFPv0sBQyST4CV4jQ8pvaRynCkD7Q?e=f7LeBx) | [link](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155167044_link_cuhk_edu_hk/EYPkwbFyMfxEirezWtumAGIBSCTQ0EvDN4u99KKiRsaVBA?e=DsrkVG) |\n\n\n#### Pre-training\n```shell\ncd Endo-FM\nwget -P checkpoints/ https://github.com/kahnchana/svt/releases/download/v1.0/kinetics400_vitb_ssl.pth\nbash scripts/train_clips32k.sh\n```\n\n#### Downstream Fine-tuning\n```shell\n# PolypDiag (Classification)\ncd Endo-FM\nbash scripts/eval_finetune_polypdiag.sh\n\n# CVC (Segmentation)\ncd Endo-FM/TransUNet\npython train.py\n\n# KUMC (Detection)\ncd Endo-FM/STMT\npython setup.py build develop\npython -m torch.distributed.launch \\\n    --nproc_per_node=1 \\\n    tools/train_net.py \\\n    --master_port=$((RANDOM + 10000)) \\\n    --config-file configs/STFT/kumc_R_50_STFT.yaml \\\n  
  OUTPUT_DIR log_dir/kumc_finetune\n```\n\n#### Direct Downstream Testing\n```shell\n# PolypDiag (Classification)\ncd Endo-FM\nbash scripts/test_finetune_polypdiag.sh\n\n# CVC (Segmentation)\ncd Endo-FM/TransUNet\npython train.py --test\n\n# KUMC (Detection)\ncd Endo-FM/STMT\npython setup.py build develop\npython -m torch.distributed.launch \\\n    --nproc_per_node=1 \\\n    tools/test_net.py \\\n    --master_port=$((RANDOM + 10000)) \\\n    --config-file configs/STFT/kumc_R_50_STFT.yaml \\\n    MODEL.WEIGHT kumc.pth \\\n    OUTPUT_DIR log_dir/kumc_finetune\n```\n\n## 🙋‍♀️ Feedback and Contact\n\nFor further questions, please feel free to contact [Zhao Wang](mailto:zwang21@cse.cuhk.edu.hk).\n\n\n## 🛡️ License\n\nThis project is released under the Apache License 2.0. See [LICENSE](LICENSE) for details.\n\n## 🙏 Acknowledgement\n\nOur code is based on [DINO](https://github.com/facebookresearch/dino), [TimeSformer](https://github.com/facebookresearch/TimeSformer), [SVT](https://github.com/kahnchana/svt), [TransUNet](https://github.com/Beckschen/TransUNet), and [STFT](https://github.com/lingyunwu14/STFT). We thank the authors for releasing their code.\n\n## 📝 Citation\n\nIf you find this code useful, please cite our paper in your research.\n```\n@inproceedings{\n    wang2023foundation,\n    title={Foundation Model for Endoscopy Video Analysis via Large-scale Self-supervised Pre-train},\n    author={Zhao Wang and Chang Liu and Shaoting Zhang and Qi Dou},\n    booktitle={International Conference on Medical Image Computing and Computer-Assisted Intervention},\n    pages={101--111},\n    year={2023},\n    organization={Springer}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmed-air%2FEndo-FM","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmed-air%2FEndo-FM","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmed-air%2FEndo-FM/lists"}