{"id":21187727,"url":"https://github.com/huanglizi/lvit","last_synced_at":"2025-05-16T15:07:15.224Z","repository":{"id":41260843,"uuid":"468262507","full_name":"HUANGLIZI/LViT","owner":"HUANGLIZI","description":"[IEEE Transactions on Medical Imaging/TMI] This repo is the official implementation of \"LViT: Language meets Vision Transformer in Medical Image Segmentation\"","archived":false,"fork":false,"pushed_at":"2025-03-10T03:40:35.000Z","size":94336,"stargazers_count":338,"open_issues_count":5,"forks_count":32,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-13T00:41:27.498Z","etag":null,"topics":["medical-image-analysis","multimodal-learning","pytorch","segmentation","vision-language"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/HUANGLIZI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-03-10T08:49:09.000Z","updated_at":"2025-04-12T09:12:01.000Z","dependencies_parsed_at":"2023-02-08T14:31:27.422Z","dependency_job_id":"5df562dd-b10f-49fc-bf04-404c99d5748b","html_url":"https://github.com/HUANGLIZI/LViT","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HUANGLIZI%2FLViT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HUANGLIZI%2FLViT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HUANGLIZI%2FLViT/releases","manifests_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/repositories/HUANGLIZI%2FLViT/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/HUANGLIZI","download_url":"https://codeload.github.com/HUANGLIZI/LViT/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254553959,"owners_count":22090417,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["medical-image-analysis","multimodal-learning","pytorch","segmentation","vision-language"],"created_at":"2024-11-20T18:40:06.247Z","updated_at":"2025-05-16T15:07:10.215Z","avatar_url":"https://github.com/HUANGLIZI.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# LViT\n\n\nThis repo is the official implementation of \"**LViT: Language meets Vision Transformer in Medical Image Segmentation**\" \n[Arxiv](https://arxiv.org/abs/2206.14718), [ResearchGate](https://www.researchgate.net/publication/371833348_LViT_Language_meets_Vision_Transformer_in_Medical_Image_Segmentation), [IEEEXplore](https://ieeexplore.ieee.org/document/10172039)\n\n![image](https://github.com/HUANGLIZI/LViT/blob/main/IMG/LViT.png)\n\n## Requirements\n\nUse Python == 3.7 and install the dependencies from ```requirements.txt```:\n```angular2html\npip install -r requirements.txt\n```\nNote on NumPy version conflicts: the NumPy version we use is 1.17.5. If the versions conflict, install bert-embedding first, then install NumPy.\n\n## Usage\n\n### 1. Data Preparation\n#### 1.1. 
QaTa-COV19, MosMedData+ and MoNuSeg Datasets (demo dataset)\nThe original data can be downloaded in following links:\n* QaTa-COV19 Dataset - [Link (Original)](https://www.kaggle.com/datasets/aysendegerli/qatacov19-dataset)\n\n* MosMedData+ Dataset - [Link (Original)](http://medicalsegmentation.com/covid19/) or [Kaggle](https://www.kaggle.com/datasets/maedemaftouni/covid19-ct-scan-lesion-segmentation-dataset)\n\n* MoNuSeG Dataset (demo dataset) - [Link (Original)](https://monuseg.grand-challenge.org/Data/)\n\n* ESO-CT Dataset [1] [2]\n\n[1] Jin, Dakai, et al. \"DeepTarget: Gross tumor and clinical target volume segmentation in esophageal cancer radiotherapy.\" Medical Image Analysis 68 (2021): 101909.\n\n[2] Ye, Xianghua, et al. \"Multi-institutional validation of two-streamed deep learning method for automated delineation of esophageal gross tumor volume using planning CT and FDG-PET/CT.\" Frontiers in Oncology 11 (2022): 785788.\n\nThe text annotation of QaTa-COV19 has been released!\n\n  *(Note: The text annotation of QaTa-COV19 train and val datasets [download link](https://1drv.ms/x/s!AihndoV8PhTDkm5jsTw5dX_RpuRr?e=uaZq6W).\n  The partition of train set and val set of QaTa-COV19 dataset [download link](https://1drv.ms/f/c/c3143e7c85766728/QihndoV8PhQggMO2rwAAAAAADo5kj33mUee33g).\n  The text annotation of QaTa-COV19 test dataset [download link](https://1drv.ms/x/s!AihndoV8PhTDkj1vvvLt2jDCHqiM?e=954uDF).)*\n\n  ***(Note: The contrastive label is available in the repo.)***\n  \n***(Note: The text annotation of MosMedData+ train dataset [download link](https://1drv.ms/x/s!AihndoV8PhTDguIIKCRfYB9Z0NL8Dw?e=8rj6rY).\nThe text annotation of MosMedData+ val dataset [download link](https://1drv.ms/x/c/c3143e7c85766728/QShndoV8PhQggMMGsQAAAAAAtAgZiRQFYfsAjw).\nThe text annotation of MosMedData+ test dataset [download link](https://1drv.ms/x/c/c3143e7c85766728/QShndoV8PhQggMMHsQAAAAAAdHkwXMxGlgU9Tg).)***\n  \n  *If you use the datasets provided by us, please cite the 
LViT.*\n\n#### 1.2. Format Preparation\n\nThen arrange the datasets in the following layout so the code can find them:\n\n```angular2html\n├── datasets\n    ├── QaTa-Covid19\n    │   ├── Test_Folder\n    │   │   ├── Test_text.xlsx\n    │   │   ├── img\n    │   │   └── labelcol\n    │   ├── Train_Folder\n    │   │   ├── Train_text.xlsx\n    │   │   ├── img\n    │   │   └── labelcol\n    │   └── Val_Folder\n    │       ├── Val_text.xlsx\n    │       ├── img\n    │       └── labelcol\n    └── MosMedDataPlus\n        ├── Test_Folder\n        │   ├── Test_text.xlsx\n        │   ├── img\n        │   └── labelcol\n        ├── Train_Folder\n        │   ├── Train_text.xlsx\n        │   ├── img\n        │   └── labelcol\n        └── Val_Folder\n            ├── Val_text.xlsx\n            ├── img\n            └── labelcol\n```\n\n\n\n### 2. Training\n\n#### 2.1. Pre-training\nFor pre-training, replace LViT with U-Net and run:\n```angular2html\npython train_model.py\n```\n\n#### 2.2. Training\n\nYou can train your own model. Note that using the pre-trained model from step 2.1 gives better performance; alternatively, you can simply change the model_name from LViT to LViT_pretrain in the config.\n\n```angular2html\npython train_model.py\n```\n\n\n\n\n### 3. Evaluation\n\n#### 3.1. Test the Model and Visualize the Segmentation Results\nFirst, set the session name in ```Config.py``` to the one used in the training phase. Then run:\n```angular2html\npython test_model.py\n```\nYou will get the Dice and IoU scores and the visualization results. \n\n\n\n### 4. 
Results\n\n| Dataset     | Model Name          | Dice (%) | IoU (%) |\n| ----------- | ------------------- | -------- | ------- |\n| QaTa-COV19  | U-Net               | 79.02    | 69.46   |\n| QaTa-COV19  | LViT-T              | 83.66    | 75.11   |\n| MosMedData+ | U-Net               | 64.60    | 50.73   |\n| MosMedData+ | LViT-T              | 74.57    | 61.33   |\n| MoNuSeg     | U-Net               | 76.45    | 62.86   |\n| MoNuSeg     | LViT-T              | 80.36    | 67.31   |\n| MoNuSeg     | LViT-T w/o pretrain | 79.98    | 66.83   |\n\n#### 4.1. More Results on Other Datasets\n\n| Dataset     | Model Name          | Dice (%) | IoU (%) |\n| ----------- | ------------------- | -------- | ------- |\n| [BKAI-Poly](https://www.kaggle.com/competitions/bkai-igh-neopolyp/data) | LViT-TW | 92.07 | 80.93 |\n| ESO-CT      | LViT-TW             | 68.27    | 57.02   |\n\n\n### 5. Reproducibility\n\nIn our code, we carefully set the random seed and set cuDNN to 'deterministic' mode to eliminate randomness. However, some factors may still cause different training results, e.g., the CUDA version, the GPU type, and the number of GPUs. Our experiments used two NVIDIA V100 (32G) GPUs with CUDA version 11.2. 
In addition, the upsampling operation is a significant source of randomness in multi-GPU training.\nSee https://pytorch.org/docs/stable/notes/randomness.html for more details.\n\n\n\n## Reference\n\n\n* [TransUNet](https://github.com/Beckschen/TransUNet) \n* [MedT](https://github.com/jeya-maria-jose/Medical-Transformer)\n* [UCTransNet](https://github.com/McGregorWwww/UCTransNet)\n\n\n## Citation\n\n```bash\n@article{li2023lvit,\n  title={Lvit: language meets vision transformer in medical image segmentation},\n  author={Li, Zihan and Li, Yunxiang and Li, Qingde and Wang, Puyang and Guo, Dazhou and Lu, Le and Jin, Dakai and Zhang, You and Hong, Qingqi},\n  journal={IEEE Transactions on Medical Imaging},\n  year={2023},\n  publisher={IEEE}\n}\n```\n\n[![Stargazers repo roster for @HUANGLIZI/LViT](https://reporoster.com/stars/HUANGLIZI/LViT)](https://github.com/HUANGLIZI/LViT/stargazers)\n\n[![Forkers repo roster for @HUANGLIZI/LViT](https://reporoster.com/forks/HUANGLIZI/LViT)](https://github.com/HUANGLIZI/LViT/network/members)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhuanglizi%2Flvit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhuanglizi%2Flvit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhuanglizi%2Flvit/lists"}