{"id":32929141,"url":"https://github.com/scholarchen20/ustxtseg","last_synced_at":"2026-05-15T08:01:44.473Z","repository":{"id":323339792,"uuid":"1092907567","full_name":"ScholarChen20/USTxtSeg","owner":"ScholarChen20","description":"Weakly-Supervised Medical Image Segmentation with Simple Text Cues","archived":false,"fork":false,"pushed_at":"2025-11-09T14:59:11.000Z","size":1499,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-11-09T16:22:49.105Z","etag":null,"topics":["image-processing","image-segmentation","medical","multimodal","python","pytorch","segment-anything","ultrasound-imaging"],"latest_commit_sha":null,"homepage":"https://github.com/ScholarChen20/USTxtSeg","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ScholarChen20.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-11-09T14:47:26.000Z","updated_at":"2025-11-09T15:00:23.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/ScholarChen20/USTxtSeg","commit_stats":null,"previous_names":["scholarchen20/ustxtseg"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/ScholarChen20/USTxtSeg","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ScholarChen20%2FUSTxtSeg","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ScholarChen20%2FUSTxtSeg/tags","releases_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ScholarChen20%2FUSTxtSeg/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ScholarChen20%2FUSTxtSeg/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ScholarChen20","download_url":"https://codeload.github.com/ScholarChen20/USTxtSeg/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ScholarChen20%2FUSTxtSeg/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33058965,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-13T13:14:54.681Z","status":"online","status_checked_at":"2026-05-15T02:00:06.351Z","response_time":103,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["image-processing","image-segmentation","medical","multimodal","python","pytorch","segment-anything","ultrasound-imaging"],"created_at":"2025-11-11T11:07:33.090Z","updated_at":"2026-05-15T08:01:44.466Z","avatar_url":"https://github.com/ScholarChen20.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[//]: # (# SimTxtSeg: Weakly-Supervised Medical Image Segmentation with Simple Text Cues)\n\n[//]: # (Paper : [arxiv]\u0026#40;https://arxiv.org/abs/2406.19364\u0026#41;,  has been acceptd by ***MICCAI2024✨***)\n\n[//]: # ()\n[//]: # (by Yuxin Xie, Tao Zhou, Yi Zhou, Geng Chen)\n\n\n\n## 🙋 Introduction\n\n[//]: # (Our 
contribution consists of two key components: an effective Textual-to-Visual Cue Converter that produces visual prompts from text prompts on medical images, and a text-guided segmentation model with Text-Vision Hybrid Attention that fuses text and image features. We evaluate our framework on two medical image segmentation tasks: colonic polyp segmentation and MRI brain tumor segmentation, and achieve consistent state-of-the-art performance.)\n\n[//]: # (\u003cimg src=images/frame.png width=700 /\u003e)\n\n[//]: # (\u003cimg src=images/attention.png width=700/\u003e)\n\n## 🚀 Updates\n\n[//]: # (* `[2024.07.07]` We are excited to release : ✅dataset and ✅TVCC code.)\n\n[//]: # (* `[2024.09.25]` We are excited to release : ✅TVHA code.)\n\n\n## 📖 Dataset Preparation\n* Dataset Download\n    1. Polyp Dataset: [PolypGen](https://www.synapse.org/#!Synapse:syn26376615/wiki/613312) (data_C1 to data_C6 are used), [others](https://github.com/DengPingFan/PraNet) (including CVC-300 (60 samples), CVC-ClinicDB (612 samples), CVC-ColonDB (380 samples), ETIS-LaribPolypDB (196 samples), Kvasir (100 samples), Kvasir-SEG (900 samples))\n    2. Brain Tumor Dataset: [kaggle_3m](https://www.kaggle.com/datasets/nikhilroxtomar/brain-tumor-segmentation)\n    3. ISIC Dataset: [ISIC](https://challenge.isic-archive.com/data/#2019)\n*  For TVCC, to avoid the cost of handcrafting prompts, \u003cu\u003ewe use GPT-4 to generate a concise sentence of at most 20 words\u003c/u\u003e. Before training, convert your dataset to the **ODVG** format so that regions and phrases are precisely aligned. Labels in **COCO** format are also required for testing and validation.\n    ```\n    python util/mask2odvg.py\n    python util/mask2coco.py\n    ```\n* For the TVHA segmentation model, binary masks are sufficient.\n\n## ⚡ Quick Start\n### 1. 
Environment\n\nClone the whole repository and install the dependencies.\n\n```\nconda create -n USTxtSeg python=3.11\nconda activate USTxtSeg\ngit clone https://github.com/xyx1024/USTxtSeg.git\ncd USTxtSeg\npip install -r requirements.txt\n```\n\nSee [mmdet_get_started_chinese](https://github.com/open-mmlab/mmdetection/blob/cfd5d3a985b0249de009b67d04f37263e11cdf3d/docs/zh_cn/get_started.md) or [mmdet_get_started_english](https://github.com/open-mmlab/mmdetection/blob/cfd5d3a985b0249de009b67d04f37263e11cdf3d/docs/en/get_started.md) to install mmdet.\n\n### 2. For TVCC\nDownload swin_tiny_patch4_window7_224.pth: https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth\n\nDownload the Grounding DINO checkpoint:\n```\nwget 'https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth'\n```\nThen use the config files to pretrain TVCC. The polyp, brain tumor, and ISIC datasets are supported.\n```\ncd TVCC/polyp_grounding_dino\n./tools/dist_train.sh TVCC/polyp_grounding_dino/config/GroundingDINO_Polyp_PhraseGrounding_config.py n # n = number of GPUs; change as needed\n```\nTVCC evaluation:\n```\n# single GPU\npython tools/test.py config_path ckpt_path\n\n# 4 GPUs\n./tools/dist_test.sh config_path ckpt_path 4\n```\nVisualize the visual cues:\n```\npython tools/image_demo.py \\\n        image_path \\\n        config_path \\\n        --weights weight_path \\\n        --texts 'xxx'\n```\n### 3. 
Pseudo Masks Generation\nClick the links below to download the checkpoint for the corresponding model type.\n\ndefault or vit_h: [ViT-H SAM model](https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth).\n\nvit_l: [ViT-L SAM model](https://dl.fbaipublicfiles.com/segment_anything/sam_vit_l_0b3195.pth).\n\nvit_b: [ViT-B SAM model](https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth) \u0026 [SAM-Med2D](https://drive.google.com/file/d/1ARiB5RkSsWmAB_8mqWnwDF8ZKTtFwsjl/view?usp=drive_link).\n\nUse the SAM and TVCC checkpoints to generate the pseudo masks.\n```\ncd TVCC/polyp_grounding_dino\npython TVCC_Sam.py\n```\n### 4. USTxtSeg with TVHA\nUse the pseudo masks and text prompts to supervise the model.\n```\npython train.py\npython test.py\n```\n\n## 🎯 Results\n**Comparison experiments and ablation study:**\n\n[//]: # ()\n[//]: # (\u003cimg src=images/results.png width=700 /\u003e)\n\n[//]: # ()\n**Visualization**\n\n[//]: # ()\n[//]: # (\u003cimg src=images/visualization.png width=700 /\u003e)\n\n[//]: # ()\n## 🗓️ Ongoing\n- [x] paper release\n\n[//]: # (- [x] dataset release)\n\n[//]: # (- [x] TVCC pretrain and test code release)\n\n[//]: # (- [x] SimTxtSeg with TVHA model release.)\n\n[//]: # ()\n## 🎫 License\nThis project is released under the Apache 2.0 license.\n\n## 💘 Acknowledgements\nmmdetection: https://github.com/open-mmlab/mmdetection/tree/main\n\nGroundingDINO: https://github.com/IDEA-Research/GroundingDINO\n\nSegment Anything: https://github.com/facebookresearch/segment-anything?tab=readme-ov-file\n\n[//]: # ()\n## ✒️ Citation\n\n[//]: # (If you find this repository useful, please consider citing this paper:)\n\n[//]: # (```)\n\n\n[//]: # (```)\n\n[//]: # (## 📬 Contact)\n\n[//]: # (If you have any questions, please feel free to contact 
silver_iris@163.com.)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscholarchen20%2Fustxtseg","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fscholarchen20%2Fustxtseg","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscholarchen20%2Fustxtseg/lists"}