{"id":19529273,"url":"https://github.com/isl-org/lang-seg","last_synced_at":"2025-08-25T17:30:41.018Z","repository":{"id":40338214,"uuid":"426202254","full_name":"isl-org/lang-seg","owner":"isl-org","description":"Language-Driven Semantic Segmentation","archived":false,"fork":false,"pushed_at":"2024-07-05T20:22:26.000Z","size":9562,"stargazers_count":734,"open_issues_count":16,"forks_count":94,"subscribers_count":17,"default_branch":"main","last_synced_at":"2024-12-16T06:01:09.457Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/isl-org.png","metadata":{"files":{"readme":"README.MD","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-11-09T11:27:30.000Z","updated_at":"2024-12-15T18:08:21.000Z","dependencies_parsed_at":"2024-11-18T02:10:49.240Z","dependency_job_id":null,"html_url":"https://github.com/isl-org/lang-seg","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/isl-org%2Flang-seg","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/isl-org%2Flang-seg/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/isl-org%2Flang-seg/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/isl-org%2Flang-seg/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/isl-org","download_url":"https://codeload.github.com/isl-org/lang-seg/tar.gz/refs/heads/main","host":{"name":"G
itHub","url":"https://github.com","kind":"github","repositories_count":230925037,"owners_count":18301259,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-11T01:23:14.532Z","updated_at":"2024-12-23T07:01:51.387Z","avatar_url":"https://github.com/isl-org.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PROJECT NOT UNDER ACTIVE MANAGEMENT\nThis project will no longer be maintained by Intel.  \nIntel has ceased development and contributions including, but not limited to, maintenance, bug fixes, new releases, or updates, to this project.  \nIntel no longer accepts patches to this project.  \nIf you have an ongoing need to use this project, are interested in independently developing it, or would like to maintain patches for the open source software community, please create your own fork of this project.  \n\n# Language-driven Semantic Segmentation (LSeg)\nThis repo contains the official PyTorch implementation of the paper [Language-driven Semantic Segmentation](https://arxiv.org/abs/2201.03546). \n\nICLR 2022\n\n#### Authors: \n* [Boyi Li](https://sites.google.com/site/boyilics/home)\n* [Kilian Q. Weinberger](http://kilian.cs.cornell.edu/index.html)\n* [Serge Belongie](https://scholar.google.com/citations?user=ORr4XJYAAAAJ\u0026hl=zh-CN)\n* [Vladlen Koltun](http://vladlen.info/)\n* [Rene Ranftl](https://scholar.google.at/citations?user=cwKg158AAAAJ\u0026hl=de)\n\n\n### Overview\n\n\nWe present LSeg, a novel model for language-driven semantic image segmentation. 
LSeg uses a text encoder to compute embeddings of descriptive input labels (e.g., ''grass'' or ''building'') together with a transformer-based image encoder that computes dense per-pixel embeddings of the input image. The image encoder is trained with a contrastive objective to align pixel embeddings to the text embedding of the corresponding semantic class. The text embeddings provide a flexible label representation in which semantically similar labels map to similar regions in the embedding space (e.g., ''cat'' and ''furry''). This allows LSeg to generalize to previously unseen categories at test time, without retraining or even requiring a single additional training sample. We demonstrate that our approach achieves highly competitive zero-shot performance compared to existing zero- and few-shot semantic segmentation methods, and even matches the accuracy of traditional segmentation algorithms when a fixed label set is provided. \n\nPlease check our [Video Demo (4k)](https://www.youtube.com/watch?v=bmU75rsmv6s), which further showcases the capabilities of LSeg.\n\n## Usage\n### Installation\nOption 1: \n\n``` pip install -r requirements.txt ```\n\nOption 2: \n```\nconda install ipython\npip install torch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2\npip install git+https://github.com/zhanghang1989/PyTorch-Encoding/\npip install pytorch-lightning==1.3.5\npip install opencv-python\npip install imageio\npip install ftfy regex tqdm\npip install git+https://github.com/openai/CLIP.git\npip install altair\npip install streamlit\npip install --upgrade protobuf\npip install timm\npip install tensorboardX\npip install matplotlib\npip install test-tube\npip install wandb\n```\n\n### Data Preparation\nBy default, for training, testing and demo, we use [ADE20k](https://groups.csail.mit.edu/vision/datasets/ADE20K/).\n\n```\npython prepare_ade20k.py\nunzip ../datasets/ADEChallengeData2016.zip\n```\n\nNote: for demo, if you want to use random inputs, you can ignore data loading and 
comment out the code at [link](https://github.com/isl-org/lang-seg/blob/main/modules/lseg_module.py#L55). \n\n\n### 🌻 Try demo now\n\n#### Download Demo Model\n\u003ctable\u003e\n  \u003cthead\u003e\n    \u003ctr style=\"text-align: right;\"\u003e\n      \u003cth\u003ename\u003c/th\u003e\n      \u003cth\u003ebackbone\u003c/th\u003e\n      \u003cth\u003etext encoder\u003c/th\u003e\n      \u003cth\u003eurl\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n       \u003ctd\u003eModel for demo\u003c/td\u003e\n      \u003cth\u003eViT-L/16\u003c/th\u003e\n      \u003cth\u003eCLIP ViT-B/32\u003c/th\u003e\n      \u003ctd\u003e\u003ca href=\"https://drive.google.com/file/d/1FTuHY1xPUkM-5gaDtMfgCl3D0gR89WV7/view?usp=sharing\"\u003edownload\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\n#### 👉 Option 1: Running the interactive app\nDownload the demo model and save it as `checkpoints/demo_e200.ckpt`. \n\nThen run ``` streamlit run lseg_app.py ```\n\n#### 👉 Option 2: Jupyter Notebook\nDownload the demo model and save it as `checkpoints/demo_e200.ckpt`. \n\nThen follow [lseg_demo.ipynb](https://github.com/isl-org/lang-seg/blob/main/lseg_demo.ipynb) to play around with LSeg. 
Enjoy!\n\n\n\n### Training and Testing Example\nTraining: Backbone = ViT-L/16, Text Encoder from CLIP ViT-B/32\n\n``` bash train.sh ```\n\nTesting: Backbone = ViT-L/16, Text Encoder from CLIP ViT-B/32\n\n``` bash test.sh ```\n\n### Zero-shot Experiments\n#### Data Preparation\nPlease follow [HSNet](https://github.com/juhongm999/hsnet) and put all datasets in `data/Dataset_HSN`.\n\n#### Pascal-5i\n``` \nfor fold in 0 1 2 3; do\npython -u test_lseg_zs.py --backbone clip_resnet101 --module clipseg_DPT_test_v2 --dataset pascal \\\n--widehead --no-scaleinv --arch_option 0 --ignore_index 255 --fold ${fold} --nshot 0 \\\n--weights checkpoints/pascal_fold${fold}.ckpt \ndone\n```\n#### COCO-20i\n``` \nfor fold in 0 1 2 3; do\npython -u test_lseg_zs.py --backbone clip_resnet101 --module clipseg_DPT_test_v2 --dataset coco \\\n--widehead --no-scaleinv --arch_option 0 --ignore_index 255 --fold ${fold} --nshot 0 \\\n--weights checkpoints/coco_fold${fold}.ckpt \ndone\n```\n#### FSS\n``` \npython -u test_lseg_zs.py --backbone clip_vitl16_384 --module clipseg_DPT_test_v2 --dataset fss \\\n--widehead --no-scaleinv --arch_option 0 --ignore_index 255 --fold 0 --nshot 0 \\\n--weights checkpoints/fss_l16.ckpt \n```\n\n``` \npython -u test_lseg_zs.py --backbone clip_resnet101 --module clipseg_DPT_test_v2 --dataset fss \\\n--widehead --no-scaleinv --arch_option 0 --ignore_index 255 --fold 0 --nshot 0 \\\n--weights checkpoints/fss_rn101.ckpt \n```\n\n#### Model Zoo\n\u003ctable\u003e\n  \u003cthead\u003e\n    \u003ctr style=\"text-align: right;\"\u003e\n       \u003cth\u003edataset\u003c/th\u003e\n      \u003cth\u003efold\u003c/th\u003e\n      \u003cth\u003ebackbone\u003c/th\u003e\n      \u003cth\u003etext encoder\u003c/th\u003e\n      \u003cth\u003eperformance\u003c/th\u003e\n      \u003cth\u003eurl\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n       \u003cth\u003epascal\u003c/th\u003e\n       \u003ctd\u003e0\u003c/td\u003e\n      
\u003cth\u003eResNet101\u003c/th\u003e\n      \u003cth\u003eCLIP ViT-B/32\u003c/th\u003e\n      \u003cth\u003e52.8\u003c/th\u003e\n      \u003ctd\u003e\u003ca href=\"https://drive.google.com/file/d/1y4z4_yNGlZtn6osaeN4ZjMs6c0vr0F3m/view?usp=sharing\"\u003edownload\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n       \u003cth\u003epascal\u003c/th\u003e\n       \u003ctd\u003e1\u003c/td\u003e\n      \u003cth\u003eResNet101\u003c/th\u003e\n      \u003cth\u003eCLIP ViT-B/32\u003c/th\u003e\n      \u003cth\u003e53.8\u003c/th\u003e\n      \u003ctd\u003e\u003ca href=\"https://drive.google.com/file/d/1UZzN8kWkH-G8v6P8xcBXEHRZlQKPRxrX/view?usp=sharing\"\u003edownload\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n       \u003cth\u003epascal\u003c/th\u003e\n       \u003ctd\u003e2\u003c/td\u003e\n      \u003cth\u003eResNet101\u003c/th\u003e\n      \u003cth\u003eCLIP ViT-B/32\u003c/th\u003e\n      \u003cth\u003e44.4\u003c/th\u003e\n      \u003ctd\u003e\u003ca href=\"https://drive.google.com/file/d/1KCq1JphSMvj8X78bkWbdNIFm5zYzLTMX/view?usp=sharing\"\u003edownload\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n       \u003cth\u003epascal\u003c/th\u003e\n       \u003ctd\u003e3\u003c/td\u003e\n      \u003cth\u003eResNet101\u003c/th\u003e\n      \u003cth\u003eCLIP ViT-B/32\u003c/th\u003e\n      \u003cth\u003e38.5\u003c/th\u003e\n      \u003ctd\u003e\u003ca href=\"https://drive.google.com/file/d/1A_fllOJqyBg0ZTJcm85Cn0NAcwQbXnhl/view?usp=sharing\"\u003edownload\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n       \u003cth\u003ecoco\u003c/th\u003e\n       \u003ctd\u003e0\u003c/td\u003e\n      \u003cth\u003eResNet101\u003c/th\u003e\n      \u003cth\u003eCLIP ViT-B/32\u003c/th\u003e\n      \u003cth\u003e22.1\u003c/th\u003e\n      \u003ctd\u003e\u003ca href=\"https://drive.google.com/file/d/1nSYO3XtAv4mzWi4x-MFUfk04cpBKJT38/view?usp=sharing\"\u003edownload\u003c/a\u003e\u003c/td\u003e\n    
\u003c/tr\u003e\n    \u003ctr\u003e\n       \u003cth\u003ecoco\u003c/th\u003e\n       \u003ctd\u003e1\u003c/td\u003e\n      \u003cth\u003eResNet101\u003c/th\u003e\n      \u003cth\u003eCLIP ViT-B/32\u003c/th\u003e\n      \u003cth\u003e25.1\u003c/th\u003e\n      \u003ctd\u003e\u003ca href=\"https://drive.google.com/file/d/1w0vz3yjEi_ZLgECRrgtoLxEHugyJkrs5/view?usp=sharing\"\u003edownload\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n       \u003cth\u003ecoco\u003c/th\u003e\n       \u003ctd\u003e2\u003c/td\u003e\n      \u003cth\u003eResNet101\u003c/th\u003e\n      \u003cth\u003eCLIP ViT-B/32\u003c/th\u003e\n      \u003cth\u003e24.9\u003c/th\u003e\n      \u003ctd\u003e\u003ca href=\"https://drive.google.com/file/d/1wmHtmJLdta18XuWQv6oX9llidCll_HrD/view?usp=sharing\"\u003edownload\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n       \u003cth\u003ecoco\u003c/th\u003e\n       \u003ctd\u003e3\u003c/td\u003e\n      \u003cth\u003eResNet101\u003c/th\u003e\n      \u003cth\u003eCLIP ViT-B/32\u003c/th\u003e\n      \u003cth\u003e21.5\u003c/th\u003e\n      \u003ctd\u003e\u003ca href=\"https://drive.google.com/file/d/1dliBUSOog7taJxMmb9cdefKH4XOKChVJ/view?usp=sharing\"\u003edownload\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n       \u003cth\u003efss\u003c/th\u003e\n       \u003ctd\u003e-\u003c/td\u003e\n      \u003cth\u003eResNet101\u003c/th\u003e\n      \u003cth\u003eCLIP ViT-B/32\u003c/th\u003e\n      \u003cth\u003e84.7\u003c/th\u003e\n      \u003ctd\u003e\u003ca href=\"https://drive.google.com/file/d/1UIj49Wp1mAopPub5M6O4WW-Z79VB1bhw/view?usp=sharing\"\u003edownload\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n       \u003cth\u003efss\u003c/th\u003e\n       \u003ctd\u003e-\u003c/td\u003e\n      \u003cth\u003eViT-L/16\u003c/th\u003e\n      \u003cth\u003eCLIP ViT-B/32\u003c/th\u003e\n      \u003cth\u003e87.8\u003c/th\u003e\n      \u003ctd\u003e\u003ca 
href=\"https://drive.google.com/file/d/1Nplkc_JsHIS55d--K2vonOOC3HrppzYy/view?usp=sharing\"\u003edownload\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\nIf you find this repo useful, please cite:\n```\n@inproceedings{\nli2022languagedriven,\ntitle={Language-driven Semantic Segmentation},\nauthor={Boyi Li and Kilian Q Weinberger and Serge Belongie and Vladlen Koltun and Rene Ranftl},\nbooktitle={International Conference on Learning Representations},\nyear={2022},\nurl={https://openreview.net/forum?id=RriDjddCLN}\n}\n```\n\n## Acknowledgement\nThanks to the code base from [DPT](https://github.com/isl-org/DPT), [Pytorch_lightning](https://github.com/PyTorchLightning/pytorch-lightning), [CLIP](https://github.com/openai/CLIP), [Pytorch Encoding](https://github.com/zhanghang1989/PyTorch-Encoding), [Streamlit](https://streamlit.io/), [Wandb](https://wandb.ai/site)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fisl-org%2Flang-seg","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fisl-org%2Flang-seg","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fisl-org%2Flang-seg/lists"}