{"id":13439869,"url":"https://github.com/baaivision/tokenize-anything","last_synced_at":"2025-05-15T17:02:16.751Z","repository":{"id":212578622,"uuid":"731657905","full_name":"baaivision/tokenize-anything","owner":"baaivision","description":"[ECCV 2024] Tokenize Anything via Prompting","archived":false,"fork":false,"pushed_at":"2024-12-11T06:36:49.000Z","size":7049,"stargazers_count":576,"open_issues_count":10,"forks_count":25,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-04-07T23:03:23.105Z","etag":null,"topics":["foundation-models","multimodal","promptable","representation-learning"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/baaivision.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-12-14T15:17:01.000Z","updated_at":"2025-04-07T06:25:17.000Z","dependencies_parsed_at":"2024-05-28T06:49:26.165Z","dependency_job_id":"d6d52a2f-f744-43e6-a0f0-2e65e51cb75d","html_url":"https://github.com/baaivision/tokenize-anything","commit_stats":null,"previous_names":["baaivision/tokenize-anything"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/baaivision%2Ftokenize-anything","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/baaivision%2Ftokenize-anything/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/baaivision%2Ftokenize-anything/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/baaivision%2Ftokenize-anything/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/baaivision","download_url":"https://codeload.github.com/baaivision/tokenize-anything/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254384937,"owners_count":22062421,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["foundation-models","multimodal","promptable","representation-learning"],"created_at":"2024-07-31T03:01:17.785Z","updated_at":"2025-05-15T17:02:16.709Z","avatar_url":"https://github.com/baaivision.png","language":"Jupyter Notebook","funding_links":[],"categories":["Jupyter Notebook","Paper List"],"sub_categories":["Follow-up Papers"],"readme":"\u003cdiv align=\"center\"\u003e\n\n\u003ch1\u003eTokenize Anything via Prompting\u003c/h1\u003e\n\n[Ting Pan](https://github.com/PhyscalX/)\u003csup\u003e1,2*\u003c/sup\u003e, \u0026nbsp; [Lulu Tang](https://scholar.google.com/citations?authuser=1\u0026user=o2fG4xUAAAAJ)\u003csup\u003e2*\u003c/sup\u003e, \u0026nbsp; [Xinlong Wang](https://www.xloong.wang/)\u003csup\u003e2¶\u003c/sup\u003e, \u0026nbsp; [Shiguang Shan](https://scholar.google.com/citations?user=Vkzd7MIAAAAJ\u0026hl=en)\u003csup\u003e1\u003c/sup\u003e\n\n\u003csup\u003e1\u003c/sup\u003e[ICT-CAS](http://english.ict.cas.cn/), \u0026nbsp; \u003csup\u003e2\u003c/sup\u003e[BAAI](https://www.baai.ac.cn/english.html)\u003cbr\u003e\n\u003csup\u003e*\u003c/sup\u003e Equal Contribution, \u003csup\u003e¶\u003c/sup\u003eProject Lead\n\n[[`Paper`](https://arxiv.org/pdf/2312.09128.pdf)] [[`🤗 Demo`](https://huggingface.co/spaces/BAAI/tokenize-anything)]\n\u003cbr\u003e\u003cbr\u003e\u003cimage src=\"assets/model_overview.png\"/\u003e\n\n\u003c/div\u003e\n\nWe present **T**okenize **A**nything via **P**rompting, a unified and promptable model capable of simultaneously segmenting, recognizing, and captioning arbitrary regions, with flexible visual prompts (point, box and sketch). The model is trained with exhaustive segmentation masks sourced from SA-1B, coupled with semantic priors from a pre-trained EVA-CLIP with 5 billion parameters.\n\n## Installation\n\n### Preliminaries\n\n``torch`` \u003e= 2.1\n\n``flash-attn`` \u003e= 2.3.3 (for TextGeneration)\n\n``gradio-image-prompter`` (for GradioApp, Install from [URL](https://github.com/PhyscalX/gradio-image-prompter))\n\n### Installing Package\n\nClone this repository to local disk and install:\n\n```bash\ncd tokenize-anything \u0026\u0026 pip install .\n```\n\nYou can also install from the remote repository: \n\n```bash\npip install git+ssh://git@github.com/baaivision/tokenize-anything.git\n```\n\n## Quick Start\n\n### Development\n\nThe **TAP** models can be used for diverse vision and language tasks. \n\nWe adopt a modular design that decouples all components and predictors.\n\nAs a best practice, implement your custom predictor and asynchronous pipeline as follows:\n\n```python\nfrom tokenize_anything import model_registry\n\nwith \u003cdistributed_actor\u003e:\n    model = model_registry[\"\u003cmodel_type\u003e\"](checkpoint=\"\u003cpath/to/checkpoint\u003e\")\n    results = \u003ccustom_predictor\u003e(model, *args, **kwargs)\n\nserver.collect_results()\n```\n\nSee builtin examples (web-demo and evaluations) provided in [scripts](scripts/) for more details.\n\n### Inference\n\nSee [Inference Guide](notebooks/inference.ipynb).\n\nSee [Concept Guide](notebooks/concept.ipynb).\n\n### Evaluation\n\nSee [Evaluation Guide for TAP-H](notebooks/evaluation_tap_vit_h_v1_1.ipynb).\n\nSee [Evaluation Guide for TAP-L](notebooks/evaluation_tap_vit_l_v1_1.ipynb).\n\nSee [Evaluation Guide for TAP-B](notebooks/evaluation_tap_vit_b_v1_1.ipynb).\n\n## Models\n\n### Model weights\n\n#### V1.1 Release Notes\n\n- Three versions of the model are available with different image encoders.\n- Use a longer pre-training and fine-tuning schedule (improved segmentation and caption performance).\n- Apply weight decay for all bias parameters (avoid FP16 overflow in QK matmul).\n- Sample point prompts from predicted mask instead of GT box during VG training.\n\n| Model | Description | Schedule | MD5 | Weights |\n| ----- | ------------| ------ | ----| ------ |\n| **tap_vit_h** | ViT-H TAP v1.1 model | (100% SA-1B, 180k), (VG, 50ep) | 4bdfb9 | [🤗 HF link](https://huggingface.co/BAAI/tokenize-anything/blob/main/models/tap_vit_h_v1_1.pkl) |\n| **tap_vit_l** | ViT-L TAP v1.1 model | (100% SA-1B, 180k), (VG, 50ep) | c1d41f | [🤗 HF link](https://huggingface.co/BAAI/tokenize-anything/blob/main/models/tap_vit_l_v1_1.pkl) |\n| **tap_vit_b** | ViT-B TAP v1.1 model | (100% SA-1B, 180k), (VG, 50ep) | 707f80 | [🤗 HF link](https://huggingface.co/BAAI/tokenize-anything/blob/main/models/tap_vit_b_v1_1.pkl) |\n\n#### V1.0 Release Notes\n\n- Two versions of the model are available with different image encoders.\n- Original paper results.\n\n| Model | Description | Schedule | MD5 | Weights |\n| ----- | ------------| ------ | ----| ------ |\n| **tap_vit_l** | ViT-L TAP v1.0 model | (50% SA-1B, 90k), (VG, 25ep) | 03f8ec | [🤗 HF link](https://huggingface.co/BAAI/tokenize-anything/blob/main/models/tap_vit_l_v1_0.pkl) |\n| **tap_vit_b** | ViT-B TAP v1.0 model | (50% SA-1B, 90k), (VG, 25ep) | b45cbf | [🤗 HF link](https://huggingface.co/BAAI/tokenize-anything/blob/main/models/tap_vit_b_v1_0.pkl) |\n\n### Concept weights\n\n***Note***: You can generate these weights following the [Concept Guide](notebooks/concept.ipynb).\n\n| Concept | Description | Weights |\n| ------- | ------------| ------ |\n| **Merged-2560** | Merged concepts | [🤗 HF link](https://huggingface.co/BAAI/tokenize-anything/blob/main/concepts/merged_2560.pkl) |\n| **LVIS-1203**   | LVIS concepts | [🤗 HF link](https://huggingface.co/BAAI/tokenize-anything/blob/main/concepts/lvis_1203.pkl) |\n| **COCO-80**   | COCO concepts  | [🤗 HF link](https://huggingface.co/BAAI/tokenize-anything/blob/main/concepts/coco_80.pkl) |\n\n## License\n[Apache License 2.0](LICENSE)\n\n## Citation\n\n```\n@article{pan2023tap,\n  title={Tokenize Anything via Prompting},\n  author={Pan, Ting and Tang, Lulu and Wang, Xinlong and Shan, Shiguang},\n  journal={arXiv preprint arXiv:2312.09128},\n  year={2023}\n}\n```\n\n## Acknowledgement\n\nWe thank the repositories: [SAM](https://github.com/facebookresearch/segment-anything), [EVA](https://github.com/baaivision/EVA), [LLaMA](https://github.com/facebookresearch/llama), [FlashAttention](https://github.com/Dao-AILab/flash-attention), [Gradio](https://github.com/gradio-app/gradio), [Detectron2](https://github.com/facebookresearch/detectron2) and [CodeWithGPU](https://github.com/seetacloud/codewithgpu).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbaaivision%2Ftokenize-anything","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbaaivision%2Ftokenize-anything","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbaaivision%2Ftokenize-anything/lists"}