{"id":15028135,"url":"https://github.com/whai362/pvt","last_synced_at":"2025-05-15T18:10:35.160Z","repository":{"id":37720676,"uuid":"341748701","full_name":"whai362/PVT","owner":"whai362","description":"Official implementation of PVT series","archived":false,"fork":false,"pushed_at":"2022-10-27T08:47:14.000Z","size":15173,"stargazers_count":1790,"open_issues_count":40,"forks_count":251,"subscribers_count":23,"default_branch":"v2","last_synced_at":"2025-03-31T22:22:01.317Z","etag":null,"topics":["backbone","detection","pvt","pvtv2","segmentation","transformer"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/whai362.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-02-24T02:01:37.000Z","updated_at":"2025-03-30T09:38:48.000Z","dependencies_parsed_at":"2022-07-08T01:35:22.307Z","dependency_job_id":null,"html_url":"https://github.com/whai362/PVT","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/whai362%2FPVT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/whai362%2FPVT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/whai362%2FPVT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/whai362%2FPVT/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/whai362","download_url":"https://codeload.github.com/whai362/PVT/tar.gz/refs/heads/v2","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247
744335,"owners_count":20988783,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["backbone","detection","pvt","pvtv2","segmentation","transformer"],"created_at":"2024-09-24T20:07:41.190Z","updated_at":"2025-04-07T23:10:22.444Z","avatar_url":"https://github.com/whai362.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Updates\n- (2022/08/09) Application examples for polyp segmentation (polyp-pvt) and vision-language modeling.\n- (2021/06/21) Code of PVTv2 is released! PVTv2 largely improves PVTv1 and works better than Swin Transformer with ImageNet-1K pre-training.\n\n# Pyramid Vision Transformer\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg width=\"400\" src=\"./logo.png\"\u003e\n\u003c/div\u003e\n\u003cp align=\"center\"\u003e\n  The image is from Transformers: Revenge of the Fallen.\n\u003c/p\u003e\n\nThis repository contains the official implementation of [PVTv1](https://arxiv.org/abs/2102.12122) \u0026 [PVTv2](https://arxiv.org/pdf/2106.13797.pdf) in image classification, object detection, and semantic segmentation tasks.\n\n\n## Model Zoo\n\n### Image Classification\n\nClassification configs \u0026 weights see \u003e\u003e\u003e[here](classification/)\u003c\u003c\u003c.\n\n- PVTv2 on ImageNet-1K\n\n| Method           | Size | Acc@1 | #Params (M) |\n|------------------|:----:|:-----:|:-----------:|\n| PVTv2-B0        |  224 |  70.5 |     3.7     |\n| PVTv2-B1        |  224 |  78.7 |     14.0    |\n| PVTv2-B2-Linear |  224 |  82.1 |     22.6    |\n| PVTv2-B2        |  224 |  82.0 |     25.4    |\n| PVTv2-B3        |  224 |  
83.1 |     45.2    |\n| PVTv2-B4        |  224 |  83.6 |     62.6    |\n| PVTv2-B5        |  224 |  83.8 |     82.0    |\n\n- PVTv1 on ImageNet-1K\n\n| Method     | Size | Acc@1 | #Params (M) |\n|------------|:----:|:-----:|:-----------:|\n| PVT-Tiny   |  224 |  75.1 |     13.2    |\n| PVT-Small  |  224 |  79.8 |     24.5    |\n| PVT-Medium |  224 |  81.2 |     44.2    |\n| PVT-Large  |  224 |  81.7 |     61.4    |\n\n\n### Object Detection \n\nDetection configs \u0026 weights see \u003e\u003e\u003e[here](detection/)\u003c\u003c\u003c.\n\n\n- PVTv2 on COCO\n\n#### Baseline Detectors\n\n\n|   Method   | Backbone | Pretrain    | Lr schd | Aug | box AP | mask AP |\n|------------|----------|-------------|:-------:|:---:|:------:|:-------:|\n|  RetinaNet | PVTv2-b0 | ImageNet-1K |    1x   |  No |  37.2  |    -    |\n|  RetinaNet | PVTv2-b1 | ImageNet-1K |    1x   |  No |  41.2  |    -    |\n|  RetinaNet | PVTv2-b2 | ImageNet-1K |    1x   |  No |  44.6  |    -    |\n|  RetinaNet | PVTv2-b3 | ImageNet-1K |    1x   |  No |  45.9  |    -    |\n|  RetinaNet | PVTv2-b4 | ImageNet-1K |    1x   |  No |  46.1  |    -    |\n|  RetinaNet | PVTv2-b5 | ImageNet-1K |    1x   |  No |  46.2  |    -    |\n| Mask R-CNN | PVTv2-b0 | ImageNet-1K |    1x   |  No |  38.2  |   36.2  |\n| Mask R-CNN | PVTv2-b1 | ImageNet-1K |    1x   |  No |  41.8  |   38.8  |\n| Mask R-CNN | PVTv2-b2 | ImageNet-1K |    1x   |  No |  45.3  |   41.2  |\n| Mask R-CNN | PVTv2-b3 | ImageNet-1K |    1x   |  No |  47.0  |   42.5  |\n| Mask R-CNN | PVTv2-b4 | ImageNet-1K |    1x   |  No |  47.5  |   42.7  |\n| Mask R-CNN | PVTv2-b5 | ImageNet-1K |    1x   |  No |  47.4  |   42.5  |\n\n\n#### Advanced Detectors\n\n\n| Method             | Backbone        | Pretrain    | Lr schd | Aug | box AP | mask AP |\n|--------------------|-----------------|-------------|:-------:|:---:|:------:|:-------:|\n| Cascade Mask R-CNN | PVTv2-b2-Linear | ImageNet-1K |    3x   | Yes |  50.9  |   44.0  |\n| Cascade Mask R-CNN | PVTv2-b2    
    | ImageNet-1K |    3x   | Yes |  51.1  |   44.4  |\n| ATSS          | PVTv2-b2-Linear | ImageNet-1K |    3x   | Yes |  48.9  |   -   |\n| ATSS          | PVTv2-b2        | ImageNet-1K |    3x   | Yes |  49.9  |   -   |\n| GFL           | PVTv2-b2-Linear | ImageNet-1K |    3x   | Yes |  49.2  |   -   |\n| GFL           | PVTv2-b2        | ImageNet-1K |    3x   | Yes |  50.2  |   -   |\n| Sparse R-CNN  | PVTv2-b2-Linear | ImageNet-1K |    3x   | Yes |  48.9  |   -   |\n| Sparse R-CNN  | PVTv2-b2        | ImageNet-1K |    3x   | Yes |  50.1  |   -   |\n\n- PVTv1 on COCO\n\n| Detector  | Backbone  | Pretrain    | Lr schd | box AP | mask AP |\n|-----------|-----------|-------------|:-------:|:------:|:-------:|\n| RetinaNet | PVT-Tiny  | ImageNet-1K |    1x   |  36.7  |    -    |\n| RetinaNet | PVT-Small | ImageNet-1K |    1x   |  40.4  |    -    |\n| Mask RCNN | PVT-Tiny  | ImageNet-1K |    1x   |  36.7  |   35.1  |\n| Mask RCNN | PVT-Small | ImageNet-1K |    1x   |  40.4  |   37.8  |\n| DETR      | PVT-Small | ImageNet-1K |   50ep  |  34.7  |    -    |\n\n\n### Semantic Segmentation\n\nSegmentation configs \u0026 weights see \u003e\u003e\u003e[here](segmentation/)\u003c\u003c\u003c.\n\nPVT-v2 + Segmentation see \u003e\u003e\u003e[here](https://github.com/whai362/PVTv2-Seg)\u003c\u003c\u003c.\n\n- PVTv1 on ADE20K\n\n| Method       | Backbone   | Pretrain    | Iters | mIoU |\n|--------------|------------|-------------|-------|------|\n| Semantic FPN | PVT-Tiny   | ImageNet-1K | 40K   | 35.7 |\n| Semantic FPN | PVT-Small  | ImageNet-1K | 40K   | 39.8 |\n| Semantic FPN | PVT-Medium | ImageNet-1K | 40K   | 41.6 |\n| Semantic FPN | PVT-Large  | ImageNet-1K | 40K   | 42.1 |\n\n### Polyp Segmentation\nPolyp-PVT: Polyp Segmentation with Pyramid Vision Transformers. [pdf](https://arxiv.org/abs/2108.06932) | [code](https://github.com/DengPingFan/Polyp-PVT)\n\n### Vision-Language Modeling\nMasked Vision-Language Transformer in Fashion. 
[pdf](https://dengpingfan.github.io/papers/[2022][MIR]MVLT.pdf) | [code](https://github.com/GewelsJI/MVLT)\n\n## License\nThis repository is released under the Apache 2.0 license as found in the [LICENSE](LICENSE) file.\n\n\n## Citation\nIf you use this code for a paper, please cite:\n\nPVTv1\n```\n@inproceedings{wang2021pyramid,\n  title={Pyramid vision transformer: A versatile backbone for dense prediction without convolutions},\n  author={Wang, Wenhai and Xie, Enze and Li, Xiang and Fan, Deng-Ping and Song, Kaitao and Liang, Ding and Lu, Tong and Luo, Ping and Shao, Ling},\n  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},\n  pages={568--578},\n  year={2021}\n}\n```\n\nPVTv2\n```\n@article{wang2021pvtv2,\n  title={Pvtv2: Improved baselines with pyramid vision transformer},\n  author={Wang, Wenhai and Xie, Enze and Li, Xiang and Fan, Deng-Ping and Song, Kaitao and Liang, Ding and Lu, Tong and Luo, Ping and Shao, Ling},\n  journal={Computational Visual Media},\n  volume={8},\n  number={3},\n  pages={1--10},\n  year={2022},\n  publisher={Springer}\n}\n```\n\n\n\n## Contact\n\nThis repo is currently maintained by Wenhai Wang ([@whai362](https://github.com/whai362)), Enze Xie ([@xieenze](https://github.com/xieenze)), and Zhe Chen ([@czczup](https://github.com/czczup)).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwhai362%2Fpvt","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwhai362%2Fpvt","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwhai362%2Fpvt/lists"}