{"id":18340894,"url":"https://github.com/tinyvision/solider","last_synced_at":"2025-05-16T01:05:15.617Z","repository":{"id":150671246,"uuid":"575379823","full_name":"tinyvision/SOLIDER","owner":"tinyvision","description":"A Semantic Controllable Self-Supervised Learning Framework to learn general human representations from massive unlabeled human images, which can benefit downstream human-centric tasks to the maximum extent","archived":false,"fork":false,"pushed_at":"2023-07-21T08:22:12.000Z","size":456,"stargazers_count":1445,"open_issues_count":24,"forks_count":233,"subscribers_count":102,"default_branch":"main","last_synced_at":"2025-04-08T11:15:10.484Z","etag":null,"topics":["cvpr2023","human-centric","self-supervised-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tinyvision.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-12-07T11:35:59.000Z","updated_at":"2025-04-05T12:45:45.000Z","dependencies_parsed_at":"2024-11-05T20:32:21.990Z","dependency_job_id":null,"html_url":"https://github.com/tinyvision/SOLIDER","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tinyvision%2FSOLIDER","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tinyvision%2FSOLIDER/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tinyvision%2FSOLIDER/releases","manifests_url":"https://repos.ecosyste.ms/api
/v1/hosts/GitHub/repositories/tinyvision%2FSOLIDER/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tinyvision","download_url":"https://codeload.github.com/tinyvision/SOLIDER/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254448579,"owners_count":22072764,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cvpr2023","human-centric","self-supervised-learning"],"created_at":"2024-11-05T20:24:35.343Z","updated_at":"2025-05-16T01:05:15.569Z","avatar_url":"https://github.com/tinyvision.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\u003cimg src=\"assets/logo.png\" 
width=\"900\"\u003e\u003c/div\u003e\n\n\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/beyond-appearance-a-semantic-controllable/pedestrian-attribute-recognition-on-pa-100k)](https://paperswithcode.com/sota/pedestrian-attribute-recognition-on-pa-100k?p=beyond-appearance-a-semantic-controllable)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/beyond-appearance-a-semantic-controllable/person-re-identification-on-msmt17)](https://paperswithcode.com/sota/person-re-identification-on-msmt17?p=beyond-appearance-a-semantic-controllable)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/beyond-appearance-a-semantic-controllable/person-re-identification-on-market-1501)](https://paperswithcode.com/sota/person-re-identification-on-market-1501?p=beyond-appearance-a-semantic-controllable)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/beyond-appearance-a-semantic-controllable/person-search-on-cuhk-sysu)](https://paperswithcode.com/sota/person-search-on-cuhk-sysu?p=beyond-appearance-a-semantic-controllable)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/beyond-appearance-a-semantic-controllable/person-search-on-prw)](https://paperswithcode.com/sota/person-search-on-prw?p=beyond-appearance-a-semantic-controllable)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/beyond-appearance-a-semantic-controllable/pedestrian-detection-on-citypersons)](https://paperswithcode.com/sota/pedestrian-detection-on-citypersons?p=beyond-appearance-a-semantic-controllable)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/beyond-appearance-a-semantic-controllable/semantic-segmentation-on-lip-val)](https://paperswithcode.com/sota/semantic-segmentation-on-lip-val?p=beyond-appearance-a-semantic-controllable)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswith
code.com/badge/beyond-appearance-a-semantic-controllable/pose-estimation-on-coco)](https://paperswithcode.com/sota/pose-estimation-on-coco?p=beyond-appearance-a-semantic-controllable)\n\nWelcome to **SOLIDER**! SOLIDER is a Semantic Controllable Self-Supervised Learning Framework that learns general human representations from massive unlabeled human images, which can benefit downstream human-centric tasks to the maximum extent. Unlike existing self-supervised learning methods, SOLIDER uses prior knowledge from human images to build pseudo semantic labels and import more semantic information into the learned representation. Meanwhile, different downstream tasks require different ratios of semantic information and appearance information, and a single learned representation cannot fit all requirements. To solve this problem, SOLIDER introduces a conditional network with a semantic controller, which can adapt to the different needs of downstream tasks. For more details, please refer to our paper [Beyond Appearance: a Semantic Controllable Self-Supervised Learning Framework for Human-Centric Visual Tasks](https://arxiv.org/abs/2303.17602).\n\n\u003cdiv align=\"center\"\u003e\u003cimg src=\"assets/framework.png\" width=\"900\"\u003e\u003c/div\u003e\n\n## Updates\n- **[2023/07/21: Code for the human pose task is released!] ![new](https://img.alicdn.com/imgextra/i4/O1CN01kUiDtl1HVxN6G56vN_!!6000000000764-2-tps-43-19.png)**\n    * Training details of our pretrained model on the downstream human pose task are released.\n- **[2023/05/15: Code for the human parsing task is released!] ![new](https://img.alicdn.com/imgextra/i4/O1CN01kUiDtl1HVxN6G56vN_!!6000000000764-2-tps-43-19.png)**\n    * Training details of our pretrained model on the downstream human parsing task are released.\n- **[2023/04/24: Code for the attribute recognition task is released!] 
![new](https://img.alicdn.com/imgextra/i4/O1CN01kUiDtl1HVxN6G56vN_!!6000000000764-2-tps-43-19.png)**\n    * Training details of our pretrained model on the downstream person attribute recognition task are released.\n- **[2023/03/28: Code for 3 downstream tasks is released!]**\n    * Training details of our pretrained model on 3 downstream human visual tasks, including person re-identification, person search and pedestrian detection, are released.\n- **[2023/03/13: SOLIDER is accepted by CVPR2023!]**\n    * The SOLIDER paper is accepted by CVPR2023, and its official PyTorch implementation is released in this repo.\n\n## Installation\nThis codebase has been developed with Python 3.7, PyTorch 1.7.1, CUDA 10.1 and torchvision 0.8.2.\n\n## Datasets\nWe use **LUPerson** as our training data, which consists of unlabeled human images. Download **LUPerson** from its [official link](https://github.com/DengpanFu/LUPerson) and unzip it.\n\n## Training\n- Choice 1. To train SOLIDER from scratch, please run:\n```shell\nsh run_solider.sh\n```\n\n- Choice 2. Training SOLIDER from scratch may take a long time. 
To speed up the training, you can train a DINO model first, and then finetune it with SOLIDER, as follows:\n```shell\nsh run_dino.sh\nsh resume_solider.sh\n```\n\n## Finetuning and Inference\nA demo shows how to run the trained SOLIDER model; it can be embedded into inference or downstream task finetuning.\n```shell\npython demo.py\n```\n\n## Models\nWe use [Swin-Transformer](https://github.com/microsoft/Swin-Transformer) as our backbone, which shows great advantages on many CV tasks.\n| Task | Dataset | Swin Tiny\u003cbr\u003e([Link](https://drive.google.com/file/d/12UyPVFmjoMVpQLHN07tNh4liHUmyDqg8/view?usp=share_link)) | Swin Small\u003cbr\u003e([Link](https://drive.google.com/file/d/1oyEgASqDHc7YUPsQUMxuo2kBZyi2Tzfv/view?usp=share_link)) | Swin Base\u003cbr\u003e([Link](https://drive.google.com/file/d/1uh7tO34tMf73MJfFqyFEGx42UBktTbZU/view?usp=share_link)) |\n| :---: |:---: |:---: | :---: | :---: |\n| Person Re-identification (mAP/R1)\u003cbr\u003ew/o re-ranking | Market1501 | 91.6/96.1 | 93.3/96.6 | 93.9/96.9 |\n|  | MSMT17 | 67.4/85.9 | 76.9/90.8 | 77.1/90.7 |\n| Person Re-identification (mAP/R1)\u003cbr\u003ewith re-ranking | Market1501 | 95.3/96.6 | 95.4/96.4 | 95.6/96.7 |\n|  | MSMT17 | 81.5/89.2 | 86.5/91.7 | 86.5/91.7 |\n| Attribute Recognition (mA) | PETA_ZS | 74.37 | 76.21 | 76.43 |\n|  | RAP_ZS | 74.23 | 75.95 | 76.42 |\n|  | PA100K | 84.14 | 86.25 | 86.37 |\n| Person Search (mAP/R1) | CUHK-SYSU | 94.9/95.7 | 95.5/95.8 | 94.9/95.5 |\n|  | PRW | 56.8/86.8 | 59.8/86.7 | 59.7/86.8 |\n| Pedestrian Detection (MR-2) | CityPersons | 10.3/40.8 | 10.0/39.2 | 9.7/39.4 |\n| Human Parsing (mIOU) | LIP | 57.52 | 60.21 | 60.50 |\n| Pose Estimation (AP/AR) | COCO | 74.4/79.6 | 76.3/81.3 | 76.6/81.5 |\n\n- All the models are trained on the whole LUPerson dataset.\n\n## Training codes on Downstream Tasks\n- [Person Re-identification](https://github.com/tinyvision/SOLIDER-REID)\n- [Person Search](https://github.com/tinyvision/SOLIDER-PersonSearch)\n- 
[Pedestrian Detection](https://github.com/tinyvision/SOLIDER-PedestrianDetection)\n- [Person Attribute Recognition](https://github.com/tinyvision/SOLIDER-PersonAttributeRecognition)\n- [Human Parsing](https://github.com/tinyvision/SOLIDER-HumanParsing)\n- [Pose Estimation](https://github.com/tinyvision/SOLIDER-HumanPose)\n\n## Acknowledgement\nOur implementation is mainly based on the following codebases. We thank the authors for their excellent work.\n- [Swin-Transformer](https://github.com/microsoft/Swin-Transformer)\n- [DINO](https://github.com/facebookresearch/dino)\n- [TransReID](https://github.com/damo-cv/TransReID)\n- [TransReID-SSL](https://github.com/damo-cv/TransReID-SSL)\n- [SeqNet](https://github.com/serend1p1ty/SeqNet)\n- [Pedestron](https://github.com/hasanirtiza/Pedestron)\n- [LUPerson](https://github.com/DengpanFu/LUPerson)\n- [SCHP](https://github.com/GoGoDuck912/Self-Correction-Human-Parsing)\n- [mmpose](https://github.com/open-mmlab/mmpose)\n\n## Reference\nIf you use SOLIDER in your research, please cite our work using the following BibTeX entry:\n```\n@inproceedings{chen2023beyond,\n  title={Beyond Appearance: a Semantic Controllable Self-Supervised Learning Framework for Human-Centric Visual Tasks},\n  author={Weihua Chen and Xianzhe Xu and Jian Jia and Hao Luo and Yaohua Wang and Fan Wang and Rong Jin and Xiuyu Sun},\n  booktitle={The IEEE/CVF Conference on Computer Vision and Pattern Recognition},\n  year={2023},\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftinyvision%2Fsolider","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftinyvision%2Fsolider","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftinyvision%2Fsolider/lists"}