{"id":25558109,"url":"https://github.com/sungnyun/openssl-simcore","last_synced_at":"2025-07-31T21:04:46.431Z","repository":{"id":171366967,"uuid":"614242624","full_name":"sungnyun/openssl-simcore","owner":"sungnyun","description":"(CVPR 2023) Coreset Sampling from Open-Set for Fine-Grained Self-Supervised Learning","archived":false,"fork":false,"pushed_at":"2023-10-03T04:11:22.000Z","size":171,"stargazers_count":28,"open_issues_count":0,"forks_count":8,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-25T22:21:33.929Z","etag":null,"topics":["coreset","openssl","pytorch","self-supervised-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sungnyun.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-15T07:32:16.000Z","updated_at":"2025-01-15T13:39:27.000Z","dependencies_parsed_at":null,"dependency_job_id":"0e8fbbce-a2a7-4bfb-b288-f8f180368c0f","html_url":"https://github.com/sungnyun/openssl-simcore","commit_stats":null,"previous_names":["sungnyun/openssl-simcore"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sungnyun%2Fopenssl-simcore","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sungnyun%2Fopenssl-simcore/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sungnyun%2Fopenssl-simcore/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sungnyun%2Fopenssl-simcore/manifests",
"owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sungnyun","download_url":"https://codeload.github.com/sungnyun/openssl-simcore/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248507481,"owners_count":21115612,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["coreset","openssl","pytorch","self-supervised-learning"],"created_at":"2025-02-20T15:29:51.625Z","updated_at":"2025-04-12T02:42:00.703Z","avatar_url":"https://github.com/sungnyun.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# OpenSSL-SimCore (CVPR 2023)\n\n\u003ca href='https://arxiv.org/abs/2303.11101'\u003e\u003cimg src='https://img.shields.io/badge/Paper-arXiv:2303.11101-Green'\u003e\u003c/a\u003e\n\u003ca href='https://www.youtube.com/watch?v=f_-dIVRo8Q8'\u003e\u003cimg src='https://img.shields.io/badge/YouTube-Video-red'\u003e\u003c/a\u003e \n\u003ca href=#bibtex\u003e\u003cimg src='https://img.shields.io/badge/Paper-BibTex-yellow'\u003e\u003c/a\u003e\n\u003ca href='https://huggingface.co/sungnyun/openssl-simcore'\u003e\u003cimg src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-SimCore%20Model-blue'\u003e\u003c/a\u003e\n\n\n\u003cp align=\"center\"\u003e\n\u003cimg width=\"1394\" src=\"https://user-images.githubusercontent.com/46050900/226794108-4ca0e8e8-0d1b-4509-97b5-214b41f03d7a.png\"\u003e\n\u003c/p\u003e\n\n[**Coreset Sampling from Open-Set for Fine-Grained Self-Supervised Learning**](https://arxiv.org/abs/2303.11101)\u003cbr/\u003e\n[Sungnyun 
Kim](https://github.com/sungnyun)\\*,\n[Sangmin Bae](https://www.raymin0223.com)\\*,\n[Se-Young Yun](https://fbsqkd.github.io)\u003cbr/\u003e\n\\* equal contribution\n\n- **Open-set Self-Supervised Learning (OpenSSL) task**: an unlabeled open-set is available during the pretraining phase on the fine-grained dataset.\n- **SimCore**: a simple coreset selection algorithm that leverages a subset of the open-set semantically similar to the target dataset.\n- SimCore significantly improves representation learning performance in various downstream tasks.\n- [update on 10.02.2023] Shared SimCore-pretrained models on [HuggingFace Models](https://huggingface.co/sungnyun/openssl-simcore).\n\n\n## Requirements\nInstall the necessary packages with: \n```\n$ pip install -r requirements.txt\n```\n\n\n## Data Preparation\nWe used 11 fine-grained datasets and 7 open-sets.\nPlace each dataset's files in `data/[DATASET_NAME]/` (following the `torchvision.datasets.ImageFolder` format).    \nTo download and set up the data, please see the [docs](data/README.md) and run the Python scripts if necessary.\n```bash\n$ cd data/\n$ python [DATASET_NAME]_image_folder_generator.py\n```\n\n## Pretraining\nTo pretrain the model, run the shell file. (Multi-GPU training is supported; we used 4 GPUs.)    \nYou will need to define the **path for each dataset** and the **retrieval model checkpoint**. 
\n```bash\n# specify $TAG and $DATA\n\n$ CUDA_VISIBLE_DEVICES=\u003cGPU_ID\u003e bash run_selfsup.sh\n```\nHere are some important arguments to consider.\n- `--dataset1`: fine-grained target dataset name\n- `--dataset2`: open-set name (default: imagenet)\n- `--data_folder1`: directory where `dataset1` is located\n- `--data_folder2`: directory where `dataset2` is located\n- `--retrieval_ckpt`: retrieval model checkpoint used before SimCore pretraining; to obtain it, pretrain with vanilla SSL for 1K epochs\n- `--model`: model architecture (default: resnet50), see [models](models/)\n- `--method`: self-supervised learning method (default: simclr), see [ssl](ssl/)\n- `--sampling_method`: strategy for sampling from the open-set (choose \"random\" or \"simcore\")\n- `--no_sampling`: set this to True to disable sampling (vanilla SSL pretraining)\n\nThe pretrained model checkpoints will be saved at `save/[EXP_NAME]/`. For example, if you run the default shell file, the last epoch checkpoint will be saved as `save/$DATA_resnet50_pretrain_simclr_merge_imagenet_$TAG/last.pth`.\n\n\n## Linear Evaluation\nLinear evaluation of the pretrained models is run in the same way as pretraining.    \nRun the following shell file, with the **pretrained model checkpoint** additionally defined.\n```bash\n# specify $TAG, $DATA, and --pretrained_ckpt\n\n$ CUDA_VISIBLE_DEVICES=\u003cGPU_ID\u003e bash run_sup.sh\n```\nWe also support **kNN evaluation** (`--knn`, `--topk`) and **semi-supervised fine-tuning** (`--label_ratio`, `--e2e`).\n\n### Result\nSimCore with a stopping criterion improves accuracy by +10.5% (averaged over 11 datasets) compared to pretraining without any open-set.\n\u003cp align=\"center\"\u003e\n\u003cimg width=\"750\" src=\"https://user-images.githubusercontent.com/46050900/226905308-9cec7d37-f06e-4b6d-8a49-370ea6394afa.png\"\u003e\n\u003c/p\u003e\n\n### Try other open-sets\nSimCore works with various, even uncurated, open-sets. 
You can also try your own custom, web-crawled, or uncurated open-sets.\n\u003cp align=\"center\"\u003e\n\u003cimg width=\"350\" src=\"https://user-images.githubusercontent.com/46050900/226906525-6fcda233-692d-48e9-a241-2faa3daf3893.png\"\u003e\n\u0026nbsp;\n\u0026nbsp;\n\u0026nbsp;\n\u003cimg width=\"350\" src=\"https://user-images.githubusercontent.com/46050900/226906595-2f3e293e-1f79-4992-bc22-fd128cb131d9.png\"\u003e\n\u003c/p\u003e\n\n\n## Downstream Tasks\nSimCore is extensively evaluated in various downstream tasks.    \nWe thus provide the training and evaluation code for the following downstream tasks.    \nFor more details, please see the [docs](downstream/README.md) and the `downstream/` directory.    \n- [object detection](downstream/detection)\n- [pixel-wise segmentation](downstream/segmentation)\n- [open-set semi-supervised learning](downstream/opensemi)\n- [webly supervised learning](downstream/weblysup)\n- [semi-supervised learning](downstream/semisup)\n- [active learning](downstream/active)\n- [hard negative mining](downstream/hnm)\n\nUse the pretrained model checkpoint to run each downstream task.\n\n\n## BibTeX\nIf you find this repo useful for your research, please consider citing our paper:\n\n```\n@article{kim2023coreset,\n  title={Coreset Sampling from Open-Set for Fine-Grained Self-Supervised Learning},\n  author={Kim, Sungnyun and Bae, Sangmin and Yun, Se-Young},\n  journal={arXiv preprint arXiv:2303.11101},\n  year={2023}\n}\n```\n\n## Contact\n- Sungnyun Kim: ksn4397@kaist.ac.kr\n- Sangmin Bae: bsmn0223@kaist.ac.kr\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsungnyun%2Fopenssl-simcore","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsungnyun%2Fopenssl-simcore","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsungnyun%2Fopenssl-simcore/lists"}