{"id":13563903,"url":"https://github.com/Sense-GVT/DeCLIP","last_synced_at":"2025-04-03T20:32:08.664Z","repository":{"id":38446563,"uuid":"415237842","full_name":"Sense-GVT/DeCLIP","owner":"Sense-GVT","description":"Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm","archived":false,"fork":false,"pushed_at":"2022-09-19T03:50:23.000Z","size":993,"stargazers_count":636,"open_issues_count":23,"forks_count":31,"subscribers_count":19,"default_branch":"main","last_synced_at":"2024-11-04T16:45:59.091Z","etag":null,"topics":["big-model","clip","image-text","multi-model","self-supervised","vision-language-pretraining","zero-shot"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Sense-GVT.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-10-09T07:35:46.000Z","updated_at":"2024-11-04T05:20:59.000Z","dependencies_parsed_at":"2022-07-11T19:51:47.661Z","dependency_job_id":null,"html_url":"https://github.com/Sense-GVT/DeCLIP","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Sense-GVT%2FDeCLIP","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Sense-GVT%2FDeCLIP/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Sense-GVT%2FDeCLIP/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Sense-GVT%2FDeCLIP/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Sense-GVT","download_url":"https://codeload.github.com/
Sense-GVT/DeCLIP/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247075149,"owners_count":20879390,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["big-model","clip","image-text","multi-model","self-supervised","vision-language-pretraining","zero-shot"],"created_at":"2024-08-01T13:01:24.424Z","updated_at":"2025-04-03T20:32:08.329Z","avatar_url":"https://github.com/Sense-GVT.png","language":"Python","readme":"\u003c!-- # DeCLIP\nSupervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm.\n\nOur paper is available on [arxiv](https://arxiv.org/abs/2110.05208) --\u003e\n\n\n# [Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm.](https://arxiv.org/abs/2110.05208)\n\nDeCLIP is an open-source project that welcomes any contribution and feedback. We hope the toolbox and benchmark can serve the growing research community by providing a flexible as well as standardized toolkit to reimplement existing methods and develop new Contrastive Language-Image Pretraining methods. You can find the following in this repo:\n+ Pre-trained models and training code to reproduce various Contrastive Language-Image Pretraining methods (e.g. 
CLIP, DeCLIP, SLIP, FILIP).\n+ Various benchmark datasets for the large-scale Contrastive Language-Image Pretraining task.\n+ Zero-shot transfer and linear classification evaluation scripts for downstream datasets.\n\nWe aim to democratize large-scale CLIP to build a fair and reproducible CLIP community. Our papers are available at:\n\n**DeCLIP**: [Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm](https://arxiv.org/abs/2110.05208).\n\n**CLIP-Benchmark**: [Democratizing Contrastive Language-Image Pre-training: A CLIP Benchmark of Data, Model, and Supervision](https://arxiv.org/abs/2203.05796).\n\n\n## Call for Papers \u0026 Participation\n\n:loudspeaker: **Call for Papers \u0026 Participation**: ECCV Workshop and Challenge on [Computer Vision in the Wild (CVinW)](https://computer-vision-in-the-wild.github.io/eccv-2022/)\n\n\n\u003ctable\u003e\n  \u003ctr\u003e\n    \u003ctd style=\"width:220px\"\u003e  \u003ccenter\u003e\u003ca href=\"https://computer-vision-in-the-wild.github.io/eccv-2022/\"\u003e\u003cimg src=\"https://computer-vision-in-the-wild.github.io/eccv-2022/static/eccv2022/img/ECCV-logo3.png\" alt=\"CVinW\"  width=\"100\" height=\"52\"\u003e   [Workshop]\u003c/a\u003e  \u003c/center\u003e \u003c/td\u003e\n    \u003ctd style=\"width:220px\"\u003e \u003ccenter\u003e\u003ca href=\"https://eval.ai/web/challenges/challenge-page/1832/overview\"\u003e\u003cimg src=\"https://evalai.s3.amazonaws.com/media/logos/4e939412-a9c0-46bd-9797-5ba0bd0a9095.jpg\" alt=\"ICinW\"  width=\"100\" height=\"52\"\u003e [IC Challenge]\u003c/a\u003e   \u003c/center\u003e \u003c/td\u003e\n    \u003ctd style=\"width:220px\"\u003e  \u003ccenter\u003e\u003ca href=\"https://eval.ai/web/challenges/challenge-page/1839/overview\"\u003e\u003cimg src=\"https://evalai.s3.amazonaws.com/media/logos/e3727105-2b29-4c9b-98a6-3d1191884eb5.jpg\" alt=\"ODinW\"  width=\"100\" height=\"52\"\u003e [OD Challenge]\u003c/a\u003e  \u003c/center\u003e 
\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\n## Introduction\n\nRecently, large-scale Contrastive Language-Image Pre-training (CLIP) (Radford et al., 2021) has attracted unprecedented attention for its impressive zero-shot recognition ability and excellent transferability to downstream tasks. However, CLIP is quite data-hungry and requires 400M image-text pairs for pre-training, thereby restricting its adoption. This work proposes a novel training paradigm, Data efficient CLIP (DeCLIP), to alleviate this limitation. We demonstrate that by carefully utilizing the widespread supervision among the image-text pairs, our DeCLIP can learn generic visual features more efficiently. Instead of using the single image-text contrastive supervision, we fully exploit data potential through the use of (1) self-supervision within each modality; (2) multi-view supervision across modalities; (3) nearest-neighbor supervision from other similar pairs. Benefiting from these intrinsic supervision signals, our DeCLIP-ResNet50 can achieve 60.4% zero-shot top-1 accuracy on ImageNet, which is 0.8% above the CLIP-ResNet50 while using 7.1× less data. Our DeCLIP-ResNet50 outperforms its counterpart on 8 out of 11 visual datasets when transferred to downstream tasks. Moreover, scaling up the model and compute also works well in our framework.\n\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"docs/main_figure.jpg\" alt=\"Declip framework\" width=\"800\"/\u003e\u003c/p\u003e\n\n\u003c!-- ![main_figure](docs/main_figure.jpg) --\u003e\n\n\n# Updates\n\n***2022-09-19*** :loudspeaker: **Call for Papers \u0026 Participation**: ECCV Workshop and Challenge on [Computer Vision in the Wild (CVinW)](https://computer-vision-in-the-wild.github.io/eccv-2022/)\n\n***2022-06-25*** We release the checkpoints of each model for the benchmark.\n\n***2022-03-10*** We update the results of CLIP-Benchmark and release our YFCC15M dataset.\n\n***2022-02-22*** We release our training code, benchmark, and model zoo! 
***We will release the checkpoints of each model soon, after aligning the results***. We hope this project can serve the growing Contrastive Language-Image Pretraining research community by providing a flexible as well as standardized toolkit.\n\n***2021-11-06*** First Commit. Our code, datasets and models will be released soon.\n\n\n## Installation\n\nPlease refer to [get_started.md](docs/get_started.md#installation) for installation and [dataset_prepare.md](docs/dataset_prepare.md#prepare-datasets) for dataset preparation.\n\n\n## Get Started\n\nInstall PyTorch. The code has been tested with CUDA 11.2/CuDNN 8.1.0, PyTorch 1.8.1.\n\n\nFirst, prepare the pre-training datasets and downstream classification datasets through [get_started.md](docs/get_started.md#installation). \n\nWe organize the models trained on different data into separate [experiment directories](experiments/); check the corresponding directory for details.\n\n#### 1. Pre-training\n\nYou can run `run.sh` directly to train the corresponding model. We train most of our models on 4x8-GPU nodes. Check the config in the experiment directory of the corresponding model for details.\n\n#### 2. Zero-shot Evaluation\n\nYou can add the argument `--evaluate` to the run script for zero-shot evaluation.\n\n\n## DeCLIP Model-Zoo\n\n### Our pretrained visual backbone models (w/o text encoder)\n\u003c!-- \nDeCLIP_r50    [GoogleDriver](https://drive.google.com/file/d/1SZJ8CU5dDIwuvZWxb4xdld7qv7aw6wKm/view?usp=sharing).  
\nDeCLIP_vitb32 [GoogleDriver](https://drive.google.com/file/d/1W2cCxsr3EjvOOWzVXZukLk38c8LC6UUm/view?usp=sharing) --\u003e\n\n\n\u003ctable\u003e\u003ctbody\u003e\n\u003c!-- START TABLE --\u003e\n\u003c!-- TABLE HEADER --\u003e\n\u003cth valign=\"center\"\u003eMethod\u003c/th\u003e\n\u003cth valign=\"center\"\u003eDataset\u003c/th\u003e\n\u003cth valign=\"center\"\u003eModel\u003c/th\u003e\n\u003cth valign=\"center\"\u003eEpochs\u003c/th\u003e\n\u003cth valign=\"center\"\u003e0-shot\u003c/th\u003e\n\u003cth valign=\"center\"\u003eConfig\u003c/th\u003e\n\u003cth valign=\"center\"\u003ePaper\u003c/th\u003e\n\u003cth valign=\"center\"\u003eWeights\u003c/th\u003e\n\n\u003ctr\u003e\n\u003ctd align=\"center\"\u003eDeCLIP\u003c/td\u003e\n\u003ctd align=\"center\"\u003eDeclip-88M\u003c/td\u003e\n\u003ctd align=\"center\"\u003eResNet50\u003c/td\u003e\n\u003ctd align=\"center\"\u003e32\u003c/td\u003e\n\u003ctd align=\"center\"\u003e62.5\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"experiments/declip_experiments/yfcc15m_r50_declip\"\u003econfig\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"https://arxiv.org/pdf/2110.05208.pdf\"\u003epaper\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"https://drive.google.com/file/d/1SZJ8CU5dDIwuvZWxb4xdld7qv7aw6wKm/view?usp=sharing\"\u003eGoogleDriver\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\n \n\u003ctr\u003e\n\u003ctd align=\"center\"\u003eDeCLIP\u003c/td\u003e\n\u003ctd align=\"center\"\u003eDeclip-88M\u003c/td\u003e\n\u003ctd align=\"center\"\u003eViT-B32\u003c/td\u003e\n\u003ctd align=\"center\"\u003e32\u003c/td\u003e\n\u003ctd align=\"center\"\u003e66.2\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"experiments/declip_experiments/yfcc15m_vit_declip\"\u003econfig\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"https://arxiv.org/pdf/2110.05208.pdf\"\u003epaper\u003c/a\u003e\u003c/td\u003e\n\u003ctd 
align=\"center\"\u003e\u003ca href=\"https://drive.google.com/file/d/1W2cCxsr3EjvOOWzVXZukLk38c8LC6UUm/view?usp=sharing\"\u003eGoogleDriver\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n \n \n \u003c/tbody\u003e\u003c/table\u003e\n \n\n### Our pretrain declip model (w text encoder)\n\n\u003ctable\u003e\u003ctbody\u003e\n\u003c!-- START TABLE --\u003e\n\u003c!-- TABLE HEADER --\u003e\n\u003cth valign=\"center\"\u003eMethod\u003c/th\u003e\n\u003cth valign=\"center\"\u003eDataset\u003c/th\u003e\n\u003cth valign=\"center\"\u003eModel\u003c/th\u003e\n\u003cth valign=\"center\"\u003eEpochs\u003c/th\u003e\n\u003cth valign=\"center\"\u003e0-shot\u003c/th\u003e\n\u003cth valign=\"center\"\u003eConfig\u003c/th\u003e\n\u003cth valign=\"center\"\u003ePaper\u003c/th\u003e\n\u003cth valign=\"center\"\u003eWeights\u003c/th\u003e\n\n\u003ctr\u003e\n\u003ctd align=\"center\"\u003eDeCLIP\u003c/td\u003e\n\u003ctd align=\"center\"\u003eDeclip-88M\u003c/td\u003e\n\u003ctd align=\"center\"\u003eResNet50\u003c/td\u003e\n\u003ctd align=\"center\"\u003e32\u003c/td\u003e\n\u003ctd align=\"center\"\u003e62.5\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"experiments/declip_experiments/yfcc15m_r50_declip\"\u003econfig\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"https://arxiv.org/pdf/2110.05208.pdf\"\u003epaper\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"https://drive.google.com/file/d/1YS6xRdF5bY4rvgyCFQVIyPMr7evfZi3D/view?usp=sharing\"\u003eGoogleDriver\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\n \n\u003ctr\u003e\n\u003ctd align=\"center\"\u003eDeCLIP\u003c/td\u003e\n\u003ctd align=\"center\"\u003eDeclip-88M\u003c/td\u003e\n\u003ctd align=\"center\"\u003eViT-B32\u003c/td\u003e\n\u003ctd align=\"center\"\u003e32\u003c/td\u003e\n\u003ctd align=\"center\"\u003e66.2\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca 
href=\"experiments/declip_experiments/yfcc15m_vit_declip\"\u003econfig\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"https://arxiv.org/pdf/2110.05208.pdf\"\u003epaper\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"https://drive.google.com/file/d/135v0R8zTiAppr-BejtoXRHeGeMTlDf2x/view?usp=sharing\"\u003eGoogleDriver\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n \n \n \u003c/tbody\u003e\u003c/table\u003e\n\n\n\n# CLIP-Benchmark\n\n\nDemocratizing Contrastive Language-Image Pre-training: A CLIP Benchmark of Data, Model, and Supervision. Our paper is available on [Arxiv](https://arxiv.org/abs/2203.05796).\n\n\n\nWitnessing its great success, researchers continue to push the frontier of CLIP.  For instance, SLIP, DeCLIP and FILIP achieve considerable improvements via embracing different kinds of supervision within the image-text pairs. However, it remains challenging to make fair comparison between these methods. This is because they do not choose consistent training recipes and even use different data. We propose CLIP-benchmark, a first attempt to evaluate, analyze, and benchmark CLIP and its variants. 
Moreover, we further combine DeCLIP with FILIP, bringing us the strongest variant DeFILIP.\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"docs/main_figure2.png\" alt=\"Declip framework\" width=\"800\"/\u003e\u003c/p\u003e\n\n\u003c!-- **Model will be relased soon** --\u003e\n\n### Supported Models:\n\nThe following models are pre-trained on YFCC15M and evaluated on ImageNet-1K (ILSVRC2012).\n\n\u003ctable\u003e\u003ctbody\u003e\n\u003c!-- START TABLE --\u003e\n\u003c!-- TABLE HEADER --\u003e\n\u003cth valign=\"center\"\u003eMethod\u003c/th\u003e\n\u003cth valign=\"center\"\u003eDataset\u003c/th\u003e\n\u003cth valign=\"center\"\u003eModel\u003c/th\u003e\n\u003cth valign=\"center\"\u003eEpochs\u003c/th\u003e\n\u003cth valign=\"center\"\u003e0-shot\u003c/th\u003e\n\u003cth valign=\"center\"\u003eConfig\u003c/th\u003e\n\u003cth valign=\"center\"\u003ePaper\u003c/th\u003e\n\u003cth valign=\"center\"\u003eWeights\u003c/th\u003e\n\n\u003c!-- TABLE BODY --\u003e\n\u003ctr\u003e\n\u003ctd align=\"center\"\u003eCLIP\u003c/td\u003e\n\u003ctd align=\"center\"\u003eYFCC-15M\u003c/td\u003e\n\u003ctd align=\"center\"\u003eViT-B32\u003c/td\u003e\n\u003ctd align=\"center\"\u003e32\u003c/td\u003e\n\u003ctd align=\"center\"\u003e32.8\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"experiments/clip_experiments/yfcc15m_vit_clip\"\u003econfig\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"https://arxiv.org/pdf/2103.00020.pdf\"\u003epaper\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"https://drive.google.com/file/d/1Xr6fP6DlVzh4SMXXhtcOlh6hFWPaUgs2/view?usp=sharing\"\u003eGoogleDriver\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"center\"\u003eDeCLIP\u003c/td\u003e\n\u003ctd align=\"center\"\u003eYFCC-15M\u003c/td\u003e\n\u003ctd align=\"center\"\u003eViT-B32\u003c/td\u003e\n\u003ctd align=\"center\"\u003e32\u003c/td\u003e\n\u003ctd 
align=\"center\"\u003e43.2\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"experiments/declip_experiments/yfcc15m_vit_declip\"\u003econfig\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"https://arxiv.org/pdf/2110.05208.pdf\"\u003epaper\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"https://drive.google.com/file/d/1YOAjiSgOBV4gjPpMyo8UDRSZMFxkP3Dz/view?usp=sharing\"\u003eGoogleDriver\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"center\"\u003eSLIP\u003c/td\u003e\n\u003ctd align=\"center\"\u003eYFCC-15M\u003c/td\u003e\n\u003ctd align=\"center\"\u003eViT-B32\u003c/td\u003e\n\u003ctd align=\"center\"\u003e32\u003c/td\u003e\n\u003ctd align=\"center\"\u003e34.3\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"experiments/slip_experiments/yfcc15m_vit_slip\"\u003econfig\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"https://arxiv.org/pdf/2112.12750.pdf\"\u003epaper\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"https://drive.google.com/file/d/1PhjUgjLHymTW42_6TaY21P685-lSjbTj/view?usp=sharing\"\u003eGoogleDriver\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"center\"\u003eFILIP\u003c/td\u003e\n\u003ctd align=\"center\"\u003eYFCC-15M\u003c/td\u003e\n\u003ctd align=\"center\"\u003eViT-B32\u003c/td\u003e\n\u003ctd align=\"center\"\u003e32\u003c/td\u003e\n\u003ctd align=\"center\"\u003e39.5\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"experiments/filip_experiments/yfcc15m_vit_filip\"\u003econfig\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"https://arxiv.org/pdf/2111.07783.pdf\"\u003epaper\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"https://drive.google.com/file/d/1qqsCMgOqOaQkmgkG205qoKxcOsr3QacO/view?usp=sharing\"\u003eGoogleDriver\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctd 
align=\"center\"\u003eDeFILIP\u003c/td\u003e\n\u003ctd align=\"center\"\u003eYFCC-15M\u003c/td\u003e\n\u003ctd align=\"center\"\u003eViT-B32\u003c/td\u003e\n\u003ctd align=\"center\"\u003e32\u003c/td\u003e\n\u003ctd align=\"center\"\u003e45.0\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"experiments/defilip_experiments/yfcc15m_vit_defilip\"\u003econfig\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"https://arxiv.org/abs/2203.05796\"\u003epaper\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"https://drive.google.com/file/d/1FYId-kfhwneDGZO6R8nxAS7ti4_V4xwG/view?usp=sharing\"\u003eGoogleDriver\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003c/tbody\u003e\u003c/table\u003e\n\n\n\n\n\u003ctable\u003e\u003ctbody\u003e\n\u003c!-- START TABLE --\u003e\n\u003c!-- TABLE HEADER --\u003e\n\u003cth valign=\"center\"\u003eMethod\u003c/th\u003e\n\u003cth valign=\"center\"\u003eDataset\u003c/th\u003e\n\u003cth valign=\"center\"\u003eModel\u003c/th\u003e\n\u003cth valign=\"center\"\u003eEpochs\u003c/th\u003e\n\u003cth valign=\"center\"\u003e0-shot\u003c/th\u003e\n\u003cth valign=\"center\"\u003eConfig\u003c/th\u003e\n\u003cth valign=\"center\"\u003ePaper\u003c/th\u003e\n\u003cth valign=\"center\"\u003eWeights\u003c/th\u003e\n\n\u003c!-- TABLE BODY --\u003e\n\u003ctr\u003e\n\u003ctd align=\"center\"\u003eCLIP\u003c/td\u003e\n\u003ctd align=\"center\"\u003eYFCC-15M\u003c/td\u003e\n\u003ctd align=\"center\"\u003eResNet50\u003c/td\u003e\n\u003ctd align=\"center\"\u003e32\u003c/td\u003e\n\u003ctd align=\"center\"\u003e37.2\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"experiments/clip_experiments/yfcc15m_r50_clip\"\u003econfig\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"https://arxiv.org/pdf/2103.00020.pdf\"\u003epaper\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca 
href=\"https://drive.google.com/file/d/1ln2doFyO3Jhld1GJUedHozvA6cdoE6jp/view?usp=sharing\"\u003eGoogleDriver\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"center\"\u003eDeCLIP\u003c/td\u003e\n\u003ctd align=\"center\"\u003eYFCC-15M\u003c/td\u003e\n\u003ctd align=\"center\"\u003eResNet50\u003c/td\u003e\n\u003ctd align=\"center\"\u003e32\u003c/td\u003e\n\u003ctd align=\"center\"\u003e44.4\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"experiments/declip_experiments/yfcc15m_r50_declip\"\u003econfig\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"https://arxiv.org/pdf/2110.05208.pdf\"\u003epaper\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"https://drive.google.com/file/d/1ZjgJkA-50p0U16kJS-oEKHRUWD2nAjJb/view?usp=sharing\"\u003eGoogleDriver\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"center\"\u003eSLIP\u003c/td\u003e\n\u003ctd align=\"center\"\u003eYFCC-15M\u003c/td\u003e\n\u003ctd align=\"center\"\u003eResNet50\u003c/td\u003e\n\u003ctd align=\"center\"\u003e32\u003c/td\u003e\n\u003ctd align=\"center\"\u003e28.5\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"experiments/slip_experiments/yfcc15m_r50_slip\"\u003econfig\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"https://arxiv.org/pdf/2112.12750.pdf\"\u003epaper\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003e--\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"center\"\u003eFILIP\u003c/td\u003e\n\u003ctd align=\"center\"\u003eYFCC-15M\u003c/td\u003e\n\u003ctd align=\"center\"\u003eResNet50\u003c/td\u003e\n\u003ctd align=\"center\"\u003e32\u003c/td\u003e\n\u003ctd align=\"center\"\u003e21.3\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"experiments/filip_experiments/yfcc15m_r50_filip\"\u003econfig\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca 
href=\"https://arxiv.org/pdf/2111.07783.pdf\"\u003epaper\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003e--\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003c!-- \u003ctd align=\"center\"\u003eDeFILIP\u003c/td\u003e\n\u003ctd align=\"center\"\u003eYFCC-15M\u003c/td\u003e\n\u003ctd align=\"center\"\u003eResNet50\u003c/td\u003e\n\u003ctd align=\"center\"\u003e32\u003c/td\u003e\n\u003ctd align=\"center\"\u003e--\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"experiments/defilip_experiments\"\u003econfig\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"\"\u003epaper\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"\"\u003eurl\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n --\u003e\n\u003c/tbody\u003e\u003c/table\u003e\n\n### Supported datasets:\n\n\n\u003ctable\u003e\u003ctbody\u003e\n\u003c!-- START TABLE --\u003e\n\u003c!-- TABLE HEADER --\u003e\n\u003cth valign=\"center\"\u003eDataset\u003c/th\u003e\n\u003cth valign=\"center\"\u003eSamples\u003c/th\u003e\n\u003cth valign=\"center\"\u003edownload\u003c/th\u003e\n\u003cth valign=\"center\"\u003ePaper\u003c/th\u003e\n\n\u003c!-- TABLE BODY --\u003e\n\u003ctr\u003e\n\u003ctd align=\"center\"\u003eYFCC-15M\u003c/td\u003e\n\u003ctd align=\"center\"\u003e15,388,848\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"https://drive.google.com/file/d/1P-2_dHNc_c5XMY0A-89iNF5Cz_Y_Cfsy/view?usp=sharing\"\u003egoogle driver\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"\"\u003eurl\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003c/tbody\u003e\u003c/table\u003e\n\n\n\n\n\u003c!-- \n### Our pretrain visual backbone \n\n**Model will be relased soon**  --\u003e\n\n\n\n## Changelog\n\n***2022-02-22*** Realase our Training code\n\n***2021-11-06*** First Commit\n \n\n\n\n## Citation\n\n\u003c!--  @misc{li2021supervision,\n      title={Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image 
Pre-training Paradigm}, \n      author={Yangguang Li and Feng Liang and Lichen Zhao and Yufeng Cui and Wanli Ouyang and Jing Shao and Fengwei Yu and Junjie Yan},\n      year={2021},\n      eprint={2110.05208},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV}\n} --\u003e\n\n```\n@inproceedings{li2022supervision,\n      title={Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image  Pre-training Paradigm},\n      author={Yangguang Li and Feng Liang and Lichen Zhao and Yufeng Cui and Wanli Ouyang and Jing Shao and Fengwei Yu and Junjie Yan},\n      booktitle={International Conference on Learning Representations},\n      year={2022},\n      url={https://openreview.net/forum?id=zq1iJkNk3uN}\n}\n\n@misc{cui2022democratizing,\n      title={Democratizing Contrastive Language-Image Pre-training: A CLIP Benchmark of Data, Model, and Supervision}, \n      author={Yufeng Cui and Lichen Zhao and Feng Liang and Yangguang Li and Jing Shao},\n      year={2022},\n      eprint={2203.05796},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV}\n}\n```\n\n## License\n\nFor academic use, this project is licensed under the 2-clause BSD License. For commercial use, please contact the authors.\n\n## Acknowledgement\n\nDeCLIP is an open-source project that welcomes any contribution and feedback. We wish that the toolbox and benchmark could serve the growing research community by providing a flexible as well as a standardized toolkit to reimplement existing methods and develop their own new Contrastive Language-Image Pretraining methods.\n\nOur framework is based on [prototype](https://github.com/ModelTC/prototype).\n\n\n","funding_links":[],"categories":["Python"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSense-GVT%2FDeCLIP","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FSense-GVT%2FDeCLIP","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSense-GVT%2FDeCLIP/lists"}