https://github.com/kimrass/clip

PyTorch implementation of 'CLIP' (Radford et al., 2021) and training it on Flickr8k + Flickr30k
clip flickr30k flickr8k linear-classification multi-modal text-image-retrieval zero-shot-classification

Host: GitHub
URL: https://github.com/kimrass/clip
Owner: KimRass
Created: 2023-10-19T06:36:41.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-03-14T09:09:05.000Z (about 1 year ago)
Last Synced: 2024-03-14T10:30:17.887Z (about 1 year ago)
Topics: clip, flickr30k, flickr8k, linear-classification, multi-modal, text-image-retrieval, zero-shot-classification
Language: Python
Homepage:
Size: 18.3 MB
Stars: 5
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# 'CLIP' (Radford et al., 2021) implementation from scratch in PyTorch
- [Learning Transferable Visual Models From Natural Language Supervision](https://github.com/KimRass/CLIP/blob/main/papers/learning_transferable_visual_models_from_natural_language_supervision.pdf)
## Pretrained Model
- CLIP trained on Flickr8k + Flickr30k for 200 epochs
- [clip_flickr.pth](https://drive.google.com/file/d/1BEKphn5BULRIMYJr5JT5_p2W8sYzJKHO/view?usp=drive_link)
## Linear Classification on ImageNet1k (mini) Dataset
```bash
# e.g.,
python3 linear_classification.py \
    --ckpt_path="../clip_flickr.pth" \
    --data_dir="../imagenet-mini/" \
    --n_epochs=64 \
    --batch_size=128 \
    --n_cpus=4  # Optional
```
- Top-5 accuracy on validation set: 5.8%
## Zero-shot Classification on ImageNet1k (mini) Dataset
```bash
# e.g., (`--n_cpus`, `--max_len`, and `--k` are optional)
python3 zero_shot_classification.py \
    --ckpt_path="../clip_flickr.pth" \
    --data_dir="../imagenet-mini/" \
    --batch_size=16 \
    --n_cpus=4 \
    --max_len=128 \
    --k=10
```
- Top-10 accuracy on train + validation set: 3.0%
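Zero-shot classification in CLIP compares each image embedding against one text embedding per class and ranks the classes by cosine similarity. The sketch below illustrates that ranking step only; it uses random tensors as stand-ins for encoder outputs, and the function name `zero_shot_topk` and all sizes are hypothetical rather than taken from `zero_shot_classification.py`.

```python
import torch
import torch.nn.functional as F


def zero_shot_topk(img_emb: torch.Tensor, class_emb: torch.Tensor, k: int) -> torch.Tensor:
    """Return the k best-matching class indices per image, shape (n_images, k)."""
    # L2-normalize so that the dot product equals cosine similarity.
    img_emb = F.normalize(img_emb, dim=1)
    class_emb = F.normalize(class_emb, dim=1)
    sim = img_emb @ class_emb.T  # (n_images, n_classes)
    return sim.topk(k, dim=1).indices


# Random stand-ins for encoder outputs (hypothetical sizes).
torch.manual_seed(0)
class_emb = torch.randn(20, 64)  # one text embedding per class prompt
labels = torch.tensor([3, 7, 11])
# Place each image embedding close to the embedding of its own class.
img_emb = class_emb[labels] + 0.01 * torch.randn(3, 64)

pred = zero_shot_topk(img_emb, class_emb, k=10)
top10_hits = (pred == labels.unsqueeze(1)).any(dim=1)
```

In a real run, `class_emb` would come from the text encoder applied to one prompt per class, and `img_emb` from the image encoder.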
## Implementation Details
- The temperature-related part of the paper (quoted below) is not implemented.
- "The learnable temperature parameter was clipped to prevent scaling the logits by more than 100 which we found necessary to prevent training instability."
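For reference, one way the clipped learnable temperature could be added is sketched below. This is an assumption-laden illustration, not code from this repository: the class name `ClippedTemperatureLoss` is hypothetical, the initial temperature of 0.07 follows the value reported in the paper, and the scale is stored as a log-parameter so that it stays positive.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class ClippedTemperatureLoss(nn.Module):  # hypothetical name, not from the repo
    def __init__(self, init_temp: float = 0.07, max_scale: float = 100.0):
        super().__init__()
        # Learn log(1 / temperature) so the effective scale is always positive.
        self.log_scale = nn.Parameter(torch.tensor(math.log(1.0 / init_temp)))
        self.max_log_scale = math.log(max_scale)

    def forward(self, img_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        img_emb = F.normalize(img_emb, dim=1)
        text_emb = F.normalize(text_emb, dim=1)
        # Clip so the logits are never scaled by more than max_scale.
        scale = self.log_scale.clamp(max=self.max_log_scale).exp()
        logits = scale * (img_emb @ text_emb.T)  # (batch, batch)
        # Matching image-text pairs sit on the diagonal.
        targets = torch.arange(logits.size(0), device=logits.device)
        # Symmetric cross-entropy over both retrieval directions.
        loss_i2t = F.cross_entropy(logits, targets)
        loss_t2i = F.cross_entropy(logits.T, targets)
        return (loss_i2t + loss_t2i) / 2


# Usage with random embeddings (made-up sizes).
torch.manual_seed(0)
loss_fn = ClippedTemperatureLoss()
loss = loss_fn(torch.randn(8, 32), torch.randn(8, 32))
```

Clamping inside `forward` is only one option; clamping the parameter in place after each optimizer step is an alternative that keeps the stored value itself within bounds.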
