Korean CLIP
https://github.com/thisisiron/korclip
- Host: GitHub
- URL: https://github.com/thisisiron/korclip
- Owner: thisisiron
- Created: 2024-09-18T16:31:25.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2024-11-04T10:39:18.000Z (7 months ago)
- Last Synced: 2024-12-31T06:12:38.528Z (5 months ago)
- Topics: clip, contrastive-learning, korclip, korean-clip
- Language: Jupyter Notebook
- Homepage:
- Size: 1.26 MB
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
# KorCLIP
KorCLIP is a project that implements a Korean version of the CLIP (Contrastive Language–Image Pretraining) model.
## Usage
```python
import io

import requests
import torch
from PIL import Image
from torchvision import transforms as T
from transformers import AutoModel, AutoTokenizer

MODEL_PATH = "thisisiron/korclip-vit-base-patch32"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModel.from_pretrained(MODEL_PATH).to(device)

# Load a sample image from the COCO 2014 validation set
url = "http://images.cocodataset.org/val2014/COCO_val2014_000000537955.jpg"
image = Image.open(io.BytesIO(requests.get(url).content))

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
image = preprocess(image).unsqueeze(0).to(device)

# Korean candidate labels: "puppy", "cat", "turtle"
text = tokenizer(["강아지", "고양이", "거북이"], padding=True, return_tensors="pt").to(device)

with torch.no_grad():
    image_features = model.get_image_features(image)
    text_features = model.get_text_features(**text)
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)
```

## Dataset
### COCO 2014
- Image download
```
cd data
./download.sh
```
- Korean annotation download [[link]](https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&dataSetSn=261)
- You need the `MSCOCO_train_val_Korean.json` file.
- Preprocessing (json -> csv); see the sketch after the directory layout below
```
python csv_converter.py
```
- Directory structure
```
├── data
│   ├── download.sh
│   ├── csv_converter.py
│   ├── train2014.zip
│   ├── val2014.zip
│   ├── train2014/
│   ├── val2014/
│   ├── MSCOCO_train_val_Korean.json
│   ├── COCO_train.csv
│   └── COCO_val.csv
```
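For reference, the `csv_converter.py` step above amounts to flattening each image's Korean captions into one image–caption row per pair. The sketch below is a hypothetical version of that conversion, assuming each JSON entry carries a `file_path` and a list of Korean captions under `caption_ko`; the actual field and column names in the repository may differ.

```python
# Hypothetical JSON -> CSV conversion; the "file_path" and "caption_ko"
# field names are assumptions about the AI Hub annotation layout.
import csv
import json

with open("MSCOCO_train_val_Korean.json", encoding="utf-8") as f:
    annotations = json.load(f)

for split in ("train", "val"):
    with open(f"COCO_{split}.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["image_path", "caption"])
        for entry in annotations:
            # Route each entry to its split based on the image path
            if f"{split}2014" in entry["file_path"]:
                for caption in entry["caption_ko"]:
                    writer.writerow([entry["file_path"], caption])
```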
## Training
- Single-GPU
```
./run.sh 1
```
- Multi-GPU
```
./run.sh NUM_GPU
```
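`run.sh` launches training; its internals are not shown in this README, but CLIP-style training optimizes a symmetric contrastive (InfoNCE) objective over image–text pairs. The following is a minimal sketch of that objective under the original CLIP formulation, not the repository's actual training loop.

```python
# Minimal sketch of CLIP's symmetric contrastive (InfoNCE) loss;
# illustrative only, not the repository's actual training code.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_features, text_features, logit_scale):
    # Normalize so dot products are cosine similarities
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)

    # (batch, batch) similarity matrix; the diagonal holds the true pairs
    logits = logit_scale.exp() * image_features @ text_features.T
    targets = torch.arange(logits.size(0), device=logits.device)

    # Cross-entropy in both directions: image -> text and text -> image
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2
```

Every image in a batch is contrasted against every caption in the batch, so larger batches supply harder negatives.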
## Evaluation (Zero-Shot Prediction)
- Evaluation currently uses only a single prompt template; additional datasets and templates are planned for future evaluations.
- The results below are from a model trained only on the COCO 2014 Korean dataset.
```
python eval.py
```

| Dataset | Acc@1 (%) | Acc@5 (%) |
|---|---|---|
| CIFAR10 | 61.99 | 93.82 |
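Zero-shot prediction builds one text embedding per class from a prompt template and assigns each image to its most similar class embedding. The sketch below illustrates the idea with a single Korean template; the template string (`"{} 사진"`, roughly "a photo of a {}") and the helper function are illustrative assumptions, not necessarily what `eval.py` does.

```python
# Illustrative single-template zero-shot classification; the template
# string is an assumption, not necessarily what eval.py uses.
import torch

def zero_shot_classify(model, tokenizer, images, class_names, device,
                       template="{} 사진"):
    texts = tokenizer([template.format(name) for name in class_names],
                      padding=True, return_tensors="pt").to(device)
    with torch.no_grad():
        image_feats = model.get_image_features(images.to(device))
        text_feats = model.get_text_features(**texts)
        # Normalize so the similarity matrix holds cosine similarities
        image_feats = image_feats / image_feats.norm(dim=-1, keepdim=True)
        text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)
        logits = image_feats @ text_feats.T
    return logits.argmax(dim=-1)  # predicted class index per image
```

Acc@1 counts how often this top prediction matches the label; Acc@5 checks whether the label appears among the five most similar classes.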
## Inference
- You can refer to `infer.ipynb`.