https://github.com/rathod-shubham/clip-classifier
CLIP is a multi-modal, zero-shot, open-source model. Without being fine-tuned for a specific task, given an image and a set of text descriptions, it predicts the description that best matches the image.
- Host: GitHub
- URL: https://github.com/rathod-shubham/clip-classifier
- Owner: RATHOD-SHUBHAM
- Created: 2023-09-24T04:00:39.000Z (about 2 years ago)
- Default Branch: master
- Last Pushed: 2024-06-05T23:29:57.000Z (over 1 year ago)
- Last Synced: 2025-01-31T15:12:57.513Z (8 months ago)
- Topics: ai, artificial-intelligence, artificial-neural-networks, classification, computer-vision, deep-learning, machine-learning, machine-learning-algorithms, python, python3
- Language: Jupyter Notebook
- Homepage:
- Size: 91.5 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: Readme.md
README
# CLIP
* What is CLIP?
* Contrastive Language-Image Pre-training (CLIP for short) is a state-of-the-art model introduced by OpenAI in February 2021.
* CLIP is a neural network trained on about 400 million (image, text) pairs.
* Training uses a contrastive learning approach that maps text and images into a shared embedding space, allowing tasks like image classification to be done with text-image similarity (see the sketch below).
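The snippet below is a minimal, illustrative sketch of this zero-shot setup using the Hugging Face `transformers` pipeline with the public `openai/clip-vit-base-patch32` checkpoint. It is not the exact code from this repo's notebook; the image path and candidate labels are placeholders.

```python
# Zero-shot image classification sketch: CLIP scores each candidate text
# description against the image and returns the best match.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-image-classification",
    model="openai/clip-vit-base-patch32",
)

# "photo.jpg" and the labels are placeholders.
predictions = classifier(
    "photo.jpg",
    candidate_labels=["a photo of a dog", "a photo of a cat", "a photo of a car"],
)
print(predictions[0])  # highest-scoring label and its probability
```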
* CLIP Architecture:
* Two encoders are jointly trained to predict the correct pairings within a batch of (image, text) training examples.
* The text encoder's backbone is a Transformer; the base size has 63 million parameters, 12 layers, and a 512-wide model with 8 attention heads.
* The image encoder, on the other hand, can use either a Vision Transformer (ViT) or a ResNet-50 as its backbone, and is responsible for generating the feature representation of the image. A short sketch of the two encoders follows this list.
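To make the two-encoder idea concrete, here is a sketch (again using the public Hugging Face CLIP checkpoint rather than this repo's notebook) that embeds the text candidates and the image separately, then compares them by cosine similarity. The file name and labels are placeholders.

```python
# The text Transformer and the ViT image encoder each produce an embedding in
# the same space; classification reduces to cosine similarity between them.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

texts = ["a diagram", "a dog", "a cat"]
image = Image.open("photo.jpg")  # placeholder path

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    text_emb = model.get_text_features(
        input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
    )
    image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])

# Normalize and compare: the highest cosine similarity is the predicted label.
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
similarity = image_emb @ text_emb.T
print(texts[similarity.argmax().item()])
```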
* Run Code:
* Install:
    1. !pip install git+https://github.com/PrithivirajDamodaran/ZSIC.git
    2. !pip install streamlit
* Run app: streamlit run app.py (a rough sketch of such an app follows below)
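The repo's app.py is not reproduced here; the following is only a rough sketch of what a Streamlit front end for this kind of classifier could look like. It substitutes the Hugging Face zero-shot pipeline for the ZSIC wrapper (whose exact API is not shown above), and every widget label, file name, and model name is illustrative.

```python
# app.py - illustrative Streamlit front end for zero-shot classification.
import streamlit as st
from PIL import Image
from transformers import pipeline

@st.cache_resource
def load_classifier():
    # Load the CLIP zero-shot pipeline once and reuse it across reruns.
    return pipeline("zero-shot-image-classification",
                    model="openai/clip-vit-base-patch32")

st.title("CLIP Zero-Shot Image Classifier")

uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
labels_text = st.text_input("Candidate labels (comma-separated)", "dog, cat, car")

if uploaded is not None and labels_text.strip():
    image = Image.open(uploaded).convert("RGB")
    st.image(image, caption="Input image")

    labels = [label.strip() for label in labels_text.split(",") if label.strip()]
    results = load_classifier()(image, candidate_labels=labels)

    # Show each label with its score, best match first.
    for result in results:
        st.write(f"{result['label']}: {result['score']:.3f}")
```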
---
# Image Search
## SentenceTransformers
SentenceTransformers provides models that embed images and text into the same vector space.
This makes it possible to find similar images and to implement image search.

## clip-ViT-B-32
This is the CLIP image & text model, which maps text and images to a shared vector space.
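As an illustration (not code from this repository), a text-to-image search over a small collection could look like the sketch below; the image paths and query string are placeholders.

```python
# Text-to-image search sketch with the clip-ViT-B-32 SentenceTransformers model:
# images and the text query share one vector space, so search is just a
# nearest-neighbour lookup by cosine similarity.
from PIL import Image
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("clip-ViT-B-32")

# Embed a small image collection (paths are placeholders).
image_paths = ["images/beach.jpg", "images/city.jpg", "images/forest.jpg"]
image_embeddings = model.encode([Image.open(p) for p in image_paths],
                                convert_to_tensor=True)

# Embed a text query into the same space and rank images by similarity.
query_embedding = model.encode("a photo of a sandy beach", convert_to_tensor=True)
hits = util.semantic_search(query_embedding, image_embeddings, top_k=3)[0]

for hit in hits:
    print(image_paths[hit["corpus_id"]], round(hit["score"], 3))
```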
## Usage
1. Git clone the repository.
2. cd ImageSearch
3. pip install -r requirements.txt

## Docker Image
* [Image](https://hub.docker.com/repository/docker/gibbo96/text2image/general)

---