
# πŸš€ CLIP Zero-Shot Object Detection


*Badges: PyTorch · OpenAI · Python · Jupyter Notebook · License*

> **Detect objects in images without training!**

Welcome to the **CLIP Zero-Shot Object Detection** project! This repository demonstrates how to perform zero-shot object detection by integrating OpenAI's **CLIP** (Contrastive Language-Image Pretraining) model with a **Faster R-CNN** for region proposal generation.

---

| **Source Code** | **Website** |
|:-----------------|:------------|
| github.com/deepmancer/clip-object-detection | deepmancer.github.io/clip-object-detection |

---

## 🎯 Quick Start

Set up and run the pipeline in three simple steps:

1. **Clone the Repository**:

```bash
git clone https://github.com/deepmancer/clip-object-detection.git
cd clip-object-detection
```

2. **Install Dependencies**:

```bash
pip install -r requirements.txt
```

3. **Run the Notebook**:

```bash
jupyter notebook clip_object_detection.ipynb
```

---

## πŸ€” What is CLIP?

**CLIP** (Contrastive Language–Image Pretraining) is trained on 400 million image-text pairs. It embeds images and text into a shared space where the cosine similarity between embeddings reflects their semantic relationship.
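A minimal sketch of this shared embedding space, assuming the `openai/CLIP` package is installed and using a placeholder image file `photo.jpg`:

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Encode one image and two candidate captions into the same embedding space.
image = preprocess(Image.open("photo.jpg")).unsqueeze(0).to(device)
texts = clip.tokenize(["a photo of a dog", "a photo of a cat"]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(texts)

# Cosine similarity = dot product of L2-normalized embeddings.
image_features = image_features / image_features.norm(dim=-1, keepdim=True)
text_features = text_features / text_features.norm(dim=-1, keepdim=True)
similarity = (image_features @ text_features.T).squeeze(0)
print(similarity)  # the higher score marks the better-matching caption
```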



*Figure: CLIP model architecture (from the CLIP paper).*


---

## πŸ” Methodology

Our approach combines CLIP and Faster R-CNN for zero-shot object detection; a condensed code sketch follows the steps below:

1. **πŸ“¦ Region Proposal**: Use Faster R-CNN to identify potential object locations.
2. **🎯 CLIP Embeddings**: Encode image regions and text descriptions into a shared embedding space.
3. **πŸ” Similarity Matching**: Compute cosine similarity between text and image embeddings to identify matches.
4. **✨ Results**: Highlight detected objects with their confidence scores.

---

## πŸ“Š Example Results

### Input Image


*Figure: original input image.*

### Region Proposals

Regions proposed by Faster R-CNN's RPN:


*Figure: candidate regions proposed by Faster R-CNN.*

### Detected Objects

Objects detected by CLIP based on textual queries:


*Figure: objects detected by CLIP for the given text queries.*
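To reproduce a figure like this yourself, a small hypothetical helper (not part of the repository) that overlays boxes and matched queries with matplotlib might look like:

```python
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from PIL import Image

def show_detections(image_path, detections):
    """detections: list of ((x1, y1, x2, y2), label, score) tuples."""
    fig, ax = plt.subplots(figsize=(8, 6))
    ax.imshow(Image.open(image_path))
    for (x1, y1, x2, y2), label, score in detections:
        # Draw the bounding box and annotate it with the matched query and score.
        ax.add_patch(patches.Rectangle((x1, y1), x2 - x1, y2 - y1,
                                       fill=False, edgecolor="red", linewidth=2))
        ax.text(x1, y1 - 4, f"{label} ({score:.2f})", color="red", fontsize=10)
    ax.axis("off")
    plt.show()
```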

---

## πŸ“¦ Requirements

Ensure the following are installed:

- **PyTorch**: Deep learning framework.
- **Torchvision**: Pre-trained Faster R-CNN.
- **OpenAI CLIP**: [GitHub Repository](https://github.com/openai/CLIP.git).
- Additional dependencies are listed in [requirements.txt](requirements.txt); a quick import check is sketched below.
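A quick sanity check after installation (assumes the `clip` package was installed from the OpenAI repository linked above):

```python
# Verify that the core dependencies import and that CLIP model weights are reachable.
import torch
import torchvision
import clip

print("PyTorch:", torch.__version__)
print("Torchvision:", torchvision.__version__)
print("CLIP models:", clip.available_models())
print("CUDA available:", torch.cuda.is_available())
```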

---

## πŸ“ License

This project is licensed under the [MIT License](LICENSE). Feel free to use, modify, and distribute the code.

---

## ⭐ Support the Project

If this project inspires or assists your work, please consider giving it a ⭐ on GitHub! Your support motivates us to continue improving and expanding this repository.