# Segment Anything with CLIP
[[HuggingFace Space](https://huggingface.co/spaces/curt-park/segment-anything-with-clip)] | [[COLAB](https://colab.research.google.com/github/Curt-Park/segment-anything-with-clip/blob/main/colab.ipynb)] | [[Demo Video](https://youtu.be/vM7MfAc3BdQ)]

Meta released [a new foundation model for segmentation tasks](https://ai.facebook.com/blog/segment-anything-foundation-model-image-segmentation/).
It aims to solve downstream segmentation tasks with prompt engineering, such as foreground/background points, a bounding box, a mask, and free-form text.
However, the text prompt capability has not been released yet.
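
For reference, the released prompt types (points, boxes, and masks) are driven through SAM's predictor interface. Below is a minimal sketch, assuming the official `segment_anything` package and a locally downloaded ViT-H checkpoint; the filename, image path, and coordinates are placeholders:
```python
import numpy as np
from PIL import Image
from segment_anything import SamPredictor, sam_model_registry

# Load SAM; model type and checkpoint filename are assumptions for this sketch.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = np.array(Image.open("example.jpg").convert("RGB"))
predictor.set_image(image)

# Prompt with one foreground point (label 1) and an XYXY box; text is unsupported.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),
    point_labels=np.array([1]),
    box=np.array([100, 100, 500, 400]),
    multimask_output=True,
)
```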

Alternatively, I took the following steps:
1. Get all object proposals generated by SAM (Segment Anything Model).
2. Crop the object regions by their bounding boxes.
3. Get the cropped images' features and a query feature from [CLIP](https://openai.com/research/clip).
4. Calculate the similarity between the image features and the query feature, as shown in the snippets below.
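
Steps 1 and 2 map onto SAM's automatic mask generator, which returns one proposal per detected region along with its bounding box. A minimal sketch under the same assumptions as above (the checkpoint filename and image path are placeholders):
```python
import numpy as np
from PIL import Image
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

# Load SAM; the checkpoint filename is an assumption for this sketch.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

image = np.array(Image.open("example.jpg").convert("RGB"))
masks = mask_generator.generate(image)  # step 1: all object proposals

# Step 2: crop each proposal by its XYWH bounding box.
crops = []
for mask in masks:
    x, y, w, h = mask["bbox"]
    crops.append(Image.fromarray(image[y : y + h, x : x + w]))
```

Steps 3 and 4 then score each crop with CLIP: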
```python
import clip
import torch

# Load CLIP; the model name here is an example.
model, preprocess = clip.load("ViT-B/32")

# How to get the similarity between a cropped image and the text queries.
preprocessed_img = preprocess(crop).unsqueeze(0)  # crop: PIL image of one proposal
tokens = clip.tokenize(texts)                     # texts: list of query strings
with torch.no_grad():
    logits_per_image, _ = model(preprocessed_img, tokens)
similarity = logits_per_image.softmax(-1)         # probability over the queries
```
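
Batching all crops lets a single CLIP forward pass rank every proposal against the query. The `rank_crops` helper below is hypothetical, a sketch rather than the repository's actual code:
```python
import clip
import torch

@torch.no_grad()
def rank_crops(crops, texts, model, preprocess):
    """Score every cropped proposal against the text queries in one pass."""
    imgs = torch.stack([preprocess(c) for c in crops])  # (N, 3, H, W)
    tokens = clip.tokenize(texts)
    logits_per_image, _ = model(imgs, tokens)
    return logits_per_image.softmax(-1)  # (N, num_texts)

# Hypothetical usage: rank proposals by their score for the first query.
probs = rank_crops(crops, ["a dog", "something else"], model, preprocess)
order = probs[:, 0].argsort(descending=True)  # indices of best-matching crops
```
Taking the softmax over the text dimension, as in the snippet above, turns each crop's logits into a probability over the queries; sorting by the query of interest surfaces the best-matching masks.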

## How to run locally
[Anaconda](https://www.anaconda.com/) is required before starting the setup.
```bash
make env                                   # create the conda environment
conda activate segment-anything-with-clip
make setup                                 # install dependencies
```

```bash
# This launches the Gradio server.
make run
```
Open http://localhost:7860/ in your browser.
![Gradio demo](https://user-images.githubusercontent.com/14961526/232016821-dda192c1-1095-4086-adb8-e6a9f44b685f.png)

## Follow-up Works
- [Fast Segment Everything](https://huggingface.co/spaces/Annotation-AI/fast-segment-everything): A re-implementation of the *Everything* algorithm in an iterative manner better suited to CPU-only environments. It produces results comparable to the original *Everything* with about 1/5 as many inferences (e.g., 200 vs. 1024), and it takes under 10 seconds to search for masks on a `CPU upgrade` instance (8 vCPU, 32 GB RAM) of Hugging Face Spaces.
- [Fast Segment Everything with Text Prompt](https://huggingface.co/spaces/Annotation-AI/fast-segment-everything-with-text-prompt): Based on Fast Segment Everything, this example provides a text prompt that generates an attention map for the area you want to focus on.
- [Fast Segment Everything with Image Prompt](https://huggingface.co/spaces/Annotation-AI/fast-segment-everything-with-image-prompt): Based on Fast Segment Everything, this example provides an image prompt that generates an attention map for the area you want to focus on.
- [Fast Segment Everything with Drawing Prompt](https://huggingface.co/spaces/Annotation-AI/fast-segment-everything-with-drawing-prompt): Based on Fast Segment Everything, this example provides a drawing prompt that generates an attention map for the area you want to focus on.

## References
- https://github.com/facebookresearch/segment-anything
- https://github.com/openai/CLIP