
# Segment and Caption Anything

This repository contains the official implementation of "Segment and Caption Anything" (SCA, CVPR 2024).

[Project Page](https://xk-huang.github.io/segment-caption-anything), [Paper](https://arxiv.org/abs/2312.00869)

![teaser](./docs/teaser-github.svg)

tl;dr
1. Despite the absence of semantic labels in the training data, SAM implies high-level semantics sufficient for captioning.
2. SCA (b) is a lightweight augmentation of SAM (a) with the ability to generate regional captions.
3. On top of the SAM architecture, we add a fixed, pre-trained language model and an optimizable, lightweight hybrid feature mixer whose training is cheap and scalable (a rough sketch follows below).
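
As a rough illustration of point 3, here is a minimal, hypothetical PyTorch sketch of the idea. It is not the repository's actual code; all module names, dimensions, and hyperparameters are assumptions:

```python
import torch
import torch.nn as nn

class RegionCaptionHead(nn.Module):
    """Hypothetical sketch: a small trainable mixer maps frozen SAM region
    features into the embedding space of a frozen language model."""

    def __init__(self, sam_dim=256, lm_dim=2048, num_layers=2, num_queries=8):
        super().__init__()
        # Learnable query tokens that pool region information (assumption).
        self.queries = nn.Parameter(torch.randn(num_queries, sam_dim))
        layer = nn.TransformerDecoderLayer(d_model=sam_dim, nhead=8, batch_first=True)
        self.mixer = nn.TransformerDecoder(layer, num_layers=num_layers)
        self.proj = nn.Linear(sam_dim, lm_dim)  # into the LM's embedding space

    def forward(self, sam_region_tokens):
        # sam_region_tokens: (B, N, sam_dim) features from SAM's mask decoder,
        # computed with SAM frozen (no gradients flow into SAM).
        batch = sam_region_tokens.size(0)
        q = self.queries.unsqueeze(0).expand(batch, -1, -1)
        mixed = self.mixer(q, sam_region_tokens)  # only these weights train
        return self.proj(mixed)  # prefix embeddings fed to the frozen LM
```

Only the mixer and the projection would be optimized; SAM and the language model stay frozen, which is what keeps the training cheap and scalable.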


*(Demo videos: anything-mode-00 through anything-mode-03; see the project page for the clips.)*

## News

- [01/31/2024] Update the [paper](https://xk-huang.github.io/segment-caption-anything/files/segment-caption-anything.013124.pdf) and the [supplementary material](https://xk-huang.github.io/segment-caption-anything/files/segment-caption-anything-supp.013124.pdf). Release code v0.0.2: bump transformers to 4.36.2; support the Mistral series, Phi-2, and Zephyr; add experiments with SAM + Image Captioner + [V-CoT](https://github.com/ttengwang/Caption-Anything); and more.
- [12/05/2023] Release paper, code v0.0.1, and project page!

## Environment Preparation

Please check [docs/ENV.md](docs/ENV.md).

## Model Zoo

Please check [docs/MODEL_ZOO.md](docs/MODEL_ZOO.md).

## Gradio Demo

Please check [docs/DEMO.md](docs/DEMO.md).
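
To get a feel for how such a demo is typically wired up with Gradio, here is a minimal, hypothetical sketch; `caption_region` is a stand-in for the real SCA pipeline, which you should launch via the instructions in [docs/DEMO.md](docs/DEMO.md):

```python
import gradio as gr

def caption_region(image, x, y):
    # Hypothetical stand-in: in the real demo, SAM segments the region
    # around the given point and the caption head describes it.
    return f"caption for the region around ({int(x)}, {int(y)})"

demo = gr.Interface(
    fn=caption_region,
    inputs=[
        gr.Image(type="pil", label="Image"),
        gr.Number(label="Point x"),
        gr.Number(label="Point y"),
    ],
    outputs=gr.Textbox(label="Region caption"),
    title="Segment and Caption Anything (sketch)",
)

if __name__ == "__main__":
    demo.launch()
```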

## Running Training and Inference

Please check [docs/USAGE.md](docs/USAGE.md).

## Experiments and Evaluation

Please check [docs/EVAL.md](docs/EVAL.md).

## License

The trained weights are licensed under the [Apache 2.0 license](https://github.com/xk-huang/segment-caption-anything/blob/1c810bfcfeb3b95cd4b1f502f8f30c46333d58b8/LICENSE).

## Acknowledgement

We deeply appreciate these wonderful open-source projects: [transformers](https://github.com/huggingface/transformers), [accelerate](https://github.com/huggingface/accelerate), [deepspeed](https://github.com/microsoft/DeepSpeed), [detectron2](https://github.com/facebookresearch/detectron2), [hydra](https://github.com/facebookresearch/hydra), [timm](https://github.com/huggingface/pytorch-image-models), and [gradio](https://github.com/gradio-app/gradio).

## Citation

If you find this repository useful, please consider giving a star ⭐ and citation 🦖:

```bibtex
@inproceedings{huang2024segment,
  title={Segment and caption anything},
  author={Huang, Xiaoke and Wang, Jianfeng and Tang, Yansong and Zhang, Zheng and Hu, Han and Lu, Jiwen and Wang, Lijuan and Liu, Zicheng},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={13405--13417},
  year={2024}
}
```