https://github.com/OpenGVLab/All-Seeing
[ICLR 2024] This is the official implementation of the paper "The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World"
- Host: GitHub
- URL: https://github.com/OpenGVLab/All-Seeing
- Owner: OpenGVLab
- Created: 2023-08-03T06:49:14.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-08-09T06:39:51.000Z (5 months ago)
- Last Synced: 2024-08-09T07:50:04.909Z (5 months ago)
- Topics: all-seeing, dataset, region-text
- Language: Python
- Homepage: https://huggingface.co/spaces/OpenGVLab/all-seeing
- Size: 57.5 MB
- Stars: 433
- Watchers: 23
- Forks: 14
- Open Issues: 7
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- Awesome-Segment-Anything
README
# The All-Seeing Project
This is the official implementation of the following papers:
- [The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World](https://arxiv.org/abs/2308.01907)
- [The All-Seeing Project V2: Towards General Relation Comprehension of the Open World](https://arxiv.org/abs/2402.19474)
> The name "All-Seeing" is derived from "The All-Seeing Eye", which means having complete knowledge, awareness, or insight into all aspects of existence. The logo is the Millennium Puzzle, an artifact from the manga "Yu-Gi-Oh!".
## News and Updates 🚀🚀🚀
- `July 01, 2024`: All-Seeing Project v2 has been accepted by ECCV 2024! Note that the [model](https://huggingface.co/OpenGVLab/ASMv2) and [data](https://huggingface.co/datasets/OpenGVLab/AS-V2) have already been released on Hugging Face.
- `Feb 28, 2024`: All-Seeing Project v2 is out! Our [**ASMv2**](https://huggingface.co/OpenGVLab/ASMv2) achieves state-of-the-art performance across a variety of image-level and region-level tasks! See [**here**](all-seeing-v2/README.md) for more details.
- `Feb 21, 2024`: [**ASM**](https://huggingface.co/OpenGVLab/ASM-FT), [**AS-Core**](https://huggingface.co/datasets/OpenGVLab/AS-Core), [**AS-10M**](https://huggingface.co/datasets/OpenGVLab/AS-V2/blob/main/as_pretrain_10m.json), and [**AS-100M**](https://huggingface.co/datasets/OpenGVLab/AS-100M) are released!
- `Jan 16, 2024`: The All-Seeing Project has been accepted by ICLR 2024!
- `Aug 29, 2023`: [**All-Seeing Model Demo**](https://openxlab.org.cn/apps/detail/wangweiyun/AllSeeingModel) is now available on OpenXLab!

## Schedule
- [x] Release the ASMv2 model.
- [x] Release the AS-V2 dataset.
- [x] Release the ASM model.
- [ ] Release the full version of AS-1B.
- [x] Release AS-Core, which is the human-verified subset of AS-1B.
- [x] Release AS-100M, which is the 100M subset of AS-1B.
- [x] Release AS-10M, which is the 10M subset of AS-1B.
- [x] Online demo, including dataset browser and ASM online demo.

## Introduction
### The All-Seeing Project [[Paper](https://arxiv.org/abs/2308.01907)][[Model](https://huggingface.co/OpenGVLab/ASM-FT)][[Dataset](https://huggingface.co/datasets/OpenGVLab/AS-100M)][[Code](all-seeing/README.md)][[Zhihu](https://zhuanlan.zhihu.com/p/686963813)][[Medium](https://ai.gopubby.com/the-all-seeing-project-towards-panoptic-visual-recognization-and-general-relation-comprehension-f76c2bde3e2c)]
[***All-Seeing 1B (AS-1B) dataset***](https://huggingface.co/datasets/OpenGVLab/AS-100M): we propose a new large-scale dataset (AS-1B) for open-world panoptic visual recognition and understanding, using an economical semi-automatic data engine that combines the power of off-the-shelf vision/language models and human feedback.
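For readers who want to poke at the released subsets locally, here is a minimal, hedged sketch of how the annotations linked in this README could be fetched from Hugging Face with the `huggingface_hub` package. Only the repository names and the `as_pretrain_10m.json` file name come from the links above; the internal layout of the downloaded data may differ from what the comments assume.

```python
# Sketch: fetch the released AS-1B subsets from Hugging Face.
# Assumes `pip install huggingface_hub`; only the repo names and
# `as_pretrain_10m.json` are taken from this README -- treat the rest
# (e.g. the assumed JSON layout) as illustrative.
import json

from huggingface_hub import hf_hub_download, snapshot_download

# AS-10M annotations: a single JSON file hosted in the AS-V2 dataset repo.
as10m_path = hf_hub_download(
    repo_id="OpenGVLab/AS-V2",
    repo_type="dataset",
    filename="as_pretrain_10m.json",
)
with open(as10m_path) as f:
    as10m = json.load(f)  # assumed to be a list of region-text records
print(f"AS-10M records: {len(as10m)}")

# AS-100M: mirror the whole dataset repo (this is a large download).
as100m_dir = snapshot_download(repo_id="OpenGVLab/AS-100M", repo_type="dataset")
print("AS-100M files stored under:", as100m_dir)
```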
[***All-Seeing Model (ASM)***](https://huggingface.co/OpenGVLab/ASM-FT): we develop a unified vision-language foundation model (ASM) for open-world panoptic visual recognition and understanding. Aligning with LLMs, our ASM supports versatile image-text retrieval and generation tasks, demonstrating impressive zero-shot capability.
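ASM's retrieval side follows the familiar pattern of scoring candidate captions against an image embedding. The snippet below is only a schematic of that scoring step with hypothetical stand-in features, not ASM's actual API; the real loading and inference code is documented in [all-seeing/README.md](all-seeing/README.md).

```python
# Schematic of image-text retrieval scoring with a dual-encoder VLM. The
# tensors here are random stand-ins and are NOT produced by ASM; see
# all-seeing/README.md for the actual model loading and inference entry points.
import torch
import torch.nn.functional as F

def best_caption(image_feat: torch.Tensor, text_feats: torch.Tensor) -> int:
    """Return the index of the caption whose embedding best matches the image."""
    image_feat = F.normalize(image_feat, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    scores = text_feats @ image_feat  # cosine similarity per caption
    return int(scores.argmax())

# Toy usage with random embeddings standing in for real ASM features.
image_feat = torch.randn(768)
text_feats = torch.randn(5, 768)
print("best caption index:", best_caption(image_feat, text_feats))
```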
### The All-Seeing Project V2 [[Paper](https://arxiv.org/abs/2402.19474)][[Model](https://huggingface.co/OpenGVLab/ASMv2)][[Dataset](https://huggingface.co/datasets/OpenGVLab/AS-V2)][[Code](all-seeing-v2/README.md)][[Zhihu](https://zhuanlan.zhihu.com/p/686963813)][[Medium](https://ai.gopubby.com/the-all-seeing-project-towards-panoptic-visual-recognization-and-general-relation-comprehension-f76c2bde3e2c)]
***[All-Seeing Dataset V2 (AS-V2) dataset](https://huggingface.co/datasets/OpenGVLab/AS-V2)***: we propose a novel task, termed Relation Conversation (ReC), which unifies the formulation of text generation, object localization, and relation comprehension. Based on the unified formulation, we construct the AS-V2 dataset, which consists of 127K high-quality relation conversation samples, to unlock the ReC capability for Multi-modal Large Language Models (MLLMs).
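In spirit, a Relation Conversation sample is grounded text whose mentioned objects carry box annotations and whose predicates link a subject box to an object box, so each sample can be read back as a small scene graph. The exact AS-V2 markup is specified in [all-seeing-v2/README.md](all-seeing-v2/README.md); the following sketch only illustrates the idea with an invented sample schema.

```python
# Hypothetical illustration of the Relation Conversation (ReC) idea: grounded
# text that can be re-read as (subject, predicate, object) triplets. The real
# AS-V2 markup differs; see all-seeing-v2/README.md for the actual format.
from dataclasses import dataclass

@dataclass
class Region:
    phrase: str
    box: tuple  # (x1, y1, x2, y2), normalized coordinates

@dataclass
class RelationTriplet:
    subject: Region
    predicate: str
    object: Region

# Invented sample: one caption, two grounded regions, one relation.
sample = {
    "caption": "A man is riding a horse on the beach.",
    "regions": {
        "man": (0.12, 0.20, 0.45, 0.90),
        "horse": (0.10, 0.35, 0.70, 0.95),
    },
    "relations": [("man", "riding", "horse")],
}

def to_scene_graph(record: dict) -> list:
    """Turn one ReC-style record into explicit relation triplets."""
    regions = {p: Region(p, b) for p, b in record["regions"].items()}
    return [RelationTriplet(regions[s], pred, regions[o])
            for s, pred, o in record["relations"]]

for t in to_scene_graph(sample):
    print(f"({t.subject.phrase}, {t.predicate}, {t.object.phrase})")
```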
***[All-Seeing Model v2 (ASMv2)](https://huggingface.co/OpenGVLab/ASMv2)***: we develop ASMv2, which integrates the Relation Conversation ability while maintaining powerful general capabilities.
It is endowed with grounding and referring capabilities, exhibiting state-of-the-art performance on region-level tasks.
Furthermore, this model can be naturally adapted to the Scene Graph Generation task in an open-ended manner.

***[Circular-based Relation Probing Evaluation (CRPE) benchmark](https://huggingface.co/datasets/OpenGVLab/CRPE)***: we construct the Circular-based Relation Probing Evaluation (CRPE) benchmark, the first benchmark that covers all elements of the relation triplet `(subject, predicate, object)`, providing a systematic platform for evaluating relation comprehension ability.
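The CRPE data and official evaluation protocol live in the dataset repo linked above. As a purely illustrative sketch of what probing every element of a relation triplet can look like, one can mask the subject, predicate, and object in turn and ask the model to recover each from a small option set; the questions, options, and distractors below are all made up.

```python
# Hypothetical triplet-probing sketch in the spirit of CRPE: mask each element
# of a (subject, predicate, object) triplet and ask for it among distractors.
# Nothing here is taken from the official CRPE data or evaluation code.
import random

def build_probes(triplet, distractors):
    """Yield (question, options, answer) tuples, one per triplet element."""
    subj, pred, obj = triplet
    templates = {
        "subject": (f"What is {pred} the {obj}?", subj),
        "predicate": (f"What is the {subj} doing to the {obj}?", pred),
        "object": (f"What is the {subj} {pred}?", obj),
    }
    for element, (question, answer) in templates.items():
        options = [answer] + random.sample(distractors[element], 3)
        random.shuffle(options)
        yield question, options, answer

triplet = ("man", "riding", "horse")
distractors = {
    "subject": ["woman", "dog", "child", "cyclist"],
    "predicate": ["feeding", "washing", "leading", "chasing"],
    "object": ["bicycle", "camel", "boat", "surfboard"],
}
for question, options, answer in build_probes(triplet, distractors):
    print(question, options, "->", answer)
```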
## License
This project is released under the [Apache 2.0 license](LICENSE).
## 🖊️ Citation
If you find this project useful in your research, please consider citing:
```BibTeX
@article{wang2023allseeing,
  title={The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World},
  author={Wang, Weiyun and Shi, Min and Li, Qingyun and Wang, Wenhai and Huang, Zhenhang and Xing, Linjie and Chen, Zhe and Li, Hao and Zhu, Xizhou and Cao, Zhiguo and others},
  journal={arXiv preprint arXiv:2308.01907},
  year={2023}
}

@article{wang2024allseeing_v2,
  title={The All-Seeing Project V2: Towards General Relation Comprehension of the Open World},
  author={Wang, Weiyun and Ren, Yiming and Luo, Haowen and Li, Tiantong and Yan, Chenxiang and Chen, Zhe and Wang, Wenhai and Li, Qingyun and Lu, Lewei and Zhu, Xizhou and others},
  journal={arXiv preprint arXiv:2402.19474},
  year={2024}
}
```