Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ClaudiaCuttano/SAMWISE
Official repository for the paper: "SAMWISE: Infusing wisdom in SAM2 for Text-Driven Video Segmentation"
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/ClaudiaCuttano/SAMWISE
- Owner: ClaudiaCuttano
- License: apache-2.0
- Created: 2024-11-26T17:33:45.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2024-11-27T11:44:08.000Z (about 2 months ago)
- Last Synced: 2024-11-27T12:29:09.176Z (about 2 months ago)
- Size: 9.72 MB
- Stars: 9
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- Awesome-Segment-Anything - [code
README
# SAMWISE: Infusing wisdom in SAM2 for Text-Driven Video Segmentation
[Claudia Cuttano](), [Gabriele Trivigno](), [Gabriele Rosi](), [Carlo Masone](), [Giuseppe Averta]()
Official repository for the paper: "SAMWISE: Infusing wisdom in SAM2 for Text-Driven Video Segmentation". In this work we build upon the Segment-Anything 2 (SAM2) model and make it wiser by empowering it with natural language understanding and explicit temporal modeling at the feature extraction stage, without fine-tuning its weights and without outsourcing modality interaction to external models. Our proposed method, SAMWISE, achieves state-of-the-art results across various benchmarks while adding a negligible overhead of just 4.2M parameters. **[📄[arXiv]](https://arxiv.org/abs/2411.17646)**
🚀 **Code and Trained Models Coming Soon!** 🚀
Our proposed SAMWISE.

### SAMWISE in Action 👀
Our approach integrates natural language knowledge and temporal cues for streaming-based Referring Video Object Segmentation (RVOS). We mitigate tracking bias, where the model may overlook an identifiable object while tracking another, through a learnable mechanism. This enables efficient streaming processing that leverages memory from previous frames to maintain context and ensure accurate object segmentation.
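The streaming idea described above, where each frame is segmented using a memory of previous frames, can be illustrated with a minimal sketch. This is not the SAMWISE implementation (the code is not yet released): `stream_segment`, `toy_segment`, the `memory_size` parameter, and the list-based `Frame` and `Mask` types are all hypothetical names chosen for illustration, and the toy thresholding rule merely stands in for a real segmentation model.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List, Tuple

Frame = List[float]  # hypothetical stand-in for per-frame features
Mask = List[int]     # hypothetical stand-in for a binary mask
Memory = List[Tuple[Frame, Mask]]


def stream_segment(
    frames: Iterable[Frame],
    segment_fn: Callable[[Frame, Memory], Mask],
    memory_size: int = 4,
) -> Iterator[Mask]:
    """Segment frames one at a time, keeping a bounded memory of past results."""
    # A deque with maxlen drops the oldest entry automatically when full.
    memory: Deque[Tuple[Frame, Mask]] = deque(maxlen=memory_size)
    for frame in frames:
        # The model sees only the current frame plus the remembered context.
        mask = segment_fn(frame, list(memory))
        memory.append((frame, mask))
        yield mask


def toy_segment(frame: Frame, memory: Memory) -> Mask:
    """Toy rule (an assumption, not a real model): threshold each value
    against the mean over the current and remembered frames."""
    values = [v for past_frame, _ in memory for v in past_frame] + list(frame)
    threshold = sum(values) / len(values)
    return [1 if v > threshold else 0 for v in frame]


if __name__ == "__main__":
    video = [[0, 10], [2, 9], [6, 7]]
    for mask in stream_segment(video, toy_segment, memory_size=2):
        print(mask)
```

Because the memory is bounded, the cost per frame stays constant regardless of video length, which is the property that makes a streaming design attractive for long videos.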
SAMWISE for streaming-based RVOS.

SAMWISE (our model, not the hobbit) segments objects from The Lord of the Rings zero-shot: no extra training, just living up to its namesake! 🧙‍♂️✨
![Local GIF](./assets/video_four.gif)