An open API service indexing awesome lists of open source software.

https://github.com/iAsakiT3T/SHIFNet

Unveiling the Potential of Segment Anything Model 2 for RGB-Thermal Semantic Segmentation with Language Guidance
https://github.com/iAsakiT3T/SHIFNet

Last synced: 2 months ago
JSON representation

Unveiling the Potential of Segment Anything Model 2 for RGB-Thermal Semantic Segmentation with Language Guidance

Awesome Lists containing this project

README

        

## Unveiling the Potential of Segment Anything Model 2 for RGB-Thermal Semantic Segmentation with Language Guidance
![framework](assets/framework.png)
SHIFNet is an innovative SAM2-driven Hybrid Interactive Fusion Paradigm designed for RGB-T perception tasks. This framework fully unlocks the potential of SAM2 through language-guided adaptation, effectively mitigating its inherent RGB bias and enhancing cross-modal semantic consistency. SHIFNet consists of two key components: (1) Semantic-Aware Cross-modal Fusion (SACF) module, which dynamically balances modality contributions through text-guided affinity learning, enabling adaptive cross-modal information integration; (2) Heterogeneous Prompting Decoder (HPD), which enhances global semantic understanding through a semantic enhancement module and category embeddings, ensuring cross-modal semantic consistency. With only 32.27M trainable parameters, SHIFNet achieves 89.8%, 67.8%, and 59.2% mIoU on PST900, FMB, and MFNet benchmarks, respectively, while attaining 76.5% pedestrian detection accuracy in safety-critical scenarios. By reducing the cost of large-scale data collection and enhancing multi-modal perception capabilities, SHIFNet provides a reliable perception foundation for intelligent robotic systems operating in complex environments.

## Visualization on FMB dataset
![vis](assets/vis.png)