Projects in Awesome Lists by FoundationVision
A curated list of projects in awesome lists by FoundationVision.
https://github.com/foundationvision/var
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
auto-regressive-model autoregressive-models diffusion-models generative-ai generative-model gpt gpt-2 image-generation large-language-models neurips transformers vision-transformer
Last synced: 10 Apr 2025
https://github.com/foundationvision/llamagen
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
auto-regressive-model diffusion diffusion-models image-generation llama llm text2image
Last synced: 15 May 2025
https://github.com/foundationvision/glee
[CVPR 2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale
foundation-model interactive-segmentation object-detection open-vocabulary-detection open-vocabulary-segmentation open-vocabulary-video-segmentation open-world referring-expression-comprehension referring-expression-segmentation referring-video-object-segmentation segment-anything tracking video-instance-segmentation video-object-segmentation zero-shot-object-detection
Last synced: 15 May 2025
https://github.com/foundationvision/vnext
Next-generation video instance recognition framework built on top of Detectron2, supporting InstMove (CVPR 2023), SeqFormer (ECCV Oral), and IDOL (ECCV Oral).
instance-segmentation motion object-detection tracking transformer video-instance-segmentation
Last synced: 04 Apr 2025
https://github.com/foundationvision/groma
[ECCV 2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
foundation-models grounding large-language-models llama llama2 llm mllm multimodal vision-language-model
Last synced: 04 Apr 2025
https://github.com/foundationvision/omnitokenizer
[NeurIPS 2024] OmniTokenizer: One model and one weight for joint image-video tokenization.
auto-regressive-model image-generation tokenization vae video-generation vqvae
Last synced: 07 Apr 2025
https://github.com/foundationvision/uniref
[ICCV 2023] Segment Every Reference Object in Spatial and Temporal Spaces
object-segmentation unified-model
Last synced: 13 Apr 2025
https://github.com/foundationvision/generateu
[CVPR 2024] Generative Region-Language Pretraining for Open-Ended Object Detection
mllm multimodality object-detection open-vocabulary open-vocabulary-detection open-world
Last synced: 05 Apr 2025
https://github.com/foundationvision/vaex
🔥 Stable, simple, state-of-the-art VQVAE toolkit and cookbook
Last synced: 25 Aug 2025
https://github.com/FoundationVision/UniTok
A Unified Tokenizer for Visual Generation and Understanding
Last synced: 01 Apr 2025