An open API service indexing awesome lists of open source software.

Projects in Awesome Lists by FoundationVision

A curated list of projects in awesome lists by FoundationVision.

https://github.com/foundationvision/var

[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

auto-regressive-model autoregressive-models diffusion-models generative-ai generative-model gpt gpt-2 image-generation large-language-models neurips transformers vision-transformer

Last synced: 10 Apr 2025

https://github.com/FoundationVision/VAR

[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

auto-regressive-model autoregressive-models diffusion-models generative-ai generative-model gpt gpt-2 image-generation large-language-models neurips transformers vision-transformer

Last synced: 03 Apr 2025

https://github.com/foundationvision/llamagen

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

auto-regressive-model diffusion diffusion-models image-generation llama llm text2image

Last synced: 15 May 2025

https://github.com/FoundationVision/LlamaGen

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

auto-regressive-model diffusion diffusion-models image-generation llama llm text2image

Last synced: 07 May 2025

https://github.com/foundationvision/vnext

Next-generation video instance recognition framework on top of Detectron2, which supports InstMove (CVPR 2023), SeqFormer (ECCV Oral), and IDOL (ECCV Oral)

instance-segmentation motion object-detection tracking transformer video-instance-segmentation

Last synced: 04 Apr 2025

https://github.com/FoundationVision/VNext

Next-generation video instance recognition framework on top of Detectron2, which supports InstMove (CVPR 2023), SeqFormer (ECCV Oral), and IDOL (ECCV Oral)

instance-segmentation motion object-detection tracking transformer video-instance-segmentation

Last synced: 07 Apr 2025

https://github.com/foundationvision/groma

[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization

foundation-models grounding large-language-models llama llama2 llm mllm multimodal vision-language-model

Last synced: 04 Apr 2025

https://github.com/foundationvision/omnitokenizer

[NeurIPS 2024] OmniTokenizer: one model and one weight for image-video joint tokenization.

auto-regressive-model image-generation tokenization vae video-generation vqvae

Last synced: 07 Apr 2025

https://github.com/foundationvision/uniref

[ICCV2023] Segment Every Reference Object in Spatial and Temporal Spaces

object-segmentation unified-model

Last synced: 13 Apr 2025

https://github.com/FoundationVision/UniRef

[ICCV2023] Segment Every Reference Object in Spatial and Temporal Spaces

object-segmentation unified-model

Last synced: 24 Jul 2025

https://github.com/foundationvision/generateu

[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection

mllm multimodality object-detection open-vocabulary open-vocabulary-detection open-world

Last synced: 05 Apr 2025

https://github.com/foundationvision/vaex

🔥stable, simple, state-of-the-art VQVAE toolkit & cookbook

Last synced: 25 Aug 2025

https://github.com/FoundationVision/UniTok

A Unified Tokenizer for Visual Generation and Understanding

Last synced: 01 Apr 2025