Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

Awesome-Foundation-Models

A curated list of foundation models for vision and language tasks
https://github.com/uncbiag/Awesome-Foundation-Models

Last synced: 4 days ago
JSON representation

Survey
Papers by Date
- 2024
  - 05/22 - gigapath/prov-gigapath.svg?style=social&label=Star)](https://github.com/prov-gigapath/prov-gigapath)
  - 06/06 - AI/vision-lstm.svg?style=social&label=Star)](https://github.com/NX-AI/vision-lstm)
  - 05/22 - gigapath/prov-gigapath.svg?style=social&label=Star)](https://github.com/prov-gigapath/prov-gigapath)
  - 04/14
  - 05/22 - gigapath/prov-gigapath.svg?style=social&label=Star)](https://github.com/prov-gigapath/prov-gigapath)
  - 05/22
  - 05/22 - gigapath/prov-gigapath.svg?style=social&label=Star)](https://github.com/prov-gigapath/prov-gigapath)
  - 01/19 - Anything.svg?style=social&label=Star)](https://github.com/LiheYoung/Depth-Anything)
  - 05/09 - VLLM/Lumina-T2X.svg?style=social&label=Star)](https://github.com/Alpha-VLLM/Lumina-T2X)
  - 05/08
  - 05/07
  - 05/03 - ai/reka-vibe-eval.svg?style=social&label=Star)](https://github.com/reka-ai/reka-vibe-eval)
  - 04/30
  - 04/26
  - 02/28 - ai-lab/Consistency_LLM.svg?style=social&label=Star)](https://github.com/hao-ai-lab/Consistency_LLM)
  - 05/17
  - 07/24
  - 07/17
  - 07/12
  - 05/22 - gigapath/prov-gigapath.svg?style=social&label=Star)](https://github.com/prov-gigapath/prov-gigapath)
  - 07/31
  - 07/29 - anything-2.svg?style=social&label=Star)](https://github.com/facebookresearch/segment-anything-2)
  - 05/22 - gigapath/prov-gigapath.svg?style=social&label=Star)](https://github.com/prov-gigapath/prov-gigapath)
  - 03/14
  - 01/30 - CVC/YOLO-World.svg?style=social&label=Star)](https://github.com/AILab-CVC/YOLO-World)
  - 08/14
  - 05/22 - gigapath/prov-gigapath.svg?style=social&label=Star)](https://github.com/prov-gigapath/prov-gigapath)
  - 07/29 - anything-2.svg?style=social&label=Star)](https://github.com/facebookresearch/segment-anything-2)
  - 05/22 - gigapath/prov-gigapath.svg?style=social&label=Star)](https://github.com/prov-gigapath/prov-gigapath)
  - 01/22 - AIMI/CheXagent.svg?style=social&label=Star)](https://github.com/Stanford-AIMI/CheXagent)
  - 01/19 - Anything.svg?style=social&label=Star)](https://github.com/LiheYoung/Depth-Anything)
  - 02/27
  - 09/25
  - 05/22 - gigapath/prov-gigapath.svg?style=social&label=Star)](https://github.com/prov-gigapath/prov-gigapath)
  - 05/22 - gigapath/prov-gigapath.svg?style=social&label=Star)](https://github.com/prov-gigapath/prov-gigapath)
  - 05/25
  - 05/22 - gigapath/prov-gigapath.svg?style=social&label=Star)](https://github.com/prov-gigapath/prov-gigapath)
  - 02/06 - AutoML/MobileVLM.svg?style=social&label=Star)](https://github.com/Meituan-AutoML/MobileVLM)
  - 01/15
  - 05/22 - gigapath/prov-gigapath.svg?style=social&label=Star)](https://github.com/prov-gigapath/prov-gigapath)
  - 05/22 - gigapath/prov-gigapath.svg?style=social&label=Star)](https://github.com/prov-gigapath/prov-gigapath)
  - 01/23
  - 01/30 - CVC/YOLO-World.svg?style=social&label=Star)](https://github.com/AILab-CVC/YOLO-World)
  - 03/18
  - 03/14
  - 03/09 - Chapel Hill) [![Star](https://img.shields.io/github/stars/uncbiag/uniGradICON.svg?style=social&label=Star)](https://github.com/uncbiag/uniGradICON)
  - 08/22
  - 05/22 - gigapath/prov-gigapath.svg?style=social&label=Star)](https://github.com/prov-gigapath/prov-gigapath)
  - 02/21
  - 02/20 - HPC-AI-Lab/Neural-Network-Diffusion.svg?style=social&label=Star)](https://github.com/NUS-HPC-AI-Lab/Neural-Network-Diffusion)
  - 01/16
  - 05/22 - gigapath/prov-gigapath.svg?style=social&label=Star)](https://github.com/prov-gigapath/prov-gigapath)
  - 05/22 - gigapath/prov-gigapath.svg?style=social&label=Star)](https://github.com/prov-gigapath/prov-gigapath)
  - 05/22 - gigapath/prov-gigapath.svg?style=social&label=Star)](https://github.com/prov-gigapath/prov-gigapath)
  - 04/02
  - 04/02
  - 04/10
  - 02/20
  - 02/19
  - 03/05
  - 03/01 - AutoML/VisionLLaMA.svg?style=social&label=Star)](https://github.com/Meituan-AutoML/VisionLLaMA)
  - 02/22
  - 03/22
  - 05/22 - gigapath/prov-gigapath.svg?style=social&label=Star)](https://github.com/prov-gigapath/prov-gigapath)
  - 09/30
  - 09/27
  - 05/22 - gigapath/prov-gigapath.svg?style=social&label=Star)](https://github.com/prov-gigapath/prov-gigapath)
  - 03/01
  - 05/21 - 024-02499-w)) [![Star](https://img.shields.io/github/stars/microsoft/BiomedParse.svg?style=social&label=Star)](https://github.com/microsoft/BiomedParse)
  - 05/14
  - 05/06
  - 10/01
  - 05/22 - gigapath/prov-gigapath.svg?style=social&label=Star)](https://github.com/prov-gigapath/prov-gigapath)
  - 09/18 - VL.svg?style=social&label=Star)](https://github.com/QwenLM/Qwen2-VL)
  - 09/18 - labs/moshi.svg?style=social&label=Star)](https://github.com/kyutai-labs/moshi)
  - 08/27
  - 05/22 - gigapath/prov-gigapath.svg?style=social&label=Star)](https://github.com/prov-gigapath/prov-gigapath)
  - 10/30
  - 10/21
  - 05/22 - gigapath/prov-gigapath.svg?style=social&label=Star)](https://github.com/prov-gigapath/prov-gigapath)
  - 11/14
  - 11/13
  - 11/07
  - 05/22 - gigapath/prov-gigapath.svg?style=social&label=Star)](https://github.com/prov-gigapath/prov-gigapath)
  - 05/04 - DiT.svg?style=social&label=Star)](https://github.com/YuchuanTian/U-DiT)
  - 10/30 - W/TokenFormer.svg?style=social&label=Star)](https://github.com/Haiyang-W/TokenFormer)
  - 06/10
  - 05/31
  - 05/22 - gigapath/prov-gigapath.svg?style=social&label=Star)](https://github.com/prov-gigapath/prov-gigapath)
  - 05/20 - models/octo.svg?style=social&label=Star)](https://github.com/octo-models/octo)
  - 10/04
  - 05/22 - gigapath/prov-gigapath.svg?style=social&label=Star)](https://github.com/prov-gigapath/prov-gigapath)
  - 05/22 - gigapath/prov-gigapath.svg?style=social&label=Star)](https://github.com/prov-gigapath/prov-gigapath)
  - 10/02
  - 05/22 - gigapath/prov-gigapath.svg?style=social&label=Star)](https://github.com/prov-gigapath/prov-gigapath)
  - 10/10 - CVC/UniRepLKNet.svg?style=social&label=Star)](https://github.com/AILab-CVC/UniRepLKNet)
  - 06/24 - mllm/cambrian.svg?style=social&label=Star)](https://github.com/cambrian-mllm/cambrian)
  - 06/13 - 4m.svg?style=social&label=Star)](https://github.com/apple/ml-4m)
  - 05/22 - gigapath/prov-gigapath.svg?style=social&label=Star)](https://github.com/prov-gigapath/prov-gigapath)
  - 10/31
  - 05/22 - gigapath/prov-gigapath.svg?style=social&label=Star)](https://github.com/prov-gigapath/prov-gigapath)
  - 05/22 - gigapath/prov-gigapath.svg?style=social&label=Star)](https://github.com/prov-gigapath/prov-gigapath)
  - 11/21
  - 05/22 - gigapath/prov-gigapath.svg?style=social&label=Star)](https://github.com/prov-gigapath/prov-gigapath)
  - 04/02 - learning-for-vlm.svg?style=social&label=Star)](https://github.com/hellomuffin/iterated-learning-for-vlm)
- 2022
- 2023
  - Meta-Transformer: A Unified Framework for Multimodal Learning
  - Visual Instruction Tuning - Madison and Microsoft) [![Star](https://img.shields.io/github/stars/haotian-liu/LLaVA.svg?style=social&label=Star)](https://github.com/haotian-liu/LLaVA)
  - Visual Instruction Tuning - Madison and Microsoft) [![Star](https://img.shields.io/github/stars/haotian-liu/LLaVA.svg?style=social&label=Star)](https://github.com/haotian-liu/LLaVA)
  - Visual Prompt Multi-Modal Tracking
  - Tracking Everything Everywhere All at Once
  - Foundation Models for Generalist Geospatial Artificial Intelligence
  - InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition
  - The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
  - Meta-Transformer: A Unified Framework for Multimodal Learning
  - Retentive Network: A Successor to Transformer for Large Language Models
  - Neural World Models for Computer Vision
  - Recognize Anything: A Strong Image Tagging Model
  - Towards Visual Foundation Models of Physical Scenes - purpose visual representations of physical scenes
  - LIMA: Less Is More for Alignment
  - PaLM 2 Technical Report
  - IMAGEBIND: One Embedding Space To Bind Them All
  - Images Speak in Images: A Generalist Painter for In-Context Visual Learning
  - UniDector: Detecting Everything in the Open World: Towards Universal Object Detection
  - Unmasked Teacher: Towards Training-Efficient Video Foundation Models
  - Toward Building General Foundation Models for Language, Vision, and Vision-Language Understanding Tasks
  - EVA-CLIP: Improved Training Techniques for CLIP at Scale
  - EVA-02: A Visual Representation for Neon Genesis
  - EVA-01: Exploring the Limits of Masked Visual Representation Learning at Scale
  - LLaMA: Open and Efficient Foundation Language Models
  - The effectiveness of MAE pre-pretraining for billion-scale pretraining
  - BloombergGPT: A Large Language Model for Finance
  - BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
  - Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
  - UNINEXT: Universal Instance Perception as Object Discovery and Retrieval
  - InternVideo: General Video Foundation Models via Generative and Discriminative Learning
  - InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
  - BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning
  - Foundation Models for Generalist Geospatial Artificial Intelligence
  - InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition
  - Retentive Network: A Successor to Transformer for Large Language Models
  - Recognize Anything: A Strong Image Tagging Model
  - IMAGEBIND: One Embedding Space To Bind Them All
  - SegGPT: Segmenting Everything In Context
  - Images Speak in Images: A Generalist Painter for In-Context Visual Learning
  - UniDector: Detecting Everything in the Open World: Towards Universal Object Detection
  - Unmasked Teacher: Towards Training-Efficient Video Foundation Models
  - Visual Prompt Multi-Modal Tracking
  - Toward Building General Foundation Models for Language, Vision, and Vision-Language Understanding Tasks
  - EVA-CLIP: Improved Training Techniques for CLIP at Scale
  - EVA-02: A Visual Representation for Neon Genesis
  - LLaMA: Open and Efficient Foundation Language Models
  - The effectiveness of MAE pre-pretraining for billion-scale pretraining
  - BloombergGPT: A Large Language Model for Finance
  - BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
  - UNINEXT: Universal Instance Perception as Object Discovery and Retrieval
  - InternVideo: General Video Foundation Models via Generative and Discriminative Learning
  - InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
  - BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning
  - SAM: Segment Anything - anything.svg?style=social&label=Star)](https://github.com/facebookresearch/segment-anything)
  - BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
  - Neural World Models for Computer Vision
  - Towards Visual Foundation Models of Physical Scenes - purpose visual representations of physical scenes
  - Mamba: Linear-Time Sequence Modeling with Selective State Spaces - sized Transformers while scaling linearly with sequence length. from CMU)
  - FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects
  - FLIP: Scaling Language-Image Pre-training via Masking
  - GPT-4 Technical Report
  - The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
  - SEEM: Segment Everything Everywhere All at Once - Madison, HKUST, and Microsoft) [![Star](https://img.shields.io/github/stars/UX-Decoder/Segment-Everything-Everywhere-All-At-Once.svg?style=social&label=Star)](https://github.com/UX-Decoder/Segment-Everything-Everywhere-All-At-Once)
  - BioCLIP: A Vision Foundation Model for the Tree of Life
  - LIMA: Less Is More for Alignment
  - LLaMA 2: Open Foundation and Fine-Tuned Chat Models
  - Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
- 2021
  - Unifying Vision-and-Language Tasks via Text Generation - Chapel Hill)
  - Unifying Vision-and-Language Tasks via Text Generation - Chapel Hill)
  - UniT: Multimodal Multitask Learning with a Unified Transformer
  - WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training - scale Chinese multimodal pre-training model called BriVL; from Renmin University of China)
  - Codex: Evaluating Large Language Models Trained on Code
  - Florence: A New Foundation Model for Computer Vision
  - DALL-E: Zero-Shot Text-to-Image Generation
  - Multimodal Few-Shot Learning with Frozen Language Models
  - Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
  - An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale - attention blocks; ICLR, from Google)
  - An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale - attention blocks; ICLR, from Google)
  - WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training - scale Chinese multimodal pre-training model called BriVL; from Renmin University of China)
  - ALIGN: Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
  - DALL-E: Zero-Shot Text-to-Image Generation
  - Codex: Evaluating Large Language Models Trained on Code
- Before 2021
  - UNITER: UNiversal Image-TExt Representation Learning
  - BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  - Attention Is All You Need
  - LXMERT: Learning Cross-Modality Encoder Representations from Transformers - Chapel Hill)
  - GPT-3: Language Models are Few-Shot Learners - context learning compared with GPT-2; from OpenAI)
Topics
- Large Language Models (LLM)
  - LLaMA 2: Open Foundation and Fine-Tuned Chat Models
  - GPT-3: Language Models are Few-Shot Learners - context learning compared with GPT-2; from OpenAI)
- Training Efficiency
  - The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
- Towards Artificial General Intelligence (AGI)
  - Towards AGI in Computer Vision: Lessons Learned from GPT and Large Language Models
- Large Language Models
  - GPT-4 Technical Report
- Perception Tasks: Detection, Segmentation, and Pose Estimation
  - SEEM: Segment Everything Everywhere All at Once - Madison, HKUST, and Microsoft)
  - SegGPT: Segmenting Everything In Context
- Vision-Language Pretraining
  - BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models - the-shelf frozen vision and language models. from Salesforce Research)
Papers by Topic
- Large Benchmarks
- Large Language/Multimodal Models
  - GPT-3: Language Models are Few-Shot Learners - context learning compared with GPT-2; from OpenAI)
  - GPT-2: Language Models are Unsupervised Multitask Learners
  - LLaVA: Visual Instruction Tuning - Madison) [![Star](https://img.shields.io/github/stars/haotian-liu/LLaVA.svg?style=social&label=Star)](https://github.com/haotian-liu/LLaVA)
  - MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models - CAIR/MiniGPT-4.svg?style=social&label=Star)](https://github.com/Vision-CAIR/MiniGPT-4)
  - GPT: Improving Language Understanding by Generative Pre-Training
  - T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer - research/text-to-text-transfer-transformer.svg?style=social&label=Star)](https://github.com/google-research/text-to-text-transfer-transformer)
- Vision-Language Pretraining
- AI Safety and Responsibility
  - Bounding the probability of harm from an AI to create a guardrail
  - Managing Extreme AI Risks amid Rapid Progress
- Linear Attention
  - FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning - AILab/flash-attention.svg?style=social&label=Star)](https://github.com/Dao-AILab/flash-attention)
  - FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness - AILab/flash-attention.svg?style=social&label=Star)](https://github.com/Dao-AILab/flash-attention)
Related Awesome Repositories
- AI Safety and Responsibility
  - Awesome-Diffusion-Models - usion/Awesome-Diffusion-Models.svg?style=social&label=Star)](https://github.com/diff-usion/Awesome-Diffusion-Models)
  - Awesome-Video-Diffusion-Models - Video-Diffusion-Models.svg?style=social&label=Star)](https://github.com/ChenHsing/Awesome-Video-Diffusion-Models)
  - Awesome-Diffusion-Model-Based-Image-Editing-Methods - Diffusion-Model-Based-Image-Editing-Methods.svg?style=social&label=Star)](https://github.com/SiatMMLab/Awesome-Diffusion-Model-Based-Image-Editing-Methods)
  - Awesome-CV-Foundational-Models - CV-Foundational-Models.svg?style=social&label=Star)](https://github.com/awaisrauf/Awesome-CV-Foundational-Models)
  - Awesome-Healthcare-Foundation-Models - Qiu/Awesome-Healthcare-Foundation-Models.svg?style=social&label=Star)](https://github.com/Jianing-Qiu/Awesome-Healthcare-Foundation-Models)
  - awesome-large-multimodal-agents - large-multimodal-agents.svg?style=social&label=Star)](https://github.com/jun0wanan/awesome-large-multimodal-agents)
  - Computer Vision in the Wild (CVinW) - Vision-in-the-Wild/CVinW_Readings.svg?style=social&label=Star)](https://github.com/Computer-Vision-in-the-Wild/CVinW_Readings)

Programming Languages

Jupyter Notebook 1 HTML 1

Categories

Papers by Date 315 Survey 108 Papers by Topic 18 Topics 8 Related Awesome Repositories 7

Sub Categories

2024 123 2022 123 2023 73 Before 2024 69 2021 15 AI Safety and Responsibility 9 Large Language/Multimodal Models 6 Vision-Language Pretraining 5 Before 2021 5 Large Benchmarks 4 Large Language Models (LLM) 2 Linear Attention 2 Perception Tasks: Detection, Segmentation, and Pose Estimation 2 Towards Artificial General Intelligence (AGI) 1 Training Efficiency 1 Large Language Models 1

Keywords

diffusion-models 2 video-editing 1 video-diffusion-model 1 video-diffusion 1 video 1 text-to-video 1 survey 1 diffusion 1 awesome-list 1 awesome 1 score-matching 1 score-based 1 machine-learning 1 generative-model 1 artificial-intelligence 1