Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Awesome-Reasoning-Foundation-Models
✨✨Latest Papers and Benchmarks in Reasoning with Foundation Models
https://github.com/reasoning-survey/Awesome-Reasoning-Foundation-Models
- **A Survey of Reasoning with Foundation Models**
- Chuanyang Zheng, Enze Xie, Zhengying Liu, Hongyang Li, Wenhai Wang, Jie Fu, Junxian He, Wu Yuan, Qi Liu, Xihui Liu, Yu Li, Hao Dong, Yu Cheng, Ming Zhang, Pheng Ann Heng, Jifeng Dai, Ping Luo, Jingdong Wang, Ji-Rong Wen, Xipeng Qiu, Yike Guo, Hui Xiong, Qun Liu, Zhenguo Li
- Mistral 7B
- Qwen Technical Report
- Llama 2: Open Foundation and Fine-Tuned Chat Models
- PaLM 2 Technical Report
- PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing
- GPT-4 Technical Report
- LLaMA: Open and Efficient Foundation Language Models
- PaLM: Scaling Language Modeling with Pathways
- Finetuned Language Models Are Zero-Shot Learners
- Evaluating Large Language Models Trained on Code
- Language Models are Few-Shot Learners
- PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Explain Any Concept: Segment Anything Meets Concept-Based Explanation
- Segment and Track Anything
- SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model
- Edit Everything: A Text-Guided Generative System for Images Editing
- Inpaint Anything: Segment Anything Meets Image Inpainting
- VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
- VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
- Resolution-robust Large Mask Inpainting with Fourier Convolutions
- Gemini: A Family of Highly Capable Multimodal Models
- Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
- InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
- Caption Anything: Interactive Image Description with Diverse Multimodal Controls
- Scalable Mask Annotation for Video Text Spotting
- Text2Seg: Remote Sensing Image Semantic Segmentation via Text-Guided Visual Foundation Models
- MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
- Visual Instruction Tuning
- CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks
- One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale
- GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis
- BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
- From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models
- CoCa: Contrastive Captioners are Image-Text Foundation Models
- BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
- Learning to Prompt for Vision-Language Models
- Learning Transferable Visual Models From Natural Language Supervision
- Solving Quantitative Reasoning Problems with Language Models
- Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
- Large Language Models are Zero-Shot Reasoners
- STaR: Bootstrapping Reasoning With Reasoning
- MWP-BERT: Numeracy-Augmented Pre-training for Math Word Problem Solving
- Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems
- Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models
- Large Language Models as Commonsense Knowledge for Large-Scale Task Planning
- Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles
- Language Models of Code are Few-Shot Commonsense Learners
- A Systematic Investigation of Commonsense Knowledge in Large Language Models
- Go Beyond Plain Fine-tuning: Improving Pretrained Models for Social Commonsense
- Explain Yourself! Leveraging Language Models for Commonsense Reasoning
- CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge
- ConceptNet 5.5: An Open Multilingual Graph of General Knowledge
- NEWTON: Are Large Language Models Capable of Physical Reasoning?
- PACS: A Dataset for Physical Audiovisual CommonSense Reasoning
- Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language
- ESPRIT: Explaining Solutions to Physical Reasoning Tasks
- PIQA: Reasoning about Physical Commonsense in Natural Language
- Things not Written in Text: Exploring Spatial Commonsense from Visual Signals
- PROST: Physical Reasoning of Objects through Space and Time
- GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering
- Probing Physical Reasoning with Counter-Commonsense Context
- LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond
- UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark
- Differentiable Open-Ended Commonsense Reasoning
- CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning
- Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning
- Abductive Commonsense Reasoning
- PHYRE: A New Benchmark for Physical Reasoning
- WinoGrande: An Adversarial Winograd Schema Challenge at Scale
- MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms
- HellaSwag: Can a Machine Really Finish Your Sentence?
- SocialIQA: Commonsense Reasoning about Social Interactions
- SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference
- MathVista: Evaluating Math Reasoning in Visual Contexts with GPT-4V, Bard, and Other Large Multimodal Models
- MultiHiertt: Numerical Reasoning over Multi Hierarchical Tabular and Textual Data
- MultiModalQA: Complex Question Answering over Text, Tables and Images
- Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems
- Deep Learning in Neural Networks: An Overview
- Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- Are NLP Models really able to Solve Simple Math Word Problems?
- Measuring Mathematical Problem Solving With the MATH Dataset
- How well do Computers Solve Math Word Problems? Large-Scale Dataset Construction and Evaluation
- Learn to Solve Algebra Word Problems Using Quadratic Programming
- Learning to Automatically Solve Algebra Word Problems
- UniGeo: Unifying Geometry Logical Reasoning via Reformulating Mathematical Expression
- GeoQA: A Geometric Question Answering Benchmark Towards Multimodal Numerical Reasoning
- Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning
- Solving Geometry Problems: Combining Text and Diagram Interpretation
- LEGO-Prover: Neural Theorem Proving with Growing Libraries
- Lyra: Orchestrating Dual Correction in Automated Theorem Proving
- DT-Solver: Automated Theorem Proving with Dynamic-Tree Sampling Guided by Proof-level Value Function
- Decomposing the Enigma: Subgoal-based Demonstration Learning for Formal Theorem Proving
- Magnushammer: A Transformer-based Approach to Premise Selection
- Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs
- Learning to Find Proofs and Theorems by Learning to Refine Search Strategies: The Case of Loop Invariant Synthesis
- Autoformalization with Large Language Models
- HyperTree Proof Search for Neural Theorem Proving
- Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers
- Formal Mathematics Statement Curriculum Learning
- The Lean 4 Theorem Prover and Programming Language
- TacticZero: Learning to Prove Theorems from Scratch with Deep Reinforcement Learning
- Proof Artifact Co-training for Theorem Proving with Language Models
- Generative Language Modeling for Automated Theorem Proving
- Formal Verification of Hardware Components in Critical Systems
- Learning to Prove Theorems via Interacting with Proof Assistants
- A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play
- TacticToe: Learning to Prove with Tactics
- SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models
- Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
- ScienceWorld: Is your Agent Smarter than a 5th Grader?
- Guiding Mathematical Reasoning via Mastering Commonsense Formula Knowledge
- ARB: Advanced Reasoning Benchmark for Large Language Models
- SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks
- TheoremQA: A Theorem-driven Question Answering dataset
- Language Models are Multilingual Chain-of-Thought Reasoners
- Training Verifiers to Solve Math Word Problems
- IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning
- FinQA: A Dataset of Numerical Reasoning over Financial Data
- Program Synthesis with Large Language Models
- HiTab: A Hierarchical Table Dataset for Question Answering and Natural Language Generation
- Evaluating Large Language Models Trained on Code
- A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers
- AIT-QA: Question Answering Dataset over Complex Tables in the Airline Industry
- Measuring Coding Challenge Competence With APPS
- TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance
- Are NLP Models really able to Solve Simple Math Word Problems?
- TSQA: Tabular Scenario Based Question Answering
- Semantically-Aligned Universal Tree-Structured Solver for Math Word Problems
- HybridQA: A Dataset of Multi-Hop Question Answering over Tabular and Textual Data
- DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs
- Natural Questions: A Benchmark for Question Answering Research
- HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
- Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task
- The Web as a Knowledge-base for Answering Complex Questions
- Variational Reasoning for Question Answering with Knowledge Graph
- From Textbooks to Knowledge: A Case Study in Harvesting Axiomatic Knowledge from Textbooks to Solve Geometry Problems
- Deep Neural Solver for Math Word Problems
- Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning
- Learning to Solve Geometry Problems from Natural Language Demonstrations in Textbooks
- TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
- Annotating Derivations: A New Evaluation Strategy and Dataset for Algebra Word Problems
- The Value of Semantic Parse Labeling for Knowledge Base Question Answering
- SQuAD: 100,000+ Questions for Machine Comprehension of Text
- Key-Value Memory Networks for Directly Reading Documents
- MAWPS: A Math Word Problem Repository
- Automatically Solving Number Word Problems by Semantic Parsing and Reasoning
- Compositional Semantic Parsing on Semi-Structured Tables
- Parsing Algebraic Word Problems into Equations
- Learning to Solve Arithmetic Word Problems with Verb Categorization
- Semantic Parsing on Freebase from Question-Answer Pairs
- Large-scale Semantic Parsing via Schema Matching and Lexicon Extension
- Cluster Ensembles - A Knowledge Reuse Framework for Combining Multiple Partitions
- The ATIS Spoken Language Systems Pilot Corpus
- Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models
- LogicLLM: Exploring Self-supervised Logic-enhanced Training for Large Language Models
- Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning
- Explicit Planning Helps Language Models in Logical Reasoning
- Sparks of Artificial General Intelligence: Early experiments with GPT-4
- Entailer: Answering Questions with Faithful and Truthful Chains of Reasoning
- Weakly Supervised Neural Symbolic Learning for Cognitive Tasks
- NeuPSL: Neural Probabilistic Soft Logic
- Generating Natural Language Proofs with Verifier-Guided Search
- Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
- Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning
- MERIt: Meta-Path Guided Contrastive Learning for Logical Reasoning
- Self-Consistency Improves Chain of Thought Reasoning in Language Models
- Neuro-Symbolic Program Search for Autonomous Driving Decision Module Design
- Neural probabilistic logic programming in DeepProbLog
- Abductive Learning with Ground Knowledge Base
- Logic-Driven Context Extension and Data Augmentation for Logical Reasoning of Text
- Transformers as Soft Reasoners over Language
- Neural Module Networks for Reasoning over Text
- The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision
- Inductive logic programming at 30
- Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought
- FOLIO: Natural Language Reasoning with First-Order Logic
- ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language
- Causal Parrots: Large Language Models May Talk Causality But Are Not Causal
- Causal Discovery with Language Models as Imperfect Experts
- From Query Tools to Causal Architects: Harnessing Large Language Models for Advanced Causal Discovery from Data
- Can Large Language Models Infer Causation from Correlation?
- The Magic of IF: Investigating Causal Reasoning Abilities in Large Language Models of Code
- Understanding Causality with Large Language Models: Feasibility and Opportunities
- Causal Reasoning and Large Language Models: Opening a New Frontier for Causality
- Can large language models build causal graphs?
- Causal-Discovery Performance of ChatGPT in the context of Neuropathic Pain Diagnosis
- Probing for Correlations of Causal Facts: Large Language Models and Causality
- Can Large Language Models Distinguish Cause from Effect?
- Learning Faithful Representations of Causal Graphs
- InferBERT: A Transformer-Based Causal Inference Framework for Enhancing Pharmacovigilance
- Towards Causal Representation Learning
- CausaLM: Causal Model Explanation Through Counterfactual Language Models
- Neuropathic Pain Diagnosis Simulator for Causal Discovery Algorithm Evaluation
- Elements of Causal Inference: Foundations and Learning Algorithms
- Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks
- Counterfactual reasoning: Testing language models' understanding of hypothetical scenarios
- CRASS: A Novel Data Set and Benchmark to Test Counterfactual Reasoning of Large Language Models
- Benchmarking of Data-Driven Causality Discovery Approaches in the Interactions of Arctic Sea Ice and Atmosphere
- Distinguishing cause from effect using observational data: methods and benchmarks
- Perceive, Ground, Reason, and Act: A Benchmark for General-purpose Visual Representation
- VLGrammar: Grounded Grammar Induction of Vision and Language
- Attention over learned object embeddings enables complex visual reasoning
- PointLLM: Empowering Large Language Models to Understand Point Clouds
- 3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment
- 3D-LLM: Injecting the 3D World into Large Language Models
- SQA3D: Situated Question Answering in 3D Scenes
- PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning
- OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge
- CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
- M2UGen: Multi-modal Music Understanding and Generation with the Power of Large Language Models
- Music Understanding LLaMA: Advancing Text-to-Music Generation with Question Answering and Captioning
- Self-Supervised Speech Representation Learning: A Review
- SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities
- data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
- WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
- HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
- SUPERB: Speech processing Universal PERformance Benchmark
- Speech SIMCLR: Combining Contrastive and Reconstruction Objective for Self-supervised Speech Representation Learning
- wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
- Conformer: Convolution-augmented Transformer for Speech Recognition
- Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders
- An Unsupervised Autoregressive Model for Speech Representation Learning
- Representation Learning with Contrastive Predictive Coding
- Neural Discrete Representation Learning
- Large-Scale Domain Adaptation via Teacher-Student Learning
- SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities
- XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale
- SUPERB: Speech processing Universal PERformance Benchmark
- MLS: A Large-Scale Multilingual Dataset for Speech Research
- A Further Study of Unsupervised Pre-training for Transformer Based Speech Recognition
- Libri-Light: A Benchmark for ASR with Limited or No Supervision
- Common Voice: A Massively-Multilingual Speech Corpus
- BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
- Kosmos-2: Grounding Multimodal Large Language Models to the World
- BiomedGPT: A Unified and Generalist Biomedical Generative Pre-trained Transformer for Vision, Language, and Multimodal Tasks
- Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
- Language Is Not All You Need: Aligning Perception with Language Models
- Flamingo: a Visual Language Model for Few-Shot Learning
- MAGMA -- Multimodal Augmentation of Generative Models through Adapter-based Finetuning
- Q-Bench: A Benchmark for General-Purpose Foundation Models on Low-level Vision
- DetGPT: Detect What You Need via Reasoning
- DePlot: One-shot visual language reasoning by plot-to-table translation
- MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering
- LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models
- LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark
- On Evaluating Adversarial Robustness of Large Vision-Language Models
- Evaluating Object Hallucination in Large Vision-Language Models
- On the Hidden Mystery of OCR in Large Multimodal Models
- Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training
- Evaluating Understanding on Conceptual Abstraction Benchmarks
- Communicating Natural Programs to Humans and Machines
- CIDEr: Consensus-based Image Description Evaluation
- Vision-Language Foundation Models as Effective Robot Imitators
- RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
- Reasoning with Language Model is Planning with World Model
- RT-1: Robotics Transformer for Real-World Control at Scale
- Skill Induction and Planning with Latent Language
- A Generalist Agent
- Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
- Pre-Trained Language Models for Interactive Decision-Making
- Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents
- Value Function Spaces: Skill-Centric State Abstractions for Long-Horizon Reasoning
- Visually-Grounded Planning without Vision: Language Models Infer Detailed Plans from High-level Instructions
- PAL: Program-aided Language Models
- ProgPrompt: Generating Situated Robot Task Plans using Large Language Models
- Code as Policies: Language Model Programs for Embodied Control
- Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
- Statler: State-Maintaining Language Models for Embodied Reasoning
- Collaborating with language models for embodied reasoning
- Toolformer: Language Models Can Teach Themselves to Use Tools
- LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models
- ReAct: Synergizing Reasoning and Acting in Language Models
- Measuring and Narrowing the Compositionality Gap in Language Models
- Inner Monologue: Embodied Reasoning through Planning with Language Models
- Federated Large Language Model: A Position Paper
- Self-Adaptive Large Language Model (LLM)-Based Multiagent Systems
- Building Cooperative Embodied Agents Modularly with Large Language Models
- Improving Factuality and Reasoning in Language Models through Multiagent Debate
- DriveLM: Driving with Graph Visual Question Answering
- LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding
- DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving
- LMDrive: Closed-Loop End-to-End Driving with Large Language Models
- Driving through the Concept Gridlock: Unraveling Explainability Bottlenecks in Automated Driving
- Vision Language Models in Autonomous Driving and Intelligent Transportation Systems
- DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model
- MotionLM: Multi-Agent Motion Forecasting as Language Modeling
- End-to-end Autonomous Driving: Challenges and Frontiers
- Graph-based Topology Reasoning for Driving Scenes
- Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe
- DriveLM: Driving with Graph Visual Question Answering
- Language Prompt for Autonomous Driving
- Language Conditioned Traffic Generation
- NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario
- BEHAVIOR-1K: A Benchmark for Embodied AI with 1,000 Everyday Activities and Realistic Simulation
- iGibson 2.0: Object-Centric Simulation for Robot Learning of Everyday Household Tasks
- Habitat 2.0: Training Home Assistants to Rearrange their Habitat
- RoboTHOR: An Open Simulation-to-Real Embodied AI Platform
- Grounding Human-to-Vehicle Advice for Self-driving Vehicles
- Habitat: A Platform for Embodied AI Research
- Gibson Env: Real-World Perception for Embodied Agents
- VirtualHome: Simulating Household Activities via Programs
- Theory of Mind Might Have Spontaneously Emerged in Large Language Models
- Large Language Models Are Not Strong Abstract Reasoners
- BoardgameQA: A Dataset for Natural Language Reasoning with Contradictory Information
- Think about it! Improving defeasible reasoning by first modeling the question scenario
- Thinking Like a Skeptic: Defeasible Inference in Natural Language
- KACC: A Multi-task Benchmark for Knowledge Abstraction, Concretization and Completion
- Prot2Text: Multimodal Protein's Function Generation with GNNs and Transformers
- Uni-RNA: Universal Pre-Trained Models Revolutionize RNA Research
- HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution
- DrugGPT: A GPT-based Strategy for Designing Potential Ligands Targeting Specific Proteins
- GeneGPT: Augmenting Large Language Models with Domain Tools for Improved Access to Biomedical Information
- ProGen2: Exploring the Boundaries of Protein Language Models
- Large Language Models Are Reasoning Teachers
- PlaTe: Visually-Grounded Planning with Transformers in Procedural Tasks
- The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset
- The Pile: An 800GB Dataset of Diverse Text for Language Modeling
- Recipes for building an open-domain chatbot
- CLUE: A Chinese Language Understanding Evaluation Benchmark
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
- Complexity of Word Collocation Networks: A Preliminary Structural Analysis
- MOFI: Learning Image Representations from Noisy Entity Annotated Images
- Revisiting Weakly Supervised Pre-Training of Visual Perception Models
- ImageNet-21K Pretraining for the Masses
- Revisiting Unreasonable Effectiveness of Data in Deep Learning Era
- ImageNet Large Scale Visual Recognition Challenge
- Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following
- ImageBind: One Embedding Space To Bind Them All
- DataComp: In search of the next generation of multimodal datasets
- LAION-5B: An open large-scale dataset for training next generation image-text models
- Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP
- Flamingo: a Visual Language Model for Few-Shot Learning
- RedCaps: web-curated image-text data created by the people, for the people
- LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
- WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning
- Im2Text: Describing Images Using 1 Million Captioned Photographs
- Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Attention Is All You Need
- Llama 2: Open Foundation and Fine-Tuned Chat Models
- LLaMA: Open and Efficient Foundation Language Models
- BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
- GLM-130B: An Open Bilingual Pre-trained Model
- OPT: Open Pre-trained Transformer Language Models
- Scaling Language Models: Methods, Analysis & Insights from Training Gopher
- Language Models are Few-Shot Learners
- Improving CLIP Training with Language Rewrites
- DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment
- Scaling Language-Image Pre-training via Masking
- DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection
- K-LITE: Learning Transferable Visual Models with External Knowledge
- FILIP: Fine-grained Interactive Language-Image Pre-Training
- Learning Transferable Visual Models From Natural Language Supervision
- Efficient Streaming Language Models with Attention Sinks
- Retentive Network: A Successor to Transformer for Large Language Models
- LongNet: Scaling Transformers to 1,000,000,000 Tokens
- RWKV: Reinventing RNNs for the Transformer Era
- Hyena Hierarchy: Towards Larger Convolutional Language Models
- Hungry Hungry Hippos: Towards Language Modeling with State Space Models
- Long Range Language Modeling via Gated State Spaces
- Diagonal State Spaces are as Effective as Structured State Spaces
- Efficiently Modeling Long Sequences with Structured State Spaces
- MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
- MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning
- WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
- Scaling Relationship on Learning Mathematical Reasoning with Large Language Models
- Let's Verify Step by Step
- Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
- Specializing Smaller Language Models towards Multi-Step Reasoning
- Large Language Models Are Reasoning Teachers
- Teaching Small Language Models to Reason
- Large Language Models Can Self-Improve
- Explanations from Large Language Models Make Small Reasoners Better
- LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
- AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning
- Towards a Unified View of Parameter-Efficient Transfer Learning
- Compacter: Efficient Low-Rank Hypercomplex Adapter Layers
- MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer
- Parameter-Efficient Transfer Learning for NLP
- QLoRA: Efficient Finetuning of Quantized LLMs
- Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning
- KronA: Parameter Efficient Tuning with Kronecker Adapter
- DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation
- LoRA: Low-Rank Adaptation of Large Language Models
- P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks
- The Power of Scale for Parameter-Efficient Prompt Tuning
- Factual Probing Is [MASK]: Learning vs. Learning to Recall
- GPT Understands, Too
- Prefix-Tuning: Optimizing Continuous Prompts for Generation
- DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning
- Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning
- Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning
- BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models
- Improved Baselines with Visual Instruction Tuning
- Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models
- LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model
- Visual Instruction Tuning
- Towards Efficient Visual Adaption via Structural Re-parameterization
- LongForm: Optimizing Instruction Tuning for Long Text Generation with Corpus Extraction
- Chinese Open Instruction Generalist: A Preliminary Release
- OpenAssistant Conversations -- Democratizing Large Language Model Alignment
- The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
- Crosslingual Generalization through Multitask Finetuning
- Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
- ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning
- MetaICL: Learning to Learn In Context
- Multitask Prompted Training Enables Zero-Shot Task Generalization
- CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP
- Cross-Task Generalization via Natural Language Crowdsourcing Instructions
- UnifiedQA: Crossing Format Boundaries With a Single QA System
- Self-Alignment with Instruction Backtranslation
- Dynosaur: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation
- Enhancing Chat Language Models by Scaling High-quality Instructional Conversations
- The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning
- CoEdIT: Text Editing by Task-Specific Instruction Tuning
- LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions
- Instruction Tuning with GPT-4
- OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization
- Self-Instruct: Aligning Language Models with Self-Generated Instructions
- Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor
- Fine-Tuning Language Models with Advantage-Induced Policy Alignment
- RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
- Training language models to follow instructions with human feedback
- Preference Ranking Optimization for Human Alignment
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
- RRHF: Rank Responses to Align Language Models with Human Feedback without tears
- Calibrating Sequence likelihood Improves Conditional Language Generation
- An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training
- Mixed Autoencoder for Self-supervised Visual Representation Learning
- Mod-Squad: Designing Mixture of Experts As Modular Multi-Task Learners
- MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation
- GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
- Go Wider Instead of Deeper
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
- GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
- Scaling Instruction-Finetuned Language Models
- Language Models are Few-Shot Learners
- Diverse Demonstrations Improve In-context Compositional Generalization
- Complementary Explanations for Effective In-Context Learning
- Automatic Chain of Thought Prompting in Large Language Models
- Complexity-Based Prompting for Multi-Step Reasoning
- Does GPT-3 Generate Empathetic Dialogues? A Novel In-Context Example Selection Method and Automatic Evaluation Metric for Empathetic Dialogue Generation
- Selective Annotation Makes Language Models Better Few-Shot Learners
- What Makes Good In-Context Examples for GPT-3?
- DQ-LoRe: Dual Queries with Low Rank Approximation Re-ranking for In-Context Learning
- Learning to Retrieve In-Context Examples for Large Language Models
- Dr.ICL: Demonstration-Retrieved In-context Learning
- Finding Support Examples for In-Context Learning
- Compositional Exemplars for In-context Learning
- Learning To Retrieve Prompts for In-Context Learning
- Enhancing Zero-Shot Chain-of-Thought Reasoning in Large Language Models through Logic
- Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models
- Large Language Models are Zero-Shot Reasoners
- Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding
- Code Prompting: a Neural Symbolic Method for Complex Reasoning in Large Language Models
- Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Large Language Models
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models
- MathPrompter: Mathematical Reasoning using Large Language Models
- Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks
- PAL: Program-aided Language Models
- Automatic Chain of Thought Prompting in Large Language Models
- Complexity-Based Prompting for Multi-Step Reasoning
- Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- Reasoning with Language Model is Planning with World Model
- Automatic Model Selection with Large Language Models for Reasoning
- Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning and Coding with LLMs
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models
- Large Language Model Guided Tree-of-Thought
- Self-Evaluation Guided Beam Search for Reasoning
- Complexity-Based Prompting for Multi-Step Reasoning
- Making Large Language Models Better Reasoners with Step-Aware Verifier
- Self-Consistency Improves Chain of Thought Reasoning in Language Models
- Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback
- Generating Sequences by Learning to Self-Correct
- PEER: A Collaborative Language Model
- Read, Revise, Repeat: A System Demonstration for Human-in-the-loop Iterative Text Revision
- Think about it! Improving defeasible reasoning by first modeling the question scenario
- Graph-based, Self-Supervised Program Repair from Diagnostic Feedback
- InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback
- Is Self-Repair a Silver Bullet for Code Generation?
- Improving Factuality and Reasoning in Language Models through Multiagent Debate
- CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing
- Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback
- Self-Edit: Fault-Aware Code Editor for Code Generation
- Progressive-Hint Prompting Improves Reasoning in Large Language Models
- Self-collaboration Code Generation via ChatGPT
- Teaching Large Language Models to Self-Debug
- REFINER: Reasoning Feedback on Intermediate Representation
- Self-Refine: Iterative Refinement with Self-Feedback
- Guiding Language Model Reasoning with Planning Tokens
- AutoAgents: A Framework for Automatic Agent Generation
- AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn
- SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks
- MultiTool-CoT: GPT-3 Can Use Multiple External Tools with Chain of Thought Prompting
- Voyager: An Open-Ended Embodied Agent with Large Language Models
- ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models
- CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models
- Making Language Models Better Tool Learners with Execution Feedback
- ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings
- Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models
- OpenAGI: When LLM Meets Domain Experts
- CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society
- HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
- Reflexion: Language Agents with Verbal Reinforcement Learning
- ART: Automatic multi-step reasoning and tool-use for large language models
- Toolformer: Language Models Can Teach Themselves to Use Tools
- Visual Programming: Compositional visual reasoning without training
- ReAct: Synergizing Reasoning and Acting in Language Models
Keywords
- large-language-models (14)
- llm (10)
- natural-language-processing (5)
- deep-learning (5)
- multimodal (4)
- chain-of-thought (4)
- pytorch (4)
- in-context-learning (3)
- instruction-tuning (3)
- semantic-segmentation (3)
- sam (3)
- dataset (3)
- chatgpt (3)
- segment-anything-model (3)
- machine-learning (3)
- self-supervised-learning (3)
- pretrained-models (3)
- chatbot (3)
- vision-language-model (3)
- commonsense-reasoning (3)
- vision-language (3)
- image-classification (3)
- nlp (3)
- pre-training (2)
- rlhf (2)
- awesome-list (2)
- computer-vision (2)
- gpt-3 (2)
- visual-question-answering (2)
- prompt-engineering (2)
- reasoning (2)
- llm-inference (2)
- chinese (2)
- flash-attention (2)
- artificial-intelligence (2)
- fine-tuning-llm (2)
- gpt (2)
- image-text-retrieval (2)
- long-context (2)
- action-recognition (2)
- vision-transformer (2)
- foundation-model (2)
- multi-modality (2)
- multi-modal (2)
- video-understanding (2)
- gpt-4 (2)
- multimodal-large-language-models (2)
- vision-language-transformer (2)
- segment-anything (2)
- object-detection (2)