Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Awesome-Reasoning-Foundation-Models
✨✨Latest Papers and Benchmarks in Reasoning with Foundation Models
https://github.com/reasoning-survey/Awesome-Reasoning-Foundation-Models
-
1 Relevant Surveys and Links
-
3 Reasoning Tasks
-
3.7 Multimodal Reasoning
- BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
- Kosmos-2: Grounding Multimodal Large Language Models to the World
- BiomedGPT: A Unified and Generalist Biomedical Generative Pre-trained Transformer for Vision, Language, and Multimodal Tasks
- Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
- Language Is Not All You Need: Aligning Perception with Language Models
- MAGMA -- Multimodal Augmentation of Generative Models through Adapter-based Finetuning
- Q-Bench: A Benchmark for General-Purpose Foundation Models on Low-level Vision
- DetGPT: Detect What You Need via Reasoning
- DePlot: One-shot visual language reasoning by plot-to-table translation
- MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering
- LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models
- LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark
- On Evaluating Adversarial Robustness of Large Vision-Language Models
- Evaluating Object Hallucination in Large Vision-Language Models
- On the Hidden Mystery of OCR in Large Multimodal Models
- Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training
- Evaluating Understanding on Conceptual Abstraction Benchmarks
- Communicating Natural Programs to Humans and Machines
- CIDEr: Consensus-based Image Description Evaluation
-
3.3 Logical Reasoning
- Inductive logic programming at 30
- Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
- Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models
- LogicLLM: Exploring Self-supervised Logic-enhanced Training for Large Language Models
- Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning
- Explicit Planning Helps Language Models in Logical Reasoning
- Sparks of Artificial General Intelligence: Early experiments with GPT-4
- Entailer: Answering Questions with Faithful and Truthful Chains of Reasoning
- Weakly Supervised Neural Symbolic Learning for Cognitive Tasks
- NeuPSL: Neural Probabilistic Soft Logic
- Generating Natural Language Proofs with Verifier-Guided Search
- Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning
- MERIt: Meta-Path Guided Contrastive Learning for Logical Reasoning
- Neuro-Symbolic Program Search for Autonomous Driving Decision Module Design
- Neural probabilistic logic programming in DeepProbLog
- Abductive Learning with Ground Knowledge Base
- Logic-Driven Context Extension and Data Augmentation for Logical Reasoning of Text
- Transformers as Soft Reasoners over Language
- Neural Module Networks for Reasoning over Text
- The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision
- Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought
- FOLIO: Natural Language Reasoning with First-Order Logic
- ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language
- AR-LSAT: Investigating Analytical Reasoning of Text
-
3.1 Commonsense Reasoning
- Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language
- Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models
- Large Language Models as Commonsense Knowledge for Large-Scale Task Planning
- Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles
- Language Models of Code are Few-Shot Commonsense Learners
- A Systematic Investigation of Commonsense Knowledge in Large Language Models
- Go Beyond Plain Fine-tuning: Improving Pretrained Models for Social Commonsense
- Explain Yourself! Leveraging Language Models for Commonsense Reasoning
- CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge
- ConceptNet 5.5: An Open Multilingual Graph of General Knowledge
- NEWTON: Are Large Language Models Capable of Physical Reasoning?
- ESPRIT: Explaining Solutions to Physical Reasoning Tasks
- PIQA: Reasoning about Physical Commonsense in Natural Language
- Things not Written in Text: Exploring Spatial Commonsense from Visual Signals
- PACS: A Dataset for Physical Audiovisual CommonSense Reasoning
- PROST: Physical Reasoning of Objects through Space and Time
- GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering
- Probing Physical Reasoning with Counter-Commonsense Context
- LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond
- UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark
- Differentiable Open-Ended Commonsense Reasoning
- CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning
- Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning
- Abductive Commonsense Reasoning
- PHYRE: A New Benchmark for Physical Reasoning
- WinoGrande: An Adversarial Winograd Schema Challenge at Scale
- MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms
- HellaSwag: Can a Machine Really Finish Your Sentence?
- SocialIQA: Commonsense Reasoning about Social Interactions
- SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference
-
3.2 Mathematical Reasoning
- The Lean 4 Theorem Prover and Programming Language
- Program Induction by Rationale Generation : Learning to Solve and Explain Algebraic Word Problems
- MathVista: Evaluating Math Reasoning in Visual Contexts with GPT-4V, Bard, and Other Large Multimodal Models
- MultiHiertt: Numerical Reasoning over Multi Hierarchical Tabular and Textual Data
- MultiModalQA: Complex Question Answering over Text, Tables and Images
- Deep Learning in Neural Networks: An Overview
- Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning
- Are NLP Models really able to Solve Simple Math Word Problems?
- Measuring Mathematical Problem Solving With the MATH Dataset
- How well do Computers Solve Math Word Problems? Large-Scale Dataset Construction and Evaluation
- Learn to Solve Algebra Word Problems Using Quadratic Programming
- Learning to Automatically Solve Algebra Word Problems
- UniGeo: Unifying Geometry Logical Reasoning via Reformulating Mathematical Expression
- GeoQA: A Geometric Question Answering Benchmark Towards Multimodal Numerical Reasoning
- Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning
- Solving Geometry Problems: Combining Text and Diagram Interpretation
- LEGO-Prover: Neural Theorem Proving with Growing Libraries
- Lyra: Orchestrating Dual Correction in Automated Theorem Proving
- DT-Solver: Automated Theorem Proving with Dynamic-Tree Sampling Guided by Proof-level Value Function
- Decomposing the Enigma: Subgoal-based Demonstration Learning for Formal Theorem Proving
- Magnushammer: A Transformer-based Approach to Premise Selection
- Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs
- Learning to Find Proofs and Theorems by Learning to Refine Search Strategies: The Case of Loop Invariant Synthesis
- Autoformalization with Large Language Models
- HyperTree Proof Search for Neural Theorem Proving
- Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers
- Formal Mathematics Statement Curriculum Learning
- The Lean 4 Theorem Prover and Programming Language
- TacticZero: Learning to Prove Theorems from Scratch with Deep Reinforcement Learning
- Proof Artifact Co-training for Theorem Proving with Language Models
- Generative Language Modeling for Automated Theorem Proving
- Formal Verification of Hardware Components in Critical Systems
- Learning to Prove Theorems via Interacting with Proof Assistants
- A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play
- TacticToe: Learning to Prove with Tactics
- SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models
- Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
- ScienceWorld: Is your Agent Smarter than a 5th Grader?
- ARB: Advanced Reasoning Benchmark for Large Language Models
- TheoremQA: A Theorem-driven Question Answering dataset
- Language Models are Multilingual Chain-of-Thought Reasoners
- Training Verifiers to Solve Math Word Problems
- IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning
- FinQA: A Dataset of Numerical Reasoning over Financial Data
- Program Synthesis with Large Language Models
- HiTab: A Hierarchical Table Dataset for Question Answering and Natural Language Generation
- A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers
- AIT-QA: Question Answering Dataset over Complex Tables in the Airline Industry
- Measuring Coding Challenge Competence With APPS
- TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance
- TSQA: Tabular Scenario Based Question Answering
- Semantically-Aligned Universal Tree-Structured Solver for Math Word Problems
- HybridQA: A Dataset of Multi-Hop Question Answering over Tabular and Textual Data
- DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs
- Natural Questions: A Benchmark for Question Answering Research
- HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
- Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task
- The Web as a Knowledge-base for Answering Complex Questions
- Variational Reasoning for Question Answering with Knowledge Graph
- From Textbooks to Knowledge: A Case Study in Harvesting Axiomatic Knowledge from Textbooks to Solve Geometry Problems
- Deep Neural Solver for Math Word Problems
- Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning
- Learning to Solve Geometry Problems from Natural Language Demonstrations in Textbooks
- TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
- Annotating Derivations: A New Evaluation Strategy and Dataset for Algebra Word Problems
- The Value of Semantic Parse Labeling for Knowledge Base Question Answering
- SQuAD: 100,000+ Questions for Machine Comprehension of Text
- Key-Value Memory Networks for Directly Reading Documents
- MAWPS: A Math Word Problem Repository
- Automatically Solving Number Word Problems by Semantic Parsing and Reasoning
- Compositional Semantic Parsing on Semi-Structured Tables
- Parsing Algebraic Word Problems into Equations
- Learning to Solve Arithmetic Word Problems with Verb Categorization
- Semantic Parsing on Freebase from Question-Answer Pairs
- Large-scale Semantic Parsing via Schema Matching and Lexicon Extension
- The ATIS Spoken Language Systems Pilot Corpus
- Cluster Ensembles - A Knowledge Reuse Framework for Combining Multiple Partitions
- Evaluating Large Language Models Trained on Code
-
3.5 Visual Reasoning
- 3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment
- Perceive, Ground, Reason, and Act: A Benchmark for General-purpose Visual Representation
- VLGrammar: Grounded Grammar Induction of Vision and Language
- Attention over learned object embeddings enables complex visual reasoning
- PointLLM: Empowering Large Language Models to Understand Point Clouds
- 3D-LLM: Injecting the 3D World into Large Language Models
- SQA3D: Situated Question Answering in 3D Scenes
- PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning
- OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge
- CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
-
3.6 Audio Reasoning
- M2UGen: Multi-modal Music Understanding and Generation with the Power of Large Language Models
- Music Understanding LLaMA: Advancing Text-to-Music Generation with Question Answering and Captioning
- Self-Supervised Speech Representation Learning: A Review
- SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities
- data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
- WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
- HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
- SUPERB: Speech processing Universal PERformance Benchmark
- Speech SIMCLR: Combining Contrastive and Reconstruction Objective for Self-supervised Speech Representation Learning
- wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
- Conformer: Convolution-augmented Transformer for Speech Recognition
- Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders
- An Unsupervised Autoregressive Model for Speech Representation Learning
- Representation Learning with Contrastive Predictive Coding
- Neural Discrete Representation Learning
- Large-Scale Domain Adaptation via Teacher-Student Learning
- XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale
- MLS: A Large-Scale Multilingual Dataset for Speech Research
- A Further Study of Unsupervised Pre-training for Transformer Based Speech Recognition
- Libri-Light: A Benchmark for ASR with Limited or No Supervision
- Common Voice: A Massively-Multilingual Speech Corpus
-
3.8 Agent Reasoning
- Vision-Language Foundation Models as Effective Robot Imitators
- RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
- RT-1: Robotics Transformer for Real-World Control at Scale
- Skill Induction and Planning with Latent Language
- A Generalist Agent
- Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
- Pre-Trained Language Models for Interactive Decision-Making
- Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents
- Value Function Spaces: Skill-Centric State Abstractions for Long-Horizon Reasoning
- Visually-Grounded Planning without Vision: Language Models Infer Detailed Plans from High-level Instructions
- ProgPrompt: Generating Situated Robot Task Plans using Large Language Models
- Code as Policies: Language Model Programs for Embodied Control
- Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
- Statler: State-Maintaining Language Models for Embodied Reasoning
- Collaborating with language models for embodied reasoning
- LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models
- Measuring and Narrowing the Compositionality Gap in Language Models
- Inner Monologue: Embodied Reasoning through Planning with Language Models
- Federated Large Language Model: A Position Paper
- Self-Adaptive Large Language Model (LLM)-Based Multiagent Systems
- Building Cooperative Embodied Agents Modularly with Large Language Models
- DriveLM: Driving with Graph Visual Question Answering
- LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding
- DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving
- LMDrive: Closed-Loop End-to-End Driving with Large Language Models
- Driving through the Concept Gridlock: Unraveling Explainability Bottlenecks in Automated Driving
- Vision Language Models in Autonomous Driving and Intelligent Transportation Systems
- DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model
- MotionLM: Multi-Agent Motion Forecasting as Language Modeling
- End-to-end Autonomous Driving: Challenges and Frontiers
- Graph-based Topology Reasoning for Driving Scenes
- Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe
- Language Prompt for Autonomous Driving
- Language Conditioned Traffic Generation
- NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario
- BEHAVIOR-1K: A Benchmark for Embodied AI with 1,000 Everyday Activities and Realistic Simulation
- iGibson 2.0: Object-Centric Simulation for Robot Learning of Everyday Household Tasks
- Habitat 2.0: Training Home Assistants to Rearrange their Habitat
- RoboTHOR: An Open Simulation-to-Real Embodied AI Platform
- Grounding Human-to-Vehicle Advice for Self-driving Vehicles
- Habitat: A Platform for Embodied AI Research
- Gibson Env: Real-World Perception for Embodied Agents
- VirtualHome: Simulating Household Activities via Programs
-
3.9 Other Tasks and Applications
- Theory of Mind Might Have Spontaneously Emerged in Large Language Models
- Large Language Models Are Not Strong Abstract Reasoners
- BoardgameQA: A Dataset for Natural Language Reasoning with Contradictory Information
- Thinking Like a Skeptic: Defeasible Inference in Natural Language
- KACC: A Multi-task Benchmark for Knowledge Abstraction, Concretization and Completion
- Prot2Text: Multimodal Protein's Function Generation with GNNs and Transformers
- Uni-RNA: Universal Pre-Trained Models Revolutionize RNA Research
- HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution
- DrugGPT: A GPT-based Strategy for Designing Potential Ligands Targeting Specific Proteins
- GeneGPT: Augmenting Large Language Models with Domain Tools for Improved Access to Biomedical Information
- ProGen2: Exploring the Boundaries of Protein Language Models
- PlaTe: Visually-Grounded Planning with Transformers in Procedural Tasks
-
3.4 Causal Reasoning
- Understanding Causality with Large Language Models: Feasibility and Opportunities
- Causal Reasoning and Large Language Models: Opening a New Frontier for Causality
- Can large language models build causal graphs?
- Causal-Discovery Performance of ChatGPT in the context of Neuropathic Pain Diagnosis
- Causal Parrots: Large Language Models May Talk Causality But Are Not Causal
- Causal Discovery with Language Models as Imperfect Experts
- From Query Tools to Causal Architects: Harnessing Large Language Models for Advanced Causal Discovery from Data
- Can Large Language Models Infer Causation from Correlation?
- The Magic of IF: Investigating Causal Reasoning Abilities in Large Language Models of Code
- Probing for Correlations of Causal Facts: Large Language Models and Causality
- Can Large Language Models Distinguish Cause from Effect?
- Learning Faithful Representations of Causal Graphs
- InferBERT: A Transformer-Based Causal Inference Framework for Enhancing Pharmacovigilance
- Towards Causal Representation Learning
- CausaLM: Causal Model Explanation Through Counterfactual Language Models
- Neuropathic Pain Diagnosis Simulator for Causal Discovery Algorithm Evaluation
- Elements of Causal Inference: Foundations and Learning Algorithms
- Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks
- Counterfactual reasoning: Testing language models' understanding of hypothetical scenarios
- CRASS: A Novel Data Set and Benchmark to Test Counterfactual Reasoning of Large Language Models
- Benchmarking of Data-Driven Causality Discovery Approaches in the Interactions of Arctic Sea Ice and Atmosphere
- Distinguishing cause from effect using observational data: methods and benchmarks
-
-
2 Foundation Models
-
2.1 Language Foundation Models
- Mistral 7B
- Qwen Technical Report
- PaLM 2 Technical Report
- PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing
- GPT-4 Technical Report
- PaLM: Scaling Language Modeling with Pathways
- Finetuned Language Models Are Zero-Shot Learners
- PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
-
2.2 Vision Foundation Models
- Explain Any Concept: Segment Anything Meets Concept-Based Explanation
- Segment and Track Anything
- SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model
- Edit Everything: A Text-Guided Generative System for Images Editing
- Inpaint Anything: Segment Anything Meets Image Inpainting
- VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
- VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
- Resolution-robust Large Mask Inpainting with Fourier Convolutions
-
2.3 Multimodal Foundation Models
- Gemini: A Family of Highly Capable Multimodal Models
- Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
- InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
- Caption Anything: Interactive Image Description with Diverse Multimodal Controls
- Scalable Mask Annotation for Video Text Spotting
- Text2Seg: Remote Sensing Image Semantic Segmentation via Text-Guided Visual Foundation Models
- MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
- CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks
- One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale
- GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis
- From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models
- CoCa: Contrastive Captioners are Image-Text Foundation Models
- BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
- Learning to Prompt for Vision-Language Models
-
2.4 Reasoning Applications
-
-
4 Reasoning Techniques
-
4.1 Pre-Training
- The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset
- The Pile: An 800GB Dataset of Diverse Text for Language Modeling
- Recipes for building an open-domain chatbot
- CLUE: A Chinese Language Understanding Evaluation Benchmark
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
- Complexity of Word Collocation Networks: A Preliminary Structural Analysis
- MOFI: Learning Image Representations from Noisy Entity Annotated Images
- Revisiting Weakly Supervised Pre-Training of Visual Perception Models
- ImageNet-21K Pretraining for the Masses
- Revisiting Unreasonable Effectiveness of Data in Deep Learning Era
- ImageNet Large Scale Visual Recognition Challenge
- Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following
- ImageBind: One Embedding Space To Bind Them All
- DataComp: In search of the next generation of multimodal datasets
- LAION-5B: An open large-scale dataset for training next generation image-text models
- Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP
- RedCaps: web-curated image-text data created by the people, for the people
- LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
- WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning
- Im2Text: Describing Images Using 1 Million Captioned Photographs
- Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
- Attention Is All You Need
- LLaMA: Open and Efficient Foundation Language Models
- BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
- GLM-130B: An Open Bilingual Pre-trained Model
- OPT: Open Pre-trained Transformer Language Models
- Scaling Language Models: Methods, Analysis & Insights from Training Gopher
- Improving CLIP Training with Language Rewrites
- DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment
- Scaling Language-Image Pre-training via Masking
- DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection
- K-LITE: Learning Transferable Visual Models with External Knowledge
- FILIP: Fine-grained Interactive Language-Image Pre-Training
- Efficient Streaming Language Models with Attention Sinks
- Retentive Network: A Successor to Transformer for Large Language Models
- LongNet: Scaling Transformers to 1,000,000,000 Tokens
- RWKV: Reinventing RNNs for the Transformer Era
- Hyena Hierarchy: Towards Larger Convolutional Language Models
- Hungry Hungry Hippos: Towards Language Modeling with State Space Models
- Long Range Language Modeling via Gated State Spaces
- Diagonal State Spaces are as Effective as Structured State Spaces
- Efficiently Modeling Long Sequences with Structured State Spaces
- Flamingo: a Visual Language Model for Few-Shot Learning
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Learning Transferable Visual Models From Natural Language Supervision
- Llama 2: Open Foundation and Fine-Tuned Chat Models
-
4.5 In-Context Learning
- PAL: Program-aided Language Models
- Scaling Instruction-Finetuned Language Models
- Diverse Demonstrations Improve In-context Compositional Generalization
- Complementary Explanations for Effective In-Context Learning
- Automatic Chain of Thought Prompting in Large Language Models
- Complexity-Based Prompting for Multi-Step Reasoning
- Does GPT-3 Generate Empathetic Dialogues? A Novel In-Context Example Selection Method and Automatic Evaluation Metric for Empathetic Dialogue Generation
- Selective Annotation Makes Language Models Better Few-Shot Learners
- What Makes Good In-Context Examples for GPT-3?
- DQ-LoRe: Dual Queries with Low Rank Approximation Re-ranking for In-Context Learning
- Learning to Retrieve In-Context Examples for Large Language Models
- Dr.ICL: Demonstration-Retrieved In-context Learning
- Finding Support Examples for In-Context Learning
- Compositional Exemplars for In-context Learning
- Learning To Retrieve Prompts for In-Context Learning
- Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models
- Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding
- Code Prompting: a Neural Symbolic Method for Complex Reasoning in Large Language Models
- Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Large Language Models
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models
- MathPrompter: Mathematical Reasoning using Large Language Models
- Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks
- Automatic Model Selection with Large Language Models for Reasoning
- Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning and Coding with LLMs
- Large Language Model Guided Tree-of-Thought
- Self-Evaluation Guided Beam Search for Reasoning
- Making Large Language Models Better Reasoners with Step-Aware Verifier
- Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback
- Generating Sequences by Learning to Self-Correct
- PEER: A Collaborative Language Model
- Read, Revise, Repeat: A System Demonstration for Human-in-the-loop Iterative Text Revision
- Graph-based, Self-Supervised Program Repair from Diagnostic Feedback
- InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback
- Is Self-Repair a Silver Bullet for Code Generation?
- CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing
- Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback
- Self-Edit: Fault-Aware Code Editor for Code Generation
- Progressive-Hint Prompting Improves Reasoning in Large Language Models
- Self-collaboration Code Generation via ChatGPT
- Teaching Large Language Models to Self-Debug
- REFINER: Reasoning Feedback on Intermediate Representation
- Self-Refine: Iterative Refinement with Self-Feedback
- Reasoning with Language Model is Planning with World Model
- Improving Factuality and Reasoning in Language Models through Multiagent Debate
- Language Models are Few-Shot Learners
- Think about it! Improving defeasible reasoning by first modeling the question scenario
- Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- Large Language Models are Zero-Shot Reasoners
- Self-Consistency Improves Chain of Thought Reasoning in Language Models
-
4.2 Fine-Tuning
- Improved Baselines with Visual Instruction Tuning
- Visual Instruction Tuning
- MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
- MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning
- WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
- Scaling Relationship on Learning Mathematical Reasoning with Large Language Models
- Let's Verify Step by Step
- Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
- Specializing Smaller Language Models towards Multi-Step Reasoning
- Teaching Small Language Models to Reason
- Large Language Models Can Self-Improve
- Explanations from Large Language Models Make Small Reasoners Better
- LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
- AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning
- Towards a Unified View of Parameter-Efficient Transfer Learning
- Compacter: Efficient Low-Rank Hypercomplex Adapter Layers
- MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer
- Parameter-Efficient Transfer Learning for NLP
- QLoRA: Efficient Finetuning of Quantized LLMs
- Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning
- KronA: Parameter Efficient Tuning with Kronecker Adapter
- DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation
- LoRA: Low-Rank Adaptation of Large Language Models
- P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks
- The Power of Scale for Parameter-Efficient Prompt Tuning
- Factual Probing Is [MASK]: Learning vs. Learning to Recall
- GPT Understands, Too
- Prefix-Tuning: Optimizing Continuous Prompts for Generation
- DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning
- Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning
- Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning
- BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models
- Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models
- LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model
- Towards Efficient Visual Adaption via Structural Re-parameterization
- Large Language Models Are Reasoning Teachers
-
4.3 Alignment Training
- LongForm: Optimizing Instruction Tuning for Long Text Generation with Corpus Extraction
- Chinese Open Instruction Generalist: A Preliminary Release
- OpenAssistant Conversations -- Democratizing Large Language Model Alignment
- The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
- Crosslingual Generalization through Multitask Finetuning
- Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
- ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning
- MetaICL: Learning to Learn In Context
- Multitask Prompted Training Enables Zero-Shot Task Generalization
- CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP
- Cross-Task Generalization via Natural Language Crowdsourcing Instructions
- UnifiedQA: Crossing Format Boundaries With a Single QA System
- Self-Alignment with Instruction Backtranslation
- Dynosaur: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation
- Enhancing Chat Language Models by Scaling High-quality Instructional Conversations
- The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning
- CoEdIT: Text Editing by Task-Specific Instruction Tuning
- LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions
- Instruction Tuning with GPT-4
- OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization
- Self-Instruct: Aligning Language Models with Self-Generated Instructions
- Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor
- Fine-Tuning Language Models with Advantage-Induced Policy Alignment
- RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
- Training language models to follow instructions with human feedback
- Preference Ranking Optimization for Human Alignment
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
- RRHF: Rank Responses to Align Language Models with Human Feedback without tears
- Calibrating Sequence likelihood Improves Conditional Language Generation
-
4.4 Mixture of Experts (MoE)
- An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training
- Mixed Autoencoder for Self-supervised Visual Representation Learning
- Mod-Squad: Designing Mixture of Experts As Modular Multi-Task Learners
- MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation
- GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
- Go Wider Instead of Deeper
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
- GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
-
4.6 Autonomous Agent
- Guiding Language Model Reasoning with Planning Tokens
- AutoAgents: A Framework for Automatic Agent Generation
- AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn
- MultiTool-CoT: GPT-3 Can Use Multiple External Tools with Chain of Thought Prompting
- Voyager: An Open-Ended Embodied Agent with Large Language Models
- ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models
- CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models
- Making Language Models Better Tool Learners with Execution Feedback
- ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings
- Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models
- OpenAGI: When LLM Meets Domain Experts
- CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society
- HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
- Reflexion: Language Agents with Verbal Reinforcement Learning
- ART: Automatic multi-step reasoning and tool-use for large language models
- Visual Programming: Compositional visual reasoning without training
- SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks
- Toolformer: Language Models Can Teach Themselves to Use Tools
- ReAct: Synergizing Reasoning and Acting in Language Models
-
-
0 Survey
Programming Languages
Categories
Sub Categories
- 3.9 Other Tasks and Applications: 403
- 3.2 Mathematical Reasoning: 194
- 3.3 Logical Reasoning: 168
- 3.8 Agent Reasoning: 135
- 2.3 Multimodal Foundation Models: 83
- 3.1 Commonsense Reasoning: 78
- 2.2 Vision Foundation Models: 66
- 4.1 Pre-Training: 62
- 4.5 In-Context Learning: 57
- 3.4 Causal Reasoning: 54
- 4.2 Fine-Tuning: 41
- 4.3 Alignment Training: 33
- 3.7 Multimodal Reasoning: 31
- 3.6 Audio Reasoning: 26
- 2.1 Language Foundation Models: 26
- 4.6 Autonomous Agent: 20
- 4.4 Mixture of Experts (MoE): 16
- 3.5 Visual Reasoning: 10
- 2.4 Reasoning Applications: 10
Keywords
- large-language-models: 15
- llm: 12
- chain-of-thought: 6
- natural-language-processing: 6
- chatgpt: 5
- deep-learning: 5
- nlp: 4
- reasoning: 4
- pytorch: 4
- commonsense-reasoning: 4
- multimodal: 4
- gpt-3: 3
- dataset: 3
- vision-language-model: 3
- machine-learning: 3
- segment-anything-model: 3
- self-supervised-learning: 3
- artificial-intelligence: 3
- segment-anything: 3
- sam: 3
- semantic-segmentation: 3
- gpt: 3
- image-classification: 3
- chatbot: 3
- vision-language: 3
- pretrained-models: 3
- instruction-tuning: 3
- in-context-learning: 3
- prompt-engineering: 3
- visual-question-answering: 2
- image-captioning: 2
- long-context: 2
- fine-tuning-llm: 2
- flash-attention: 2
- chinese: 2
- zero-shot: 2
- multi-modal: 2
- image-text-retrieval: 2
- gpt-4: 2
- foundation-models: 2
- action-recognition: 2
- foundation-model: 2
- video-understanding: 2
- object-detection: 2
- vision-language-transformer: 2
- computer-vision: 2
- vision-transformer: 2
- paper-list: 2
- llm-inference: 2
- pre-training: 2