# Awesome-Reasoning-Foundation-Models
[![Awesome](https://awesome.re/badge.svg)](https://awesome.re)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.10298864.svg)](https://doi.org/10.5281/zenodo.10298864)
[![arXiv](https://img.shields.io/badge/arXiv-arxiv.org/abs/2312.11562-.svg)](https://arxiv.org/abs/2312.11562)

![overview](assets/0_reasoning.jpg)

[`survey.pdf`](https://arxiv.org/pdf/2312.11562.pdf) |
A curated list of awesome **large AI models**, or **foundation models**, for **reasoning**.

We organize the current [foundation models](#2-foundation-models) into three categories: *language foundation models*, *vision foundation models*, and *multimodal foundation models*.
Further, we detail how these foundation models are applied to [reasoning tasks](#3-reasoning-tasks), including *commonsense*, *mathematical*, *logical*, *causal*, *visual*, *audio*, *multimodal*, and *agent reasoning*.
[Reasoning techniques](#4-reasoning-techniques), including *pre-training*, *fine-tuning*, *alignment training*, *mixture of experts*, *in-context learning*, and *autonomous agent*, are also summarized.

We welcome contributions that add more resources. If you would like to contribute, please submit a pull request following the entry format sketched below; see [CONTRIBUTING](CONTRIBUTING.md) for details.
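
Most entries in this list follow one of two conventions: a compact one-line entry (date, tag, linked title, resource links) or an extended entry with authors and citation/star badges. The sketch below uses placeholder values (`YYYY/MM`, `Tag`, `XXXX.XXXXX`, `org/repo`, `CorpusID:XXXXXXX`) rather than a real paper:

```markdown
<!-- Compact entry -->
- `YYYY/MM` | `Tag` | [Paper Title](https://arxiv.org/abs/XXXX.XXXXX)
\-
[[Paper](https://arxiv.org/pdf/XXXX.XXXXX.pdf)]
[[Code](https://github.com/org/repo)]

<!-- Extended entry with author, citation badge, and star badge -->
- `YYYY/MM` | `Tag`
| `Author et al.`
![citations](https://img.shields.io/badge/dynamic/json?url=https://api.semanticscholar.org/graph/v1/paper/CorpusID:XXXXXXX?fields=citationCount&query=%24.citationCount&label=citations)
![Star](https://img.shields.io/github/stars/org/repo.svg?style=social&label=Star)

Paper Title

[[arXiv](https://arxiv.org/abs/XXXX.XXXXX)]
[[paper](https://arxiv.org/pdf/XXXX.XXXXX.pdf)]
[[code](https://github.com/org/repo)]
```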

## Table of Contents

- [0 Survey](#0-survey)
- [1 Relevant Surveys](#1-relevant-surveys-and-links)
- [2 Foundation Models](#2-foundation-models)
  - [2.1 Language Foundation Models](#21-language-foundation-models)
  - [2.2 Vision Foundation Models](#22-vision-foundation-models)
  - [2.3 Multimodal Foundation Models](#23-multimodal-foundation-models)
  - [2.4 Reasoning Applications](#24-reasoning-applications)
- [3 Reasoning Tasks](#3-reasoning-tasks)
  - [3.1 Commonsense Reasoning](#31-commonsense-reasoning)
  - [3.2 Mathematical Reasoning](#32-mathematical-reasoning)
  - [3.3 Logical Reasoning](#33-logical-reasoning)
  - [3.4 Causal Reasoning](#34-causal-reasoning)
  - [3.5 Visual Reasoning](#35-visual-reasoning)
  - [3.6 Audio Reasoning](#36-audio-reasoning)
  - [3.7 Multimodal Reasoning](#37-multimodal-reasoning)
  - [3.8 Agent Reasoning](#38-agent-reasoning)
  - [3.9 Other Tasks and Applications](#39-other-tasks-and-applications)
- [4 Reasoning Techniques](#4-reasoning-techniques)
  - [4.1 Pre-Training](#41-pre-training)
  - [4.2 Fine-Tuning](#42-fine-tuning)
  - [4.3 Alignment Training](#43-alignment-training)
  - [4.4 Mixture of Experts (MoE)](#44-mixture-of-experts-moe)
  - [4.5 In-Context Learning](#45-in-context-learning)
  - [4.6 Autonomous Agent](#46-autonomous-agent)

## 0 Survey

![overview](assets/1_overview.jpg)

This repository is primarily based on the following paper:

> [**A Survey of Reasoning with Foundation Models**](https://arxiv.org/abs/2312.11562)
>
> [Jiankai Sun](),
[Chuanyang Zheng](https://chuanyang-zheng.github.io/),
[Enze Xie](https://xieenze.github.io/),
[Zhengying Liu](https://scholar.google.com/citations?user=DFme0joAAAAJ&hl=fr),
[Ruihang Chu](),
[Jianing Qiu](),
[Jiaqi Xu](),
[Mingyu Ding](),
[Hongyang Li](https://lihongyang.info/),
[Mengzhe Geng](),
[Yue Wu](),
[Wenhai Wang](https://whai362.github.io/),
[Junsong Chen](),
[Zhangyue Yin](),
[Xiaozhe Ren](),
[Jie Fu](https://bigaidream.github.io/),
[Junxian He](https://jxhe.github.io/),
[Wu Yuan](http://www.bme.cuhk.edu.hk/yuan/),
[Qi Liu](https://leuchine.github.io/),
[Xihui Liu](https://xh-liu.github.io/),
[Yu Li](https://liyu95.com/),
[Hao Dong](https://zsdonghao.github.io/),
[Yu Cheng](https://scholar.google.com/citations?user=ORPxbV4AAAAJ&hl=zh-CN),
[Ming Zhang](https://scholar.google.com/citations?user=LbzoQBsAAAAJ&hl=en),
[Pheng Ann Heng](http://www.cse.cuhk.edu.hk/~pheng/),
[Jifeng Dai](https://jifengdai.org/),
[Ping Luo](http://luoping.me/),
[Jingdong Wang](https://jingdongwang2017.github.io/),
[Ji-Rong Wen](https://scholar.google.com/citations?user=tbxCHJgAAAAJ&hl=zh-CN),
[Xipeng Qiu](https://xpqiu.github.io/),
[Yike Guo](https://cse.hkust.edu.hk/admin/people/faculty/profile/yikeguo),
[Hui Xiong](https://scholar.google.com/citations?user=cVDF1tkAAAAJ&hl=en),
[Qun Liu](https://liuquncn.github.io/index_zh.html), and
[Zhenguo Li](https://scholar.google.com/citations?user=XboZC1AAAAAJ&hl=en)

If you find this repository helpful, please consider citing:

```bibtex
@article{sun2023survey,
  title={A Survey of Reasoning with Foundation Models},
  author={Sun, Jiankai and Zheng, Chuanyang and Xie, Enze and Liu, Zhengying and Chu, Ruihang and Qiu, Jianing and Xu, Jiaqi and Ding, Mingyu and Li, Hongyang and Geng, Mengzhe and others},
  journal={arXiv preprint arXiv:2312.11562},
  year={2023}
}
```

## 1 Relevant Surveys and Links

[(Back-to-Top)](#table-of-contents)

- Combating Misinformation in the Age of LLMs: Opportunities and Challenges
\-
[[arXiv](https://arxiv.org/abs/2311.05656)]
[[Link](https://llm-misinformation.github.io/)]

- The Rise and Potential of Large Language Model Based Agents: A Survey
\-
[[arXiv](https://arxiv.org/abs/2309.07864)]
[[Link](https://github.com/WooooDyy/LLM-Agent-Paper-List)]

- Multimodal Foundation Models: From Specialists to General-Purpose Assistants
\-
[[arXiv](https://arxiv.org/abs/2309.10020)]
[[Tutorial](https://vlp-tutorial.github.io/2023/)]

- A Survey on Multimodal Large Language Models
\-
[[arXiv](https://arxiv.org/abs/2306.13549)]
[[Link](https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models)]

- Interactive Natural Language Processing
\-
[[arXiv](https://arxiv.org/abs/2305.13246)]
[[Link](https://github.com/InteractiveNLP-Team/awesome-InteractiveNLP-papers)]

- A Survey of Large Language Models
\-
[[arXiv](https://arxiv.org/abs/2303.18223)]
[[Link](https://github.com/RUCAIBox/LLMSurvey)]

- Self-Supervised Multimodal Learning: A Survey
\-
[[arXiv](https://arxiv.org/abs/2304.01008)]
[[Link](https://github.com/ys-zong/awesome-self-supervised-multimodal-learning)]

- Large AI Models in Health Informatics: Applications, Challenges, and the Future
\-
[[arXiv](https://arxiv.org/abs/2303.11568)]
[[Paper](https://ieeexplore.ieee.org/document/10261199)]
[[Link](https://github.com/Jianing-Qiu/Awesome-Healthcare-Foundation-Models)]

- Towards Reasoning in Large Language Models: A Survey
\-
[[arXiv](https://arxiv.org/abs/2212.10403)]
[[Paper](https://aclanthology.org/2023.findings-acl.67.pdf)]
[[Link](https://github.com/jeffhj/LM-reasoning)]

- Reasoning with Language Model Prompting: A Survey
\-
[[arXiv](https://arxiv.org/abs/2212.09597)]
[[Paper](https://aclanthology.org/2023.acl-long.294.pdf)]
[[Link](https://github.com/zjunlp/Prompt4ReasoningPapers)]

- Awesome Multimodal Reasoning
\-
[[Link](https://github.com/atfortes/Awesome-Multimodal-Reasoning)]

## 2 Foundation Models

[(Back-to-Top)](#table-of-contents)

![foundation_models](assets/22_foundation_models.jpg)

### Table of Contents - 2

[(Back-to-Top)](#table-of-contents)

- [2 Foundation Models](#2-foundation-models)
  - [2.1 Language Foundation Models](#21-language-foundation-models)
  - [2.2 Vision Foundation Models](#22-vision-foundation-models)
  - [2.3 Multimodal Foundation Models](#23-multimodal-foundation-models)
  - [2.4 Reasoning Applications](#24-reasoning-applications)

### 2.1 Language Foundation Models

[Foundation Models (Back-to-Top)](#2-foundation-models)

- `2023/10` | `Mistral` | [Mistral 7B](https://arxiv.org/abs/2310.06825)
\-
[[Paper](https://arxiv.org/pdf/2310.06825.pdf)]
[[Code](https://github.com/mistralai/mistral-src?tab=readme-ov-file)]

- `2023/09` | `Qwen` | [Qwen Technical Report](https://arxiv.org/abs/2309.16609)
\-
[[Paper](https://arxiv.org/pdf/2309.16609.pdf)]
[[Code](https://github.com/QwenLM/Qwen)]
[[Project](https://tongyi.aliyun.com/qianwen/)]

- `2023/07` | `Llama 2` | [Llama 2: Open Foundation and Fine-Tuned Chat Models](https://arxiv.org/abs/2307.09288)
\-
[[Paper](https://arxiv.org/pdf/2307.09288.pdf)]
[[Code](https://github.com/facebookresearch/llama)]
[[Blog](https://ai.meta.com/llama/)]

- `2023/07` | `InternLM` | InternLM: A Multilingual Language Model with Progressively Enhanced Capabilities
\-
[[Paper](https://github.com/InternLM/InternLM-techreport/blob/main/InternLM.pdf)]
[[Code](https://github.com/InternLM/InternLM)]
[[Project](https://internlm.intern-ai.org.cn)]

- `2023/05` | `PaLM 2` | [PaLM 2 Technical Report](https://arxiv.org/abs/2305.10403)
\-

- `2023/03` | `PanGu-Σ` | [PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing](https://arxiv.org/pdf/2303.10845.pdf)
\-
[[Paper](https://arxiv.org/abs/2303.10845)]

- `2023/03` | `Vicuna` | Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality
\-
[[Blog](https://lmsys.org/blog/2023-03-30-vicuna/)]
[[Code](https://github.com/lm-sys/FastChat)]

- `2023/03` | `GPT-4` | [GPT-4 Technical Report](https://arxiv.org/abs/2303.08774)
\-
[[Paper](https://arxiv.org/pdf/2303.08774.pdf)]
[[Blog](https://openai.com/research/gpt-4)]

- `2023/02` | `LLaMA` | [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971)
\-
[[Paper](https://arxiv.org/pdf/2302.13971.pdf)]
[[Code](https://github.com/facebookresearch/llama)]
[[Blog](https://ai.meta.com/blog/large-language-model-llama-meta-ai/)]

- `2022/11` | `ChatGPT` | ChatGPT: Optimizing Language Models for Dialogue
\-
[[Blog](https://openai.com/blog/chatgpt)]

- `2022/04` | `PaLM` | [PaLM: Scaling Language Modeling with Pathways](https://arxiv.org/abs/2204.02311)
\-
[[Paper](https://arxiv.org/pdf/2204.02311.pdf)]
[[Blog](https://blog.research.google/2022/04/pathways-language-model-palm-scaling-to.html)]

- `2021/09` | `FLAN` | [Finetuned Language Models Are Zero-Shot Learners](https://arxiv.org/abs/2109.01652)
\-

- `2021/07` | `Codex` | [Evaluating Large Language Models Trained on Code](https://arxiv.org/abs/2107.03374)
\-

- `2021/04` | `PanGu-α` | [PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation](https://arxiv.org/abs/2104.12369)
\-
[[Paper](https://arxiv.org/pdf/2104.12369.pdf)]
[[Code](https://github.com/huawei-noah/Pretrained-Language-Model)]

- `2020/05` | `GPT-3` | [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)
\-
[[Paper](https://papers.nips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf)]
[[Code](https://github.com/openai/gpt-3)]

- `2019/08` | `Sentence-BERT` | [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](https://arxiv.org/abs/1908.10084)
\-

- `2019/07` | `RoBERTa` | [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692)
\-

- `2018/10` | `BERT` | [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)
\-
[[Paper](https://aclanthology.org/N19-1423.pdf)]
[[Code](https://github.com/google-research/bert)]
[[Blog](https://blog.research.google/2018/11/open-sourcing-bert-state-of-art-pre.html)]

---

### 2.2 Vision Foundation Models

[Foundation Models (Back-to-Top)](#2-foundation-models)

- `2024/01` | `Depth Anything`
| `Yang et al.`
![citations](https://img.shields.io/badge/dynamic/json?url=https://api.semanticscholar.org/graph/v1/paper/CorpusID:267061016?fields=citationCount&query=%24.citationCount&label=citations)
![Star](https://img.shields.io/github/stars/LiheYoung/Depth-Anything.svg?style=social&label=Star)

Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

[[arXiv](https://arxiv.org/abs/2401.10891)]
[[paper](https://arxiv.org/pdf/2401.10891.pdf)]
[[code](https://github.com/LiheYoung/Depth-Anything)]
[[project](https://depth-anything.github.io/)]

- `2023/05` | `SAA+`
| `Cao et al.`
![citations](https://img.shields.io/badge/dynamic/json?url=https://api.semanticscholar.org/graph/v1/paper/CorpusID:258762545?fields=citationCount&query=%24.citationCount&label=citations)
![Star](https://img.shields.io/github/stars/caoyunkang/Segment-Any-Anomaly.svg?style=social&label=Star)

Segment Any Anomaly without Training via Hybrid Prompt Regularization

[[arXiv](https://arxiv.org/abs/2305.10724)]
[[paper](https://arxiv.org/pdf/2305.10724.pdf)]
[[code](https://github.com/caoyunkang/Segment-Any-Anomaly)]

- `2023/05` | `Explain Any Concept` | [Explain Any Concept: Segment Anything Meets Concept-Based Explanation](https://arxiv.org/abs/2305.10289)
\-
[[Paper](https://arxiv.org/pdf/2305.10289.pdf)]
[[Code](https://github.com/Jerry00917/samshap)]

- `2023/05` | `SAM-Track` | [Segment and Track Anything](https://arxiv.org/abs/2305.06558)
\-
[[Paper](https://arxiv.org/pdf/2305.06558.pdf)]
[[Code](https://github.com/z-x-yang/Segment-and-Track-Anything)]

- `2023/05` | `SAMRS` | [SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model](https://arxiv.org/abs/2305.02034)
\-
[[Paper](https://arxiv.org/pdf/2305.02034.pdf)]
[[Code](https://github.com/ViTAE-Transformer/SAMRS)]

- `2023/04` | `Edit Everything` | [Edit Everything: A Text-Guided Generative System for Images Editing](https://arxiv.org/abs/2304.14006)
\-
[[Paper](https://arxiv.org/pdf/2304.14006.pdf)]
[[Code](https://github.com/DefengXie/Edit_Everything)]

- `2023/04` | `Inpaint Anything` | [Inpaint Anything: Segment Anything Meets Image Inpainting](https://arxiv.org/abs/2304.06790)
\-
[[Paper](https://arxiv.org/pdf/2304.06790.pdf)]
[[Code](https://github.com/geekyutao/Inpaint-Anything)]

- `2023/04` | `SAM`
| `Kirillov et al., ICCV 2023`
![citations](https://img.shields.io/badge/dynamic/json?url=https://api.semanticscholar.org/graph/v1/paper/CorpusID:257952310?fields=citationCount&query=%24.citationCount&label=citations)
![Star](https://img.shields.io/github/stars/facebookresearch/segment-anything.svg?style=social&label=Star)

Segment Anything

[[arXiv](https://arxiv.org/abs/2304.02643)]
[[paper](https://openaccess.thecvf.com/content/ICCV2023/html/Kirillov_Segment_Anything_ICCV_2023_paper.html)]
[[code](https://github.com/facebookresearch/segment-anything)]
[[blog](https://segment-anything.com/)]

- `2023/03` | `VideoMAE V2` | [VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking](https://arxiv.org/abs/2303.16727)
\-
[[Paper](https://openaccess.thecvf.com/content/CVPR2023/papers/Wang_VideoMAE_V2_Scaling_Video_Masked_Autoencoders_With_Dual_Masking_CVPR_2023_paper.pdf)]
[[Code](https://github.com/OpenGVLab/VideoMAEv2)]

- `2023/03` | `Grounding DINO`
| `Liu et al.`
![citations](https://img.shields.io/badge/dynamic/json?url=https://api.semanticscholar.org/graph/v1/paper/CorpusID:257427307?fields=citationCount&query=%24.citationCount&label=citations)
![Star](https://img.shields.io/github/stars/IDEA-Research/GroundingDINO.svg?style=social&label=Star)

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

[[arXiv](https://arxiv.org/abs/2303.05499)]
[[paper](https://arxiv.org/pdf/2303.05499.pdf)]
[[code](https://github.com/IDEA-Research/GroundingDINO)]

- `2022/03` | `VideoMAE` | [VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training](https://arxiv.org/abs/2203.12602)
\-
[[Paper](https://proceedings.neurips.cc/paper_files/paper/2022/hash/416f9cb3276121c42eebb86352a4354a-Abstract-Conference.html)]
[[Code](https://github.com/MCG-NJU/VideoMAE)]

- `2021/12` | `Stable Diffusion`
| `Rombach et al., CVPR 2022`
![citations](https://img.shields.io/badge/dynamic/json?url=https://api.semanticscholar.org/graph/v1/paper/CorpusID:245335280?fields=citationCount&query=%24.citationCount&label=citations)
![Star](https://img.shields.io/github/stars/CompVis/latent-diffusion.svg?style=social&label=Star)

High-Resolution Image Synthesis with Latent Diffusion Models

[[arXiv](https://arxiv.org/abs/2112.10752)]
[[paper](https://openaccess.thecvf.com/content/CVPR2022/html/Rombach_High-Resolution_Image_Synthesis_With_Latent_Diffusion_Models_CVPR_2022_paper.html)]
[[code](https://github.com/CompVis/latent-diffusion)]
[[stable diffusion](https://github.com/Stability-AI/stablediffusion)
![Star](https://img.shields.io/github/stars/Stability-AI/stablediffusion.svg?style=social&label=Star)]

- `2021/09` | `LaMa` | [Resolution-robust Large Mask Inpainting with Fourier Convolutions](https://arxiv.org/abs/2109.07161)
\-
[[Paper](https://openaccess.thecvf.com/content/WACV2022/papers/Suvorov_Resolution-Robust_Large_Mask_Inpainting_With_Fourier_Convolutions_WACV_2022_paper.pdf)]
[[Code](https://github.com/advimman/lama)]

- `2021/03` | `Swin`
| `Liu et al., ICCV 2021`
![citations](https://img.shields.io/badge/dynamic/json?url=https://api.semanticscholar.org/graph/v1/paper/CorpusID:232352874?fields=citationCount&query=%24.citationCount&label=citations)
![Star](https://img.shields.io/github/stars/microsoft/Swin-Transformer.svg?style=social&label=Star)

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

[[arXiv](https://arxiv.org/abs/2103.14030)]
[[paper](https://openaccess.thecvf.com/content/ICCV2021/html/Liu_Swin_Transformer_Hierarchical_Vision_Transformer_Using_Shifted_Windows_ICCV_2021_paper.html)]
[[code](https://github.com/microsoft/Swin-Transformer)]

- `2020/10` | `ViT`
| `Dosovitskiy et al., ICLR 2021`
![citations](https://img.shields.io/badge/dynamic/json?url=https://api.semanticscholar.org/graph/v1/paper/CorpusID:225039882?fields=citationCount&query=%24.citationCount&label=citations)

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

[[arXiv](https://arxiv.org/abs/2010.11929)]
[[paper](https://openreview.net/forum?id=YicbFdNTTy)]
[[Implementation](https://github.com/lucidrains/vit-pytorch)]

---

### 2.3 Multimodal Foundation Models

[Foundation Models (Back-to-Top)](#2-foundation-models)

- `2024/01` | `LLaVA-1.6`
| `Liu et al.`

LLaVA-1.6: Improved reasoning, OCR, and world knowledge

[[code](https://github.com/haotian-liu/LLaVA)]
[[blog](https://llava-vl.github.io/blog/2024-01-30-llava-1-6/)]

- `2024/01` | `MouSi`
| `Fan et al.`
![Star](https://img.shields.io/github/stars/FudanNLPLAB/MouSi.svg?style=social&label=Star)

MouSi: Poly-Visual-Expert Vision-Language Models

[[arXiv](https://arxiv.org/abs/2401.17221)]
[[paper](https://arxiv.org/pdf/2401.17221.pdf)]
[[code](https://github.com/FudanNLPLAB/MouSi)]

- `2023/12` | `InternVL`
| `Chen et al.`
![citations](https://img.shields.io/badge/dynamic/json?url=https://api.semanticscholar.org/graph/v1/paper/CorpusID:266521410?fields=citationCount&query=%24.citationCount&label=citations)
![Star](https://img.shields.io/github/stars/OpenGVLab/InternVL.svg?style=social&label=Star)

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

[[arXiv](https://arxiv.org/abs/2312.14238)]
[[paper](https://arxiv.org/pdf/2312.14238.pdf)]
[[code](https://github.com/OpenGVLab/InternVL)]

- `2023/12` | `Gemini` | [Gemini: A Family of Highly Capable Multimodal Models](https://arxiv.org/abs/2312.11805)
\-
[[Paper](https://arxiv.org/pdf/2312.11805.pdf)]
[[Project](https://deepmind.google/technologies/gemini/#introduction)]

- `2023/10` | `LLaVA-1.5`
| `Liu et al.`
![citations](https://img.shields.io/badge/dynamic/json?url=https://api.semanticscholar.org/graph/v1/paper/CorpusID:263672058?fields=citationCount&query=%24.citationCount&label=citations)

Improved Baselines with Visual Instruction Tuning
[[arXiv](https://arxiv.org/abs/2310.03744)]
[[paper](https://arxiv.org/pdf/2310.03744.pdf)]
[[code](https://github.com/haotian-liu/LLaVA)]
[[project](https://llava-vl.github.io)]

- `2023/09` | `GPT-4V` | GPT-4V(ision) System Card
\-
[[Paper](https://cdn.openai.com/papers/GPTV_System_Card.pdf)]
[[Blog](https://openai.com/research/gpt-4v-system-card)]

- `2023/08` | `Qwen-VL` | [Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond](https://arxiv.org/abs/2308.12966)
\-
[[Paper](https://arxiv.org/pdf/2308.12966.pdf)]
[[Code](https://github.com/QwenLM/Qwen-VL)]

- `2023/05` | `InstructBLIP` | [InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning](https://arxiv.org/abs/2305.06500)
\-
[[Paper](https://openreview.net/pdf?id=vvoWPYqZJA)]
[[Code](https://github.com/salesforce/LAVIS/tree/main/projects/instructblip)]

- `2023/05` | `Caption Anything` | [Caption Anything: Interactive Image Description with Diverse Multimodal Controls](https://arxiv.org/abs/2305.02677)
\-
[[Paper](https://arxiv.org/pdf/2305.02677.pdf)]
[[Code](https://github.com/ttengwang/Caption-Anything)]

- `2023/05` | `SAMText` | [Scalable Mask Annotation for Video Text Spotting](https://arxiv.org/abs/2305.01443)
\-
[[Paper](https://arxiv.org/pdf/2305.01443.pdf)]
[[Code](https://github.com/ViTAE-Transformer/SAMText)]

- `2023/04` | `Text2Seg` | [Text2Seg: Remote Sensing Image Semantic Segmentation via Text-Guided Visual Foundation Models](https://arxiv.org/abs/2304.10597)
\-
[[Paper](https://arxiv.org/pdf/2304.10597.pdf)]

- `2023/04` | `MiniGPT-4` | [MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models](https://arxiv.org/abs/2304.10592)
\-

- `2023/04` | `LLaVA` | [Visual Instruction Tuning](https://arxiv.org/abs/2304.08485)
\-
[[Paper](https://openreview.net/pdf?id=w0H2xGHlkw)]
[[Code](https://github.com/haotian-liu/LLaVA)]
[[Project](https://llava-vl.github.io)]

- `2023/04` | `CLIP Surgery` | [CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks](https://arxiv.org/abs/2304.05653)
\-
[[Paper](https://arxiv.org/pdf/2304.05653.pdf)]
[[Code](https://github.com/xmed-lab/CLIP_Surgery)]

- `2023/03` | `UniDiffuser` | [One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale](https://arxiv.org/abs/2303.06555)
\-

- `2023/01` | `GALIP` | [GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis](https://arxiv.org/abs/2301.12959)
\-
[[Paper](https://openaccess.thecvf.com/content/CVPR2023/papers/Tao_GALIP_Generative_Adversarial_CLIPs_for_Text-to-Image_Synthesis_CVPR_2023_paper.pdf)]
[[Code](https://github.com/tobran/GALIP)]

- `2023/01` | `BLIP-2` | [BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models](https://arxiv.org/abs/2301.12597)
\-
[[Paper](https://proceedings.mlr.press/v202/li23q.html)]
[[Code](https://github.com/salesforce/LAVIS/tree/main/projects/blip2)]

- `2022/12` | `Img2Prompt` | [From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models](https://arxiv.org/abs/2212.10846)
\-

- `2022/05` | `CoCa` | [CoCa: Contrastive Captioners are Image-Text Foundation Models](https://arxiv.org/abs/2205.01917)
\-
[[Paper](https://openreview.net/forum?id=Ee277P3AYC)]

- `2022/01` | `BLIP` | [BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation](https://arxiv.org/abs/2201.12086)
\-
[[Paper](https://proceedings.mlr.press/v162/li22n.html)]
[[Code](https://github.com/salesforce/BLIP)]

- `2021/09` | `CoOp` | [Learning to Prompt for Vision-Language Models](https://arxiv.org/abs/2109.01134)
\-
[[Paper](https://link.springer.com/article/10.1007/s11263-022-01653-1)]
[[Code](https://github.com/KaiyangZhou/CoOp)]

- `2021/02` | `CLIP` | [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020)
\-
[[Paper](https://proceedings.mlr.press/v139/radford21a/radford21a.pdf)]
[[Code](https://github.com/openai/CLIP)]
[[Blog](https://openai.com/research/clip)]

---

### 2.4 Reasoning Applications

[Foundation Models (Back-to-Top)](#2-foundation-models)

- `2022/06` | `Minerva` | [Solving Quantitative Reasoning Problems with Language Models](https://arxiv.org/abs/2206.14858)
\-
[[Paper](https://openreview.net/pdf?id=IFXTZERXdM7)]
[[Blog](https://blog.research.google/2022/06/minerva-solving-quantitative-reasoning.html)]

- `2022/06` | `BIG-bench` | [Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models](https://arxiv.org/abs/2206.04615)
\-
[[Paper](https://openreview.net/pdf?id=uyTL5Bvosj)]
[[Code](https://github.com/google/BIG-bench)]

- `2022/05` | `Zero-shot-CoT` | [Large Language Models are Zero-Shot Reasoners](https://arxiv.org/abs/2205.11916)
\-
[[Paper](https://openreview.net/pdf?id=e2TBb5y0yFf)]
[[Code](https://github.com/kojima-takeshi188/zero_shot_cot)]

- `2022/03` | `STaR` | [STaR: Bootstrapping Reasoning With Reasoning](https://arxiv.org/abs/2203.14465)
\-
[[Paper](https://openreview.net/pdf?id=_3ELRdg2sgI)]
[[Code](https://github.com/ezelikman/STaR)]

- `2021/07` | `MWP-BERT` | [MWP-BERT: Numeracy-Augmented Pre-training for Math Word Problem Solving](https://arxiv.org/abs/2107.13435)
\-
[[Paper](https://aclanthology.org/2022.findings-naacl.74.pdf)]
[[Code](https://github.com/LZhenwen/MWP-BERT)]

- `2017/05` | `AQUA-RAT` | [Program Induction by Rationale Generation : Learning to Solve and Explain Algebraic Word Problems](https://arxiv.org/abs/1705.04146)
\-
[[Paper](https://aclanthology.org/P17-1015.pdf)]
[[Code](https://github.com/google-deepmind/AQuA)]

---

## 3 Reasoning Tasks

[(Back-to-Top)](#table-of-contents)

### Table of Contents - 3

- [3 Reasoning Tasks](#3-reasoning-tasks)
  - [3.1 Commonsense Reasoning](#31-commonsense-reasoning)
    - [3.1.1 Commonsense Question and Answering (QA)](#311-commonsense-question-and-answering-qa)
    - [3.1.2 Physical Commonsense Reasoning](#312-physical-commonsense-reasoning)
    - [3.1.3 Spatial Commonsense Reasoning](#313-spatial-commonsense-reasoning)
    - [3.1.x Benchmarks, Datasets, and Metrics](#31x-benchmarks-datasets-and-metrics)
  - [3.2 Mathematical Reasoning](#32-mathematical-reasoning)
    - [3.2.1 Arithmetic Reasoning](#321-arithmetic-reasoning)
    - [3.2.2 Geometry Reasoning](#322-geometry-reasoning)
    - [3.2.3 Theorem Proving](#323-theorem-proving)
    - [3.2.4 Scientific Reasoning](#324-scientific-reasoning)
    - [3.2.x Benchmarks, Datasets, and Metrics](#32x-benchmarks-datasets-and-metrics)
  - [3.3 Logical Reasoning](#33-logical-reasoning)
    - [3.3.1 Propositional Logic](#331-propositional-logic)
    - [3.3.2 Predicate Logic](#332-predicate-logic)
    - [3.3.x Benchmarks, Datasets, and Metrics](#33x-benchmarks-datasets-and-metrics)
  - [3.4 Causal Reasoning](#34-causal-reasoning)
    - [3.4.1 Counterfactual Reasoning](#341-counterfactual-reasoning)
    - [3.4.x Benchmarks, Datasets, and Metrics](#34x-benchmarks-datasets-and-metrics)
  - [3.5 Visual Reasoning](#35-visual-reasoning)
    - [3.5.1 3D Reasoning](#351-3d-reasoning)
    - [3.5.x Benchmarks, Datasets, and Metrics](#35x-benchmarks-datasets-and-metrics)
  - [3.6 Audio Reasoning](#36-audio-reasoning)
    - [3.6.1 Speech](#361-speech)
    - [3.6.x Benchmarks, Datasets, and Metrics](#36x-benchmarks-datasets-and-metrics)
  - [3.7 Multimodal Reasoning](#37-multimodal-reasoning)
    - [3.7.1 Alignment](#371-alignment)
    - [3.7.2 Generation](#372-generation)
    - [3.7.3 Multimodal Understanding](#373-multimodal-understanding)
    - [3.7.x Benchmarks, Datasets, and Metrics](#37x-benchmarks-datasets-and-metrics)
  - [3.8 Agent Reasoning](#38-agent-reasoning)
    - [3.8.1 Introspective Reasoning](#381-introspective-reasoning)
    - [3.8.2 Extrospective Reasoning](#382-extrospective-reasoning)
    - [3.8.3 Multi-agent Reasoning](#383-multi-agent-reasoning)
    - [3.8.4 Driving Reasoning](#384-driving-reasoning)
    - [3.8.x Benchmarks, Datasets, and Metrics](#38x-benchmarks-datasets-and-metrics)
  - [3.9 Other Tasks and Applications](#39-other-tasks-and-applications)
    - [3.9.1 Theory of Mind (ToM)](#391-theory-of-mind-tom)
    - [3.9.2 LLMs for Weather Prediction](#392-llms-for-weather-prediction)
    - [3.9.3 Abstract Reasoning](#393-abstract-reasoning)
    - [3.9.4 Defeasible Reasoning](#394-defeasible-reasoning)
    - [3.9.5 Medical Reasoning](#395-medical-reasoning)
    - [3.9.6 Bioinformatics Reasoning](#396-bioinformatics-reasoning)
    - [3.9.7 Long-Chain Reasoning](#397-long-chain-reasoning)

### 3.1 Commonsense Reasoning

[Reasoning Tasks (Back-to-Top)](#3-reasoning-tasks)

- [3.1 Commonsense Reasoning](#31-commonsense-reasoning)
  - [3.1.1 Commonsense Question and Answering (QA)](#311-commonsense-question-and-answering-qa)
  - [3.1.2 Physical Commonsense Reasoning](#312-physical-commonsense-reasoning)
  - [3.1.3 Spatial Commonsense Reasoning](#313-spatial-commonsense-reasoning)
  - [3.1.x Benchmarks, Datasets, and Metrics](#31x-benchmarks-datasets-and-metrics)


- `2023/12` | [Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models](https://arxiv.org/abs/2312.17661)
\-
[[Paper](https://arxiv.org/pdf/2312.17661.pdf)]
[[Code](https://github.com/EternityYW/Gemini-Commonsense-Evaluation/)]

- `2023/05` | `LLM-MCTS` | [Large Language Models as Commonsense Knowledge for Large-Scale Task Planning](https://arxiv.org/abs/2305.14078)
\-
[[Paper](https://openreview.net/pdf?id=Wjp1AYB8lH)]
[[Code](https://github.com/1989Ryan/llm-mcts)]
[[Project](https://llm-mcts.github.io)]

- `2023/05` | Bridging the Gap between Pre-Training and Fine-Tuning for Commonsense Generation
\-
[[Paper](https://aclanthology.org/2023.findings-eacl.28.pdf)]
[[Code](https://github.com/LHRYANG/CommonGen)]

- `2022/11` | `DANCE` | [Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles](https://arxiv.org/abs/2211.16504)
\-
[[Paper](https://openaccess.thecvf.com/content/CVPR2023/papers/Ye_Improving_Commonsense_in_Vision-Language_Models_via_Knowledge_Graph_Riddles_CVPR_2023_paper.pdf)]
[[Code](https://github.com/pleaseconnectwifi/DANCE)]
[[Project](https://shuquanye.com/DANCE_website)]

- `2022/10` | `CoCoGen` | [Language Models of Code are Few-Shot Commonsense Learners](https://arxiv.org/abs/2210.07128)
\-
[[Paper](https://aclanthology.org/2022.emnlp-main.90.pdf)]
[[Code](https://github.com/reasoning-machines/CoCoGen)]

- `2021/10` | [A Systematic Investigation of Commonsense Knowledge in Large Language Models](https://arxiv.org/abs/2111.00607)
\-
[[Paper](https://aclanthology.org/2022.emnlp-main.812.pdf)]

- `2021/05` | [Go Beyond Plain Fine-tuning: Improving Pretrained Models for Social Commonsense](https://arxiv.org/abs/2105.05913)
\-
[[Paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9383453)]

#### 3.1.1 Commonsense Question and Answering (QA)

- `2019/06` | `CoS-E` | [Explain Yourself! Leveraging Language Models for Commonsense Reasoning](https://arxiv.org/abs/1906.02361)
\-
[[Paper](https://aclanthology.org/P19-1487.pdf)]
[[Code](https://github.com/salesforce/cos-e)]

- `2018/11` | `CQA` | [CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge](https://arxiv.org/abs/1811.00937)
\-
[[Paper](https://aclanthology.org/N19-1421.pdf)]
[[Code](https://github.com/jonathanherzig/commonsenseqa)]
[[Project](https://www.tau-nlp.sites.tau.ac.il/commonsenseqa)]

- `2016/12` | `ConceptNet` | [ConceptNet 5.5: An Open Multilingual Graph of General Knowledge](https://arxiv.org/abs/1612.03975)
\-
[[Paper](https://ojs.aaai.org/index.php/AAAI/article/view/11164)]
[[Project](https://conceptnet.io)]

#### 3.1.2 Physical Commonsense Reasoning

- `2023/10` | `NEWTON` | [NEWTON: Are Large Language Models Capable of Physical Reasoning?](https://arxiv.org/abs/2310.07018)
\-
[[Paper](https://arxiv.org/pdf/2310.07018.pdf)]
[[Code](https://github.com/NewtonReasoning/Newton)]
[[Project](https://newtonreasoning.github.io)]

- `2022/03` | `PACS` | [PACS: A Dataset for Physical Audiovisual CommonSense Reasoning](https://arxiv.org/abs/2203.11130)
\-
[[Paper](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136970286.pdf)]
[[Code](https://github.com/samuelyu2002/PACS)]

- `2021/10` | `VRDP` | [Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language](https://arxiv.org/abs/2110.15358)
\-
[[Paper](https://openreview.net/pdf?id=lk1ORT35tbi)]
[[Code](https://github.com/dingmyu/VRDP)]

- `2020/05` | `ESPRIT` | [ESPRIT: Explaining Solutions to Physical Reasoning Tasks](https://arxiv.org/abs/2005.00730)
\-
[[Paper](https://aclanthology.org/2020.acl-main.706.pdf)]
[[Code](https://github.com/salesforce/esprit)]

- `2019/11` | `PIQA` | [PIQA: Reasoning about Physical Commonsense in Natural Language](https://arxiv.org/abs/1911.11641)
\-
[[Paper](https://ojs.aaai.org/index.php/AAAI/article/view/6239)]
[[Project](https://leaderboard.allenai.org/physicaliqa/submissions/public)]

#### 3.1.3 Spatial Commonsense Reasoning

- `2024/01` | `SpatialVLM`
| `Chen et al.`
![citations](https://img.shields.io/badge/dynamic/json?url=https://api.semanticscholar.org/graph/v1/paper/CorpusID:267069344?fields=citationCount&query=%24.citationCount&label=citations)

SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities

[[arXiv](https://arxiv.org/abs/2401.12168)]
[[paper](https://arxiv.org/pdf/2401.12168.pdf)]
[[project](https://spatial-vlm.github.io/)]

- `2022/03` | [Things not Written in Text: Exploring Spatial Commonsense from Visual Signals](https://arxiv.org/abs/2203.08075)
\-
[[Paper](https://aclanthology.org/2022.acl-long.168.pdf)]
[[Code](https://github.com/xxxiaol/spatial-commonsense)]

- `2021/06` | `PROST` | [PROST: Physical Reasoning of Objects through Space and Time](https://arxiv.org/abs/2106.03634)
\-
[[Paper](https://aclanthology.org/2021.findings-acl.404.pdf)]
[[Code](https://github.com/nala-cub/prost)]

- `2019/02` | `GQA` | [GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering](https://arxiv.org/abs/1902.09506)
\-
[[Paper](https://openaccess.thecvf.com/content_CVPR_2019/papers/Hudson_GQA_A_New_Dataset_for_Real-World_Visual_Reasoning_and_Compositional_CVPR_2019_paper.pdf)]
[[Project](https://cs.stanford.edu/people/dorarad/gqa/index.html)]

#### 3.1.x Benchmarks, Datasets, and Metrics

- `2023/06` | `CConS` | [Probing Physical Reasoning with Counter-Commonsense Context](https://arxiv.org/abs/2306.02258)
\-

- `2023/05` | `SummEdits` | [LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond](https://arxiv.org/abs/2305.14540)
\-
[[Paper](https://arxiv.org/pdf/2305.14540.pdf)]
[[Code](https://github.com/salesforce/factualNLG)]

- `2021/03` | `RAINBOW` | [UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark](https://arxiv.org/abs/2103.13009)
\-

- `2020/11` | `ProtoQA` | ProtoQA: A Question Answering Dataset for Prototypical Common-Sense Reasoning
\-
[[Paper](https://aclanthology.org/2020.emnlp-main.85.pdf)]

- `2020/10` | `DrFact` | [Differentiable Open-Ended Commonsense Reasoning](https://arxiv.org/abs/2010.14439)

- `2019/11` | `CommonGen` | [CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning](https://arxiv.org/abs/1911.03705)

- `2019/08` | `Cosmos QA` | [Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning](https://arxiv.org/abs/1909.00277)

- `2019/08` | `αNLI` | [Abductive Commonsense Reasoning](https://arxiv.org/abs/1908.05739)
\-

- `2019/08` | `PHYRE` | [PHYRE: A New Benchmark for Physical Reasoning](https://arxiv.org/abs/1908.05656)
\-

- `2019/07` | `WinoGrande` | [WinoGrande: An Adversarial Winograd Schema Challenge at Scale](https://arxiv.org/abs/1907.10641)
\-

- `2019/05` | `MathQA` | [MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms](https://arxiv.org/abs/1905.13319)
\-

- `2019/05` | `HellaSwag` | [HellaSwag: Can a Machine Really Finish Your Sentence?](https://arxiv.org/abs/1905.07830)
\-

- `2019/04` | `Social IQa` | [SocialIQA: Commonsense Reasoning about Social Interactions](https://arxiv.org/abs/1904.09728)
\-
[[Paper](https://aclanthology.org/D19-1454.pdf)]

- `2018/08` | `SWAG` | [SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference](https://arxiv.org/abs/1808.05326)
\-

- `2002/07` | `BLEU` | BLEU: a Method for Automatic Evaluation of Machine Translation
\-
[[Paper](https://aclanthology.org/P02-1040.pdf)]

---

### 3.2 Mathematical Reasoning

[Reasoning Tasks (Back-to-Top)](#3-reasoning-tasks)

- [3.2 Mathematical Reasoning](#32-mathematical-reasoning)
  - [3.2.1 Arithmetic Reasoning](#321-arithmetic-reasoning)
  - [3.2.2 Geometry Reasoning](#322-geometry-reasoning)
  - [3.2.3 Theorem Proving](#323-theorem-proving)
  - [3.2.4 Scientific Reasoning](#324-scientific-reasoning)
  - [3.2.x Benchmarks, Datasets, and Metrics](#32x-benchmarks-datasets-and-metrics)


- `2023/10` | `MathVista` | [MathVista: Evaluating Math Reasoning in Visual Contexts with GPT-4V, Bard, and Other Large Multimodal Models](https://arxiv.org/abs/2310.02255)
\-
[[Paper](https://openreview.net/forum?id=KUNzEQMWU7)]
[[Code](https://github.com/lupantech/MathVista)]
[[Project](https://mathvista.github.io/)]
| `Lu et al., ICLR 2024`

- `2022/11` | Tokenization in the Theory of Knowledge
\-
[[Paper](https://www.mdpi.com/2673-8392/3/1/24)]

- `2022/06` | `MultiHiertt` | [MultiHiertt: Numerical Reasoning over Multi Hierarchical Tabular and Textual Data](https://arxiv.org/abs/2206.01347)

- `2021/04` | `MultiModalQA` | [MultiModalQA: Complex Question Answering over Text, Tables and Images](https://arxiv.org/abs/2104.06039)

- `2017/05` | [Program Induction by Rationale Generation : Learning to Solve and Explain Algebraic Word Problems](https://arxiv.org/abs/1705.04146)

- `2014/04` | [Deep Learning in Neural Networks: An Overview](https://arxiv.org/abs/1404.7828)
\-
[[Paper](https://www.sciencedirect.com/science/article/pii/S0893608014002135)]

- `2004` | Wittgenstein on philosophy of logic and mathematics
\-
[[Paper](https://www.pdcnet.org/gfpj/content/gfpj_2004_0025_0002_0227_0288)]

- `1989` | `CLP` | Connectionist Learning Procedures
\-
[[Paper](https://www.sciencedirect.com/science/article/pii/0004370289900490)]

#### 3.2.1 Arithmetic Reasoning

[Mathematical Reasoning (Back-to-Top)](#32-mathematical-reasoning)

- `2022/09` | `PromptPG` | [Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning](https://arxiv.org/abs/2209.14610)

- `2022/01` | [Chain-of-Thought Prompting Elicits Reasoning in Large Language Models](https://arxiv.org/abs/2201.11903)
\-

- `2021/03` | `SVAMP` | [Are NLP Models really able to Solve Simple Math Word Problems?](https://arxiv.org/abs/2103.07191)
\-
[[Paper](https://aclanthology.org/2021.naacl-main.168.pdf)]
[[Code](https://github.com/arkilpatel/SVAMP)]

- `2021/03` | `MATH` | [Measuring Mathematical Problem Solving With the MATH Dataset](https://arxiv.org/abs/2103.03874)
\-

- `2016/08` | [How well do Computers Solve Math Word Problems? Large-Scale Dataset Construction and Evaluation](https://aclanthology.org/P16-1084/)
\-
[[Paper](https://aclanthology.org/P16-1084.pdf)]

- `2015/09` | [Learn to Solve Algebra Word Problems Using Quadratic Programming](https://aclanthology.org/D15-1096/)
\-
[[Paper](https://aclanthology.org/D15-1096.pdf)]

- `2014/06` | `Alg514` | [Learning to Automatically Solve Algebra Word Problems](https://aclanthology.org/P14-1026/)
\-
[[Paper](https://aclanthology.org/P14-1026.pdf)]

#### 3.2.2 Geometry Reasoning

[Mathematical Reasoning (Back-to-Top)](#32-mathematical-reasoning)

- `2024/01` | `AlphaGeometry` | Solving olympiad geometry without human demonstrations
\-
[[Paper](https://www.nature.com/articles/s41586-023-06747-5)]
[[Code](https://github.com/google-deepmind/alphageometry)]
[[Blog](https://deepmind.google/discover/blog/alphageometry-an-olympiad-level-ai-system-for-geometry/)]
| `Trinh et al., Nature`

- `2022/12` | `UniGeo` / `Geoformer` | [UniGeo: Unifying Geometry Logical Reasoning via Reformulating Mathematical Expression](https://arxiv.org/abs/2212.02746)

- `2021/05` | `GeoQA` / `NGS` | [GeoQA: A Geometric Question Answering Benchmark Towards Multimodal Numerical Reasoning](https://arxiv.org/abs/2105.14517)

- `2021/05` | `Geometry3K` / `Inter-GPS` | [Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning](https://arxiv.org/abs/2105.04165)

- `2015/09` | `GeoS` | [Solving Geometry Problems: Combining Text and Diagram Interpretation](https://aclanthology.org/D15-1171/)
\-
[[Paper](https://aclanthology.org/D15-1171.pdf)]

#### 3.2.3 Theorem Proving

[Mathematical Reasoning (Back-to-Top)](#32-mathematical-reasoning)

- `2023/10` | `LEGO-Prover` | [LEGO-Prover: Neural Theorem Proving with Growing Libraries](https://arxiv.org/abs/2310.00656)
\-

- `2023/09` | `Lyra` | [Lyra: Orchestrating Dual Correction in Automated Theorem Proving](https://arxiv.org/abs/2309.15806)

- `2023/06` | `DT-Solver` | [DT-Solver: Automated Theorem Proving with Dynamic-Tree Sampling Guided by Proof-level Value Function](https://aclanthology.org/2023.acl-long.706/)
\-
[[Paper](https://aclanthology.org/2023.acl-long.706.pdf)]

- `2023/05` | [Decomposing the Enigma: Subgoal-based Demonstration Learning for Formal Theorem Proving](https://arxiv.org/abs/2305.16366)

- `2023/03` | `Magnushammer` | [Magnushammer: A Transformer-based Approach to Premise Selection](https://arxiv.org/abs/2303.04488)

- `2022/10` | `DSP` | [Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs](https://arxiv.org/abs/2210.12283)
\-

- `2022/05` | [Learning to Find Proofs and Theorems by Learning to Refine Search Strategies: The Case of Loop Invariant Synthesis](https://arxiv.org/abs/2205.14229)

- `2022/05` | [Autoformalization with Large Language Models](https://arxiv.org/abs/2205.12615)
\-
[[Paper](https://proceedings.neurips.cc/paper_files/paper/2022/file/d0c6bc641a56bebee9d985b937307367-Paper-Conference.pdf)]

- `2022/05` | `HTPS` | [HyperTree Proof Search for Neural Theorem Proving](https://arxiv.org/abs/2205.11491)

- `2022/05` | `Thor` | [Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers](https://arxiv.org/abs/2205.10893)
\-

- `2022/02` | [Formal Mathematics Statement Curriculum Learning](https://arxiv.org/abs/2202.01344)
\-

- `2021/07` | `Lean 4` | [The Lean 4 Theorem Prover and Programming Language](https://link.springer.com/chapter/10.1007/978-3-030-79876-5_37)
\-

- `2021/02` | `TacticZero` | [TacticZero: Learning to Prove Theorems from Scratch with Deep Reinforcement Learning](https://arxiv.org/abs/2102.09756)
\-

- `2021/02` | `PACT` | [Proof Artifact Co-training for Theorem Proving with Language Models](https://arxiv.org/abs/2102.06203)
\-

- `2020/09` | `GPT-f` | [Generative Language Modeling for Automated Theorem Proving](https://arxiv.org/abs/2009.03393)
\-

- `2019/07` | [Formal Verification of Hardware Components in Critical Systems](https://www.hindawi.com/journals/wcmc/2020/7346763/)
\-
[[Paper](https://downloads.hindawi.com/journals/wcmc/2020/7346763.pdf)]

- `2019/06` | `Metamath` | A Computer Language for Mathematical Proofs
\-
[[Paper](http://de.metamath.org/downloads/metamath.pdf)]

- `2019/05` | `CoqGym` | [Learning to Prove Theorems via Interacting with Proof Assistants](https://arxiv.org/abs/1905.09381)

- `2018/12` | `AlphaZero` | [A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play](https://www.science.org/doi/10.1126/science.aar6404)
\-
[[Paper](https://www.science.org/doi/pdf/10.1126/science.aar6404)]

- `2018/04` | `TacticToe` | [TacticToe: Learning to Prove with Tactics](https://arxiv.org/abs/1804.00596)

- `2015/08` | `Lean` | The Lean Theorem Prover (system description)
\-
[[Paper](https://lean-lang.org/papers/system.pdf)]

- `2010/07` | Three Years of Experience with Sledgehammer, a Practical Link between Automatic and Interactive Theorem Provers
\-
[[Paper](https://www.cl.cam.ac.uk/~lp15/papers/Automation/paar.pdf)]

- `2010/04` | Formal Methods at Intel - An Overview
\-
[[Slides](https://shemesh.larc.nasa.gov/NFM2010/talks/harrison.pdf)]

- `2005/07` | Combining Simulation and Formal Verification for Integrated Circuit Design Validation
\-
[[Paper](https://s2.smu.edu/~mitch/ftp_dir/pubs/wmsci05.pdf)]

- `2003` | Extracting a Formally Verified, Fully Executable Compiler from a Proof Assistant
\-
[[Paper](https://www.sciencedirect.com/science/article/pii/S1571066105825988/pdf?md5=10b884badea7fe0e46c38b9419fbcca6&pid=1-s2.0-S1571066105825988-main.pdf&_valck=1)]

- `1996` | `Coq` | The Coq Proof Assistant-Reference Manual
\-
[[Project](https://coq.inria.fr/documentation)]

- `1994` | `Isabelle` | Isabelle: A Generic Theorem Prover
\-
[[Paper](https://link.springer.com/content/pdf/10.1007/bfb0030558.pdf)]

#### 3.2.4 Scientific Reasoning

[Mathematical Reasoning (Back-to-Top)](#32-mathematical-reasoning)

- `2023/07` | `SciBench` | [SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models](https://arxiv.org/abs/2307.10635)
\-

- `2022/09` | `ScienceQA` | [Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering](https://arxiv.org/abs/2209.09513)

- `2022/03` | `ScienceWorld` | [ScienceWorld: Is your Agent Smarter than a 5th Grader?](https://arxiv.org/abs/2203.07540)

- `2012` | Current Topics in Children's Learning and Cognition
\-
[[Book](https://www.intechopen.com/books/654)]

#### 3.2.x Benchmarks, Datasets, and Metrics

[Mathematical Reasoning (Back-to-Top)](#32-mathematical-reasoning)

- `2024/01` | `MathBench`
![Star](https://img.shields.io/github/stars/open-compass/MathBench.svg?style=social&label=Star)

MathBench: A Comprehensive Multi-Level Difficulty Mathematics Evaluation Dataset

[[code](https://github.com/open-compass/MathBench)]

- `2023/08` | `Math23K-F` / `MAWPS-F` / `FOMAS` | [Guiding Mathematical Reasoning via Mastering Commonsense Formula Knowledge](https://dl.acm.org/doi/abs/10.1145/3580305.3599375)
\-
[[Paper](https://dl.acm.org/doi/pdf/10.1145/3580305.3599375)]

- `2023/07` | `ARB` | [ARB: Advanced Reasoning Benchmark for Large Language Models](https://arxiv.org/abs/2307.13692)
\-

- `2023/05` | `SwiftSage` | [SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks](https://arxiv.org/abs/2305.17390)
\-

- `2023/05` | `TheoremQA` | [TheoremQA: A Theorem-driven Question Answering dataset](https://arxiv.org/abs/2305.12524)
\-

- `2022/10` | `MGSM` | [Language Models are Multilingual Chain-of-Thought Reasoners](https://arxiv.org/abs/2210.03057)
\-
[[Paper](https://openreview.net/pdf?id=fR3wGCk-IXp)]
[[Code](https://github.com/google-research/url-nlp)]

- `2021/10` | `GSM8K` | [Training Verifiers to Solve Math Word Problems](https://arxiv.org/abs/2110.14168)
\-
[[Paper](https://arxiv.org/pdf/2110.14168.pdf)]
[[Code](https://github.com/openai/grade-school-math)]
[[Blog](https://openai.com/research/solving-math-word-problems)]

- `2021/10` | `IconQA` | [IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning](https://arxiv.org/abs/2110.13214)
\-

- `2021/09` | `FinQA` | [FinQA: A Dataset of Numerical Reasoning over Financial Data](https://arxiv.org/abs/2109.00122)
\-

- `2021/08` | `MBPP` / `MathQA-Python` | [Program Synthesis with Large Language Models](https://arxiv.org/abs/2108.07732)

- `2021/08` | `HiTab` / `EA` | [HiTab: A Hierarchical Table Dataset for Question Answering and Natural Language Generation](https://arxiv.org/abs/2108.06712)

- `2021/07` | `HumanEval` / `Codex` | [Evaluating Large Language Models Trained on Code](https://arxiv.org/abs/2107.03374)
\-

- `2021/06` | `ASDiv` / `CLD` | [A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers](https://arxiv.org/abs/2106.15772)
\-

- `2021/06` | `AIT-QA` | [AIT-QA: Question Answering Dataset over Complex Tables in the Airline Industry](https://arxiv.org/abs/2106.12944)
\-

- `2021/05` | `APPS` | [Measuring Coding Challenge Competence With APPS](https://arxiv.org/abs/2105.09938)
\-

- `2021/05` | `TAT-QA` | [TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance](https://arxiv.org/abs/2105.07624)

- `2021/03` | `SVAMP` | [Are NLP Models really able to Solve Simple Math Word Problems?](https://arxiv.org/abs/2103.07191)
\-

- `2021/01` | `TSQA` / `MAP` / `MRR` | [TSQA: Tabular Scenario Based Question Answering](https://arxiv.org/abs/2101.11429)

- `2020/10` | `HMWP` | [Semantically-Aligned Universal Tree-Structured Solver for Math Word Problems](https://arxiv.org/abs/2010.06823)
\-

- `2020/04` | `HybridQA` | [HybridQA: A Dataset of Multi-Hop Question Answering over Tabular and Textual Data](https://arxiv.org/abs/2004.07347)

- `2019/03` | `DROP` | [DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs](https://arxiv.org/abs/1903.00161)
\-

- `2019` | `NaturalQuestions` | [Natural Questions: A Benchmark for Question Answering Research](https://aclanthology.org/Q19-1026/)
\-
[[Paper](https://aclanthology.org/Q19-1026.pdf)]

- `2018/09` | `HotpotQA` | [HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering](https://arxiv.org/abs/1809.09600)
\-

- `2018/09` | `Spider` | [Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task](https://arxiv.org/abs/1809.08887)
\-

- `2018/03` | `ComplexWebQuestions` | [The Web as a Knowledge-base for Answering Complex Questions](https://arxiv.org/abs/1803.06643)
\-

- `2017/12` | `MetaQA` | [Variational Reasoning for Question Answering with Knowledge Graph](https://arxiv.org/abs/1709.04071)
\-

- `2017/09` | `GEOS++` | [From Textbooks to Knowledge: A Case Study in Harvesting Axiomatic Knowledge from Textbooks to Solve Geometry Problems](https://aclanthology.org/D17-1081/)
\-
[[Paper](https://aclanthology.org/D17-1081.pdf)]

- `2017/09` | `Math23k` | [Deep Neural Solver for Math Word Problems](https://aclanthology.org/D17-1088/)
\-
[[Paper](https://aclanthology.org/D17-1088.pdf)]

- `2017/08` | `WikiSQL` / `Seq2SQL` | [Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning](https://arxiv.org/abs/1709.00103)
\-

- `2017/08` | [Learning to Solve Geometry Problems from Natural Language Demonstrations in Textbooks](https://aclanthology.org/S17-1029/)
\-
[[Paper](https://aclanthology.org/S17-1029.pdf)]

- `2017/05` | `TriviaQA` | [TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension](https://arxiv.org/abs/1705.03551)
\-

- `2017/05` | `GeoShader` | Synthesis of Solutions for Shaded Area Geometry Problems
\-
[[Paper](https://cdn.aaai.org/ocs/15416/15416-68619-1-PB.pdf)]

- `2016/09` | `DRAW-1K` | [Annotating Derivations: A New Evaluation Strategy and Dataset for Algebra Word Problems](https://arxiv.org/abs/1609.07197)
\-

- `2016/08` | `WebQSP` | [The Value of Semantic Parse Labeling for Knowledge Base Question Answering](https://aclanthology.org/P16-2033/)
\-
[[Paper](https://aclanthology.org/P16-2033.pdf)]

- `2016/06` | `SQuAD` | [SQuAD: 100,000+ Questions for Machine Comprehension of Text](https://arxiv.org/abs/1606.05250)
\-

- `2016/06` | `WikiMovies` | [Key-Value Memory Networks for Directly Reading Documents](https://arxiv.org/abs/1606.03126)
\-

- `2016/06` | `MAWPS` | [MAWPS: A Math Word Problem Repository](https://aclanthology.org/N16-1136/)
\-
[[Paper](https://aclanthology.org/N16-1136.pdf)]

- `2015/09` | `Dolphin1878` | [Automatically Solving Number Word Problems by Semantic Parsing and Reasoning](https://aclanthology.org/D15-1135/)
\-
[[Paper](https://aclanthology.org/D15-1135.pdf)]

- `2015/08` | `WikiTableQA` | [Compositional Semantic Parsing on Semi-Structured Tables](https://arxiv.org/abs/1508.00305)
\-

- `2015` | `SingleEQ` | [Parsing Algebraic Word Problems into Equations](https://aclanthology.org/Q15-1042/)
\-
[[Paper](https://aclanthology.org/Q15-1042.pdf)]

- `2015` | `DRAW` | DRAW: A Challenging and Diverse Algebra Word Problem Set
\-
[[Paper](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tech_rep.pdf)]

- `2014/10` | `Verb395` | [Learning to Solve Arithmetic Word Problems with Verb Categorization](https://aclanthology.org/D14-1058/)
\-
[[Paper](https://aclanthology.org/D14-1058.pdf)]

- `2013/10` | `WebQuestions` | [Semantic Parsing on Freebase from Question-Answer Pairs](https://aclanthology.org/D13-1160/)
\-
[[Paper](https://aclanthology.org/D13-1160.pdf)]

- `2013/08` | `Free917` | [Large-scale Semantic Parsing via Schema Matching and Lexicon Extension](https://aclanthology.org/P13-1042/)
\-
[[Paper](https://aclanthology.org/P13-1042.pdf)]

- `2002/04` | `NMI` | [Cluster Ensembles - A Knowledge Reuse Framework for Combining Multiple Partitions](https://dl.acm.org/doi/10.1162/153244303321897735)
\-
[[Paper](https://www.jmlr.org/papers/volume3/strehl02a/strehl02a.pdf)]

- `1990` | `ATIS` | [The ATIS Spoken Language Systems Pilot Corpus](https://aclanthology.org/H90-1021/)
\-
[[Paper](https://aclanthology.org/H90-1021.pdf)]

---

### 3.3 Logical Reasoning

[Reasoning Tasks (Back-to-Top)](#3-reasoning-tasks)

- [3.3 Logical Reasoning](#33-logical-reasoning)
  - [3.3.1 Propositional Logic](#331-propositional-logic)
  - [3.3.2 Predicate Logic](#332-predicate-logic)
  - [3.3.x Benchmarks, Datasets, and Metrics](#33x-benchmarks-datasets-and-metrics)


- `2023/10` | `LogiGLUE` | [Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models](https://arxiv.org/abs/2310.00836)
\-

- `2023/05` | `LogicLLM` | [LogicLLM: Exploring Self-supervised Logic-enhanced Training for Large Language Models](https://arxiv.org/abs/2305.13718)
\-

- `2023/05` | `Logic-LM` | [Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning](https://arxiv.org/abs/2305.12295)
\-

- `2023/03` | `LEAP` | [Explicit Planning Helps Language Models in Logical Reasoning](https://arxiv.org/abs/2303.15714)
\-

- `2023/03` | [Sparks of Artificial General Intelligence: Early experiments with GPT-4](https://arxiv.org/abs/2303.12712)
\-

- `2022/10` | `Entailer` | [Entailer: Answering Questions with Faithful and Truthful Chains of Reasoning](https://arxiv.org/abs/2210.12217)
\-

- `2022/06` | `NeSyL` | [Weakly Supervised Neural Symbolic Learning for Cognitive Tasks](https://ojs.aaai.org/index.php/AAAI/article/view/20533)
\-
[[Paper](https://ojs.aaai.org/index.php/AAAI/article/view/20533/20292)]

- `2022/05` | `NeuPSL` | [NeuPSL: Neural Probabilistic Soft Logic](https://arxiv.org/abs/2205.14268)
\-

- `2022/05` | `NLProofS` | [Generating Natural Language Proofs with Verifier-Guided Search](https://arxiv.org/abs/2205.12443)
\-

- `2022/05` | `Least-to-Most Prompting` | [Least-to-Most Prompting Enables Complex Reasoning in Large Language Models](https://arxiv.org/abs/2205.10625)
\-

- `2022/05` | `SI` | [Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning](https://arxiv.org/abs/2205.09712)
\-

- `2022/05` | `MERIt` | [MERIt: Meta-Path Guided Contrastive Learning for Logical Reasoning](https://aclanthology.org/2022.findings-acl.276/)
\-

- `2022/03` | [Self-Consistency Improves Chain of Thought Reasoning in Language Models](https://arxiv.org/abs/2203.11171)
\-

- `2021/11` | `NSPS` | [Neuro-Symbolic Program Search for Autonomous Driving Decision Module Design](https://proceedings.mlr.press/v155/sun21a.html)
\-
[[Paper](https://proceedings.mlr.press/v155/sun21a/sun21a.pdf)]

- `2021/09` | `DeepProbLog` | [Neural probabilistic logic programming in DeepProbLog](https://www.sciencedirect.com/science/article/pii/S0004370221000552)
\-
[[Paper](https://www.sciencedirect.com/science/article/pii/S0004370221000552/pdfft?md5=1e6b82d50854f317478e487da9e75473&pid=1-s2.0-S0004370221000552-main.pdf)]

- `2021/08` | `GABL` | [Abductive Learning with Ground Knowledge Base](https://www.ijcai.org/proceedings/2021/250)
\-
[[Paper](https://www.ijcai.org/proceedings/2021/0250.pdf)]

- `2021/05` | `LReasoner` | [Logic-Driven Context Extension and Data Augmentation for Logical Reasoning of Text](https://arxiv.org/abs/2105.03659)
\-

- `2020/02` | `RuleTakers` | [Transformers as Soft Reasoners over Language](https://arxiv.org/abs/2002.05867)
\-

- `2019/12` | `NMN-Drop` | [Neural Module Networks for Reasoning over Text](https://arxiv.org/abs/1912.04971)
\-

- `2019/04` | `NS-CL` | [The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision](https://arxiv.org/abs/1904.12584)
\-

- `2012` | Logical Reasoning and Learning
\-
[[Paper](https://link.springer.com/referenceworkentry/10.1007/978-1-4419-1428-6_790)]

#### 3.3.1 Propositional Logic

- `2022/09` | Propositional Reasoning via Neural Transformer Language Models
\-
[[Paper](https://www.cs.cmu.edu/~oscarr/pdf/publications/2022_nesy.pdf)]
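
As a concrete reference point for what these benchmarks ask of a model, here is a brute-force propositional entailment check by truth-table enumeration. It is only a sketch: the encoding of formulas as Python callables over an assignment dictionary is an illustrative choice, not taken from any listed paper.

```python
# Minimal truth-table entailment checker (illustrative encoding).
from itertools import product

def entails(premises, conclusion, atoms):
    """True iff every assignment satisfying all premises also satisfies the conclusion."""
    for values in product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        if all(p(world) for p in premises) and not conclusion(world):
            return False
    return True

# Example: {p -> q, p} |= q  (modus ponens)
premises = [lambda w: (not w["p"]) or w["q"], lambda w: w["p"]]
print(entails(premises, lambda w: w["q"], atoms=["p", "q"]))  # True
```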

#### 3.3.2 Predicate Logic

- `2021/06` | `ILP` | [Inductive logic programming at 30](https://link.springer.com/article/10.1007/s10994-021-06089-1)
\-
[[Paper](https://link.springer.com/content/pdf/10.1007/s10994-021-06089-1.pdf)]

- `2011` | Statistical Relational Learning
\-
[[Paper](https://link.springer.com/referenceworkentry/10.1007/978-0-387-30164-8_786)]

#### 3.3.x Benchmarks, Datasets, and Metrics

- `2022/10` | `PrOntoQA` | [Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought](https://arxiv.org/abs/2210.01240)
\-

- `2022/09` | `FOLIO` | [FOLIO: Natural Language Reasoning with First-Order Logic](https://arxiv.org/abs/2209.00840)
\-

- `2022/06` | `BIG-bench` | [Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models](https://arxiv.org/abs/2206.04615)
\-
[[Paper](https://openreview.net/pdf?id=uyTL5Bvosj)]
[[Code](https://github.com/google/BIG-bench)]

- `2021/04` | `AR-LSAT` | [AR-LSAT: Investigating Analytical Reasoning of Text](https://arxiv.org/abs/2104.06598)
\-
[[Paper](https://aclanthology.org/2022.findings-naacl.177/)]
[[Code](https://github.com/zhongwanjun/AR-LSAT)]

- `2020/12` | `ProofWriter` | [ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language](https://arxiv.org/abs/2012.13048)
\-

---

### 3.4 Causal Reasoning

causal reasoning

[Reasoning Tasks (Back-to-Top)](#3-reasoning-tasks)

- [3.4 Causal Reasoning](#34-causal-reasoning)
- [3.4.1 Counterfactual Reasoning](#341-counterfactual-reasoning)
- [3.4.x Benchmarks, Datasets, and Metrics](#34x-benchmarks-datasets-and-metrics)


- `2023/08` | [Causal Parrots: Large Language Models May Talk Causality But Are Not Causal](https://arxiv.org/abs/2308.13067)

- `2023/07` | [Causal Discovery with Language Models as Imperfect Experts](https://arxiv.org/abs/2307.02390)
\-

- `2023/06` | [From Query Tools to Causal Architects: Harnessing Large Language Models for Advanced Causal Discovery from Data](https://arxiv.org/abs/2306.16902)
\-

- `2023/06` | `Corr2Cause` | [Can Large Language Models Infer Causation from Correlation?](https://arxiv.org/abs/2306.05836)
\-

- `2023/05` | `Code-LLMs` | [The Magic of IF: Investigating Causal Reasoning Abilities in Large Language Models of Code](https://arxiv.org/abs/2305.19213)
\-

- `2023/04` | [Understanding Causality with Large Language Models: Feasibility and Opportunities](https://arxiv.org/abs/2304.05524)
\-

- `2023/04` | [Causal Reasoning and Large Language Models: Opening a New Frontier for Causality](https://arxiv.org/abs/2305.00050)
\-

- `2023/03` | [Can large language models build causal graphs?](https://arxiv.org/abs/2303.05279)
\-

- `2023/01` | [Causal-Discovery Performance of ChatGPT in the context of Neuropathic Pain Diagnosis](https://arxiv.org/abs/2301.13819)
\-

- `2022/09` | [Probing for Correlations of Causal Facts: Large Language Models and Causality](https://openreview.net/forum?id=UPwzqPOs4-)
\-
[[Paper](https://openreview.net/pdf?id=UPwzqPOs4-)]

- `2022/07` | [Can Large Language Models Distinguish Cause from Effect?](https://openreview.net/forum?id=ucHh-ytUkOH)
\-
[[Paper](https://openreview.net/pdf?id=ucHh-ytUkOH)]

- `2021/08` | [Learning Faithful Representations of Causal Graphs](https://aclanthology.org/2021.acl-long.69/)
\-
[[Paper](https://aclanthology.org/2021.acl-long.69.pdf)]

- `2021/05` | `InferBERT` | [InferBERT: A Transformer-Based Causal Inference Framework for Enhancing Pharmacovigilance](https://www.frontiersin.org/articles/10.3389/frai.2021.659622/full)
\-
[[Paper](https://www.frontiersin.org/articles/10.3389/frai.2021.659622/pdf?isPublishedV2=False)]

- `2021/02` | [Towards Causal Representation Learning](https://arxiv.org/abs/2102.11107)
\-

- `2020/05` | `CausaLM` | [CausaLM: Causal Model Explanation Through Counterfactual Language Models](https://arxiv.org/abs/2005.13407)
\-

- `2019/06` | [Neuropathic Pain Diagnosis Simulator for Causal Discovery Algorithm Evaluation](https://arxiv.org/abs/1906.01732)
\-

- `2017` | [Elements of Causal Inference: Foundations and Learning Algorithms](https://mitpress.mit.edu/9780262037310/elements-of-causal-inference/)
\-
[[Book](https://library.oapen.org/bitstream/id/056a11be-ce3a-44b9-8987-a6c68fce8d9b/11283.pdf)]

- `2016` | Actual Causality
\-
[[Book](https://direct.mit.edu/books/oa-monograph/3451/Actual-Causality)]

- `2013` | Causal Reasoning
\-
[[Paper](https://psycnet.apa.org/record/2012-26298-046)]

#### 3.4.1 Counterfactual Reasoning

- `2023/07` | [Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks](https://arxiv.org/abs/2307.02477)
\-

- `2023/05` | [Counterfactual reasoning: Testing language models' understanding of hypothetical scenarios](https://arxiv.org/abs/2305.16572)
\-

- `2007` | The Rational Imagination: How People Create Alternatives to Reality
\-
[[Paper](https://scholar.archive.org/work/zjwdgk7r6vefxaole362qftqji/access/wayback/http://www.tara.tcd.ie/bitstream/handle/2262/39428/Precis%20of%20The%20Rational%20Imagination%20-%20How%20People%20Create%20Alternatives%20to%20Reality.pdf?sequence=1)]

- `1986` | Norm theory: Comparing reality to its alternatives
\-
[[Paper](https://psycnet.apa.org/record/1986-21899-001)]

#### 3.4.x Benchmarks, Datasets, and Metrics

- `2021/12` | `CRASS` | [CRASS: A Novel Data Set and Benchmark to Test Counterfactual Reasoning of Large Language Models](https://arxiv.org/abs/2112.11941)
\-

- `2021/08` | `Arctic sea ice` | [Benchmarking of Data-Driven Causality Discovery Approaches in the Interactions of Arctic Sea Ice and Atmosphere](https://www.frontiersin.org/articles/10.3389/fdata.2021.642182/full)
\-
[[Paper](https://www.frontiersin.org/articles/10.3389/fdata.2021.642182/pdf?isPublishedV2=False)]

- `2014/12` | `CauseEffectPairs` | [Distinguishing cause from effect using observational data: methods and benchmarks](https://arxiv.org/abs/1412.3773)
\-

---

### 3.5 Visual Reasoning

visual reasoning

[Reasoning Tasks (Back-to-Top)](#3-reasoning-tasks)

- [3.5 Visual Reasoning](#35-visual-reasoning)
- [3.5.1 3D Reasoning](#351-3d-reasoning)
- [3.5.x Benchmarks, Datasets, and Metrics](#35x-benchmarks-datasets-and-metrics)


- `2022/11` | `G-VUE` | [Perceive, Ground, Reason, and Act: A Benchmark for General-purpose Visual Representation](https://arxiv.org/abs/2211.15402)
\-

- `2021/03` | `VLGrammar` | [VLGrammar: Grounded Grammar Induction of Vision and Language](https://arxiv.org/abs/2103.12975)
\-

- `2020/12` | [Attention over learned object embeddings enables complex visual reasoning](https://arxiv.org/abs/2012.08508)
\-

#### 3.5.1 3D Reasoning

- `2023/08` | `PointLLM` | [PointLLM: Empowering Large Language Models to Understand Point Clouds](https://arxiv.org/abs/2308.16911)
\-

- `2023/08` | `3D-VisTA` | [3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment](https://arxiv.org/abs/2308.04352)
\-

- `2023/07` | `3D-LLM` | [3D-LLM: Injecting the 3D World into Large Language Models](https://arxiv.org/abs/2307.12981)
\-

- `2022/10` | `SQA3D` | [SQA3D: Situated Question Answering in 3D Scenes](https://arxiv.org/abs/2210.07474)
\-

#### 3.5.x Benchmarks, Datasets, and Metrics

- `2021/12` | `PTR` | [PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning](https://arxiv.org/abs/2112.05136)
\-

- `2019/05` | `OK-VQA` | [OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge](https://arxiv.org/abs/1906.00067)
\-

- `2016/12` | `CLEVR` | [CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning](https://arxiv.org/abs/1612.06890)
\-

---

### 3.6 Audio Reasoning

audio reasoning

[Reasoning Tasks (Back-to-Top)](#3-reasoning-tasks)

- [3.6 Audio Reasoning](#36-audio-reasoning)
- [3.6.1 Speech](#361-speech)
- [3.6.x Benchmarks, Datasets, and Metrics](#36x-benchmarks-datasets-and-metrics)


- `2023/11` | `M2UGen` | [M2UGen: Multi-modal Music Understanding and Generation with the Power of Large Language Models](https://arxiv.org/abs/2311.11255)
\-
[[Paper](https://arxiv.org/pdf/2311.11255.pdf)]
[[Code](https://github.com/crypto-code/M2UGen)]

- `2023/08` | `MU-LLaMA` | [Music Understanding LLaMA: Advancing Text-to-Music Generation with Question Answering and Captioning](https://arxiv.org/abs/2308.11276)
\-
[[Paper](https://arxiv.org/pdf/2308.11276.pdf)]
[[Code](https://github.com/crypto-code/MU-LLaMA)]

- `2022/05` | [Self-Supervised Speech Representation Learning: A Review](https://arxiv.org/abs/2205.10643)
\-

#### 3.6.1 Speech

- `2022/03` | `SUPERB-SG` | [SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities](https://arxiv.org/abs/2203.06849)
\-

- `2022/02` | `Data2Vec` | [data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555)
\-

- `2021/10` | `WavLM` | [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https://arxiv.org/abs/2110.13900)
\-

- `2021/06` | `HuBERT` | [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447)
\-

- `2021/05` | `SUPERB` | [SUPERB: Speech processing Universal PERformance Benchmark](https://arxiv.org/abs/2105.01051)
\-

- `2020/10` | `Speech SIMCLR` | [Speech SIMCLR: Combining Contrastive and Reconstruction Objective for Self-supervised Speech Representation Learning](https://arxiv.org/abs/2010.13991)
\-

- `2020/06` | `Wav2Vec 2.0` | [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477)
\-

- `2020/05` | `Conformer` | [Conformer: Convolution-augmented Transformer for Speech Recognition](https://arxiv.org/abs/2005.08100)
\-

- `2019/10` | `Mockingjay` | [Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders](https://arxiv.org/abs/1910.12638)
\-

- `2019/04` | `APC` | [An Unsupervised Autoregressive Model for Speech Representation Learning](https://arxiv.org/abs/1904.03240)
\-

- `2018/07` | `CPC` | [Representation Learning with Contrastive Predictive Coding](https://arxiv.org/abs/1807.03748)
\-

- `2018/04` | `Speech-Transformer` | Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition
\-
[[Paper](https://ieeexplore.ieee.org/document/8462506)]

- `2017/11` | `VQ-VAE` | [Neural Discrete Representation Learning](https://arxiv.org/abs/1711.00937)
\-

- `2017/08` | [Large-Scale Domain Adaptation via Teacher-Student Learning](https://arxiv.org/abs/1708.05466)
\-

#### 3.6.x Benchmarks, Datasets, and Metrics

- `2022/03` | `SUPERB-SG` | [SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities](https://arxiv.org/abs/2203.06849)
\-

- `2021/11` | `VoxPopuli` / `XLS-R` | [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale](https://arxiv.org/abs/2111.09296)
\-

- `2021/05` | `SUPERB` | [SUPERB: Speech processing Universal PERformance Benchmark](https://arxiv.org/abs/2105.01051)
\-

- `2020/12` | `Multilingual LibriSpeech` | [MLS: A Large-Scale Multilingual Dataset for Speech Research](https://arxiv.org/abs/2012.03411)
\-

- `2020/05` | `Didi Dictation` / `Didi Callcenter` | [A Further Study of Unsupervised Pre-training for Transformer Based Speech Recognition](https://arxiv.org/abs/2005.09862)
\-

- `2019/12` | `Libri-Light` | [Libri-Light: A Benchmark for ASR with Limited or No Supervision](https://arxiv.org/abs/1912.07875)
\-

- `2019/12` | `Common Voice` | [Common Voice: A Massively-Multilingual Speech Corpus](https://arxiv.org/abs/1912.06670)
\-

---

### 3.7 Multimodal Reasoning

multimodal reasoning

[Reasoning Tasks (Back-to-Top)](#3-reasoning-tasks)

- [3.7 Multimodal Reasoning](#37-multimodal-reasoning)
- [3.7.1 Alignment](#371-alignment)
- [3.7.2 Generation](#372-generation)
- [3.7.3 Multimodal Understanding](#373-multimodal-understanding)
- [3.7.x Benchmarks, Datasets, and Metrics](#37x-benchmarks-datasets-and-metrics)


- `2023/12` | [A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise](https://arxiv.org/abs/2312.12436)
\-
[[Paper](https://arxiv.org/pdf/2312.12436.pdf)]
[[Project](https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models)]

#### 3.7.1 Alignment

- `2023/01` | `BLIP-2` | [BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models](https://arxiv.org/abs/2301.12597)
\-
[[Paper](https://proceedings.mlr.press/v202/li23q.html)]
[[Code](https://github.com/salesforce/LAVIS/tree/main/projects/blip2)]

#### 3.7.2 Generation

- `2023/10` | `DALL·E 3` | Improving Image Generation with Better Captions
\-
[[Paper](https://cdn.openai.com/papers/dall-e-3.pdf)]
[[Project](https://openai.com/dall-e-3)]

- `2023/06` | `Kosmos-2` | [Kosmos-2: Grounding Multimodal Large Language Models to the World](https://arxiv.org/abs/2306.14824)
\-

- `2023/05` | `BiomedGPT` | [BiomedGPT: A Unified and Generalist Biomedical Generative Pre-trained Transformer for Vision, Language, and Multimodal Tasks](https://arxiv.org/abs/2305.17100)
\-

- `2023/03` | `Visual ChatGPT` | [Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models](https://arxiv.org/abs/2303.04671)
\-

- `2023/02` | `Kosmos-1` | [Language Is Not All You Need: Aligning Perception with Language Models](https://arxiv.org/abs/2302.14045)
\-

- `2022/07` | `Midjourney`
\-
[[Project](https://www.midjourney.com/home)]

- `2022/04` | `Flamingo` | [Flamingo: a Visual Language Model for Few-Shot Learning](https://arxiv.org/abs/2204.14198)
\-

- `2021/12` | `MAGMA` | [MAGMA -- Multimodal Augmentation of Generative Models through Adapter-based Finetuning](https://arxiv.org/abs/2112.05253)
\-

#### 3.7.3 Multimodal Understanding

- `2023/09` | `Q-Bench` | [Q-Bench: A Benchmark for General-Purpose Foundation Models on Low-level Vision](https://arxiv.org/abs/2309.14181)
\-
[[Paper](https://arxiv.org/pdf/2309.14181.pdf)]
[[Code](https://github.com/Q-Future/Q-Bench)]

- `2023/05` | `DetGPT` | [DetGPT: Detect What You Need via Reasoning](https://arxiv.org/abs/2305.14167)
\-

- `2023/03` | `Vicuna` | Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality
\-
[[Blog](https://lmsys.org/blog/2023-03-30-vicuna/)]
[[Code](https://github.com/lm-sys/FastChat)]

- `2022/12` | `DePlot` | [DePlot: One-shot visual language reasoning by plot-to-table translation](https://arxiv.org/abs/2212.10505)
\-

- `2022/12` | `MatCha` | [MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering](https://arxiv.org/abs/2212.09662)
\-

#### 3.7.x Benchmarks, Datasets, and Metrics

- `2023/06` | `LVLM-eHub` | [LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models](https://arxiv.org/abs/2306.09265)
\-

- `2023/06` | `LAMM` | [LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark](https://arxiv.org/abs/2306.06687)
\-

- `2023/05` | `AttackVLM` | [On Evaluating Adversarial Robustness of Large Vision-Language Models](https://arxiv.org/abs/2305.16934)
\-

- `2023/05` | `POPE` | [Evaluating Object Hallucination in Large Vision-Language Models](https://arxiv.org/abs/2305.10355)
\-

- `2023/05` | `MultimodalOCR` | [On the Hidden Mystery of OCR in Large Multimodal Models](https://arxiv.org/abs/2305.07895)
\-

- `2022/10` | `ObjMLM` | [Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training](https://arxiv.org/abs/2210.07688)

- `2022/06` | `RAVEN` / `ARC` | [Evaluating Understanding on Conceptual Abstraction Benchmarks](https://arxiv.org/abs/2206.14187)
\-

- `2021/06` | `LARC` | [Communicating Natural Programs to Humans and Machines](https://arxiv.org/abs/2106.07824)
\-

- `2014/11` | `CIDEr` / `PASCAL-50S` / `ABSTRACT-50S` | [CIDEr: Consensus-based Image Description Evaluation](https://arxiv.org/abs/1411.5726)
\-

---

### 3.8 Agent Reasoning

agent reasoning

[Reasoning Tasks (Back-to-Top)](#3-reasoning-tasks)

- [3.8 Agent Reasoning](#38-agent-reasoning)
- [3.8.1 Introspective Reasoning](#381-introspective-reasoning)
- [3.8.2 Extrospective Reasoning](#382-extrospective-reasoning)
- [3.8.3 Multi-agent Reasoning](#383-multi-agent-reasoning)
- [3.8.4 Driving Reasoning](#384-driving-reasoning)
- [3.8.x Benchmarks, Datasets, and Metrics](#38x-benchmarks-datasets-and-metrics)


- `2024/01` | `AutoRT`
| `Ahn et al.`
![citations](https://img.shields.io/badge/dynamic/json?url=https://api.semanticscholar.org/graph/v1/paper/CorpusID:266906759?fields=citationCount&query=%24.citationCount&label=citations)

AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents

[[arXiv](https://arxiv.org/abs/2401.12963)]
[[paper](https://arxiv.org/pdf/2401.12963.pdf)]
[[project](https://auto-rt.github.io/)]

- `2023/11` | `OpenFlamingo` | [Vision-Language Foundation Models as Effective Robot Imitators](https://arxiv.org/abs/2311.01378)
\-

- `2023/07` | `RT-2` | [RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control](https://arxiv.org/abs/2307.15818)
\-

- `2023/05` | `RAP` | [Reasoning with Language Model is Planning with World Model](https://arxiv.org/abs/2305.14992)
\-

- `2023/03` | `PaLM-E`
| `Driess et al., ICML 2023`
![citations](https://img.shields.io/badge/dynamic/json?url=https://api.semanticscholar.org/graph/v1/paper/CorpusID:257364842?fields=citationCount&query=%24.citationCount&label=citations)

PaLM-E: An Embodied Multimodal Language Model

[[arXiv](https://arxiv.org/abs/2303.03378)]
[[paper](https://icml.cc/virtual/2023/poster/23969)]
[[project](https://palm-e.github.io/)]

- `2022/12` | `RT-1` | [RT-1: Robotics Transformer for Real-World Control at Scale](https://arxiv.org/abs/2212.06817)

- `2022/10` | [Skill Induction and Planning with Latent Language](https://arxiv.org/abs/2110.01517)
\-

- `2022/05` | `Gato` | [A Generalist Agent](https://arxiv.org/abs/2205.06175)
\-

- `2022/04` | `SMs` | [Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language](https://arxiv.org/abs/2204.00598)
\-

- `2022/02` | [Pre-Trained Language Models for Interactive Decision-Making](https://arxiv.org/abs/2202.01771)
\-

- `2022/01` | `Language-Planner` | [Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents](https://arxiv.org/abs/2201.07207)
\-

- `2021/11` | [Value Function Spaces: Skill-Centric State Abstractions for Long-Horizon Reasoning](https://arxiv.org/abs/2111.03189)
\-

- `2020/09` | [Visually-Grounded Planning without Vision: Language Models Infer Detailed Plans from High-level Instructions](https://arxiv.org/abs/2009.14259)
\-

- `2016/01` | `AlphaGo` | Mastering the game of Go with deep neural networks and tree search
\-
[[Paper](https://www.nature.com/articles/nature16961)]

- `2014/05` | Gesture in reasoning: An embodied perspective
\-
[[Paper](https://www.taylorfrancis.com/chapters/edit/10.4324/9781315775845-19/gesture-reasoning-martha-alibali-rebecca-boncoddo-autumn-hostetter)]

#### 3.8.1 Introspective Reasoning

- `2022/11` | `PAL` | [PAL: Program-aided Language Models](https://arxiv.org/abs/2211.10435)
\-

- `2022/09` | `ProgPrompt` | [ProgPrompt: Generating Situated Robot Task Plans using Large Language Models](https://arxiv.org/abs/2209.11302)
\-

- `2022/09` | `Code as Policies` | [Code as Policies: Language Model Programs for Embodied Control](https://arxiv.org/abs/2209.07753)
\-

- `2022/04` | `SayCan` | [Do As I Can, Not As I Say: Grounding Language in Robotic Affordances](https://arxiv.org/abs/2204.01691)
\-

- `2012` | Introspective Learning and Reasoning
\-
[[Paper](https://link.springer.com/referenceworkentry/10.1007/978-1-4419-1428-6_1802)]

#### 3.8.2 Extrospective Reasoning

- `2023/06` | `Statler` | [Statler: State-Maintaining Language Models for Embodied Reasoning](https://arxiv.org/abs/2306.17840)
\-

- `2023/02` | `Planner-Actor-Reporter` | [Collaborating with language models for embodied reasoning](https://arxiv.org/abs/2302.00763)
\-

- `2023/02` | `Toolformer` | [Toolformer: Language Models Can Teach Themselves to Use Tools](https://arxiv.org/abs/2302.04761)
\-

- `2022/12` | `LLM-Planner` | [LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models](https://arxiv.org/abs/2212.04088)
\-

- `2022/10` | `ReAct` | [ReAct: Synergizing Reasoning and Acting in Language Models](https://arxiv.org/abs/2210.03629)
\-

- `2022/10` | `Self-Ask` | [Measuring and Narrowing the Compositionality Gap in Language Models](https://arxiv.org/abs/2210.03350)
\-

- `2022/07` | `Inner Monologue` | [Inner Monologue: Embodied Reasoning through Planning with Language Models](https://arxiv.org/abs/2207.05608)
\-
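
Several of the extrospective methods above share a prompt-level loop: the model interleaves free-form thoughts with tool calls, and tool observations are folded back into its context. The sketch below is a schematic of that loop, not any listed paper's implementation; `llm`, the tool registry, and the string markers are placeholder assumptions.

```python
# Schematic reason-act loop (all interfaces are placeholders).
def react_loop(llm, tools, question, max_steps=5):
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        # The model is prompted to emit a Thought plus either an Action or a Final Answer.
        step = llm(transcript + "Thought:")
        transcript += "Thought:" + step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:")[-1].strip()
        if "Action:" in step:
            # Expected form: "Action: tool_name[argument]"
            name, _, arg = step.split("Action:")[-1].strip().partition("[")
            observation = tools[name.strip()](arg.rstrip("]"))
            transcript += f"Observation: {observation}\n"
    return None

# Usage with a trivial stand-in "model" that answers immediately:
fake_llm = lambda prompt: " I can answer directly. Final Answer: 42"
print(react_loop(fake_llm, tools={}, question="What is 6 * 7?"))  # 42
```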

#### 3.8.3 Multi-agent Reasoning

- `2023/07` | `Federated LLM` | [Federated Large Language Model: A Position Paper](https://arxiv.org/abs/2307.08925)
\-

- `2023/07` | [Self-Adaptive Large Language Model (LLM)-Based Multiagent Systems](https://arxiv.org/abs/2307.06187)
\-

- `2023/07` | `Co-LLM-Agents` | [Building Cooperative Embodied Agents Modularly with Large Language Models](https://arxiv.org/abs/2307.02485)
\-

- `2023/05` | [Improving Factuality and Reasoning in Language Models through Multiagent Debate](https://arxiv.org/abs/2305.14325)
\-

- `2017/02` | `FIoT` | FIoT: An agent-based framework for self-adaptive and self-organizing applications based on the Internet of Things
\-
[[Paper](https://www.sciencedirect.com/science/article/pii/S0020025516313664)]

- `2004` | A Practical Guide to the IBM Autonomic Computing Toolkit
\-
[[Book](https://books.google.com.hk/books/about/A_Practical_Guide_to_the_IBM_Autonomic_C.html?id=XHeoSgAACAAJ&redir_esc=y)]

#### 3.8.4 Driving Reasoning

- `2023/12` | `DriveLM` | [DriveLM: Driving with Graph Visual Question Answering](https://arxiv.org/abs/2312.14150)
\-
[[Paper](https://arxiv.org/pdf/2312.14150.pdf)]
[[Code](https://github.com/OpenDriveLab/DriveLM)]

- `2023/12` | `LiDAR-LLM` | [LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding](https://arxiv.org/abs/2312.14074)
\-
[[Paper](https://arxiv.org/pdf/2312.14074.pdf)]
[[Project](https://sites.google.com/view/lidar-llm)]

- `2023/12` | `DriveMLM` | [DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving](https://arxiv.org/abs/2312.09245)
\-
[[Paper](https://arxiv.org/pdf/2312.09245.pdf)]
[[Code](https://github.com/OpenGVLab/DriveMLM)]

- `2023/12` | `LMDrive` | [LMDrive: Closed-Loop End-to-End Driving with Large Language Models](https://arxiv.org/abs/2312.07488)
\-
[[Paper](https://arxiv.org/pdf/2312.07488.pdf)]
[[Code](https://github.com/opendilab/LMDrive)]

- `2023/10` | [Driving through the Concept Gridlock: Unraveling Explainability Bottlenecks in Automated Driving](https://arxiv.org/abs/2310.16639)
\-

- `2023/10` | [Vision Language Models in Autonomous Driving and Intelligent Transportation Systems](https://arxiv.org/abs/2310.14414)
\-

- `2023/10` | `DriveGPT4` | [DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model](https://arxiv.org/abs/2310.01412)
\-

- `2023/09` | `MotionLM` | [MotionLM: Multi-Agent Motion Forecasting as Language Modeling](https://arxiv.org/abs/2309.16534)
\-

- `2023/06` | [End-to-end Autonomous Driving: Challenges and Frontiers](https://arxiv.org/abs/2306.16927)
\-

- `2023/04` | [Graph-based Topology Reasoning for Driving Scenes](https://arxiv.org/abs/2304.05277)
\-

- `2022/09` | [Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe](https://arxiv.org/abs/2209.05324)
\-

- `2021/11` | Artificial intelligence: A powerful paradigm for scientific research
\-
[[Paper](https://www.sciencedirect.com/science/article/pii/S2666675821001041)]

#### 3.8.x Benchmarks, Datasets, and Metrics

- `2023/12` | `DriveLM` | [DriveLM: Driving with Graph Visual Question Answering](https://arxiv.org/abs/2312.14150)
\-
[[Paper](https://arxiv.org/pdf/2312.14150.pdf)]
[[Code](https://github.com/OpenDriveLab/DriveLM)]

- `2023/09` | `NuPrompt` / `PromptTrack` | [Language Prompt for Autonomous Driving](https://arxiv.org/abs/2309.04379)
\-

- `2023/07` | `LCTGen` | [Language Conditioned Traffic Generation](https://arxiv.org/abs/2307.07947)

- `2023/05` | `NuScenes-QA` | [NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario](https://arxiv.org/abs/2305.14836)
\-

- `2022/06` | `BEHAVIOR-1K` | [BEHAVIOR-1K: A Benchmark for Embodied AI with 1,000 Everyday Activities and Realistic Simulation](https://proceedings.mlr.press/v205/li23a.html)
\-

- `2021/08` | `iGibson` | [iGibson 2.0: Object-Centric Simulation for Robot Learning of Everyday Household Tasks](https://arxiv.org/abs/2108.03272)
\-

- `2021/06` | `Habitat 2.0` | [Habitat 2.0: Training Home Assistants to Rearrange their Habitat](https://arxiv.org/abs/2106.14405)
\-

- `2020/04` | `RoboTHOR` | [RoboTHOR: An Open Simulation-to-Real Embodied AI Platform](https://arxiv.org/abs/2004.06799)
\-

- `2019/11` | `HAD` | [Grounding Human-to-Vehicle Advice for Self-driving Vehicles](https://arxiv.org/abs/1911.06978)
\-

- `2019/04` | `Habitat` | [Habitat: A Platform for Embodied AI Research](https://arxiv.org/abs/1904.01201)
\-

- `2018/08` | `Gibson` | [Gibson Env: Real-World Perception for Embodied Agents](https://arxiv.org/abs/1808.10654)
\-

- `2018/06` | `VirtualHome` | [VirtualHome: Simulating Household Activities via Programs](https://arxiv.org/abs/1806.07011)
\-

---

### 3.9 Other Tasks and Applications

other tasks and applications

[Reasoning Tasks (Back-to-Top)](#3-reasoning-tasks)

- [3.9 Other Tasks and Applications](#39-other-tasks-and-applications)
- [3.9.1 Theory of Mind (ToM)](#391-theory-of-mind-tom)
- [3.9.2 LLMs for Weather Prediction](#392-llms-for-weather-prediction)
- [3.9.3 Abstract Reasoning](#393-abstract-reasoning)
- [3.9.4 Defeasible Reasoning](#394-defeasible-reasoning)
- [3.9.5 Medical Reasoning](#395-medical-reasoning)
- [3.9.6 Bioinformatics Reasoning](#396-bioinformatics-reasoning)
- [3.9.7 Long-Chain Reasoning](#397-long-chain-reasoning)

#### 3.9.1 Theory of Mind (ToM)

- `2023/02` | `ToM` | [Theory of Mind Might Have Spontaneously Emerged in Large Language Models](https://arxiv.org/abs/2302.02083)
\-

#### 3.9.2 LLMs for Weather Prediction

- `2023/07` | `Pangu-Weather` | Accurate medium-range global weather forecasting with 3D neural networks
\-
[[Paper](https://www.nature.com/articles/s41586-023-06185-3)]

- `2022/09` | `MetNet-2` | Deep learning for twelve hour precipitation forecasts
\-
[[Paper](https://www.nature.com/articles/s41467-022-32483-x)]

#### 3.9.3 Abstract Reasoning

- `2023/05` | [Large Language Models Are Not Strong Abstract Reasoners](https://arxiv.org/abs/2305.19555)
\-

#### 3.9.4 Defeasible Reasoning

- `2023/06` | `BoardgameQA` | [BoardgameQA: A Dataset for Natural Language Reasoning with Contradictory Information](https://arxiv.org/abs/2306.07934)
\-

- `2021/10` | `CURIOUS` | [Think about it! Improving defeasible reasoning by first modeling the question scenario](https://arxiv.org/abs/2110.12349)
\-

- `2020/11` | `Defeasible NLI` / `δ-NLI` | [Thinking Like a Skeptic: Defeasible Inference in Natural Language](https://aclanthology.org/2020.findings-emnlp.418/)
\-
[[Paper](https://aclanthology.org/2020.findings-emnlp.418.pdf)]

- `2020/04` | `KACC` | [KACC: A Multi-task Benchmark for Knowledge Abstraction, Concretization and Completion](https://arxiv.org/abs/2004.13631)
\-

- `2009/01` | A Recursive Semantics for Defeasible Reasoning
\-
[[Paper](https://link.springer.com/chapter/10.1007/978-0-387-98197-0_9)]

#### 3.9.5 Medical Reasoning

- `2024/01` | `CheXagent` / `CheXinstruct` / `CheXbench`
| `Chen et al.`
![citations](https://img.shields.io/badge/dynamic/json?url=https://api.semanticscholar.org/graph/v1/paper/CorpusID:267069358?fields=citationCount&query=%24.citationCount&label=citations)
![Star](https://img.shields.io/github/stars/Stanford-AIMI/CheXagent.svg?style=social&label=Star)

CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation

[[arXiv](https://arxiv.org/abs/2401.12208)]
[[paper](https://arxiv.org/pdf/2401.12208.pdf)]
[[code](https://github.com/Stanford-AIMI/CheXagent)]
[[project](https://stanford-aimi.github.io/chexagent.html)]
[[huggingface](https://huggingface.co/stanford-crfm/BioMedLM)]

- `2024/01` | `EchoGPT`
| `Chao et al.`
![citations](https://img.shields.io/badge/dynamic/json?url=https://api.semanticscholar.org/graph/v1/paper/CorpusID:267042212?fields=citationCount&query=%24.citationCount&label=citations)

EchoGPT: A Large Language Model for Echocardiography Report Summarization

[[medRxiv](https://www.medrxiv.org/content/10.1101/2024.01.18.24301503)]
[[paper](https://www.medrxiv.org/content/10.1101/2024.01.18.24301503v1.full.pdf)]

- `2023/10` | `GPT4V-Medical-Report`
| `Yan et al.`
![citations](https://img.shields.io/badge/dynamic/json?url=https://api.semanticscholar.org/graph/v1/paper/CorpusID:264805701?fields=citationCount&query=%24.citationCount&label=citations)
![Star](https://img.shields.io/github/stars/ZhilingYan/GPT4V-Medical-Report.svg?style=social&label=Star)

Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V

[[arXiv](https://arxiv.org/abs/2310.19061)]
[[paper](https://arxiv.org/pdf/2310.19061.pdf)]
[[code](https://github.com/ZhilingYan/GPT4V-Medical-Report)]

- `2023/10` | `VisionFM`
| `Qiu et al.`
![citations](https://img.shields.io/badge/dynamic/json?url=https://api.semanticscholar.org/graph/v1/paper/CorpusID:263828921?fields=citationCount&query=%24.citationCount&label=citations)

VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence

[[arXiv](https://arxiv.org/abs/2310.04992)]
[[paper](https://arxiv.org/pdf/2310.04992.pdf)]

- `2023/09`
| `Yang et al.`
![citations](https://img.shields.io/badge/dynamic/json?url=https://api.semanticscholar.org/graph/v1/paper/CorpusID:263310951?fields=citationCount&query=%24.citationCount&label=citations)

The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)

[[arXiv](https://arxiv.org/abs/2309.17421)]
[[paper](https://arxiv.org/pdf/2309.17421.pdf)]

- `2023/09` | `RETFound`
| `Zhou et al., Nature`
![citations](https://img.shields.io/badge/dynamic/json?url=https://api.semanticscholar.org/graph/v1/paper/CorpusID:264168236?fields=citationCount&query=%24.citationCount&label=citations)
![Star](https://img.shields.io/github/stars/rmaphoh/RETFound_MAE.svg?style=social&label=Star)

A foundation model for generalizable disease detection from retinal images

[[paper](https://www.nature.com/articles/s41586-023-06555-x)]
[[code](https://github.com/rmaphoh/RETFound_MAE)]

- `2023/08` | `ELIXR`
| `Xu et al.`
![citations](https://img.shields.io/badge/dynamic/json?url=https://api.semanticscholar.org/graph/v1/paper/CorpusID:260378981?fields=citationCount&query=%24.citationCount&label=citations)

ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders

[[arXiv](https://arxiv.org/abs/2308.01317)]
[[paper](https://arxiv.org/pdf/2308.01317.pdf)]

- `2023/07` | `Med-Flamingo`
| `Moor et al.`
![citations](https://img.shields.io/badge/dynamic/json?url=https://api.semanticscholar.org/graph/v1/paper/CorpusID:260316059?fields=citationCount&query=%24.citationCount&label=citations)
![Star](https://img.shields.io/github/stars/snap-stanford/med-flamingo.svg?style=social&label=Star)

Med-Flamingo: a Multimodal Medical Few-shot Learner

[[arXiv](https://arxiv.org/abs/2307.15189)]
[[paper](https://arxiv.org/pdf/2307.15189.pdf)]
[[code](https://github.com/snap-stanford/med-flamingo)]

- `2023/07` | `Med-PaLM M`
| `Tu et al.`
![citations](https://img.shields.io/badge/dynamic/json?url=https://api.semanticscholar.org/graph/v1/paper/CorpusID:260164663?fields=citationCount&query=%24.citationCount&label=citations)
![Star](https://img.shields.io/github/stars/kyegomez/Med-PaLM.svg?style=social&label=Star)

Towards Generalist Biomedical AI

[[arXiv](https://arxiv.org/abs/2307.14334)]
[[paper](https://arxiv.org/pdf/2307.14334.pdf)]
[[code](https://github.com/kyegomez/Med-PaLM)]

- `2023/06` | `Endo-FM`
| `Wang et al., MICCAI 2023`
![citations](https://img.shields.io/badge/dynamic/json?url=https://api.semanticscholar.org/graph/v1/paper/CorpusID:259287248?fields=citationCount&query=%24.citationCount&label=citations)
![Star](https://img.shields.io/github/stars/med-air/Endo-FM.svg?style=social&label=Star)

Foundation Model for Endoscopy Video Analysis via Large-scale Self-supervised Pre-train

[[arXiv](https://arxiv.org/abs/2306.16741)]
[[paper](https://link.springer.com/chapter/10.1007/978-3-031-43996-4_10)]
[[code](https://github.com/med-air/Endo-FM)]

- `2023/06` | `XrayGPT`
| `Thawkar et al.`
![citations](https://img.shields.io/badge/dynamic/json?url=https://api.semanticscholar.org/graph/v1/paper/CorpusID:259145194?fields=citationCount&query=%24.citationCount&label=citations)
![Star](https://img.shields.io/github/stars/mbzuai-oryx/XrayGPT.svg?style=social&label=Star)

XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models

[[arXiv](https://arxiv.org/abs/2306.07971)]
[[paper](https://arxiv.org/pdf/2306.07971.pdf)]
[[code](https://github.com/mbzuai-oryx/XrayGPT)]

- `2023/06` | `LLaVA-Med`
| `Li et al., NeurIPS 2023`
![citations](https://img.shields.io/badge/dynamic/json?url=https://api.semanticscholar.org/graph/v1/paper/CorpusID:258999820?fields=citationCount&query=%24.citationCount&label=citations)
![Star](https://img.shields.io/github/stars/microsoft/LLaVA-Med.svg?style=social&label=Star)

LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day

[[arXiv](https://arxiv.org/abs/2306.00890)]
[[paper](https://neurips.cc/virtual/2023/poster/73643)]
[[code](https://github.com/microsoft/LLaVA-Med)]

- `2023/05` | `HuatuoGPT`
| `Zhang et al., Findings of EMNLP 2023`
![citations](https://img.shields.io/badge/dynamic/json?url=https://api.semanticscholar.org/graph/v1/paper/CorpusID:258865566?fields=citationCount&query=%24.citationCount&label=citations)
![Star](https://img.shields.io/github/stars/FreedomIntelligence/HuatuoGPT.svg?style=social&label=Star)

HuatuoGPT, Towards Taming Language Model to Be a Doctor

[[arXiv](https://arxiv.org/abs/2305.15075)]
[[paper](https://aclanthology.org/2023.findings-emnlp.725/)]
[[code](https://github.com/FreedomIntelligence/HuatuoGPT)]

- `2023/05` | `Med-PaLM 2`
| `Singhal et al.`
![citations](https://img.shields.io/badge/dynamic/json?url=https://api.semanticscholar.org/graph/v1/paper/CorpusID:258715226?fields=citationCount&query=%24.citationCount&label=citations)

Towards Expert-Level Medical Question Answering with Large Language Models

[[arXiv](https://arxiv.org/abs/2305.09617)]
[[paper](https://arxiv.org/pdf/2305.09617.pdf)]

- `2022/12` | `Med-PaLM` / `MultiMedQA` / `HealthSearchQA`
| `Singhal et al., Nature`
![citations](https://img.shields.io/badge/dynamic/json?url=https://api.semanticscholar.org/graph/v1/paper/CorpusID:255124952?fields=citationCount&query=%24.citationCount&label=citations)

Large Language Models Encode Clinical Knowledge

[[arXiv](https://arxiv.org/abs/2212.13138)]
[[paper](https://www.nature.com/articles/s41586-023-06291-2)]

#### 3.9.6 Bioinformatics Reasoning

- `2023/07` | `Prot2Text` | [Prot2Text: Multimodal Protein's Function Generation with GNNs and Transformers](https://arxiv.org/abs/2307.14367)
\-

- `2023/07` | `Uni-RNA` | [Uni-RNA: Universal Pre-Trained Models Revolutionize RNA Research](https://www.biorxiv.org/content/10.1101/2023.07.11.548588v1)
\-

- `2023/07` | `RFdiffusion` | De novo design of protein structure and function with RFdiffusion
\-
[[Paper](https://www.nature.com/articles/s41586-023-06415-8)]

- `2023/06` | `HyenaDNA` | [HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution](https://arxiv.org/abs/2306.15794)
\-

- `2023/06` | `DrugGPT` | [DrugGPT: A GPT-based Strategy for Designing Potential Ligands Targeting Specific Proteins](https://www.biorxiv.org/content/10.1101/2023.06.29.543848v1)
\-

- `2023/04` | `GeneGPT` | [GeneGPT: Augmenting Large Language Models with Domain Tools for Improved Access to Biomedical Information](https://arxiv.org/abs/2304.09667)
\-

- `2023/04` | Drug discovery companies are customizing ChatGPT: here’s how
\-
[[News](https://www.nature.com/articles/s41587-023-01788-7)]

- `2023/01` | `ProGen` | Large language models generate functional protein sequences across diverse families
\-
[[Paper](https://www.nature.com/articles/s41587-022-01618-2)]

- `2022/06` | `ProGen2` | [ProGen2: Exploring the Boundaries of Protein Language Models](https://arxiv.org/abs/2206.13517)
\-

- `2021/07` | `AlphaFold` | Highly accurate protein structure prediction with AlphaFold
\-
[[Paper](https://www.nature.com/articles/s41586-021-03819-2)]

#### 3.9.7 Long-Chain Reasoning

- `2022/12` | `Fine-tune-CoT` | [Large Language Models Are Reasoning Teachers](https://arxiv.org/abs/2212.10071)
\-

- `2021/09` | `PlaTe` | [PlaTe: Visually-Grounded Planning with Transformers in Procedural Tasks](https://arxiv.org/abs/2109.04869)
\-

---

## 4 Reasoning Techniques

reasoning techniques

[(Back-to-Top)](#table-of-contents)

### Table of Contents - 4

reasoning techniques (table of contents)

- [4 Reasoning Techniques](#4-reasoning-techniques)
- [4.1 Pre-Training](#41-pre-training)
- [4.1.1 Data](#411-data)
- [a. Data - Text](#a-data---text)
- [b. Data - Image](#b-data---image)
- [c. Data - Multimodality](#c-data---multimodality)
- [4.1.2 Network Architecture](#412-network-architecture)
- [a. Encoder-Decoder](#a-encoder-decoder)
- [b. Decoder-Only](#b-decoder-only)
- [c. CLIP Variants](#c-clip-variants)
- [d. Others](#d-others)
- [4.2 Fine-Tuning](#42-fine-tuning)
- [4.2.1 Data](#421-data)
- [4.2.2 Parameter-Efficient Fine-tuning](#422-parameter-efficient-fine-tuning)
- [a. Adapter Tuning](#a-adapter-tuning)
- [b. Low-Rank Adaptation](#b-low-rank-adaptation)
- [c. Prompt Tuning](#c-prompt-tuning)
- [d. Partial Parameter Tuning](#d-partial-parameter-tuning)
- [e. Mixture-of-Modality Adaption](#e-mixture-of-modality-adaption)
- [4.3 Alignment Training](#43-alignment-training)
- [4.3.1 Data](#431-data)
- [a. Data - Human](#a-data---human)
- [b. Data - Synthesis](#b-data---synthesis)
- [4.3.2 Training Pipeline](#432-training-pipeline)
- [a. Online Human Preference Training](#a-online-human-preference-training)
- [b. Offline Human Preference Training](#b-offline-human-preference-training)
- [4.4 Mixture of Experts (MoE)](#44-mixture-of-experts-moe)
- [4.5 In-Context Learning](#45-in-context-learning)
- [4.5.1 Demonstration Example Selection](#451-demonstration-example-selection)
- [a. Prior-Knowledge Approach](#a-prior-knowledge-approach)
- [b. Retrieval Approach](#b-retrieval-approach)
- [4.5.2 Chain-of-Thought](#452-chain-of-thought)
- [a. Zero-Shot CoT](#a-zero-shot-cot)
- [b. Few-Shot CoT](#b-few-shot-cot)
- [c. Multiple Paths Aggregation](#c-multiple-paths-aggregation)
- [4.5.3 Multi-Round Prompting](#453-multi-round-prompting)
- [a. Learned Refiners](#a-learned-refiners)
- [b. Prompted Refiners](#b-prompted-refiners)
- [4.6 Autonomous Agent](#46-autonomous-agent)

### 4.1 Pre-Training

pre-training

[Reasoning Techniques (Back-to-Top)](#4-reasoning-techniques)

- [4.1 Pre-Training](#41-pre-training)
- [4.1.1 Data](#411-data)
- [a. Data - Text](#a-data---text)
- [b. Data - Image](#b-data---image)
- [c. Data - Multimodality](#c-data---multimodality)
- [4.1.2 Network Architecture](#412-network-architecture)
- [a. Encoder-Decoder](#a-encoder-decoder)
- [b. Decoder-Only](#b-decoder-only)
- [c. CLIP Variants](#c-clip-variants)
- [d. Others](#d-others)

#### 4.1.1 Data

##### a. Data - Text

- `2023/07` | `peS2o` | peS2o (Pretraining Efficiently on S2ORC) Dataset
\-
[[Code](https://github.com/allenai/pes2o)]

- `2023/05` | `ROOTS` / `BLOOM` | [The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset](https://arxiv.org/abs/2303.03915)
\-

- `2023/04` | `RedPajama` | RedPajama: an Open Dataset for Training Large Language Models
\-
[[Code](https://github.com/togethercomputer/RedPajama-Data)]

- `2020/12` | `The Pile` | [The Pile: An 800GB Dataset of Diverse Text for Language Modeling](https://arxiv.org/abs/2101.00027)
\-

- `2020/04` | `Reddit` | [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637)
\-

- `2020/04` | `CLUE` | [CLUE: A Chinese Language Understanding Evaluation Benchmark](https://arxiv.org/abs/2004.05986)
\-

- `2019/10` | `C4` | [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683)
\-

- `2013/10` | `Gutenberg` | [Complexity of Word Collocation Networks: A Preliminary Structural Analysis](https://arxiv.org/abs/1310.5111)

##### b. Data - Image

- `2023/06` | `I2E` / `MOFI` | [MOFI: Learning Image Representations from Noisy Entity Annotated Images](https://arxiv.org/abs/2306.07952)
\-

- `2022/01` | `SWAG` | [Revisiting Weakly Supervised Pre-Training of Visual Perception Models](https://arxiv.org/abs/2201.08371)
\-

- `2021/04` | `ImageNet-21K` | [ImageNet-21K Pretraining for the Masses](https://arxiv.org/abs/2104.10972)
\-

- `2017/07` | `JFT` | [Revisiting Unreasonable Effectiveness of Data in Deep Learning Era](https://arxiv.org/abs/1707.02968)
\-

- `2014/09` | `ImageNet` | [ImageNet Large Scale Visual Recognition Challenge](https://arxiv.org/abs/1409.0575)
\-

##### c. Data - Multimodality

- `2023/09` | `Point-Bind` | [Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following](https://arxiv.org/abs/2309.00615)
\-

- `2023/05` | `ImageBind` | [ImageBind: One Embedding Space To Bind Them All](https://arxiv.org/abs/2305.05665)
\-

- `2023/04` | `DataComp` | [DataComp: In search of the next generation of multimodal datasets](https://arxiv.org/abs/2304.14108)
\-

- `2022/10` | `LAION-5B` | [LAION-5B: An open large-scale dataset for training next generation image-text models](https://arxiv.org/abs/2210.08402)
\-

- `2022/08` | `Shutterstock` | [Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP](https://arxiv.org/abs/2208.05516)
\-

- `2022/08` | `COYO-700M` | COYO-700M: Image-Text Pair Dataset
\-
[[Code](https://github.com/kakaobrain/coyo-dataset)]

- `2022/04` | `M3W` | [Flamingo: a Visual Language Model for Few-Shot Learning](https://arxiv.org/abs/2204.14198)
\-

- `2021/11` | `RedCaps` | [RedCaps: web-curated image-text data created by the people, for the people](https://arxiv.org/abs/2111.11431)
\-

- `2021/11` | `LAION-400M` | [LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs](https://arxiv.org/abs/2111.02114)
\-

- `2021/03` | `WIT` | [WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning](https://arxiv.org/abs/2103.01913)
\-

- `2011/12` | `Im2Text` / `SBU` | [Im2Text: Describing Images Using 1 Million Captioned Photographs](https://papers.nips.cc/paper_files/paper/2011/hash/5dd9db5e033da9c6fb5ba83c7a7ebea9-Abstract.html)
\-

#### 4.1.2 Network Architecture

- `2023/04` | [Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder](https://arxiv.org/abs/2304.04052)
\-

##### a. Encoder-Decoder

- `2019/10` | `BART` | [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461)
\-

- `2019/10` | `T5` | [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683)
\-

- `2018/10` | `BERT` | [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)
\-
[[Paper](https://aclanthology.org/N19-1423.pdf)]
[[Code](https://github.com/google-research/bert)]
[[Blog](https://blog.research.google/2018/11/open-sourcing-bert-state-of-art-pre.html)]

- `2017/06` | `Transformer` | [Attention Is All You Need](https://arxiv.org/abs/1706.03762)
\-

##### b. Decoder-Only

- `2023/07` | `Llama 2` | [Llama 2: Open Foundation and Fine-Tuned Chat Models](https://arxiv.org/abs/2307.09288)
\-
[[Paper](https://arxiv.org/pdf/2307.09288.pdf)]
[[Code](https://github.com/facebookresearch/llama)]
[[Blog](https://ai.meta.com/llama/)]

- `2023/02` | `LLaMA` | [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971)
\-
[[Paper](https://arxiv.org/pdf/2302.13971.pdf)]
[[Code](https://github.com/facebookresearch/llama)]
[[Blog](https://ai.meta.com/blog/large-language-model-llama-meta-ai/)]

- `2022/11` | `BLOOM` | [BLOOM: A 176B-Parameter Open-Access Multilingual Language Model](https://arxiv.org/abs/2211.05100)
\-

- `2022/10` | `GLM` | [GLM-130B: An Open Bilingual Pre-trained Model](https://arxiv.org/abs/2210.02414)
\-

- `2022/05` | `OPT` | [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068)
\-

- `2021/12` | `Gopher` | [Scaling Language Models: Methods, Analysis & Insights from Training Gopher](https://arxiv.org/abs/2112.11446)
\-

- `2020/05` | `GPT-3` | [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)
\-
[[Paper](https://papers.nips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf)]
[[Code](https://github.com/openai/gpt-3)]

- `2019/02` | `GPT-2` | Language Models are Unsupervised Multitask Learners
\-
[[Paper](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)]

- `2018/06` | `GPT-1` | Improving Language Understanding by Generative Pre-Training
\-
[[Paper](https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf)]
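
Architecturally, what distinguishes the decoder-only models above is causal self-attention: each position attends only to itself and earlier positions. Below is a minimal single-head sketch under simplifying assumptions (no projections, multi-head batching, or dropout); the shapes are illustrative.

```python
# Minimal causal self-attention (single head, illustrative shapes).
import math
import torch

def causal_self_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # q, k, v: (batch, seq_len, d_head).
    seq_len = q.size(1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    # Mask out future positions (strict upper triangle).
    mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

x = torch.randn(2, 5, 64)
out = causal_self_attention(x, x, x)  # (2, 5, 64)
```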

##### c. CLIP Variants

- `2023/05` | `LaCLIP` | [Improving CLIP Training with Language Rewrites](https://arxiv.org/abs/2305.20088)
\-

- `2023/04` | `DetCLIPv2` | [DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment](https://arxiv.org/abs/2304.04514)
\-

- `2022/12` | `FLIP` | [Scaling Language-Image Pre-training via Masking](https://arxiv.org/abs/2212.00794)
\-

- `2022/09` | `DetCLIP` | [DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection](https://arxiv.org/abs/2209.09407)
\-

- `2022/04` | `K-LITE` | [K-LITE: Learning Transferable Visual Models with External Knowledge](https://arxiv.org/abs/2204.09222)
\-

- `2021/11` | `FILIP` | [FILIP: Fine-grained Interactive Language-Image Pre-Training](https://arxiv.org/abs/2111.07783)
\-

- `2021/02` | `CLIP` | [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020)
\-
[[Paper](https://proceedings.mlr.press/v139/radford21a/radford21a.pdf)]
[[Code](https://github.com/openai/CLIP)]
[[Blog](https://openai.com/research/clip)]
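
The CLIP-family models above are trained with a symmetric contrastive objective over batches of paired image and text embeddings. A minimal sketch of that loss follows; the encoders and projection heads are omitted, and all names and the temperature value are illustrative assumptions.

```python
# Symmetric InfoNCE loss over a batch of paired embeddings (sketch).
import torch
import torch.nn.functional as F

def clip_loss(image_embeds: torch.Tensor, text_embeds: torch.Tensor,
              temperature: float = 0.07) -> torch.Tensor:
    # Normalize, then score every image against every text in the batch.
    image_embeds = F.normalize(image_embeds, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)
    logits = image_embeds @ text_embeds.T / temperature
    targets = torch.arange(logits.size(0))  # matching pairs lie on the diagonal
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2

loss = clip_loss(torch.randn(8, 512), torch.randn(8, 512))
```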

##### d. Others

- `2023/09` | `StreamingLLM` | [Efficient Streaming Language Models with Attention Sinks](https://arxiv.org/abs/2309.17453)
\-

- `2023/07` | `RetNet` | [Retentive Network: A Successor to Transformer for Large Language Models](https://arxiv.org/abs/2307.08621)
\-

- `2023/07` | `LongNet` | [LongNet: Scaling Transformers to 1,000,000,000 Tokens](https://arxiv.org/abs/2307.02486)
\-

- `2023/05` | `RWKV` | [RWKV: Reinventing RNNs for the Transformer Era](https://arxiv.org/abs/2305.13048)
\-

- `2023/02` | `Hyena` | [Hyena Hierarchy: Towards Larger Convolutional Language Models](https://arxiv.org/abs/2302.10866)
\-

- `2022/12` | `H3` | [Hungry Hungry Hippos: Towards Language Modeling with State Space Models](https://arxiv.org/abs/2212.14052)
\-

- `2022/06` | `GSS` | [Long Range Language Modeling via Gated State Spaces](https://arxiv.org/abs/2206.13947)
\-

- `2022/03` | `DSS` | [Diagonal State Spaces are as Effective as Structured State Spaces](https://arxiv.org/abs/2203.14343)
\-

- `2021/10` | `S4` | [Efficiently Modeling Long Sequences with Structured State Spaces](https://arxiv.org/abs/2111.00396)
\-

---

### 4.2 Fine-Tuning

fine-tuning

[Reasoning Techniques (Back-to-Top)](#4-reasoning-techniques)

- [4.2 Fine-Tuning](#42-fine-tuning)
- [4.2.1 Data](#421-data)
- [4.2.2 Parameter-Efficient Fine-tuning](#422-parameter-efficient-fine-tuning)
- [a. Adapter Tuning](#a-adapter-tuning)
- [b. Low-Rank Adaptation](#b-low-rank-adaptation)
- [c. Prompt Tuning](#c-prompt-tuning)
- [d. Partial Parameter Tuning](#d-partial-parameter-tuning)
- [e. Mixture-of-Modality Adaption](#e-mixture-of-modality-adaption)

#### 4.2.1 Data

- `2023/09` | `MetaMath` | [MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models](https://arxiv.org/abs/2309.12284)
\-

- `2023/09` | `MAmmoTH` | [MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning](https://arxiv.org/abs/2309.05653)
\-

- `2023/08` | `WizardMath` | [WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct](https://arxiv.org/abs/2308.09583)
\-

- `2023/08` | `RFT` | [Scaling Relationship on Learning Mathematical Reasoning with Large Language Models](https://arxiv.org/abs/2308.01825)
\-

- `2023/05` | `PRM800K` | [Let's Verify Step by Step](https://arxiv.org/abs/2305.20050)
\-

- `2023/05` | `Distilling Step-by-Step` | [Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes](https://arxiv.org/abs/2305.02301)
\-

- `2023/01` | [Specializing Smaller Language Models towards Multi-Step Reasoning](https://arxiv.org/abs/2301.12726)
\-

- `2022/12` | `Fine-tune-CoT` | [Large Language Models Are Reasoning Teachers](https://arxiv.org/abs/2212.10071)
\-

- `2022/12` | [Teaching Small Language Models to Reason](https://arxiv.org/abs/2212.08410)
\-

- `2022/10` | [Large Language Models Can Self-Improve](https://arxiv.org/abs/2210.11610)
\-

- `2022/10` | [Explanations from Large Language Models Make Small Reasoners Better](https://arxiv.org/abs/2210.06726)
\-

#### 4.2.2 Parameter-Efficient Fine-tuning

##### a. Adapter Tuning

- `2023/03` | `LLaMA-Adapter` | [LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention](https://arxiv.org/abs/2303.16199)
\-

- `2022/05` | `AdaMix` | [AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning](https://arxiv.org/abs/2205.12410)
\-

- `2021/10` | [Towards a Unified View of Parameter-Efficient Transfer Learning](https://arxiv.org/abs/2110.04366)
\-

- `2021/06` | `Compacter` | [Compacter: Efficient Low-Rank Hypercomplex Adapter Layers](https://arxiv.org/abs/2106.04647)
\-

- `2020/04` | `MAD-X` | [MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer](https://arxiv.org/abs/2005.00052)
\-

- `2019/02` | `Adapter` | [Parameter-Efficient Transfer Learning for NLP](https://arxiv.org/abs/1902.00751)
\-
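
The adapter-tuning papers above share a bottleneck design: a small down-projection, nonlinearity, and up-projection inserted after a frozen sub-layer and added residually, so only the adapter parameters are trained. A minimal sketch, with illustrative dimensions and names:

```python
# Minimal bottleneck adapter (illustrative dimensions).
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, d_model: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen model's behavior recoverable.
        return hidden + self.up(self.act(self.down(hidden)))

adapter = Adapter()
out = adapter(torch.randn(2, 16, 768))  # (2, 16, 768)
```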

##### b. Low-Rank Adaptation

- `2023/09` | `LongLoRA` / `LongAlpaca-12k`
| `Chen et al., ICLR 2024`
![citations](https://img.shields.io/badge/dynamic/json?url=https://api.semanticscholar.org/graph/v1/paper/CorpusID:262084134?fields=citationCount&query=%24.citationCount&label=citations)
![Star](https://img.shields.io/github/stars/dvlab-research/LongLoRA.svg?style=social&label=Star)

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models

[[arXiv](https://arxiv.org/abs/2309.12307)]
[[paper](https://openreview.net/forum?id=6PmJoRfdaK)]
[[code](https://github.com/dvlab-research/LongLoRA)]

- `2023/05` | `QLoRA` | [QLoRA: Efficient Finetuning of Quantized LLMs](https://arxiv.org/abs/2305.14314)
\-

- `2023/03` | `AdaLoRA` | [Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning](https://arxiv.org/abs/2303.10512)
\-

- `2022/12` | `KronA` | [KronA: Parameter Efficient Tuning with Kronecker Adapter](https://arxiv.org/abs/2212.10650)
\-

- `2022/10` | `DyLoRA` | [DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation](https://arxiv.org/abs/2210.07558)
\-

- `2021/06` | `LoRA` | [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685)
\-
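
The methods above build on LoRA's low-rank update, in which a frozen weight is augmented with the product of two small trainable matrices scaled by `alpha / r`. The module below is a minimal sketch of that idea, not a reference implementation; the initialization details and class name are illustrative assumptions.

```python
# Minimal LoRA-style linear layer: W x + (alpha / r) * B A x (sketch).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: int = 16):
        super().__init__()
        # Frozen base weight (stands in for a weight loaded from a checkpoint).
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad = False
        # Trainable low-rank factors: A projects down to rank r, B projects back up.
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base path is frozen; only the rank-r update is learned.
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

# Usage: wrap attention projections and train only lora_A / lora_B.
layer = LoRALinear(768, 768, r=8, alpha=16)
out = layer(torch.randn(2, 10, 768))  # (2, 10, 768)
```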

##### c. Prompt Tuning

- `2021/10` | `P-Tuning v2` | [P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks](https://arxiv.org/abs/2110.07602)
\-

- `2021/04` | `Prompt Tuning` | [The Power of Scale for Parameter-Efficient Prompt Tuning](https://arxiv.org/abs/2104.08691)
\-

- `2021/04` | `OptiPrompt` | [Factual Probing Is [MASK]: Learning vs. Learning to Recall](https://arxiv.org/abs/2104.05240)
\-

- `2021/03` | `P-Tuning` | [GPT Understands, Too](https://arxiv.org/abs/2103.10385)
\-

- `2021/01` | `Prefix-Tuning` | [Prefix-Tuning: Optimizing Continuous Prompts for Generation](https://arxiv.org/abs/2101.00190)
\-
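
A common thread in the methods above is a trainable "soft prompt": a handful of continuous vectors learned while the backbone stays frozen. The sketch below follows the prompt-tuning variant, where the vectors are simply prepended to the input embeddings (Prefix-Tuning additionally learns per-layer key/value prefixes, which is omitted here); names, shapes, and the initialization are illustrative assumptions.

```python
# Minimal soft-prompt module in the prompt-tuning style (sketch).
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    def __init__(self, prompt_length: int = 20, d_model: int = 768):
        super().__init__()
        # Only these vectors are updated during fine-tuning.
        self.prompt = nn.Parameter(torch.randn(prompt_length, d_model) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, d_model) from the frozen embedding table.
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)

# The concatenated sequence feeds the frozen transformer; gradients reach only self.prompt.
extended = SoftPrompt()(torch.randn(4, 32, 768))  # (4, 52, 768)
```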

##### d. Partial Parameter Tuning

- `2023/04` | `DiffFit` | [DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning](https://arxiv.org/abs/2304.06648)
\-

- `2022/10` | `SSF` | [Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning](https://arxiv.org/abs/2210.08823)
\-

- `2021/09` | `Child-Tuning` | [Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning](https://arxiv.org/abs/2109.05687)
\-

- `2021/06` | `BitFit` | [BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models](https://arxiv.org/abs/2106.10199)
\-

##### e. Mixture-of-Modality Adaption

- `2023/10` | `LLaVA-1.5` | [Improved Baselines with Visual Instruction Tuning](https://arxiv.org/abs/2310.03744)
\-

- `2023/05` | `MMA` / `LaVIN` | [Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models](https://arxiv.org/abs/2305.15023)
\-

- `2023/04` | `LLaMA-Adapter V2` | [LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model](https://arxiv.org/abs/2304.15010)
\-

- `2023/04` | `LLaVA` | [Visual Instruction Tuning](https://arxiv.org/abs/2304.08485)
\-

- `2023/02` | `RepAdapter` | [Towards Efficient Visual Adaption via Structural Re-parameterization](https://arxiv.org/abs/2302.08106)
\-

---

### 4.3 Alignment Training

alignment training

[Reasoning Techniques (Back-to-Top)](#4-reasoning-techniques)

- [4.3 Alignment Training](#43-alignment-training)
- [4.3.1 Data](#431-data)
- [a. Data - Human](#a-data---human)
- [b. Data - Synthesis](#b-data---synthesis)
- [4.3.2 Training Pipeline](#432-training-pipeline)
- [a. Online Human Preference Training](#a-online-human-preference-training)
- [b. Offline Human Preference Training](#b-offline-human-preference-training)

#### 4.3.1 Data

##### a. Data - Human

- `2023/06` | `Dolly` | Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM
\-
[[Code](https://github.com/databrickslabs/dolly)]

- `2023/04` | `LongForm` | [LongForm: Optimizing Instruction Tuning for Long Text Generation with Corpus Extraction](https://arxiv.org/abs/2304.08460)
\-

- `2023/04` | `COIG` | [Chinese Open Instruction Generalist: A Preliminary Release](https://arxiv.org/abs/2304.07987)
\-

- `2023/04` | `OpenAssistant Conversations` | [OpenAssistant Conversations -- Democratizing Large Language Model Alignment](https://arxiv.org/abs/2304.07327)
\-

- `2023/01` | `Flan 2022` | [The Flan Collection: Designing Data and Methods for Effective Instruction Tuning](https://arxiv.org/abs/2301.13688)
\-

- `2022/11` | `xP3` | [Crosslingual Generalization through Multitask Finetuning](https://arxiv.org/abs/2211.01786)
\-

- `2022/04` | `Super-NaturalInstructions` | [Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks](https://arxiv.org/abs/2204.07705)
\-

- `2021/11` | `ExT5` | [ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning](https://arxiv.org/abs/2111.10952)
\-

- `2021/10` | `MetaICL` | [MetaICL: Learning to Learn In Context](https://arxiv.org/abs/2110.15943)
\-

- `2021/10` | `P3` | [Multitask Prompted Training Enables Zero-Shot Task Generalization](https://arxiv.org/abs/2110.08207)
\-

- `2021/04` | `CrossFit` | [CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP](https://arxiv.org/abs/2104.08835)
\-

- `2021/04` | `NATURAL INSTRUCTIONS` | [Cross-Task Generalization via Natural Language Crowdsourcing Instructions](https://arxiv.org/abs/2104.08773)
\-

- `2020/05` | `UnifiedQA` | [UnifiedQA: Crossing Format Boundaries With a Single QA System](https://arxiv.org/abs/2005.00700)
\-

##### b. Data - Synthesis

- `2023/08` | `Instruction Backtranslation` | [Self-Alignment with Instruction Backtranslation](https://arxiv.org/abs/2308.06259)
\-

- `2023/05` | `Dynosaur` | [Dynosaur: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation](https://arxiv.org/abs/2305.14327)
\-

- `2023/05` | `UltraChat` | [Enhancing Chat Language Models by Scaling High-quality Instructional Conversations](https://arxiv.org/abs/2305.14233)
\-

- `2023/05` | `CoT Collection` | [The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning](https://arxiv.org/abs/2305.14045)
\-

- `2023/05` | `CoEdIT` | [CoEdIT: Text Editing by Task-Specific Instruction Tuning](https://arxiv.org/abs/2305.09857)
\-

- `2023/04` | `LaMini-LM` | [LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions](https://arxiv.org/abs/2304.14402)
\-

- `2023/04` | `GPT-4-LLM` | [Instruction Tuning with GPT-4](https://arxiv.org/abs/2304.03277)
\-

- `2023/04` | `Koala` | Koala: A Dialogue Model for Academic Research
\-
[[Blog](https://bair.berkeley.edu/blog/2023/04/03/koala/)]

- `2023/03` | `Alpaca` | Alpaca: A Strong, Replicable Instruction-Following Model
\-
[[Blog](https://crfm.stanford.edu/2023/03/13/alpaca.html)]

- `2023/03` | `GPT4All` | GPT4All: Training an Assistant-style Chatbot with Large Scale Data Distillation from GPT-3.5-Turbo
\-
[[Code](https://github.com/nomic-ai/gpt4all)]

- `2022/12` | `OPT-IML` / `OPT-IML Bench` | [OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization](https://arxiv.org/abs/2212.12017)
\-

- `2022/12` | `Self-Instruct` | [Self-Instruct: Aligning Language Models with Self-Generated Instructions](https://arxiv.org/abs/2212.10560)
\-

- `2022/12` | `Unnatural Instructions` | [Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor](https://arxiv.org/abs/2212.09689)
\-
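
Many of the synthetic-data pipelines above follow a Self-Instruct-style loop: prompt an LLM with a few instructions sampled from a seed pool, keep sufficiently novel generations, and grow the pool. A toy sketch of that loop; the `generate` function is a canned placeholder for a real LLM call, and the similarity filter is deliberately crude.

```python
# Hedged sketch of a Self-Instruct-style data synthesis loop.  `generate`
# is a hypothetical stand-in for an LLM call; real pipelines also generate
# inputs/outputs per instruction and apply stronger filters.
import difflib
import random

def generate(prompt: str) -> str:
    """Placeholder LLM: returns a canned new instruction."""
    return "Summarize the following paragraph in one sentence."

seed_pool = [
    "Translate the sentence into French.",
    "List three pros and cons of remote work.",
]

def too_similar(candidate: str, pool: list, threshold: float = 0.7) -> bool:
    return any(
        difflib.SequenceMatcher(None, candidate, ex).ratio() > threshold for ex in pool
    )

for _ in range(10):                               # grow the pool a few steps
    demos = "\n".join(random.sample(seed_pool, k=min(2, len(seed_pool))))
    candidate = generate(f"Here are some task instructions:\n{demos}\nWrite a new one:")
    if not too_similar(candidate, seed_pool):     # simple de-duplication filter
        seed_pool.append(candidate)

print(len(seed_pool), "instructions in the pool")
```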

#### 4.3.2 Training Pipeline

##### a. Online Human Preference Training

- `2023/06` | `APA` | [Fine-Tuning Language Models with Advantage-Induced Policy Alignment](https://arxiv.org/abs/2306.02231)
\-

- `2023/04` | `RAFT` | [RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment](https://arxiv.org/abs/2304.06767)
\-

- `2022/03` | `InstructGPT` / `RLHF` | [Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155)
\-
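
As a concrete (and heavily simplified) example of the online recipes above, here is a reward-ranked loop in the spirit of RAFT: sample several responses per prompt, keep the one the reward model scores highest, and run a supervised update on it. All three helpers below are hypothetical placeholders.

```python
# Hedged sketch of reward-ranked fine-tuning (RAFT-style online alignment):
# sample K responses per prompt, keep the highest-reward one, and run a
# supervised fine-tuning step on it.  All three helpers are placeholders.
import random

def sample_response(prompt: str) -> str:
    return random.choice(["A short answer.", "A longer, more helpful answer."])

def reward_model(prompt: str, response: str) -> float:
    return float(len(response))          # toy proxy reward: longer = "better"

def sft_step(prompt: str, response: str) -> None:
    print(f"fine-tune on: {prompt!r} -> {response!r}")

prompts = ["Explain photosynthesis.", "Write a haiku about the sea."]
K = 4
for prompt in prompts:
    candidates = [sample_response(prompt) for _ in range(K)]
    best = max(candidates, key=lambda r: reward_model(prompt, r))
    sft_step(prompt, best)               # update only on the top-ranked sample
```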

##### b. Offline Human Preference Training

- `2023/06` | `PRO` | [Preference Ranking Optimization for Human Alignment](https://arxiv.org/abs/2306.17492)
\-

- `2023/05` | `DPO` | [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://arxiv.org/abs/2305.18290)
\-

- `2023/04` | `RRHF` | [RRHF: Rank Responses to Align Language Models with Human Feedback without tears](https://arxiv.org/abs/2304.05302)
\-

- `2022/09` | `SLiC` | [Calibrating Sequence likelihood Improves Conditional Language Generation](https://arxiv.org/abs/2210.00045)
\-
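
The offline methods above replace the RL loop with a loss over preference pairs. As one concrete instance, here is a minimal PyTorch version of the DPO objective; in practice the log-probabilities come from the policy and a frozen reference model, and are dummy tensors here.

```python
# Minimal sketch of the DPO objective over a batch of preference pairs.
# logp_* are summed token log-probabilities of the chosen/rejected responses
# under the policy and a frozen reference model (dummy tensors below).
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    # Margin between the policy's and the reference's preference for "chosen".
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -F.logsigmoid(beta * margin).mean()

batch = 8
loss = dpo_loss(
    torch.randn(batch), torch.randn(batch),   # policy log-probs
    torch.randn(batch), torch.randn(batch),   # reference log-probs
)
print(loss.item())
```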

---

### 4.4 Mixture of Experts (MoE)

[Reasoning Techniques (Back-to-Top)](#4-reasoning-techniques)

- `2024/01` | `MoE-LLaVA`
| `Lin et al.`
![citations](https://img.shields.io/badge/dynamic/json?url=https://api.semanticscholar.org/graph/v1/paper/CorpusID:267311517?fields=citationCount&query=%24.citationCount&label=citations)
![Star](https://img.shields.io/github/stars/PKU-YuanGroup/MoE-LLaVA.svg?style=social&label=Star)

MoE-LLaVA: Mixture of Experts for Large Vision-Language Models

[[arXiv](https://arxiv.org/abs/2401.15947)]
[[paper](https://arxiv.org/pdf/2401.15947.pdf)]
[[code](https://github.com/PKU-YuanGroup/MoE-LLaVA)]

- `2023/06` | [An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training](https://arxiv.org/abs/2306.17165)
\-

- `2023/03` | `MixedAE` | [Mixed Autoencoder for Self-supervised Visual Representation Learning](https://arxiv.org/abs/2303.17152)
\-

- `2022/12` | `Mod-Squad` | [Mod-Squad: Designing Mixture of Experts As Modular Multi-Task Learners](https://arxiv.org/abs/2212.08066)
\-

- `2022/04` | `MoEBERT` | [MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation](https://arxiv.org/abs/2204.07675)
\-

- `2021/12` | `GLaM` | [GLaM: Efficient Scaling of Language Models with Mixture-of-Experts](https://arxiv.org/abs/2112.06905)
\-

- `2021/07` | `WideNet` | [Go Wider Instead of Deeper](https://arxiv.org/abs/2107.11817)
\-

- `2021/01` | `Switch Transformers` | [Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity](https://arxiv.org/abs/2101.03961)
\-

- `2020/06` | `GShard` | [GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding](https://arxiv.org/abs/2006.16668)
\-

- `2017/01` | `Sparsely-Gated Mixture-of-Experts` | [Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer](https://arxiv.org/abs/1701.06538)
\-

- `1991/03` | Adaptive Mixtures of Local Experts
\-
[[Paper](https://ieeexplore.ieee.org/document/6797059)]
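
The sparsely-gated layer most of these papers build on can be sketched in a few lines: a router scores the experts per token, keeps the top-k, and mixes their outputs by the gate weights. The version below omits load balancing, capacity limits, and any parallelism.

```python
# Hedged sketch of a sparsely-gated MoE layer: a softmax router selects the
# top-k experts per token and combines their outputs with the gate weights.
import torch
from torch import nn

class SparseMoE(nn.Module):
    def __init__(self, d_model=64, d_hidden=128, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                         # x: (tokens, d_model)
        gates = self.router(x).softmax(dim=-1)    # (tokens, num_experts)
        topv, topi = gates.topk(self.k, dim=-1)   # keep only k experts per token
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topi[:, slot] == e         # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += topv[mask, slot, None] * expert(x[mask])
        return out

tokens = torch.randn(16, 64)
print(SparseMoE()(tokens).shape)                  # torch.Size([16, 64])
```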

---

### 4.5 In-Context Learning

[Reasoning Techniques (Back-to-Top)](#4-reasoning-techniques)

- [4.5 In-Context Learning](#45-in-context-learning)
- [4.5.1 Demonstration Example Selection](#451-demonstration-example-selection)
- [a. Prior-Knowledge Approach](#a-prior-knowledge-approach)
- [b. Retrieval Approach](#b-retrieval-approach)
- [4.5.2 Chain-of-Thought](#452-chain-of-thought)
- [a. Zero-Shot CoT](#a-zero-shot-cot)
- [b. Few-Shot CoT](#b-few-shot-cot)
- [c. Multiple Paths Aggregation](#c-multiple-paths-aggregation)
- [4.5.3 Multi-Round Prompting](#453-multi-round-prompting)
- [a. Learned Refiners](#a-learned-refiners)
- [b. Prompted Refiners](#b-prompted-refiners)


- `2022/10` | `FLAN-T5` | [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416)
\-

- `2020/05` | `GPT-3` | [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)
\-
[[Paper](https://papers.nips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf)]
[[Code](https://github.com/openai/gpt-3)]

#### 4.5.1 Demonstration Example Selection

##### a. Prior-Knowledge Approach

- `2022/12` | [Diverse Demonstrations Improve In-context Compositional Generalization](https://arxiv.org/abs/2212.06800)
\-

- `2022/11` | [Complementary Explanations for Effective In-Context Learning](https://arxiv.org/abs/2211.13892)
\-

- `2022/10` | `Auto-CoT` | [Automatic Chain of Thought Prompting in Large Language Models](https://arxiv.org/abs/2210.03493)
\-

- `2022/10` | `Complex CoT` | [Complexity-Based Prompting for Multi-Step Reasoning](https://arxiv.org/abs/2210.00720)
\-

- `2022/10` | `EmpGPT-3` | [Does GPT-3 Generate Empathetic Dialogues? A Novel In-Context Example Selection Method and Automatic Evaluation Metric for Empathetic Dialogue Generation](https://aclanthology.org/2022.coling-1.56/)
\-
[[Code](https://github.com/passing2961/EmpGPT-3)]

- `2022/09` | [Selective Annotation Makes Language Models Better Few-Shot Learners](https://arxiv.org/abs/2209.01975)
\-

- `2021/01` | [What Makes Good In-Context Examples for GPT-3?](https://arxiv.org/abs/2101.06804)
\-

##### b. Retrieval Approach

- `2023/10` | `DQ-LoRe` | [DQ-LoRe: Dual Queries with Low Rank Approximation Re-ranking for In-Context Learning](https://arxiv.org/abs/2310.02954)
\-

- `2023/07` | `LLM-R` | [Learning to Retrieve In-Context Examples for Large Language Models](https://arxiv.org/abs/2307.07164)
\-

- `2023/05` | `Dr.ICL` | [Dr.ICL: Demonstration-Retrieved In-context Learning](https://arxiv.org/abs/2305.14128)
\-

- `2023/02` | `LENS` | [Finding Support Examples for In-Context Learning](https://arxiv.org/abs/2302.13539)
\-

- `2023/02` | `CEIL` | [Compositional Exemplars for In-context Learning](https://arxiv.org/abs/2302.05698)
\-

- `2021/12` | [Learning To Retrieve Prompts for In-Context Learning](https://arxiv.org/abs/2112.08633)
\-
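
The retrieval-based selectors above share one skeleton: embed the candidate demonstrations, embed the query, and take the nearest neighbours as in-context examples. A toy sketch with a hashing bag-of-words embedding standing in for a trained retriever such as those proposed in the papers above:

```python
# Hedged sketch of retrieval-based demonstration selection: embed a pool of
# labelled examples, embed the query, and take the top-k most similar as
# in-context demonstrations.  The embedding here is a toy stand-in.
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    vec = np.zeros(dim)
    for tok in text.lower().split():
        vec[hash(tok) % dim] += 1.0               # toy bag-of-words hashing
    return vec / (np.linalg.norm(vec) + 1e-8)

pool = [
    ("2 + 2 = ?", "4"),
    ("Capital of France?", "Paris"),
    ("3 * 5 = ?", "15"),
]
query = "What is 7 + 8?"

scores = [float(embed(q) @ embed(query)) for q, _ in pool]
topk = sorted(range(len(pool)), key=lambda i: -scores[i])[:2]

prompt = "".join(f"Q: {pool[i][0]}\nA: {pool[i][1]}\n\n" for i in topk) + f"Q: {query}\nA:"
print(prompt)
```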

#### 4.5.2 Chain-of-Thought

##### a. Zero-Shot CoT

- `2023/09` | `LoT` | [Enhancing Zero-Shot Chain-of-Thought Reasoning in Large Language Models through Logic](https://arxiv.org/abs/2309.13339)
\-
[[Code](https://github.com/xf-zhao/LoT)]

- `2023/05` | `Plan-and-Solve` | [Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models](https://arxiv.org/abs/2305.04091)
\-

- `2022/05` | `Zero-shot-CoT` | [Large Language Models are Zero-Shot Reasoners](https://arxiv.org/abs/2205.11916)
\-
[[Paper](https://openreview.net/pdf?id=e2TBb5y0yFf)]
[[Code](https://github.com/kojima-takeshi188/zero_shot_cot)]
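
The zero-shot recipe boils down to two prompts: first elicit a reasoning chain with a trigger such as "Let's think step by step", then extract the final answer from that chain. A minimal sketch; `llm` is a placeholder for any completion or chat call.

```python
# Hedged sketch of the two-stage Zero-shot-CoT recipe.  `llm` is a
# placeholder for an actual model call (API or local).
def llm(prompt: str) -> str:
    return "There are 3 boxes with 4 apples each, so 3 * 4 = 12. The answer is 12."

def zero_shot_cot(question: str) -> str:
    # Stage 1: elicit the reasoning chain.
    rationale = llm(f"Q: {question}\nA: Let's think step by step.")
    # Stage 2: extract the final answer from the chain.
    answer = llm(
        f"Q: {question}\nA: Let's think step by step. {rationale}\n"
        "Therefore, the answer (arabic numerals) is"
    )
    return answer

print(zero_shot_cot("How many apples are in 3 boxes of 4 apples?"))
```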

##### b. Few-Shot CoT

- `2023/07` | `SoT` | [Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding](https://arxiv.org/abs/2307.15337)
\-

- `2023/05` | `Code Prompting` | [Code Prompting: a Neural Symbolic Method for Complex Reasoning in Large Language Models](https://arxiv.org/abs/2305.18507)
\-

- `2023/05` | `GoT` | [Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Large Language Models](https://arxiv.org/abs/2305.16582)
\-

- `2023/05` | `ToT` | [Tree of Thoughts: Deliberate Problem Solving with Large Language Models](https://arxiv.org/abs/2305.10601)
\-

- `2023/03` | `MathPrompter` | [MathPrompter: Mathematical Reasoning using Large Language Models](https://arxiv.org/abs/2303.05398)
\-

- `2022/11` | `PoT` | [Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks](https://arxiv.org/abs/2211.12588)
\-

- `2022/11` | `PAL` | [PAL: Program-aided Language Models](https://arxiv.org/abs/2211.10435)
\-

- `2022/10` | `Auto-CoT` | [Automatic Chain of Thought Prompting in Large Language Models](https://arxiv.org/abs/2210.03493)
\-

- `2022/10` | `Complex CoT` | [Complexity-Based Prompting for Multi-Step Reasoning](https://arxiv.org/abs/2210.00720)
\-

- `2022/05` | `Least-to-Most Prompting` | [Least-to-Most Prompting Enables Complex Reasoning in Large Language Models](https://arxiv.org/abs/2205.10625)
\-

- `2022/01` | [Chain-of-Thought Prompting Elicits Reasoning in Large Language Models](https://arxiv.org/abs/2201.11903)
\-
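
Few-shot CoT differs only in the prompt: each demonstration pairs a question with a worked rationale, and the test question is appended last. A minimal sketch with a single hand-written demonstration; `llm` is again a placeholder.

```python
# Hedged sketch of few-shot CoT prompt assembly: each demonstration carries
# a worked rationale, and the test question comes last.  `llm` is a
# placeholder for a real model call.
def llm(prompt: str) -> str:
    return "Roger started with 5 balls and bought 2 * 3 = 6 more, so 5 + 6 = 11. The answer is 11."

DEMOS = [
    (
        "There are 15 trees. Workers plant trees until there are 21. How many were planted?",
        "There were 15 trees and now there are 21, so 21 - 15 = 6 trees were planted. The answer is 6.",
    ),
]

def few_shot_cot(question: str) -> str:
    prompt = "".join(f"Q: {q}\nA: {r}\n\n" for q, r in DEMOS) + f"Q: {question}\nA:"
    return llm(prompt)

print(few_shot_cot("Roger has 5 balls and buys 2 cans of 3 balls. How many balls now?"))
```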

##### c. Multiple Paths Aggregation

- `2023/05` | `RAP` | [Reasoning with Language Model is Planning with World Model](https://arxiv.org/abs/2305.14992)
\-

- `2023/05` | [Automatic Model Selection with Large Language Models for Reasoning](https://arxiv.org/abs/2305.14333)
\-

- `2023/05` | `AdaptiveConsistency` | [Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning and Coding with LLMs](https://arxiv.org/abs/2305.11860)
\-

- `2023/05` | `ToT` | [Tree of Thoughts: Deliberate Problem Solving with Large Language Models](https://arxiv.org/abs/2305.10601)
\-

- `2023/05` | `ToT` | [Large Language Model Guided Tree-of-Thought](https://arxiv.org/abs/2305.08291)
\-

- `2023/05` | [Self-Evaluation Guided Beam Search for Reasoning](https://arxiv.org/abs/2305.00633)
\-

- `2022/10` | `Complex CoT` | [Complexity-Based Prompting for Multi-Step Reasoning](https://arxiv.org/abs/2210.00720)
\-

- `2022/06` | `DIVERSE` | [Making Large Language Models Better Reasoners with Step-Aware Verifier](https://arxiv.org/abs/2206.02336)
\-

- `2022/03` | [Self-Consistency Improves Chain of Thought Reasoning in Language Models](https://arxiv.org/abs/2203.11171)
\-
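
The aggregation methods above all sample multiple reasoning paths and then combine them. The simplest instance, self-consistency, is a majority vote over the parsed final answers, as in the sketch below; `sample_chain` is a placeholder for a temperature-sampled LLM call.

```python
# Hedged sketch of self-consistency decoding: sample several reasoning
# chains, parse the final answer from each, and take a majority vote.
import random
import re
from collections import Counter

def sample_chain(question: str) -> str:
    return random.choice([
        "3 * 4 = 12, so the answer is 12.",
        "Three boxes of four apples is 12. The answer is 12.",
        "3 + 4 = 7, so the answer is 7.",          # an occasional wrong path
    ])

def self_consistency(question: str, n_paths: int = 10) -> str:
    answers = []
    for _ in range(n_paths):
        chain = sample_chain(question)
        match = re.search(r"answer is (-?\d+)", chain)
        if match:
            answers.append(match.group(1))
    return Counter(answers).most_common(1)[0][0]    # majority vote

print(self_consistency("How many apples are in 3 boxes of 4 apples?"))
```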

#### 4.5.3 Multi-Round Prompting

##### a. Learned Refiners

- `2023/02` | `LLM-Augmenter` | [Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback](https://arxiv.org/abs/2302.12813)
\-

- `2022/10` | `Self-Correction` | [Generating Sequences by Learning to Self-Correct](https://arxiv.org/abs/2211.00053)
\-

- `2022/08` | `PEER` | [PEER: A Collaborative Language Model](https://arxiv.org/abs/2208.11663)
\-

- `2022/04` | `R3` | [Read, Revise, Repeat: A System Demonstration for Human-in-the-loop Iterative Text Revision](https://arxiv.org/abs/2204.03685)
\-

- `2021/10` | `CURIOUS` | [Think about it! Improving defeasible reasoning by first modeling the question scenario](https://arxiv.org/abs/2110.12349)
\-

- `2020/05` | `DrRepair` | [Graph-based, Self-Supervised Program Repair from Diagnostic Feedback](https://arxiv.org/abs/2005.10636)
\-

##### b. Prompted Refiners

- `2023/06` | `InterCode` | [InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback](https://arxiv.org/abs/2306.14898)
\-

- `2023/06` | [Is Self-Repair a Silver Bullet for Code Generation?](https://arxiv.org/abs/2306.09896)
\-

- `2023/05` | [Improving Factuality and Reasoning in Language Models through Multiagent Debate](https://arxiv.org/abs/2305.14325)
\-

- `2023/05` | `CRITIC` | [CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing](https://arxiv.org/abs/2305.11738)
\-

- `2023/05` | `GPT-Bargaining` | [Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback](https://arxiv.org/abs/2305.10142)
\-

- `2023/05` | `Self-Edit` | [Self-Edit: Fault-Aware Code Editor for Code Generation](https://arxiv.org/abs/2305.04087)
\-

- `2023/04` | `PHP` | [Progressive-Hint Prompting Improves Reasoning in Large Language Models](https://arxiv.org/abs/2304.09797)
\-

- `2023/04` | `Self-collaboration` | [Self-collaboration Code Generation via ChatGPT](https://arxiv.org/abs/2304.07590)
\-

- `2023/04` | `Self-Debugging` | [Teaching Large Language Models to Self-Debug](https://arxiv.org/abs/2304.05128)
\-

- `2023/04` | `REFINER` | [REFINER: Reasoning Feedback on Intermediate Representation](https://arxiv.org/abs/2304.01904)
\-

- `2023/03` | `Self-Refine` | [Self-Refine: Iterative Refinement with Self-Feedback](https://arxiv.org/abs/2303.17651)
\-
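
These prompted refiners share a generate/critique/revise loop driven entirely by prompts. A minimal sketch of the pattern; `llm` is a placeholder and the stopping rule is deliberately naive, since each paper above defines its own.

```python
# Hedged sketch of a Self-Refine-style loop: draft, critique, revise, and
# stop when the critique reports no further issues.  `llm` is a placeholder
# for an actual model call.
def llm(prompt: str) -> str:
    if "Give feedback" in prompt:
        return "No issues found."
    return "def add(a, b):\n    return a + b"

def self_refine(task: str, max_rounds: int = 3) -> str:
    draft = llm(f"Task: {task}\nWrite a solution.")
    for _ in range(max_rounds):
        feedback = llm(f"Task: {task}\nSolution:\n{draft}\nGive feedback on this solution.")
        if "no issues" in feedback.lower():        # model-declared stopping condition
            break
        draft = llm(f"Task: {task}\nSolution:\n{draft}\nFeedback: {feedback}\nRevise the solution.")
    return draft

print(self_refine("Write a Python function that adds two numbers."))
```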

---

### 4.6 Autonomous Agent

[Reasoning Techniques (Back-to-Top)](#4-reasoning-techniques)

- `2023/10` | `planning tokens` | [Guiding Language Model Reasoning with Planning Tokens](https://arxiv.org/abs/2310.05707)
\-

- `2023/09` | `AutoAgents` | [AutoAgents: A Framework for Automatic Agent Generation](https://arxiv.org/abs/2309.17288)
\-

- `2023/06` | `AssistGPT` | [AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn](https://arxiv.org/abs/2306.08640)
\-

- `2023/05` | `SwiftSage` | [SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks](https://arxiv.org/abs/2305.17390)
\-

- `2023/05` | `MultiTool-CoT` | [MultiTool-CoT: GPT-3 Can Use Multiple External Tools with Chain of Thought Prompting](https://arxiv.org/abs/2305.16896)
\-

- `2023/05` | `Voyager` | [Voyager: An Open-Ended Embodied Agent with Large Language Models](https://arxiv.org/abs/2305.16291)
\-

- `2023/05` | `ChatCoT` | [ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models](https://arxiv.org/abs/2305.14323)
\-

- `2023/05` | `CREATOR` | [CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models](https://arxiv.org/abs/2305.14318)
\-

- `2023/05` | `TRICE` | [Making Language Models Better Tool Learners with Execution Feedback](https://arxiv.org/abs/2305.13068)
\-

- `2023/05` | `ToolkenGPT` | [ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings](https://arxiv.org/abs/2305.11554)
\-

- `2023/04` | `Chameleon` | [Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models](https://arxiv.org/abs/2304.09842)
\-

- `2023/04` | `OpenAGI` | [OpenAGI: When LLM Meets Domain Experts](https://arxiv.org/abs/2304.04370)
\-

- `2023/03` | `CAMEL` | [CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society](https://arxiv.org/abs/2303.17760)
\-

- `2023/03` | `HuggingGPT` | [HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face](https://arxiv.org/abs/2303.17580)
\-

- `2023/03` | `Reflexion` | [Reflexion: Language Agents with Verbal Reinforcement Learning](https://arxiv.org/abs/2303.11366)
\-

- `2023/03` | `ART` | [ART: Automatic multi-step reasoning and tool-use for large language models](https://arxiv.org/abs/2303.09014)
\-

- `2023/03` | `Auto-GPT` | Auto-GPT: An Autonomous GPT-4 Experiment
\-
[[Code](https://github.com/Significant-Gravitas/AutoGPT)]

- `2023/02` | `Toolformer` | [Toolformer: Language Models Can Teach Themselves to Use Tools](https://arxiv.org/abs/2302.04761)
\-

- `2022/11` | `VISPROG` | [Visual Programming: Compositional visual reasoning without training](https://arxiv.org/abs/2211.11559)
\-

- `2022/10` | `ReAct` | [ReAct: Synergizing Reasoning and Acting in Language Models](https://arxiv.org/abs/2210.03629)
\-
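
Most of the agents above inherit the ReAct-style loop: the model interleaves free-form thoughts with tool calls and stops once it emits a final answer. A toy sketch of that loop in which the model is replaced by a scripted placeholder and the only tool is a calculator.

```python
# Hedged sketch of a ReAct-style agent loop: the model interleaves thoughts
# with tool calls and stops when it emits "Finish[...]".  The `llm` and the
# tool registry are toy placeholders.
import re

def calculator(expr: str) -> str:
    return str(eval(expr, {"__builtins__": {}}))   # toy tool; never use eval like this in production

TOOLS = {"Calculator": calculator}

SCRIPTED = [
    "Thought: I need to compute the product.\nAction: Calculator[12 * 7]",
    "Thought: I have the result.\nAction: Finish[84]",
]

def llm(prompt: str, _step=[0]) -> str:            # placeholder that replays a fixed script
    out = SCRIPTED[min(_step[0], len(SCRIPTED) - 1)]
    _step[0] += 1
    return out

def react(question: str, max_steps: int = 5) -> str:
    trace = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(trace)
        trace += step + "\n"
        action = re.search(r"Action: (\w+)\[(.*)\]", step)
        if action and action.group(1) == "Finish":
            return action.group(2)                 # final answer
        if action and action.group(1) in TOOLS:
            observation = TOOLS[action.group(1)](action.group(2))
            trace += f"Observation: {observation}\n"
    return "no answer"

print(react("What is 12 * 7?"))
```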

---