{"id":13536856,"url":"https://github.com/FreedomIntelligence/ReasoningNLP","last_synced_at":"2025-04-02T03:31:15.792Z","repository":{"id":58899800,"uuid":"534238820","full_name":"FreedomIntelligence/ReasoningNLP","owner":"FreedomIntelligence","description":"paper list on reasoning in NLP","archived":false,"fork":false,"pushed_at":"2023-11-13T12:00:00.000Z","size":156,"stargazers_count":184,"open_issues_count":2,"forks_count":15,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-03-11T22:19:51.702Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/FreedomIntelligence.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-09-08T14:01:50.000Z","updated_at":"2025-02-10T21:18:11.000Z","dependencies_parsed_at":"2024-11-03T01:32:07.113Z","dependency_job_id":"bcc9d65f-93a2-4016-af00-a6d7cebaf374","html_url":"https://github.com/FreedomIntelligence/ReasoningNLP","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FreedomIntelligence%2FReasoningNLP","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FreedomIntelligence%2FReasoningNLP/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FreedomIntelligence%2FReasoningNLP/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FreedomIntelligence%2FReasoningNLP/manifests","owner_url":"https://repos.ecosyste.m
s/api/v1/hosts/GitHub/owners/FreedomIntelligence","download_url":"https://codeload.github.com/FreedomIntelligence/ReasoningNLP/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246751113,"owners_count":20827836,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T09:00:50.655Z","updated_at":"2025-04-02T03:31:15.526Z","avatar_url":"https://github.com/FreedomIntelligence.png","language":null,"funding_links":[],"categories":["Refer to Awesome Lists","A01_文本生成_文本对话","Uncategorized","Other Awesome Lists","Contributing"],"sub_categories":["大语言对话模型及数据","Uncategorized","2022"],"readme":"# ReasoningNLP\nA paper list on reasoning in NLP. We mainly collect papers on datasets and methods that use pre-trained language models (PLMs); the list is a work in progress. 
\n\n**See our survey on natural language reasoning:**\n\n- [Natural Language Reasoning, A Survey](https://arxiv.org/pdf/2303.14725.pdf)\n\nHere are two other related surveys on LLM prompting:\n\n- [Reasoning with Language Model Prompting: A Survey](https://arxiv.org/pdf/2212.09597.pdf) | [resources](https://github.com/zjunlp/Prompt4ReasoningPapers)\n- [Towards Reasoning in Large Language Models: A Survey](https://arxiv.org/pdf/2212.10403.pdf) | [resources](https://github.com/jeffhj/LM-reasoning)\n\n\n\n\u003ch2\u003eTable of Contents\u003c/h2\u003e\n\n- [Methodology](#1)\n   - [Reasoning Paradigm](#1.1)\n      - [End-to-End Reasoning](#1.1.1)\n      - [Forward Reasoning](#1.1.2)\n      - [Backward Reasoning](#1.1.3)\n   - [Learning Paradigm](#1.2)\n      - [Finetuning](#1.2.1)\n      - [In-Context Learning](#1.2.2)\n- [Natural Language Reasoning](#2)\n   - [Classical Logical Reasoning](#2.1)\n      - [Datasets \u0026 Benchmarks](#2.1.1)\n      - [Related Works](#2.1.2)\n   - [Natural Language Inference](#2.2)\n      - [Datasets \u0026 Benchmarks](#2.2.1)\n      - [Related Works](#2.2.2)\n   - [Multi-hop Question Answering](#2.3)\n      - [Datasets \u0026 Benchmarks](#2.3.1)\n      - [Related Works](#2.3.2)\n   - [Commonsense Reasoning](#2.4)\n      - [Datasets \u0026 Benchmarks](#2.4.1)\n      - [Related Works](#2.4.2)\n      - [Knowledge Bases](#2.4.3)\n   - [Complex Reasoning](#2.5)\n      - [Datasets \u0026 Benchmarks](#2.5.1)\n- [Knowledge Graph Reasoning](#knowledge-graph-reasoning)\n- [Mathematical Reasoning](#mathematical-reasoning)\n\n\n\n\u003ch2 id=\"1\"\u003eMethodology\u003c/h2\u003e\n\n\n\u003ch3 id=\"1.1\"\u003eReasoning Paradigm\u003c/h3\u003e\n\n\u003ch4 id=\"1.1.1\"\u003eEnd-to-End Reasoning\u003c/h4\u003e\n\n\u003ch4 id=\"1.1.2\"\u003eForward Reasoning\u003c/h4\u003e\n\n\u003ch4 id=\"1.1.3\"\u003eBackward Reasoning\u003c/h4\u003e\n\n\n\u003ch3 id=\"1.2\"\u003eLearning Paradigm\u003c/h3\u003e\n\n\u003ch4 
id=\"1.2.1\"\u003eFinetuning\u003c/h4\u003e\n\n\u003ch4 id=\"1.2.2\"\u003eIn-Context Learning\u003c/h4\u003e\n\n1. **Show Your Work: Scratchpads for Intermediate Computation with Language Models** arXiv (2021)\n\n   *Maxwell Nye, Anders Johan Andreassen, Guy Gur-Ari, Henryk Michalewski, Jacob Austin, David Bieber, David Dohan, Aitor Lewkowycz, Maarten Bosma, David Luan, Charles Sutton, Augustus Odena* [[pdf](https://arxiv.org/pdf/2112.00114.pdf)]\n\n2. **Chain of Thought Prompting Elicits Reasoning in Large Language Models** arXiv (2022)\n\n   *Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou* [[pdf](https://arxiv.org/pdf/2201.11903.pdf)] [[project](https://github.com/jasonwei20/chain-of-thought-prompting)]\n\n3. **Self-Consistency Improves Chain of Thought Reasoning in Language Models** arXiv (2022)\n\n   *Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou* [[pdf](https://arxiv.org/pdf/2203.11171.pdf)]\n\n4. **STaR: Bootstrapping Reasoning With Reasoning** arXiv (2022)\n\n   *Eric Zelikman, Yuhuai Wu, Jesse Mu, Noah D. Goodman* [[pdf](https://arxiv.org/pdf/2203.14465.pdf)]\n\n5. **Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning** arXiv (2022)\n\n   *Antonia Creswell, Murray Shanahan, Irina Higgins* [[pdf](https://arxiv.org/pdf/2205.09712.pdf)]\n\n6. **Least-to-Most Prompting Enables Complex Reasoning in Large Language Models** arXiv (2022)\n\n   *Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang, Dale Schuurmans, Olivier Bousquet, Quoc Le, Ed Chi* [[pdf](https://arxiv.org/pdf/2205.10625.pdf)]\n\n7. **Large Language Models are Zero-Shot Reasoners** arXiv (2022)\n\n   *Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, Yusuke Iwasawa* [[pdf](https://arxiv.org/pdf/2205.11916.pdf)] [[project](https://github.com/kojima-takeshi188/zero_shot_cot)]\n\n8. 
**Language models show human-like content effects on reasoning** arXiv (2022)\n\n   *Ishita Dasgupta, Andrew K. Lampinen, Stephanie C. Y. Chan, Antonia Creswell, Dharshan Kumaran, James L. McClelland, Felix Hill* [[pdf](https://arxiv.org/pdf/2207.07051.pdf)]\n\n9. **Language Model Cascades** arXiv (2022)\n\n   *David Dohan, Winnie Xu, Aitor Lewkowycz, Jacob Austin, David Bieber, Raphael Gontijo Lopes, Yuhuai Wu, Henryk Michalewski, Rif A. Saurous, Jascha Sohl-Dickstein, Kevin Murphy, Charles Sutton* [[pdf](https://arxiv.org/pdf/2207.10342.pdf)] [[project](https://model-cascades.github.io/)]\n\n10. **Faithful Reasoning Using Large Language Models** arXiv (2022)\n\n      *Antonia Creswell, Murray Shanahan* [[pdf](https://arxiv.org/pdf/2208.14271.pdf)]\n\n11. **Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought** arXiv (2022)\n\n      *Abulhair Saparov, He He* [[pdf](https://arxiv.org/pdf/2210.01240.pdf)] [[project](https://github.com/asaparov/prontoqa)]\n\n12. **ThinkSum: Probabilistic reasoning over sets using large language models** arXiv (2022)\n\n      *Batu Ozturkler, Nikolay Malkin, Zhen Wang, Nebojsa Jojic* [[pdf](https://arxiv.org/pdf/2210.01293.pdf)]\n\n13. **Measuring and Narrowing the Compositionality Gap in Language Models** arXiv (2022)\n\n      *Ofir Press, Muru Zhang, Sewon Min, Ludwig Schmidt, Noah A. Smith, Mike Lewis* [[pdf](https://arxiv.org/pdf/2210.03350.pdf)] [[project](https://github.com/ofirpress/self-ask)]\n\n\n\u003ch2 id=\"2\"\u003eNLP Topics\u003c/h2\u003e\n\n\u003ch3 id=\"2.1\"\u003eClassical Logical Reasoning\u003c/h3\u003e\n\nSome datasets explicitly target the reasoning types studied in philosophy, e.g. deduction, abduction, and induction. We therefore call them \"classical logical reasoning tasks\". A key characteristic of this topic is that the tasks are mostly artificial, constructed specifically to study reasoning. 
\n\n\n\u003ch4 id=\"2.1.1\"\u003eDatasets \u0026 Benchmarks\u003c/h4\u003e\n\n\u003ctable\u003e\n  \u003ctr\u003e\n      \u003cth colspan=\"2\" align=\"center\"\u003eInference Type\u003c/th\u003e\n      \u003cth align=\"center\"\u003eDataset\u003c/th\u003e\n      \u003cth align=\"center\"\u003eSize\u003c/th\u003e\n      \u003cth align=\"center\"\u003eTask\u003c/th\u003e\n      \u003cth align=\"center\"\u003eLink\u003c/th\u003e\n      \u003cth align=\"center\"\u003eRemark\u003c/th\u003e\n  \u003c/tr \u003e\n  \n  \u003ctr\u003e\n      \u003cth rowspan=\"8\" colspan=\"2\" align=\"center\" valign=\"middle\"\u003eDeductive Reasoning\u003c/th\u003e\n      \u003ctd align=\"center\"\u003ebAbI-15\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e-\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eextraction\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://arxiv.org/pdf/1502.05698.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"http://fb.ai/babi\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003esynthetic\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eRuleTaker\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e\u003e500k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eclassification\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://www.ijcai.org/proceedings/2020/0537.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://allenai.org/data/ruletaker\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003esynthetic, the first large-scale benchmark. 
D*, Birds-Electricity, ParaRules\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eProofWriter\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e\u003e500k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eclassification\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://aclanthology.org/2021.findings-acl.317.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://allenai.org/data/proofwriter\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003eimprovement on RuleTaker, + open-world assumption\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003ePARARULE Plus\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e400k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eclassification\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://www.cs.ox.ac.uk/isg/conferences/tmp-proceedings/NeSy2022/paper15.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://github.com/Strong-AI-Lab/PARARULE-Plus\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003eimprovement on ParaRules, addresses the depth imbalance issue\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eAAC\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e710k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003egeneration\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://aclanthology.org/2021.iwcs-1.7.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://github.com/debatelab/aacorpus\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003esynthetic, syllogistic arguments\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eLogicInference\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e200k\u003c/td\u003e\n      \u003ctd 
align=\"center\"\u003egeneration\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://arxiv.org/pdf/2203.15099.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://github.com/google-research/google-research/tree/master/logic_inference_dataset\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003esynthetic, more tasks\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eFOLIO\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e1.4k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eclassification, generation\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://arxiv.org/pdf/2209.00840.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://github.com/Yale-LILY/FOLIO\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003eexpert-written, annotate with FOL\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eRobustLR\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e120k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eclassification\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://aclanthology.org/2022.emnlp-main.653/\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://github.com/INK-USC/RobustLR\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003esynthetic, robustness on logical semantics\u003c/td\u003e\n  \u003c/tr\u003e\n\n  \u003ctr\u003e\n      \u003cth rowspan=\"6\" align=\"center\" valign=\"middle\"\u003eDefeasible Reasoning\u003c/th\u003e\n      \u003cth rowspan=\"2\" align=\"center\" valign=\"middle\"\u003e\u003ci\u003eAbductive Reasoning\u003c/i\u003e\u003c/th\u003e\n      \u003ctd align=\"center\"\u003eAbductionRules\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e-\u003c/td\u003e\n      \u003ctd align=\"center\"\u003egeneration\u003c/td\u003e\n      \u003ctd 
align=\"center\"\u003e \u003ca href=\"https://arxiv.org/pdf/2203.12186.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://github.com/Strong-AI-Lab/AbductionRule\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003evariant of RuleTaker\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eART \u003cbr /\u003e (\u0026alpha;NLI, \u0026alpha;NLG)\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e17.8k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eclassification, generation\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://openreview.net/pdf?id=Byg1v1HKDB\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"http://abductivecommonsense.xyz/\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003ecommonsense, based on ROCStories\u003c/td\u003e\n  \u003c/tr\u003e\n\n  \u003ctr\u003e\n      \u003cth rowspan=\"3\" align=\"center\" valign=\"middle\"\u003e\u003ci\u003eInductive Reasoning\u003c/i\u003e\u003c/th\u003e\n      \u003ctd align=\"center\"\u003ebAbI-16\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e-\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eextraction\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://arxiv.org/pdf/1502.05698.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"http://fb.ai/babi\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003esynthetic, induce-then-deduce\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eCLUTRR\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e-\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eextraction\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://aclanthology.org/D19-1458.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://github.com/facebookresearch/clutrr\"\u003eproject\u003c/a\u003e  
\u003c/td\u003e\n      \u003ctd align=\"center\"\u003esynthetic, induce-then-deduce, kinship\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eDEER\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e1.2k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003egeneration\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://arxiv.org/pdf/2212.10923.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e project  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003einduce explicit natural language rule (human-authored) from natural language facts (Webs)\u003c/td\u003e\n  \u003c/tr\u003e\n\n  \u003ctr\u003e\n      \u003cth align=\"center\" valign=\"middle\"\u003e\u003ci\u003eOthers\u003c/i\u003e\u003c/th\u003e\n      \u003ctd align=\"center\"\u003edefeasibleNLI\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e43.8k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eclassification, generation\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://aclanthology.org/2020.findings-emnlp.418.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://github.com/rudinger/defeasible-nli\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003edirection on evidence updation, based on the existing datasets\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\n\nPapers on dataset artifacts:\n\n1. **On the Paradox of Learning to Reason from Data** arXiv (2022)\n\n   *Honghua Zhang, Liunian Harold Li, Tao Meng, Kai-Wei Chang, Guy Van den Broeck* [[pdf](https://arxiv.org/pdf/2205.11502.pdf)] [[project](https://github.com/joshuacnf/paradox-learning2reason)]\n\n\n\n\n\u003ch4 id=\"2.1.2\"\u003eRelated Works\u003c/h4\u003e\nDeductive reasoning:\n\n1. 
**Transformers as Soft Reasoners over Language** IJCAI (2020)\n\n   *Peter Clark, Oyvind Tafjord, Kyle Richardson* [[pdf](https://www.ijcai.org/proceedings/2020/0537.pdf)] [[project](https://allenai.org/data/ruletaker)]\n\n2. **PRover: Proof Generation for Interpretable Reasoning over Rules** EMNLP (2020)\n\n   *Swarnadeep Saha, Sayan Ghosh, Shashank Srivastava, Mohit Bansal* [[pdf](https://aclanthology.org/2020.emnlp-main.9.pdf)] [[project](https://github.com/swarnaHub/PRover)]\n\n3. **multiPRover: Generating Multiple Proofs for Improved Interpretability in Rule Reasoning** NAACL (2021)\n\n   *Swarnadeep Saha, Prateek Yadav, Mohit Bansal* [[pdf](https://aclanthology.org/2021.naacl-main.287.pdf)] [[project](https://github.com/swarnaHub/multiPRover)]\n\n4. **Critical Thinking for Language Models** IWCS (2021)\n\n   *Gregor Betz, Christian Voigt, Kyle Richardson* [[pdf](https://aclanthology.org/2021.iwcs-1.7.pdf)] [[project](https://github.com/debatelab/aacorpus)]\n\n5. **Explainable Multi-hop Verbal Reasoning Through Internal Monologue** NAACL (2021)\n\n   *Zhengzhong Liang, Steven Bethard, Mihai Surdeanu* [[pdf](https://aclanthology.org/2021.naacl-main.97.pdf)] [[project](https://github.com/clulab/releases/tree/master/naacl2021-evr)]\n\n6. **ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language** ACL findings (2021)\n\n   *Oyvind Tafjord, Bhavana Dalvi, Peter Clark* [[pdf](https://aclanthology.org/2021.findings-acl.317.pdf)] [[project](https://allenai.org/data/proofwriter)]\n\n7. **Flexible Generation of Natural Language Deductions** EMNLP (2021)\n\n      *Kaj Bostrom, Xinyu Zhao, Swarat Chaudhuri, Greg Durrett* [[pdf](https://aclanthology.org/2021.emnlp-main.506.pdf)] [[project](https://github.com/alephic/ParaPattern)]\n\n8. 
**FaiRR: Faithful and Robust Deductive Reasoning over Natural Language** ACL (2022)\n\n   *Soumya Sanyal, Harman Singh, Xiang Ren* [[pdf](https://aclanthology.org/2022.acl-long.77.pdf)] [[project](https://github.com/INK-USC/FaiRR)]\n\n9. **Interpretable Proof Generation via Iterative Backward Reasoning** NAACL (2022)\n\n   *Hanhao Qu, Yu Cao, Jun Gao, Liang Ding, Ruifeng Xu* [[pdf](https://aclanthology.org/2022.naacl-main.216.pdf)] [[project](https://github.com/find-knowledge/IBR)]\n\n10. **Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning** arXiv (2022)\n\n      *Antonia Creswell, Murray Shanahan, Irina Higgins* [[pdf](https://arxiv.org/pdf/2205.09712.pdf)]\n\n11. **Generating Natural Language Proofs with Verifier-Guided Search** arXiv (2022)\n\n      *Kaiyu Yang, Jia Deng, Danqi Chen* [[pdf](https://arxiv.org/pdf/2205.12443.pdf)] [[project](https://github.com/princeton-nlp/NLProofS)]\n\n12. **ROBUSTLR: A Diagnostic Benchmark for Evaluating Logical Robustness of Deductive Reasoners** arXiv (2022)\n\n      *Soumya Sanyal, Zeyi Liao, Xiang Ren* [[pdf](https://arxiv.org/pdf/2205.12598.pdf)] [[project](https://github.com/INK-USC/RobustLR)]\n\n13. **Language models show human-like content effects on reasoning** arXiv (2022)\n\n      *Ishita Dasgupta, Andrew K. Lampinen, Stephanie C. Y. Chan, Antonia Creswell, Dharshan Kumaran, James L. McClelland, Felix Hill* [[pdf](https://arxiv.org/pdf/2207.07051.pdf)]\n\n14. **Faithful Reasoning Using Large Language Models** arXiv (2022)\n\n      *Antonia Creswell, Murray Shanahan* [[pdf](https://arxiv.org/pdf/2208.14271.pdf)]\n\n15. **Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought** arXiv (2022)\n\n      *Abulhair Saparov, He He* [[pdf](https://arxiv.org/pdf/2210.01240.pdf)] [[project](https://github.com/asaparov/prontoqa)]\n\n16. 
**LAMBADA: Backward Chaining for Automated Reasoning in Natural Language** arXiv (2022)\n\n      *Seyed Mehran Kazemi, Najoung Kim, Deepti Bhatia, Xin Xu, Deepak Ramachandran* [[pdf](https://arxiv.org/pdf/2212.13894.pdf)]\n\n\nDefeasible reasoning:\n\n1. **Learning to Rationalize for Nonmonotonic Reasoning with Distant Supervision** AAAI (2020)\n\n   *Faeze Brahman, Vered Shwartz, Rachel Rudinger, Yejin Choi* [[pdf](https://arxiv.org/pdf/2012.08012.pdf)] [[project](https://github.com/fabrahman/RationaleGen)]\n\n2. **Could you give me a hint? Generating inference graphs for defeasible reasoning** ACL findings (2021)\n\n   *Aman Madaan, Dheeraj Rajagopal, Niket Tandon, Yiming Yang, Eduard H. Hovy* [[pdf](https://aclanthology.org/2021.findings-acl.456.pdf)] [[project](https://tinyurl.com/defeasiblegraphs)]\n\n3. **Think about it! Improving defeasible reasoning by first modeling the question scenario** EMNLP (2021)\n\n   *Aman Madaan, Niket Tandon, Dheeraj Rajagopal, Peter Clark, Yiming Yang, Eduard H. Hovy* [[pdf](https://aclanthology.org/2021.emnlp-main.508.pdf)] [[project](https://github.com/madaan/thinkaboutit)]\n\n4. **Language Models as Inductive Reasoners** arXiv (2022)\n\n   *Zonglin Yang, Li Dong, Xinya Du, Hao Cheng, Erik Cambria, Xiaodong Liu, Jianfeng Gao, Furu Wei* [[pdf](https://arxiv.org/pdf/2212.10923.pdf)]\n\n\n\u003ch3 id=\"2.2\"\u003eNatural Language Inference\u003c/h3\u003e\n\nTask: given a premise-hypothesis pair, classify it into one of three classes: entailment, contradiction, or neutral.\n\nThere are three main types of premise-hypothesis pairs in the NLI task: paraphrase, compound semantics understanding (CSU), and reasoning. 
Here we just consider the last one.\n\n\u003ctable\u003e\n  \u003ctr\u003e\n      \u003cth align=\"center\"\u003e\u003c/th\u003e\n      \u003cth align=\"center\"\u003ePremise\u003c/th\u003e\n      \u003cth align=\"center\"\u003eHypothesis\u003c/th\u003e\n  \u003c/tr \u003e\n\n  \u003ctr\u003e\n      \u003cth align=\"center\"\u003eParaphrase\u003c/th\u003e\n      \u003ctd align=\"center\"\u003eTwo doctors perform surgery on patient\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eDoctors are performing surgery\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003cth align=\"center\"\u003eCSU\u003c/th\u003e\n      \u003ctd align=\"center\"\u003eTwo women are embracing while holding to go packages\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eTwo women are holding packages \u003cbr /\u003e (\u003ci\u003eTwo women are embracing\u003c/i\u003e)\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003cth align=\"center\"\u003eReasoning\u003c/th\u003e\n      \u003ctd align=\"center\"\u003eA soccer game with multiple males playing \u003cbr /\u003e (\u003ci\u003eSoccer is a sport\u003c/i\u003e)\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eSome men are playing a sport\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\n\n\u003ch4 id=\"2.2.1\"\u003eDatasets \u0026 Benchmarks\u003c/h4\u003e\n\n\u003ctable\u003e\n  \u003ctr\u003e\n      \u003cth align=\"center\"\u003eDataset\u003c/th\u003e\n      \u003cth align=\"center\"\u003eSize\u003c/th\u003e\n      \u003cth align=\"center\"\u003eLink\u003c/th\u003e\n      \u003cth align=\"center\"\u003eRemark\u003c/th\u003e\n  \u003c/tr \u003e\n  \u003ctr\u003e\n      \u003cth colspan=4 align=\"center\"\u003egeneric\u003c/th\u003e\n  \u003c/tr \u003e\n\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eSNLI\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e570k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca 
href=\"https://aclanthology.org/D15-1075.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"nlp.stanford.edu/projects/snli/\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003ethe first large-scale NLI dataset \u003cbr/\u003e one of the most typical\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003ee-SNLI\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e-\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://proceedings.neurips.cc/paper/2018/file/4c7a167bb329bd92580a99ce422d6fa6-Paper.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://github.com/OanaMariaCamburu/e-SNLI\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003eannotate natural language explanations for SNLI\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eMultiNLI\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e433k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://aclanthology.org/N18-1101.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://cims.nyu.edu/~sbowman/multinli/\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003ecover more styles and topics than SNLI \u003cbr/\u003e one of the most typical\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eDebiasedNLI\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e7.5k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://aclanthology.org/2022.acl-long.190.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://github.com/jimmycode/gen-debiased-nli\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003edebiased versions of SNLI \u0026 MultiNLI\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eXNLI\u003c/td\u003e\n      \u003ctd 
align=\"center\"\u003e7.5k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://aclanthology.org/D18-1269.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://github.com/facebookresearch/XNLI/\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003ecross-lingual, based on MultiNLI\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eMPE\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e10k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://aclanthology.org/I17-1011.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://github.com/aylai/MultiPremiseEntailment\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003emultiple premises\u003c/td\u003e\n  \u003c/tr\u003e\n\n  \u003ctr\u003e\n      \u003cth colspan=4 align=\"center\"\u003escience\u003c/th\u003e\n  \u003c/tr \u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eSciTail\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e27k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"http://ai2-website.s3.amazonaws.com/team/ashishs/scitail-aaai2018.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"http://data.allenai.org/scitail\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003ethe first NLI dataset with entirely existing text\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eSciNLI\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e107k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://aclanthology.org/2022.acl-long.511.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://github.com/msadat3/SciNLI\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003edata from scholarly papers\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\nRecently, some 
datasets are proposed to model different subjective opinions (on classifying into which class) of crowdworkers.\n\n\u003ctable\u003e\n  \u003ctr\u003e\n      \u003cth colspan=5 align=\"center\"\u003eSubjective Opinions\u003c/th\u003e\n  \u003c/tr \u003e\n  \u003ctr\u003e\n      \u003cth align=\"center\"\u003eDataset\u003c/th\u003e\n      \u003cth align=\"center\"\u003eDomain\u003c/th\u003e\n      \u003cth align=\"center\"\u003eSize\u003c/th\u003e\n      \u003cth align=\"center\"\u003eLink\u003c/th\u003e\n      \u003cth align=\"center\"\u003eRemark\u003c/th\u003e\n  \u003c/tr \u003e\n\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eUNLI\u003c/td\u003e\n      \u003ctd align=\"center\"\u003egeneric\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e61k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://aclanthology.org/2020.acl-main.774.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"http://nlp.jhu.edu/unli\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003esubjective probability assessment (regression rather than binary), based on SNLI\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eChaosNLI\u003c/td\u003e\n      \u003ctd align=\"center\"\u003egeneric\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e464k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://aclanthology.org/2020.emnlp-main.734.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://github.com/easonnie/ChaosNLI\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003ehuman opinion distribution, based on SNLI, MultiNLI and \u0026alpha;NLI\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\nSome datasets for other language:\n\u003ctable\u003e\n  \u003ctr\u003e\n      \u003cth colspan=4 align=\"center\"\u003eOther Languages\u003c/th\u003e\n  \u003c/tr \u003e\n  \u003ctr\u003e\n      \u003cth 
align=\"center\"\u003eDataset\u003c/th\u003e\n      \u003cth align=\"center\"\u003eLanguage\u003c/th\u003e\n      \u003cth align=\"center\"\u003eLink\u003c/th\u003e\n      \u003cth align=\"center\"\u003eRemark\u003c/th\u003e\n  \u003c/tr \u003e\n\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eNLI-TR\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eTurkish\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://aclanthology.org/2020.emnlp-main.662.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://github.com/boun-tabi/NLI-TR\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003etranslate SNLI and MultiNLI\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eIndoNLI\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eIndonesian\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://aclanthology.org/2021.emnlp-main.821.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://github.com/ir-nlp-csui/indonli\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003edata collection protocol from MultiNLI\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\nPapers on dataset artifacts:\n\n1. **Performance Impact Caused by Hidden Bias of Training Data for Recognizing Textual Entailment** LREC (2018)\n\n   *Masatoshi Tsuchiya* [[pdf](https://aclanthology.org/N19-1101.pdf)]\n\n2. **Annotation Artifacts in Natural Language Inference Data** NAACL (2018)\n\n   *Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel R. Bowman, Noah A. Smith* [[pdf](https://aclanthology.org/N18-2017.pdf)]\n\n3. **Hypothesis Only Baselines in Natural Language Inference** SEM (2019)\n\n   *Adam Poliak, Jason Naradowsky, Aparajita Haldar, Rachel Rudinger, Benjamin Van Durme* [[pdf](https://aclanthology.org/S18-2023.pdf)]\n\n\n\n\u003ch4 id=\"2.2.2\"\u003eRelated Works\u003c/h4\u003e\n\n\n1. 
**NILE : Natural Language Inference with Faithful Natural Language Explanations** ACL (2020)\n\n   *Sawan Kumar, Partha P. Talukdar* [[pdf](https://aclanthology.org/2020.acl-main.771.pdf)] [[project](https://github.com/SawanKumar28/nile)]\n\n\n2. **Identifying inherent disagreement in natural language inference** NAACL (2021)\n\n   *Xinliang Frederick Zhang, Marie-Catherine de Marneffe* [[pdf](https://aclanthology.org/2021.naacl-main.390.pdf)] [[project](https://github.com/FrederickXZhang/FgNLI)]\n\n\n3. **KACE: Generating Knowledge Aware Contrastive Explanations for Natural Language Inference** ACL (2021)\n\n   *Qianglong Chen, Feng Ji, Xiangji Zeng, Feng-Lin Li, Ji Zhang, Haiqing Chen, Yin Zhang* [[pdf](https://aclanthology.org/2021.acl-long.196.pdf)] [[project](https://github.com/AI4NLP/KACE)]\n\n4. **Investigating Transfer Learning in Multilingual Pre-trained Language Models through Chinese Natural Language Inference** ACL findings (2021)\n\n   *Hai Hu, He Zhou, Zuoyu Tian, Yiwen Zhang, Yina Patterson, Yanting Li, Yixin Nie, Kyle Richardson* [[pdf](https://aclanthology.org/2021.findings-acl.331.pdf)] [[project](https://github.com/huhailinguist/ChineseNLIProbing)]\n\n5. **Enhancing Cross-lingual Natural Language Inference by Prompt-learning from Cross-lingual Templates** ACL (2022)\n\n   *Kunxun Qi, Hai Wan, Jianfeng Du, Haolan Chen* [[pdf](https://aclanthology.org/2022.acl-long.134.pdf)] [[project](https://github.com/qikunxun/PCT)]\n\n6. **Generating Intermediate Steps for NLI with Next-Step Supervision** arXiv (2022)\n\n   *Deepanway Ghosal, Somak Aditya, Monojit Choudhury* [[pdf](https://arxiv.org/pdf/2208.14641.pdf)]\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\u003ch3 id=\"2.3\"\u003eMulti-hop Question Answering\u003c/h3\u003e\n\nThis topic studies answering complex questions that require reasoning over evidence scattered across different contexts. The term \"hop\" here refers to the number of contexts needed for reasoning. 
There are two settings for the required contexts: (1) distractor, where all required contexts are provided along with some distractors, and (2) retrieval, where the required contexts must be retrieved.\n\n\n\n\n\u003ch4 id=\"2.3.1\"\u003eDatasets \u0026 Benchmarks\u003c/h4\u003e\n\nSome datasets annotate ground-truth supporting evidence (paragraph-level, sentence-level, or triple-level), decomposed sub-questions (and their corresponding evidence), or reasoning paths.\n\n\u003ctable\u003e\n  \u003ctr\u003e\n      \u003cth align=\"center\"\u003eDataset\u003c/th\u003e\n      \u003cth align=\"center\"\u003eSize\u003c/th\u003e\n      \u003cth align=\"center\"\u003eKnowledge Source\u003c/th\u003e\n      \u003cth align=\"center\"\u003eSetting\u003c/th\u003e\n      \u003cth align=\"center\"\u003eAnswer Type\u003c/th\u003e\n      \u003cth align=\"center\"\u003eEvidence\u003c/th\u003e\n      \u003cth align=\"center\"\u003eLink\u003c/th\u003e\n      \u003cth align=\"center\"\u003eRemark\u003c/th\u003e\n  \u003c/tr \u003e\n\n  \u003ctr\u003e\n      \u003cth colspan=8 align=\"center\"\u003egeneric\u003c/th\u003e\n  \u003c/tr \u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eWikiHop\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e51k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eWikipedia\u003c/td\u003e\n      \u003ctd align=\"center\"\u003edistractor\u003c/td\u003e\n      \u003ctd align=\"center\"\u003echoice\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e-\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://aclanthology.org/Q18-1021.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"http://qangaroo.cs.ucl.ac.uk/\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003eone of the most typical\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eHotpotQA\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e112k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eWikipedia\u003c/td\u003e\n      
\u003ctd align=\"center\"\u003edistractor, retrieval\u003c/td\u003e\n      \u003ctd align=\"center\"\u003espan, yes/no\u003c/td\u003e\n      \u003ctd align=\"center\"\u003esentence\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://aclanthology.org/D18-1259.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://hotpotqa.github.io/\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003ethe most popular one\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eR4C\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e4.6k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e-\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e-\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e-\u003c/td\u003e\n      \u003ctd align=\"center\"\u003etriple\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://aclanthology.org/2020.acl-main.602.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://naoya-i.github.io/r4c/\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003eannotate atomic facts for HotpotQA\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eBeerQA\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e530\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eWikipedia\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eretrieval\u003c/td\u003e\n      \u003ctd align=\"center\"\u003espan, yes/no\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e-\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://aclanthology.org/2021.emnlp-main.292.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://beerqa.github.io/\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003emore hops\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd 
align=\"center\"\u003e2WikiMultiHopQA\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e192k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eWikipedia\u003c/td\u003e\n      \u003ctd align=\"center\"\u003edistractor\u003c/td\u003e\n      \u003ctd align=\"center\"\u003espan\u003c/td\u003e\n      \u003ctd align=\"center\"\u003esentence \u003cbr /\u003e triple\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://aclanthology.org/2020.coling-main.580.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://github.com/Alab-NII/2wikimultihop\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003esimilar to WikiHop\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eMuSiQue\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e25k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eWikipedia\u003c/td\u003e\n      \u003ctd align=\"center\"\u003edistractor\u003c/td\u003e\n      \u003ctd align=\"center\"\u003espan\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eparagraph \u003cbr /\u003e sub-questions\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://aclanthology.org/2022.tacl-1.31.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://github.com/stonybrooknlp/musique\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003emore hops\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eStrategyQA\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e2.7k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eWikipedia\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eretrieval\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eyes/no\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eparagraph \u003cbr /\u003e sub-questions\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca 
href=\"https://aclanthology.org/2021.tacl-1.21.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://allenai.org/data/strategyqa\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003eimplicit multi-hop questions\u003c/td\u003e\n  \u003c/tr\u003e\n\n  \u003ctr\u003e\n      \u003cth colspan=8 align=\"center\"\u003especific domain\u003c/th\u003e\n  \u003c/tr \u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eMedHop\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e2.5k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eMedline\u003c/td\u003e\n      \u003ctd align=\"center\"\u003edistractor\u003c/td\u003e\n      \u003ctd align=\"center\"\u003echoice\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e-\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://aclanthology.org/Q18-1021.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"http://qangaroo.cs.ucl.ac.uk/\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003emedicine. 
similar to WikiHop\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eQASC\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e9.9k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eWorldTree\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eretrieval\u003c/td\u003e\n      \u003ctd align=\"center\"\u003echoice\u003c/td\u003e\n      \u003ctd align=\"center\"\u003esentence\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://arxiv.org/pdf/1910.11473.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://github.com/allenai/qasc\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003escience\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eeQASC\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e-\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e-\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e-\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e-\u003c/td\u003e\n      \u003ctd align=\"center\"\u003ereasoning path\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"http://aclanthology.lst.uni-saarland.de/2020.emnlp-main.10.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://allenai.org/data/eqasc\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003eannotate reasoning paths for QASC\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\nPapers on dataset artifacts:\n\n1. **Understanding Dataset Design Choices for Multi-hop Reasoning** NAACL (2019)\n\n   *Jifan Chen, Greg Durrett* [[pdf](https://aclanthology.org/N19-1405.pdf)]\n\n2. **Avoiding Reasoning Shortcuts: Adversarial Evaluation, Training, and Model Development for Multi-Hop QA** ACL (2019)\n\n   *Yichen Jiang, Mohit Bansal* [[pdf](https://aclanthology.org/P19-1262.pdf)] [[project](https://github.com/jiangycTarheel-zz/Adversarial-MultiHopQA)]\n\n3. 
**Compositional Questions Do Not Necessitate Multi-hop Reasoning** ACL (2019)\n\n      *Sewon Min, Eric Wallace, Sameer Singh, Matt Gardner, Hannaneh Hajishirzi, Luke Zettlemoyer* [[pdf](https://aclanthology.org/P19-1416.pdf)] [[project](https://github.com/shmsw25/single-hop-rc)]\n\n4. **Is Multihop QA in DiRe Condition? Measuring and Reducing Disconnected Reasoning** EMNLP (2020)\n\n      *Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, Ashish Sabharwal* [[pdf](https://aclanthology.org/2020.emnlp-main.712.pdf)] [[project](https://github.com/stonybrooknlp/dire)]\n\n\n\n\u003ch4 id=\"2.3.2\"\u003eRelated Works\u003c/h4\u003e\n\n1. **Dynamically Fused Graph Network for Multi-hop Reasoning** ACL (2019)\n\n   *Lin Qiu, Yunxuan Xiao, Yanru Qu, Hao Zhou, Lei Li, Weinan Zhang, Yong Yu* [[pdf](https://aclanthology.org/P19-1617.pdf)] [[project](https://github.com/woshiyyya/DFGN-pytorch)]\n\n2. **Multi-hop Reading Comprehension across Multiple Documents by Reasoning over Heterogeneous Graphs** ACL (2019)\n\n   *Ming Tu, Guangtao Wang, Jing Huang, Yun Tang, Xiaodong He, Bowen Zhou* [[pdf](https://aclanthology.org/P19-1260.pdf)]\n\n3. **Answering while Summarizing: Multi-task Learning for Multi-hop QA with Evidence Extraction** ACL (2019)\n\n   *Kosuke Nishida, Kyosuke Nishida, Masaaki Nagata, Atsushi Otsuka, Itsumi Saito, Hisako Asano, Junji Tomita* [[pdf](https://aclanthology.org/P19-1225.pdf)]\n\n4. **Multi-hop Reading Comprehension through Question Decomposition and Rescoring** ACL (2019)\n\n   *Sewon Min, Victor Zhong, Luke Zettlemoyer, Hannaneh Hajishirzi* [[pdf](https://aclanthology.org/P19-1613.pdf)] [[project](https://github.com/shmsw25/DecompRC)]\n\n\n5. **Differentiable Reasoning over a Virtual Knowledge Base** ICLR (2020)\n\n   *Bhuwan Dhingra, Manzil Zaheer, Vidhisha Balachandran, Graham Neubig, Ruslan Salakhutdinov, William W. Cohen* [[pdf](https://openreview.net/pdf?id=SJxstlHFPH)] [[project](http://www.cs.cmu.edu/~bdhingra/pages/drkit.html)]\n\n6. 
**Transformer-XH: Multi-Evidence Reasoning with eXtra Hop Attention** ICLR Poster (2020)\n\n   *Chen Zhao, Chenyan Xiong, Corby Rosset, Xia Song, Paul N. Bennett, Saurabh Tiwary* [[pdf](https://openreview.net/pdf?id=r1eIiCNYwS)] [[project](https://aka.ms/transformer-xh)]\n\n7. **Low-Resource Generation of Multi-hop Reasoning Questions** ACL (2020)\n\n   *Jianxing Yu, Wei Liu, Shuang Qiu, Qinliang Su, Kai Wang, Xiaojun Quan, Jian Yin* [[pdf](http://aclanthology.lst.uni-saarland.de/2020.acl-main.601.pdf)]\n\n8. **SRLGRN: Semantic Role Labeling Graph Reasoning Network** EMNLP (2020)\n\n   *Chen Zheng, Parisa Kordjamshidi* [[pdf](https://aclanthology.org/2020.emnlp-main.714.pdf)]\n\n9. **Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering** ICLR (2020)\n\n   *Akari Asai, Kazuma Hashimoto, Hannaneh Hajishirzi, Richard Socher, Caiming Xiong* [[pdf](https://openreview.net/pdf?id=SJgVHkrYDH)] [[project](https://github.com/AkariAsai/learning_to_retrieve_reasoning_paths)]\n\n10. **Robustifying Multi-hop QA through Pseudo-Evidentiality Training** ACL (2021)\n\n      *Kyungjae Lee, Seung-won Hwang, Sang-eun Han, Dohyeon Lee* [[pdf](https://aclanthology.org/2021.acl-long.476.pdf)]\n\n11. **Summarize-then-Answer: Generating Concise Explanations for Multi-hop Reading Comprehension** EMNLP (2021)\n\n      *Naoya Inoue, Harsh Trivedi, Steven Sinha, Niranjan Balasubramanian, Kentaro Inui* [[pdf](https://aclanthology.org/2021.emnlp-main.490.pdf)] [[project](https://github.com/StonyBrookNLP/suqa)]\n\n12. **Generative Context Pair Selection for Multi-hop Question Answering** EMNLP (2021)\n\n      *Dheeru Dua, Cícero Nogueira dos Santos, Patrick Ng, Ben Athiwaratkun, Bing Xiang, Matt Gardner, Sameer Singh* [[pdf](https://aclanthology.org/2021.emnlp-main.561.pdf)] [[project](https://github.com/dDua/JointQA)]\n\n13. 
**Breadth First Reasoning Graph for Multi-hop Question Answering** NAACL (2021)\n\n      *Yongjie Huang, Meng Yang* [[pdf](https://aclanthology.org/2021.naacl-main.464.pdf)]\n\n14. **Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval** ICLR Poster (2021)\n\n      *Wenhan Xiong, Xiang Lorraine Li, Srini Iyer, Jingfei Du, Patrick S. H. Lewis, William Yang Wang, Yashar Mehdad, Scott Yih, Sebastian Riedel, Douwe Kiela, Barlas Oguz* [[pdf](https://openreview.net/pdf?id=EMHoBG0avc1)] [[project](https://github.com/facebookresearch/multihop_dense_retrieval)]\n\n15. **Multi-Step Reasoning Over Unstructured Text with Beam Dense Retrieval** NAACL (2021)\n\n      *Chen Zhao, Chenyan Xiong, Jordan L. Boyd-Graber, Hal Daumé III* [[pdf](https://aclanthology.org/2021.naacl-main.368.pdf)] [[project](https://github.com/henryzhao5852/BeamDR)]\n\n16. **Unsupervised Multi-hop Question Answering by Question Generation** NAACL (2021)\n\n      *Liangming Pan, Wenhu Chen, Wenhan Xiong, Min-Yen Kan, William Yang Wang* [[pdf](https://aclanthology.org/2021.naacl-main.469.pdf)] [[project](https://github.com/teacherpeterpan/Unsupervised-Multi-hop-QA)]\n\n17. **If You Want to Go Far Go Together: Unsupervised Joint Candidate Evidence Retrieval for Multi-hop Question Answering** NAACL (2021)\n\n      *Vikas Yadav, Steven Bethard, Mihai Surdeanu* [[pdf](https://aclanthology.org/2021.naacl-main.363.pdf)] [[project](https://github.com/vikas95/WAIR_interpretability)]\n\n18. **Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval** NIPS (2021)\n\n      *Omar Khattab, Christopher Potts, Matei A. Zaharia* [[pdf](https://papers.nips.cc/paper/2021/file/e8b1cbd05f6e6a358a81dee52493dd06-Paper.pdf)] [[project](https://github.com/stanford-futuredata/Baleen)]\n\n19. 
**Modeling Multi-hop Question Answering as Single Sequence Prediction** ACL (2022)\n\n      *Semih Yavuz, Kazuma Hashimoto, Yingbo Zhou, Nitish Shirish Keskar, Caiming Xiong* [[pdf](https://aclanthology.org/2022.acl-long.69.pdf)]\n\n20. **CQG: A Simple and Effective Controlled Generation Framework for Multi-hop Question Generation** ACL (2022)\n\n      *Zichu Fei, Qi Zhang, Tao Gui, Di Liang, Sirui Wang, Wei Wu, Xuanjing Huang* [[pdf](https://aclanthology.org/2022.acl-long.475.pdf)] [[project](https://github.com/sion-zcfei/CQG)]\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\u003ch3 id=\"2.4\"\u003eCommonsense Reasoning\u003c/h3\u003e\n\nCommonsense reasoning deals with implicit commonsense knowledge, which can be non-trivial for machines because such knowledge is difficult to retrieve from the web due to reporting bias. Although the name contains \"reasoning\", the common theme of this topic is commonsense knowledge rather than reasoning. Here, we only list reasoning datasets.\n\n\n\n\n\u003ch4 id=\"2.4.1\"\u003eDatasets \u0026 Benchmarks\u003c/h4\u003e\n\nThere are three main types of reasoning datasets, focusing on \"what\" (i.e. assertions or events), \"what if / why\" (e.g. causal and temporal relations between events), and \"how\" (i.e. 
actions) respectively.\n\n\n\"What\" commonsense reasoning requires combining multiple pieces of knowledge, some from external knowledge sources and others from commonsense knowledge.\n\n\u003ctable\u003e\n  \u003ctr\u003e\n      \u003cth colspan=6 align=\"center\"\u003e\"What\" Commonsense Reasoning\u003c/th\u003e\n  \u003c/tr \u003e\n  \u003ctr\u003e\n      \u003cth align=\"center\"\u003eDataset\u003c/th\u003e\n      \u003cth align=\"center\"\u003eSize\u003c/th\u003e\n      \u003cth align=\"center\"\u003eOther Knowledge Type / Source\u003c/th\u003e\n      \u003cth align=\"center\"\u003eTask\u003c/th\u003e\n      \u003cth align=\"center\"\u003eLink\u003c/th\u003e\n      \u003cth align=\"center\"\u003eRationale\u003c/th\u003e\n  \u003c/tr \u003e\n  \n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eOpenBookQA\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e6k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003escience \u003cbr /\u003e / WorldTree\u003c/td\u003e\n      \u003ctd align=\"center\"\u003emulti-choice QA\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"http://aclanthology.lst.uni-saarland.de/D18-1260.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"http://data.allenai.org/OpenBookQA\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003eground science facts\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eOpenCSR\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e20k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003escience \u003cbr /\u003e / WorldTree, ARC corpus\u003c/td\u003e\n      \u003ctd align=\"center\"\u003efree-form QA\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://aclanthology.org/2021.naacl-main.366.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://open-csr.github.io/\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd 
align=\"center\"\u003e-\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eCREAK\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e13k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eentity \u003cbr /\u003e / Wikipedia\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eclaim verification\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/file/5737c6ec2e0716f3d8a7a5c4e0de0d9a-Paper-round2.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://www.cs.utexas.edu/~yasumasa/creak\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003eexplanation\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\n\n\"What if / Why\" commonsense reasoning often reasons about causal and temporal relations between events. There are two causal directions: causes and effects, which correspond to backward and forward causal reasoning respectively.\n\n\n\u003ctable\u003e\n  \u003ctr\u003e\n      \u003cth colspan=6 align=\"center\"\u003e\"What if / Why\" Commonsense Reasoning\u003c/th\u003e\n  \u003c/tr \u003e\n  \u003ctr\u003e\n      \u003cth align=\"center\"\u003eDataset\u003c/th\u003e\n      \u003cth align=\"center\"\u003eSize\u003c/th\u003e\n      \u003cth align=\"center\"\u003eDirection\u003c/th\u003e\n      \u003cth align=\"center\"\u003eTask\u003c/th\u003e\n      \u003cth align=\"center\"\u003eLink\u003c/th\u003e\n      \u003cth align=\"center\"\u003eRemark\u003c/th\u003e\n  \u003c/tr \u003e\n  \n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eROCStories\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e50k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003etemporal\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e2-choice QA\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://aclanthology.org/N16-1098.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca 
href=\"https://www.cs.rochester.edu/nlp/rocstories/\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003e-\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eSWAG\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e113k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003etemporal\u003c/td\u003e\n      \u003ctd align=\"center\"\u003emulti-choice QA\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"http://aclanthology.lst.uni-saarland.de/D18-1009.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://rowanzellers.com/swag/\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003e-\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eHellaSwag\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e20k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003etemporal\u003c/td\u003e\n      \u003ctd align=\"center\"\u003emulti-choice QA\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"http://aclanthology.lst.uni-saarland.de/P19-1472.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://rowanzellers.com/hellaswag/\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003ean upgraded SWAG\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eCOPA\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e1k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eboth\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e2-choice QA\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://people.ict.usc.edu/~gordon/publications/AAAI-SPRING11A.PDF\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://people.ict.usc.edu/~gordon/copa.html\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003e-\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd 
align=\"center\"\u003eSocial-IQA\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e38k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eboth\u003c/td\u003e\n      \u003ctd align=\"center\"\u003emulti-choice QA\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://aclanthology.org/D19-1454.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://tinyurl.com/socialiqa\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003esocial situations\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003ee-CARE\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e21k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eboth\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e2-choice QA\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://aclanthology.org/2022.acl-long.33.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://github.com/Waste-Wood/e-CARE/\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003ewith ground supporting facts\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eWIQA\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e40k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eforward\u003c/td\u003e\n      \u003ctd align=\"center\"\u003emulti-choice QA\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://aclanthology.org/D19-1629.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"http://data.allenai.org/wiqa/\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003eabout natural processes\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eTIMETRAVEL\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e29k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eforward\u003c/td\u003e\n      \u003ctd 
align=\"center\"\u003egeneration\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"http://aclanthology.lst.uni-saarland.de/D19-1509.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://github.com/qkaren/Counterfactual-StoryRW\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003ecounterfactual reasoning\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eART\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e20k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003ebackward\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e2-choice/generation\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://openreview.net/pdf?id=Byg1v1HKDB\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"http://abductivecommonsense.xyz/\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003eabductive commonsense reasoning\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eTellMeWhy\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e30k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003ebackward\u003c/td\u003e\n      \u003ctd align=\"center\"\u003efree-form QA\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://aclanthology.org/2021.findings-acl.53v2.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"http://lunr.cs.stonybrook.edu/tellmewhy\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003eeach question annotated with 3 possible answers\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eWikiWhy\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e9k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003ebackward\u003c/td\u003e\n      \u003ctd align=\"center\"\u003efree-form QA\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca 
href=\"https://arxiv.org/pdf/2210.12152.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://github.com/matt-seb-ho/WikiWhy\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003eabout Wikipedia entities / events\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\n\n\n\n\"How\" commonsense reasoning is mainly about \"how to do it\". It is often related to problem-solving or decision-making.\n\n\n\u003ctable\u003e\n  \u003ctr\u003e\n      \u003cth colspan=6 align=\"center\"\u003e\"How\" Commonsense Reasoning\u003c/th\u003e\n  \u003c/tr \u003e\n  \u003ctr\u003e\n      \u003cth align=\"center\"\u003eDataset\u003c/th\u003e\n      \u003cth align=\"center\"\u003eSize\u003c/th\u003e\n      \u003cth align=\"center\"\u003eSource\u003c/th\u003e\n      \u003cth align=\"center\"\u003eTask\u003c/th\u003e\n      \u003cth align=\"center\"\u003eLink\u003c/th\u003e\n      \u003cth align=\"center\"\u003eRemark\u003c/th\u003e\n  \u003c/tr \u003e\n  \n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eWikiHow Goal-Step\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e1489k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eWikiHow, generated\u003c/td\u003e\n      \u003ctd align=\"center\"\u003emulti-choice\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://aclanthology.org/2020.emnlp-main.374v2.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://github.com/zharry29/wikihow-goal-step/\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003egoals, steps, and temporal ordering\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003ePIQA\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e21k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003ehuman-authored\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e2-choice\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca 
href=\"https://yonatanbisk.com/piqa/\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"http://abductivecommonsense.xyz/\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003ephysical\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\n\n\nSome datasets involve multiple types of reasoning. \n\n\u003ctable\u003e\n  \u003ctr\u003e\n      \u003cth colspan=5 align=\"center\"\u003eHybrid Commonsense Reasoning\u003c/th\u003e\n  \u003c/tr \u003e\n  \u003ctr\u003e\n      \u003cth align=\"center\"\u003eDataset\u003c/th\u003e\n      \u003cth align=\"center\"\u003eSize\u003c/th\u003e\n      \u003cth align=\"center\"\u003eTask\u003c/th\u003e\n      \u003cth align=\"center\"\u003eLink\u003c/th\u003e\n      \u003cth align=\"center\"\u003eRemark\u003c/th\u003e\n  \u003c/tr \u003e\n  \n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eCSQA\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e12k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003emulti-choice QA\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://aclanthology.org/N19-1421.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://www.tau-nlp.org/commonsenseqa\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003eConceptNet concepts\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eCoS-E\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e-\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e-\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://aclanthology.org/P19-1487.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://github.com/salesforce/cos-e\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003eannotate explanations for CSQA\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eECQA\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e-\u003c/td\u003e\n 
     \u003ctd align=\"center\"\u003e-\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://aclanthology.org/2021.acl-long.238.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://github.com/dair-iitd/ECQA-Dataset\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003eannotate commonsense facts for CSQA\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eCSQA2\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e14k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eboolean QA\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://openreview.net/pdf?id=qF7FlUT5dxa\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"http://allenai.github.io/csqa2\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003edata construction via gamification\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eCosmosQA\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e35k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003emulti-choice QA\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"http://aclanthology.lst.uni-saarland.de/D19-1243.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://wilburone.github.io/cosmos\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003ereading comprehension on blogs\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eMoral Stories\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e12k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003emulti-choice QA\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://aclanthology.org/2021.emnlp-main.54.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://github.com/demelin/moral_stories\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd 
align=\"center\"\u003esituated reasoning with social norms\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\n\n\n\n\n\n\n\n\u003ch4 id=\"2.4.2\"\u003eRelated Works\u003c/h4\u003e\n\n1. **Attention Is (not) All You Need for Commonsense Reasoning** ACL (2019)\n\n   *Tassilo Klein, Moin Nabi* [[pdf](https://aclanthology.org/P19-1477.pdf)]\n\n2. **COMET: Commonsense Transformers for Automatic Knowledge Graph Construction** ACL (2019)\n\n   *Antoine Bosselut, Hannah Rashkin, Maarten Sap, Chaitanya Malaviya, Asli Celikyilmaz, Yejin Choi* [[pdf](https://aclanthology.org/P19-1470.pdf)] [[project](https://github.com/atcbosselut/comet-commonsense)]\n\n3. **Explain Yourself! Leveraging Language Models for Commonsense Reasoning** ACL (2019)\n\n   *Nazneen Fatema Rajani, Bryan McCann, Caiming Xiong, Richard Socher* [[pdf](https://aclanthology.org/P19-1487.pdf)] [[project](https://github.com/salesforce/cos-e)]\n\n4. **Commonsense Knowledge Mining from Pretrained Models** EMNLP (2019)\n\n   *Joe Davison, Joshua Feldman, Alexander M. Rush* [[pdf](https://aclanthology.org/D19-1109.pdf)]\n\n5. **How Reasonable are Common-Sense Reasoning Tasks: A Case-Study on the Winograd Schema Challenge and SWAG** EMNLP (2019)\n\n   *Paul Trichelair, Ali Emami, Adam Trischler, Kaheer Suleman, Jackie Chi Kit Cheung* [[pdf](https://aclanthology.org/D19-1335.pdf)] [[project](https://github.com/ptrichel/How-Reasonable-are-Common-Sense-Reasoning-Tasks)]\n\n6. **Guided Generation of Cause and Effect** IJCAI (2020)\n\n   *Zhongyang Li, Xiao Ding, Ting Liu, J. Edward Hu, Benjamin Van Durme* [[pdf](https://www.ijcai.org/proceedings/2020/0502.pdf)] [[project](http://nlp.jhu.edu/causalbank)]\n\n7. **Contrastive Self-Supervised Learning for Commonsense Reasoning** ACL (2020)\n\n   *Tassilo Klein, Moin Nabi* [[pdf](https://aclanthology.org/2020.acl-main.671.pdf)] [[project](https://github.com/SAP-samples/acl2020-commonsense/)]\n\n8. 
**Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning** ACL (2020)\n\n   *Alexandre Tamborrino, Nicola Pellicanò, Baptiste Pannier, Pascal Voitot, Louise Naudin* [[pdf](https://aclanthology.org/2020.acl-main.357.pdf)]\n\n9. **Evidence-Aware Inferential Text Generation with Vector Quantised Variational AutoEncoder** ACL (2020)\n\n      *Daya Guo, Duyu Tang, Nan Duan, Jian Yin, Daxin Jiang, Ming Zhou* [[pdf](https://aclanthology.org/2020.acl-main.544.pdf)] [[project](https://github.com/microsoft/EA-VQ-VAE)]\n\n10. **Scalable Multi-Hop Relational Reasoning for Knowledge-Aware Question Answering** EMNLP (2020)\n\n      *Yanlin Feng, Xinyue Chen, Bill Yuchen Lin, Peifeng Wang, Jun Yan, Xiang Ren* [[pdf](https://aclanthology.org/2020.emnlp-main.99.pdf)] [[project](https://github.com/INK-USC/MHGRN)]\n\n11. **Back to the Future: Unsupervised Backprop-based Decoding for Counterfactual and Abductive Commonsense Reasoning** EMNLP (2020)\n\n      *Lianhui Qin, Vered Shwartz, Peter West, Chandra Bhagavatula, Jena D. Hwang, Ronan Le Bras, Antoine Bosselut, Yejin Choi* [[pdf](https://aclanthology.org/2020.emnlp-main.58.pdf)] [[project](https://github.com/qkaren/unsup_gen_for_cms_reasoning)]\n\n12. **Self-Supervised Knowledge Triplet Learning for Zero-Shot Question Answering** EMNLP (2020)\n\n      *Pratyay Banerjee, Chitta Baral* [[pdf](https://aclanthology.org/2020.emnlp-main.11.pdf)]\n\n13. **Unsupervised Commonsense Question Answering with Self-Talk** EMNLP (2020)\n\n      *Vered Shwartz, Peter West, Ronan Le Bras, Chandra Bhagavatula, Yejin Choi* [[pdf](https://aclanthology.org/2020.emnlp-main.373.pdf)]\n\n14. **Paragraph-level Commonsense Transformers with Recurrent Memory** AAAI (2021)\n\n      *Saadia Gabriel, Chandra Bhagavatula, Vered Shwartz, Ronan Le Bras, Maxwell Forbes, Yejin Choi* [[pdf](https://arxiv.org/pdf/2010.01486.pdf)] [[project](https://github.com/skgabriel/paracomet)]\n\n15. 
**Knowledge-driven Data Construction for Zero-shot Evaluation in Commonsense Question Answering** AAAI (2021)\n\n      *Kaixin Ma, Filip Ilievski, Jonathan Francis, Yonatan Bisk, Eric Nyberg, Alessandro Oltramari* [[pdf](https://arxiv.org/pdf/2011.03863.pdf)] [[project](https://github.com/Mayer123/HyKAS-CSKG)]\n\n16. **Reflective Decoding: Beyond Unidirectional Generation with Off-the-Shelf Language Models** ACL (2021)\n\n      *Peter West, Ximing Lu, Ari Holtzman, Chandra Bhagavatula, Jena D. Hwang, Yejin Choi* [[pdf](https://aclanthology.org/2021.acl-long.114.pdf)] [[project](https://homes.cs.washington.edu/~pawest/ReflectiveDecoding.html)]\n\n17. **Doing Good or Doing Right? Exploring the Weakness of Commonsense Causal Reasoning Models** ACL (2021)\n\n      *Mingyue Han, Yinglin Wang* [[pdf](https://aclanthology.org/2021.acl-short.20.pdf)]\n\n18. **Learning Event Graph Knowledge for Abductive Reasoning** ACL (2021)\n\n      *Li Du, Xiao Ding, Ting Liu, Bing Qin* [[pdf](https://aclanthology.org/2021.acl-long.403.pdf)] [[project](https://github.com/sjcfr/ege-RoBERTa)]\n\n19. **ExCAR: Event Graph Knowledge Enhanced Explainable Causal Reasoning** ACL (2021)\n\n      *Li Du, Xiao Ding, Kai Xiong, Ting Liu, Bing Qin* [[pdf](https://aclanthology.org/2021.acl-long.183.pdf)] [[project](https://github.com/sjcfr/ExCAR)]\n\n20. **Differentiable Open-Ended Commonsense Reasoning** NAACL (2021)\n\n      *Bill Yuchen Lin, Haitian Sun, Bhuwan Dhingra, Manzil Zaheer, Xiang Ren, William W. Cohen* [[pdf](https://aclanthology.org/2021.naacl-main.366.pdf)] [[project](https://open-csr.github.io/)]\n\n21. **QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering** NAACL (2021)\n\n      *Michihiro Yasunaga, Hongyu Ren, Antoine Bosselut, Percy Liang, Jure Leskovec* [[pdf](https://aclanthology.org/2021.naacl-main.45.pdf)] [[project](https://github.com/michiyasunaga/qagnn)]\n\n22. 
**Towards Zero-shot Commonsense Reasoning with Self-supervised Refinement of Language Models** EMNLP (2021)\n\n      *Tassilo Klein, Moin Nabi* [[pdf](https://aclanthology.org/2021.emnlp-main.688.pdf)] [[project](https://github.com/SAP-samples/emnlp2021-contrastive-refinement)]\n\n23. **Exploring Strategies for Generalizable Commonsense Reasoning with Pre-trained Models** EMNLP (2021)\n\n      *Kaixin Ma, Filip Ilievski, Jonathan Francis, Satoru Ozaki, Eric Nyberg, Alessandro Oltramari* [[pdf](https://aclanthology.org/2021.emnlp-main.445.pdf)] [[project](https://github.com/Mayer123/CS_Model_Adaptation)]\n\n24. **Shortcutted Commonsense: Data Spuriousness in Deep Learning of Commonsense Reasoning** EMNLP (2021)\n\n      *Ruben Branco, António Branco, João António Rodrigues, João Ricardo Silva* [[pdf](https://aclanthology.org/2021.emnlp-main.113.pdf)] [[project](https://github.com/nlx-group/Shortcutted-Commonsense-Reasoning)]\n\n25. **Improving Unsupervised Commonsense Reasoning Using Knowledge-Enabled Natural Language Inference** EMNLP findings (2021)\n\n      *Canming Huang, Weinan He, Yongmei Liu* [[pdf](https://aclanthology.org/2021.findings-emnlp.420.pdf)] [[project](https://github.com/sysuhcm/NLI-KB)]\n\n\n26. **SalKG: Learning From Knowledge Graph Explanations for Commonsense Reasoning** NIPS (2021)\n\n      *Aaron Chan, Jiashu Xu, Boyuan Long, Soumya Sanyal, Tanishq Gupta, Xiang Ren* [[pdf](https://proceedings.neurips.cc/paper/2021/file/9752d873fa71c19dc602bf2a0696f9b5-Paper.pdf)] [[project](https://github.com/INK-USC/SalKG)]\n\n27. **GreaseLM: Graph REASoning Enhanced Language Models** ICLR Spotlight (2022)\n\n      *Xikun Zhang, Antoine Bosselut, Michihiro Yasunaga, Hongyu Ren, Percy Liang, Christopher D. Manning, Jure Leskovec* [[pdf](https://openreview.net/pdf?id=41e9o6cQPj)] [[project](https://github.com/snap-stanford/GreaseLM)]\n\n28. 
**Generated Knowledge Prompting for Commonsense Reasoning** ACL (2022)\n\n      *Jiacheng Liu, Alisa Liu, Ximing Lu, Sean Welleck, Peter West, Ronan Le Bras, Yejin Choi, Hannaneh Hajishirzi* [[pdf](https://aclanthology.org/2022.acl-long.225.pdf)] [[project](https://github.com/liujch1998/GKP)]\n\n29. **JointLK: Joint Reasoning with Language Models and Knowledge Graphs for Commonsense Question Answering** NAACL (2022)\n\n      *Yueqing Sun, Qi Shi, Le Qi, Yu Zhang* [[pdf](https://aclanthology.org/2022.naacl-main.372.pdf)] [[project](https://github.com/Yueqing-Sun/JointLK)]\n\n30. **Modularized Transfer Learning with Multiple Knowledge Graphs for Zero-shot Commonsense Reasoning** NAACL (2022)\n\n      *Yu Jin Kim, Beong-woo Kwak, Youngwook Kim, Reinald Kim Amplayo, Seung-won Hwang, Jinyoung Yeo* [[pdf](https://aclanthology.org/2022.naacl-main.163.pdf)]\n\n31. **On Curriculum Learning for Commonsense Reasoning** NAACL (2022)\n\n      *Adyasha Maharana, Mohit Bansal* [[pdf](https://aclanthology.org/2022.naacl-main.72.pdf)] [[project](https://github.com/adymaharana/curriculum_learning)]\n\n32. **Embarrassingly Simple Performance Prediction for Abductive Natural Language Inference** NAACL (2022)\n\n      *Emils Kadikis, Vaibhav Srivastav, Roman Klinger* [[pdf](https://aclanthology.org/2022.naacl-main.441.pdf)] [[project](https://github.com/Vaibhavs10/anli-performance-prediction)]\n\n33. **Does Pre-training Induce Systematic Inference? How Masked Language Models Acquire Commonsense Knowledge** NAACL (2022)\n\n      *Ian Porada, Alessandro Sordoni, Jackie Chi Kit Cheung* [[pdf](https://aclanthology.org/2022.naacl-main.337.pdf)]\n\n34. **ROCK: Causal Inference Principles for Reasoning about Commonsense Causality** ICML (2022)\n\n      *Jiayao Zhang, Hongming Zhang, Weijie J. Su, Dan Roth* [[pdf](https://proceedings.mlr.press/v162/zhang22am/zhang22am.pdf)] [[project](https://github.com/zjiayao/ccr_rock)]\n\n35. 
**ALERT: Adapting Language Models to Reasoning Tasks** arXiv (2022)\n\n      *Ping Yu, Tianlu Wang, Olga Golovneva, Badr AlKhamissy, Gargi Ghosh, Mona T. Diab, Asli Celikyilmaz* [[pdf](https://arxiv.org/pdf/2212.08286.pdf)]\n\n36. **Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations** EMNLP (2022)\n\n      *Jaehun Jung, Lianhui Qin, Sean Welleck, Faeze Brahman, Chandra Bhagavatula, Ronan Le Bras, Yejin Choi* [[pdf](https://arxiv.org/pdf/2205.11822.pdf)] [[project](https://github.com/jaehunjung1/)]\n\n37. **Using Commonsense Knowledge to Answer Why-Questions** EMNLP (2022)\n\n      *Yash Kumar Lal, Niket Tandon, Tanvi Aggarwal, Horace Liu, Nathanael Chambers, Raymond J. Mooney, Niranjan Balasubramanian* [[pdf](https://www.cs.utexas.edu/users/ml/papers/lal.emnlp22.pdf)] [[project](https://github.com/StonyBrookNLP/knowwhy)]\n\n\n\n\n\u003ch4 id=\"2.4.3\"\u003eKnowledge Bases\u003c/h4\u003e\n\n\u003ctable\u003e\n  \u003ctr\u003e\n      \u003cth align=\"center\"\u003eKB\u003c/th\u003e\n      \u003cth align=\"center\"\u003eType of Knowledge\u003c/th\u003e\n      \u003cth align=\"center\"\u003eFormat of Knowledge\u003c/th\u003e\n      \u003cth align=\"center\"\u003eLink\u003c/th\u003e\n  \u003c/tr \u003e\n\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eCYC\u003c/td\u003e\n      \u003ctd align=\"center\"\u003egeneric\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eLISP-style logic\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://dl.acm.org/doi/pdf/10.1145/219717.219745\"\u003epaper\u003c/a\u003e \u003cbr /\u003e project \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eConceptNet\u003c/td\u003e\n      \u003ctd align=\"center\"\u003elinguistics\u003c/td\u003e\n      \u003ctd align=\"center\"\u003etriple\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca 
href=\"https://agents.media.mit.edu/projects/commonsense/ConceptNet-BTTJ.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"http://www.conceptnet.org/\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eConceptNet 5.5\u003c/td\u003e\n      \u003ctd align=\"center\"\u003elinguistics\u003c/td\u003e\n      \u003ctd align=\"center\"\u003etriple\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://arxiv.org/pdf/1612.03975.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://github.com/commonsense/conceptnet5\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eGenericsKB\u003c/td\u003e\n      \u003ctd align=\"center\"\u003egeneric\u003c/td\u003e\n      \u003ctd align=\"center\"\u003estatement\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://arxiv.org/pdf/2005.00660.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://allenai.org/data/genericskb\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eEvent2Mind\u003c/td\u003e\n      \u003ctd align=\"center\"\u003emental state\u003c/td\u003e\n      \u003ctd align=\"center\"\u003estatement\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://aclanthology.org/P18-1043.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://tinyurl.com/event2mind\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eATOMIC\u003c/td\u003e\n      \u003ctd align=\"center\"\u003esocial causality\u003c/td\u003e\n      \u003ctd align=\"center\"\u003estatement\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://arxiv.org/pdf/1811.00146.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca 
href=\"https://allenai.org/data/atomic\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eATOMIC 2020\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e+physical and eventive causality\u003c/td\u003e\n      \u003ctd align=\"center\"\u003estatement\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://arxiv.org/pdf/2010.05953.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://github.com/allenai/comet-atomic-2020\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eSocial-Chem-101\u003c/td\u003e\n      \u003ctd align=\"center\"\u003erules-of-thumb\u003c/td\u003e\n      \u003ctd align=\"center\"\u003estatement\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://aclanthology.org/2020.emnlp-main.48.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://github.com/mbforbes/social-chemistry-101\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\u003ch3 id=\"2.5\"\u003eComplex Reasoning\u003c/h3\u003e\n\nSome datasets are collected from real-world examinations or tests, or are explicitly designed to challenge LLMs; they may require domain-specific knowledge and multiple types of reasoning skills.\n\n\n\n\u003ch4 id=\"2.5.1\"\u003eDatasets \u0026 Benchmarks\u003c/h4\u003e\n\n\n\u003ctable\u003e\n  \u003ctr\u003e\n      \u003cth colspan=7 align=\"center\"\u003eRealistic Examinations\u003c/th\u003e\n  \u003c/tr \u003e\n  \u003ctr\u003e\n      \u003cth align=\"center\"\u003eDataset\u003c/th\u003e\n      \u003cth align=\"center\"\u003eSize\u003c/th\u003e\n      \u003cth align=\"center\"\u003eDomain\u003c/th\u003e\n      \u003cth align=\"center\"\u003eSource\u003c/th\u003e\n      \u003cth align=\"center\"\u003eTask\u003c/th\u003e\n      \u003cth 
align=\"center\"\u003eLink\u003c/th\u003e\n      \u003cth align=\"center\"\u003eRemark\u003c/th\u003e\n  \u003c/tr \u003e\n  \n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eAR-LSAT\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e2k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003elaw\u003c/td\u003e\n      \u003ctd align=\"center\"\u003elaw school admission test\u003c/td\u003e\n      \u003ctd align=\"center\"\u003emulti-choice QA\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://aclanthology.org/2022.findings-naacl.177.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://github.com/zhongwanjun/AR-LSAT\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003e-\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eHEAD-QA\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e6.7k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003ehealthcare\u003c/td\u003e\n      \u003ctd align=\"center\"\u003especialized healthcare examination\u003c/td\u003e\n      \u003ctd align=\"center\"\u003emulti-choice QA\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://aclanthology.org/P19-1092.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"http://aghie.github.io/head-qa/\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003e-\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eAI2-ARC\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e7.7k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003escience\u003c/td\u003e\n      \u003ctd align=\"center\"\u003egrade-school standardized test\u003c/td\u003e\n      \u003ctd align=\"center\"\u003emulti-choice QA\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://arxiv.org/pdf/1803.05457.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca 
href=\"http://data.allenai.org/arc\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003e-\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eEntailmentBank\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e2k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e-\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e-\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eentailment tree generation\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://aclanthology.org/2021.emnlp-main.585.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://allenai.org/data/entailmentbank\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003ereasoning paths to hypotheses from AI2-ARC\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eReClor\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e6k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003egeneric\u003c/td\u003e\n      \u003ctd align=\"center\"\u003estandardized graduate admission examination\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eRC + multi-choice QA\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://openreview.net/pdf?id=HJgJtT4tvB\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"http://whyu.me/reclor/\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003e-\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eMetaLogic\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e1k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e-\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e-\u003c/td\u003e\n      \u003ctd align=\"center\"\u003elogic metagraph generation\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://arxiv.org/pdf/2210.12487.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca 
href=\"https://github.com/tencent-ailab/MetaLogic\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003ereasoning graphs for passages in ReClor\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eLogiQA\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e8k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003egeneric\u003c/td\u003e\n      \u003ctd align=\"center\"\u003enational civil servants examination of China\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eRC + multi-choice QA\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://www.ijcai.org/proceedings/2020/0501.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://github.com/lgw863/LogiQA-dataset\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003e-\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eConTRoL\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e8k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003egeneric\u003c/td\u003e\n      \u003ctd align=\"center\"\u003ecompetitive selection and recruitment test\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eNLI\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://arxiv.org/pdf/2011.04864.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://github.com/csitfun/ConTRoL-dataset\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003epassage-level\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\n\n\n\u003ctable\u003e\n  \u003ctr\u003e\n      \u003cth colspan=4 align=\"center\"\u003eDiagnostic Benchmarks for LLMs\u003c/th\u003e\n  \u003c/tr \u003e\n  \u003ctr\u003e\n      \u003cth align=\"center\"\u003eBenchmark\u003c/th\u003e\n      \u003cth align=\"center\"\u003eTasks\u003c/th\u003e\n      \u003cth align=\"center\"\u003eLink\u003c/th\u003e\n      \u003cth 
align=\"center\"\u003eRemark\u003c/th\u003e\n  \u003c/tr \u003e\n  \n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eBIG-Bench\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e204\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://arxiv.org/pdf/2206.04615.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://github.com/google/BIG-bench\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003ebelieved to be beyond the capabilities of current PLMs\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eBBH\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e23\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://arxiv.org/pdf/2210.09261.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://github.com/suzgunmirac/BIG-Bench-Hard\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003echallenging BIG-Bench tasks\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eMMLU\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e57\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://arxiv.org/pdf/2009.03300.pdf\"\u003epaper\u003c/a\u003e \u003cbr /\u003e \u003ca href=\"https://github.com/hendrycks/test\"\u003eproject\u003c/a\u003e  \u003c/td\u003e\n      \u003ctd align=\"center\"\u003eacross a diverse set of subjects that humans learn\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n## Knowledge Graph Reasoning\nThe **knowledge graph completion** task aims to fill in missing links in a graph, while **multi-hop reasoning over KG** answers multi-hop queries over incomplete graphs; both require reasoning over knowledge graphs. **Temporal knowledge graph reasoning** aims to predict future links from past quadruples. \n\n### Knowledge Graph Completion\n1. 
**Collaborative Policy Learning for Open Knowledge Graph Reasoning** EMNLP (2019)\n\n   *Cong Fu, Tong Chen, Meng Qu, Woojeong Jin, Xiang Ren* [[pdf](http://aclanthology.lst.uni-saarland.de/D19-1269.pdf)] [[project](https://github.com/shanzhenren/CPL)]\n\n\n2. **DIVINE: A Generative Adversarial Imitation Learning Framework for Knowledge Graph Reasoning** EMNLP (2019)\n\n   *Ruiping Li, Xiang Cheng* [[pdf](https://aclanthology.org/D19-1266.pdf)] [[project](https://github.com/BUPT-Data-Intelligence-Lab/DIVINE)]\n\n3. **Learning Collaborative Agents with Rule Guidance for Knowledge Graph Reasoning** EMNLP (2020)\n\n   *Deren Lei, Gangrong Jiang, Xiaotao Gu, Kexuan Sun, Yuning Mao, Xiang Ren* [[pdf](https://aclanthology.org/2020.emnlp-main.688.pdf)] [[project](https://github.com/derenlei/KG-RuleGuider)]\n\n4. **Incorporating Graph Attention Mechanism into Knowledge Graph Reasoning Based on Deep Reinforcement Learning** EMNLP (2019)\n\n   *Heng Wang, Shuangyin Li, Rong Pan, Mingzhi Mao* [[pdf](https://aclanthology.org/D19-1264.pdf)] [[project](https://aclanthology.org/attachments/D19-1264.Attachment.zip)]\n\n5. **Dynamically Pruned Message Passing Networks for Large-scale Knowledge Graph Reasoning** ICLR Poster (2020)\n\n   *Xiaoran Xu, Wei Feng, Yunsheng Jiang, Xiaohui Xie, Zhiqing Sun, Zhi-Hong Deng* [[pdf](https://openreview.net/pdf?id=rkeuAhVKvB)] [[project](https://github.com/netpaladinx/DPMPN)]\n\n6. **Inductive Relation Prediction by Subgraph Reasoning** ICML (2020)\n\n   *Komal K. Teru, Etienne G. Denis, William L. Hamilton* [[pdf](https://proceedings.mlr.press/v119/teru20a/teru20a.pdf)] [[project](https://github.com/kkteru/grail)]\n\n7. **Adapting Meta Knowledge Graph Information for Multi-Hop Reasoning over Few-Shot Relations** EMNLP (2019)\n\n   *Xin Lv, Yuxian Gu, Xu Han, Lei Hou, Juanzi Li, Zhiyuan Liu* [[pdf](https://aclanthology.org/D19-1334.pdf)] [[project](https://github.com/THU-KEG/MetaKGR)]\n\n8. 
**Dynamic Anticipation and Completion for Multi-Hop Reasoning over Sparse Knowledge Graph** EMNLP (2020)\n\n   *Xin Lv, Xu Han, Lei Hou, Juanzi Li, Zhiyuan Liu, Wei Zhang, Yichi Zhang, Hao Kong, Suhui Wu* [[pdf](https://aclanthology.org/2020.emnlp-main.459.pdf)] [[project](https://github.com/THU-KEG/DacKGR)]\n\n9. **UniKER: A Unified Framework for Combining Embedding and Definite Horn Rule Reasoning for Knowledge Graph Inference** EMNLP (2021)\n\n   *Kewei Cheng, Ziqing Yang, Ming Zhang, Yizhou Sun* [[pdf](https://aclanthology.org/2021.emnlp-main.769.pdf)]\n\n10. **Is Multi-Hop Reasoning Really Explainable? Towards Benchmarking Reasoning Interpretability** EMNLP (2021)\n   \n      *Xin Lv, Yixin Cao, Lei Hou, Juanzi Li, Zhiyuan Liu, Yichi Zhang, Zelin Dai* [[pdf](https://aclanthology.org/2021.emnlp-main.700.pdf)] [[project](https://github.com/THU-KEG/BIMR)]\n\n11. **GMH: A General Multi-hop Reasoning Model for KG Completion** EMNLP (2021)\n\n      *Yao Zhang, Hongru Liang, Adam Jatowt, Wenqiang Lei, Xin Wei, Ning Jiang, Zhenglu Yang* [[pdf](https://aclanthology.org/2021.emnlp-main.276.pdf)]\n\n12. **Neural-Symbolic Commonsense Reasoner with Relation Predictors** ACL (2021)\n   \n      *Farhad Moghimifar, Lizhen Qu, Yue Zhuo, Gholamreza Haffari, Mahsa Baktashmotlagh* [[pdf](https://aclanthology.org/2021.acl-short.100.pdf)] [[project](https://github.com/farhadmfar/commonsense_reasoner)]\n\n\n### Multi-Hop Reasoning over KG\n1. **Query2box: Reasoning over Knowledge Graphs in Vector Space Using Box Embeddings** ICLR Poster (2020)\n\n   *Hongyu Ren, Weihua Hu, Jure Leskovec* [[pdf](https://openreview.net/pdf?id=BJgr4kSFDS)] [[project](http://snap.stanford.edu/query2box)]\n\n2. **Beta Embeddings for Multi-Hop Logical Reasoning in Knowledge Graphs** NIPS (2020)\n\n   *Hongyu Ren, Jure Leskovec* [[pdf](https://papers.nips.cc/paper/2020/file/e43739bba7cdb577e9e3e4e42447f5a5-Paper.pdf)] [[project](http://snap.stanford.edu/betae)]\n\n3. 
**Probabilistic Entity Representation Model for Reasoning over Knowledge Graphs** NIPS (2021)\n\n   *Nurendra Choudhary, Nikhil Rao, Sumeet Katariya, Karthik Subbian, Chandan K. Reddy* [[pdf](https://papers.nips.cc/paper/2021/file/c4d2ce3f3ebb5393a77c33c0cd95dc93-Paper.pdf)] [[project](https://github.com/Akirato/PERM-GaussianKG)]\n\n4. **ConE: Cone Embeddings for Multi-Hop Reasoning over Knowledge Graphs** NIPS (2021)\n\n   *Zhanqiu Zhang, Jie Wang, Jiajun Chen, Shuiwang Ji, Feng Wu* [[pdf](https://papers.nips.cc/paper/2021/file/a0160709701140704575d499c997b6ca-Paper.pdf)] [[project](https://github.com/MIRALab-USTC/QE-ConE)]\n\n5. **Complex Query Answering with Neural Link Predictors** ICLR Oral (2021)\n\n   *Erik Arakelyan, Daniel Daza, Pasquale Minervini, Michael Cochez* [[pdf](https://openreview.net/pdf?id=Mos9F9kDwkz)] [[project](https://github.com/uclnlp/cqd)]\n\n\n### Temporal Knowledge Graph Reasoning\n1. **Explainable Subgraph Reasoning for Forecasting on Temporal Knowledge Graphs** ICLR Poster (2021)\n\n   *Zhen Han, Peng Chen, Yunpu Ma, Volker Tresp* [[pdf](https://openreview.net/pdf?id=pGIHq1m7PU)] [[project](https://github.com/TemporalKGTeam/xERTE)]\n\n2. **Search from History and Reason for Future: Two-stage Reasoning on Temporal Knowledge Graphs** ACL (2021)\n\n   *Zixuan Li, Xiaolong Jin, Saiping Guan, Wei Li, Jiafeng Guo, Yuanzhuo Wang, Xueqi Cheng* [[pdf](https://aclanthology.org/2021.acl-long.365.pdf)]\n\n3. **Complex Evolutional Pattern Learning for Temporal Knowledge Graph Reasoning** ACL (2022)\n\n   *Zixuan Li, Saiping Guan, Xiaolong Jin, Weihua Peng, Yajuan Lyu, Yong Zhu, Long Bai, Wei Li, Jiafeng Guo, Xueqi Cheng* [[pdf](https://aclanthology.org/2022.acl-short.32.pdf)] [[project](https://github.com/Lee-zix/CEN)]\n\n\n\n### Others\n1. **Quantum Embedding of Knowledge for Reasoning** NIPS (2019)\n\n   *Dinesh Garg, Shajith Ikbal, Santosh K. Srivastava, Harit Vishwakarma, Hima P. Karanam, L. 
Venkata Subramaniam* [[pdf](https://papers.nips.cc/paper/2019/file/cb12d7f933e7d102c52231bf62b8a678-Paper.pdf)] [[project](https://github.com/IBM/e2r)]\n\n2. **Scalable Neural Methods for Reasoning With a Symbolic Knowledge Base** ICLR Poster (2020)\n\n   *William W. Cohen, Haitian Sun, R. Alex Hofer, Matthew Siegler* [[pdf](https://openreview.net/pdf?id=BJlguT4YPr)]\n\n3. **Probabilistic Logic Neural Networks for Reasoning** NIPS (2019)\n\n   *Meng Qu, Jian Tang* [[pdf](https://papers.nips.cc/paper/2019/file/13e5ebb0fa112fe1b31a1067962d74a7-Paper.pdf)]\n\n4. **RNNLogic: Learning Logic Rules for Reasoning on Knowledge Graphs** ICLR Poster (2021)\n\n   *Meng Qu, Junkun Chen, Louis-Pascal A. C. Xhonneux, Yoshua Bengio, Jian Tang* [[pdf](https://openreview.net/pdf?id=tGZu6DlbreV)] [[project](https://github.com/DeepGraphLearning/RNNLogic)]\n\n5. **Efficient Probabilistic Logic Reasoning with Graph Neural Networks** ICLR Poster (2020)\n\n   *Yuyu Zhang, Xinshi Chen, Yuan Yang, Arun Ramamurthy, Bo Li, Yuan Qi, Le Song* [[pdf](https://openreview.net/pdf?id=rJg76kStwH)]\n\n6. **Probabilistic Box Embeddings for Uncertain Knowledge Graph Reasoning** NAACL (2021)\n\n   *Xuelu Chen, Michael Boratko, Muhao Chen, Shib Sankar Dasgupta, Xiang Lorraine Li, Andrew McCallum* [[pdf](https://aclanthology.org/2021.naacl-main.68.pdf)] [[project](https://github.com/stasl0217/beurre)]\n\n7. **Multimodal Analogical Reasoning over Knowledge Graphs** ICLR (2023)\n\n   *Ningyu Zhang, Lei Li, Xiang Chen, Xiaozhuan Liang, Shumin Deng, Huajun Chen* [[pdf](https://openreview.net/pdf?id=NRHajbzg8y0P)] [[project](https://github.com/zjunlp/MKG_Analogy)]\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n## Mathematical Reasoning\n\n### Benchmarks \u0026 Datasets\n1. 
**Analysing Mathematical Reasoning Abilities of Neural Models** ICLR Poster (2019)\n\n   *David Saxton, Edward Grefenstette, Felix Hill, Pushmeet Kohli* [[pdf](https://openreview.net/pdf?id=H1gR5iR5FX)] [[project](https://github.com/deepmind/mathematics_dataset)]\n\n2. **HOList: An Environment for Machine Learning of Higher-Order Theorem Proving** ICML (2019)\n\n   *Kshitij Bansal, Sarah M. Loos, Markus N. Rabe, Christian Szegedy, Stewart Wilcox* [[pdf](http://proceedings.mlr.press/v97/bansal19a/bansal19a.pdf)] [[project](http://deephol.org/t)]\n\n3. **DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs** NAACL (2019)\n\n   *Dheeru Dua, Yizhong Wang, Pradeep Dasigi, Gabriel Stanovsky, Sameer Singh, Matt Gardner* [[pdf](https://aclanthology.org/N19-1246.pdf)] [[project](https://allennlp.org/drop)]\n\n4. **IsarStep: a Benchmark for High-level Mathematical Reasoning** ICLR Poster (2021)\n\n   *Wenda Li, Lei Yu, Yuhuai Wu, Lawrence C. Paulson* [[pdf](https://openreview.net/pdf?id=Pzj6fzU6wkj)] [[project](https://github.com/Wenda302/IsarStep)]\n\n5. **Towards Table-to-Text Generation with Numerical Reasoning** ACL (2021)\n\n   *Lya Hulliyyatus Suadaa, Hidetaka Kamigaito, Kotaro Funakoshi, Manabu Okumura, Hiroya Takamura* [[pdf](https://aclanthology.org/2021.acl-long.115.pdf)] [[project](https://github.com/titech-nlp/numeric-nlg)]\n\n6. **Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning** ACL (2021)\n\n   *Pan Lu, Ran Gong, Shibiao Jiang, Liang Qiu, Siyuan Huang, Xiaodan Liang, Song-Chun Zhu* [[pdf](https://aclanthology.org/2021.acl-long.528.pdf)] [[project](https://lupantech.github.io/inter-gps)]\n\n7. **FINQA: A Dataset of Numerical Reasoning over Financial Data** EMNLP (2021)\n\n   *Zhiyu Chen, Wenhu Chen, Charese Smiley, Sameena Shah, Iana Borova, Dylan Langdon, Reema Moussa, Matt Beane, Ting-Hao Huang, Bryan R. 
Routledge, William Yang Wang* [[pdf](https://aclanthology.org/2021.emnlp-main.300.pdf)] [[project](https://github.com/czyssrs/FinQA)]\n\n8. **SciGen: a Dataset for Reasoning-Aware Text Generation from Scientific Tables** NIPS (2021)\n\n   *Nafise Sadat Moosavi, Andreas Rücklé, Dan Roth, Iryna Gurevych* [[pdf](https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/file/149e9677a5989fd342ae44213df68868-Paper-round2.pdf)] [[project](https://github.com/UKPLab/SciGen)]\n\n9. **MULTIHIERTT: Numerical Reasoning over Multi Hierarchical Tabular and Textual Data** ACL (2022)\n\n   *Yilun Zhao, Yunxiang Li, Chenying Li, Rui Zhang* [[pdf](https://aclanthology.org/2022.acl-long.454.pdf)] [[project](https://github.com/psunlpgroup/MultiHiertt)]\n\n10. **NUMGLUE: A Suite of Fundamental yet Challenging Mathematical Reasoning Tasks** ACL (2022)\n\n      *Swaroop Mishra, Arindam Mitra, Neeraj Varshney, Bhavdeep Singh Sachdeva, Peter Clark, Chitta Baral, Ashwin Kalyan* [[pdf](https://aclanthology.org/2022.acl-long.246.pdf)] [[project](https://allenai.org/data/numglue)]\n\n\n### Papers\n1. **Semantically-Aligned Equation Generation for Solving and Reasoning Math Word Problems** NAACL (2019)\n\n   *Ting-Rui Chiang, Yun-Nung Chen* [[pdf](https://aclanthology.org/N19-1272.pdf)] [[project](https://github.com/MiuLab/E2EMathSolver)]\n\n2. **A Multi-Type Multi-Span Network for Reading Comprehension that Requires Discrete Reasoning** EMNLP (2019)\n\n   *Minghao Hu, Yuxing Peng, Zhen Huang, Dongsheng Li* [[pdf](https://aclanthology.org/D19-1170.pdf)] [[project](https://github.com/huminghao16/MTMSN)]\n\n3. **NumNet: Machine Reading Comprehension with Numerical Reasoning** EMNLP (2019)\n\n   *Qiu Ran, Yankai Lin, Peng Li, Jie Zhou, Zhiyuan Liu* [[pdf](https://aclanthology.org/D19-1251.pdf)] [[project](https://github.com/ranqiu92/NumNet)]\n\n4. **Mathematical Reasoning in Latent Space** ICLR Oral (2020)\n\n   *Dennis Lee, Christian Szegedy, Markus N. Rabe, Sarah M. 
Loos, Kshitij Bansal* [[pdf](https://openreview.net/pdf?id=Ske31kBtPr)]\n\n5. **Neural Module Networks for Reasoning over Text** ICLR Poster (2020)\n\n   *Nitish Gupta, Kevin Lin, Dan Roth, Sameer Singh, Matt Gardner* [[pdf](https://openreview.net/pdf?id=SygWvAVFPr)] [[project](http://cogcomp.org/page/publication_view/899)]\n\n6. **Injecting Numerical Reasoning Skills into Language Models** ACL (2020)\n\n   *Mor Geva, Ankit Gupta, Jonathan Berant* [[pdf](https://aclanthology.org/2020.acl-main.89.pdf)] [[project](https://github.com/ag1988/injecting_numeracy)]\n\n7. **Question Directed Graph Attention Network for Numerical Reasoning over Text** EMNLP (2020)\n\n   *Kunlong Chen, Weidi Xu, Xingyi Cheng, Zou Xiaochuan, Yuyu Zhang, Le Song, Taifeng Wang, Yuan Qi, Wei Chu* [[pdf](https://aclanthology.org/2020.emnlp-main.549.pdf)]\n\n8. **Mathematical Reasoning via Self-supervised Skip-tree Training** ICLR Spotlight (2021)\n\n   *Markus Norman Rabe, Dennis Lee, Kshitij Bansal, Christian Szegedy* [[pdf](https://openreview.net/pdf?id=YmqAnY0CMEy)]\n\n9. **Incorporating External Knowledge to Enhance Tabular Reasoning** NAACL (2021)\n\n   *J. Neeraja, Vivek Gupta, Vivek Srikumar* [[pdf](https://aclanthology.org/2021.naacl-main.224.pdf)] [[project](https://github.com/utahnlp/knowledge_infotabs)]\n\n10. **Measuring and Improving BERT's Mathematical Abilities by Predicting the Order of Reasoning** ACL (2021)\n\n      *Piotr Piekos, Mateusz Malinowski, Henryk Michalewski* [[pdf](https://aclanthology.org/2021.acl-short.49.pdf)]\n\n11. **GraphMR: Graph Neural Network for Mathematical Reasoning** EMNLP (2021)\n\n      *Weijie Feng, Binbin Liu, Dongpeng Xu, Qilong Zheng, Yun Xu* [[pdf](https://aclanthology.org/2021.emnlp-main.273.pdf)] [[project](https://github.com/nhpcc502/GraphMR)]\n\n12. **LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning** ICML (2021)\n\n      *Yuhuai Wu, Markus N. Rabe, Wenda Li, Jimmy Ba, Roger B. 
Grosse, Christian Szegedy* [[pdf](http://proceedings.mlr.press/v139/wu21c/wu21c.pdf)] [[project](https://github.com/tonywu95/LIME)]\n\n13. **Numerical reasoning in machine reading comprehension tasks: are we there yet?** EMNLP (2021)\n\n      *Hadeel Al-Negheimish, Pranava Madhyastha, Alessandra Russo* [[pdf](https://aclanthology.org/2021.emnlp-main.759.pdf)]\n\n14. **Learning to Reason Deductively: Math Word Problem Solving as Complex Relation Extraction** ACL (2022)\n\n      *Zhanming Jie, Jierui Li, Wei Lu* [[pdf](https://aclanthology.org/2022.acl-long.410.pdf)] [[project](https://github.com/allanj/Deductive-MWP)]\n\n15. **FORTAP: Using Formulas for Numerical-Reasoning-Aware Table Pretraining** ACL (2022)\n\n      *Zhoujun Cheng, Haoyu Dong, Ran Jia, Pengfei Wu, Shi Han, Fan Cheng, Dongmei Zhang* [[pdf](https://aclanthology.org/2022.acl-long.82.pdf)] [[project](https://github.com/microsoft/TUTA_table_understanding)]\n\n16. **Right for the Right Reason: Evidence Extraction for Trustworthy Tabular Reasoning** ACL (2022)\n\n      *Vivek Gupta, Shuo Zhang, Alakananda Vempala, Yujie He, Temma Choji, Vivek Srikumar* [[pdf](https://aclanthology.org/2022.acl-long.231.pdf)] [[project](https://tabevidence.github.io/)]\n\n17. **Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning Skills** ACL (2022)\n\n      *Ori Yoran, Alon Talmor, Jonathan Berant* [[pdf](https://aclanthology.org/2022.acl-long.416.pdf)] [[project](https://github.com/oriyor/turning_tables)]\n\n18. 
**OPERA: Operation-Pivoted Discrete Reasoning over Text** NAACL (2022)\n\n      *Yongwei Zhou, Junwei Bao, Chaoqun Duan, Haipeng Sun, Jiahui Liang, Yifan Wang, Jing Zhao, Youzheng Wu, Xiaodong He, Tiejun Zhao* [[pdf](https://aclanthology.org/2022.naacl-main.119.pdf)] [[project](https://github.com/JD-AI-Research-NLP/OPERA)]\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n## Contributor\n\nFei YU\n\n## Reference\n```bibtex\n@article{yu2023natural,\n  title={Natural Language Reasoning, A Survey},\n  author={Yu, Fei and Zhang, Hongbo and Wang, Benyou},\n  journal={arXiv preprint arXiv:2303.14725},\n  year={2023}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FFreedomIntelligence%2FReasoningNLP","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FFreedomIntelligence%2FReasoningNLP","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FFreedomIntelligence%2FReasoningNLP/lists"}