Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/wellecks/naturalproofs

NaturalProofs: Mathematical Theorem Proving in Natural Language (NeurIPS 2021 Datasets & Benchmarks)
https://github.com/wellecks/naturalproofs

Last synced: 12 days ago
JSON representation

NaturalProofs: Mathematical Theorem Proving in Natural Language (NeurIPS 2021 Datasets & Benchmarks)

Awesome Lists containing this project

README

        

## NaturalProofs: Mathematical Theorem Proving in Natural Language

[NaturalProofs: Mathematical Theorem Proving in Natural Language](https://cs.nyu.edu/~welleck/welleck2021naturalproofs.pdf)\
Sean Welleck, Jiacheng Liu, Ronan Le Bras, Hannaneh Hajishirzi, Yejin Choi, Kyunghyun Cho

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.4632538.svg)](https://doi.org/10.5281/zenodo.4632538)

This repo contains:

- The **NaturalProofs Dataset**
- **Tokenized task data** for mathematical reference retrieval and generation.
- **Preprocessing** NaturalProofs and the task data.
- **Training** and **evaluation** for mathematical reference retrieval and generation.
- **Pretrained models** for mathematical reference retrieval and generation.

Please cite our work if you found the resources in this repository useful:
```
@inproceedings{welleck2021naturalproofs,
title={NaturalProofs: Mathematical Theorem Proving in Natural Language},
author={Sean Welleck and Jiacheng Liu and Ronan Le Bras and Hannaneh Hajishirzi and Yejin Choi and Kyunghyun Cho},
booktitle={Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1)},
year={2021},
url={https://openreview.net/forum?id=Jvxa8adr3iY}
}
```

## Quick download
To download and unpack NaturalProofs, use:
```
pip install gdown
python download.py --naturalproofs --savedir /path/to/savedir
```

To download and unpack all files that we describe below, use:
```
python download.py --naturalproofs --tokenized --checkpoint --other --savedir /path/to/savedir
```
This creates the following file structure:
```
{savedir}/data # contains NaturalProofs base data (.json files) and tokenized task data (.pkl files)
{savedir}/ckpt # contains pretrained model checkpoints
{savedir}/other # contains precomputed files for evaluation (ref encodings, etc.)
```

## NaturalProofs Dataset
We provide the NaturalProofs Dataset (JSON per domain):

| NaturalProofs Dataset [[zenodo](https://doi.org/10.5281/zenodo.4632538)]| Domain|
|-|-|
|[naturalproofs_proofwiki.json](https://zenodo.org/record/4902289/files/naturalproofs_proofwiki.json?download=1)|ProofWiki|
|[naturalproofs_stacks.json](https://zenodo.org/record/4902289/files/naturalproofs_stacks.json?download=1)|Stacks|
|[naturalproofs_trench.json](https://zenodo.org/record/4902202/files/naturalproofs_trench.json?download=1)|Real Analysis textbook|
|[naturalproofs_stein.json](https://zenodo.org/record/4902289/files/naturalproofs_stein.py?download=1) (script)|Number Theory textbook|

To download NaturalProofs, use:
```
python download.py --naturalproofs --savedir /path/to/savedir
```

#### Combined ProofWiki+Stacks
The download includes an extra combined ProofWiki+Stacks file made with [notebooks/merge.ipynb](notebooks/merge.ipynb).

#### Preprocessing
To see the steps used to create each domain of NaturalProofs from raw data, see the following notebooks.\
This preprocessing is **not needed** if you are using a preprocessed dataset provided above.
| Domain| |
|-|-|
|ProofWiki|[notebook](notebooks/parse_proofwiki.ipynb)|
|Stacks|[notebook](notebooks/parse_stacks.ipynb)|
|Real Analysis textbook|[notebook](notebooks/parse_textbooks.ipynb)|
|Number Theory textbook|[notebook](notebooks/parse_textbooks.ipynb)|

## Mathematical Reference Retrieval and Generation
To use NaturalProofs for the reference retrieval and generation tasks described in the paper, the first step is tokenization.

### Tokenized dataset
We tokenize the raw NaturalProofs Dataset into two different formats:
- **Pairwise**: `(x, r)`
- `x` theorem (sequence of tokens)
- `r` reference (sequence of tokens)
- This version is used to train and evaluate the **pairwise model**.
- **Sequence**: `(x, [rid_1, ..., rid_Tx])`
- `x` theorem (sequence of tokens)
- `rid_i` reference id
- This version is used to train and evaluate the **autoregressive** and **joint** models.

We provide the following versions used in the paper (`bert-based-cased` tokenizer):
| Type | Domain| Splits|
|-|-|-|
|Pairwise, `bert-base-cased`|Proofwiki | train,valid,test |
||Stacks| train,valid,test |
||Real Analysis (textbook))| test |
||Number Theory (textbook)| test |
|Sequence, `bert-base-cased`|Proofwiki | train,valid,test |
||Stacks| train,valid,test |
||Real Analysis (textbook)| test |
||Number Theory (textbook)| test |

To download and unpack them, use:
```
python download.py --tokenized --savedir /path/to/savedir
```
Or use [google drive link](https://drive.google.com/file/d/1OCIvcCyKTyRJeV7QiHdtQQhPJ6QknMpV/view?usp=sharing).

### Pretrained Models
We provide the following models used in the paper:
| Type | |Domain|
|-|-|-|
|**Pairwise**|`bert-base-cased`|Proofwiki | [link]()|
|**Pairwise**|`bert-base-cased`|Stacks| [link]()|
|**Pairwise**|`bert-base-cased`|Proofwiki+Stacks | [link]()|
|**Joint**|`bert-base-cased`|Proofwiki | [link]()|
|**Joint**|`bert-base-cased`|Stacks| [link]()|
|**Joint**|`bert-base-cased`|Proofwiki+Stacks | [link]()|
|**Autoregressive**|`bert-base-cased`|Proofwiki |
|**Autoregressive**|`bert-base-cased`|Stacks| [link]()|

To download and unpack them, use:
```
python download.py --checkpoint --savedir /path/to/savedir
```
Or use [google drive link](https://drive.google.com/file/d/1uIBeI7fw5vJBhDOl2WL3SbXWmzHgfK3W/view?usp=sharing).

### Creating your own tokenized dataset
This step is **not needed** if you are using a tokenized dataset provided above.\
First, setup the code:
```bash
python setup.py develop
```

To create your own tokenized versions:
- **Pairwise**: `python naturalproofs/tokenize_pairwise.py`
- **Sequence**: `python naturalproofs/encoder_decoder/utils.py`

## Evaluation

We will show you how to run evaluation on the pretrained model checkpoints & associated files.
### Setup
We will assume the file structure given by using the download script.
```bash
python download.py --naturalproofs --tokenized --checkpoint --other --savedir
```

We provide a script which assembles an evaluation command for `(model type, domain, task)` combinations.\
We show example commands below.

```bash
python run_analysis.py \
--train-ds-names {proofwiki stacks}+ \ # one or more training domains to choose a model
--eval-ds-names {proofwiki stacks stein trench}+ \ # one or more evaluation domains
--model {pairwise, joint, autoregressive} \
--generation \ # for generation task (autoregressive or joint models only)
--split {valid, test} \
--gpu \
--codedir /path/to/naturalproofs_code \
--datadir /data \
--ckptdir /ckpt \
--outdir /output
```

To make sure your filepaths line up, please look inside `run_analysis.py` to see how the `--{}dir` arguments are used.

#### Example: pairwise retrieval
```
python run_analysis.py --train-ds-names proofwiki \
--eval-ds-names proofwiki stein trench \
--model pairwise \
--gpu 1 \
--split test
```

#### Example: joint retrieval
```
python run_analysis.py --train-ds-names proofwiki \
--eval-ds-names proofwiki \
--model joint \
--gpu 1 \
--split test
```

#### Example: joint retrieval OOD
For OOD evaluation on `stein` and `trench` textbooks, provide
reference embeddings from the pairwise model.\
These are the `__encs.pt` files from running the pairwise retrieval evaluation (we provide an example in `other/`).
```
python run_analysis.py --model joint \
--train-ds-names proofwiki \
--eval-ds-names stein trench \
--stein-rencs /other/pairwise__train_proofwiki__eval_stein__test__encs.pt \
--trench-rencs /other/pairwise__train_proofwiki__eval_trench__test__encs.pt \
--gpu 1 \
--split test
```

#### Example: joint retrieval proofwiki+stacks model
To align the model's combined output space with the individual dataset used for evaluation, give a `tok2tok.pkl` map (we provide an example in `other/`):
```
python run_analysis.py --model joint \
--train-ds-names both \
--eval-ds-names proofwiki stacks \
--modeltok2datatok /other/tok2tok.pkl \
--gpu 1 \
--split test
```
Note that OOD evaluation (`stein` or `trench`) is not implemented for the combined model.

#### Example: autoregressive retrieval
Without the `--generation` flag, adjusts settings for retrieval evaluation:
```
python run_analysis.py --model autoregressive \
--train-ds-names proofwiki \
--eval-ds-names proofwiki \
--gpu 1 \
--split valid
```

Note that OOD evaluation (`stein` or `trench`) is not implemented for the autoregressive model.

#### Example: autoregressive generation
```
python run_analysis.py --model autoregressive --generation \
--train-ds-names proofwiki \
--eval-ds-names proofwiki \
--gpu 1 \
--split valid
```

## Training

The provided code supports:
- Training a **pairwise** model
- Training an **autoregressive** or **joint** model, initialized with pairwise model components (parameters, reference embeddings)

#### Training a pairwise model

```bash
python naturalproofs/model.py --expr-name pairwise \
--datapath /path/to/_tokenized__bert-base-cased.pkl \
--default-root-dir /path/to/output
```

#### Training a joint model
The joint (and autoregressive) model uses a pairwise checkpoint, and reference encodings for initialization.

- The pairwise checkpoint is saved during pairwise training.
- The reference encodings are saved in a `encs.pt` file during pairwise Evaluation.

```bash
python naturalproofs/encoder_decoder/model.py \
--datapath /path/to/sequence__tokenized__bert-base-cased.pkl \
--default-root-dir /path/to/output
--pretrained-retrieval-checkpoint /path/to/pairwise__.ckpt \
--encs-file /path/to/train___eval___valid__encs.pt \ # obtained from running evaluation on trained pairwise model
--parallel 1 \
--set-mode 1 # discard duplicates
```

Our implementation uses the same encoder-decoder architecture for the autoregressive and joint models,
considering the joint model as a one-step special case (with KL-div loss).
See the Appendix for a discussion on this design decision and technical details.

#### Training an autoregressive model

```bash
python naturalproofs/encoder_decoder/model.py \
--datapath /path/to/sequence__tokenized__bert-base-cased.pkl \
--default-root-dir /path/to/output
--pretrained-retrieval-checkpoint /path/to/pairwise__.ckpt \
--encs-file /path/to/train___eval___valid__encs.pt \ # obtained from running evaluation on trained pairwise model
--parallel 0 \
--set-mode 0 # keep duplicates
```

### Non-neural baselines
TF-IDF example:
```bash
python naturalproofs/baselines.py \
--method tfidf \
--datapath /path/to/_tokenized__bert-base-cased_200.pkl \
--datapath-base /path/to/naturalproofs_.json \
--savedir /path/to/out/

==> /path/to/out/tfidf__eval.pkl
```
Then use `analyze.py` to compute metrics:
```bash
python naturalproofs/analyze.py \
--method tfidf \
--eval-path /path/to/out/tfidf__eval.pkl \
--datapath-base /path/to/naturalproofs_.json

==> /path/to/out/tfidf__analysis.pkl
```