https://github.com/allenai/unifiedqa

UnifiedQA: Crossing Format Boundaries With a Single QA System
https://github.com/allenai/unifiedqa
Last synced: 2 months ago
JSON representation
UnifiedQA: Crossing Format Boundaries With a Single QA System
Host: GitHub
URL: https://github.com/allenai/unifiedqa
Owner: allenai
License: apache-2.0
Created: 2020-04-23T17:41:56.000Z (about 5 years ago)
Default Branch: master
Last Pushed: 2022-05-09T20:51:31.000Z (about 3 years ago)
Last Synced: 2024-11-18T09:33:37.066Z (8 months ago)
Language: Python
Homepage: https://arxiv.org/abs/2005.00700
Size: 28.2 MB
Stars: 428
Watchers: 14
Forks: 43
Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project

Awesome-instruction-tuning - UnifiedQA - 340 M | (Datasets and Models / Modified from Traditional NLP)
README

        # UnifiedQA

You may want to check out:

 - Paper: https://arxiv.org/abs/2005.00700

 - Demo: https://unifiedqa.apps.allenai.org/

Update (Feb '22): UnifiedQA-v2 

 - Paper: https://arxiv.org/abs/2202.12359

## Using the models in PyTorch/HuggingFace

You can very easily load the models with [Transformers](https://github.com/huggingface/transformers/) >=3.1, instead of downloading them manually. 

The models are listed on [this page](https://huggingface.co/allenai). Here is a list of model these model names hosted on HuggingFace model hub: 

| Model Name                | Huggingface ID (s)                      |

|---------------------------|-----------------------------------------|

| UnifiedQA (T5) - small    | `allenai/unifiedqa-t5-small`            |

| UnifiedQA (T5) - base     | `allenai/unifiedqa-t5-base`             |

| UnifiedQA (T5) - large    | `allenai/unifiedqa-t5-large`            |

| UnifiedQA (T5) - 3B       | `allenai/unifiedqa-t5-3b`               |

| UnifiedQA (T5) - 11B      | `allenai/unifiedqa-t5-11b`              |

| UnifiedQA-v2 (T5) - small | `allenai/unifiedqa-v2-t5-small-[ckpt]`  |

| UnifiedQA-v2 (T5) - base  | `allenai/unifiedqa-v2-t5-base-[ckpt]`   |

| UnifiedQA-v2 (T5) - large | `allenai/unifiedqa-v2-t5-large-[ckpt]`  |

| UnifiedQA-v2 (T5) - 3B    | `allenai/unifiedqa-v2-t5-3b-[ckpt]`     |

| UnifiedQA-v2 (T5) - 11B   | `allenai/unifiedqa-v2-t5-11b-[ckpt]`    |

Where `[ckpt]` can be either `1251000` or `1363200`.  The numbers in [the paper](https://arxiv.org/abs/2202.12359) are reported based on `1251000` checkpoints. 

Here is an examples: 

```python

from transformers import T5Tokenizer, T5ForConditionalGeneration

model_name = "allenai/unifiedqa-t5-small" # you can specify the model size here

tokenizer = T5Tokenizer.from_pretrained(model_name)

model = T5ForConditionalGeneration.from_pretrained(model_name)

def run_model(input_string, **generator_args):

    input_ids = tokenizer.encode(input_string, return_tensors="pt")

    res = model.generate(input_ids, **generator_args)

    return tokenizer.batch_decode(res, skip_special_tokens=True)

```

For instance, here is how you can use it to answer a multiple-choice question: 

```python

run_model("which is best conductor? \\n (a) iron (b) feather")

```

which gives: `['iron']`

```python

run_model("scott filled a tray with juice and put it in a freezer. the next day, scott opened the freezer. how did the juice most likely change? \\n (a) it condensed. (b) it evaporated. (c) it became a gas. (d) it became a solid.")

```

which produces: `['it condensed.']`.

Note that you can also pass in the arguments for text generation to the `run_model(.)` function:

```python

run_model("which is best conductor? \\n (a) iron (b) feather (c) wood (d) plastic",

         temperature=0.9, num_return_sequences=4, num_beams=20)

```

## Feeding data into UnifiedQA

Datasets should be converted into a textin/text-out format.

 - Question always comes first.

 - We use `\n` separators between different parts of the input. This ensures having a humanlike encoding while not making it overly-specific to a certain format.  Note that this separator isn't the newline character (which it looks suspiciously like), but rather backslash-n.

 - Make sure the whole input is correctly [pre-processed](https://github.com/allenai/unifiedqa/blob/7bf0653c6fb68a51019924fd4c51615155acbebe/tasks.py#L54-L58) (e.g., lower-cased)

Here are several examples:

|  **Dataset** | **SQuAD 1.1 (extractive QA)** |

| :---: | :--- |

|  **Encoded Input** | `At what speed did the turbine operate? \n (Nikola_Tesla) On his 50th birthday in 1906, Tesla demonstrated his 200 horsepower (150 kilowatts) 16,000 rpm bladeless turbine. ...` |

|  **Encoded Output** | `16,000 rpm` |

|  **Dataset** | **NarrativeQA (Abstractive QA)** |

|  **Encoded Input** | `What does a drink from narcissus's spring cause the drinker to do?  \n  Mercury has awakened Echo, who weeps for Narcissus, and states that a drink from Narcissus's spring causes the drinkers to ''Grow dotingly enamored of themselves.'' ...` |

|  **Encoded Output** | `fall in love with themselves` |

|  **Dataset** | **ARC-challenge (Multiple-choice QA)** |

|  **Encoded Input** | `What does photosynthesis produce that helps plants grow? \n (A) water (B) oxygen (C) protein (D) sugar` |

|  **Encoded Output** | `sugar` |

|  **Dataset** | **MCTest (Multiple-choice QA)** |

|  **Encoded Input** | `Who was Billy? \n (A) The skinny kid (B) A teacher (C) A little kid (D) The big kid \n Billy was like a king on the school yard. A king without a queen. He was the biggest kid in our grade, so he made all the rules during recess. ...` |

|  **Encoded Output** | `The big kid` |

|  **Dataset** | **BoolQ (Yes-no QA)** |

|  **Encoded Input** | `Was America the first country to have a president?  \n (President) The first usage of the word president to denote the highest official in a government was during the Commonwealth of England ...` |

|  **Encoded Output** | `no` |

If you wanna see how this encoding is done on our datasets, check out this [script](encode_datasets.py).

### The datasets/tasks used in the experiments

While the datasets we used are all public, it could be a bit time-confusing to convert them all into text-to-text format. We're releasing the already-proccessed text-to-text datasets based on the encoding used in this work. Files are included in [this Google Cloud bucket](https://console.cloud.google.com/storage/browser/unifiedqa/data). [Here](encode_datasets.py) is the script we used in order to convert each dataset into text-in-text-out format.

## Prediction files

Reach out to DanielK if you want them! :)

## Released Model Checkpoints

If you intend to create a QA system, you can use our QA-specialized models for your purpose:

### T5 models

 - UnifiedQA (T5, small) [gs://unifiedqa/models/small](https://console.cloud.google.com/storage/browser/unifiedqa/models/small)

 - UnifiedQA (T5, base) [gs://unifiedqa/models/base](https://console.cloud.google.com/storage/browser/unifiedqa/models/base)

 - UnifiedQA (T5, large) [gs://unifiedqa/models/large](https://console.cloud.google.com/storage/browser/unifiedqa/models/large)

 - UnifiedQA (T5, 3B) [gs://unifiedqa/models/3B](https://console.cloud.google.com/storage/browser/unifiedqa/models/3B)

 - UnifiedQA (T5, 11B) [gs://unifiedqa/models/11B](https://console.cloud.google.com/storage/browser/unifiedqa/models/11B)

Note: In the experiments reported in our paper we always used the checkpoint closest to 100k steps (it usually corresponds to checkpoint 1100500)

You can use these in two ways:

- If you don't have any training data, you can use them for [the evaluation](https://github.com/google-research/text-to-text-transfer-transformer#eval).

- If you training data, you can use them as your initial models and [fine-tune on them](https://github.com/google-research/text-to-text-transfer-transformer#fine-tuning).

For more details see [the T5 repository](https://github.com/google-research/text-to-text-transfer-transformer).

### BART models

The BART models are downloaded from [this link](https://nlp.cs.washington.edu/ambigqa/models/unifiedQA/unifiedQA-bart.zip) (3.6G).

For detailed instructions on running the code (training/finetuning/testing), please refer to [here](https://github.com/allenai/unifiedqa/tree/master/bart).

The `uncased` models usually gave us better and more robust results.

### v2 T5 models 

 - UnifiedQA v2 (T5, small) [gs://unifiedqa/models_v2/small](https://console.cloud.google.com/storage/browser/unifiedqa/models_v2/small)

 - UnifiedQA v2 (T5, base) [gs://unifiedqa/models_v2/base](https://console.cloud.google.com/storage/browser/unifiedqa/models_v2/base)

 - UnifiedQA v2 (T5, large) [gs://unifiedqa/models_v2/large](https://console.cloud.google.com/storage/browser/unifiedqa/models_v2/large)

 - UnifiedQA v2 (T5, 3B) [gs://unifiedqa/models_v2/3B](https://console.cloud.google.com/storage/browser/unifiedqa/models_v2/3B)

 - UnifiedQA v2 (T5, 11B) [gs://unifiedqa/models_v2/11B](https://console.cloud.google.com/storage/browser/unifiedqa/models_v2/11B)

Note: In the experiments reported in our paper we always used the checkpoint closest to 250k steps. 

## FAQ

**I am not getting the expected results.** An common issue with using UnifiedQA is making sure you use the separator (`\n`) when encoding encoding your inputs. See [the earlier section](#feeding-data-into-unifiedqa) where we delineate how to encode the inputs.

**Help! I am getting the following error!** See [this discussion](https://github.com/google-research/text-to-text-transfer-transformer/issues/180) if you're getting the following error:

```bash

ValueError: Configurable 'make_layer_stack' doesn't have a parameter named 'use_universal_transformer'.

  In file "gs://danielk-files/t5-models/union_mixture/11B/operative_config.gin", line 83

```

## How to cite

If you extend or use this work, please cite the relevant papers:

```bibtex

@inproceedings{2020unifiedqa,

    title={UnifiedQA: Crossing Format Boundaries With a Single QA System},

    author={D. Khashabi and S. Min and T. Khot and A. Sabhwaral and O. Tafjord and P. Clark and H. Hajishirzi},

    journal={EMNLP - findings},

    year={2020}

}

@article{khashabi2022unifiedqa,

    title={UnifiedQA-v2: Stronger Generalization via Broader Cross-Format Training},

    author={Khashabi, Daniel and Kordi, Yeganeh and Hajishirzi, Hannaneh},

    journal={arXiv preprint arXiv:2202.12359},

    year={2022}

}

```
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/allenai/unifiedqa

Awesome Lists containing this project

README