Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
UnifiedQA: Crossing Format Boundaries With a Single QA System
https://github.com/allenai/unifiedqa
- Host: GitHub
- URL: https://github.com/allenai/unifiedqa
- Owner: allenai
- License: apache-2.0
- Created: 2020-04-23T17:41:56.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2022-05-09T20:51:31.000Z (almost 3 years ago)
- Last Synced: 2024-08-04T23:09:30.841Z (6 months ago)
- Language: Python
- Homepage: https://arxiv.org/abs/2005.00700
- Size: 28.2 MB
- Stars: 427
- Watchers: 14
- Forks: 43
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- Awesome-instruction-tuning - UnifiedQA - 340 M | (Datasets and Models / Modified from Traditional NLP)
README
# UnifiedQA
You may want to check out:
- Paper: https://arxiv.org/abs/2005.00700
- Demo: https://unifiedqa.apps.allenai.org/

**Update (Feb '22):** UnifiedQA-v2
- Paper: https://arxiv.org/abs/2202.12359

## Using the models in PyTorch/HuggingFace
You can load the models directly with [Transformers](https://github.com/huggingface/transformers/) >= 3.1, instead of downloading them manually.
The models are listed on [this page](https://huggingface.co/allenai). Here is a list of the model names hosted on the HuggingFace model hub:

| Model Name | Huggingface ID(s) |
|---------------------------|-----------------------------------------|
| UnifiedQA (T5) - small | `allenai/unifiedqa-t5-small` |
| UnifiedQA (T5) - base | `allenai/unifiedqa-t5-base` |
| UnifiedQA (T5) - large | `allenai/unifiedqa-t5-large` |
| UnifiedQA (T5) - 3B | `allenai/unifiedqa-t5-3b` |
| UnifiedQA (T5) - 11B | `allenai/unifiedqa-t5-11b` |
| UnifiedQA-v2 (T5) - small | `allenai/unifiedqa-v2-t5-small-[ckpt]` |
| UnifiedQA-v2 (T5) - base | `allenai/unifiedqa-v2-t5-base-[ckpt]` |
| UnifiedQA-v2 (T5) - large | `allenai/unifiedqa-v2-t5-large-[ckpt]` |
| UnifiedQA-v2 (T5) - 3B | `allenai/unifiedqa-v2-t5-3b-[ckpt]` |
| UnifiedQA-v2 (T5) - 11B   | `allenai/unifiedqa-v2-t5-11b-[ckpt]`   |

Where `[ckpt]` can be either `1251000` or `1363200`. The numbers in [the paper](https://arxiv.org/abs/2202.12359) are reported based on the `1251000` checkpoints.
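For example, substituting a checkpoint step into the v2 IDs above (this is just an illustration of the naming pattern, using the base model):

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Fill the [ckpt] slot of the v2 IDs listed in the table above.
ckpt = "1251000"  # or "1363200"
model_name = f"allenai/unifiedqa-v2-t5-base-{ckpt}"

tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)
```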
Here is a full example:
```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

model_name = "allenai/unifiedqa-t5-small"  # you can specify the model size here
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)


def run_model(input_string, **generator_args):
    input_ids = tokenizer.encode(input_string, return_tensors="pt")
    res = model.generate(input_ids, **generator_args)
    return tokenizer.batch_decode(res, skip_special_tokens=True)
```

For instance, here is how you can use it to answer a multiple-choice question:
```python
run_model("which is best conductor? \\n (a) iron (b) feather")
```
which gives: `['iron']`

```python
run_model("scott filled a tray with juice and put it in a freezer. the next day, scott opened the freezer. how did the juice most likely change? \\n (a) it condensed. (b) it evaporated. (c) it became a gas. (d) it became a solid.")
```
which produces: `['it condensed.']`.

Note that you can also pass in the arguments for text generation to the `run_model(.)` function:
```python
run_model("which is best conductor? \\n (a) iron (b) feather (c) wood (d) plastic",
temperature=0.9, num_return_sequences=4, num_beams=20)
```

## Feeding data into UnifiedQA
Datasets should be converted into a text-in/text-out format.

- The question always comes first.
- We use `\n` separators between different parts of the input. This keeps the encoding human-readable without making it overly specific to a certain format. Note that this separator isn't the newline character (even though it looks like one); it is a literal backslash followed by `n`.
- Make sure the whole input is correctly [pre-processed](https://github.com/allenai/unifiedqa/blob/7bf0653c6fb68a51019924fd4c51615155acbebe/tasks.py#L54-L58) (e.g., lower-cased); a small encoding sketch follows the examples below.

Here are several examples:
| **Dataset** | **SQuAD 1.1 (extractive QA)** |
| :---: | :--- |
| **Encoded Input** | `At what speed did the turbine operate? \n (Nikola_Tesla) On his 50th birthday in 1906, Tesla demonstrated his 200 horsepower (150 kilowatts) 16,000 rpm bladeless turbine. ...` |
| **Encoded Output** | `16,000 rpm` |
| **Dataset** | **NarrativeQA (Abstractive QA)** |
| :---: | :--- |
| **Encoded Input** | `What does a drink from narcissus's spring cause the drinker to do? \n Mercury has awakened Echo, who weeps for Narcissus, and states that a drink from Narcissus's spring causes the drinkers to ''Grow dotingly enamored of themselves.'' ...` |
| **Encoded Output** | `fall in love with themselves` |
| **Dataset** | **ARC-challenge (Multiple-choice QA)** |
| :---: | :--- |
| **Encoded Input** | `What does photosynthesis produce that helps plants grow? \n (A) water (B) oxygen (C) protein (D) sugar` |
| **Encoded Output** | `sugar` |
| **Dataset** | **MCTest (Multiple-choice QA)** |
| :---: | :--- |
| **Encoded Input** | `Who was Billy? \n (A) The skinny kid (B) A teacher (C) A little kid (D) The big kid \n Billy was like a king on the school yard. A king without a queen. He was the biggest kid in our grade, so he made all the rules during recess. ...` |
| **Encoded Output** | `The big kid` |
| **Dataset** | **BoolQ (Yes-no QA)** |
| :---: | :--- |
| **Encoded Input** | `Was America the first country to have a president? \n (President) The first usage of the word president to denote the highest official in a government was during the Commonwealth of England ...` |
| **Encoded Output** | `no` |

If you want to see how this encoding is done on our datasets, check out this [script](encode_datasets.py).
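As a rough illustration (a simplified sketch, not the repository's `encode_datasets.py`), a multiple-choice example could be encoded like this, assuming the lower-casing and literal `\n` separators described above:

```python
def encode_multiple_choice(question, choices, context=None):
    """Build a UnifiedQA-style input: question, then lettered choices,
    then (optionally) the context, joined by literal ' \\n ' separators."""
    letters = "abcdefgh"
    choice_str = " ".join(f"({letters[i]}) {c}" for i, c in enumerate(choices))
    parts = [question, choice_str]
    if context is not None:
        parts.append(context)
    # lower-case the whole input, mirroring the pre-processing linked above
    return " \\n ".join(parts).lower()


print(encode_multiple_choice(
    "which is best conductor?", ["iron", "feather", "wood", "plastic"]))
# which is best conductor? \n (a) iron (b) feather (c) wood (d) plastic
```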
### The datasets/tasks used in the experiments
While the datasets we used are all public, it could be a bit time-consuming to convert them all into text-to-text format. We're releasing the already-processed text-to-text datasets based on the encoding used in this work. Files are included in [this Google Cloud bucket](https://console.cloud.google.com/storage/browser/unifiedqa/data). [Here](encode_datasets.py) is the script we used to convert each dataset into text-in/text-out format.
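If you prefer to fetch the files programmatically, here is a minimal sketch using the `google-cloud-storage` package with an anonymous client; the bucket name comes from the link above, but the exact file layout under `data/` is not reproduced here, so adjust the prefix and paths to the dataset you need:

```python
from google.cloud import storage

# An anonymous client is enough for a publicly readable bucket.
client = storage.Client.create_anonymous_client()

# List the released text-to-text files; narrow the prefix to a specific dataset.
for blob in client.list_blobs("unifiedqa", prefix="data/"):
    print(blob.name)

# Download a single file once you know its path (the path below is illustrative only).
# bucket = client.bucket("unifiedqa")
# bucket.blob("data/<dataset>/<split>.tsv").download_to_filename("train.tsv")
```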
## Prediction files

Reach out to DanielK if you want them! :)

## Released Model Checkpoints
If you intend to create a QA system, you can use our QA-specialized models for your purpose:
### T5 models
- UnifiedQA (T5, small) [gs://unifiedqa/models/small](https://console.cloud.google.com/storage/browser/unifiedqa/models/small)
- UnifiedQA (T5, base) [gs://unifiedqa/models/base](https://console.cloud.google.com/storage/browser/unifiedqa/models/base)
- UnifiedQA (T5, large) [gs://unifiedqa/models/large](https://console.cloud.google.com/storage/browser/unifiedqa/models/large)
- UnifiedQA (T5, 3B) [gs://unifiedqa/models/3B](https://console.cloud.google.com/storage/browser/unifiedqa/models/3B)
- UnifiedQA (T5, 11B) [gs://unifiedqa/models/11B](https://console.cloud.google.com/storage/browser/unifiedqa/models/11B)

Note: In the experiments reported in our paper we always used the checkpoint closest to 100k steps (it usually corresponds to checkpoint 1100500).
You can use these in two ways:
- If you don't have any training data, you can use them for [the evaluation](https://github.com/google-research/text-to-text-transfer-transformer#eval).
- If you have training data, you can use them as your initial models and [fine-tune on them](https://github.com/google-research/text-to-text-transfer-transformer#fine-tuning); a rough alternative fine-tuning sketch is shown after this list.

For more details see [the T5 repository](https://github.com/google-research/text-to-text-transfer-transformer).
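If you would rather stay in PyTorch/HuggingFace instead of the T5 codebase, the sketch below shows one way to fine-tune a UnifiedQA checkpoint on text-in/text-out pairs. This is only an illustration, not the setup used in the paper: the file name, hyperparameters, and the assumption that your converted data is stored as tab-separated `input<TAB>output` lines are all placeholders.

```python
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = "allenai/unifiedqa-t5-small"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)


class TextToTextDataset(Dataset):
    """Reads tab-separated (input, output) pairs, one example per line (assumed format)."""

    def __init__(self, path):
        with open(path, encoding="utf-8") as f:
            self.pairs = [line.rstrip("\n").split("\t") for line in f if line.strip()]

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, idx):
        return self.pairs[idx]


def collate(batch):
    inputs, targets = zip(*batch)
    enc = tokenizer(list(inputs), padding=True, truncation=True, return_tensors="pt")
    labels = tokenizer(list(targets), padding=True, truncation=True, return_tensors="pt").input_ids
    labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in the loss
    enc["labels"] = labels
    return enc


loader = DataLoader(TextToTextDataset("train.tsv"),  # placeholder file name
                    batch_size=8, shuffle=True, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

model.train()
for epoch in range(3):
    for batch in loader:
        loss = model(**batch).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```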
### BART models
The BART models are downloaded from [this link](https://nlp.cs.washington.edu/ambigqa/models/unifiedQA/unifiedQA-bart.zip) (3.6G).
For detailed instructions on running the code (training/finetuning/testing), please refer to [here](https://github.com/allenai/unifiedqa/tree/master/bart).
The `uncased` models usually gave us better and more robust results.

### v2 T5 models
- UnifiedQA v2 (T5, small) [gs://unifiedqa/models_v2/small](https://console.cloud.google.com/storage/browser/unifiedqa/models_v2/small)
- UnifiedQA v2 (T5, base) [gs://unifiedqa/models_v2/base](https://console.cloud.google.com/storage/browser/unifiedqa/models_v2/base)
- UnifiedQA v2 (T5, large) [gs://unifiedqa/models_v2/large](https://console.cloud.google.com/storage/browser/unifiedqa/models_v2/large)
- UnifiedQA v2 (T5, 3B) [gs://unifiedqa/models_v2/3B](https://console.cloud.google.com/storage/browser/unifiedqa/models_v2/3B)
- UnifiedQA v2 (T5, 11B) [gs://unifiedqa/models_v2/11B](https://console.cloud.google.com/storage/browser/unifiedqa/models_v2/11B)

Note: In the experiments reported in our paper we always used the checkpoint closest to 250k steps.
## FAQ
**I am not getting the expected results.** A common issue with UnifiedQA is forgetting to use the separator (`\n`) when encoding your inputs. See [the earlier section](#feeding-data-into-unifiedqa) where we delineate how to encode the inputs; a small sanity check is sketched below.
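For example (a minimal sketch, reusing the `run_model` helper defined earlier), make sure the separator is the two literal characters backslash and `n`, not an actual newline:

```python
# Correct: the separator is the characters '\' and 'n' (written "\\n" in Python source).
run_model("which is best conductor? \\n (a) iron (b) feather")

# Incorrect: "\n" in Python source is a real newline, not the separator UnifiedQA expects.
run_model("which is best conductor? \n (a) iron (b) feather")
```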
**Help! I am getting the following error!** See [this discussion](https://github.com/google-research/text-to-text-transfer-transformer/issues/180) if you're getting the following error:

```bash
ValueError: Configurable 'make_layer_stack' doesn't have a parameter named 'use_universal_transformer'.
In file "gs://danielk-files/t5-models/union_mixture/11B/operative_config.gin", line 83
```

## How to cite
If you extend or use this work, please cite the relevant papers:
```bibtex
@inproceedings{2020unifiedqa,
    title={UnifiedQA: Crossing Format Boundaries With a Single QA System},
    author={D. Khashabi and S. Min and T. Khot and A. Sabharwal and O. Tafjord and P. Clark and H. Hajishirzi},
    booktitle={EMNLP - findings},
    year={2020}
}

@article{khashabi2022unifiedqa,
    title={UnifiedQA-v2: Stronger Generalization via Broader Cross-Format Training},
    author={Khashabi, Daniel and Kordi, Yeganeh and Hajishirzi, Hannaneh},
    journal={arXiv preprint arXiv:2202.12359},
    year={2022}
}
```