Turkish Question Generation


Official source code for

"Automated question generation & question answering from Turkish texts"


citation


If you use this software in your work, please cite as:

```
@article{akyon2022questgen,
author = {Akyon, Fatih Cagatay and Cavusoglu, Ali Devrim Ekin and Cengiz, Cemil and Altinuc, Sinan Onur and Temizel, Alptekin},
doi = {10.3906/elk-1300-0632.3914},
journal = {Turkish Journal of Electrical Engineering and Computer Sciences},
title = {{Automated question generation and question answering from Turkish texts}},
url = {https://journals.tubitak.gov.tr/elektrik/vol30/iss5/17/},
year = {2022}
}
```

install

```bash
git clone https://github.com/obss/turkish-question-generation.git
cd turkish-question-generation
pip install -r requirements.txt
```

train

- start a training using command line args:

```bash
python run.py --model_name_or_path google/mt5-small --output_dir runs/exp1 --do_train --do_eval --tokenizer_name_or_path mt5_qg_tokenizer --per_device_train_batch_size 4 --gradient_accumulation_steps 2 --learning_rate 1e-4 --seed 42 --save_total_limit 1
```

- download the [json config](configs/default/config.json) file and start a training:

```bash
python run.py config.json
```

- download the [yaml config](configs/default/config.yaml) file and start a training:

```bash
python run.py config.yaml
```

evaluate

- arrange the related params in the config:

```yaml
do_train: false
do_eval: true
eval_dataset_list: ["tquad2-valid", "xquad.tr"]
prepare_data: true
mt5_task_list: ["qa", "qg", "ans_ext"]
mt5_qg_format: "both"
no_cuda: false
```

- start an evaluation:

```bash
python run.py config.yaml
```

neptune

- install neptune:

```bash
pip install neptune-client
```

- download the [config](configs/default/config.yaml) file and arrange the neptune params:

```yaml
run_name: 'exp1'
neptune_project: 'name/project'
neptune_api_token: 'YOUR_API_TOKEN'
```

- start a training:

```bash
python run.py config.yaml
```

wandb

- install wandb:

```bash
pip install wandb
```

- download the [config](configs/default/config.yaml) file and arrange the wandb params:

```yaml
run_name: 'exp1'
wandb_project: 'turque'
```

- start a training:

```bash
python run.py config.yaml
```

finetuned checkpoints

[model_url1]: https://drive.google.com/uc?id=10hHFuavHCofDczGSzsH1xPHgTgAocOl1
[model_url2]: https://huggingface.co/google/mt5-small
[model_url3]: https://huggingface.co/google/mt5-base
[model_url4]: https://drive.google.com/uc?id=1Cnovcib1I276GmJVOGa33jySIwOthIa7
[model_url5]: https://drive.google.com/uc?id=1hVhR5hQHcIVKj5pPgvYkcl1WWDDHpOFL
[model_url6]: https://drive.google.com/uc?id=1JG14mynmu-b3Dy2UDJr4AyJQyuW-uabh
[model_url7]: https://drive.google.com/uc?id=10hHFuavHCofDczGSzsH1xPHgTgAocOl1
[model_url8]: https://drive.google.com/uc?id=1W8PXgx6VDaThDdLNkL-HVWb1MNcQdxwp
[data_url1]: https://github.com/obss/turkish-question-generation/releases/download/0.0.1/tquad_train_data_v2.json
[data_url2]: https://github.com/obss/turkish-question-generation/releases/download/0.0.1/tquad_dev_data_v2.json
[data_url3]: https://github.com/deepmind/xquad/blob/master/xquad.tr.json

|name |model |training data |trained tasks |model size |
|--- |--- |--- |--- |--- |
|[mt5-small-3task-highlight-tquad2][model_url4] |[mt5-small][model_url2] |[tquad2-train][data_url1] |QA,QG,AnsExt |1.2GB |
|[mt5-small-3task-prepend-tquad2][model_url6] |[mt5-small][model_url2] |[tquad2-train][data_url1] |QA,QG,AnsExt |1.2GB |
|[mt5-small-3task-highlight-combined3][model_url7] |[mt5-small][model_url2] |[tquad2-train][data_url1]+[tquad2-valid][data_url2]+[xquad.tr][data_url3]|QA,QG,AnsExt |1.2GB |
|[mt5-base-3task-highlight-tquad2][model_url5] |[mt5-base][model_url3] |[tquad2-train][data_url1] |QA,QG,AnsExt |2.3GB |
|[mt5-base-3task-highlight-combined3][model_url8] |[mt5-base][model_url3] |[tquad2-train][data_url1]+[tquad2-valid][data_url2]+[xquad.tr][data_url3]|QA,QG,AnsExt |2.3GB |
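Any of the checkpoints above can be loaded for inference with the Hugging Face `transformers` API — a minimal sketch, not the repository's own inference code; it assumes a checkpoint has already been downloaded and extracted to a local directory, and that the standard MT5 classes can load it:

```python
def generate(model_dir, prompt, max_length=64):
    """Run one text-to-text generation step with a fine-tuned mT5 checkpoint.

    `model_dir` is a local directory holding an extracted checkpoint from
    the table above (the exact directory name is up to you).
    """
    # Imported lazily so the sketch can be read/imported without the
    # (heavy) transformers dependency installed.
    from transformers import MT5ForConditionalGeneration, MT5Tokenizer

    tokenizer = MT5Tokenizer.from_pretrained(model_dir)
    model = MT5ForConditionalGeneration.from_pretrained(model_dir)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_length=max_length)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Example call (highlight-format question generation, see 'format' below):
# generate("mt5-small-3task-highlight-tquad2",
#          "generate question: Osman Bey 1258 yılında Söğüt’te doğdu.")
```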

format

- answer extraction:

input:
```
" Osman Bey 1258 yılında Söğüt’te doğdu. Osman Bey 1 Ağustos 1326’da Bursa’da hayatını kaybetmiştir.1281 yılında Osman Bey 23 yaşında iken Ahi teşkilatından olan Şeyh Edebali’nin kızı Malhun Hatun ile evlendi."
```

target:
```
1258 Söğüt’te
```

- question answering:

input:
```
"question: Osman Bey nerede doğmuştur? context: Osman Bey 1258 yılında Söğüt’te doğdu. Osman Bey 1 Ağustos 1326’da Bursa’da hayatını kaybetmiştir.1281 yılında Osman Bey 23 yaşında iken Ahi teşkilatından olan Şeyh Edebali’nin kızı Malhun Hatun ile evlendi."
```

target:
```
"Söğüt’te"
```

- question generation (prepend):

input:
```
"answer: Söğüt’te context: Osman Bey 1258 yılında Söğüt’te doğdu. Osman Bey 1 Ağustos 1326’da Bursa’da hayatını kaybetmiştir.1281 yılında Osman Bey 23 yaşında iken Ahi teşkilatından olan Şeyh Edebali’nin kızı Malhun Hatun ile evlendi."
```

target:
```
"Osman Bey nerede doğmuştur?"
```

- question generation (highlight):

input:
```
"generate question: Osman Bey 1258 yılında Söğüt’te doğdu. Osman Bey 1 Ağustos 1326’da Bursa’da hayatını kaybetmiştir.1281 yılında Osman Bey 23 yaşında iken Ahi teşkilatından olan Şeyh Edebali’nin kızı Malhun Hatun ile evlendi."
```

target:
```
"Osman Bey nerede doğmuştur?"
```

- question generation (both):

input:
```
"answer: Söğüt’te context: Osman Bey 1258 yılında Söğüt’te doğdu. Osman Bey 1 Ağustos 1326’da Bursa’da hayatını kaybetmiştir.1281 yılında Osman Bey 23 yaşında iken Ahi teşkilatından olan Şeyh Edebali’nin kızı Malhun Hatun ile evlendi."
```

target:
```
"Osman Bey nerede doğmuştur?"
```
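The input strings above can be assembled with a small helper — a minimal sketch based only on the examples in this section (the prefix strings are read off the examples, not taken from the tokenizer's special-token list):

```python
def build_input(task, context, question=None, answer=None):
    """Build a text-to-text input string mirroring the format examples above."""
    if task == "qa":
        return f"question: {question} context: {context}"
    if task == "qg_prepend":
        return f"answer: {answer} context: {context}"
    if task == "qg_highlight":
        return f"generate question: {context}"
    raise ValueError(f"unknown task: {task!r}")

context = "Osman Bey 1258 yılında Söğüt’te doğdu."
print(build_input("qa", context, question="Osman Bey nerede doğmuştur?"))
```

With `mt5_qg_format: "both"` in the config, both the prepend and the highlight variants are used for question generation, matching the "both" example above.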

paper results


BERTurk-base and mT5-base QA evaluation results for TQuADv2 fine-tuning.



mT5-base QG evaluation results for single-task (ST) and multi-task (MT) for TQuADv2 fine-tuning.



TQuADv1 and TQuADv2 fine-tuning QG evaluation results for multi-task mT5 variants. MT-Both means the mT5 model is fine-tuned with the ’Both’ input format in a multi-task setting.


paper configs

You can find the config files used in the paper under [configs/paper](configs/paper).

contributing

Before opening a PR:

- Install required development packages:

```bash
pip install "black==21.7b0" "flake8==3.9.2" "isort==5.9.2"
```

- Reformat with black and isort:

```bash
black . --config pyproject.toml
isort .
```