https://github.com/obss/turkish-question-generation
Automated question generation and question answering from Turkish texts using text-to-text transformers
- Host: GitHub
- URL: https://github.com/obss/turkish-question-generation
- Owner: obss
- License: mit
- Created: 2021-11-10T15:21:49.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2022-08-12T15:41:20.000Z (about 3 years ago)
- Last Synced: 2025-09-18T05:57:16.635Z (25 days ago)
- Topics: arxiv, mt5, multilingual, neptune-ai, nlp, question-answering, question-generation, t5, transformers, turkish, wandb, xquad
- Language: Python
- Homepage: https://journals.tubitak.gov.tr/elektrik/vol30/iss5/17/
- Size: 39.1 KB
- Stars: 47
- Watchers: 2
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: CITATION.cff
README
Turkish Question Generation
Official source code for "Automated question generation and question answering from Turkish texts".
citation
If you use this software in your work, please cite as:
```
@article{akyon2022questgen,
author = {Akyon, Fatih Cagatay and Cavusoglu, Ali Devrim Ekin and Cengiz, Cemil and Altinuc, Sinan Onur and Temizel, Alptekin},
doi = {10.3906/elk-1300-0632.3914},
journal = {Turkish Journal of Electrical Engineering and Computer Sciences},
title = {{Automated question generation and question answering from Turkish texts}},
url = {https://journals.tubitak.gov.tr/elektrik/vol30/iss5/17/},
year = {2022}
}
```
install
```bash
git clone https://github.com/obss/turkish-question-generation.git
cd turkish-question-generation
pip install -r requirements.txt
```

train
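Both ways of starting training shown in this section carry the same settings. As a sketch, assuming `run.py` parses its options with Hugging Face's `HfArgumentParser` (so JSON keys match the flag names without the leading `--`; this is an assumption, so compare against `configs/default/config.json`), the arguments of the first command can be turned into a config dict. The helper names `args_to_config` and `_parse_value` are hypothetical:

```python
import json
import shlex

# Training command from this section, kept as a single string.
TRAIN_CMD = (
    "python run.py --model_name_or_path google/mt5-small --output_dir runs/exp1 "
    "--do_train --do_eval --tokenizer_name_or_path mt5_qg_tokenizer "
    "--per_device_train_batch_size 4 --gradient_accumulation_steps 2 "
    "--learning_rate 1e-4 --seed 42 --save_total_limit 1"
)


def _parse_value(raw):
    """Convert a CLI value to int or float when possible, else keep the string."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def args_to_config(argv):
    """Map '--key value' pairs to {key: value} and bare '--flag' to {flag: True}."""
    config = {}
    i = 0
    while i < len(argv):
        token = argv[i]
        if not token.startswith("--"):
            i += 1  # skip 'python', 'run.py' and any other non-flag token
            continue
        key = token[2:]
        if i + 1 < len(argv) and not argv[i + 1].startswith("--"):
            config[key] = _parse_value(argv[i + 1])
            i += 2
        else:
            config[key] = True  # boolean switch such as --do_train
            i += 1
    return config


config = args_to_config(shlex.split(TRAIN_CMD))
print(json.dumps(config, indent=2))
```

Under that assumption, saving the printed JSON as `config.json` should be equivalent to passing the flags directly. Numbers are converted explicitly because a JSON config is loaded without the string-to-type conversion that command-line parsing performs.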
- Start training with command-line arguments:
```bash
python run.py --model_name_or_path google/mt5-small --output_dir runs/exp1 --do_train --do_eval --tokenizer_name_or_path mt5_qg_tokenizer --per_device_train_batch_size 4 --gradient_accumulation_steps 2 --learning_rate 1e-4 --seed 42 --save_total_limit 1
```
- Download the [JSON config](configs/default/config.json) file and start training:
```bash
python run.py config.json
```
- Download the [YAML config](configs/default/config.yaml) file and start training:
```bash
python run.py config.yaml
```

evaluate
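The bullets below edit the YAML config by hand. Since `run.py` also accepts a JSON config, a minimal alternative sketch is to copy the default JSON config and overwrite only the evaluation keys shown in this section; the helper `make_eval_config` and the output name `config_eval.json` are hypothetical:

```python
import json
from pathlib import Path

# Evaluation overrides, copied from the YAML snippet in this section.
EVAL_OVERRIDES = {
    "do_train": False,
    "do_eval": True,
    "eval_dataset_list": ["tquad2-valid", "xquad.tr"],
    "prepare_data": True,
    "mt5_task_list": ["qa", "qg", "ans_ext"],
    "mt5_qg_format": "both",
    "no_cuda": False,
}


def make_eval_config(default_path, out_path):
    """Load a JSON config, apply the evaluation overrides and write the result."""
    config = json.loads(Path(default_path).read_text(encoding="utf-8"))
    config.update(EVAL_OVERRIDES)  # every other key keeps its default value
    Path(out_path).write_text(
        json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return config
```

For example, call `make_eval_config("configs/default/config.json", "config_eval.json")` and then run `python run.py config_eval.json`. All keys not listed in the overrides keep their values from the default config.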
- Set the relevant parameters in the config:
```yaml
do_train: false
do_eval: true
eval_dataset_list: ["tquad2-valid", "xquad.tr"]
prepare_data: true
mt5_task_list: ["qa", "qg", "ans_ext"]
mt5_qg_format: "both"
no_cuda: false
```
- Start an evaluation:
```bash
python run.py config.yaml
```

neptune
- Install neptune:
```bash
pip install neptune-client
```
- Download the [config](configs/default/config.yaml) file and set the Neptune parameters:
```yaml
run_name: 'exp1'
neptune_project: 'name/project'
neptune_api_token: 'YOUR_API_TOKEN'
```
- Start training:
```bash
python train.py config.yaml
```

wandb
- Install wandb:
```bash
pip install wandb
```
- Download the [config](configs/default/config.yaml) file and set the wandb parameters:
```yaml
run_name: 'exp1'
wandb_project: 'turque'
```
- Start training:
```bash
python train.py config.yaml
```

fine-tuned checkpoints
[model_url1]: https://drive.google.com/uc?id=10hHFuavHCofDczGSzsH1xPHgTgAocOl1
[model_url2]: https://huggingface.co/google/mt5-small
[model_url3]: https://huggingface.co/google/mt5-base
[model_url4]: https://drive.google.com/uc?id=1Cnovcib1I276GmJVOGa33jySIwOthIa7
[model_url5]: https://drive.google.com/uc?id=1hVhR5hQHcIVKj5pPgvYkcl1WWDDHpOFL
[model_url6]: https://drive.google.com/uc?id=1JG14mynmu-b3Dy2UDJr4AyJQyuW-uabh
[model_url7]: https://drive.google.com/uc?id=10hHFuavHCofDczGSzsH1xPHgTgAocOl1
[model_url8]: https://drive.google.com/uc?id=1W8PXgx6VDaThDdLNkL-HVWb1MNcQdxwp
[data_url1]: https://github.com/obss/turkish-question-generation/releases/download/0.0.1/tquad_train_data_v2.json
[data_url2]: https://github.com/obss/turkish-question-generation/releases/download/0.0.1/tquad_dev_data_v2.json
[data_url3]: https://github.com/deepmind/xquad/blob/master/xquad.tr.json

|name |model |training data |trained tasks |model size |
|--- |--- |--- |--- |--- |
|[mt5-small-3task-highlight-tquad2][model_url4] |[mt5-small][model_url2] |[tquad2-train][data_url1] |QA,QG,AnsExt |1.2GB |
|[mt5-small-3task-prepend-tquad2][model_url6] |[mt5-small][model_url2] |[tquad2-train][data_url1] |QA,QG,AnsExt |1.2GB |
|[mt5-small-3task-highlight-combined3][model_url7] |[mt5-small][model_url2] |[tquad2-train][data_url1]+[tquad2-valid][data_url2]+[xquad.tr][data_url3]|QA,QG,AnsExt |1.2GB |
|[mt5-base-3task-highlight-tquad2][model_url5] |[mt5-base][model_url3] |[tquad2-train][data_url1] |QA,QG,AnsExt |2.3GB |
|[mt5-base-3task-highlight-combined3][model_url8] |[mt5-base][model_url3] |[tquad2-train][data_url1]+[tquad2-valid][data_url2]+[xquad.tr][data_url3]|QA,QG,AnsExt |2.3GB |

format
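Each task in this section is a plain text-to-text pair. The minimal sketch below covers only the question answering and prepend-style question generation templates, copied from the examples and treating their surrounding double quotes as display only (an assumption). The highlight and both formats are not reproduced, and the function name `make_pairs` is hypothetical:

```python
def make_pairs(context, question, answer):
    """Build (input, target) pairs for QA and prepend-style QG from one example.

    Templates follow the examples in this section; highlight and both
    formats are intentionally left out of this sketch.
    """
    qa_pair = (f"question: {question} context: {context}", answer)
    qg_pair = (f"answer: {answer} context: {context}", question)
    return [qa_pair, qg_pair]


# Shortened context from the examples, for brevity.
context = "Osman Bey 1258 yılında Söğüt’te doğdu."
for source, target in make_pairs(context, "Osman Bey nerede doğmuştur?", "Söğüt’te"):
    print(source, "->", target)
```

A single (context, question, answer) triple thus yields one QA pair whose target is the answer and one QG pair whose target is the question.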
- answer extraction:
input:
```
" Osman Bey 1258 yılında Söğüt’te doğdu. Osman Bey 1 Ağustos 1326’da Bursa’da hayatını kaybetmiştir.1281 yılında Osman Bey 23 yaşında iken Ahi teşkilatından olan Şeyh Edebali’nin kızı Malhun Hatun ile evlendi."
```
target:
```
1258 Söğüt’te
```
- question answering:
input:
```
"question: Osman Bey nerede doğmuştur? context: Osman Bey 1258 yılında Söğüt’te doğdu. Osman Bey 1 Ağustos 1326’da Bursa’da hayatını kaybetmiştir.1281 yılında Osman Bey 23 yaşında iken Ahi teşkilatından olan Şeyh Edebali’nin kızı Malhun Hatun ile evlendi."
```
target:
```
"Söğüt’te"
```
- question generation (prepend):
input:
```
"answer: Söğüt’te context: Osman Bey 1258 yılında Söğüt’te doğdu. Osman Bey 1 Ağustos 1326’da Bursa’da hayatını kaybetmiştir.1281 yılında Osman Bey 23 yaşında iken Ahi teşkilatından olan Şeyh Edebali’nin kızı Malhun Hatun ile evlendi."
```
target:
```
"Osman Bey nerede doğmuştur?"
```
- question generation (highlight):
input:
```
"generate question: Osman Bey 1258 yılında Söğüt’te doğdu. Osman Bey 1 Ağustos 1326’da Bursa’da hayatını kaybetmiştir.1281 yılında Osman Bey 23 yaşında iken Ahi teşkilatından olan Şeyh Edebali’nin kızı Malhun Hatun ile evlendi."
```
target:
```
"Osman Bey nerede doğmuştur?"
```
- question generation (both):
input:
```
"answer: Söğüt’te context: Osman Bey 1258 yılında Söğüt’te doğdu. Osman Bey 1 Ağustos 1326’da Bursa’da hayatını kaybetmiştir.1281 yılında Osman Bey 23 yaşında iken Ahi teşkilatından olan Şeyh Edebali’nin kızı Malhun Hatun ile evlendi."
```
target:
```
"Osman Bey nerede doğmuştur?"
```

paper results
BERTurk-base and mT5-base QA evaluation results for TQuADv2 fine-tuning.
mT5-base QG evaluation results for single-task (ST) and multi-task (MT) for TQuADv2 fine-tuning.
TQuADv1 and TQuADv2 fine-tuning QG evaluation results for multi-task mT5 variants. MT-Both means the mT5 model is fine-tuned with the 'Both' input format in a multi-task setting.
paper configs
You can find the config files used in the paper under [configs/paper](configs/paper).
contributing
Before opening a PR:
- Install required development packages:
```bash
pip install "black==21.7b0" "flake8==3.9.2" "isort==5.9.2"
```
- Reformat with black and isort:
```bash
black . --config pyproject.toml
isort .
```