https://github.com/heraclex12/nlp2sparql
Translate natural language to SPARQL queries and vice versa
- Host: GitHub
- URL: https://github.com/heraclex12/nlp2sparql
- Owner: heraclex12
- Created: 2020-10-27T05:27:25.000Z (over 5 years ago)
- Default Branch: main
- Last Pushed: 2023-06-01T17:35:24.000Z (about 3 years ago)
- Last Synced: 2025-03-29T08:51:07.853Z (about 1 year ago)
- Topics: bert2bert, knowledge-base, machine-translation, pretrained-language-model, question-answering, sparql, spbert
- Language: Python
- Size: 223 KB
- Stars: 50
- Watchers: 2
- Forks: 12
- Open Issues: 4
# SPBERT: A Pre-trained Model for SPARQL Query Language
This repository provides the code for reproducing the experiments in [our paper](https://arxiv.org/abs/2106.09997). SPBERT is a BERT-based language model pre-trained on massive SPARQL query logs. It learns general-purpose representations of both natural language and the SPARQL query language, and it exploits the sequential order of words, which is crucial for a structured language like SPARQL.
### Prerequisites
To reproduce the experiments, install the dependencies listed in `requirements.txt`. The experiments used the following versions:
* transformers==4.5.1
* pytorch==1.8.1
* python 3.7.10
```sh
$ pip install -r requirements.txt
```
### Pre-trained models
We release three versions of pre-trained weights. Pre-training was based on the [original BERT code](https://github.com/google-research/bert) provided by Google, and training details are described in our paper. You can download all versions from the table below:
| Pre-training objective | Model | Steps | Link |
|---|:---:|:---:|:---:|
| MLM | SPBERT (scratch) | 200k | 🤗 [razent/spbert-mlm-zero](https://huggingface.co/razent/spbert-mlm-zero) |
| MLM | SPBERT (BERT-initialized) | 200k | 🤗 [razent/spbert-mlm-base](https://huggingface.co/razent/spbert-mlm-base) |
| MLM+WSO | SPBERT (BERT-initialized) | 200k | 🤗 [razent/spbert-mlm-wso-base](https://huggingface.co/razent/spbert-mlm-wso-base) |
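
The checkpoints in the table can probably be loaded straight from the Hugging Face Hub. The sketch below is a minimal example and not part of this repository: it assumes the checkpoints are standard BERT weights that the `transformers` auto classes can resolve. The short variant keys and the `load_spbert` helper are hypothetical names introduced only for illustration.

```python
# Hub IDs copied from the table above; the short keys are illustrative only.
SPBERT_CHECKPOINTS = {
    "mlm-zero": "razent/spbert-mlm-zero",          # MLM, trained from scratch
    "mlm-base": "razent/spbert-mlm-base",          # MLM, BERT-initialized
    "mlm-wso-base": "razent/spbert-mlm-wso-base",  # MLM+WSO, BERT-initialized
}


def load_spbert(variant="mlm-wso-base"):
    """Return (tokenizer, model) for one SPBERT variant (hypothetical helper)."""
    if variant not in SPBERT_CHECKPOINTS:
        raise KeyError(
            f"unknown variant {variant!r}; choose from {sorted(SPBERT_CHECKPOINTS)}"
        )
    # Imported lazily so the mapping is usable without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    repo_id = SPBERT_CHECKPOINTS[variant]
    # Assumption: the checkpoint ships a tokenizer and BERT-compatible weights.
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModel.from_pretrained(repo_id)
    return tokenizer, model


# Example (needs network access and the packages from requirements.txt):
# tokenizer, model = load_spbert("mlm-zero")
```

Calling `load_spbert` downloads the weights on first use, so it needs network access.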
### Datasets
All evaluation datasets can be downloaded [here](https://drive.google.com/drive/folders/1m_pJ0prUDpCWAFuxlvp_S48hGG_AASjb?usp=sharing).
### Example
To fine-tune models:
```bash
python run.py \
--do_train \
--do_eval \
--model_type bert \
--model_architecture bert2bert \
--encoder_model_name_or_path bert-base-cased \
--decoder_model_name_or_path sparql-mlm-zero \
--source en \
--target sparql \
--train_filename ./LCQUAD/train \
--dev_filename ./LCQUAD/dev \
--output_dir ./ \
--max_source_length 64 \
--weight_decay 0.01 \
--max_target_length 128 \
--beam_size 10 \
--train_batch_size 32 \
--eval_batch_size 32 \
--learning_rate 5e-5 \
--save_inverval 10 \
--num_train_epochs 150
```
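
When running several fine-tuning configurations, it can help to assemble the command above programmatically. The following sketch is not part of the repository: `build_finetune_command` is a hypothetical helper that reproduces the flags exactly as written in the command above (including their spelling) and lets individual values be overridden. The output directory `./runs/bs16` is likewise only an illustrative path.

```python
import shlex
import sys


def build_finetune_command(overrides=None, script="run.py"):
    """Return the argv list for the fine-tuning command, with optional overrides."""
    flags = {
        "model_type": "bert",
        "model_architecture": "bert2bert",
        "encoder_model_name_or_path": "bert-base-cased",
        "decoder_model_name_or_path": "sparql-mlm-zero",
        "source": "en",
        "target": "sparql",
        "train_filename": "./LCQUAD/train",
        "dev_filename": "./LCQUAD/dev",
        "output_dir": "./",
        "max_source_length": 64,
        "weight_decay": 0.01,
        "max_target_length": 128,
        "beam_size": 10,
        "train_batch_size": 32,
        "eval_batch_size": 32,
        "learning_rate": "5e-5",
        "save_inverval": 10,  # flag name kept exactly as in the README command
        "num_train_epochs": 150,
    }
    flags.update(overrides or {})
    # Boolean switches first, then every "--name value" pair.
    argv = [sys.executable, script, "--do_train", "--do_eval"]
    for name, value in flags.items():
        argv += [f"--{name}", str(value)]
    return argv


# Build a variant with a smaller batch size and a separate (illustrative) output dir.
cmd = build_finetune_command({"train_batch_size": 16, "output_dir": "./runs/bs16"})
print(" ".join(shlex.quote(part) for part in cmd))
# To actually launch it: import subprocess; subprocess.run(cmd, check=True)
```

Keeping the flags in one dictionary makes it easy to sweep a single hyperparameter while leaving the rest identical to the documented command.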
To evaluate models:
```bash
python run.py \
--do_test \
--model_type bert \
--model_architecture bert2bert \
--encoder_model_name_or_path bert-base-cased \
--decoder_model_name_or_path sparql-mlm-zero \
--source en \
--target sparql \
--load_model_path ./checkpoint-best-bleu/pytorch_model.bin \
--dev_filename ./LCQUAD/dev \
--test_filename ./LCQUAD/test \
--output_dir ./ \
--max_source_length 64 \
--max_target_length 128 \
--beam_size 10 \
--eval_batch_size 32
```
## Contact
Email: [heraclex12@gmail.com](mailto:heraclex12@gmail.com) - Hieu Tran
## Citation
```bibtex
@inproceedings{Tran2021SPBERTAE,
title={SPBERT: An Efficient Pre-training BERT on SPARQL Queries for Question Answering over Knowledge Graphs},
author={Hieu Tran and Long Phan and James T. Anibal and Binh Thanh Nguyen and Truong-Son Nguyen},
booktitle={ICONIP},
year={2021}
}
```