{"id":24034104,"url":"https://github.com/heraclex12/nlp2sparql","last_synced_at":"2025-04-19T14:44:31.010Z","repository":{"id":45153642,"uuid":"307594344","full_name":"heraclex12/NLP2SPARQL","owner":"heraclex12","description":"Translate Natural Language Processing to SPARQL Query and vice versa","archived":false,"fork":false,"pushed_at":"2023-06-01T17:35:24.000Z","size":228,"stargazers_count":50,"open_issues_count":4,"forks_count":12,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-29T08:51:07.853Z","etag":null,"topics":["bert2bert","knowledge-base","machine-translation","pretrained-language-model","question-answering","sparql","spbert"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/heraclex12.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-10-27T05:27:25.000Z","updated_at":"2025-03-07T18:16:25.000Z","dependencies_parsed_at":"2022-07-13T14:44:09.890Z","dependency_job_id":null,"html_url":"https://github.com/heraclex12/NLP2SPARQL","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/heraclex12%2FNLP2SPARQL","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/heraclex12%2FNLP2SPARQL/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/heraclex12%2FNLP2SPARQL/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/heraclex12%2FNLP2SPARQL/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/heraclex12","download_url":"https://c
odeload.github.com/heraclex12/NLP2SPARQL/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249716863,"owners_count":21315068,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bert2bert","knowledge-base","machine-translation","pretrained-language-model","question-answering","sparql","spbert"],"created_at":"2025-01-08T18:57:32.860Z","updated_at":"2025-04-19T14:44:30.990Z","avatar_url":"https://github.com/heraclex12.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\u003c!--\n*** Thanks for checking out this README Template. If you have a suggestion that would\n*** make this better, please fork the repo and create a pull request or simply open\n*** an issue with the tag \"enhancement\".\n*** Thanks again! Now go create something AMAZING! :D\n--\u003e\n\n\n\n\n\n\u003c!-- PROJECT SHIELDS --\u003e\n\u003c!--\n*** I'm using markdown \"reference style\" links for readability.\n*** Reference links are enclosed in brackets [ ] instead of parentheses ( ).\n*** See the bottom of this document for the declaration of the reference variables\n*** for contributors-url, forks-url, etc. 
This is an optional, concise syntax you may use.\n*** https://www.markdownguide.org/basic-syntax/#reference-style-links\n--\u003e\n\u003c!-- PROJECT LOGO --\u003e\n\u003cbr /\u003e\n\u003ch1\u003eSPBERT: A Pre-trained Model for SPARQL Query Language \u003c/h1\u003e\n    \u003cbr /\u003e\n\n\u003c!-- ABOUT THE PROJECT --\u003e\nThis repository provides the code for reproducing the experiments in [our paper](https://arxiv.org/abs/2106.09997). SPBERT is a BERT-based language model pre-trained on massive SPARQL query logs. It learns general-purpose representations of both natural language and the SPARQL query language, and it exploits the sequential order of words, which is crucial for a structured language like SPARQL.\n\n### Prerequisites\n\nTo reproduce our experiments, install the dependencies listed in requirements.txt. The main requirements are:\n* transformers==4.5.1\n* pytorch==1.8.1\n* python 3.7.10\n```sh\n$ pip install -r requirements.txt\n```\n\n### Pre-trained models\nWe release three versions of pre-trained weights. Pre-training was based on the [original BERT code](https://github.com/google-research/bert) provided by Google, and training details are described in our paper. 
You can download all versions from the table below:\n| Pre-training objective | Model | Steps | Link |\n|---|:---:|:---:|:---:|\n| MLM  | SPBERT (scratch) | 200k | 🤗 [razent/spbert-mlm-zero](https://huggingface.co/razent/spbert-mlm-zero) |\n| MLM  | SPBERT (BERT-initialized) | 200k | 🤗 [razent/spbert-mlm-base](https://huggingface.co/razent/spbert-mlm-base) |\n| MLM+WSO  | SPBERT (BERT-initialized) | 200k | 🤗 [razent/spbert-mlm-wso-base](https://huggingface.co/razent/spbert-mlm-wso-base) |\n\n### Datasets\nAll evaluation datasets can be downloaded [here](https://drive.google.com/drive/folders/1m_pJ0prUDpCWAFuxlvp_S48hGG_AASjb?usp=sharing).\n\n### Example\nTo fine-tune models:\n```bash\npython run.py \\\n        --do_train \\\n        --do_eval \\\n        --model_type bert \\\n        --model_architecture bert2bert \\\n        --encoder_model_name_or_path bert-base-cased \\\n        --decoder_model_name_or_path sparql-mlm-zero \\\n        --source en \\\n        --target sparql \\\n        --train_filename ./LCQUAD/train \\\n        --dev_filename ./LCQUAD/dev \\\n        --output_dir ./ \\\n        --max_source_length 64 \\\n        --weight_decay 0.01 \\\n        --max_target_length 128 \\\n        --beam_size 10 \\\n        --train_batch_size 32 \\\n        --eval_batch_size 32 \\\n        --learning_rate 5e-5 \\\n        --save_inverval 10 \\\n        --num_train_epochs 150\n```\n\nTo evaluate models:\n```bash\npython run.py \\\n        --do_test \\\n        --model_type bert \\\n        --model_architecture bert2bert \\\n        --encoder_model_name_or_path bert-base-cased \\\n        --decoder_model_name_or_path sparql-mlm-zero \\\n        --source en \\\n        --target sparql \\\n        --load_model_path ./checkpoint-best-bleu/pytorch_model.bin \\\n        --dev_filename ./LCQUAD/dev \\\n        --test_filename ./LCQUAD/test \\\n        --output_dir ./ \\\n        --max_source_length 64 \\\n        --max_target_length 128 \\\n        --beam_size 10 \\\n        
--eval_batch_size 32\n```\n\n\u003c!-- CONTACT --\u003e\n## Contact\nEmail: [heraclex12@gmail.com](mailto:heraclex12@gmail.com) - Hieu Tran\n\n\n## Citation\n```\n@inproceedings{Tran2021SPBERTAE,\n  title={SPBERT: An Efficient Pre-training BERT on SPARQL Queries for Question Answering over Knowledge Graphs},\n  author={Hieu Tran and Long Phan and James T. Anibal and Binh Thanh Nguyen and Truong-Son Nguyen},\n  booktitle={ICONIP},\n  year={2021}\n}\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fheraclex12%2Fnlp2sparql","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fheraclex12%2Fnlp2sparql","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fheraclex12%2Fnlp2sparql/lists"}