Source code of K-BERT (AAAI2020)
- Host: GitHub
- URL: https://github.com/autoliuweijie/K-BERT
- Owner: autoliuweijie
- Created: 2019-09-17T05:40:54.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2023-01-27T09:41:49.000Z (almost 2 years ago)
- Last Synced: 2024-08-03T09:07:08.848Z (5 months ago)
- Topics: aaai2020, bert, k-bert, nlp
- Language: Python
- Homepage: https://ojs.aaai.org//index.php/AAAI/article/view/5681
- Size: 11.8 MB
- Stars: 947
- Watchers: 16
- Forks: 213
- Open Issues: 59
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- StarryDivineSky - autoliuweijie/K-BERT
README
# K-BERT
![](https://img.shields.io/badge/license-MIT-000000.svg)

Source code and datasets for ["K-BERT: Enabling Language Representation with Knowledge Graph"](https://aaai.org/Papers/AAAI/2020GB/AAAI-LiuW.5594.pdf), implemented on top of the [UER](https://github.com/dbiir/UER-py) framework.
**News**
- [EasyNLP](https://github.com/alibaba/EasyNLP) has integrated K-BERT. For details, see [EasyNLP integrates the K-BERT algorithm, leveraging knowledge graphs for better fine-tuning](https://zhuanlan.zhihu.com/p/553816104) (article in Chinese).
## Requirements
Software:
```
Python3
Pytorch >= 1.0
argparse == 1.1
```
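A quick way to sanity-check the setup before proceeding (a minimal sketch; the version parsing is an assumption, since ``torch.__version__`` may carry a local suffix such as ``+cu117``):

```python
# Environment sanity check (a sketch; the "+local" suffix handling
# is an assumption about how your PyTorch build reports its version).
import torch

major, minor = (int(x) for x in torch.__version__.split("+")[0].split(".")[:2])
assert (major, minor) >= (1, 0), "K-BERT expects PyTorch >= 1.0"
print("PyTorch", torch.__version__, "looks OK")
```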
## Prepare
* Download the ``google_model.bin`` from [here](https://share.weiyun.com/5GuzfVX), and save it to the ``models/`` directory.
* Download the ``CnDbpedia.spo`` from [here](https://share.weiyun.com/5BvtHyO), and save it to the ``brain/kgs/`` directory. (The ``.spo`` file format is sketched after this list.)
* Optional: download the evaluation datasets from [here](https://share.weiyun.com/5Id9PVZ), unzip them, and place them in the ``datasets/`` directory.
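The ``.spo`` knowledge-graph files are plain text. A minimal sketch of indexing one (assuming one tab-separated subject/predicate/object triple per line; check the downloaded file to confirm):

```python
# Index a knowledge-graph file (a sketch; assumes one tab-separated
# subject/predicate/object triple per line).
from collections import defaultdict

lookup = defaultdict(list)
with open("brain/kgs/CnDbpedia.spo", encoding="utf-8") as f:
    for line in f:
        parts = line.strip().split("\t")
        if len(parts) != 3:
            continue  # skip malformed lines
        subj, pred, obj = parts
        lookup[subj].append((pred, obj))  # entity -> (relation, entity) pairs

print(len(lookup), "entities indexed")
```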
The directory tree of K-BERT:
```
K-BERT
├── brain
│ ├── config.py
│ ├── __init__.py
│ ├── kgs
│ │ ├── CnDbpedia.spo
│ │ ├── HowNet.spo
│ │ └── Medical.spo
│ └── knowgraph.py
├── datasets
│ ├── book_review
│ │ ├── dev.tsv
│ │ ├── test.tsv
│ │ └── train.tsv
│ ├── chnsenticorp
│ │ ├── dev.tsv
│ │ ├── test.tsv
│ │ └── train.tsv
│ ...
│
├── models
│ ├── google_config.json
│ ├── google_model.bin
│ └── google_vocab.txt
├── outputs
├── uer
├── README.md
├── requirements.txt
├── run_kbert_cls.py
└── run_kbert_ner.py
```

## K-BERT for text classification
### Classification example
Run example on Book review with CnDbpedia:
```sh
CUDA_VISIBLE_DEVICES='0' nohup python3 -u run_kbert_cls.py \
--pretrained_model_path ./models/google_model.bin \
--config_path ./models/google_config.json \
--vocab_path ./models/google_vocab.txt \
--train_path ./datasets/book_review/train.tsv \
--dev_path ./datasets/book_review/dev.tsv \
--test_path ./datasets/book_review/test.tsv \
--epochs_num 5 --batch_size 32 --kg_name CnDbpedia \
--output_model_path ./outputs/kbert_bookreview_CnDbpedia.bin \
> ./outputs/kbert_bookreview_CnDbpedia.log &
```

Results:
```
Best accuracy in dev : 88.80%
Best accuracy in test: 87.69%
```
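The ``*.tsv`` splits passed to the command above are plain text. A minimal sketch of peeking at one (the header-plus-tab-separated-columns layout is an assumption carried over from the UER conventions this repo builds on; confirm against your downloaded files):

```python
# Peek at a classification split (a sketch; assumes a header row followed
# by tab-separated label and text columns, per UER-style datasets).
with open("datasets/book_review/train.tsv", encoding="utf-8") as f:
    print("columns:", f.readline().rstrip("\n").split("\t"))
    label, text = f.readline().rstrip("\n").split("\t")[:2]
    print("first example:", label, text[:50])
```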
Options of ``run_kbert_cls.py``:
```
usage: [--pretrained_model_path] - Path to the pre-trained model parameters.
[--config_path] - Path to the model configuration file.
[--vocab_path] - Path to the vocabulary file.
--train_path - Path to the training dataset.
--dev_path - Path to the validating dataset.
--test_path - Path to the testing dataset.
[--epochs_num] - The number of training epochs.
[--batch_size] - Batch size of the training process.
[--kg_name] - The name of knowledge graph, "HowNet", "CnDbpedia" or "Medical".
[--output_model_path] - Path to the output model.
```

### Classification benchmarks
Accuracy (dev/test, %) on the different datasets:
| Dataset | HowNet | CnDbpedia |
| :----- | :----: | :----: |
| Book review | 88.75/87.75 | 88.80/87.69 |
| ChnSentiCorp | 95.00/95.50 | 94.42/95.25 |
| Shopping | 97.01/96.92 | 96.94/96.73 |
| Weibo | 98.22/98.33 | 98.29/98.33 |
| LCQMC | 88.97/87.14 | 88.91/87.20 |
| XNLI | 77.11/77.07 | 76.99/77.43 |

## K-BERT for named entity recognition (NER)
### NER example
Run an example on the msra_ner dataset with CnDbpedia:
```sh
CUDA_VISIBLE_DEVICES='0' nohup python3 -u run_kbert_ner.py \
--pretrained_model_path ./models/google_model.bin \
--config_path ./models/google_config.json \
--vocab_path ./models/google_vocab.txt \
--train_path ./datasets/msra_ner/train.tsv \
--dev_path ./datasets/msra_ner/dev.tsv \
--test_path ./datasets/msra_ner/test.tsv \
--epochs_num 5 --batch_size 16 --kg_name CnDbpedia \
--output_model_path ./outputs/kbert_msraner_CnDbpedia.bin \
> ./outputs/kbert_msraner_CnDbpedia.log &
```

Results:
```
The best in dev : precision=0.957, recall=0.962, f1=0.960
The best in test: precision=0.953, recall=0.959, f1=0.956
```
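Training writes the fine-tuned weights to ``--output_model_path``. A minimal sketch of inspecting that file (assuming, as is typical for UER-based scripts, that it is a plain PyTorch state dict saved with ``torch.save``):

```python
import torch

# Load the fine-tuned weights on CPU and list a few parameter shapes
# (assumes the .bin file is a plain state dict; adjust if it wraps more).
state = torch.load("outputs/kbert_msraner_CnDbpedia.bin", map_location="cpu")
for name, tensor in list(state.items())[:5]:
    print(name, tuple(tensor.shape))
```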
Options of ``run_kbert_ner.py``:
```
usage: [--pretrained_model_path] - Path to the pre-trained model parameters.
[--config_path] - Path to the model configuration file.
[--vocab_path] - Path to the vocabulary file.
--train_path - Path to the training dataset.
--dev_path - Path to the validating dataset.
--test_path - Path to the testing dataset.
[--epochs_num] - The number of training epochs.
[--batch_size] - Batch size of the training process.
[--kg_name] - The name of knowledge graph.
[--output_model_path] - Path to the output model.
```

## K-BERT for domain-specific tasks
Experimental results on domain-specific tasks (Precision/Recall/F1 %):
| KG | Finance_QA | Law_QA | Finance_NER | Medicine_NER |
| :----- | :----: | :----: | :----: | :----: |
| HowNet | 0.805/0.888/0.845 | 0.842/0.903/0.871 | 0.860/0.888/0.874 | 0.935/0.939/0.937 |
| CN-DBpedia | 0.814/0.881/0.846 | 0.814/0.942/0.874 | 0.860/0.887/0.873 | 0.935/0.937/0.936 |
| MedicalKG | -- | -- | -- | 0.944/0.943/0.944 |

## Acknowledgement
This work is a joint study with the support of Peking University and Tencent Inc.
If you use this code, please cite this paper:
```
@inproceedings{weijie2019kbert,
title={{K-BERT}: Enabling Language Representation with Knowledge Graph},
  author={Weijie Liu and Peng Zhou and Zhe Zhao and Zhiruo Wang and Qi Ju and Haotang Deng and Ping Wang},
booktitle={Proceedings of AAAI 2020},
year={2020}
}
```