{"id":13535246,"url":"https://github.com/sakuranew/BERT-AttributeExtraction","last_synced_at":"2025-04-02T00:33:09.057Z","repository":{"id":44007294,"uuid":"161633979","full_name":"sakuranew/BERT-AttributeExtraction","owner":"sakuranew","description":"USING BERT FOR Attribute Extraction in KnowledgeGraph. fine-tuning and feature extraction.                                                使用基于bert的微调和特征提取方法来进行知识图谱百度百科人物词条属性抽取。","archived":false,"fork":false,"pushed_at":"2019-04-01T13:12:48.000Z","size":5616,"stargazers_count":261,"open_issues_count":1,"forks_count":65,"subscribers_count":17,"default_branch":"master","last_synced_at":"2024-08-02T08:10:03.499Z","etag":null,"topics":["ai","attribute-extraction","bert","deeplearning","feature-extraction","fine-tuning","knowledge-graph","nlp","relation-extraction"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sakuranew.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-12-13T12:11:05.000Z","updated_at":"2024-06-25T13:15:47.000Z","dependencies_parsed_at":"2022-07-09T14:46:19.450Z","dependency_job_id":null,"html_url":"https://github.com/sakuranew/BERT-AttributeExtraction","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sakuranew%2FBERT-AttributeExtraction","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sakuranew%2FBERT-AttributeExtraction/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sakuranew%2FBERT-AttributeExtraction/releases","manifests_u
rl":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sakuranew%2FBERT-AttributeExtraction/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sakuranew","download_url":"https://codeload.github.com/sakuranew/BERT-AttributeExtraction/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":222788514,"owners_count":17037777,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","attribute-extraction","bert","deeplearning","feature-extraction","fine-tuning","knowledge-graph","nlp","relation-extraction"],"created_at":"2024-08-01T08:00:51.947Z","updated_at":"2024-11-02T23:31:13.522Z","avatar_url":"https://github.com/sakuranew.png","language":"Python","funding_links":[],"categories":["Tasks","BERT  Knowledge Graph Task :"],"sub_categories":["Knowledge Graph"],"readme":"\n# BERT-Attribute-Extraction\n## BERT-based attribute extraction for knowledge graphs\nUsing BERT for attribute extraction in a knowledge graph with two methods: fine-tuning and feature extraction.\n \nAttribute extraction from Baidu Encyclopedia person entries for a knowledge graph, with experiments using BERT-based fine-tuning and feature extraction.\n\n\n### Prerequisites\n\n\n```\nTensorFlow \u003e=1.10\nscikit-learn\n```\n### Pre-trained models\n **[`BERT-Base, Chinese`](https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip)**:\n    Chinese Simplified and Traditional, 12-layer, 768-hidden, 12-heads, 110M\n    parameters\n    \n### Installing\n\nNone\n## Dataset\n\nThe dataset is built from Baidu Encyclopedia person entries.\nFilter out corpus that does not contain entities and 
attributes.\n\nEntities and attributes are obtained from named entity recognition.\n\nLabels are obtained from the Baidu Encyclopedia infobox, and most of them are assigned manually, so some labels are of limited quality.  \nFor example:\n    \n    黄维#1904年#1#黄维（1904年-1989年），字悟我，出生于江西贵溪一农户家庭。        \n    陈昂#山东省滕州市#1#邀请担任诗词嘉宾。1992年1月26日，陈昂出生于山东省滕州市一个普通的知识分子家庭，其祖父、父亲都\n    陈伟庆#肇庆市鼎湖区#0#长。任免信息2016年10月21日下午，肇庆市鼎湖区八届人大一次会议胜利闭幕。陈伟庆当选区人民政府副区长。\n## Getting Started\n\n* run `strip.py` to get the stripped data\n* run `data_process.py` to process the data into NumPy input files\n* the `parameters` file holds the parameters needed to run the model\n\n## Running the tests\n\nFor example, with the birthplace dataset:\n    \n* fine-tuning\n    * run `run_classifier.py` to get predicted probability outputs\n    ```shell\n    python run_classifier.py \\\n            --task_name=my \\\n            --do_train=true \\\n            --do_predict=true \\\n            --data_dir=a \\\n            --vocab_file=/home/tiny/zhaomeng/bertmodel/vocab.txt \\\n            --bert_config_file=/home/tiny/zhaomeng/bertmodel/bert_config.json \\\n            --init_checkpoint=/home/tiny/zhaomeng/bertmodel/bert_model.ckpt \\\n            --max_seq_length=80 \\\n            --train_batch_size=32 \\\n            --learning_rate=2e-5 \\\n            --num_train_epochs=1.0 \\\n            --output_dir=./output\n    ```    \n    * then run `proba2metrics.py` to get the final metrics along with the misclassified examples\n\n* feature-extraction\n    * run `extract_features.py` to get the vector representations of the training and test data in JSON file format\n    ```shell\n    python extract_features.py \\\n            --input_file=../data/birth_place_train.txt \\\n            --output_file=../data/birth_place_train.jsonl \\\n            --vocab_file=/home/tiny/zhaomeng/bertmodel/vocab.txt \\\n            --bert_config_file=/home/tiny/zhaomeng/bertmodel/bert_config.json \\\n            --init_checkpoint=/home/tiny/zhaomeng/bertmodel/bert_model.ckpt \\\n            
--layers=-1 \\\n            --max_seq_length=80 \\\n            --batch_size=16\n    ```    \n    * then run `json2vector.py` to convert the JSON file into vector representations\n    * finally run `run_classifier.py` to classify the vectors with machine learning methods; MLP usually performs best \n\n## Result\nThe predicted results and the misclassified examples are saved in the result directory.\n* For example, on the birthplace dataset with the fine-tuning method, the result is:    \n\n                  precision    recall  f1-score   support\n\n           0      0.963     0.967     0.965       573\n           1      0.951     0.946     0.948       389\n## Authors\n\n* **zhao meng** \n## License\n\nThis project is licensed under the MIT License \n\n## Acknowledgments\n\n* etc\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsakuranew%2FBERT-AttributeExtraction","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsakuranew%2FBERT-AttributeExtraction","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsakuranew%2FBERT-AttributeExtraction/lists"}