{"id":13754358,"url":"https://github.com/zjunlp/openue","last_synced_at":"2025-06-13T23:04:53.238Z","repository":{"id":50462092,"uuid":"271465993","full_name":"zjunlp/OpenUE","owner":"zjunlp","description":"[EMNLP 2020] OpenUE: An Open Toolkit of Universal Extraction from Text","archived":false,"fork":false,"pushed_at":"2022-09-19T11:57:20.000Z","size":82678,"stargazers_count":325,"open_issues_count":0,"forks_count":59,"subscribers_count":10,"default_branch":"main","last_synced_at":"2025-06-10T15:46:16.926Z","etag":null,"topics":["bert","event-extraction","intent-classification","named-entity-recognition","natural-language-processing","nlp","nlp-extraction-tasks","openue","pytorch","relation-extraction","slot-filling","triple-extraction"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zjunlp.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-06-11T06:09:45.000Z","updated_at":"2025-04-30T09:23:00.000Z","dependencies_parsed_at":"2022-08-12T21:21:37.406Z","dependency_job_id":null,"html_url":"https://github.com/zjunlp/OpenUE","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/zjunlp/OpenUE","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FOpenUE","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FOpenUE/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FOpenUE/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FOpenUE/manifests","owner_url":"https://repos.
ecosyste.ms/api/v1/hosts/GitHub/owners/zjunlp","download_url":"https://codeload.github.com/zjunlp/OpenUE/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FOpenUE/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259732772,"owners_count":22903087,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bert","event-extraction","intent-classification","named-entity-recognition","natural-language-processing","nlp","nlp-extraction-tasks","openue","pytorch","relation-extraction","slot-filling","triple-extraction"],"created_at":"2024-08-03T09:01:56.656Z","updated_at":"2025-06-13T23:04:53.221Z","avatar_url":"https://github.com/zjunlp.png","language":"Python","readme":"[**中文说明**](https://github.com/zjunlp/OpenUE/blob/main/README.md) | [**English**](https://github.com/zjunlp/OpenUE/blob/main/README_EN.md)\n\u003cp align=\"center\"\u003e\n    \u003ca href=\"https://github.com/zjunlp/openue\"\u003e \u003cimg src=\"https://github.com/zjunlp/OpenUE/blob/main/imgs/logo.png\" width=\"400\"/\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n\u003cstrong\u003e OpenUE is a lightweight toolkit for knowledge graph extraction. 
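\n    \u003c/strong\u003e\n\u003c/p\u003e\n    \u003cp align=\"center\"\u003e\n    \u003ca href=\"https://badge.fury.io/py/openue\"\u003e\n        \u003cimg src=\"https://badge.fury.io/py/openue.svg\"\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://github.com/zjunlp/OpenUE/blob/main/LICENSE\"\u003e\n        \u003cimg alt=\"GitHub\" src=\"https://img.shields.io/github/license/zjunlp/openue.svg?color=green\"\u003e\n    \u003c/a\u003e\n        \u003ca href=\"http://openue.zjukg.org\"\u003e\n        \u003cimg alt=\"Documentation\" src=\"https://img.shields.io/website/http/huggingface.co/transformers/index.html.svg?down_color=red\u0026down_message=offline\u0026up_message=online\"\u003e\n    \u003c/a\u003e\n\u003c/p\u003e\n\n[OpenUE](https://aclanthology.org/2020.emnlp-demos.1/) is a lightweight toolkit for knowledge graph extraction.\n\n**Features**\n\n\n  - Knowledge graph extraction tasks based on pretrained language models (compatible with BERT, RoBERTa, and other pretrained models)\n    - Entity and relation extraction\n    - Event extraction\n    - Slot filling and intent detection\n    - \u003cem\u003e More tasks \u003c/em\u003e\n  - Training and testing interfaces\n  - Fast deployment of NLP models\n\n## Requirements\n\n  - Python 3.8\n  - requirements.txt\n\n\n## Architecture\n\n![Architecture](./imgs/overview1.png)\n\nOpenUE consists of three main modules: `models`, `lit_models`, and `data`.\n\n### models module\n\nThis module holds our three main models: a relation classification model that operates on the whole sentence, a named entity recognition model that operates on a sentence whose relations are already known, and an inference model that combines the two for validation. They are mainly built on the pretrained models defined in the `transformers` library.\n\n### lit_models module\n\nThe code here mainly builds on `pytorch_lightning`, whose `Trainer` automatically sets up training on different hardware, including single-GPU, multi-GPU, and TPU setups. We only define `training_step` and `validation_step`, and the training logic is built automatically.\n\nBecause this module is hardware-agnostic, the OpenUE training module can be called in many different environments.\n\n### data module\n\n`data` holds the dataset-specific processing code. It first tokenizes the data with the `tokenizer` from the `transformers` library and then converts the data into the features each task needs.\n\n## Quick Start\n\n### Installation\n\n#### Anaconda environment\n\n```\nconda create -n openue python=3.8\nconda activate openue\npip install -r requirements.txt\nconda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c nvidia # choose the cudatoolkit version that matches your NVIDIA driver\npython setup.py install\n```\n\n#### Install with pip\n\n```shell\npip install openue\n```\n\n#### Local development install\n\n```shell\npython setup.py develop\n```\n\n#### 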
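Usage\n\nThe input data is a `json` file. An example record is shown below.\n\n```json\n{\n\t\"text\": \"查尔斯·阿兰基斯（Charles Aránguiz），1989年4月17日出生于智利圣地亚哥，智利职业足球运动员，司职中场，效力于德国足球甲级联赛勒沃库森足球俱乐部\",\n\t\"spo_list\": [{\n\t\t\"predicate\": \"出生地\",\n\t\t\"object_type\": \"地点\",\n\t\t\"subject_type\": \"人物\",\n\t\t\"object\": \"圣地亚哥\",\n\t\t\"subject\": \"查尔斯·阿兰基斯\"\n\t}, {\n\t\t\"predicate\": \"出生日期\",\n\t\t\"object_type\": \"Date\",\n\t\t\"subject_type\": \"人物\",\n\t\t\"object\": \"1989年4月17日\",\n\t\t\"subject\": \"查尔斯·阿兰基斯\"\n\t}]\n}\n```\n\n### Training\n\nPlace the data under `./dataset/` before training. If the directory is empty, running the scripts below automatically downloads the dataset and the pretrained model and then starts training. Keep the network connection stable so that the downloads do not fail.\n\n```shell\n# Train the NER (named entity recognition) module\n./scripts/run_ner.sh\n# Train the SEQ (sentence-level relation classification) module\n./scripts/run_seq.sh\n```\n\nThe short demo below illustrates the training process; it trains on only one batch to keep the demonstration fast.\n![Training demo](./imgs/demo.gif)\n\n### Evaluation\n\nBecause OpenUE uses a pipeline model, the two modules cannot be trained jointly; they must be trained separately and then evaluated together. After both training scripts have run, two sets of model weights are saved under `output`, at `output/ner/${dataset}` and `output/seq/${dataset}`, with one directory per dataset. Pass these directories as `ner_model_name_or_path` and `seq_model_name_or_path` in `run_infer.yaml` or in the `run_infer.sh` script to run the evaluation.\n\n### Notebook Quick Start\n\n[ske dataset training notebook](https://github.com/zjunlp/OpenUE/blob/pytorch/ske.ipynb)\nuses a Chinese dataset as an example to show in detail how to use `lit_models`, `models`, and `data` in openue, so that users can build their own training logic.\n\n[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1VNhFYcqDbXl1b3HzU8sc-NgbhV2ZyYzW?usp=sharing)\n runs in the Colab cloud environment, so no local setup is needed.\n\n\u003c!-- ![image](https://user-images.githubusercontent.com/31753427/140022588-c3b38495-89b1-4f3c-8298-bcc1086f78bf.png) --\u003e\n\n### Automatic Hyperparameter Tuning Support (wandb)\n\n```python\nimport pytorch_lightning as pl\n# Replace the logger in the code with the wandb logger to enable wandb\nlogger = pl.loggers.WandbLogger(project=\"openue\")\n```\n\n### English Support\n\nFor English datasets, the only parameter that needs to change is `model_name_or_path`, which selects the pretrained language model weights. Because the `transformers` library handles both models in the same way, it is enough to replace the Chinese pretrained language model `bert-base-chinese` with the English pretrained language model `bert-base-uncased`.\n\n## Model Deployment\n\n### Download torchserve-docker\n\n[Docker download instructions](https://github.com/pytorch/serve/blob/master/docker/README.md)\n\n### 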
Create the handler class for the model\n\nThe `deploy` folder already contains the deployment handlers `handler_seq.py` and `handler_ner.py`.\n\n```shell\n# Package the model files with torch-model-archiver.\n# --extra-files must include the following files:\n# config.json, setup_config.json: configuration for the model and for inference\n# vocab.txt: the vocabulary used by the tokenizer\n# model.py: the model code\n\ntorch-model-archiver --model-name BERTForNER_en  \\\n\t--version 1.0 --serialized-file ./ner_en/pytorch_model.bin \\\n\t--handler ./deploy/handler.py \\\n\t--extra-files \"./ner_en/config.json,./ner_en/setup_config.json,./ner_en/vocab.txt,./deploy/model.py\" -f\n\n# Copy the packaged .mar file into the model-store folder, then use curl to register it with the server running in docker.\n# Note: the commands below deploy BERTForSEQ_en.mar, which is produced by running the archiver with --model-name BERTForSEQ_en.\nsudo cp ./BERTForSEQ_en.mar /home/model-server/model-store/\ncurl -v -X POST \"http://localhost:3001/models?initial_workers=1\u0026synchronous=false\u0026url=BERTForSEQ_en.mar\u0026batch_size=1\u0026max_batch_delay=200\"\n```\n## Contributors\n\nZhejiang University: [张宁豫](https://person.zju.edu.cn/ningyu), 谢辛, 毕祯, 王泽元, 陈想, 余海阳, 邓淑敏, 叶宏彬, 田玺, 郑国轴, 陈华钧\n\nDAMO Academy: 陈漠沙, 谭传奇, 黄非\n\n\u003cbr\u003e\n\n## Citation\n\nIf you use or extend our work, please cite the following paper:\n\n```\n@inproceedings{DBLP:conf/emnlp/ZhangDBYYCHZC20,\n  author    = {Ningyu Zhang and\n               Shumin Deng and\n               Zhen Bi and\n               Haiyang Yu and\n               Jiacheng Yang and\n               Mosha Chen and\n               Fei Huang and\n               Wei Zhang and\n               Huajun Chen},\n  editor    = {Qun Liu and\n               David Schlangen},\n  title     = {OpenUE: An Open Toolkit of Universal Extraction from Text},\n  booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural\n               Language Processing: System Demonstrations, {EMNLP} 2020 - Demos,\n               Online, November 16-20, 2020},\n  pages     = {1--8},\n  publisher = {Association for Computational Linguistics},\n  year      = {2020},\n  url       = {https://doi.org/10.18653/v1/2020.emnlp-demos.1},\n  doi       = {10.18653/v1/2020.emnlp-demos.1},\n  timestamp = {Wed, 08 Sep 2021 16:17:48 +0200},\n  biburl    = 
{https://dblp.org/rec/conf/emnlp/ZhangDBYYCHZC20.bib},\n  bibsource = {dblp computer science bibliography, https://dblp.org}\n}\n```\n# Other Open-Source Knowledge Extraction Toolkits\n\n- [CogIE](https://github.com/jinzhuoran/CogIE)\n- [OpenNRE](https://github.com/thunlp/OpenNRE)\n- [OmniEvent](https://github.com/THU-KEG/OmniEvent)\n- [DeepKE](https://github.com/zjunlp/deepke)\n- [OpenIE](https://stanfordnlp.github.io/CoreNLP/openie.html)\n- [RESIN](https://github.com/RESIN-KAIROS/RESIN-pipeline-public)\n","funding_links":[],"categories":["关系抽取、信息抽取"],"sub_categories":["其他_文本生成、文本对话"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzjunlp%2Fopenue","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzjunlp%2Fopenue","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzjunlp%2Fopenue/lists"}