{"id":13651139,"url":"https://github.com/songjiang0909/awesome-knowledge-graph-construction","last_synced_at":"2025-04-22T22:30:32.292Z","repository":{"id":59839622,"uuid":"230822240","full_name":"songjiang0909/awesome-knowledge-graph-construction","owner":"songjiang0909","description":null,"archived":false,"fork":false,"pushed_at":"2021-11-17T05:51:11.000Z","size":72,"stargazers_count":122,"open_issues_count":0,"forks_count":14,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-04-22T10:11:34.543Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/songjiang0909.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-12-30T00:51:47.000Z","updated_at":"2025-03-30T12:44:56.000Z","dependencies_parsed_at":"2022-09-22T22:22:21.700Z","dependency_job_id":null,"html_url":"https://github.com/songjiang0909/awesome-knowledge-graph-construction","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/songjiang0909%2Fawesome-knowledge-graph-construction","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/songjiang0909%2Fawesome-knowledge-graph-construction/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/songjiang0909%2Fawesome-knowledge-graph-construction/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/songjiang0909%2Fawesome-knowledge-graph-construction/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/songj
iang0909","download_url":"https://codeload.github.com/songjiang0909/awesome-knowledge-graph-construction/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250333857,"owners_count":21413470,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T02:00:45.637Z","updated_at":"2025-04-22T22:30:32.047Z","avatar_url":"https://github.com/songjiang0909.png","language":null,"funding_links":[],"categories":["Knowledge Graph","知识图谱"],"sub_categories":["其他_文本生成、文本对话"],"readme":"# Awesome Knowledge Graph Construction [![Awesome](https://awesome.re/badge.svg)](https://awesome.re)\n\nA collection of knowledge graph construction resources. 
[Last update: Jan 2020]\n\n## Contents\n* [Research Trends and Surveys](#research-trends-and-surveys)\n* [Papers](#papers)\n\t* [Curated Approaches](#curated-approaches)\n\t* [Collaborative Approaches](#collaborative-approaches)\n\t* [Automated Semi-structured Approaches](#automated-semi-structured-approaches)\n\t* [Automated Unstructured Approaches](#automated-unstructured-approaches)\n\t\t* [Schema-based Approaches](#schema-based-approaches)\n\t\t* [Open Information Extraction](#open-information-extraction)\n* [Lectures](#lectures)\n\t* [Tutorials](#tutorials)\n\t* [Videos and Slides](#videos-and-slides)\n* [Datasets](#datasets)\n* [Systems and Tools](#systems-and-tools)\n\n\n## Research Trends and Surveys\n\n* From Information to Knowledge: Harvesting Entities and Relationships from Web Sources (Weikum et al, 2010) [[paper]](https://people.mpi-inf.mpg.de/~weikum/pods2010-weikum\u0026theobald.pdf)\n* Advances in Automated Knowledge Base Construction (Suchanek et al, 2012) [[paper]](https://pdfs.semanticscholar.org/709e/64be9cc9eb7c8b29bf49237cd2df835efd24.pdf)\n* TAC-Knowledge Base Population challenge (Ji et al) [[2019]](https://blender.cs.illinois.edu/paper/ji2019kbp.pdf) [[2017]](http://nlp.cs.rpi.edu/paper/kbp2017.pdf) [[2016]](http://nlp.cs.rpi.edu/paper/kbp2016.pdf) [[2015]](https://pdfs.semanticscholar.org/955a/78a8a5e4e31d10ffc827f365bd4c4f30d563.pdf)\n* A Survey on Open Information Extraction (Niklaus et al, 2018) [[paper]](https://www.aclweb.org/anthology/C18-1326.pdf)\n\n## Papers\n\n### Curated Approaches\n\nTriples are collected by domain experts.\n\n* CYC: A Large-scale Investment in Knowledge Infrastructure [[paper]](https://www.cc.gatech.edu/~isbell/classes/reading/papers/lenat95cyc.pdf)\n\t* Brief introduction: A universal schema of roughly 10^5 general concepts spanning human reality.\n\t* Authors: Douglas B. 
Lenat\n\t* Venue: Communications of the ACM, 1995\n* WordNet: A Lexical Database for English [[paper]](http://l2r.cs.uiuc.edu/Teaching/CS598-05/Papers/miller95.pdf)\n\t* Brief introduction: WordNet is an online lexical database designed for use under program control.\n\t* Authors: George A. Miller (Princeton University)\n\t* Venue: Communications of the ACM, 1995\n* The Unified Medical Language System (UMLS): integrating biomedical terminology [[paper]](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC308795/)\n\t* Brief introduction: A repository of biomedical vocabularies developed by the US National Library of Medicine. The UMLS integrates over 900,000 concepts, as well as 12 million relations among these concepts.\n\t* Authors: Olivier Bodenreider (Lister Hill National Center for Biomedical Communications)\n\t* Venue: Nucleic Acids Research, 2004\n\n### Collaborative Approaches\n\nTriples are collected by volunteers.\n\n* Wikidata: a free collaborative knowledgebase [[paper]](http://ws.nju.edu.cn/courses/ke/reading/3_wikidata.pdf)\n\t* Brief introduction: Wikidata is a collaborative knowledge base that collects structured data to support Wikipedia and Wikimedia Commons.\n\t* Authors: Denny Vrandečić and Markus Krötzsch\n\t* Venue: Communications of the ACM, 2014\n* Freebase: a collaboratively created graph database for structuring human knowledge [[paper]](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.538.7139\u0026rep=rep1\u0026type=pdf)\n\t* Brief introduction: Freebase is a tuple knowledge base used to structure general human knowledge; it is collaboratively created, structured, and maintained.\n\t* Authors: Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, Jamie Taylor (Metaweb Technologies, Inc)\n\t* Venue: SIGMOD'08\n\n### Automated Semi-structured Approaches\n\nTriples are collected from semi-structured data sources via rule-based methods.\n\n* YAGO: A Core of Semantic Knowledge [[paper]](https://www2007.org/papers/paper391.pdf)\n\t* Brief introduction: Triples are 
automatically extracted from Wikipedia and unified with WordNet, using a combination of rule-based and heuristic methods.\n\t* Authors: Fabian M. Suchanek, Gjergji Kasneci, Gerhard Weikum (Max-Planck-Institut)\n\t* Venue: WWW'07\n* YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia [[paper]](https://www.sciencedirect.com/science/article/pii/S0004370212000719)\n\t* Brief introduction: An extension of the YAGO knowledge base, in which triples are anchored in both time and space. YAGO2 is built automatically from Wikipedia, GeoNames, and WordNet.\n\t* Authors: Johannes Hoffart, Fabian M. Suchanek, Klaus Berberich and Gerhard Weikum (Max-Planck-Institut)\n\t* Venue: Artificial Intelligence, 2013\n* DBpedia: A Nucleus for a Web of Open Data [[paper]](https://www.cis.upenn.edu/~zives/research/dbpedia.pdf)\n\t* Brief introduction: Extract triples from Wikipedia using a template-based pattern matching method.\n\t* Authors: S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. 
Ives (University of Pennsylvania \u0026 Universität Leipzig)\n\t* Venue: The Semantic Web'07\n* CERES: Distantly Supervised Relation Extraction from the Semi-Structured Web [[paper]](http://www.vldb.org/pvldb/vol11/p1084-lockard.pdf)\n\t* Brief introduction: Propose an automatic knowledge extraction framework that improves the distant supervision assumption for triple extraction.\n\t* Authors: Colin Lockard, Xin Luna Dong, Arash Einolghozati and Prashant Shiralkar\n\t* Venue: VLDB'18\n\n### Automated Unstructured Approaches\n\nTriples are extracted from unstructured data via data-driven techniques.\n\n#### Schema-based Approaches\n\n* NELL: Toward an Architecture for Never-Ending Language Learning [[paper]](http://www.cs.cmu.edu/~acarlson/papers/carlson-aaai10.pdf)\n\t* Brief introduction: Continuously extract new knowledge from the Web through self-learning on a small number of samples.\n\t* Authors: Andrew Carlson, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam R. Hruschka Jr., and Tom M. 
Mitchell (CMU)\n\t* Venue: AAAI'10\n* PROSPERA: Scalable knowledge harvesting with high precision and high recall [[paper]](http://www.nakashole.com/papers/2011-wsdm-prospera.pdf)\n\t* Brief introduction: Reconcile precision, recall and scalability through extended n-gram pattern matching.\n\t* Authors: Ndapandula Nakashole, Martin Theobald, Gerhard Weikum (Max Planck Institute)\n\t* Venue: WSDM'11\n* DeepDive/Elementary: Large-scale knowledge-base construction via machine learning and statistical inference [[paper]](http://infolab.stanford.edu/hazy/papers/elementary_journal.pdf)\n\t* Brief introduction: Propose a Markov logic-based model and architecture for knowledge base construction (KBC) that integrates different kinds of data resources and KBC techniques.\n\t* Authors: Feng Niu, Ce Zhang, Christopher Ré, and Jude Shavlik (University of Wisconsin-Madison, Stanford University)\n\t* Venue: IJSWIS'12\n* Knowledge Vault: A Web-scale Approach to Probabilistic Knowledge Fusion [[paper]](https://www.cs.ubc.ca/~murphyk/Papers/kv-kdd14.pdf)\n\t* Brief introduction: Build Knowledge Vault, a Web-scale probabilistic knowledge base that combines extractions from Web content with prior knowledge derived from existing knowledge repositories, based on a distant supervision method.\n\t* Authors: Xin Luna Dong et al (Google)\n\t* Venue: KDD'14\n* Sealing Pipeline Leaks and Understanding Chinese [[paper]](https://www.cs.princeton.edu/~danqic/papers/tac2016.pdf)\n\t* Brief introduction: Propose a combined system consisting of several rule-based relation extractors and a distantly supervised extractor.\n\t* Authors: Yuhao Zhang, Arun Chaganty, Ashwin Paranjape, Danqi Chen, Jason Bolton, Peng Qi, Christopher D. 
Manning (Stanford University)\n\t* Venue: TAC'16\n* CoType: Joint Extraction of Typed Entities and Relations with Knowledge Bases [[paper]](https://arxiv.org/pdf/1610.08763.pdf)\n\t* Brief introduction: Joint extraction of typed entities and relations with labeled data obtained from knowledge bases with distant supervision.\n\t* Authors: Xiang Ren, Zeqiu Wu, Wenqi He, Meng Qu, Clare R. Voss, Heng Ji, Tarek F. Abdelzaher, Jiawei Han (UIUC \u0026 Army Research Laboratory)\n\t* Venue: WWW'17\n* Discovering Implicit Knowledge with Unary Relations [[paper]](https://www.aclweb.org/anthology/P18-1147.pdf)\n\t* Brief introduction: Extract implicit relations in text by converting binary relations to unary relations.\n\t* Authors: Michael Glass, Alfio Gliozzo (IBM Research)\n\t* Venue: ACL'18\n\n\n\n#### Open Information Extraction\n\n* Open Information Extraction from the Web [[paper]](https://www.aaai.org/Papers/IJCAI/2007/IJCAI07-429.pdf)\n\t* Brief introduction: The first paper on open information extraction, using a rule-based method.\n\t* Authors: Michele Banko, Michael J Cafarella, Stephen Soderland, Matt Broadhead and Oren Etzioni (University of Washington)\n\t* Venue: IJCAI'07\n* Identifying relations for open information extraction [[paper]](https://www.aclweb.org/anthology/D11-1142.pdf)\n\t* Brief introduction: Introduce syntactic and lexical constraints on binary relations expressed by verbs to reduce uninformative and incoherent extractions.\n\t* Authors: Anthony Fader, Stephen Soderland, and Oren Etzioni (University of Washington)\n\t* Venue: EMNLP'11\n* Open Language Learning for Information Extraction [[paper]](https://homes.cs.washington.edu/~mausam/papers/emnlp12a.pdf)\n\t* Brief introduction: An extension of OpenIE that adds noun- and adjective-mediated relations and takes context into consideration.\n\t* Authors: Mausam, Michael Schmitz, Robert Bart, Stephen Soderland, and Oren Etzioni (University of Washington)\n\t* Venue: EMNLP'12\n* Neural 
Open Information Extraction [[paper]](https://arxiv.org/pdf/1805.04270.pdf)\n\t* Brief introduction: Propose a neural encoder-decoder OpenIE framework. The model is trained with highly confident binary extractions bootstrapped from a state-of-the-art Open IE system, and can therefore generate high-quality tuples without any hand-crafted patterns.\n\t* Authors: Lei Cui, Furu Wei, and Ming Zhou (MSRA)\n\t* Venue: ACL'18\n* COMET: Commonsense Transformers for Automatic Knowledge Graph Construction [[paper]](https://arxiv.org/pdf/1906.05317.pdf)\n\t* Brief introduction: Construct commonsense knowledge graphs by using existing tuples as a seed set of knowledge for training. Using this seed set, a pre-trained language model (GPT) learns to adapt its learned representations to knowledge generation and produces novel tuples.\n\t* Authors: Antoine Bosselut, Hannah Rashkin, Maarten Sap, Chaitanya Malaviya, Asli Celikyilmaz and Yejin Choi (University of Washington)\n\t* Venue: ACL'19\n\n\n## Lectures\n\n### Tutorials\n* Mining Knowledge Graphs from Text. [[link]](https://kgtutorial.github.io/)\n\t* Jay Pujara (USC), Sameer Singh (UCI)\n\t* WSDM'18\n* Constructing Domain-specific Knowledge Graphs. 
[[link]](https://usc-isi-i2.github.io/AAAI18Tutorial/)\n\t* Craig Knoblock (USC), Pedro Szekely (USC), Mayank Kejriwal (USC)\n\t* AAAI'18\n\n### Videos and Slides\n* [Stanford University: CS124](https://web.stanford.edu/class/cs124/), Dan Jurafsky\n\t* (Video) [Week 5: Relation Extraction and Question](https://www.youtube.com/watch?v=5SUzf6252_0\u0026list=PLaZQkZp6WhWyszpcteV4LFgJ8lQJ5WIxK\u0026ab_channel=FromLanguagestoInformation)\n* [University of Washington: CSE517](https://courses.cs.washington.edu/courses/cse517/), Luke Zettlemoyer\n\t* (Slide) [Relation Extraction 1](https://courses.cs.washington.edu/courses/cse517/13wi/slides/cse517wi13-RelationExtraction.pdf)\n\t* (Slide) [Relation Extraction 2](https://courses.cs.washington.edu/courses/cse517/13wi/slides/cse517wi13-RelationExtractionII.pdf)\n* [New York University: CSCI-GA.2590](https://cs.nyu.edu/courses/spring17/CSCI-GA.2590-001/), Ralph Grishman\n\t* (Slide) [Relation Extraction: Rule-based Approaches](https://cs.nyu.edu/courses/spring17/CSCI-GA.2590-001/DependencyPaths.pdf)\n* [University of Michigan: Coursera](https://ai.umich.edu/portfolio/natural-language-processing/), Dragomir R. Radev\n\t* (Video) [Lecture 48: Relation Extraction](https://www.youtube.com/watch?v=TbrlRei_0h8\u0026ab_channel=ArtificialIntelligence-AllinOne)\n\n\n\n\n## Datasets\n* New York Times (NYT) Corpus [[paper]](http://www.riedelcastro.org//publications/papers/riedel10modeling.pdf) [[download]](https://catalog.ldc.upenn.edu/LDC2008T19)\n\t* This dataset was generated by aligning *Freebase* relations with the NYT corpus, with sentences from the years 2005-2006 used as the training corpus and sentences from 2007 used as the testing corpus.\n* FewRel: Few-Shot Relation Classification Dataset [[paper]](https://arxiv.org/abs/1810.10147) [[Website]](http://zhuhao.me/fewrel)\n\t* FewRel is a supervised few-shot relation classification dataset. 
The corpus is Wikipedia and the knowledge base used to annotate the corpus is Wikidata.\n* TupleInf Open IE Dataset [[Website]](http://data.allenai.org/tuple-ie/)\n\t* The TupleInf Open IE dataset contains Open IE tuples extracted from 263K sentences that were used by the solver in \"Answering Complex Questions Using Open Information Extraction\".\n\n\n\n## Systems and Tools\n* DeepDive (Christopher Ré et al, Stanford University) [[paper]](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5361060/pdf/nihms826683.pdf) [[System]](http://deepdive.stanford.edu/kbc)\n* Open Information Extraction (Stanford University NLP) [[System]](https://nlp.stanford.edu/software/openie.html)\n\n## References\n* This repo is based on [Sargur N. Srihari's slides](https://cedar.buffalo.edu/~srihari/CSE674/Chap22/22.1%20Knowledge%20Graphs.pdf). Many thanks!\n\n\n[Back to Top](#contents)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsongjiang0909%2Fawesome-knowledge-graph-construction","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsongjiang0909%2Fawesome-knowledge-graph-construction","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsongjiang0909%2Fawesome-knowledge-graph-construction/lists"}