https://github.com/gyorilab/outbreak_kg

Knowledge graphs for disease outbreak scenarios

Knowledge graph and ontology for disease outbreak scenarios
===========================================================

This project implements a knowledge graph framework for representing
disease outbreak scenarios. The knowledge graph is built by
processing disease outbreak alerts from ProMED and other sources and
combining them with ontological information to create a structured
representation of outbreak events.

Sources
-------

The KG builds on the following sources:
- Outbreak alerts: text from outbreak alerts is processed, and
terms representing diseases/phenotypes/pathogens/symptoms/geolocations
are automatically extracted using the Gilda system. This produces `mentioned_in` relationships.
- Individual alerts are grouped by the inferred outbreak that they belong to, represented
as a `has_outbreak` relationship.
- Taxonomy of diseases: extracted from the Medical Subject Headings tree structure
- Taxonomy of pathogens: extracted from the Medical Subject Headings tree structure
- Taxonomy of geolocations: extracted from the Medical Subject Headings tree structure
- Additional taxonomy of geolocations: extracted from the GeoNames dataset, identified by the prefix `geonames`
- Geolocation relations: geolocations are linked together to represent hierarchical inclusion
using the `isa` relationship where the subsumed region is the source of the relationship
and the containing region is the target.
- Pathogen-disease relations: the fact that a pathogen causes a disease
is represented as a `has_pathogen` relationship.
- Disease-phenotype/symptom relations: the fact that a disease
causes a phenotype/symptom is represented as a `has_phenotype` relationship.
- Development/health indicators: extracted from WDI data; geolocations are
linked to indicators with the `has_indicator` relationship.
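
The direction of the `isa` relationship described above (subsumed region as source, containing region as target) can be illustrated with a small sketch. This is not code from the repository: the triples, the `example:` identifiers, and the `containing_regions` helper are all hypothetical, and the graph is held in a plain Python list rather than in Neo4j.

```python
from collections import defaultdict

# Hypothetical (source, relation, target) triples. The identifiers are
# placeholders, not real MeSH or GeoNames entries.
EDGES = [
    ("example:city", "isa", "example:state"),
    ("example:state", "isa", "example:country"),
    ("example:country", "isa", "example:continent"),
    ("example:other_city", "isa", "example:state"),
]


def containing_regions(region, edges):
    """Return every region that contains `region`, nearest first."""
    # Index `isa` edges by source: subsumed region -> containing regions.
    parents = defaultdict(list)
    for source, relation, target in edges:
        if relation == "isa":
            parents[source].append(target)

    # Breadth-first walk up the hierarchy, guarding against cycles.
    result = []
    seen = {region}
    queue = [region]
    while queue:
        current = queue.pop(0)
        for parent in parents[current]:
            if parent not in seen:
                seen.add(parent)
                result.append(parent)
                queue.append(parent)
    return result
```

Following `isa` edges from source to target always moves from a smaller region to a larger one, so a lookup like `containing_regions("example:city", EDGES)` walks upward through the hierarchy.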

Knowledge graph
---------------

The knowledge graph is represented as a set of nodes and edges. Nodes
represent entities such as diseases, pathogens, geolocations, and
phenotypes. Edges represent relationships between these entities.
The knowledge graph is deployed in a Neo4j database.

![Outbreak KG schema](kg/static/outbreak_kg_schema.png)

Interaction
-----------

The knowledge graph can be queried using the Cypher query language
directly through Neo4j. The repository also provides a Python client
and a REST API for querying the knowledge graph. The graph database
and the surrounding REST API are Dockerized and deployed on AWS.
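
As a rough illustration of direct Cypher access, the sketch below builds a parameterized query and runs it with the official `neo4j` Python driver. It does not use the repository's own Python client, whose interface is not described here. The `id` node property, the `example:term` identifier, the connection details, and both helper functions are assumptions made for the example; only the `mentioned_in` relationship name comes from the description above. The pattern is left undirected because the edge direction is not stated.

```python
def build_mentions_query(term_id, limit=10):
    """Build a parameterized Cypher query for alerts linked to a term.

    Assumes nodes carry an `id` property (hypothetical).
    """
    query = (
        "MATCH (term {id: $term_id})-[:mentioned_in]-(alert) "
        "RETURN alert.id AS alert_id "
        "LIMIT $limit"
    )
    return query, {"term_id": term_id, "limit": limit}


def run_query(uri, user, password, query, params):
    """Run a query with the `neo4j` driver and return the rows as dicts."""
    # Imported lazily so the query builder works without the driver installed.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            # Consume the result while the session is still open.
            return [record.data() for record in session.run(query, params)]
```

Passing values as parameters (`$term_id`, `$limit`) rather than formatting them into the query string avoids quoting problems and lets Neo4j reuse the query plan. A call might look like `run_query("bolt://localhost:7687", "neo4j", "<password>", *build_mentions_query("example:term"))`, where the URI and credentials are placeholders for a local deployment.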

Funding
-------

This work was supported by [CAPTRS](https://captrs.org/).