https://github.com/deib-geco/cov2k_data_collector



# cov2k_data_collector

This repository contains a set of scripts that generate a knowledge base of concepts about SARS-CoV-2 and the relations among them, including:
- variants (sources: Pangolin, World Health Organization, Covariants, Public Health England)
- variant characterizations in terms of amino acid changes and nucleotide mutations (sources: Covariants, Public Health England)
- reported effects of variants, amino acid changes, or groups of changes (sources: COG-UK Mutation Explorer, scientific articles and online resources...)
- literature resources
- nucleotide annotations and translated regions (source: NCBI)
- chemical properties of single amino acids and amino acid changes (source: NCBI)

The entities populating the knowledge base are collected from several data sources, then cleaned, transformed, and harmonized into a uniform representation, so that queries can be executed over the entire set of processed data.
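
As an illustration of the harmonization step, the following is a minimal sketch and not the code used in this repository. It assumes Python 3.9+ and that amino acid changes arrive as strings of the form `protein:REFposALT`. The names `AAChange`, `normalize_change`, `integrate`, `source_a`, and `source_b` are hypothetical, and the input records are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AAChange:
    """One amino acid change in a uniform representation (hypothetical schema)."""
    protein: str
    reference: str
    position: int
    alternative: str


def normalize_change(raw: str) -> AAChange:
    """Parse a string such as 'S:N501Y' or ' s:n501y ' into an AAChange."""
    # Upper-casing is a simplification: it also changes names like 'ORF1a'.
    protein, change = raw.strip().upper().split(":")
    return AAChange(protein, change[0], int(change[1:-1]), change[-1])


def integrate(sources: dict[str, dict[str, list[str]]]) -> dict[str, set[AAChange]]:
    """Merge per-source variant characterizations into a single lookup table."""
    merged: dict[str, set[AAChange]] = {}
    for variants in sources.values():
        for name, changes in variants.items():
            # Clean the variant name so that spellings from different sources match.
            merged.setdefault(name.strip().upper(), set()).update(
                normalize_change(c) for c in changes
            )
    return merged


# Illustrative input: two hypothetical sources with inconsistent formatting.
raw_sources = {
    "source_a": {"B.1.1.7": ["S:N501Y", "S:P681H"]},
    "source_b": {"b.1.1.7 ": ["s:n501y", "S:A570D"]},
}
knowledge_base = integrate(raw_sources)
```

Once every source is mapped onto the same representation, a single lookup such as `knowledge_base["B.1.1.7"]` returns the changes reported by any source, with duplicates removed.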

An instance of the data set generated by the integration pipeline is presented at http://gmql.eu/cov2k/api/redoc.

The paper "CoV2K model, a comprehensive representation of SARS-CoV-2 knowledge and data interplay" is currently under submission.

The content of this repository is released under the Apache 2.0 license.