https://github.com/deib-geco/cov2k_data_collector
- Host: GitHub
- URL: https://github.com/deib-geco/cov2k_data_collector
- Owner: DEIB-GECO
- License: apache-2.0
- Created: 2021-11-09T16:20:27.000Z (almost 4 years ago)
- Default Branch: master
- Last Pushed: 2022-04-08T16:00:44.000Z (over 3 years ago)
- Last Synced: 2025-03-18T07:12:33.983Z (7 months ago)
- Language: Python
- Size: 1.15 MB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# cov2k_data_collector
This repository contains a set of scripts that generate a knowledge base of SARS-CoV-2 notions and their relations, including:
- variants (sources: Pangolin, World Health Organization, Covariants, Public Health England)
- variant characterizations in terms of amino acid changes and nucleotide mutations (sources: Covariants, Public Health England)
- reported effects of variants, amino acid changes or groups of changes (sources: COG-UK Mutation Explorer, scientific articles and online resources...)
- literature resources
- nucleotide annotations and translated regions (source: NCBI)
- chemical properties of single amino acids and amino acid changes (source: NCBI)

The entities populating the knowledge base are collected from several data sources, then cleaned, transformed and harmonized so that queries can be executed over the entire set of processed data.
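The harmonization step can be pictured with a small Python sketch. It assumes, hypothetically, that different sources spell the same amino acid change in different ways (for example `S:N501Y`, `Spike N501Y` or `spike_N501Y`) and maps each spelling onto one canonical record before merging records by change. The alias table, the accepted notations and the names `normalize_change`, `merge_sources` and `AminoAcidChange` are illustrative assumptions, not the repository's actual code or data formats.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical alias table: source-specific protein names -> one canonical name.
PROTEIN_ALIASES = {"s": "Spike", "spike": "Spike", "n": "N", "orf1ab": "ORF1ab"}

# Assumed notation: <protein><sep><ref><position><alt>, where sep is ':', '_' or a space,
# residues are one-letter codes, '*' marks a stop codon and '-' a deletion.
_CHANGE_RE = re.compile(r"^\s*([A-Za-z0-9]+)\s*[:_ ]\s*([A-Z*\-])(\d+)([A-Z*\-])\s*$")


@dataclass(frozen=True)  # frozen -> hashable, so it can be used as a dict key
class AminoAcidChange:
    protein: str
    ref: str
    position: int
    alt: str

    def __str__(self) -> str:
        return f"{self.protein}:{self.ref}{self.position}{self.alt}"


def normalize_change(raw: str) -> AminoAcidChange:
    """Parse one source-specific spelling into the canonical record."""
    match = _CHANGE_RE.match(raw)
    if match is None:
        raise ValueError(f"unrecognized amino acid change: {raw!r}")
    protein, ref, pos, alt = match.groups()
    canonical = PROTEIN_ALIASES.get(protein.lower(), protein)
    return AminoAcidChange(canonical, ref, int(pos), alt)


def merge_sources(
    records: Iterable[Tuple[str, str]]
) -> Dict[AminoAcidChange, List[str]]:
    """Group (source_name, raw_change) pairs by canonical change."""
    merged: Dict[AminoAcidChange, Set[str]] = {}
    for source, raw in records:
        merged.setdefault(normalize_change(raw), set()).add(source)
    # Sorted source lists make the output deterministic.
    return {change: sorted(sources) for change, sources in merged.items()}


if __name__ == "__main__":
    # Hypothetical source names and spellings, for illustration only.
    records = [
        ("source_a", "S:N501Y"),
        ("source_b", "Spike N501Y"),
        ("source_c", "spike_N501Y"),
        ("source_b", "S:E484K"),
    ]
    for change, sources in merge_sources(records).items():
        print(change, sources)
```

Using a frozen dataclass as the dictionary key means that the three spellings of N501Y collapse into a single `Spike:N501Y` entry listing all three sources, which is the kind of uniform representation that lets one query cover every source at once.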
An instance of the data set generated by the integration pipeline is presented at http://gmql.eu/cov2k/api/redoc.
The paper "CoV2K model, a comprehensive representation of SARS-CoV-2 knowledge and data interplay" is currently under submission.
The content of this repository is released under the Apache 2.0 license.