https://github.com/dhimmel/entrez-gene
Processing the human Entrez Gene subset
- Host: GitHub
- URL: https://github.com/dhimmel/entrez-gene
- Owner: dhimmel
- License: cc0-1.0
- Created: 2015-04-20T17:36:15.000Z (over 9 years ago)
- Default Branch: gh-pages
- Last Pushed: 2016-02-20T02:48:14.000Z (over 8 years ago)
- Last Synced: 2024-05-02T06:00:20.478Z (6 months ago)
- Topics: entrez-gene, genes, hetionet, human, rephetio, terminology
- Language: Jupyter Notebook
- Homepage: https://doi.org/10.15363/thinklab.d34
- Size: 10.1 MB
- Stars: 3
- Watchers: 3
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
# Creating user-friendly Entrez Gene datasets for humans
[![DOI: 10.5281/zenodo.45524](https://zenodo.org/badge/doi/10.5281/zenodo.45524.svg)](http://dx.doi.org/10.5281/zenodo.45524 "Zenodo deposition of this repository")
Entrez Gene is the NCBI database of gene-specific information. It provides "tracked, unique identifiers for genes" and reports "information associated with those identifiers for unrestricted public use [[source](https://doi.org/10.1093/nar/gki031 "Entrez Gene Publication: gene-centered information at NCBI")]." We [use Entrez Gene](https://doi.org/10.15363/thinklab.d34 "Thinklab discussion: Using Entrez Gene as our gene vocabulary") as the primary gene vocabulary for our drug repurposing research.
This repository creates user-friendly datasets from Entrez Gene. We currently focus on human genes only.
The Python notebook [`process.ipynb`](process.ipynb) executes the analysis. Files downloaded from external locations are stored in [`download`](download). The generated datasets reside in [`data`](data):
+ [`genes-human.tsv`](data/genes-human.tsv): human genes with a select set of fields storing additional attributes
+ [`symbols-human.tsv`](data/symbols-human.tsv): a table of GeneID, symbol, and symbol type (synonym or primary)
+ [`symbols-human.json`](data/symbols-human.json): a Symbol–GeneID mapping of primary symbols only
+ [`synonyms-human.json`](data/synonyms-human.json): a Symbol–GeneIDs mapping for synonyms
+ [`symbol-map.json`](data/symbol-map.json): a Symbol–GeneID mapping with approved symbols and unambiguous synonyms
+ [`xrefs-human.tsv`](data/xrefs-human.tsv): mappings to external resources
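
As a rough illustration of how the JSON mappings listed above might be consumed, the following sketch loads a Symbol–GeneID mapping and converts a list of symbols to GeneIDs. It assumes the repository has been cloned so that `data/symbol-map.json` exists locally and that the file is a flat JSON object keyed by symbol. The helper names `load_symbol_map` and `symbols_to_gene_ids` are hypothetical and not part of this repository.

```python
import json
from pathlib import Path


def load_symbol_map(path):
    """Load a Symbol -> GeneID mapping from a JSON file.

    Assumption: the file is a flat JSON object whose keys are gene symbols.
    """
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def symbols_to_gene_ids(symbols, symbol_map):
    """Return (resolved, unresolved) for an iterable of gene symbols.

    resolved is a dict of symbol -> GeneID for symbols present in the mapping;
    unresolved is a list of symbols that were not found, in input order.
    """
    resolved = {}
    unresolved = []
    for symbol in symbols:
        if symbol in symbol_map:
            resolved[symbol] = symbol_map[symbol]
        else:
            unresolved.append(symbol)
    return resolved, unresolved


if __name__ == "__main__":
    # Assumed relative path inside a local clone of the repository.
    path = Path("data/symbol-map.json")
    if path.exists():
        symbol_map = load_symbol_map(path)
        resolved, unresolved = symbols_to_gene_ids(["TP53", "BRCA1"], symbol_map)
        print(resolved)
        print(unresolved)
    else:
        print(f"{path} not found; run this from the repository root")
```

Returning the unresolved symbols separately makes it easy to see which inputs need manual review, for example ambiguous synonyms that `symbol-map.json` leaves out.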