https://github.com/cmdcolin/protein_data_service
- Host: GitHub
- URL: https://github.com/cmdcolin/protein_data_service
- Owner: cmdcolin
- Created: 2019-02-23T23:42:33.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2019-03-01T02:25:25.000Z (over 7 years ago)
- Last Synced: 2025-01-15T08:44:07.355Z (over 1 year ago)
- Language: JavaScript
- Size: 51.9 MB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
# protein-data-service
Collects gene and protein data from several resources.
## Install
    yarn
## Run
    node server.js
## Usage
http://localhost:2999/?ensemblGeneId=ENSG00000000003&ensemblTranscriptId=ENST00000373020
* ensemblGeneId: required
* ensemblTranscriptId: optional; acts as a filter and cannot be specified without a gene ID

A transcript ID cannot be specified by itself because the variation BioMart only filters on gene ID.
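
The parameter rules above can be sketched as a small client-side helper. This is only an illustration: `buildQueryUrl` is a hypothetical function that is not part of this repository, and the base URL is assumed from the usage example.

```javascript
// Hypothetical helper (not part of this repository): builds the query URL
// and enforces the parameter rules described above.
function buildQueryUrl(
  { ensemblGeneId, ensemblTranscriptId } = {},
  base = 'http://localhost:2999/' // assumed from the usage example
) {
  // The gene ID is required; a transcript ID alone is rejected.
  if (!ensemblGeneId) {
    throw new Error('ensemblGeneId is required');
  }
  const url = new URL(base);
  url.searchParams.set('ensemblGeneId', ensemblGeneId);
  // The transcript ID is an optional filter on top of the gene ID.
  if (ensemblTranscriptId) {
    url.searchParams.set('ensemblTranscriptId', ensemblTranscriptId);
  }
  return url.toString();
}

console.log(
  buildQueryUrl({
    ensemblGeneId: 'ENSG00000000003',
    ensemblTranscriptId: 'ENST00000373020',
  })
);
```

Rejecting the request before it is sent avoids a round trip for a query the service cannot answer.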
## Data sources
Protein domains and COSMIC variants are retrieved from BioMart.
The count-level information is parsed from CosmicCodingMuts.vcf (COSMIC release v87, 13 November 2018), which is pre-processed and loaded into SQLite.
Ensembl 95 is currently accessed through BioMart.
## Sample
Returns a JSON object of the format

    {
      "variants": [],
      "domains": [],
      "protein": []
    }

A sample element of the variant array is

    {
      "uniqueId": "COSM1736737",
      "start": 236,
      "end": 237,
      "seq_id": "TSPAN6",
      "score": 1
    }

A sample element of the domain array is

    {
      "uniqueId": "IPR000301_8_245",
      "start": 8,
      "end": 245,
      "seq_id": "TSPAN6",
      "type": "Tetraspanin"
    }

A sample sequence is

    {
      "name": "TSPAN6",
      "sequences": {
        "aminoAcid": "MASPSRRLQTKPVITCFKSVLLIYTFIFWITGVILLAVGIWGKVSLENYFSLLNEKATNVPFVLIATGTVIILLGTFGCFATCRASAWMLKLYAMFLTLVFLVELVAAIVGFVFRHEIKNSFKNNYEKALKQYNSTGDYRSHAVDKIQNTLHCCGVTDYRDWTDTNYYSEKGFPKSCCKLEDCTPQRDADKVNNEGCFIKVMTIIESEMGVVAGISFGVACFQLIGIFLAYCLSRAITNNQYEIV",
        "translatedDna": "ATGGCGTCCCCGTCTCGGAGACTGCAGACTAAACCAGTCATTACTTGTTTCAAGAGCGTTCTGCTAATCTACACTTTTATTTTCTGGATCACTGGCGTTATCCTTCTTGCAGTTGGCATTTGGGGCAAGGTGAGCCTGGAGAATTACTTTTCTCTTTTAAATGAGAAGGCCACCAATGTCCCCTTCGTGCTCATTGCTACTGGTACCGTCATTATTCTTTTGGGCACCTTTGGTTGTTTTGCTACCTGCCGAGCTTCTGCATGGATGCTAAAACTGTATGCAATGTTTCTGACTCTCGTTTTTTTGGTCGAACTGGTCGCTGCCATCGTAGGATTTGTTTTCAGACATGAGATTAAGAACAGCTTTAAGAATAATTATGAGAAGGCTTTGAAGCAGTATAACTCTACAGGAGATTATAGAAGCCATGCAGTAGACAAGATCCAAAATACGTTGCATTGTTGTGGTGTCACCGATTATAGAGATTGGACAGATACTAATTATTACTCAGAAAAAGGATTTCCTAAGAGTTGCTGTAAACTTGAAGATTGTACTCCACAGAGAGATGCAGACAAAGTAAACAATGAAGGTTGTTTTATAAAGGTGATGACCATTATAGAGTCAGAAATGGGAGTCGTTGCAGGAATTTCCTTTGGAGTTGCTTGCTTCCAACTGATTGGAATCTTTCTCGCCTACTGCCTCTCTCGTGCCATAACAAATAACCAGTATGAGATAGTG"
      }
    }
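
As a rough illustration of how a client might combine the arrays in a response, the sketch below groups variants by the domain that contains them. `variantsByDomain` is a hypothetical helper, not part of this repository, and it assumes that variant and domain `start`/`end` values share one coordinate system, which the README does not state.

```javascript
// Hypothetical helper (not part of this repository): for each domain, list
// the variants on the same sequence that fall entirely inside it.
// Assumes variants and domains use the same coordinate system.
function variantsByDomain(response) {
  return response.domains.map((domain) => ({
    domain: domain.uniqueId,
    type: domain.type,
    variants: response.variants.filter(
      (v) =>
        v.seq_id === domain.seq_id &&
        v.start >= domain.start &&
        v.end <= domain.end
    ),
  }));
}

// Built from the sample elements shown above.
const sampleResponse = {
  variants: [
    { uniqueId: 'COSM1736737', start: 236, end: 237, seq_id: 'TSPAN6', score: 1 },
  ],
  domains: [
    { uniqueId: 'IPR000301_8_245', start: 8, end: 245, seq_id: 'TSPAN6', type: 'Tetraspanin' },
  ],
  protein: [],
};

console.log(JSON.stringify(variantsByDomain(sampleResponse), null, 2));
```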