https://github.com/mikelove/igvf_spdi_demo
- Host: GitHub
- URL: https://github.com/mikelove/igvf_spdi_demo
- Owner: mikelove
- Created: 2023-06-16T08:59:47.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-07-10T14:01:38.000Z (4 months ago)
- Last Synced: 2024-07-10T16:15:41.678Z (4 months ago)
- Language: R
- Size: 33.2 KB
- Stars: 1
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Demo of using SPDI tools for IGVF variant lists
Here I provide a demo for some of the NCBI tools for adding SPDI
unique IDs to variant lists. Briefly, the benefits of SPDI:

* resolves indel ambiguity
* validates, e.g. incorrect reference allele specification
* human readable, unique ID with broad usage across consortia
(dbSNP, ClinVar, etc.), NCBI support including API and toolkits

See the following for a tutorial on NCBI tools:

## Additions:
* I have added an example script `make_spdi_list.R` for converting from
arbitrary variant specification (1-based) to SPDI input (0-based) for
the NCBI Variant Services.
* I have added an option `SPDI` to the NCBI's original script
`spdi_batch.py` that will convert from 0-based input:

```
chrom:position:ref:alt
```

...to a unique SPDI (0-based and validated).

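The 1-based to 0-based conversion described above can be sketched in a few lines of Python. This is only an illustration, not the code of `make_spdi_list.R`: the function name `to_spdi_input` and the `GRCH38_ACCESSIONS` table are hypothetical, the table lists only the two RefSeq accessions that appear in this README, and the input is assumed to look like `1_25253604_hg38_G_A`.

```python
# Partial GRCh38 chromosome -> RefSeq accession map (hypothetical helper).
# Only the two accessions that appear in this README are listed;
# extend it for other chromosomes as needed.
GRCH38_ACCESSIONS = {
    "1": "NC_000001.11",
    "21": "NC_000021.9",
}


def to_spdi_input(variant, sep="_"):
    """Convert 'chrom_pos_build_ref_alt' (1-based) to 'accession:pos:ref:alt' (0-based)."""
    chrom, pos, build, ref, alt = variant.strip().split(sep)
    if build != "hg38":
        raise ValueError(f"unsupported genome build: {build}")
    # Accept both '1' and 'chr1' style chromosome names.
    chrom = chrom[3:] if chrom.startswith("chr") else chrom
    accession = GRCH38_ACCESSIONS[chrom]
    # SPDI positions are 0-based, so shift the 1-based coordinate down by one.
    return f"{accession}:{int(pos) - 1}:{ref}:{alt}"


print(to_spdi_input("1_25253604_hg38_G_A"))
# prints NC_000001.11:25253603:G:A
```

The result is still unvalidated input; the reference allele is only checked, and indels only normalized, once it goes through `spdi_batch.py`.
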
## IGVF use case:
Suppose we have a variant list that looks like "1_25253604_hg38_G_A"
(1-based positions, separated by underscore), in the file
`variants.txt`.

Note that this is easily customizable with arguments within
`make_spdi_list.R`.

```
> Rscript make_spdi_list.R variants.txt
> head -100 spdi_for_batch_processing.txt > spdi_100.txt
> python spdi_batch.py -i spdi_100.txt -t SPDI
NC_000001.11:25253603:G:A NC_000001.11:25253603:G:A
NC_000001.11:25336579:C:G NC_000001.11:25336579:C:G
NC_000001.11:25341419:G:A NC_000001.11:25341419:G:A
NC_000001.11:25341834:C:T NC_000001.11:25341834:C:T
NC_000001.11:25342222:T:C NC_000001.11:25342222:T:C
NC_000001.11:25348293:C:T NC_000001.11:25348293:C:T
...
```

**What does `warnings` mean?** This typically means that you have
mis-specified the reference allele of hg38.

You can check here (don't forget that the above positions are 0-based
while the genome browser is 1-based):

## Example with HGVS

```
> python spdi_batch.py -i test.txt -t HGVS
NC_000021.9:g.25716261G>A NC_000021.9:25716260:G:A
ERROR: status code = 400
NC_000021.9:g.25716536_25716537insAT NC_000021.9:25716536::AT
NC_000021.9:g.25716557del NC_000021.9:25716556:TTTT:TTT
NC_000021.9:g.25716558del NC_000021.9:25716556:TTTT:TTT
NC_000021.9:g.25716559del NC_000021.9:25716556:TTTT:TTT
NC_000021.9:g.25716560del NC_000021.9:25716556:TTTT:TTT
NC_000021.9:g.25716557dup NC_000021.9:25716556:TTTT:TTTTT
NC_000021.9:g.25716558dup NC_000021.9:25716556:TTTT:TTTTT
NC_000021.9:g.25716559dup NC_000021.9:25716556:TTTT:TTTTT
NC_000021.9:g.25716560dup NC_000021.9:25716556:TTTT:TTTTT
```
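
The output above illustrates how SPDI resolves indel ambiguity: several HGVS expressions in a run of `T`s map to the same SPDI. A minimal sketch for grouping such output by SPDI follows. It assumes each result line holds the input and the SPDI separated by whitespace, and the helper name `group_by_spdi` is hypothetical (it is not part of `spdi_batch.py`).

```python
from collections import defaultdict

# A few output lines copied from the HGVS example above (input, then SPDI).
BATCH_OUTPUT = """\
NC_000021.9:g.25716261G>A NC_000021.9:25716260:G:A
ERROR: status code = 400
NC_000021.9:g.25716557del NC_000021.9:25716556:TTTT:TTT
NC_000021.9:g.25716558del NC_000021.9:25716556:TTTT:TTT
NC_000021.9:g.25716559del NC_000021.9:25716556:TTTT:TTT
NC_000021.9:g.25716560del NC_000021.9:25716556:TTTT:TTT
NC_000021.9:g.25716557dup NC_000021.9:25716556:TTTT:TTTTT
"""


def group_by_spdi(text):
    """Map each SPDI to the list of input expressions that resolved to it."""
    groups = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and error lines, which are not input/SPDI pairs.
        if len(fields) != 2 or line.startswith("ERROR"):
            continue
        hgvs, spdi = fields
        groups[spdi].append(hgvs)
    return dict(groups)


groups = group_by_spdi(BATCH_OUTPUT)
for spdi, inputs in groups.items():
    print(f"{spdi}\t{len(inputs)} input(s)")
```

Here the four `del` expressions collapse to the single ID `NC_000021.9:25716556:TTTT:TTT`, which is what makes SPDI usable as a unique key when merging variant lists.
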