https://github.com/mikelove/igvf_spdi_demo

# Demo of using SPDI tools for IGVF variant lists

Here I provide a demo of some of the NCBI tools for adding unique SPDI
(Sequence Position Deletion Insertion) IDs to variant lists. Briefly,
the benefits of SPDI are:

* resolves indel ambiguity
* validates input, e.g. catches an incorrectly specified reference allele
* human-readable, unique ID with broad usage across consortia
  (dbSNP, ClinVar, etc.), with NCBI support including an API and toolkits

See the following for a tutorial on NCBI tools:

## Additions:

* I have added an example script, `make_spdi_list.R`, for converting an
arbitrary variant specification (1-based) into SPDI input (0-based) for
the NCBI Variation Services.

* I have added an input type, `SPDI` (selected with `-t SPDI`), to NCBI's
original script `spdi_batch.py`; it converts 0-based input of the form:

```
chrom:position:ref:alt
```

...to a unique SPDI (0-based and validated).
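Both the input and the returned SPDI share the same four colon-separated fields: sequence, 0-based position, deleted bases and inserted bases (either allele may be empty for a pure insertion or deletion). The sketch below splits and format-checks such a string before submission. `Spdi` and `parse_spdi` are hypothetical names, and restricting alleles to `A`, `C`, `G`, `T`, `N` is an assumption; this only checks the format, while checking the reference allele against hg38 is what the NCBI service does.

```python
import re
from dataclasses import dataclass

# Assumption: alleles contain only A, C, G, T or N. Empty alleles are allowed,
# since SPDI writes pure insertions and deletions that way.
_ALLELE = re.compile(r"[ACGTN]*")


@dataclass(frozen=True)
class Spdi:
    sequence: str   # e.g. the RefSeq accession "NC_000001.11"
    position: int   # 0-based (interbase) start
    deletion: str   # reference bases removed, possibly empty
    insertion: str  # bases inserted, possibly empty

    def __str__(self) -> str:
        return f"{self.sequence}:{self.position}:{self.deletion}:{self.insertion}"


def parse_spdi(text: str) -> Spdi:
    """Split 'sequence:position:deletion:insertion' and check each field."""
    parts = text.strip().split(":")
    if len(parts) != 4:
        raise ValueError(f"expected 4 colon-separated fields: {text!r}")
    seq, pos, deleted, inserted = parts
    if not seq:
        raise ValueError(f"empty sequence field: {text!r}")
    if not pos.isdecimal():
        raise ValueError(f"position must be a non-negative integer: {text!r}")
    for allele in (deleted, inserted):
        if not _ALLELE.fullmatch(allele):
            raise ValueError(f"unexpected allele {allele!r} in {text!r}")
    return Spdi(seq, int(pos), deleted, inserted)
```

For example, `parse_spdi("NC_000021.9:25716536::AT")` yields an insertion with an empty `deletion` field, and `str()` turns the parsed value back into the original string.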

## IGVF use case:

Suppose we have a list of variants in the file `variants.txt`, with IDs
like `1_25253604_hg38_G_A` (chromosome, 1-based position, genome build,
reference allele and alternate allele, separated by underscores).

This input format is easy to customize via arguments in
`make_spdi_list.R`.
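The conversion step can be sketched in Python as follows. This is not the R script itself: `HG38_ACCESSIONS`, `to_spdi_input` and `convert_file` are hypothetical names, the accession table holds only the two GRCh38 RefSeq accessions that appear in this README (extend it for other chromosomes), and we assume, as the echoed first column of the batch output suggests, that the batch input uses the RefSeq accession in place of the chromosome name.

```python
# Hypothetical table: only the GRCh38 (hg38) RefSeq accessions that appear in
# this README. Add the other chromosomes you need.
HG38_ACCESSIONS = {"1": "NC_000001.11", "21": "NC_000021.9"}


def to_spdi_input(variant_id: str) -> str:
    """Convert a 1-based 'chrom_pos_build_ref_alt' ID to 0-based SPDI input."""
    chrom, pos, build, ref, alt = variant_id.split("_")
    if build != "hg38":
        raise ValueError(f"expected an hg38 variant, got {build!r}")
    accession = HG38_ACCESSIONS[chrom.removeprefix("chr")]
    # SPDI positions are 0-based, so subtract 1 from the 1-based position.
    return f"{accession}:{int(pos) - 1}:{ref}:{alt}"


def convert_file(in_path: str, out_path: str) -> None:
    """Write one SPDI input line for each non-empty line of in_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(to_spdi_input(line.strip()) + "\n")
```

`to_spdi_input("1_25253604_hg38_G_A")` returns `NC_000001.11:25253603:G:A`, the first line of the batch output below. `str.removeprefix` needs Python 3.9 or later.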

```
> Rscript make_spdi_list.R variants.txt
> head -100 spdi_for_batch_processing.txt > spdi_100.txt
> python spdi_batch.py -i spdi_100.txt -t SPDI

NC_000001.11:25253603:G:A NC_000001.11:25253603:G:A
NC_000001.11:25336579:C:G NC_000001.11:25336579:C:G
NC_000001.11:25341419:G:A NC_000001.11:25341419:G:A
NC_000001.11:25341834:C:T NC_000001.11:25341834:C:T
NC_000001.11:25342222:T:C NC_000001.11:25342222:T:C
NC_000001.11:25348293:C:T NC_000001.11:25348293:C:T
...
```
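With `-t SPDI`, each result line above appears to pair the submitted variant with the SPDI returned by the service, so one quick check is to compare the two columns: identical columns mean the variant passed through unchanged. The sketch below assumes that two-column, whitespace-separated layout; `summarize_batch` is a hypothetical helper, and any line that does not fit the layout (for example an error message) is collected separately for inspection.

```python
def summarize_batch(output: str) -> tuple[list[tuple[str, str]], list[str]]:
    """Split spdi_batch.py output into changed pairs and unparsed lines.

    Assumes each result line is 'input returned_spdi'. Returns
    (pairs whose columns differ, non-blank lines of any other shape).
    """
    changed: list[tuple[str, str]] = []
    other: list[str] = []
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) == 2:
            if fields[0] != fields[1]:
                changed.append((fields[0], fields[1]))
        else:
            other.append(line)
    return changed, other
```

Run on saved output, an empty `changed` list with no `other` lines means every submitted SPDI came back as-is.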

**What does `warnings` mean?** This typically means that the reference
allele you specified does not match hg38 at that position.

You can check the reference allele here (remember that the positions
above are 0-based, while the genome browser is 1-based):
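SPDI positions are 0-based and count the gaps between bases, so getting to the 1-based, inclusive coordinates a genome browser displays is a small offset. The hypothetical helper below sketches it: for a variant with deleted reference bases it returns the 1-based span of those bases, and for a pure insertion (empty deletion field) it returns the two bases flanking the insertion point.

```python
def browser_interval(position: int, deleted: str) -> tuple[int, int]:
    """Return a 1-based, inclusive (start, end) interval for a genome browser.

    position is the 0-based SPDI position; deleted is the SPDI deletion field.
    """
    if deleted:
        # The deleted bases occupy 0-based [position, position + len(deleted)).
        return position + 1, position + len(deleted)
    # Pure insertion: show the bases on either side of the insertion point
    # (only the right-hand base when inserting before the first base).
    return max(position, 1), position + 1
```

For `NC_000001.11:25253603:G:A` this gives `(25253604, 25253604)`, i.e. the `G` at 1-based position 25253604 on chr1.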

## Example with HGVS

```
> python spdi_batch.py -i test.txt -t HGVS

NC_000021.9:g.25716261G>A NC_000021.9:25716260:G:A
ERROR: status code = 400
NC_000021.9:g.25716536_25716537insAT NC_000021.9:25716536::AT
NC_000021.9:g.25716557del NC_000021.9:25716556:TTTT:TTT
NC_000021.9:g.25716558del NC_000021.9:25716556:TTTT:TTT
NC_000021.9:g.25716559del NC_000021.9:25716556:TTTT:TTT
NC_000021.9:g.25716560del NC_000021.9:25716556:TTTT:TTT
NC_000021.9:g.25716557dup NC_000021.9:25716556:TTTT:TTTTT
NC_000021.9:g.25716558dup NC_000021.9:25716556:TTTT:TTTTT
NC_000021.9:g.25716559dup NC_000021.9:25716556:TTTT:TTTTT
NC_000021.9:g.25716560dup NC_000021.9:25716556:TTTT:TTTTT
```
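In the output above, the four single-base deletions `g.25716557del` to `g.25716560del` all map to one SPDI, `NC_000021.9:25716556:TTTT:TTT`, and the four matching `dup` variants all map to `NC_000021.9:25716556:TTTT:TTTTT`: removing or duplicating any one T in the run of four Ts yields the same sequence. That is the indel ambiguity SPDI resolves. The function below is a simplified sketch of that behavior on a toy in-memory sequence, not NCBI's implementation: it trims bases shared by both alleles, then slides a pure insertion or deletion as far left and right as the repeat allows and reports the whole ambiguous region. `normalize_indel` and the toy sequence are hypothetical.

```python
def normalize_indel(ref: str, pos: int, deleted: str, inserted: str) -> tuple[int, str, str]:
    """Toy SPDI-style normalization against an in-memory reference string.

    pos is 0-based and deleted must equal ref[pos:pos + len(deleted)].
    Returns a (position, deleted, inserted) triple.
    """
    if ref[pos:pos + len(deleted)] != deleted:
        raise ValueError("deleted sequence does not match the reference")
    # Trim bases shared by both alleles: common suffix first, then prefix.
    while deleted and inserted and deleted[-1] == inserted[-1]:
        deleted, inserted = deleted[:-1], inserted[:-1]
    while deleted and inserted and deleted[0] == inserted[0]:
        deleted, inserted, pos = deleted[1:], inserted[1:], pos + 1
    if (deleted and inserted) or not (deleted or inserted):
        return pos, deleted, inserted  # substitution or no-op: nothing to slide

    # Pure insertion or deletion: rotate the allele through any repeat.
    allele = deleted or inserted
    span = len(deleted)  # reference bases covered by the allele (0 for insertions)

    left, left_allele = pos, allele
    while left > 0 and ref[left - 1] == left_allele[-1]:
        left, left_allele = left - 1, left_allele[-1] + left_allele[:-1]

    right, right_allele = pos, allele
    while right + span < len(ref) and ref[right + span] == right_allele[0]:
        right, right_allele = right + 1, right_allele[1:] + right_allele[0]

    # Report the whole ambiguous region as deleted and its edited form as inserted.
    region = ref[left:right + span]
    if deleted:
        return left, region, region[span:]
    return left, region, left_allele + region


toy_ref = "ACGTTTTCA"  # hypothetical; a run of four Ts at 0-based positions 3-6
for p in range(3, 7):
    print(normalize_indel(toy_ref, p, "T", ""))  # each prints (3, 'TTTT', 'TTT')
```

The same call with `deleted=""` and `inserted="T"` at any point inside the run returns `(3, 'TTTT', 'TTTTT')`, mirroring the `dup` rows above, while an insertion with no neighboring repeat (like the `insAT` row) is left with an empty deletion field.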