Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/jfjlaros/snp
Non-biallelic SNPs for population genetics and forensics.
https://github.com/jfjlaros/snp
dbsnp forensics snp-candidates tri-allelic
Last synced: 3 days ago
JSON representation
Non-biallelic SNPs for population genetics and forensics.
- Host: GitHub
- URL: https://github.com/jfjlaros/snp
- Owner: jfjlaros
- License: mit
- Created: 2015-06-10T11:53:40.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2018-03-18T19:59:13.000Z (almost 7 years ago)
- Last Synced: 2024-11-13T03:20:54.490Z (2 months ago)
- Topics: dbsnp, forensics, snp-candidates, tri-allelic
- Language: C
- Homepage:
- Size: 7.81 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Finding non-biallelic SNPs
SNPs are used as markers for various population genetic purposes. In particular
non-biallelic SNPs are of interest for identification purposes; to have the
same discriminative power, more biallelic than non-biallelic SNPs are required.In this project, we present a tool that uses the
[NCBI](http://www.ncbi.nlm.nih.gov/) public database
[dbSNP](https://www.ncbi.nlm.nih.gov/projects/SNP/) to identify non-biallelic
SNPs.This work was [published](https://www.ncbi.nlm.nih.gov/pubmed/19647708) in
[FSI Genetics](http://www.fsigenetics.com/) in 2009.## Installation
Install the [expat](http://expat.sourceforge.net/) development files:apt-get install libexpat1-dev
Retrieve the source code and compile the program:
git clone https://github.com/jfjlaros/snp.git
cd snp/src
make## Usage
The program requires a dump of the database in XML format. These files are
typically found in the subfolder named `genotype` of any of the builds hosted
on the [download site](https://ftp.ncbi.nih.gov/snp/organisms/) of the NCBI.For a file named `gt_chrXX.xml.gz`, use the following command to find the SNP
candidates:zcat gt_chrXX.xml.gz | ./snp > output.txt
The `treshold` parameter is used to specify the minimum allele frequency (in
percentages). If this option is omitted, the threshold defaults to 0. By
increasing this variable the amount of output can be greatly reduced, setting
it to 1 or higher is recommended.## Related work
Some [notes](dbsnp_allele_frequency.md) on allele frequencies on the X- and
Y chromosomes.Inspired by this research, we looked into the degradation of
[methylated nucleotides](https://github.com/jfjlaros/cpg) for similar purposes.