https://github.com/brendancsmith/vcf-isec

A simple python implementation of Variant Call Format intersection and complements for identifying genetic mutations

bioinformatics genome variant-call-format variant-calling


# vcf-isec

A simple Python implementation of Variant Call Format (VCF) intersection and complements.

## Background

Bioinformaticians store variants identified by next-generation sequencing in a VCF file. The VCF specification was originally maintained by the 1000 Genomes Project; it is now maintained by the file format team of the Global Alliance for Genomics and Health (GA4GH) Data Working Group.

See the [VCF v4.1 specification](http://samtools.github.io/hts-specs/VCFv4.1.pdf) for details.

Essentially, each variant is represented as a separate line in the VCF. The chromosome, position, reference base(s), and alternate base(s) identified at that position are found in columns 1, 2, 4, and 5, respectively. Additional information about the variant is listed in the remaining columns.
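To make the column layout concrete, here is a minimal sketch that extracts the four identifying fields from one tab-delimited data line. The names `VariantKey` and `parse_variant_key` are hypothetical and are not taken from this repository; the example line is illustrative.

```python
from typing import NamedTuple, Optional


class VariantKey(NamedTuple):
    """The fields that identify a variant (hypothetical helper type)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_variant_key(line: str) -> Optional[VariantKey]:
    """Return the identifying fields of a VCF data line, or None for
    blank lines and header lines (which start with '#')."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    # 1-based columns 1, 2, 4, 5 are CHROM, POS, REF, ALT.
    return VariantKey(fields[0], int(fields[1]), fields[3], fields[4])


# Illustrative data line: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO
example = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14"
print(parse_variant_key(example))
```

Column 3 (the ID) is deliberately left out of the key in this sketch, on the assumption that two files may label the same variant differently.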

## Task

A common task for bioinformaticians is to compare variants, for example between VCF files generated by different analytical pipelines or between related individuals.

This script takes two VCFs as input and compares the variants found in each file. It outputs three VCFs: one containing the variants shared by both inputs (the intersection) and two containing the variants unique to each input (the complements).
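The comparison can be sketched as follows. This is not necessarily how the script itself is implemented; it assumes that two variants match when their chromosome, position, reference, and alternate columns are identical strings, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, str, str, str]  # (CHROM, POS, REF, ALT)


def index_variants(lines: Iterable[str]) -> Dict[Key, str]:
    """Map each variant key to its data line, skipping headers and blanks."""
    index: Dict[Key, str] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        record = line.rstrip("\n")
        fields = record.split("\t")
        index[(fields[0], fields[1], fields[3], fields[4])] = record
    return index


def compare_variants(
    lines_a: Iterable[str], lines_b: Iterable[str]
) -> Tuple[List[str], List[str], List[str]]:
    """Return (shared, only_in_a, only_in_b) as lists of data lines.

    Assumption: shared variants are reported using the line from input A.
    """
    a = index_variants(lines_a)
    b = index_variants(lines_b)
    shared = [a[k] for k in a if k in b]
    only_a = [a[k] for k in a if k not in b]
    only_b = [b[k] for k in b if k not in a]
    return shared, only_a, only_b


header = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]
vcf_a = header + ["1\t100\t.\tA\tG\t50\tPASS\t.", "1\t200\t.\tC\tT\t50\tPASS\t."]
vcf_b = header + ["1\t200\t.\tC\tT\t60\tPASS\t.", "1\t300\t.\tG\tA\t60\tPASS\t."]

shared, only_a, only_b = compare_variants(vcf_a, vcf_b)
print(len(shared), len(only_a), len(only_b))
```

Because the key is a literal string match, a multi-allelic ALT value such as `A,T` would not match a line listing only `A`. Whether that is the desired behaviour depends on the use case.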

**NOTE**: An example VCF is provided at `tests/resources/sample.vcf`. VCFs can be large: a file from whole genome sequencing can contain up to about 4 million variants.