https://github.com/multimeric/improvar
A python script for generating sample VCF data based on a template VCF
bioinformatics variant-calling vcf
- Host: GitHub
- URL: https://github.com/multimeric/improvar
- Owner: multimeric
- License: gpl-3.0
- Created: 2018-04-17T05:45:47.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2018-05-01T03:50:01.000Z (about 8 years ago)
- Last Synced: 2025-04-02T17:51:30.073Z (about 1 year ago)
- Topics: bioinformatics, variant-calling, vcf
- Language: Python
- Homepage:
- Size: 25.4 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Improvar
`improvar` generates fake VCF files using another VCF as a template. This is useful for generating
test data for bioinformatics pipelines. In particular, `improvar` generates values for every field
declared in the template's header, even fields that the template's own records leave empty, so you
can test your analysis pipeline more thoroughly.
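To illustrate what generating a value for every header-declared field involves, here is a minimal sketch that reads `##INFO` header lines and produces a random value matching each field's declared `Type` and `Number`. It is not `improvar`'s actual implementation. The helper names (`parse_info_header`, `random_scalar`, `random_info_value`, `random_info_column`) are hypothetical. The sketch also assumes a single ALT allele, a diploid sample, and header keys in the order given by the VCF specification.

```python
import random
import re

# Matches the ID, Number and Type keys of an ##INFO header line.
# Assumes the keys appear in the order given by the VCF specification.
INFO_RE = re.compile(r'^##INFO=<ID=([^,]+),Number=([^,]+),Type=([^,>]+)')


def parse_info_header(lines):
    """Return (id, number, type) tuples for every ##INFO header line."""
    fields = []
    for line in lines:
        match = INFO_RE.match(line)
        if match:
            fields.append(match.groups())
    return fields


def random_scalar(vcf_type, rng):
    """Generate one value of the given VCF Type."""
    if vcf_type == 'Integer':
        return str(rng.randint(0, 100))
    if vcf_type == 'Float':
        return '{:.3f}'.format(rng.random())
    if vcf_type == 'Character':
        return rng.choice('ACGT')
    # Treat anything else as a String
    return ''.join(rng.choice('abcdefghij') for _ in range(5))


def random_info_value(number, vcf_type, rng, n_alt=1):
    """Generate an INFO value honouring Number (assumes a diploid sample)."""
    if vcf_type == 'Flag':
        return None  # Flags have no value: their presence alone sets them
    if number.isdigit():
        count = int(number)
    elif number == 'A':
        count = n_alt  # one value per ALT allele
    elif number == 'R':
        count = n_alt + 1  # one value per allele, including REF
    elif number == 'G':
        n_alleles = n_alt + 1
        count = n_alleles * (n_alleles + 1) // 2  # number of diploid genotypes
    else:
        count = rng.randint(1, 3)  # '.' means the count is unknown or variable
    return ','.join(random_scalar(vcf_type, rng) for _ in range(count))


def random_info_column(fields, rng):
    """Build a semicolon-separated INFO column covering every declared field."""
    entries = []
    for field_id, number, vcf_type in fields:
        value = random_info_value(number, vcf_type, rng)
        entries.append(field_id if value is None else '{}={}'.format(field_id, value))
    return ';'.join(entries) if entries else '.'
```

For example, given header lines that declare `DP` (`Number=1`, `Type=Integer`) and `DB` (`Number=0`, `Type=Flag`), `random_info_column(parse_info_header(lines), random.Random(0))` yields a string of the form `DP=<integer>;DB`. The sketch passes a seeded `random.Random` so that the test data it generates is reproducible.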
## Installation
Install `improvar` using:
```bash
pip install 'git+https://github.com/TMiguelT/Improvar#egg=improvar'
```
Note that `improvar` requires Python 3.6 or above.
## Usage
`improvar` installs a command-line utility called `improvar`. Its usage is as follows:
```
usage: improvar [-h] [--num-variants NUM_VARIANTS]
                [--gt-opts {GenotypeOption.HOM_REF,GenotypeOption.HOM_ALT,GenotypeOption.HET}]
                [--include-contig INCLUDE_CONTIG]
                [--exclude-contig EXCLUDE_CONTIG]
                template_vcf

Generates a fake VCF based on another VCF's header

positional arguments:
  template_vcf          The VCF to base the generated data off

optional arguments:
  -h, --help            show this help message and exit
  --num-variants NUM_VARIANTS, -n NUM_VARIANTS
                        Number of variants to print
  --gt-opts {GenotypeOption.HOM_REF,GenotypeOption.HOM_ALT,GenotypeOption.HET}
                        Constraints to apply when generating genotype. Leave
                        empty to generate entirely random genotypes. Use "het"
                        to generate only heterozygotes (e.g. 0|1), use
                        "hom-ref" to generate only homozygous reference
                        genotypes (e.g. 0|0), and use "hom-var" to generate
                        only homozygous variant genotypes (e.g. 1|1)
  --include-contig INCLUDE_CONTIG
                        Only output contigs whose name matches this regex
                        pattern
  --exclude-contig EXCLUDE_CONTIG
                        Do not output contigs whose name matches this regex
                        pattern
```
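The genotype and contig options can be pictured with the following minimal sketch. It illustrates the behaviour described in the help text under several assumptions and is not `improvar`'s code. Only the `GenotypeOption` member names come from the help output; their values are placeholders. The `random_genotype` and `keep_contig` helpers are hypothetical. The sketch also assumes phased diploid genotypes at a biallelic site, and it assumes that contig patterns are applied with `re.search` rather than `re.match` or `re.fullmatch`.

```python
import random
import re
from enum import Enum, auto
from typing import Optional


class GenotypeOption(Enum):
    # Member names follow the help text; the values are placeholders
    HOM_REF = auto()
    HOM_ALT = auto()
    HET = auto()


def random_genotype(option: Optional[GenotypeOption], rng: random.Random) -> str:
    """Return a phased diploid GT string for a biallelic site."""
    if option is GenotypeOption.HOM_REF:
        alleles = [0, 0]
    elif option is GenotypeOption.HOM_ALT:
        alleles = [1, 1]
    elif option is GenotypeOption.HET:
        alleles = [0, 1]
        rng.shuffle(alleles)  # either 0|1 or 1|0
    else:
        # No constraint: each haplotype is independently REF (0) or ALT (1)
        alleles = [rng.randint(0, 1), rng.randint(0, 1)]
    return '|'.join(str(allele) for allele in alleles)


def keep_contig(name: str, include: Optional[str] = None,
                exclude: Optional[str] = None) -> bool:
    """Decide whether a contig passes the include/exclude regex filters."""
    if include is not None and not re.search(include, name):
        return False
    if exclude is not None and re.search(exclude, name):
        return False
    return True
```

For example, `[c for c in ['chr1', 'chr2', 'chrM'] if keep_contig(c, include=r'^chr\d+$')]` keeps only `chr1` and `chr2`, and `keep_contig('chrUn_gl000220', exclude=r'_')` returns `False`. Anchoring the pattern with `^` and `$` makes the filter behave the same whether the tool uses search or full-match semantics.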
`improvar` prints the generated VCF to stdout, so you can redirect the output to a file or pipe it
into other VCF processing tools.
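Because the VCF goes to stdout, it can be saved with a shell redirect such as `improvar -n 100 template.vcf > fake.vcf`, or read directly from Python. The minimal sketch below streams a command's stdout and counts the data records, which are the non-blank lines not starting with `#`. `count_vcf_records` is a hypothetical helper, and `template.vcf` is a placeholder for your own template.

```python
import subprocess
from typing import List


def count_vcf_records(cmd: List[str]) -> int:
    """Run a command that writes VCF to stdout and count its data lines."""
    # universal_newlines=True decodes stdout as text (works on Python 3.6+)
    with subprocess.Popen(cmd, stdout=subprocess.PIPE,
                          universal_newlines=True) as proc:
        count = sum(1 for line in proc.stdout
                    if line.strip() and not line.startswith('#'))
    # Leaving the with-block waits for the process, so returncode is set
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
    return count
```

For example, `count_vcf_records(['improvar', '-n', '100', 'template.vcf'])` runs `improvar` and returns the number of generated records. Streaming the lines avoids holding a large generated VCF in memory.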