https://github.com/zilong-li/basevarc
This repo is no longer under development. Check out the ANGSD toolkit for low-depth data analyses.
- Host: GitHub
- URL: https://github.com/zilong-li/basevarc
- Owner: Zilong-Li
- License: gpl-3.0
- Created: 2019-03-30T12:16:28.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2020-09-02T05:02:10.000Z (about 5 years ago)
- Last Synced: 2025-06-02T10:47:10.540Z (4 months ago)
- Topics: bioinformatics-tool, low-coverage-sequencing, population-genetics, variant-calling
- Language: C++
- Homepage: https://www.popgen.dk/angsd/index.php/SNP_calling
- Size: 47.1 MB
- Stars: 7
- Watchers: 2
- Forks: 5
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
BaseVarC - SNPs Calling From Low-Pass (<1.0x) WGS Data
===============================================================
**__Current Version: 1.0.0__**

BaseVarC is implemented in C++ and aims to speed up variant calling in large-scale population data. It was used in the [CMDB](https://db.cngb.org/cmdb/) project to call variants from one million samples.

## Installation
```
git clone --recursive https://github.com/Zilong-Li/BaseVarC.git
cd BaseVarC
./configure
make
```

If everything goes well, you can find the BaseVarC program in the `src` directory.

## Command-Line
```
BaseVarC
Contact: Zilong Li [zimusen94@gmail.com]
Usage  : BaseVarC [options]

Commands:
    basetype     Variants Caller
    popmatrix    Create population matrix
    concat       Concat popmatrix
```

### Variants Calling
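As a minimal sketch of how a `basetype` run might be assembled, the Python helper below writes a BAM/CRAM file list (one path per row, as `--input` expects) and builds an argument list using only long options that appear in the help text of this section. The file names, region and numeric values are placeholders, and the helper names `write_file_list` and `basetype_command` are hypothetical; they are not part of BaseVarC.

```python
import tempfile
from pathlib import Path


def write_file_list(paths, list_file):
    """Write one BAM/CRAM path per row, the layout --input expects."""
    Path(list_file).write_text("".join(f"{p}\n" for p in paths))
    return str(list_file)


def basetype_command(list_file, prefix, reference, region, threads, batch,
                     binary="BaseVarC"):
    """Assemble an argument list for `BaseVarC basetype`.

    Only long options from the help text are used; all values are placeholders.
    """
    return [
        binary, "basetype",
        "--input", str(list_file),
        "--output", str(prefix),
        "--reference", str(reference),
        "--region", region,
        "--thread", str(threads),
        "--batch", str(batch),
    ]


# Hypothetical inputs, for illustration only.
workdir = Path(tempfile.mkdtemp())
bam_list = write_file_list(["s1.bam", "s2.bam", "s3.cram"], workdir / "bam.list")
cmd = basetype_command(bam_list, "out/chr20", "ref.fa", "chr20:1-5000000",
                       threads=4, batch=2)
print(" ".join(cmd))
# To launch it for real: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems if a path contains spaces.
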
```
Commands: BaseVarC basetype
Usage   : BaseVarC basetype [options]

Options :
    --input,     -i    BAM/CRAM file list, one file per row
    --output,    -o    Output file prefix
    --reference, -r    Reference file
    --region,    -s    Samtools-like region
    --group,     -g    Population group information
    --mapq,      -q    Mapping quality >= INT [10]
    --thread,    -t    Number of threads
    --batch,     -b    Number of samples each batch
    --maf,       -a    Minimum allele count frequency [min(0.001, 100/N, maf)]
    --load,            Load data only
    --rerun,           Read previous loaded data and rerun
    --keep_tmp,        Don't remove tmp files when basetype finished
    --verbose,   -v    Set verbose output
```

## Testing

The `test` directory contains a script that runs an example on the bundled test data.
```
cd test/
sh test.sh
```

## Note on performance

RAM usage, run time and I/O load depend mainly on three parameters: `--region`, `--thread` and `--batch`. Tune them to match the resources of your HPC servers.
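
To make the `--region` and `--batch` trade-offs concrete, here is a minimal Python sketch that splits a chromosome into Samtools-like regions of a chosen length and counts how many batches a given `--batch` value produces. The chromosome name, its length, the chunk size and the sample count are made-up values, and both helper functions are hypothetical; the batch count is simply the sample count divided by the batch size, rounded up, which is an assumption about how samples are bundled.

```python
import math


def split_regions(chrom, length, chunk):
    """Yield Samtools-like regions (1-based, inclusive) covering the chromosome."""
    for start in range(1, length + 1, chunk):
        end = min(start + chunk - 1, length)
        yield f"{chrom}:{start}-{end}"


def batch_count(n_samples, batch):
    """Number of batches when `batch` samples are bundled together (assumed ceil(N / batch))."""
    return math.ceil(n_samples / batch)


# Made-up example: a 50 Mb chromosome split into 20 Mb regions.
regions = list(split_regions("chrA", 50_000_000, 20_000_000))
print(regions)
# ['chrA:1-20000000', 'chrA:20000001-40000000', 'chrA:40000001-50000000']

# Made-up example: 1000 samples bundled 64 per batch.
print(batch_count(1000, 64))  # 16
```

Each region string can be passed to `--region` in a separate run; longer chunks mean fewer passes over the BAM files at the cost of more RAM per run.
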
- `--batch`: BaseVarC converts reads from BAM files into an internal temporary format. This parameter controls how many samples are bundled into one batch. RAM usage grows linearly with the batch size: a larger value means more RAM but fewer file pointers (less I/O).
- `--region`: The longer the genomic region, the more RAM is used. Reading the BAM files repeatedly adds overhead, however, so split each chromosome into regions that are as long as your memory allows.
- `--thread`: The number of threads to use. RAM usage and I/O grow linearly with the thread count. The more threads you give it, the faster BaseVarC runs.

## License
BaseVarC and the code in this repo are available under the GPL-3.0 license. For more information, please see the [LICENSE](LICENSE) file.
## Citation
TBD