https://github.com/zilong-li/basevarc

This repo is no longer under development. Check out the ANGSD toolkit for low-depth data analyses.
bioinformatics-tool low-coverage-sequencing population-genetics variant-calling

Host: GitHub
URL: https://github.com/zilong-li/basevarc
Owner: Zilong-Li
License: gpl-3.0
Created: 2019-03-30T12:16:28.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2020-09-02T05:02:10.000Z (about 5 years ago)
Last Synced: 2025-06-02T10:47:10.540Z (4 months ago)
Topics: bioinformatics-tool, low-coverage-sequencing, population-genetics, variant-calling
Language: C++
Homepage: https://www.popgen.dk/angsd/index.php/SNP_calling
Size: 47.1 MB
Stars: 7
Watchers: 2
Forks: 5
Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE

README

BaseVarC - SNP Calling From Low-Pass (<1.0x) WGS Data
===============================================================

**__Current Version: 1.0.0__**

BaseVarC is implemented in C++ and aims to speed up variant calling from large-scale population data. It was used in the [CMDB](https://db.cngb.org/cmdb/) project to call variants from one million samples.

## Installation

```
git clone --recursive https://github.com/Zilong-Li/BaseVarC.git
cd BaseVarC
./configure
make
```

If everything goes well, you can find the BaseVarC program in the `src` directory.

## Command-Line

```
BaseVarC
Contact: Zilong Li [zimusen94@gmail.com]
Usage  : BaseVarC  [options]

Commands:
         basetype       Variants Caller
         popmatrix      Create population matrix
         concat         Concat popmatrix
```

### Variants Calling

```
Commands: BaseVarC basetype
Usage   : BaseVarC basetype [options]

Options :
  --input,      -i         BAM/CRAM file list, one file per row
  --output,     -o         Output file prefix
  --reference,  -r         Reference file
  --region,     -s         Samtools-like region
  --group,      -g         Population group information
  --mapq,       -q         Mapping quality >= INT [10]
  --thread,     -t         Number of threads
  --batch,      -b         Number of samples each batch
  --maf,        -a         Minimum allele count frequency [min(0.001, 100/N, maf)]
  --load,                  Load data only
  --rerun,                 Read previous loaded data and rerun
  --keep_tmp,              Don't remove tmp files when basetype finished
  --verbose,    -v         Set verbose output
```
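As an illustration of how these options fit together, the sketch below writes a BAM list with one file per row and assembles a `basetype` call from the long options listed above. The sample names, `ref.fa`, the region `chr20:1-1000000`, the output prefix and the thread and batch values are all hypothetical placeholders, and the help text does not say which options are mandatory, so treat this as a starting point rather than a verified invocation. The script only prints the command, so it runs without BaseVarC installed.

```shell
#!/bin/sh
# Dry-run sketch: build the input list and print a basetype command line.
# Every file name and numeric value here is a hypothetical placeholder.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# --input expects a list file with one BAM/CRAM path per row.
printf '%s\n' sample1.bam sample2.bam sample3.bam > "$workdir/bam.list"

# Store the command in the positional parameters so each argument
# stays a single word even if a path contains spaces.
set -- BaseVarC basetype \
    --input "$workdir/bam.list" \
    --reference ref.fa \
    --region chr20:1-1000000 \
    --output chr20_part1 \
    --thread 4 \
    --batch 100

# Print the command for review; replace this line with "$@" to execute it.
echo "$@"
```

Keeping the arguments in the positional parameters avoids building the command as one long string, which would split incorrectly on paths that contain whitespace.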

## Testing

The `test` directory contains a script that runs an example using test data.

```
cd test/
sh test.sh
```

## Note on performance

RAM usage, run time and I/O depend mainly on three parameters: `--region`, `--thread` and `--batch`. Tune them to make the best use of your HPC servers.

- `--batch`: BaseVarC converts reads from BAM files into an internal temporary format. This parameter controls how many samples are bundled into one batch. RAM usage grows linearly with the batch size: a larger value means more RAM but fewer open file pointers (less I/O).

- `--region`: The longer the genomic region, the more RAM is used. Reading the BAM files repeatedly adds overhead, so split each chromosome into regions that are as long as you can afford.

- `--thread`: The number of threads to use. RAM usage and I/O grow linearly with the number of threads. The more threads you give it, the faster BaseVarC runs.
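The advice on `--region` can be turned into a small helper that cuts a chromosome into fixed-size, samtools-like regions. This is a minimal sketch and not part of BaseVarC: the function name `split_regions`, the chromosome length of 25000000 and the chunk size of 10000000 are assumptions chosen for illustration. In practice the length would come from the reference index, and the chunk size from how much RAM a single job may use.

```shell
#!/bin/sh
# Sketch: print 1-based, inclusive regions covering a chromosome.
# split_regions is a hypothetical helper, not a BaseVarC command.
set -eu

# Usage: split_regions CHROM LENGTH CHUNK
# The body runs in a subshell so its variables do not leak to the caller.
split_regions() (
    chrom=$1
    length=$2
    chunk=$3
    start=1
    while [ "$start" -le "$length" ]; do
        end=$((start + chunk - 1))
        # Clip the final region to the chromosome end.
        [ "$end" -le "$length" ] || end=$length
        echo "${chrom}:${start}-${end}"
        start=$((end + 1))
    done
)

# Hypothetical 25 Mb chromosome cut into 10 Mb pieces.
split_regions chr20 25000000 10000000
```

With these placeholder values the script prints `chr20:1-10000000`, `chr20:10000001-20000000` and `chr20:20000001-25000000`. Each line could then be passed to `--region` in a separate `basetype` job. Fewer, longer regions mean fewer passes over the BAM files at the cost of more RAM per job.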

## License

BaseVarC and the code in this repo are available under the GPL-3.0 license. For more information, see the [LICENSE](LICENSE) file.

## Citation

TBD
