https://github.com/chris1221/aavcf
Swap allels to ancestral / derived in a VCF.
https://github.com/chris1221/aavcf
cpp
Last synced: 8 months ago
JSON representation
Swap allels to ancestral / derived in a VCF.
- Host: GitHub
- URL: https://github.com/chris1221/aavcf
- Owner: Chris1221
- License: apache-2.0
- Created: 2018-09-13T18:29:09.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2018-12-10T14:54:09.000Z (over 7 years ago)
- Last Synced: 2024-12-27T08:16:22.210Z (over 1 year ago)
- Topics: cpp
- Language: C++
- Size: 35.2 KB
- Stars: 2
- Watchers: 3
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
### A tiny tool for recoding alleles in a VCF.
This aims to be an easy way to recode alleles from major/minor to ancestral/derived. You only need:
- A `.vcf(.gz)` (Phased, or unphased, but you must specify this).
- A `.fasta(.gz)` with ancestral states of each base.
Usage is pretty straightforward:
```{sh}
aavcf -v your.vcf(.gz) \
-f your.fasta(.gz) \
-l your.log \
```
I've also provided `vcf`, `vcf_line`, and `fasta` classes in `fasta.cpp/.h` for extensibility. Each of these have a `read`,`write`, and `get_allele` (either for a particular entry or position) methods and are just abstractions of `std::vector` and `std::string` read in directly from data.
#### Installation
*Short version*: Same as any project build on `CMake`.
*Slightly longer version*:
```{sh}
mkdir build; cd build; cmake
make
```
Creating an empty `build` directory allows you to easily delete all the `CMake` build instructions if you so desire without a fresh clone.
#### Usage Flags
- `-v / --vcf`: The path to a `.vcf(.gz)` file in `4.1-4.3` file format.
- Currently only one sample VCFs are acceptable input.
- `-f / --fasta`: The path to a `.fasta(.gz)` file on the same build as the VCF (i.e. the genomic coordinates are the same).
- `-l / --log`: Path to a log file to store all information on actions taken.
- Defaults to `./log.txt`.
- `-h / --help`: Will unsurprisingly show you the help menu.
### Actions
`aavcf` will do one of four things for each variant in the VCF:
1. Do nothing: If the ancestral allele is already the reference, no action is taken.
2. Flip alleles (`Flip` in log): If the ancestral is the alternate, the coding is flipped.
3. Uninformative allele (`UnInfo` in log): The ancestral allele is unknown, so derived state cannot be found. Code all alleles to `0`.
4. Different alleles (`DifAlt` in log): The ancestral allele is neither the reference nor the alternative, so no information is known. Code all alleles to `0`.
#### C++ Class usage
To construct a `fasta`:
```{c++}
fasta new (std::string path_to_fasta);
```
- `new.get_allele(unsigned int pos)` will retrieve an allele at a position.
To construct a `vcf`
```{c++}
vcf new (std::string path_to_vcf, char delim = '\t')
```
- `new.entries` is a `std::vector`, so the `n`th variant is retrieved with `new.entries[i]`.
- Each entry in the VCF for each variant is mutable and can be accessed by its standard name: `new.entries[i].id = "rs1"`.
- To write the VCF out, including meta data, simply call the `new.write(std::string path)` method.