https://github.com/tom-tan/vcf-combiner
- Host: GitHub
- URL: https://github.com/tom-tan/vcf-combiner
- Owner: tom-tan
- License: apache-2.0
- Created: 2021-06-09T03:06:25.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-10-27T00:15:48.000Z (about 1 year ago)
- Last Synced: 2024-10-16T13:34:27.174Z (2 months ago)
- Language: Python
- Size: 30.3 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# vcf-combiner
## Requirement
- SingularityCE or Apptainer
  - You can configure it via the `RUNTIME` environment variable
  - It must work with `fakeroot` to build a SIF container
- jq for `make status` (optional)
- Each worker node must have a pair of public and private keys in `~/.ssh` with standard file names such as `id_ecdsa` and `id_ecdsa.pub`.

## How to use this repository for joint calling
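Before following the steps, it may help to confirm on each node that the requirements listed under Requirement are met. The following is a minimal sketch, not part of this repository: the helper names `find_key_pairs` and `check_requirements` are hypothetical, the fallback to `singularity` when `RUNTIME` is unset is an assumption, and the script does not verify that `fakeroot` works.

```python
import os
import shutil
from pathlib import Path
from typing import Dict, List, Optional


def find_key_pairs(ssh_dir: Path) -> List[str]:
    """Return names of private keys in ssh_dir that have a matching .pub file."""
    pairs = []
    for pub in sorted(ssh_dir.glob("id_*.pub")):
        private = pub.with_suffix("")  # e.g. id_ecdsa.pub -> id_ecdsa
        if private.is_file():
            pairs.append(private.name)
    return pairs


def check_requirements(ssh_dir: Path, runtime: Optional[str] = None) -> Dict[str, bool]:
    """Report which requirements appear to be satisfied on this node."""
    # RUNTIME selects the container command; the "singularity" fallback is an assumption.
    runtime = runtime or os.environ.get("RUNTIME", "singularity")
    return {
        "runtime": shutil.which(runtime) is not None,
        "jq (optional)": shutil.which("jq") is not None,
        "ssh key pair": bool(find_key_pairs(ssh_dir)),
    }


if __name__ == "__main__":
    for name, ok in check_requirements(Path.home() / ".ssh").items():
        print(f"{'ok' if ok else 'MISSING':8} {name}")
```

Running the script on a worker node prints one line per requirement, so a missing key pair or runtime is visible before the cluster is started.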
- Build a SIF image
```console
$ make build
```
  - It builds a SIF image `spark.sif` that contains the Spark runtime and Hail.
- Make the `workers` file that enumerates the worker nodes.
  - See `workers.sample` for details.
  - The rest of the instructions use `master-node` as the host name of the master node.
- Start a Spark cluster
```console
master-node $ make start
```
  - It starts daemons with the `$RUNTIME instance` command and starts the cluster with `start-all.sh`.
  - It generates the `log` directory for worker logs and the `work` directory.
  - You can use `$EXTRA_ARGS` to pass extra arguments to `$RUNTIME`. For example:
```console
master-node $ make start EXTRA_ARGS="-B /path/to/extra/dir:/path/to/extra/dir"
```
  - You can see the status of the cluster in the Web UI on port 8080.
- Run joint calling
```console
master-node $ $RUNTIME exec instance://spark $PWD/combiner.py samples.txt --master spark://master-node:7077 --output-base=~/hail-logs/n105-d_16-e_32 --driver-memory=16g --executor-memory=32g
```
  - Run `$RUNTIME exec instance://spark $PWD/combiner.py -h` for other options.
  - You can run the joint calling on other nodes with the following command: `$RUNTIME exec spark.sif $PWD/combiner.py`. Make sure that the required directories (e.g., a directory that stores VCF files) are bound with the `-B` option.
- Stop the cluster
```console
master-node $ make stop
```

## Appendix
- `make help` shows other subcommands to operate the cluster
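
When the joint calling is launched from a script rather than typed by hand, the command line shown in the "Run joint calling" step can be assembled programmatically. The sketch below is an illustration only: `build_combiner_command` is a hypothetical helper, the fallback to `singularity` when `RUNTIME` is unset is an assumption, and the memory defaults are simply the values from the example command, not recommendations. It uses only the options that appear in this README (`--master`, `--output-base`, `--driver-memory`, `--executor-memory`, and the runtime's `-B`).

```python
import os
import shlex
from typing import List, Optional, Sequence


def build_combiner_command(
    samples: str,
    master: str,
    output_base: str,
    script: str,
    driver_memory: str = "16g",    # value taken from the README example
    executor_memory: str = "32g",  # value taken from the README example
    target: str = "instance://spark",
    binds: Sequence[str] = (),
    runtime: Optional[str] = None,
) -> List[str]:
    """Assemble the argv list for running combiner.py inside the container."""
    # The "singularity" fallback is an assumption, not documented behavior.
    runtime = runtime or os.environ.get("RUNTIME", "singularity")
    cmd = [runtime, "exec"]
    # Bind each directory to the same path inside the container.
    for path in binds:
        cmd += ["-B", f"{path}:{path}"]
    cmd += [
        target,
        script,
        samples,
        "--master", master,
        f"--output-base={output_base}",
        f"--driver-memory={driver_memory}",
        f"--executor-memory={executor_memory}",
    ]
    return cmd


if __name__ == "__main__":
    argv = build_combiner_command(
        samples="samples.txt",
        master="spark://master-node:7077",
        output_base=os.path.expanduser("~/hail-logs/n105-d_16-e_32"),
        script=os.path.join(os.getcwd(), "combiner.py"),
    )
    print(shlex.join(argv))
```

To run from a node outside the cluster instance, pass `target="spark.sif"` and list the directories that hold the VCF files in `binds`. The resulting list can be handed to `subprocess.run` without a shell, which avoids quoting problems in paths.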