https://github.com/vccri/caps
- Host: GitHub
- URL: https://github.com/vccri/caps
- Owner: VCCRI
- License: MIT
- Created: 2022-09-25T03:36:07.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2024-06-12T04:52:58.000Z (11 months ago)
- Last Synced: 2025-02-02T16:52:50.490Z (4 months ago)
- Language: Python
- Size: 11.4 MB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Context-Adjusted Proportion of Singletons (CAPS)
## Code for the paper
Tested with Hail version 0.2.107 and Snakemake 7.32.
### The "Data" pipeline
`data/` contains [Hail](https://hail.is/) and Snakemake code that requires execution in Google Cloud and saves files to a Google Storage (GS) bucket. Copies of the generated files are available in `files/`.
#### How to run
1. Create a new cluster: `hailctl dataproc start --packages snakemake --requester-pays-allow-buckets gnomad-public-requester-pays --project --bucket --region --num-workers --image-version=2.0.27-debian10`
2. Connect to the cluster: `gcloud beta compute ssh @-m --project ""`
3. `git clone` this repository and navigate to `data/`
4. Run the pipeline: `snakemake --cores all --configfile config.yaml --config gcp_rootdir="/some_directory/"`

Alternatively, in Step 4 you can submit the pipeline as a job. Create `job.py` containing the following:
```python
import snakemake

snakemake.main(
    [
        "--snakefile",
        "/path/to/Snakefile",
        "--cores",
        "all",
        "--configfile",
        "/path/to/config.yaml",
        "--config",
        'gcp_rootdir="/some_directory/"',
    ]
)
```
Submit the script with `hailctl dataproc submit job.py`.

### The "Analysis" pipeline
`analysis/` contains scripts that calculate and visualise CAPS scores using files created in `data/`.
#### How to run
1. Navigate to `CAPS/analysis/`
2. `snakemake --cores all --config gcp="False"` (faster: uses copies from `files/`) or `snakemake --cores all --config gcp="True" gcp_rootdir="/some_directory/"` (slower: uses GS files)

## Using CAPS
### Custom sets of variants
To get CAPS estimates for your set of variants, use the `template` file: `snakemake -s template -c1 -C [KEY=VALUE ...]`. The required values are:
- `obs` (grouped variants annotated with at least `context`, `ref`, `alt`, `methylation_level`, `singleton_count` and `variant_count` fields)
- `exp` (expected proportions, one per `context`-`ref`-`alt`-`methylation_level` group)
- `var` (variable of interest, must be a valid field in `obs`)
- `calculate_caps_script` (`calculate_caps.R`)
- `viz_scores_script` (`viz_scores.R`)
- `scores` (filename for the output scores)
- `plot` (filename for the output plot)

For example, `snakemake -s template -c1 -C obs=analysis/canonical_splice_site_vars.tsv exp=model/phat.tsv var=worst_csq calculate_caps_script=analysis/calculate_caps.R viz_scores_script=analysis/viz_scores.R scores=scores.tsv plot=plot.pdf`.
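To illustrate how `obs` and `exp` fit together, here is a minimal, hypothetical Python sketch with toy numbers. The real scores are produced by `calculate_caps.R` (which also models uncertainty); this sketch only shows the shape of the inputs and the core "observed minus expected proportion of singletons" idea. All field values below are invented examples, not data from the paper.

```python
# Hypothetical sketch of the inputs consumed by the `template` rules.
# Real scoring is done by calculate_caps.R; toy values only.
from collections import defaultdict

# `obs`: grouped variants, one row per (variant class, mutational context)
# group, with the required fields listed above.
obs = [
    {"worst_csq": "splice_donor", "context": "ACG", "ref": "C", "alt": "T",
     "methylation_level": 2, "singleton_count": 40, "variant_count": 50},
    {"worst_csq": "splice_donor", "context": "AAA", "ref": "A", "alt": "G",
     "methylation_level": 0, "singleton_count": 25, "variant_count": 50},
]

# `exp`: expected proportion of singletons, one per
# context-ref-alt-methylation_level group (toy values).
exp = {("ACG", "C", "T", 2): 0.70, ("AAA", "A", "G", 0): 0.45}

def naive_caps(obs_rows, exp_props, var="worst_csq"):
    """Observed minus expected singleton proportion per value of `var`
    (a simplification of what calculate_caps.R computes)."""
    singletons = defaultdict(int)
    variants = defaultdict(int)
    expected = defaultdict(float)
    for row in obs_rows:
        key = row[var]
        ctx = (row["context"], row["ref"], row["alt"],
               row["methylation_level"])
        singletons[key] += row["singleton_count"]
        variants[key] += row["variant_count"]
        expected[key] += exp_props[ctx] * row["variant_count"]
    return {k: (singletons[k] - expected[k]) / variants[k] for k in variants}

print(naive_caps(obs, exp))  # {'splice_donor': 0.075}
```

A positive score means the class has more singletons than the context model predicts, i.e. stronger selective constraint; the context adjustment is what distinguishes CAPS from a raw proportion of singletons.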