https://github.com/cbg-ethz/scsommerclock
Test for a molecular clock based on the phylogenetic tree inferred from single-cell DNA sequencing data
- Host: GitHub
- URL: https://github.com/cbg-ethz/scsommerclock
- Owner: cbg-ethz
- License: mit
- Created: 2022-07-19T09:41:17.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2023-06-02T13:40:10.000Z (over 2 years ago)
- Last Synced: 2025-03-30T09:51:17.268Z (6 months ago)
- Topics: cancer-genomics, molecular-clock, neutrality-test, scdnaseq
- Language: Python
- Homepage:
- Size: 2.21 MB
- Stars: 6
- Watchers: 4
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# Single-Cell SOMatic MolEculaR Clock testing
This repo contains a **Poisson Tree (PT) Test** for the existence of a somatic clock in single-cell phylogenies.
In short, it tests if different cell lineages evolve at a similar rate, accumulating mutations according to a molecular clock.
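
The intuition can be illustrated with a simplified dispersion check, which is not the PT test itself. If contemporaneously sampled lineages accumulate mutations at the same rate, their root-to-tip mutation counts should look roughly like draws from one Poisson distribution, so the variance should be close to the mean. The function name and the counts below are hypothetical and chosen only for illustration.

```python
from statistics import mean, variance


def dispersion_index(root_to_tip_counts):
    """Return the variance-to-mean ratio of per-lineage mutation counts.

    Values around or below 1 are compatible with a shared Poisson rate;
    values far above 1 hint at lineages evolving at different rates.
    """
    m = mean(root_to_tip_counts)
    if m == 0:
        raise ValueError("no mutations observed")
    return variance(root_to_tip_counts) / m


# Hypothetical mutation counts from the root to each sampled cell.
clock_like = [48, 52, 50, 47, 53]
rate_shift = [48, 52, 50, 240, 260]  # two cells in a faster subtree

print(dispersion_index(clock_like))
print(dispersion_index(rate_shift))
```

This sketch ignores that lineages share branches and that observed mutations are affected by sequencing errors, which is why a test on real data needs the tree and the error rates as additional input.
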
As input, the test requires a mutation matrix, a phylogeny of contemporaneously sampled cells, and error rates.

This repo contains scripts for running
- the **PT Test**
- and, in a subfolder (AnalysisPipelines), scripts for
  - the processing of real scDNA-seq data
  - the analysis of real scDNA-seq data
  - the simulation of scDNA-seq data (via coalescent)
  - the analysis and plotting of simulated scDNA-seq data

# Installation
## Requirements
- python3.X:
  - ete3
  - numpy
  - pandas
  - scipy

The requirements can be installed using pip:
```bash
python -m pip install ete3 pandas scipy
```

# Usage
The **PT test** can be run with the following shell command:
```bash
python run_PT_test.py [-o] [-excl] [-incl] [-w] [-FN] [-FP]
```

## Input files
The **PT test** requires two input files:
- Called variants in VCF format ([VCF info](https://samtools.github.io/hts-specs/VCFv4.2.pdf)), where each sample is a cell
- An inferred phylogenetic tree in newick format (cell names need to be the same as in the VCF).

> ## Note
> Trees can be inferred, for example, with [CellPhy](https://github.com/amkozlov/cellphy) or [infSCITE](https://github.com/cbg-ethz/infSCITE); both outputs are compatible with the **PT test**.

## Optional Arguments
- `-o`, Output file. Default = .poissonTree_LRT.tsv.
- `-excl`, Regex pattern for samples/cells to exclude. Default = none.
- `-incl`, Regex pattern for samples/cells to include. If set, only these samples/cells are included. Default = all cells.
- `-w`, Maximum weight values. Default = 100, 200, ..., 1000.
- `-FN`, Estimated false-negative (FN) rate (for CellPhy and infSCITE: inferred from .log/stdout file).
- `-FP`, Estimated false-positive (FP) rate (for CellPhy and infSCITE: inferred from .log/stdout file).

# Example
To run the PT test on the simulated data in the `example_data` folder, execute
```bash
python run_PT_test.py example_data example_data/data_simulated_clock.vcf.gz example_data/data_simulated_clock.raxml.bestTree
```
or
```bash
python run_PT_test.py example_data example_data/data_simulated_noclock.vcf.gz example_data/data_simulated_noclock.raxml.bestTree
```

The former dataset is simulated under a molecular clock, the latter with a deviation from the clock (evolutionary rate amplified 5x in a subtree).

> ## Note
> The FN and FP rates are inferred from the `.raxml.log` file.
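
The `-incl` and `-excl` options take regular expressions that are matched against cell names. The following is a minimal sketch of how such filtering could behave, assuming partial matching with `re.search`; the script's actual matching rules may differ, and the function and cell names are hypothetical.

```python
import re


def select_cells(cell_names, incl=None, excl=None):
    """Keep names that match `incl` (if given) and do not match `excl`."""
    selected = []
    for name in cell_names:
        # Assumption: a pattern may match anywhere in the name.
        if incl is not None and not re.search(incl, name):
            continue
        if excl is not None and re.search(excl, name):
            continue
        selected.append(name)
    return selected


# Hypothetical cell names as they might appear in a VCF header.
cells = ["tumor_c1", "tumor_c2", "normal_c1", "tumor_doublet"]

print(select_cells(cells, excl="normal|doublet"))
print(select_cells(cells, incl="^tumor", excl="doublet"))
```

Testing a pattern on the list of cell names in this way before a run can help confirm that it selects the intended cells.
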