https://github.com/hammerlab/cohorts
Utilities for analyzing mutations and neoepitopes in patient cohorts
- Host: GitHub
- URL: https://github.com/hammerlab/cohorts
- Owner: hammerlab
- License: apache-2.0
- Created: 2016-03-14T15:52:59.000Z (almost 10 years ago)
- Default Branch: master
- Last Pushed: 2018-06-07T20:38:26.000Z (over 7 years ago)
- Last Synced: 2025-04-11T21:49:28.238Z (9 months ago)
- Language: Python
- Size: 555 KB
- Stars: 20
- Watchers: 11
- Forks: 4
- Open Issues: 67
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
- awesome-python-fa - cohorts - Cohort analysis of patients in biological studies. (Biology and biotechnology / Working with time and calendars)
README
[Build status](https://travis-ci.org/hammerlab/cohorts) [Coverage status](https://coveralls.io/github/hammerlab/cohorts?branch=master)
Cohorts
=======
Cohorts is a library for analyzing and plotting clinical data, mutations and neoepitopes in patient cohorts.
It calls out to external libraries like [topiary](https://github.com/hammerlab/topiary) and caches the results for easy manipulation.
Cohorts requires Python 3 (3.3+). We no longer maintain compatibility with Python 2. For context, see this [Python 3 statement](http://www.python3statement.org).
Installation
------------
You can install Cohorts using [pip](https://pip.pypa.io/en/latest/quickstart.html):
```bash
pip install cohorts
```
Features
--------
* Data management: construct a `Cohort` consisting of `Patient`s with `Sample`s.
* Use `varcode` and `topiary` to generate and cache variant effects and predicted neoantigens.
* Provenance: track the state of the world (package and data versions) for a given analysis.
* Aggregation functions: built-in functions such as `missense_snv_count`, `neoantigen_count`, `expressed_neoantigen_count`; or create your own functions.
* Plotting: survival curves via `lifelines`, response/no response plots (with Mann-Whitney and Fisher's Exact results), ROC curves. Example: `cohort.plot_survival(on=missense_snv_count, how="pfs")`.
* Filtering: filter collections of variants/effects/neoantigens by, for example, variant statistics.
* Pre-define data sets to work with. Example: `cohort.as_dataframe(join_with=["tcr", "pdl1"])`.
In addition, several other libraries make use of `cohorts`:
* [pygdc](http://github.com/hammerlab/pygdc)
* [query_tcga](http://github.com/jburos/query_tcga)
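The caching and provenance features listed above can be pictured with a small standard-library sketch. This is not how Cohorts implements them: `load_or_compute`, the file layout under the cache directory, and the version strings are all hypothetical, chosen only to show the idea of reusing a stored result unless the recorded package versions have changed.

```python
import json
import pickle
import tempfile
from pathlib import Path


def load_or_compute(cache_dir, patient_id, name, compute, provenance):
    """Return the cached result for (patient_id, name).

    The result is recomputed when it is missing or when it was produced
    under different provenance (for example, other package versions).
    Hypothetical helper; not part of the cohorts API.
    """
    patient_dir = Path(cache_dir) / patient_id
    data_path = patient_dir / (name + ".pkl")
    prov_path = patient_dir / (name + ".provenance.json")

    # Reuse the cached value only if its recorded provenance still matches.
    if data_path.exists() and prov_path.exists():
        if json.loads(prov_path.read_text()) == provenance:
            with data_path.open("rb") as f:
                return pickle.load(f)

    result = compute()
    patient_dir.mkdir(parents=True, exist_ok=True)
    with data_path.open("wb") as f:
        pickle.dump(result, f)
    prov_path.write_text(json.dumps(provenance, sort_keys=True))
    return result


calls = []


def expensive_effect_prediction():
    # Stand-in for a slow call into an external library.
    calls.append(1)
    return ["effect_a", "effect_b"]


cache_dir = tempfile.mkdtemp()
prov_old = {"some_package": "1.0"}  # made-up version string
prov_new = {"some_package": "2.0"}  # made-up version string

first = load_or_compute(cache_dir, "patient_1", "effects",
                        expensive_effect_prediction, prov_old)
second = load_or_compute(cache_dir, "patient_1", "effects",
                         expensive_effect_prediction, prov_old)
third = load_or_compute(cache_dir, "patient_1", "effects",
                        expensive_effect_prediction, prov_new)
print(len(calls))  # the second call is served from the cache
```

Keeping the provenance record next to the cached file means a stale result is detected at read time rather than silently reused.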
Quick Start
---------------
One way to get started using Cohorts is to use it to analyze TCGA data.
As an example, we can create a cohort using [query_tcga](http://github.com/jburos/query_tcga):
```python
from query_tcga import cohort, config
# provide authentication token
config.load_config('config.ini')
# load patient data
blca_patients = cohort.prep_patients(project_name='TCGA-BLCA',
                                     project_data_dir='data')
# create cohort
blca_cohort = cohort.prep_cohort(patients=blca_patients,
                                 cache_dir='data-cache')
```
Then, use `plot_survival()` to summarize a potential biomarker (e.g. `snv_count`) by survival:
```python
from cohorts.functions import snv_count
blca_cohort.plot_survival(snv_count, how='os', threshold='median')
```
This should produce a summary of results along with a survival plot.

Alternatively, use `plot_benefit()` to summarize the biomarker by benefit (overall survival longer than 12 months) instead of by survival time:
```python
blca_cohort.plot_benefit(snv_count)
```

See the full example in the [quick-start notebook](http://nbviewer.jupyter.org/github/hammerlab/tcga-blca/blob/master/Quick-start%20-%20using%20Cohorts%20with%20TCGA%20data.ipynb).
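To make the two summaries above more concrete, here is a minimal standard-library sketch of the underlying arithmetic: split patients at the median of a biomarker, label benefit as OS longer than 12 months, and run a two-sided Fisher's exact test on the resulting 2x2 table. The patient values are invented, `fisher_exact_two_sided` is a hypothetical helper, and treating "high" as strictly above the median is an assumption; this is not the code Cohorts runs internally.

```python
from math import comb
from statistics import median

# Invented (snv_count, overall survival in months) pairs; not TCGA data.
patients = [
    (12, 5.0), (40, 9.5), (75, 14.0), (90, 30.0),
    (150, 18.0), (210, 26.5), (33, 7.0), (180, 11.0),
]

# Assumption: "high" means strictly above the cohort median.
cutoff = median(count for count, _ in patients)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is at most as likely as the observed one
    # (small relative tolerance guards against floating-point ties).
    return sum(p for p in (prob(x) for x in range(lowest, highest + 1))
               if p <= observed * (1 + 1e-9))


# Rows: biomarker high / low. Columns: benefit (OS > 12 months) / no benefit.
high_benefit = sum(1 for c, os_mo in patients if c > cutoff and os_mo > 12)
high_none = sum(1 for c, os_mo in patients if c > cutoff and os_mo <= 12)
low_benefit = sum(1 for c, os_mo in patients if c <= cutoff and os_mo > 12)
low_none = sum(1 for c, os_mo in patients if c <= cutoff and os_mo <= 12)

table = (high_benefit, high_none, low_benefit, low_none)
p_value = fisher_exact_two_sided(*table)
print(cutoff, table, round(p_value, 4))
```

With only eight invented patients the test has little power; the point is the shape of the calculation, not the result.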
Building from Scratch
--------------
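```python
# Import path assumed: Cohort and Patient exposed by the top-level package.
from cohorts import Cohort, Patient

# Survival fields: os and pfs are durations; deceased and progressed
# record whether the corresponding event was observed.
patient_1 = Patient(
    id="patient_1",
    os=70,
    pfs=24,
    deceased=True,
    progressed=True,
    benefit=False
)
patient_2 = Patient(
    id="patient_2",
    os=100,
    pfs=50,
    deceased=False,
    progressed=True,
    benefit=False
)
cohort = Cohort(
    patients=[patient_1, patient_2],
    cache_dir="/where/cohorts/results/get/saved"
)
cohort.plot_survival(on="os")
```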
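The block above ends by plotting a survival curve, which Cohorts delegates to `lifelines`. As a rough illustration of what such a curve contains, here is a plain-Python Kaplan-Meier estimator applied to the `os` and `deceased` values of the two patients. The `kaplan_meier` function is a hypothetical helper written for this sketch; it is neither the `lifelines` implementation nor part of Cohorts.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] for each time with an event.

    durations: time until death or last follow-up for each patient.
    events: True if the patient died at that time, False if censored.
    """
    records = sorted(zip(durations, events))
    at_risk = len(records)
    survival = 1.0
    curve = []
    i = 0
    while i < len(records):
        t = records[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same time point.
        while i < len(records) and records[i][0] == t:
            deaths += 1 if records[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            # Multiply by the fraction of at-risk patients surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        at_risk -= leaving
    return curve


# os and deceased for patient_1 and patient_2 from the example above.
curve = kaplan_meier([70, 100], [True, False])
print(curve)
```

Here the only step in the curve is at `os=70`, where one of two at-risk patients dies; `patient_2` is censored at 100 and does not lower the estimate.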
```python
# Import path assumed: these classes exposed by the top-level package.
from cohorts import Cohort, Patient, Sample

sample_1_tumor = Sample(
    is_tumor=True,
    bam_path_dna="/path/to/dna/bam",
    bam_path_rna="/path/to/rna/bam"
)
patient_1 = Patient(
    id="patient_1",
    # ...
    snv_vcf_paths=["/where/my/mutect/vcfs/live",
                   "/where/my/strelka/vcfs/live"],
    indel_vcf_paths=[...],
    tumor_sample=sample_1_tumor,
    # ...
)
cohort = Cohort(
    # ...
    patients=[patient_1]
)
```