Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/clinical-genomics/chanjo
Chanjo provides a better way to analyze coverage data in clinical sequencing.
- Host: GitHub
- URL: https://github.com/clinical-genomics/chanjo
- Owner: Clinical-Genomics
- License: mit
- Created: 2013-07-05T13:22:17.000Z (over 11 years ago)
- Default Branch: main
- Last Pushed: 2024-12-10T09:23:05.000Z (12 days ago)
- Last Synced: 2024-12-10T09:31:30.469Z (12 days ago)
- Topics: bioconda, coverage, genomics, python, sambamba, sql
- Language: Python
- Homepage: https://clinical-genomics.github.io/chanjo/
- Size: 14.7 MB
- Stars: 50
- Watchers: 10
- Forks: 12
- Open Issues: 31
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Authors: AUTHORS
# Chanjo
![Docker build - GitHub](https://github.com/Clinical-Genomics/chanjo/actions/workflows/docker_build_n_publish.yml/badge.svg)
[![PyPI version](https://badge.fury.io/py/chanjo.svg)](https://badge.fury.io/py/chanjo)
[![Coverage Status](https://coveralls.io/repos/github/Clinical-Genomics/chanjo/badge.svg?branch=master)](https://coveralls.io/github/Clinical-Genomics/chanjo?branch=master)

Chanjo is a coverage analysis tool for clinical sequencing. It's implemented in Python
with a command line interface that adheres to the [UNIX pipeline philosophy][unix].

If you find chanjo useful in your project, please cite the [article][publication].

## Installation
Chanjo is distributed through `pip`. Install the latest stable release by
running:

```bash
pip install chanjo
```

... or locally for development:

```bash
git clone https://github.com/Clinical-Genomics/chanjo.git
cd chanjo
conda install --channel bioconda sambamba
pip install -r requirements-dev.txt --editable .
```

## Usage
Chanjo exposes a decomposable command line interface with a nifty config file
implementation.

```bash
chanjo init --setup
chanjo load /path/to/sambamba.output.bed
chanjo calculate mean
{"metrics": {"completeness_10": 90.92, "mean_coverage": 193.85}, "sample_id": "sample1"}
```

## Docker
When running the dockerized version of [Chanjo](https://hub.docker.com/r/clinicalgenomics/chanjo), the setup process is slightly different. Chanjo depends on a configuration file (`config.yaml`) and either a SQLite database (`chanjo.coverage.sqlite3`) or a MySQL database, which are created at initialization. For convenience, we provide a docker-compose file containing a MariaDB (MySQL-based) service and the Chanjo command line, which can be used to set up a demo instance of Chanjo.

Since the database setup (the `chanjo init` command) and the sample data insertion are executed by two distinct instances of the same service (`chanjo-cli`), Docker [volumes](https://docs.docker.com/storage/volumes/) must be used so that the database persists between the two steps.

The following examples demonstrate how to set up Chanjo with the docker-compose file and the default exon definitions (the init demo files are in the folder `chanjo/init/demo-files`). The config file and the created database are stored on the host in a folder named `data`, which is mirrored by the folder `/home/worker/data` in the Chanjo container. Other exon definitions can be used by mounting them into the container.

### Example with MySQL-based database (MariaDB)
```bash
# Build a docker image
docker-compose build
```
```bash
# Set up chanjo and populate demo database with exon definitions
docker-compose run --rm -v "${PWD}/data:/home/worker/data" -v "${PWD}/data/database:/home/worker/data/database" chanjo-cli bash -c "chanjo -d mysql+pymysql://chanjoUser:chanjoPassword@mariadb/chanjo4_test init --auto /home/worker/data && chanjo --config /home/worker/data/chanjo.yaml link /home/worker/data/hgnc.grch37p13.exons.bed"
```
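
The `-d` option in the command above takes a SQLAlchemy-style database URI. As a rough illustration of how that string breaks down, the sketch below splits the demo URI with Python's standard `urllib.parse` module. The helper name `describe_db_uri` is made up for this example and is not part of Chanjo.

```python
from urllib.parse import urlsplit


def describe_db_uri(uri):
    """Split a SQLAlchemy-style database URI into its named parts.

    Hypothetical helper for illustration only; not part of Chanjo.
    """
    parts = urlsplit(uri)
    # The scheme has the form "<dialect>+<driver>", e.g. "mysql+pymysql"
    dialect, _, driver = parts.scheme.partition("+")
    return {
        "dialect": dialect,
        "driver": driver,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "database": parts.path.lstrip("/"),
    }


if __name__ == "__main__":
    # URI taken from the demo command; credentials are the demo values
    demo_uri = "mysql+pymysql://chanjoUser:chanjoPassword@mariadb/chanjo4_test"
    for key, value in describe_db_uri(demo_uri).items():
        print(f"{key}: {value}")
```

The host part, `mariadb`, presumably corresponds to the MariaDB service defined in the docker-compose file, and `chanjo4_test` is the database name.
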
This setup step creates a `data` folder containing two files:

- hgnc.grch37p13.exons.bed --> Exon definitions
- chanjo.yaml --> Contains the database URI, so in the next step you can use this config file instead of `-d mysql+pymysql://chanjoUser:chanjoPassword@mariadb/chanjo4_test`

```bash
# Load sample
docker-compose run --rm -v "${PWD}/data:/home/worker/data" -v "${PWD}/data/database:/home/worker/data/database" chanjo-cli bash -c "chanjo --config /home/worker/data/chanjo.yaml load /home/worker/app/chanjo/init/demo-files/sample1.coverage.bed"
```

### Example with SQLite database
```bash
# Set up chanjo and populate demo database with exon definitions
docker-compose run --rm -v "${PWD}/data:/home/worker/data" -v "${PWD}/data/database:/home/worker/data/database" chanjo-cli bash -c "chanjo init --auto /home/worker/data && chanjo --config /home/worker/data/chanjo.yaml link /home/worker/data/hgnc.grch37p13.exons.bed"
# Load sample
docker-compose run --rm -v "${PWD}/data/chanjo.coverage.sqlite3:/home/worker/app/chanjo.coverage.sqlite3" -v "${PWD}/data:/home/worker/data" chanjo-cli bash -c "chanjo --config /home/worker/data/chanjo.yaml load /home/worker/app/chanjo/init/demo-files/sample1.coverage.bed"
```

## Documentation
The [official documentation][docs] is available online.

If you are looking to learn more about handling sequence coverage data in
clinical sequencing, feel free to download and skim through my own
[Master's thesis][thesis] and article references.

## Features
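
Two metrics appear in Chanjo's output: mean coverage and completeness (the `mean_coverage` and `completeness_10` fields in the usage example). As a minimal sketch of what they measure, assuming completeness at 10x means the percentage of bases covered by at least 10 reads, the plain-Python functions below compute both from a list of per-base read depths. The function names and the depth values are made up for illustration; Chanjo itself relies on Sambamba for these numbers rather than code like this.

```python
def mean_coverage(depths):
    """Average read depth across all positions of a non-empty region."""
    return sum(depths) / len(depths)


def completeness(depths, threshold=10):
    """Percentage of positions covered by at least `threshold` reads."""
    covered = sum(1 for depth in depths if depth >= threshold)
    return 100 * covered / len(depths)


if __name__ == "__main__":
    # Made-up per-base read depths for a single 8 bp region
    depths = [0, 4, 12, 15, 30, 30, 9, 20]
    print(f"mean coverage: {mean_coverage(depths):.2f}")
    print(f"completeness at 10x: {completeness(depths, threshold=10):.2f}")
```

A region can have a high mean coverage and still be incomplete when a few positions are poorly covered, which is why the two values are reported side by side.
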
### What Chanjo does
Chanjo leverages [Sambamba][sambamba] to annotate coverage and completeness
for a general BED file. The output can then easily be loaded into a SQL
database that enables investigation of coverage across regions and samples.
The database also works as an API to downstream tools like the Chanjo
Coverage Report generator.

### What Chanjo doesn't
Chanjo is not the right choice if you care about coverage for every base across
the entire genome. Detailed histograms are something [BEDTools][bedtools]
already handles with confidence.

## Contributors
- Robin Andeer ([robinandeer](https://github.com/robinandeer))
- Luca Beltrame ([lbeltrame](https://github.com/lbeltrame))
- John Kern ([kern3020](https://github.com/kern3020))
- Måns Magnusson ([moonso](https://github.com/moonso))
- Patrik Grenfeldt ([patrikgrenfeldt](https://github.com/patrikgrenfeldt))

## License
MIT. See the [LICENSE](LICENSE) file for more details.

## Contributing
Anyone can help make this project better - read [CONTRIBUTION](CONTRIBUTION.md)
to get started!

[bedtools]: http://bedtools.readthedocs.org/en/latest/
[docs]: https://clinical-genomics.github.io/chanjo/
[publication]: https://f1000research.com/articles/9-615/v1
[sambamba]: http://lomereiter.github.io/sambamba/
[thesis]: https://s3.amazonaws.com/tudo/chanjo/RobinAndeerMastersThesisFinal_2013.pdf
[unix]: http://en.wikipedia.org/wiki/Unix_philosophy