https://github.com/stjudecloud/ngsderive
Forensic analysis tool useful in backwards computing information from next-generation sequencing data.
bioinformatics computational-biology gene-model genomics next-generation-sequencing ngs strandedness strandedness-inference workflow workflow-engine
- Host: GitHub
- URL: https://github.com/stjudecloud/ngsderive
- Owner: stjudecloud
- License: mit
- Created: 2019-11-25T14:28:36.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2024-08-05T07:34:34.000Z (about 1 year ago)
- Last Synced: 2024-08-10T21:27:47.314Z (about 1 year ago)
- Topics: bioinformatics, computational-biology, gene-model, genomics, next-generation-sequencing, ngs, strandedness, strandedness-inference, workflow, workflow-engine
- Language: Python
- Homepage: https://stjudecloud.github.io/ngsderive
- Size: 1.23 MB
- Stars: 11
- Watchers: 4
- Forks: 0
- Open Issues: 5
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.md
- Codeowners: .github/CODEOWNERS
ngsderive
Forensic analysis tool useful in backwards computing information from next-generation sequencing data and annotating splice junctions.
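[Explore the docs »](https://stjudecloud.github.io/ngsderive)

[Request Feature](https://github.com/stjudecloud/ngsderive/issues) · [Report Bug](https://github.com/stjudecloud/ngsderive/issues)

⭐ Consider starring the repo! ⭐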
> Notice: `ngsderive` is largely a forensic analysis tool useful in backwards computing information
> from next-generation sequencing data. Notably, most results are provided as a 'best guess' —
> the tool does not claim 100% accuracy and results should be considered with that understanding.
> An exception would be the `junction-annotation` tool, which analyzes more concrete evidence than the other tools.

## 🎨 Features
The following attributes can be guessed using ngsderive:
* Illumina Instrument. Infer which Illumina instrument was used to generate the data by matching against known instrument and flowcell naming patterns. Each guess comes with a confidence score.
* RNA-Seq Strandedness. Infer from the data whether RNA-Seq data was generated using a Stranded-Forward, Stranded-Reverse, or Unstranded protocol.
* Pre-trimmed Read Length. Compute the distribution of read lengths in the file and attempt to guess what the original read length of the experiment was.
* PHRED Score Encoding. Infer which encoding scheme was used to store PHRED scores as ASCII characters.
* Junction Annotation. Annotate splice junctions as novel, partial novel, or known in comparison to a reference gene model.

## 📚 Getting Started
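### Illustration: guessing a PHRED score encoding

To make the "best guess" nature of the features above concrete, here is a minimal sketch of how a PHRED score encoding could be guessed from the ASCII range of the quality characters. It is an illustration only and not ngsderive's actual implementation: the function name `guess_phred_encoding`, the `ENCODING_RANGES` table, and the encoding labels are assumptions made for this example, and the ranges follow the commonly documented FASTQ conventions.

```python
# Hypothetical sketch: NOT ngsderive's actual implementation.
# Each entry is (label, lowest ASCII code, highest ASCII code),
# ordered from the most restrictive range to the least restrictive.
ENCODING_RANGES = [
    ("Illumina 1.3+ (Phred+64)", 64, 126),
    ("Solexa (Solexa+64)", 59, 126),
    ("Sanger / Illumina 1.8+ (Phred+33)", 33, 126),
]


def guess_phred_encoding(quality_strings):
    """Return the most restrictive encoding whose range covers every character seen."""
    codes = [ord(ch) for qual in quality_strings for ch in qual]
    if not codes:
        return "Unknown"
    lowest, highest = min(codes), max(codes)
    for name, low, high in ENCODING_RANGES:
        if low <= lowest and highest <= high:
            return name
    return "Unknown"


# '!' (ASCII 33) only fits the Phred+33 range.
print(guess_phred_encoding(["IIIIHHGG#5", "FFFF:::,,!"]))
# All characters are >= 64, so the most restrictive range matches.
print(guess_phred_encoding(["hhhhggffed", "ddddcccbba"]))
```

Note the built-in ambiguity: a Phred+33 file that happens to contain only high-quality bases would also fit the Phred+64 range, which is why a result like this can only be a guess.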
### Installation
You can install ngsderive using the Python Package Index ([PyPI](https://pypi.org/)).
```bash
pip install ngsderive
```

## 🖥️ Development
If you are interested in contributing to the code, please first review our [CONTRIBUTING.md][contributing-md] document.
To bootstrap a development environment, please use the following commands.
```bash
# Clone the repository
git clone git@github.com:stjudecloud/ngsderive.git
cd ngsderive

# Install the project using poetry
poetry install
```

## 🚧️ Tests
ngsderive provides a (currently patchy) set of tests — both unit and end-to-end.
```bash
py.test
```

## 🤝 Contributing
Contributions, issues and feature requests are welcome!
Feel free to check the [issues page](https://github.com/stjudecloud/ngsderive/issues). You can also take a look at the [contributing guide][contributing-md].

## 📝 License
This project is licensed as follows:
* All code related to the `instrument` subcommand is licensed under the [AGPL v2.0][agpl-v2]. This is not due to any strict requirement. Out of deference to some [code][10x-inspiration] I drew inspiration from (and copied patterns from), the decision was made to license this code consistently.
* The rest of the project is licensed under the MIT License - see the [LICENSE.md](LICENSE.md) file for details.

Copyright © 2020 [St. Jude Cloud Team](https://github.com/stjudecloud).

[10x-inspiration]: https://github.com/10XGenomics/supernova/blob/master/tenkit/lib/python/tenkit/illumina_instrument.py
[agpl-v2]: http://www.affero.org/agpl2.html
[contributing-md]: https://github.com/stjudecloud/ngsderive/blob/master/CONTRIBUTING.md
[license-md]: https://github.com/stjudecloud/ngsderive/blob/master/LICENSE.md