https://github.com/andersen-lab/bjorn_utils
Key utils for sequence processing moved to a new home from bjorn
Last synced: about 1 year ago
- Host: GitHub
- URL: https://github.com/andersen-lab/bjorn_utils
- Owner: andersen-lab
- License: gpl-3.0
- Created: 2021-10-15T23:11:29.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2024-01-23T01:17:47.000Z (over 2 years ago)
- Last Synced: 2025-02-01T06:25:09.340Z (over 1 year ago)
- Language: Python
- Size: 264 KB
- Stars: 0
- Watchers: 5
- Forks: 0
- Open Issues: 13
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# `björn_utils`
This is the code repository for `bjorn_utils` - a suite of miscellaneous tools that can be used to:
* prepare results and data files from SARS-CoV-2 sequencing analysis for release to public databases such as GISAID, Google Cloud, and GitHub
## Installation
* Install Anaconda: [instructions can be found here](https://docs.anaconda.com/anaconda/install/)
* Create the `bjorn_utils` environment
```bash
conda env create -f environment.yml -n bjorn_utils
```
* Activate environment
```bash
conda activate bjorn_utils
```
* Install datafunk with the `bjorn_utils` environment activated: [instructions can be found here](https://github.com/cov-ert/datafunk)
## Usage
The current stable branch is "main"; follow all of the instructions below on that branch.
### Post-processing of SARS-CoV-2 Sequencing Results for Release to Public Databases
* Activate the `bjorn_utils` environment
```bash
conda activate bjorn_utils
```
* Open `run_alab_release.sh` to specify your parameters, such as
  * filepath to sample sheet containing sample metadata (input)
  * filepath to updated metadata of samples that have already been uploaded
  * output directory where results are saved
  * number of CPU cores available for use
  * minimum coverage required for each sample (QC filter)
  * minimum average depth required for each sample (QC filter)
  * sequencing technology used
  * DEFAULT: test parameters
* Open `config.json` to specify your parameters, such as
  * list of SARS-CoV-2 genes that are considered non-concerning
    * i.e. the occurrence of open reading frame (ORF) altering mutations in these genes can be accepted
    * e.g. ['ORF8', 'ORF10']
  * list of SARS-CoV-2 mutations that are considered non-concerning
    * i.e. the occurrence of `ORF8:Q27_` can be accepted (it is present in the B.1.1.7 lineage)
    * e.g. ['ORF8:Q27_']
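As a rough illustration of how the two lists might be applied, the Python sketch below treats an ORF-altering mutation as acceptable when either its gene or the mutation itself is listed. The key names `nonconcerning_genes` and `nonconcerning_mutations`, the helper `is_acceptable`, and the test mutations are hypothetical; check `config.json` in the repository for the actual keys and logic.

```python
import json

# Hypothetical config contents; the real key names in config.json may differ.
EXAMPLE_CONFIG = """
{
  "nonconcerning_genes": ["ORF8", "ORF10"],
  "nonconcerning_mutations": ["ORF8:Q27_"]
}
"""


def is_acceptable(mutation: str, config: dict) -> bool:
    """Return True if an ORF-altering mutation is listed as non-concerning.

    `mutation` uses the GENE:CHANGE form shown above, e.g. "ORF8:Q27_".
    """
    gene = mutation.split(":", 1)[0]
    return (
        gene in config["nonconcerning_genes"]
        or mutation in config["nonconcerning_mutations"]
    )


config = json.loads(EXAMPLE_CONFIG)

# "S:Q23_" is a made-up mutation used only to show the flagged case.
for mut in ["ORF8:Q27_", "S:Q23_"]:
    status = "accepted" if is_acceptable(mut, config) else "flagged"
    print(f"{mut}: {status}")
```

Checking the gene list first means any ORF-altering change in a listed gene passes, while the mutation list allows single exceptions in genes that are otherwise treated as concerning.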
* Run the `run_alab_release.sh` script to initiate the data release pipeline
```bash
bash run_alab_release.sh
```
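The two QC parameters set in `run_alab_release.sh` (minimum coverage and minimum average depth) can be pictured as a simple per-sample threshold check. The sketch below is only an illustration under stated assumptions: the `SampleStats` class, the `passes_qc` helper, the sample values, the thresholds, and the choice to express coverage as a fraction are all hypothetical and are not taken from the pipeline's code.

```python
from dataclasses import dataclass


@dataclass
class SampleStats:
    sample_id: str
    coverage: float   # assumed unit: fraction of the genome covered (0.0 to 1.0)
    avg_depth: float  # assumed unit: mean read depth across the genome


def passes_qc(stats: SampleStats, min_coverage: float, min_depth: float) -> bool:
    """Keep a sample only if it meets both thresholds."""
    return stats.coverage >= min_coverage and stats.avg_depth >= min_depth


# Made-up samples and thresholds, for illustration only.
samples = [
    SampleStats("sample_A", coverage=0.98, avg_depth=1200.0),
    SampleStats("sample_B", coverage=0.80, avg_depth=1500.0),
    SampleStats("sample_C", coverage=0.99, avg_depth=50.0),
]

released = [s.sample_id for s in samples if passes_qc(s, min_coverage=0.9, min_depth=100.0)]
print(released)
```

A sample has to clear both thresholds, so high depth does not compensate for low coverage, or the reverse.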
* `bjorn_utils` assumes the following file structure for the input sequencing data
