https://github.com/neherlab/nextclade_data_workflows
nextclade nextstrain phylogenetics snakemake virus-evolution
- Host: GitHub
- URL: https://github.com/neherlab/nextclade_data_workflows
- Owner: neherlab
- Created: 2021-07-05T20:55:57.000Z (almost 4 years ago)
- Default Branch: master
- Last Pushed: 2024-04-26T14:18:31.000Z (about 1 year ago)
- Last Synced: 2024-05-11T05:53:34.502Z (about 1 year ago)
- Topics: nextclade, nextstrain, phylogenetics, snakemake, virus-evolution
- Language: Python
- Homepage:
- Size: 6.74 MB
- Stars: 4
- Watchers: 4
- Forks: 1
- Open Issues: 11
Metadata Files:
- Readme: README.md
README
## Checking new tree
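For the `qc.json` regression check in the steps below, you can diff the ignored stop codons and frameshifts between `profiles/qc.json` and the newly generated file instead of reading the JSON by hand. This is a minimal sketch and not a script from this repo. It assumes a Nextclade-style layout: `stopCodons.ignoredStopCodons[]` entries with `geneName`/`codon` keys, and `frameShifts.ignoredFrameShifts[]` entries with `geneName`/`codonRange.begin`/`codonRange.end` keys. The function names are hypothetical.

```python
from __future__ import annotations

import json
from typing import Any


def load_qc(path: str) -> dict[str, Any]:
    """Read a qc.json file into a dict."""
    with open(path) as f:
        return json.load(f)


def ignored_stops(qc: dict[str, Any]) -> set[tuple[str, int]]:
    """(gene, 0-indexed codon) pairs listed as ignored stop codons (assumed schema)."""
    entries = qc.get("stopCodons", {}).get("ignoredStopCodons", [])
    return {(e["geneName"], e["codon"]) for e in entries}


def ignored_frameshifts(qc: dict[str, Any]) -> set[tuple[str, int, int]]:
    """(gene, begin, end) triples listed as ignored frameshifts (assumed schema)."""
    entries = qc.get("frameShifts", {}).get("ignoredFrameShifts", [])
    return {
        (e["geneName"], e["codonRange"]["begin"], e["codonRange"]["end"])
        for e in entries
    }


def regressions(old_qc: dict[str, Any], new_qc: dict[str, Any]) -> dict[str, set]:
    """Entries ignored by the old config that the new config no longer ignores."""
    return {
        "stops": ignored_stops(old_qc) - ignored_stops(new_qc),
        "frameshifts": ignored_frameshifts(old_qc) - ignored_frameshifts(new_qc),
    }
```

For example, call `regressions(load_qc("profiles/qc.json"), load_qc(NEW_QC_PATH))`, where `NEW_QC_PATH` is a placeholder for wherever the generated `qc.json` ended up under `output/`. A non-empty set lists entries that were ignored before but are no longer ignored.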
1. Download the generated files into the nextclade_data_workflows repo:
   ```bash
   scp -rC [email protected]:~/nextclade_data_workflows/sars-cov-2/output output
   ```
1. Load them into the advanced view on nextclade.org.
1. Filter to the new nodes and check that:
   - clades are clean
   - there are no big outliers
1. Check that `tag.json` is up to date (ideally update it in `profiles/tag.json` for posterity).
1. Check that `qc.json` does not regress (ideally update it in `profiles/qc.json` for posterity). Beware: codons in `qc.json` are 0-indexed.
1. If needed, run `scripts/common_stops.py` and `scripts/common_frameshifts.py` to add stops/frameshifts that have become more common to `qc.json`.

## Identifying the most common frameshifts and stop codons
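Conceptually, "most common" means counting how many sequences carry each stop codon or frameshift and keeping those above a cutoff. The sketch below illustrates only that counting step. It does not reproduce the logic of the `select_stops`/`select_frameshifts` rules. The comma-separated `GENE:CODON` input format, the `common_mutations` name and the `min_fraction` cutoff are all assumptions.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def common_mutations(per_sequence: Iterable[str], min_fraction: float) -> list[tuple[str, int]]:
    """Return (label, count) pairs for labels found in at least min_fraction of sequences.

    per_sequence: one comma-separated string per sequence, e.g. "ORF8:27,ORF7a:62"
    (hypothetical format); an empty string means the sequence has none.
    """
    counts: Counter[str] = Counter()
    n_sequences = 0
    for field in per_sequence:
        n_sequences += 1
        # use a set so each label is counted at most once per sequence
        labels = {x.strip() for x in field.split(",") if x.strip()}
        counts.update(labels)
    if n_sequences == 0:
        return []
    threshold = min_fraction * n_sequences
    # most_common() yields labels in descending order of count
    return [(label, c) for label, c in counts.most_common() if c >= threshold]
```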
1. Download the metadata to `data/metadata_raw.tsv`.
1. Run the Snakemake workflow with the following commands/targets:
   ```bash
   snakemake --profile=profiles/clades pre-processed/frameshifts.tsv -R select_frameshifts
   snakemake --profile=profiles/clades pre-processed/stops.tsv -R select_stops
   ```
1. Format the most common stops/frameshifts into the `qc.json` format using:
   ```bash
   python3 scripts/common_stops.py
   python3 scripts/common_frameshifts.py
   ```
1. Manually check the results for plausibility and add them to `qc.json`.
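The formatting step turns labels such as `ORF8:27` into `qc.json` entries. Remember that codons in `qc.json` are 0-indexed. This is a minimal sketch that assumes 1-based `GENE:CODON` input labels and the `stopCodons.ignoredStopCodons` layout with `geneName`/`codon` keys. It does not reproduce what `scripts/common_stops.py` actually emits, and both function names are hypothetical.

```python
from __future__ import annotations


def to_ignored_stop_codons(labels: list[str]) -> list[dict]:
    """Convert hypothetical 1-based "GENE:CODON" labels into assumed qc.json entries.

    qc.json codons are 0-indexed, so 1 is subtracted from each codon number.
    """
    entries = []
    for label in labels:
        gene, codon = label.rsplit(":", 1)
        entries.append({"geneName": gene, "codon": int(codon) - 1})
    return sorted(entries, key=lambda e: (e["geneName"], e["codon"]))


def merge_into_qc(qc: dict, new_entries: list[dict]) -> dict:
    """Append new entries to stopCodons.ignoredStopCodons, skipping ones already present."""
    stops = qc.setdefault("stopCodons", {}).setdefault("ignoredStopCodons", [])
    seen = {(e["geneName"], e["codon"]) for e in stops}
    for e in new_entries:
        key = (e["geneName"], e["codon"])
        if key not in seen:
            stops.append(e)
            seen.add(key)
    return qc
```

You can print the merged result with `json.dumps(qc, indent=2)` and review it before editing `qc.json`. This keeps the manual plausibility check in the loop.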
## Committing to data repo
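If you do the copy from Python instead of with `cp`, `shutil.copytree` with `dirs_exist_ok=True` (Python 3.8+) merges the contents of the source directory into an existing destination rather than nesting it. The sketch below is a hypothetical helper and is not part of either repo. You would pass it the same `versions` directories used in the `cp` step.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def copy_versions(src: str | Path, dst: str | Path) -> list[Path]:
    """Merge the contents of src into dst and return the copied files relative to src."""
    src, dst = Path(src), Path(dst)
    # dirs_exist_ok=True copies into an existing dst instead of raising FileExistsError;
    # files that already exist in dst under the same name are overwritten
    shutil.copytree(src, dst, dirs_exist_ok=True)
    copied = sorted(p.relative_to(src) for p in src.rglob("*") if p.is_file())
    # sanity check: every source file should now exist at the same relative path in dst
    missing = [p for p in copied if not (dst / p).is_file()]
    if missing:
        raise RuntimeError(f"files not copied: {missing}")
    return copied
```

Usage: `copy_versions("output/sars-cov-2/references/MN908947/versions", "../../nextclade_data/data/datasets/sars-cov-2/references/MN908947/versions")`.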
1. Go to nextclade_data_workflow repo
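1. Check out a branch and open a PR against master.
1. Copy the output from the workflow repo to the data repo:
   ```bash
   # the trailing "/." copies the contents of versions/ into the existing destination,
   # so GNU cp does not create a nested versions/versions directory
   cp -r output/sars-cov-2/references/MN908947/versions/. ../../nextclade_data/data/datasets/sars-cov-2/references/MN908947/versions
   ```
1. Update `changelog.md`.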
1. Get Ivan to review
1. Merge into master.

## Release process
Follow the release guidelines outlined here: https://github.com/nextstrain/nextclade_data#dataset-release-process