https://github.com/neherlab/nextclade_data_workflows
nextclade nextstrain phylogenetics snakemake virus-evolution
- Host: GitHub
- URL: https://github.com/neherlab/nextclade_data_workflows
- Owner: neherlab
- Created: 2021-07-05T20:55:57.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2025-07-24T09:25:33.000Z (10 months ago)
- Last Synced: 2025-07-24T13:23:44.762Z (10 months ago)
- Topics: nextclade, nextstrain, phylogenetics, snakemake, virus-evolution
- Language: Python
- Homepage:
- Size: 6.96 MB
- Stars: 4
- Watchers: 3
- Forks: 1
- Open Issues: 12
Metadata Files:
- Readme: README.md
## Checking new tree
1. Download generated files into nextclade data workflow repo:
```bash
scp -rC roemer0001@login-transfer.scicore.unibas.ch:~/nextclade_data_workflows/sars-cov-2/output output
```
1. Load them into the advanced view on nextclade.org.
1. Filter to new nodes and check that:
- clades are clean
- no big outliers
1. Check that `tag.json` is up to date (ideally, update `profiles/tag.json` for posterity).
1. Check that `qc.json` does not regress (ideally, update `profiles/qc.json` for posterity). Beware: codons are 0-indexed.
1. Optionally run `scripts/common_stops.py` and `scripts/common_frameshifts.py` to add stops/frameshifts that have become more common to `qc.json`.
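The `qc.json` regression check above can be partly automated. The following is a minimal sketch, not part of this repository: it recursively compares two parsed JSON documents and lists added, removed, and changed keys. The function names `diff_json` and `diff_files` are hypothetical, and the sketch makes no assumption about the schema of `qc.json` beyond it being valid JSON.

```python
import json
from pathlib import Path


def diff_json(old, new, path=""):
    """Return human-readable differences between two parsed JSON values."""
    if isinstance(old, dict) and isinstance(new, dict):
        changes = []
        # Walk the union of keys so additions and removals are both reported.
        for key in sorted(set(old) | set(new)):
            child = f"{path}.{key}" if path else key
            if key not in new:
                changes.append(f"removed: {child}")
            elif key not in old:
                changes.append(f"added: {child}")
            else:
                changes.extend(diff_json(old[key], new[key], child))
        return changes
    # Lists and scalars are compared as whole values.
    if old != new:
        return [f"changed: {path}: {old!r} -> {new!r}"]
    return []


def diff_files(old_path, new_path):
    """Load two JSON files from disk and diff them (hypothetical helper)."""
    old = json.loads(Path(old_path).read_text())
    new = json.loads(Path(new_path).read_text())
    return diff_json(old, new)
```

To use it, pass the stored copy (for example `profiles/qc.json`) and the newly generated file to `diff_files` and review each reported line by hand. An empty list means the two files are identical in content.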
## Identifying most common frameshifts and stop codons
1. Download metadata to `data/metadata_raw.tsv`
1. Run the Snakemake workflow with the following targets:
```bash
snakemake --profile=profiles/clades pre-processed/frameshifts.tsv -R select_frameshifts
snakemake --profile=profiles/clades pre-processed/stops.tsv -R select_stops
```
1. Format the most common stops/frameshifts into the `qc.json` JSON format using:
```bash
python3 scripts/common_stops.py
python3 scripts/common_frameshifts.py
```
1. Manually check the results for plausibility and add them to `qc.json`.
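As an illustration of the formatting step above, here is a minimal sketch of how frequent stop codons could be turned into `qc.json`-style entries. It does not reproduce `scripts/common_stops.py`. The input shape (a list of `(gene, codon)` tuples with 1-based codon positions), the `min_count` threshold, the function name `most_common_stops`, and the output keys `geneName` and `codon` are all assumptions made for the example. Check them against the actual `qc.json` before reusing anything. The one point taken from this README is that codons in `qc.json` are 0-indexed, hence the `- 1`.

```python
import json
from collections import Counter


def most_common_stops(observations, min_count=2):
    """Build 0-indexed stop-codon entries from (gene, 1-based codon) tuples.

    Field names and the input format are assumptions for illustration only.
    """
    counts = Counter(observations)
    entries = []
    # most_common() yields items in descending order of frequency.
    for (gene, codon), n in counts.most_common():
        if n < min_count:
            break
        # Convert the 1-based position to the 0-indexed convention.
        entries.append({"geneName": gene, "codon": codon - 1})
    return entries


if __name__ == "__main__":
    # Made-up observations, purely to show the call.
    example = [("ORF8", 27), ("ORF8", 27), ("ORF7a", 62)]
    print(json.dumps(most_common_stops(example), indent=2))
```

Whatever tool produces the entries, the manual plausibility check in the last step still applies. An off-by-one in the codon index is easy to miss.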
## Committing to data repo
1. Go to the `nextclade_data_workflows` repo
1. Check out a branch and open a PR to master
1. Copy the output from the workflow repo to the data repo:
```bash
cp -r output/sars-cov-2/references/MN908947/versions/ ../../nextclade_data/data/datasets/sars-cov-2/references/MN908947/versions
```
1. Update `changelog.md`
1. Get Ivan to review
1. Merge into master
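The copy step above relies on `cp -r` with a trailing slash on the source. When the destination directory already exists, this may merge into it or nest a second `versions` directory inside it, depending on the `cp` implementation. A minimal Python sketch that always merges is shown below. The helper name `copy_versions` is hypothetical, and `dirs_exist_ok` requires Python 3.8 or later.

```python
import shutil
from pathlib import Path


def copy_versions(src, dst):
    """Merge the contents of src into dst and return the files now under dst."""
    src, dst = Path(src), Path(dst)
    if not src.is_dir():
        raise FileNotFoundError(f"source directory not found: {src}")
    # dirs_exist_ok=True merges into an existing destination instead of failing;
    # files with the same relative path are overwritten.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return sorted(p.relative_to(dst).as_posix() for p in dst.rglob("*") if p.is_file())
```

Calling it with the same source and destination paths as the `cp` command prints nothing by itself. The returned list can be inspected to confirm that the expected version folders arrived before committing.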
## Release process
Follow release guidelines as outlined here: https://github.com/nextstrain/nextclade_data#dataset-release-process