https://github.com/kdm9/onlign
Online alignment prototypes for ANU improvements to AUGUR
- Host: GitHub
- URL: https://github.com/kdm9/onlign
- Owner: kdm9
- License: mpl-2.0
- Created: 2020-04-03T02:28:34.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2020-04-04T23:49:35.000Z (about 5 years ago)
- Last Synced: 2025-02-14T21:47:11.367Z (4 months ago)
- Language: Python
- Size: 14.6 KB
- Stars: 1
- Watchers: 4
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# onlign
Online alignment prototypes for ANU improvements to AUGUR
### Install deps
`conda env create -f environment.yml && conda activate onlign`
### Run GISAID ncov
```
mkdir data/
wget -O data/gisaid_cov2020_sequences.fasta $GISAID_DATA_URL
# see `bash ./alignment.sh` for advanced options
bash alignment.sh data/gisaid_cov2020_sequences.fasta
```

## TODOs
- [ ] A more robust way of detecting the N most diverse samples that doesn't pick long tips or otherwise strange sequences
  - That is, prefilter the alignments so that the guide tree doesn't include strange samples
- [ ] Remove known-dodgy sites and samples from the alignment
- [ ] Smarter handling of alignment funkiness that maintains compatibility with the recognised coordinate space
  - Alignment funkiness: e.g. regions of gap-or-N-only columns caused by funky samples
- [ ] Verify that the "core" alignment matrix doesn't change between new sequences before just concatenating the new seqs together (in `gatherprofilealn.py`)
- [ ] Integrate treebuilding logic *a la* Rob's state machine diagram
- [ ] Run with bits of Sebastian's 100k seq simulation
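
Two of the TODO items above (flagging gap-or-N-only columns, and checking that the "core" alignment is unchanged before concatenating new sequences) could be prototyped roughly as below. This is only a sketch under stated assumptions, not the code in `gatherprofilealn.py`: alignments are assumed to be plain dicts of sequence name to aligned string, and the names `gap_or_n_only_columns`, `core_unchanged`, and `UNINFORMATIVE` are hypothetical.

```python
from typing import Dict, List

# Characters treated as carrying no information in a column.
# Assumption: gaps ("-") and ambiguous bases ("N"/"n") only.
UNINFORMATIVE = set("-Nn")


def gap_or_n_only_columns(alignment: Dict[str, str]) -> List[int]:
    """Return 0-based indices of columns where every sequence is a gap or N."""
    seqs = list(alignment.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    return [
        i for i in range(length)
        if all(s[i] in UNINFORMATIVE for s in seqs)
    ]


def core_unchanged(core: Dict[str, str], updated: Dict[str, str]) -> bool:
    """True if every core sequence is present and identical in the updated alignment."""
    return all(updated.get(name) == seq for name, seq in core.items())


# Toy data, made up for illustration only.
core = {"ref": "ACGT-A", "s1": "ACNT-A"}
updated = {"ref": "ACGT-A", "s1": "ACNT-A", "new": "ACGTNA"}

print(gap_or_n_only_columns(core))
print(core_unchanged(core, updated))
```

Returning column indices rather than deleting the columns is a deliberate choice in this sketch: masking or reporting them leaves the alignment length, and therefore the recognised coordinate space, untouched.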