https://github.com/kdm9/onlign

Online alignment prototypes for ANU improvements to AUGUR

Host: GitHub
URL: https://github.com/kdm9/onlign
Owner: kdm9
License: mpl-2.0
Created: 2020-04-03T02:28:34.000Z (about 5 years ago)
Default Branch: master
Last Pushed: 2020-04-04T23:49:35.000Z (about 5 years ago)
Last Synced: 2025-02-14T21:47:11.367Z (4 months ago)
Language: Python
Size: 14.6 KB
Stars: 1
Watchers: 4
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# onlign

Online alignment prototypes for ANU improvements to AUGUR

### Install dependencies

`conda env create -f environment.yml && conda activate onlign`

### Run on GISAID ncov data

```
mkdir -p data/
# GISAID_DATA_URL must be set to the URL of your GISAID sequence download
wget -O data/gisaid_cov2020_sequences.fasta "$GISAID_DATA_URL"

# see `bash ./alignment.sh` for advanced options
bash alignment.sh data/gisaid_cov2020_sequences.fasta
```

## TODOs

- [ ] A more robust way of detecting the N most diverse samples that doesn't pick long tips or otherwise strange sequences
  - By which I mean prefiltering the alignments somehow so that the guide tree doesn't include strange samples
- [ ] Remove known-dodgy sites and samples from the alignment
- [ ] Smarter handling of alignment funkiness that maintains compatibility with the recognised coordinate space
  - Alignment funkiness: e.g. regions of gap-or-N-only columns caused by funky samples
- [ ] Verify that the "core" alignment matrix doesn't change between new sequences before just concatenating the new seqs together (in `gatherprofilealn.py`)
- [ ] Integrate tree-building logic *a la* Rob's state machine diagram
- [ ] Run with bits of Sebastian's 100k seq simulation
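
Two of the TODOs above (checking that the "core" alignment is unchanged, and spotting gap-or-N-only columns) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the contents of `gatherprofilealn.py`: the function names `read_fasta`, `core_unchanged` and `gap_or_n_only_columns` are hypothetical, and alignments are assumed to be plain FASTA with unique sequence IDs.

```python
from typing import Dict, List


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {sequence_id: aligned_sequence} (hypothetical helper)."""
    chunks: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {sid: "".join(parts) for sid, parts in chunks.items()}


def core_unchanged(core: Dict[str, str], updated: Dict[str, str]) -> List[str]:
    """Return IDs of core sequences whose aligned row differs (or is missing)
    in `updated`. An empty list means the core matrix is intact."""
    return [sid for sid, row in core.items() if updated.get(sid) != row]


def gap_or_n_only_columns(aln: Dict[str, str]) -> List[int]:
    """Return 0-based indices of columns that contain only '-' or 'N'/'n'."""
    rows = list(aln.values())
    if not rows:
        return []
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("alignment rows have unequal lengths")
    return [i for i in range(width) if all(row[i] in "-Nn" for row in rows)]
```

If `core_unchanged` returns an empty list for each newly aligned batch, the new rows share the core's coordinate space and concatenating them is safe; any returned IDs point to core rows that were shifted by the new sequences.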
