Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/nvictus/blindex
- Host: GitHub
- URL: https://github.com/nvictus/blindex
- Owner: nvictus
- Created: 2018-11-24T16:51:51.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2018-11-27T09:26:59.000Z (about 6 years ago)
- Last Synced: 2024-10-03T12:17:50.419Z (3 months ago)
- Size: 31.3 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# assembliotheque
Inspired by https://github.com/dpryan79/ChromosomeMappings.
1. [Find the FTP directory](https://www.ncbi.nlm.nih.gov/genome/doc/ftpfaq/#howtofind) of your favorite reference assembly. Look the assembly up in the appropriate assembly summary file; the field named "ftp_path" provides the path to the FTP directory containing the data for each assembly. Use:
    * Either one of the two master assembly summary files:
        - ftp://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/assembly_summary_genbank.txt
        - ftp://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/assembly_summary_refseq.txt
    * Or an assembly summary file for a taxonomic group from the appropriate directory under genbank or refseq, e.g.
        - ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/bacteria/assembly_summary.txt
    * Or an assembly summary file for a species from the appropriate directory under genbank or refseq, e.g.
        - ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/bacteria/Salmonella_enterica/assembly_summary.txt
2. Take the commented metadata and convert it into a `{assembly}.yaml` file. Normalize the property names to those in GRCh38.
3. Extract the columns into a `{assembly}` TSV file and normalize the column names to those in GRCh38. If chromosome lengths are not available, you can get them from UCSC and join them into the table.
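
The steps above could be sketched in Python roughly as follows. This is a minimal sketch, not the repository's own tooling. It assumes that the summary file is tab-separated with its column names on the last `#` line before the data, that it has an `asm_name` column next to `ftp_path`, and that the "commented metadata" is the block of `# key: value` lines at the top of an assembly report file. The function names `find_ftp_path`, `split_report` and `to_tsv` are hypothetical.

```python
import csv
import io


def find_ftp_path(summary_text, asm_name):
    """Return the ftp_path of the first row whose asm_name matches, else None."""
    header = None
    for line in summary_text.splitlines():
        if line.startswith("#"):
            # Assumption: the last comment line before the data names the columns.
            header = line.lstrip("#").strip().split("\t")
            continue
        if not line.strip() or header is None:
            continue
        row = dict(zip(header, line.split("\t")))
        if row.get("asm_name") == asm_name:
            return row.get("ftp_path")
    return None


def split_report(report_text):
    """Split a report into (metadata dict, column names, data rows)."""
    metadata, columns, rows = {}, [], []
    for line in report_text.splitlines():
        if line.startswith("#"):
            body = line.lstrip("#").strip()
            if "\t" in body:
                # Assumption: a tab-separated comment line is a column header;
                # later ones overwrite earlier ones, so the last one wins.
                columns = body.split("\t")
            elif ":" in body:
                key, _, value = body.partition(":")
                metadata[key.strip()] = value.strip()
        elif line.strip():
            rows.append(line.split("\t"))
    return metadata, columns, rows


def to_tsv(columns, rows):
    """Serialize the column names and rows as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()
```

The metadata dictionary returned by `split_report` can then be dumped with any YAML library. Normalizing the property and column names to those in GRCh38 is left out here because the target names are not spelled out above.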
## Chromosome order
Ordered naturally by chromosome/plasmid.
Fully **assembled** chromosomes/plasmids are followed by **unlocalized** scaffolds, followed by **unplaced** scaffolds, followed by **alt** scaffolds.
## Patch releases
Do not include patch sequences! F that noise.
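
The chromosome order and the patch rule can be sketched together in Python. This is only an illustration under stated assumptions: the role labels are taken to be the values found in the `Sequence-Role` column of an NCBI assembly report, and `natural_key` and `order_sequences` are hypothetical helper names.

```python
import re

# Assumption: role labels as they appear in the Sequence-Role column.
ROLE_RANK = {
    "assembled-molecule": 0,
    "unlocalized-scaffold": 1,
    "unplaced-scaffold": 2,
    "alt-scaffold": 3,
}
PATCH_ROLES = {"fix-patch", "novel-patch"}


def natural_key(name):
    """Split a name into text and integer chunks so that chr2 sorts before chr10."""
    tokens = [tok for tok in re.split(r"(\d+)", name) if tok]
    # Tag each chunk so integers and strings are never compared directly.
    return [(0, int(tok)) if tok.isdecimal() else (1, tok) for tok in tokens]


def order_sequences(records):
    """Drop patch sequences, then sort (name, role) pairs by role and natural name."""
    kept = [(name, role) for name, role in records if role not in PATCH_ROLES]
    return sorted(
        kept,
        key=lambda rec: (ROLE_RANK.get(rec[1], len(ROLE_RANK)), natural_key(rec[0])),
    )
```

Sequences with a role that is not listed in `ROLE_RANK` sort after the alt scaffolds. That is a choice made for this sketch, not something the rules above specify.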
## TODO
* yaml and table validator scripts