https://github.com/solida-core/toolkit
DNA Reference Creator: this pipeline is aimed at downloading and building indexes and accessory files for reference genomes
- Host: GitHub
- URL: https://github.com/solida-core/toolkit
- Owner: solida-core
- License: gpl-3.0
- Created: 2019-05-09T10:14:06.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2022-10-21T10:03:11.000Z (over 3 years ago)
- Last Synced: 2024-12-27T20:46:13.176Z (over 1 year ago)
- Topics: bioinformatics, python
- Language: Python
- Size: 19.5 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# SOLIDA-CORE [TOOLKIT]()
A collection of useful [solida-core]() scripts for managing accessory and supplementary files.
## Included Scripts:
* **[reference_organizer](#reference_organizerpy)**: organizes reference files into the directory tree required by [solida-core]()
## Download TOOLKIT:
TOOLKIT is a Git repository. To download it, type:
```
git clone https://github.com/solida-core/toolkit.git
cd toolkit
```
## Script MANUALS:
### reference_organizer.py
This script organizes reference files into the directory-tree structure required by [solida-core]().
The user must provide the path of the desired reference folder and choose between the hg19 and hg38 human genome releases.
The script attempts to connect to the FTP server of the **[GATK resource bundle](https://software.broadinstitute.org/gatk/download/bundle)** and download the files from it.
The files are then extracted and placed in the directory structure expected by [solida-core]().
Because the FTP server is limited to 25 users, the script makes multiple connection attempts (default: 5). This value can be changed with the `--reconnection_attempts` parameter.
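The retry behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code of `reference_organizer.py`: the helper name `connect_with_retries`, its `delay` parameter, and the placeholder host in the comment are assumptions made for the example.

```python
import ftplib
import time


def connect_with_retries(connect, attempts=5, delay=0.0):
    """Call `connect()` until it succeeds or `attempts` tries are used up.

    `connect` is any zero-argument callable that raises one of
    `ftplib.all_errors` on failure (hypothetical helper for illustration).
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ftplib.all_errors as error:
            # A busy server typically answers with a temporary FTP error.
            last_error = error
            if attempt < attempts:
                time.sleep(delay)  # wait before the next attempt
    raise ConnectionError(f"giving up after {attempts} attempts") from last_error


# Possible usage (placeholder host, not the real bundle server):
# ftp = connect_with_retries(lambda: ftplib.FTP("ftp.example.org", timeout=30),
#                            attempts=5, delay=10.0)
```

Passing the connection step as a callable keeps the retry loop independent of the server details, so it can be exercised without network access.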
To get script usage, type:
```bash
python reference_organizer.py -h
```
```
usage: reference_organizer.py [-h] --reference_dir PATH --release hg19/hg38
                              [--reconnection_attempts int] [--force]

Prepare reference files for solida-core pipelines

optional arguments:
  -h, --help            show this help message and exit
  --reference_dir PATH, -w PATH
                        Destination folder for reference files
  --release hg19/hg38, -r hg19/hg38
                        UCSC Genome Release to download: [hg19,hg38]
  --reconnection_attempts int, -a int
                        Number of connection attempts to perform in case of
                        busy FTP server [default: 5]
  --force               Download files in the directory even if they exists
                        (Default: FALSE)
```
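For readers who want to see how such an interface maps onto `argparse`, the sketch below rebuilds a parser from the help text above. It is a reconstruction for illustration only; the actual script may define its arguments differently, and the function name `build_parser` is an assumption.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative reconstruction)."""
    parser = argparse.ArgumentParser(
        prog="reference_organizer.py",
        description="Prepare reference files for solida-core pipelines",
    )
    parser.add_argument("--reference_dir", "-w", metavar="PATH", required=True,
                        help="Destination folder for reference files")
    parser.add_argument("--release", "-r", metavar="hg19/hg38", required=True,
                        choices=["hg19", "hg38"],
                        help="UCSC Genome Release to download: [hg19,hg38]")
    parser.add_argument("--reconnection_attempts", "-a", metavar="int",
                        type=int, default=5,
                        help="Number of connection attempts to perform in "
                             "case of busy FTP server [default: 5]")
    parser.add_argument("--force", action="store_true",
                        help="Download files in the directory even if they exist")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.reference_dir, args.release, args.reconnection_attempts, args.force)
```

With the documented flags, a typical call would look like `python reference_organizer.py -w /path/to/references -r hg38 -a 10`, where the path is a placeholder.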